vllm: [Usage]: Failed to run Qwen3 Eagle3 speculate [4 comments, 2 participants]

vllm-project/vllm#37273, fetched 2026-04-08 00:48:24

Error Message

(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] EngineCore failed to start.
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
(APIServer pid=191) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
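
The API server's RuntimeError only says "See root cause above", so the useful line has to be found in the engine core's output. A minimal sketch of one way to pull exception lines out of a log in this format follows. The regular expressions and the function name exception_lines are assumptions written for the prefixes seen in this log, not part of vLLM.

```python
import re

# Process prefix on each line, e.g. "(EngineCore_DP0 pid=309) ".
PREFIX = re.compile(r"^\((?P<proc>\w+) pid=(?P<pid>\d+)\)\s*")
# Logger header on ERROR lines, e.g. "ERROR 03-17 07:45:26 [core.py:1079] ".
LOG_HEADER = re.compile(r"^ERROR \d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\]\s?")
# Exception summary such as "TypeError: ..." or "RuntimeError: ...".
EXCEPTION = re.compile(r"^(?P<type>[A-Za-z_][\w.]*(?:Error|Exception)): (?P<msg>.+)$")


def exception_lines(log_text):
    """Return unique (process, exception type, message) tuples in log order."""
    found = []
    for raw in log_text.splitlines():
        prefix = PREFIX.match(raw)
        if not prefix:
            continue  # line without a "(name pid=N)" prefix
        body = LOG_HEADER.sub("", raw[prefix.end():]).strip()
        exc = EXCEPTION.match(body)
        if exc:
            entry = (prefix["proc"], exc["type"], exc["msg"])
            if entry not in found:  # the same error is logged twice
                found.append(entry)
    return found


sample = """\
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
(EngineCore_DP0 pid=309) TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
(APIServer pid=191)     raise RuntimeError(
(APIServer pid=191) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
"""

for proc, exc_type, msg in exception_lines(sample):
    print(f"{proc}: {exc_type}: {msg}")
```

The first tuple returned is the earliest exception in the log, which here is the TypeError from the engine core rather than the RuntimeError the API server reports afterwards.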

Root Cause

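The engine core fails during start-up memory profiling: dummy_run in vllm/v1/spec_decode/eagle.py calls self.model(**kwargs) without a hidden_states entry, which Eagle3LlamaForCausalLM.forward() requires. The full log is: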

(EngineCore_DP0 pid=309) 
(EngineCore_DP0 pid=309) INFO 03-17 07:45:25 [default_loader.py:293] Loading weights took 0.35 seconds
(EngineCore_DP0 pid=309) INFO 03-17 07:45:26 [gpu_model_runner.py:4275] Model loading took 17.17 GiB memory and 7.165730 seconds
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] EngineCore failed to start.
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] Traceback (most recent call last):
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1069, in run_engine_core
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 813, in __init__
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     super().__init__(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 115, in __init__
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 249, in _initialize_kv_caches
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 128, in determine_available_memory
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 371, in determine_available_memory
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.model_runner.profile_run()
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5229, in profile_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4970, in _dummy_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.drafter.dummy_run(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 1617, in dummy_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.model(**kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
(EngineCore_DP0 pid=309) Process EngineCore_DP0:
(EngineCore_DP0 pid=309) Traceback (most recent call last):
(EngineCore_DP0 pid=309)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=309)     self.run()
(EngineCore_DP0 pid=309)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=309)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1083, in run_engine_core
(EngineCore_DP0 pid=309)     raise e
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1069, in run_engine_core
(EngineCore_DP0 pid=309)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 813, in __init__
(EngineCore_DP0 pid=309)     super().__init__(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 115, in __init__
(EngineCore_DP0 pid=309)     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 249, in _initialize_kv_caches
(EngineCore_DP0 pid=309)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 128, in determine_available_memory
(EngineCore_DP0 pid=309)     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=309)     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 371, in determine_available_memory
(EngineCore_DP0 pid=309)     self.model_runner.profile_run()
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5229, in profile_run
(EngineCore_DP0 pid=309)     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4970, in _dummy_run
(EngineCore_DP0 pid=309)     self.drafter.dummy_run(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 1617, in dummy_run
(EngineCore_DP0 pid=309)     self.model(**kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=309)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=309)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=309) TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
[rank0]:[W317 07:45:27.604235298 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=191) Traceback (most recent call last):
(APIServer pid=191)   File "/opt/venv/bin/vllm", line 10, in <module>
(APIServer pid=191)     sys.exit(main())
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=191)     args.dispatch_function(args)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=191)     uvloop.run(run_server(args))
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=191)     return loop.run_until_complete(wrapper())
(APIServer pid=191)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=191)     return await main
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=191)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=191)     async with build_async_engine_client(
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=191)     return await anext(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=191)     async with build_async_engine_client_from_engine_args(
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=191)     return await anext(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args
(APIServer pid=191)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 223, in from_vllm_config
(APIServer pid=191)     return cls(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 152, in __init__
(APIServer pid=191)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=191)     return func(*args, **kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 125, in make_async_mp_client
(APIServer pid=191)     return AsyncMPClient(*client_args)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=191)     return func(*args, **kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 842, in __init__
(APIServer pid=191)     super().__init__(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 496, in __init__
(APIServer pid=191)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=191)     next(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 925, in launch_core_engines
(APIServer pid=191)     wait_for_engine_startup(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 984, in wait_for_engine_startup
(APIServer pid=191)     raise RuntimeError(
(APIServer pid=191) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
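
The TypeError in the log above is the ordinary Python error for unpacking a kwargs dict that lacks a required parameter. A minimal sketch of that failure mode in plain Python follows. ToyDraftModel, its parameter list, and missing_required_args are hypothetical names for illustration and do not reproduce vLLM's actual forward() signature or its fix.

```python
import inspect


class ToyDraftModel:
    """Hypothetical stand-in; not vLLM's Eagle3LlamaForCausalLM."""

    def forward(self, input_ids, positions, hidden_states):
        return hidden_states


def missing_required_args(func, kwargs):
    """Return names of required parameters of func that kwargs does not supply."""
    missing = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs never count as missing
        if param.default is param.empty and name not in kwargs:
            missing.append(name)
    return missing


model = ToyDraftModel()
# Assumed shape of the failing call: a kwargs dict with no hidden_states entry.
kwargs = {"input_ids": [0], "positions": [0]}

# Check the call up front instead of waiting for the TypeError.
print(missing_required_args(model.forward, kwargs))

try:
    model.forward(**kwargs)
except TypeError as exc:
    # Same class of error as in the traceback above.
    print(exc)
```

A check like this only diagnoses the mismatch between the caller's kwargs and the callee's signature. Which side should change in vLLM is not stated in the log.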

Fix Action

Fix / Workaround

==============================
CPU Info
==============================

Architecture: aarch64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: ARM
Model name: Cortex-A78AE
Model: 1
Thread(s) per core: 1
Core(s) per cluster: 4
Socket(s): -
Cluster(s): 3
Stepping: r0p1
CPU max MHz: 2201.6001
CPU min MHz: 115.2000
BogoMIPS: 62.50
Flags: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache: 768 KiB (12 instances)
L1i cache: 768 KiB (12 instances)
L2 cache: 3 MiB (12 instances)
L3 cache: 6 MiB (3 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; __user pointer sanitization
Vulnerability Spectre v2: Mitigation; CSV2, but not BHB
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4970, in _dummy_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.drafter.dummy_run(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 1617, in dummy_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.model(**kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
(EngineCore_DP0 pid=309) Process EngineCore_DP0:
(EngineCore_DP0 pid=309) Traceback (most recent call last):
(EngineCore_DP0 pid=309)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=309)     self.run()
(EngineCore_DP0 pid=309)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=309)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1083, in run_engine_core
(EngineCore_DP0 pid=309)     raise e
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1069, in run_engine_core
(EngineCore_DP0 pid=309)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 813, in __init__
(EngineCore_DP0 pid=309)     super().__init__(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 115, in __init__
(EngineCore_DP0 pid=309)     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 249, in _initialize_kv_caches
(EngineCore_DP0 pid=309)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 128, in determine_available_memory
(EngineCore_DP0 pid=309)     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=309)     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 371, in determine_available_memory
(EngineCore_DP0 pid=309)     self.model_runner.profile_run()
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5229, in profile_run
(EngineCore_DP0 pid=309)     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4970, in _dummy_run
(EngineCore_DP0 pid=309)     self.drafter.dummy_run(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 1617, in dummy_run
(EngineCore_DP0 pid=309)     self.model(**kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=309)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=309)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=309) TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
[rank0]:[W317 07:45:27.604235298 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=191) Traceback (most recent call last):
(APIServer pid=191)   File "/opt/venv/bin/vllm", line 10, in <module>
(APIServer pid=191)     sys.exit(main())
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=191)     args.dispatch_function(args)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=191)     uvloop.run(run_server(args))
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=191)     return loop.run_until_complete(wrapper())
(APIServer pid=191)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=191)     return await main
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=191)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=191)     async with build_async_engine_client(
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=191)     return await anext(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=191)     async with build_async_engine_client_from_engine_args(
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=191)     return await anext(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args
(APIServer pid=191)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 223, in from_vllm_config
(APIServer pid=191)     return cls(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 152, in __init__
(APIServer pid=191)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=191)     return func(*args, **kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 125, in make_async_mp_client
(APIServer pid=191)     return AsyncMPClient(*client_args)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=191)     return func(*args, **kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 842, in __init__
(APIServer pid=191)     super().__init__(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 496, in __init__
(APIServer pid=191)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=191)     next(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 925, in launch_core_engines
(APIServer pid=191)     wait_for_engine_startup(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 984, in wait_for_engine_startup
(APIServer pid=191)     raise RuntimeError(
(APIServer pid=191) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
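
The final `TypeError` is the generic Python error raised when a callable is invoked with keyword arguments that omit a required parameter. The traceback shows `dummy_run` calling `self.model(**kwargs)`, so one plausible reading is that the `kwargs` built for this draft model did not include a `hidden_states` key. The sketch below reproduces that failure mode in isolation. `DraftModelStandIn`, its parameter list, and the `missing_required` helper are hypothetical and are not taken from vLLM's `Eagle3LlamaForCausalLM`.

```python
import inspect


class DraftModelStandIn:
    """Hypothetical stand-in; the parameter names are assumed, not copied from vLLM."""

    def forward(self, input_ids, positions, hidden_states):
        return hidden_states


def missing_required(func, kwargs):
    """Return the required parameters of func that kwargs does not supply."""
    required = [
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.default is inspect.Parameter.empty
        and param.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    ]
    return [name for name in required if name not in kwargs]


model = DraftModelStandIn()
kwargs = {"input_ids": [0], "positions": [0]}  # no "hidden_states" entry

# Check the call up front instead of waiting for the TypeError.
missing = missing_required(model.forward, kwargs)
print(missing)

# Calling with the incomplete kwargs raises the same kind of error as in the log.
try:
    model.forward(**kwargs)
    error_message = ""
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

A check like this only shows which argument the caller failed to pass. It does not say whether the cause is the draft checkpoint, the speculative configuration, or the vLLM build in use.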

Code Example

Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (aarch64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version                : 14.0.0-1ubuntu1.1
CMake version                : version 3.31.10
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0
Is debug build               : False
CUDA used to build PyTorch   : 12.6
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.10.19 (main, Feb 12 2026, 00:42:24) [Clang 21.1.4 ] (64-bit runtime)
Python platform              : Linux-5.15.148-tegra-aarch64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.6.85
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : GPU 0: Orin (nvgpu)
Nvidia driver version        : 540.4.0
cuDNN version                : Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.3.0
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                       aarch64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
CPU(s):                             12
On-line CPU(s) list:                0-11
Vendor ID:                          ARM
Model name:                         Cortex-A78AE
Model:                              1
Thread(s) per core:                 1
Core(s) per cluster:                4
Socket(s):                          -
Cluster(s):                         3
Stepping:                           r0p1
CPU max MHz:                        2201.6001
CPU min MHz:                        115.2000
BogoMIPS:                           62.50
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache:                          768 KiB (12 instances)
L1i cache:                          768 KiB (12 instances)
L2 cache:                           3 MiB (12 instances)
L3 cache:                           6 MiB (3 instances)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; CSV2, but not BHB
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.4 (/opt/venv/lib/python3.10/site-packages)
[pip3] numpy==2.2.6 (/opt/venv/lib/python3.10/site-packages)
[pip3] nvidia-cudnn-frontend==1.14.1 (/opt/venv/lib/python3.10/site-packages)
[pip3] nvidia-cutlass==3.9.2 (/opt/venv/lib/python3.10/site-packages)
[pip3] nvidia-ml-py==13.590.48 (/opt/venv/lib/python3.10/site-packages)
[pip3] onnx==1.20.1 (/opt/venv/lib/python3.10/site-packages)
[pip3] pyzmq==27.1.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torch==2.10.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torch_memory_saver==0.0.9 (/opt/venv/lib/python3.10/site-packages)
[pip3] torchaudio==2.10.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torchcodec==0.10.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torchvision==0.25.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] transformers==4.57.3 (/opt/venv/lib/python3.10/site-packages)
[pip3] triton==3.5.1 (/opt/venv/lib/python3.10/site-packages)
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.16.0rc2.dev479+g15d76f74e.d20260226 (git sha: 15d76f74e, date: 20260226)
vLLM Build Flags:
  CUDA Archs: 8.7; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=void
CUDNN_LIB_INCLUDE_PATH=/usr/include
PYTORCH_FORCE_BUILD=
TORCH_NVCC_FLAGS=-Xfatbin -compress-all -compress-mode=balance
TORCH_CUDA_ARCH_LIST=8.7
NVIDIA_DRIVER_CAPABILITIES=all
TORCH_NCCL_USE_COMM_NONBLOCKING=0
CUDA_BIN_PATH=/usr/local/cuda/bin
CUDAARCHS=87
CUDA_INSTALLED_VERSION=126
CUDACXX=/usr/local/cuda/bin/nvcc
CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
MAX_JOBS=
CUDA_NVCC_EXECUTABLE=/usr/local/cuda/bin/nvcc
CUDNN_LIB_PATH=/usr/lib/aarch64-linux-gnu
LD_LIBRARY_PATH=/usr/local/cuda/compat:/usr/local/cuda/lib64:
TORCH_HOME=/data/models/torch
CUDA_HOME=/usr/local/cuda
CUDA_HOME=/usr/local/cuda
CUDA_MODULE_LOADING=LAZY
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1
CUDA_ARCHITECTURES=87
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root

---

sudo docker run -it --rm --network host --runtime=nvidia \
    -v /mnt/data/cache/models:/root/.cache/huggingface \
    ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
    vllm serve /root/.cache/huggingface/Qwen3-8B \
    --speculative-config '{"model":"/root/.cache/huggingface/Qwen3-8B-eagle", "num_speculative_tokens": 5}' \
    --gpu-memory-utilization 0.7 \
    --max-model-len 4096 \
    --enforce-eager
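
The `--speculative-config` value is a JSON object passed as a single shell argument, so quoting mistakes are easy to make. A small sketch of generating that argument programmatically, assuming the same draft model path and token count as the command above (the variable names are illustrative only):

```python
import json
import shlex

speculative_config = {
    "model": "/root/.cache/huggingface/Qwen3-8B-eagle",
    "num_speculative_tokens": 5,
}

# Serialize once, then quote for the shell so the JSON reaches vllm as one argument.
json_arg = json.dumps(speculative_config)
cli_fragment = "--speculative-config " + shlex.quote(json_arg)
print(cli_fragment)
```

Splitting the fragment back with `shlex.split` and parsing the second token with `json.loads` should return the original dictionary, which is a cheap way to confirm the quoting before launching the container.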

---

{
  "architectures": [
    "LlamaForCausalLMEagle3"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 1,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936,
  "draft_vocab_size": 32000
}
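
A quick way to sanity-check a draft checkpoint before serving is to read the fields of its `config.json` that describe the draft head. The sketch below parses a trimmed copy of the config above, embedded as a string so it runs standalone (in practice the file would be read from the draft model directory). It reports the declared architecture and the two vocabulary sizes. It only inspects values; it does not establish which of them vLLM requires.

```python
import json

# Trimmed copy of the draft model config shown above.
draft_config_text = """
{
  "architectures": ["LlamaForCausalLMEagle3"],
  "model_type": "llama",
  "hidden_size": 4096,
  "num_hidden_layers": 1,
  "vocab_size": 151936,
  "draft_vocab_size": 32000
}
"""

config = json.loads(draft_config_text)

summary = {
    "architecture": config["architectures"][0],
    "num_hidden_layers": config["num_hidden_layers"],
    "vocab_size": config["vocab_size"],
    "draft_vocab_size": config.get("draft_vocab_size"),
}
# Flag configs whose draft vocabulary is smaller than the full vocabulary.
summary["reduced_draft_vocab"] = (
    summary["draft_vocab_size"] is not None
    and summary["draft_vocab_size"] < summary["vocab_size"]
)

for key, value in summary.items():
    print(f"{key}: {value}")
```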


Your current environment

Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 22.04.5 LTS (aarch64)
GCC version                  : (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version                : 14.0.0-1ubuntu1.1
CMake version                : version 3.31.10
Libc version                 : glibc-2.35

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0
Is debug build               : False
CUDA used to build PyTorch   : 12.6
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.10.19 (main, Feb 12 2026, 00:42:24) [Clang 21.1.4 ] (64-bit runtime)
Python platform              : Linux-5.15.148-tegra-aarch64-with-glibc2.35

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.6.85
CUDA_MODULE_LOADING set to   : LAZY
GPU models and configuration : GPU 0: Orin (nvgpu)
Nvidia driver version        : 540.4.0
cuDNN version                : Probably one of the following:
/usr/lib/aarch64-linux-gnu/libcudnn.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_adv.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_cnn.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_precompiled.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_engines_runtime_compiled.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_graph.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_heuristic.so.9.3.0
/usr/lib/aarch64-linux-gnu/libcudnn_ops.so.9.3.0
HIP runtime version          : N/A
MIOpen runtime version       : N/A
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                       aarch64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
CPU(s):                             12
On-line CPU(s) list:                0-11
Vendor ID:                          ARM
Model name:                         Cortex-A78AE
Model:                              1
Thread(s) per core:                 1
Core(s) per cluster:                4
Socket(s):                          -
Cluster(s):                         3
Stepping:                           r0p1
CPU max MHz:                        2201.6001
CPU min MHz:                        115.2000
BogoMIPS:                           62.50
Flags:                              fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp uscat ilrcpc flagm paca pacg
L1d cache:                          768 KiB (12 instances)
L1i cache:                          768 KiB (12 instances)
L2 cache:                           3 MiB (12 instances)
L3 cache:                           6 MiB (3 instances)
NUMA node(s):                       1
NUMA node0 CPU(s):                  0-11
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit:        Not affected
Vulnerability L1tf:                 Not affected
Vulnerability Mds:                  Not affected
Vulnerability Meltdown:             Not affected
Vulnerability Mmio stale data:      Not affected
Vulnerability Retbleed:             Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass:    Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:           Mitigation; __user pointer sanitization
Vulnerability Spectre v2:           Mitigation; CSV2, but not BHB
Vulnerability Srbds:                Not affected
Vulnerability Tsx async abort:      Not affected

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.4 (/opt/venv/lib/python3.10/site-packages)
[pip3] numpy==2.2.6 (/opt/venv/lib/python3.10/site-packages)
[pip3] nvidia-cudnn-frontend==1.14.1 (/opt/venv/lib/python3.10/site-packages)
[pip3] nvidia-cutlass==3.9.2 (/opt/venv/lib/python3.10/site-packages)
[pip3] nvidia-ml-py==13.590.48 (/opt/venv/lib/python3.10/site-packages)
[pip3] onnx==1.20.1 (/opt/venv/lib/python3.10/site-packages)
[pip3] pyzmq==27.1.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torch==2.10.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torch_memory_saver==0.0.9 (/opt/venv/lib/python3.10/site-packages)
[pip3] torchaudio==2.10.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torchcodec==0.10.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] torchvision==0.25.0 (/opt/venv/lib/python3.10/site-packages)
[pip3] transformers==4.57.3 (/opt/venv/lib/python3.10/site-packages)
[pip3] triton==3.5.1 (/opt/venv/lib/python3.10/site-packages)
[conda] Could not collect

==============================
         vLLM Info
==============================
ROCM Version                 : Could not collect
vLLM Version                 : 0.16.0rc2.dev479+g15d76f74e.d20260226 (git sha: 15d76f74e, date: 20260226)
vLLM Build Flags:
  CUDA Archs: 8.7; ROCm: Disabled
GPU Topology:
  Could not collect

==============================
     Environment Variables
==============================
NVIDIA_VISIBLE_DEVICES=void
CUDNN_LIB_INCLUDE_PATH=/usr/include
PYTORCH_FORCE_BUILD=
TORCH_NVCC_FLAGS=-Xfatbin -compress-all -compress-mode=balance
TORCH_CUDA_ARCH_LIST=8.7
NVIDIA_DRIVER_CAPABILITIES=all
TORCH_NCCL_USE_COMM_NONBLOCKING=0
CUDA_BIN_PATH=/usr/local/cuda/bin
CUDAARCHS=87
CUDA_INSTALLED_VERSION=126
CUDACXX=/usr/local/cuda/bin/nvcc
CUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda
MAX_JOBS=
CUDA_NVCC_EXECUTABLE=/usr/local/cuda/bin/nvcc
CUDNN_LIB_PATH=/usr/lib/aarch64-linux-gnu
LD_LIBRARY_PATH=/usr/local/cuda/compat:/usr/local/cuda/lib64:
TORCH_HOME=/data/models/torch
CUDA_HOME=/usr/local/cuda
CUDA_HOME=/usr/local/cuda
CUDA_MODULE_LOADING=LAZY
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE=1
CUDA_ARCHITECTURES=87
PYTORCH_NVML_BASED_CUDA_CHECK=1
TORCHINDUCTOR_COMPILE_THREADS=1
TORCHINDUCTOR_CACHE_DIR=/tmp/torchinductor_root

How would you like to use vllm

I want to run EAGLE-3 speculative decoding on vLLM (which works successfully on tensorrt-edge-llm), but an error occurs. The command is:

sudo docker run -it --rm --network host --runtime=nvidia \
    -v /mnt/data/cache/models:/root/.cache/huggingface \
    ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin \
    vllm serve /root/.cache/huggingface/Qwen3-8B \
    --speculative-config '{"model":"/root/.cache/huggingface/Qwen3-8B-eagle", "num_speculative_tokens": 5}' \
    --gpu-memory-utilization 0.7 \
    --max-model-len 4096 \
    --enforce-eager

The eagle3 model config is

{
  "architectures": [
    "LlamaForCausalLMEagle3"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "head_dim": 128,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "max_position_embeddings": 40960,
  "max_window_layers": 36,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 1,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-06,
  "rope_scaling": null,
  "rope_theta": 1000000,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.51.0",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936,
  "draft_vocab_size": 32000
}

The error is

(EngineCore_DP0 pid=309) 
(EngineCore_DP0 pid=309) INFO 03-17 07:45:25 [default_loader.py:293] Loading weights took 0.35 seconds
(EngineCore_DP0 pid=309) INFO 03-17 07:45:26 [gpu_model_runner.py:4275] Model loading took 17.17 GiB memory and 7.165730 seconds
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] EngineCore failed to start.
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] Traceback (most recent call last):
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1069, in run_engine_core
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 813, in __init__
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     super().__init__(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 115, in __init__
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 249, in _initialize_kv_caches
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 128, in determine_available_memory
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 371, in determine_available_memory
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.model_runner.profile_run()
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5229, in profile_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4970, in _dummy_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.drafter.dummy_run(
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return func(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 1617, in dummy_run
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     self.model(**kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079]     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
(EngineCore_DP0 pid=309) Process EngineCore_DP0:
(EngineCore_DP0 pid=309) Traceback (most recent call last):
(EngineCore_DP0 pid=309)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=309)     self.run()
(EngineCore_DP0 pid=309)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=309)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1083, in run_engine_core
(EngineCore_DP0 pid=309)     raise e
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 1069, in run_engine_core
(EngineCore_DP0 pid=309)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 813, in __init__
(EngineCore_DP0 pid=309)     super().__init__(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 115, in __init__
(EngineCore_DP0 pid=309)     num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 249, in _initialize_kv_caches
(EngineCore_DP0 pid=309)     available_gpu_memory = self.model_executor.determine_available_memory()
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 128, in determine_available_memory
(EngineCore_DP0 pid=309)     return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/executor/uniproc_executor.py", line 75, in collective_rpc
(EngineCore_DP0 pid=309)     result = run_method(self.driver_worker, method, args, kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/serial_utils.py", line 459, in run_method
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 371, in determine_available_memory
(EngineCore_DP0 pid=309)     self.model_runner.profile_run()
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 5229, in profile_run
(EngineCore_DP0 pid=309)     hidden_states, last_hidden_states = self._dummy_run(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 4970, in _dummy_run
(EngineCore_DP0 pid=309)     self.drafter.dummy_run(
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(EngineCore_DP0 pid=309)     return func(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py", line 1617, in dummy_run
(EngineCore_DP0 pid=309)     self.model(**kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
(EngineCore_DP0 pid=309)     return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=309)   File "/opt/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1787, in _call_impl
(EngineCore_DP0 pid=309)     return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=309) TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
[rank0]:[W317 07:45:27.604235298 ProcessGroupNCCL.cpp:1553] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(APIServer pid=191) Traceback (most recent call last):
(APIServer pid=191)   File "/opt/venv/bin/vllm", line 10, in <module>
(APIServer pid=191)     sys.exit(main())
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=191)     args.dispatch_function(args)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=191)     uvloop.run(run_server(args))
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=191)     return loop.run_until_complete(wrapper())
(APIServer pid=191)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=191)     return await main
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=191)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=191)     async with build_async_engine_client(
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=191)     return await anext(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=191)     async with build_async_engine_client_from_engine_args(
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=191)     return await anext(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args
(APIServer pid=191)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 223, in from_vllm_config
(APIServer pid=191)     return cls(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 152, in __init__
(APIServer pid=191)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=191)     return func(*args, **kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 125, in make_async_mp_client
(APIServer pid=191)     return AsyncMPClient(*client_args)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=191)     return func(*args, **kwargs)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 842, in __init__
(APIServer pid=191)     super().__init__(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 496, in __init__
(APIServer pid=191)     with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=191)   File "/root/.local/share/uv/python/cpython-3.10-linux-aarch64-gnu/lib/python3.10/contextlib.py", line 142, in __exit__
(APIServer pid=191)     next(self.gen)
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 925, in launch_core_engines
(APIServer pid=191)     wait_for_engine_startup(
(APIServer pid=191)   File "/opt/venv/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 984, in wait_for_engine_startup
(APIServer pid=191)     raise RuntimeError(
(APIServer pid=191) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

Fix Plan

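The root cause in the log is the TypeError at the end of the EngineCore traceback: Eagle3LlamaForCausalLM.forward() is called without its required positional argument hidden_states. The call comes from dummy_run in vllm/v1/spec_decode/eagle.py (line 1617, self.model(**kwargs)), which runs during profile_run while vLLM determines the available GPU memory. The RuntimeError raised afterwards by the APIServer process ("Engine core initialization failed") is only a consequence of this failure.

One way to work around this is to modify dummy_run in eagle.py so that the keyword arguments passed to self.model include hidden_states.

A sketch of that change follows. The tensor shape is a placeholder, not necessarily the shape vLLM uses internally: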

import torch

def dummy_run(self, **kwargs):
    # ... existing code ...
    # Placeholder hidden states; the first dimension should match the number of dummy tokens.
    hidden_states = torch.zeros((1, self.model.config.hidden_size))
    # Keep the arguments dummy_run already passes and add only the missing one.
    kwargs.setdefault("hidden_states", hidden_states)
    self.model(**kwargs)
    # ... existing code ...

Alternatively, make hidden_states optional in the forward method of Eagle3LlamaForCausalLM, so that a call without it falls back to a zero tensor:

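import torch
from torch import nn

class Eagle3LlamaForCausalLM(nn.Module):
    # ... existing code ...
    # Any other parameters of the real signature are omitted in this sketch.
    def forward(self, input_ids, hidden_states=None):
        # ... existing code ...
        if hidden_states is None:
            # Fall back to zeros on the same device as input_ids (dummy data only).
            hidden_states = torch.zeros(
                (input_ids.shape[0], self.config.hidden_size), device=input_ids.device
            )
        # ... existing code ...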
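
The TypeError in the log is what Python raises whenever a callable is invoked with keyword arguments that do not cover all of its required parameters. A minimal sketch of detecting that gap before the call, using only the standard library, is shown here. ToyDraftModel and missing_required_args are hypothetical names for illustration and are not part of vLLM; the toy forward() signature is likewise an assumption.

```python
import inspect


class ToyDraftModel:
    """Hypothetical stand-in for a draft model whose forward() needs hidden_states."""

    def forward(self, input_ids, positions, hidden_states):
        return len(input_ids), len(positions), len(hidden_states)


def missing_required_args(func, kwargs):
    """Return the names of required parameters of func that kwargs does not supply."""
    missing = []
    for name, param in inspect.signature(func).parameters.items():
        if param.kind in (param.VAR_POSITIONAL, param.VAR_KEYWORD):
            continue  # *args / **kwargs never count as missing
        if param.default is inspect.Parameter.empty and name not in kwargs:
            missing.append(name)
    return missing


model = ToyDraftModel()
dummy_kwargs = {"input_ids": [0], "positions": [0]}

# Report the gap instead of failing inside the call.
print(missing_required_args(model.forward, dummy_kwargs))

# Supplying the missing argument lets the call go through.
dummy_kwargs["hidden_states"] = [0.0]
print(model.forward(**dummy_kwargs))
```

A check like this could be dropped temporarily next to the self.model(**kwargs) call while debugging, to see exactly which arguments the draft model expects and which ones dummy_run provides.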

Verification

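To verify the fix, rerun the same vllm serve command with the patched code. The fix worked if the log no longer contains "EngineCore failed to start" or the TypeError about hidden_states, and the API server finishes starting instead of exiting with "Engine core initialization failed".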
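
The log check can be automated with a short script. This is a sketch under the assumption that the server output has been captured as text; the failure markers are copied from the log in this issue, and find_startup_failures is a hypothetical helper name.

```python
import re

# Failure markers taken from the failing log in this issue.
FAILURE_PATTERNS = [
    re.compile(r"EngineCore failed to start"),
    re.compile(r"TypeError: .*missing 1 required positional argument: 'hidden_states'"),
    re.compile(r"RuntimeError: Engine core initialization failed"),
]


def find_startup_failures(log_text):
    """Return the log lines that match any known failure marker."""
    hits = []
    for line in log_text.splitlines():
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS):
            hits.append(line.strip())
    return hits


sample_log = """\
(EngineCore_DP0 pid=309) INFO 03-17 07:45:26 [gpu_model_runner.py:4275] Model loading took 17.17 GiB memory and 7.165730 seconds
(EngineCore_DP0 pid=309) ERROR 03-17 07:45:26 [core.py:1079] EngineCore failed to start.
(EngineCore_DP0 pid=309) TypeError: Eagle3LlamaForCausalLM.forward() missing 1 required positional argument: 'hidden_states'
"""

for hit in find_startup_failures(sample_log):
    print(hit)
```

An empty result for the full captured log means none of the known failure lines appeared. It does not prove the server is healthy, so it is still worth sending a test request once startup completes.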

Extra Tips

  • The traceback shows which file to patch: /opt/venv/lib/python3.10/site-packages/vllm/v1/spec_decode/eagle.py inside the container.
  • vLLM in this image runs from the /opt/venv virtual environment, and the command uses docker run --rm, so edits made inside a running container are lost when it exits. Mount the patched file with -v or build a derived image so the change persists.
  • If the error persists, add print statements or use a debugger to inspect the keyword arguments that dummy_run passes to self.model.
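
Before patching vLLM itself, it may also be worth checking the launch configuration. The speculative config in the reported command names only the draft model and the number of speculative tokens. vLLM's speculative decoding documentation shows EAGLE-3 drafts configured with an explicit "method": "eagle3" entry; whether adding it resolves this particular failure is an assumption that has not been verified here. The sketch below builds the --speculative-config value in Python so that the JSON quoting stays correct. build_speculative_config is a hypothetical helper, not a vLLM function.

```python
import json
import shlex


def build_speculative_config(draft_path, num_tokens, method=None):
    """Return the JSON string for vllm serve --speculative-config."""
    config = {"model": draft_path, "num_speculative_tokens": num_tokens}
    if method is not None:
        # Assumption: naming the method explicitly may help; this is untested here.
        config["method"] = method
    return json.dumps(config)


spec = build_speculative_config(
    "/root/.cache/huggingface/Qwen3-8B-eagle", 5, method="eagle3"
)

# shlex.quote wraps the JSON so the shell passes it as a single argument.
print("--speculative-config " + shlex.quote(spec))
```

The printed argument can be pasted into the docker run command in place of the original --speculative-config line.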
