vllm - [Bug]: Possible to get GPU OOM for DP/EP

Error Message

(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:32908 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO 05-20 10:42:11 [loggers.py:271] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2226.2 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP0 pid=2646202) INFO 05-20 10:42:11 [loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1312.6 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:48392 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:48604 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:48406 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:48612 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO 05-20 10:42:21 [loggers.py:271] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP0 pid=2646202) INFO 05-20 10:42:21 [loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:57484 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:37980 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:57488 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:37982 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:46168 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:46328 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:46170 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:46330 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38338 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38338 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38338 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40140 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38350 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40154 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40154 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40154 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38352 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40156 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:60698 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38352 - "GET /health HTTP/1.1" 200 OK
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962] WorkerProc hit an exception.
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/worker_base.py", line 345, in execute_model
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.worker.execute_model(scheduler_output)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/gpu_worker.py", line 843, in execute_model
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     output = self.model_runner.execute_model(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/gpu_model_runner.py", line 4200, in execute_model
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     model_output = self._model_forward(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                    ^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/gpu_model_runner.py", line 3677, in _model_forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.model(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1692, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/compilation/decorators.py", line 507, in __call__
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.forward(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1315, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     hidden_states, residual = layer(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                               ^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1174, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     hidden_states = self.self_attn(**attn_kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1067, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.mla_attn(positions, hidden_states, llama_4_scaling)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

[ ...... ]

(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [dump_input.py:81] Dumping scheduler stats: SchedulerStats(num_running_reqs=809, num_waiting_reqs=0, num_skipped_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.47516605700935377, prefix_cache_stats=PrefixCacheStats(reset=False, requests=12, queries=11530, hits=11408, preempted_requests=0, preempted_queries=0, preempted_hits=0), connector_prefix_cache_stats=None, kv_cache_eviction_events=[], spec_decoding_stats=None, kv_connector_stats=None, waiting_lora_adapters={}, running_lora_adapters={}, cudagraph_stats=None, perf_stats=None)
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161] Traceback (most recent call last):
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 1152, in run_engine_core
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 1831, in run_busy_loop
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     executed = self._process_engine_step()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 1232, in _process_engine_step
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 548, in step_with_batch_queue
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     exec_model_fut.result()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     return super().result()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]            ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     return self.__get_result()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]            ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     raise self._exception
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     response = self.aggregate(self.get_response())
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     raise RuntimeError(
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161] RuntimeError: Worker failed with error 'CUDA out of memory. Tried to allocate 4.69 GiB. GPU 0 has a total capacity of 79.18 GiB of which 4.48 GiB is free. Including non-PyTorch memory, this process has 74.69 GiB memory in use. Of the allocated memory 71.91 GiB is allocated by PyTorch, and 755.13 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)', please check the stack trace above for the root cause
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704] AsyncLLM output_handler failed.
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704] Traceback (most recent call last):
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/async_llm.py", line 660, in output_handler
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]     outputs = await engine_core.get_output_async()
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core_client.py", line 998, in get_output_async
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]     raise self._format_exception(outputs) from None
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(Worker_DP0_EP0 pid=2647785) INFO 05-20 10:42:56 [multiproc_executor.py:775] Parent process exited, terminating worker queues
(Worker_DP0_EP0 pid=2647785) INFO 05-20 10:42:56 [multiproc_executor.py:872] WorkerProc shutting down.
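
The scheduler stats dumped just before the fatal error can be read more easily once the relevant fields are pulled out. The sketch below is only an illustration: `STATS_DUMP` is a shortened copy of the `SchedulerStats(...)` text from the log, `field` and `summarize` are hypothetical helper names, and treating `hits / queries` as the prefix cache hit rate is an assumption, not something the log states.

```python
import re

# Shortened copy of the SchedulerStats dump from the log above
# (fields that are None or empty are omitted here).
STATS_DUMP = (
    "SchedulerStats(num_running_reqs=809, num_waiting_reqs=0, "
    "kv_cache_usage=0.47516605700935377, "
    "prefix_cache_stats=PrefixCacheStats(reset=False, requests=12, "
    "queries=11530, hits=11408, preempted_requests=0, "
    "preempted_queries=0, preempted_hits=0))"
)


def field(name: str, text: str) -> float:
    """Return the numeric value of `name=<number>` in the dump text."""
    # \b keeps "queries" from matching inside "preempted_queries".
    match = re.search(rf"\b{re.escape(name)}=([0-9.]+)", text)
    if match is None:
        raise KeyError(name)
    return float(match.group(1))


def summarize(text: str) -> dict:
    """Collect the few numbers that describe load at the time of the dump."""
    queries = field("queries", text)
    hits = field("hits", text)
    return {
        "running": int(field("num_running_reqs", text)),
        "waiting": int(field("num_waiting_reqs", text)),
        "kv_cache_usage_pct": 100.0 * field("kv_cache_usage", text),
        # Assumed definition: fraction of prefix cache queries that hit.
        "prefix_hit_rate_pct": 100.0 * hits / queries if queries else 0.0,
    }


if __name__ == "__main__":
    for key, value in summarize(STATS_DUMP).items():
        print(f"{key}: {value}")
```

Run against the dump above, this reports 809 running requests, no waiting requests, and a KV cache that was a little under half full when the allocation failed.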

Root Cause

The log shows the proximate cause: worker Worker_DP0_EP0 ran out of GPU memory during the model forward pass, and the resulting exception killed EngineCore_DP0 and then the DP0 API server. The failed allocation requested 4.69 GiB while only 4.48 GiB of the 79.18 GiB on GPU 0 was free. At that moment the scheduler reported 809 running requests and a KV cache usage of about 47.5%. The two log entries that carry this information are:

(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [dump_input.py:81] Dumping scheduler stats: SchedulerStats(num_running_reqs=809, num_waiting_reqs=0, num_skipped_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.47516605700935377, prefix_cache_stats=PrefixCacheStats(reset=False, requests=12, queries=11530, hits=11408, preempted_requests=0, preempted_queries=0, preempted_hits=0), connector_prefix_cache_stats=None, kv_cache_eviction_events=[], spec_decoding_stats=None, kv_connector_stats=None, waiting_lora_adapters={}, running_lora_adapters={}, cudagraph_stats=None, perf_stats=None)
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161] RuntimeError: Worker failed with error 'CUDA out of memory. Tried to allocate 4.69 GiB. GPU 0 has a total capacity of 79.18 GiB of which 4.48 GiB is free. Including non-PyTorch memory, this process has 74.69 GiB memory in use. Of the allocated memory 71.91 GiB is allocated by PyTorch, and 755.13 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)', please check the stack trace above for the root cause
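
The numbers in the CUDA out-of-memory message can be checked with a little arithmetic. The sketch below parses the message text and computes how far the request exceeded free memory; `OOM_MESSAGE` is copied from the log, while `summarize_oom` and the helper `_grab` are hypothetical names introduced here. It only restates what the message reports and does not establish whether fragmentation was a factor.

```python
import re

# Numeric part of the OOM message, copied from the log above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 4.69 GiB. "
    "GPU 0 has a total capacity of 79.18 GiB of which 4.48 GiB is free. "
    "Including non-PyTorch memory, this process has 74.69 GiB memory in use. "
    "Of the allocated memory 71.91 GiB is allocated by PyTorch, "
    "and 755.13 MiB is reserved by PyTorch but unallocated."
)

# Conversion factors so every figure ends up in GiB.
_UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024.0}


def _grab(pattern: str, text: str) -> float:
    """Find '<number> <unit>' via `pattern` and return the value in GiB."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern not found: {pattern}")
    return float(match.group(1)) * _UNIT_TO_GIB[match.group(2)]


def summarize_oom(message: str) -> dict:
    """Extract the memory figures and derive the shortfall, all in GiB."""
    unit = r"(GiB|MiB)"
    requested = _grab(rf"Tried to allocate ([\d.]+) {unit}", message)
    free = _grab(rf"of which ([\d.]+) {unit} is free", message)
    return {
        "requested": requested,
        "free": free,
        "total": _grab(rf"total capacity of ([\d.]+) {unit}", message),
        "in_use": _grab(rf"this process has ([\d.]+) {unit} memory in use", message),
        "allocated": _grab(rf"([\d.]+) {unit} is allocated by PyTorch", message),
        "reserved_unallocated": _grab(
            rf"([\d.]+) {unit} is reserved by PyTorch but unallocated", message
        ),
        # How much more memory the failed allocation needed than was free.
        "shortfall": requested - free,
    }


if __name__ == "__main__":
    for key, value in summarize_oom(OOM_MESSAGE).items():
        print(f"{key}: {value:.2f} GiB")
```

For this message the shortfall is about 0.21 GiB, and the memory reserved by PyTorch but unallocated is about 0.74 GiB.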

Code Example

MODEL := "deepseek-ai/DeepSeek-V2-lite"

launch:
    chg run --gpus 2 -- vllm serve \
    {{MODEL}} \
    --port 8000 \
    --data-parallel-supervisor-port 7999  \
    --enable-expert-parallel \
    --data-parallel-size 2 \
    --data-parallel-size-local 2 \
    --data-parallel-start-rank 0 \
    --data-parallel-multi-port-external-lb \
    --enforce-eager \
    --shutdown-timeout 15

eval PORT:
    lm_eval \
        --model local-completions \
        --model_args "base_url=http://127.0.0.1:{{PORT}}/v1/completions,model={{MODEL}},num_concurrent=1024,num_retries=0" \
        --tasks gsm8k

probe_health:
    curl -vvv http://localhost:7999/health

---

just eval 8000
just eval 8001
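
When reproducing, it can help to check whether each API server still answers after the failure. The following is a minimal sketch using only the Python standard library. It assumes the two servers listen on ports 8000 and 8001 (the ports passed to `just eval` above) and expose `/health`, as the access log suggests; `health_url` and `probe` are hypothetical helper names.

```python
import urllib.error
import urllib.request


def health_url(port: int, host: str = "127.0.0.1") -> str:
    """Build the /health URL for one API server."""
    return f"http://{host}:{port}/health"


def probe(url: str, timeout: float = 2.0):
    """Return the HTTP status code, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return None


if __name__ == "__main__":
    # Assumed ports: one per data-parallel rank, as in the eval commands.
    for port in (8000, 8001):
        print(port, probe(health_url(port)))
```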

---

(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:32908 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO 05-20 10:42:11 [loggers.py:271] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 2226.2 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP0 pid=2646202) INFO 05-20 10:42:11 [loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 1312.6 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:48392 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:48604 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:48406 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:48612 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO 05-20 10:42:21 [loggers.py:271] Engine 001: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP0 pid=2646202) INFO 05-20 10:42:21 [loggers.py:271] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 99.1%
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:57484 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:37980 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:57488 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:37982 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:46168 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:46328 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:46170 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:46330 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38338 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38338 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38338 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40140 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38350 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40154 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40154 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40154 - "GET /tokenizer_info HTTP/1.1" 404 Not Found
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38352 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:40156 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP1 pid=2646203) INFO:     127.0.0.1:60698 - "GET /health HTTP/1.1" 200 OK
(APIServer_DP0 pid=2646202) INFO:     127.0.0.1:38352 - "GET /health HTTP/1.1" 200 OK
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962] WorkerProc hit an exception.
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962] Traceback (most recent call last):
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 957, in worker_busy_loop
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     output = func(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/worker_base.py", line 345, in execute_model
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.worker.execute_model(scheduler_output)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/gpu_worker.py", line 843, in execute_model
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     output = self.model_runner.execute_model(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return func(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/gpu_model_runner.py", line 4200, in execute_model
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     model_output = self._model_forward(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                    ^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/worker/gpu_model_runner.py", line 3677, in _model_forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.model(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1692, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     hidden_states = self.model(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                     ^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/compilation/decorators.py", line 507, in __call__
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.forward(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1315, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     hidden_states, residual = layer(
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                               ^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1174, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     hidden_states = self.self_attn(**attn_kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1790, in _call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return forward_call(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/vllm/model_executor/models/deepseek_v2.py", line 1067, in forward
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self.mla_attn(positions, hidden_states, llama_4_scaling)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]   File "/home/robertgshaw2-redhat/vllm/.venv/lib64/python3.12/site-packages/torch/nn/modules/module.py", line 1779, in _wrapped_call_impl
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]     return self._call_impl(*args, **kwargs)
(Worker_DP0_EP0 pid=2647785) ERROR 05-20 10:42:56 [multiproc_executor.py:962]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


[ ...... ]


(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [dump_input.py:81] Dumping scheduler stats: SchedulerStats(num_running_reqs=809, num_waiting_reqs=0, num_skipped_waiting_reqs=0, step_counter=0, current_wave=0, kv_cache_usage=0.47516605700935377, prefix_cache_stats=PrefixCacheStats(reset=False, requests=12, queries=11530, hits=11408, preempted_requests=0, preempted_queries=0, preempted_hits=0), connector_prefix_cache_stats=None, kv_cache_eviction_events=[], spec_decoding_stats=None, kv_connector_stats=None, waiting_lora_adapters={}, running_lora_adapters={}, cudagraph_stats=None, perf_stats=None)
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161] EngineCore encountered a fatal error.
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161] Traceback (most recent call last):
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 1152, in run_engine_core
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 1831, in run_busy_loop
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     executed = self._process_engine_step()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]                ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 1232, in _process_engine_step
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]                               ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core.py", line 548, in step_with_batch_queue
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     exec_model_fut.result()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 90, in result
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     return super().result()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]            ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 449, in result
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     return self.__get_result()
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]            ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/usr/lib64/python3.12/concurrent/futures/_base.py", line 401, in __get_result
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     raise self._exception
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 94, in _wait_for_response
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     response = self.aggregate(self.get_response())
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]                               ^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/executor/multiproc_executor.py", line 390, in get_response
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161]     raise RuntimeError(
(EngineCore_DP0 pid=2647451) ERROR 05-20 10:42:56 [core.py:1161] RuntimeError: Worker failed with error 'CUDA out of memory. Tried to allocate 4.69 GiB. GPU 0 has a total capacity of 79.18 GiB of which 4.48 GiB is free. Including non-PyTorch memory, this process has 74.69 GiB memory in use. Of the allocated memory 71.91 GiB is allocated by PyTorch, and 755.13 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://docs.pytorch.org/docs/stable/notes/cuda.html#optimizing-memory-usage-with-pytorch-cuda-alloc-conf)', please check the stack trace above for the root cause
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704] AsyncLLM output_handler failed.
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704] Traceback (most recent call last):
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/async_llm.py", line 660, in output_handler
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]     outputs = await engine_core.get_output_async()
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]   File "/home/robertgshaw2-redhat/vllm/vllm/v1/engine/core_client.py", line 998, in get_output_async
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704]     raise self._format_exception(outputs) from None
(APIServer_DP0 pid=2646202) ERROR 05-20 10:42:56 [async_llm.py:704] vllm.v1.engine.exceptions.EngineDeadError: EngineCore encountered an issue. See stack trace (above) for the root cause.
(Worker_DP0_EP0 pid=2647785) INFO 05-20 10:42:56 [multiproc_executor.py:775] Parent process exited, terminating worker queues
(Worker_DP0_EP0 pid=2647785) INFO 05-20 10:42:56 [multiproc_executor.py:872] WorkerProc shutting down.
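
The figures in the RuntimeError line can be checked with a little arithmetic. The sketch below is only an illustration: it copies the message text from the log, converts each size to GiB, and computes how far the request exceeded the free memory (about 0.21 GiB). The helper name `parse_oom` and the regular expressions are assumptions made for this example and are not part of vLLM or PyTorch.

```python
import re

# Text copied from the RuntimeError line in the log above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 4.69 GiB. "
    "GPU 0 has a total capacity of 79.18 GiB of which 4.48 GiB is free. "
    "Including non-PyTorch memory, this process has 74.69 GiB memory in use. "
    "Of the allocated memory 71.91 GiB is allocated by PyTorch, "
    "and 755.13 MiB is reserved by PyTorch but unallocated."
)

UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024.0}


def parse_oom(message: str) -> dict[str, float]:
    """Extract the sizes from a PyTorch CUDA OOM message, in GiB.

    Hypothetical helper written for this example only.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (GiB|MiB)",
        "total": r"total capacity of ([\d.]+) (GiB|MiB)",
        "free": r"of which ([\d.]+) (GiB|MiB) is free",
        "in_use": r"this process has ([\d.]+) (GiB|MiB) memory in use",
        "allocated": r"([\d.]+) (GiB|MiB) is allocated by PyTorch",
        "reserved_unallocated": r"([\d.]+) (GiB|MiB) is reserved by PyTorch",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find {name!r} in message")
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_GIB[unit]
    return sizes


sizes = parse_oom(OOM_MESSAGE)

# How much larger the failed request was than the memory still free.
shortfall = sizes["requested"] - sizes["free"]

print(f"shortfall: {shortfall:.2f} GiB")
print(f"reserved but unallocated: {sizes['reserved_unallocated']:.2f} GiB")
```

The message suggests `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` when the reserved-but-unallocated amount is large. Here that amount is 755.13 MiB. The log does not establish whether reclaiming it would have let this particular 4.69 GiB allocation succeed.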

Your current environment

  • running on wentao-feature-local-external-dp

🐛 Describe the bug

MODEL := "deepseek-ai/DeepSeek-V2-lite"

launch:
    chg run --gpus 2 -- vllm serve \
    {{MODEL}} \
    --port 8000 \
    --data-parallel-supervisor-port 7999  \
    --enable-expert-parallel \
    --data-parallel-size 2 \
    --data-parallel-size-local 2 \
    --data-parallel-start-rank 0 \
    --data-parallel-multi-port-external-lb \
    --enforce-eager \
    --shutdown-timeout 15

eval PORT:
    lm_eval \
        --model local-completions \
        --model_args "base_url=http://127.0.0.1:{{PORT}}/v1/completions,model={{MODEL}},num_concurrent=1024,num_retries=0" \
        --tasks gsm8k

probe_health:
    curl -vvv http://localhost:7999/health

Run lm_eval a couple of times on each port:

just eval 8000
just eval 8001
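
The same sequence can be scripted. The following is a minimal sketch that rebuilds the `lm_eval` command from the `eval` recipe for each port and prints the resulting schedule. The alternating order and the count of two rounds per port are assumptions (the report only says "a couple times on each port"), and the helper names `build_eval_cmd` and `build_schedule` are hypothetical.

```python
import shlex

MODEL = "deepseek-ai/DeepSeek-V2-lite"  # same value as MODEL in the justfile
PORTS = (8000, 8001)                    # the two ports passed to `just eval`
ROUNDS = 2                              # assumption: "a couple times" per port


def build_eval_cmd(port: int, model: str = MODEL) -> list[str]:
    """Return the argv list equivalent to the justfile's `eval PORT` recipe."""
    model_args = (
        f"base_url=http://127.0.0.1:{port}/v1/completions,"
        f"model={model},num_concurrent=1024,num_retries=0"
    )
    return [
        "lm_eval",
        "--model", "local-completions",
        "--model_args", model_args,
        "--tasks", "gsm8k",
    ]


def build_schedule(ports=PORTS, rounds=ROUNDS) -> list[list[str]]:
    """Alternate between the ports for the given number of rounds (assumed order)."""
    return [build_eval_cmd(port) for _ in range(rounds) for port in ports]


# Print the commands instead of running them, so the sketch has no side effects.
for cmd in build_schedule():
    print(shlex.join(cmd))
```

To actually run the evaluations, each argv list could be passed to `subprocess.run(cmd, check=True)` while the server from the `launch` recipe is up.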

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
