vllm - 💡(How to fix) Fix [BUGS] vLLM V1 Engine Hangs After Weight Loading on Blackwell (sm_121) Multi-Node Ray Setup (TP=2) [2 comments, 2 participants]

vllm-project/vllm#39682 (fetched 2026-04-14 05:38:09)

I am experiencing an indefinite hang during the memory profiling stage after model weights have successfully loaded.

  • Hardware: 2x NVIDIA Blackwell GPUs (distributed: 1 per node across 2 nodes)
  • vLLM Version: 0.19.1+ (V1 Engine enabled)
  • Interconnect: ConnectX-7 (RoCE/Ethernet)
  • Model: Qwen3-30B-A3B (FP8)
  • Serving Command: vllm serve Qwen/Qwen3-30B-A3B --tensor-parallel-size 2 --distributed-executor-backend ray --quantization fp8 --kv-cache-dtype fp8 --enforce-eager

The Bug

The Issue: The logs show that model weights (approx 14.53 GiB) load successfully on both nodes. However, the process hangs immediately after. I do not see the "Profiling KV cache" or "Uvicorn running" messages.

Logs:

(EngineCore pid=...) INFO ... Model loading took 14.53 GiB memory and 172.4 seconds
(Stuck here indefinitely)
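A minimal Python sketch for confirming where startup stalls, assuming the three milestone strings appear in the log exactly as quoted in this report ("Model loading took", "Profiling KV cache", "Uvicorn running"). The actual vLLM log wording may differ between versions, and the helper names here are hypothetical.

```python
from typing import Optional

# Milestone strings as quoted in the report, in the order the reporter expects them.
# Assumption: the real log messages contain these exact substrings.
MILESTONES = ["Model loading took", "Profiling KV cache", "Uvicorn running"]


def last_milestone(log_text: str) -> Optional[str]:
    """Return the latest expected milestone found in the log, or None."""
    reached = None
    for marker in MILESTONES:
        if marker in log_text:
            reached = marker
    return reached


def next_missing(log_text: str) -> Optional[str]:
    """Return the first expected milestone that has not appeared yet."""
    for marker in MILESTONES:
        if marker not in log_text:
            return marker
    return None


# The single log line shown in the report.
sample = "(EngineCore pid=...) INFO ... Model loading took 14.53 GiB memory and 172.4 seconds"
print("last reached:", last_milestone(sample))
print("waiting for: ", next_missing(sample))
```

Running this against the captured logs from both nodes shows whether each worker stopped at the same stage or whether one got further than the other.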

My attempts to fix it:

  • Used --enforce-eager to skip CUDA graph capture.
  • Set VLLM_DISABLE_FRONTEND_MULTIPROCESSING=1, but received a warning that the variable is unknown/unsupported in this version.
  • Set NCCL_P2P_DISABLE=1 and increased Ray CGRAPH timeouts.
  • Lowered --gpu-memory-utilization to 0.7.

extent analysis

TL;DR

The hang occurs immediately after the model weights load successfully, so the issue can potentially be resolved by investigating memory utilization and how the weights are distributed across the two GPUs.

Guidance

  • Verify the memory utilization of each GPU node after model loading to ensure that the --gpu-memory-utilization setting of 0.7 is not causing an issue, and consider adjusting this value further if necessary.
  • Check the NCCL (NVIDIA Collective Communications Library) configuration and logs for any errors or warnings that might indicate issues with the distributed setup, given that NCCL_P2P_DISABLE=1 was set.
  • Investigate the impact of VLLM_DISABLE_FRONTEND_MULTIPROCESSING=1 despite the warning, as it might be related to the hanging issue, even if the variable is reported as unknown/unsupported in this version.
  • Review the Ray configuration and logs for any timeout or resource allocation issues, considering the adjustments made to CGRAPH timeouts.
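One way to act on the NCCL point above is to relaunch with NCCL's own logging turned on. The sketch below only builds the environment and the argument list. NCCL_DEBUG, NCCL_DEBUG_SUBSYS and NCCL_SOCKET_IFNAME are standard NCCL environment variables, while the helper names and any interface name passed in are hypothetical. With a Ray backend these variables probably also need to be set in the environment where each node's Ray process was started, which this sketch does not handle.

```python
import shlex

# Standard NCCL variables that make NCCL log its initialization and network setup.
NCCL_DEBUG_OVERRIDES = {
    "NCCL_DEBUG": "INFO",
    "NCCL_DEBUG_SUBSYS": "INIT,NET",
}

# The serving command exactly as given in the report.
SERVE_CMD = (
    "vllm serve Qwen/Qwen3-30B-A3B --tensor-parallel-size 2 "
    "--distributed-executor-backend ray --quantization fp8 "
    "--kv-cache-dtype fp8 --enforce-eager"
)


def debug_env(base_env, socket_ifname=None):
    """Return a copy of base_env with NCCL debug logging enabled.

    socket_ifname is optional and site-specific (the network interface NCCL
    should use for its socket traffic); no value is assumed here.
    """
    env = dict(base_env)
    env.update(NCCL_DEBUG_OVERRIDES)
    if socket_ifname:
        env["NCCL_SOCKET_IFNAME"] = socket_ifname
    return env


def serve_argv():
    """Split the serving command into an argument list suitable for subprocess."""
    return shlex.split(SERVE_CMD)


# To launch for real: subprocess.run(serve_argv(), env=debug_env(os.environ))
print(serve_argv())
print(debug_env({}))
```

If the NCCL output stops partway through communicator setup on one node, that points toward the interconnect rather than memory.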

Example

No fix-specific code snippet can be provided without more context on the customizations or specific configurations applied to the vLLM setup.

Notes

The solution may depend on the specific version of vLLM and its compatibility with the distributed setup and hardware. The fact that model weights load successfully but the process hangs afterward suggests a potential issue with resource allocation, memory, or communication between nodes.

Recommendation

Apply workaround: Adjust the --gpu-memory-utilization setting and closely monitor memory and resource utilization on both nodes to identify bottlenecks, since the hang begins right after the weights finish loading.
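For the monitoring step, a small Python sketch that parses per-GPU memory from nvidia-smi's CSV output. The query flags are standard nvidia-smi options, the function names are hypothetical, and the numbers in the sample call are illustrative only, not measurements from this issue. Fields that the driver reports as non-numeric are returned as None rather than guessed.

```python
import subprocess

# Standard nvidia-smi query: GPU index, used and total memory in MiB, bare CSV.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory(csv_text):
    """Parse lines like '0, 1000, 4000' into (index, used, total, fraction) tuples."""
    rows = []
    for line in csv_text.strip().splitlines():
        idx, used, total = [field.strip() for field in line.split(",")]
        try:
            used_mib, total_mib = int(used), int(total)
            rows.append((int(idx), used_mib, total_mib, used_mib / total_mib))
        except (ValueError, ZeroDivisionError):
            # Non-numeric or zero fields: keep the index, leave the rest unknown.
            rows.append((int(idx), None, None, None))
    return rows


def query_memory():
    """Run nvidia-smi on the local node and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_memory(out)


# Illustrative input only; run query_memory() on each node for real values.
print(parse_memory("0, 1000, 4000"))
```

Comparing the used fraction on each node after the "Model loading took" line against the --gpu-memory-utilization value shows how much headroom remains for the KV cache.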
