vllm - 💡(How to fix) Fix [BUGS] vLLM V1 Engine Hangs After Weight Loading on Blackwell (sm_121) Multi-Node Ray Setup (TP=2) [2 comments, 2 participants]

vllm • 2026-04-13 08:01:48

vllm-project/vllm#39682 • Fetched 2026-04-14 05:38:09

Author: Haroliao

Participants: Haroliao, wzhao18

Timeline (top): commented ×2, labeled ×1, renamed ×1

I am experiencing an indefinite hang during the memory profiling stage after model weights have successfully loaded.

Hardware: 2x NVIDIA Blackwell GPUs (Distributed: 1 per node across 2 nodes)
vLLM Version: 0.19.1+ (V1 Engine enabled)
Interconnect: ConnectX-7 (RoCE/Ethernet)
Model: Qwen3-30B-A3B (FP8)
Serving Command: vllm serve Qwen/Qwen3-30B-A3B --tensor-parallel-size 2 --distributed-executor-backend ray --quantization fp8 --kv-cache-dtype fp8 --enforce-eager

The Bug

The Issue: The logs show that model weights (approx 14.53 GiB) load successfully on both nodes. However, the process hangs immediately after. I do not see the "Profiling KV cache" or "Uvicorn running" messages.

Logs:
(EngineCore pid=...) INFO ... Model loading took 14.53 GiB memory and 172.4 seconds
(Stuck here indefinitely)
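The symptom above (a "Model loading took" line with no later startup message) can be checked mechanically against a captured log. The following is a minimal sketch, assuming the marker strings quoted in this report; the exact wording of vLLM log lines may differ between versions, and `startup_stage` is a hypothetical helper, not part of vLLM.

```python
# Marker strings are taken from the report; real log wording may vary by version.
LOADED_MARKER = "Model loading took"
NEXT_STAGE_MARKERS = ("Profiling KV cache", "Uvicorn running")


def startup_stage(log_text: str) -> str:
    """Classify how far startup got, using only the marker strings above."""
    if LOADED_MARKER not in log_text:
        return "weights-not-loaded"
    # With TP=2 each worker may log the loading line, so inspect the text
    # after the last occurrence.
    tail = log_text.rsplit(LOADED_MARKER, 1)[1]
    if any(marker in tail for marker in NEXT_STAGE_MARKERS):
        return "past-weight-loading"
    return "stalled-after-weight-loading"


sample = (
    "(EngineCore pid=...) INFO ... "
    "Model loading took 14.53 GiB memory and 172.4 seconds\n"
)
print(startup_stage(sample))
```

Running this periodically against the server log gives a simple way to tell a slow startup from a stall at the same point on every attempt.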

My attempts to fix it:
Used --enforce-eager to skip CUDA graph capture.
Set VLLM_DISABLE_FRONTEND_MULTIPROCESSING=1, but received a warning that the variable is unknown/unsupported in this version.
Set NCCL_P2P_DISABLE=1 and increased Ray CGRAPH timeouts.
Lowered --gpu-memory-utilization to 0.7.

extent analysis

TL;DR

The hang occurs immediately after the model weights load successfully, so the first things to investigate are per-GPU memory utilization and how the weights are distributed across the two GPUs.

Guidance

Verify the memory utilization of each GPU node after model loading to ensure that the --gpu-memory-utilization setting of 0.7 is not causing an issue, and consider adjusting this value further if necessary.
Check the NCCL (NVIDIA Collective Communications Library) configuration and logs for errors or warnings that might indicate a problem with the distributed setup, keeping in mind that NCCL_P2P_DISABLE=1 was set.
Confirm whether VLLM_DISABLE_FRONTEND_MULTIPROCESSING=1 has any effect. The warning that the variable is unknown/unsupported suggests this version ignores it, so removing it takes one variable out of the investigation.
Review the Ray configuration and logs for any timeout or resource allocation issues, considering the adjustments made to CGRAPH timeouts.
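One way to act on the NCCL item above is to relaunch with NCCL's own debug logging turned on. The sketch below assumes the serving command from the report and uses the standard NCCL environment variables `NCCL_DEBUG` and `NCCL_DEBUG_SUBSYS`; `build_debug_launch` is a hypothetical helper. With a Ray backend, the variables may also need to be set in the environment where `ray start` runs on each node so that the worker processes see them.

```python
import os
import shlex


def build_debug_launch(base_env=None):
    """Return (env, argv) for relaunching the reported command with NCCL logging."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"             # print NCCL initialization and errors
    env["NCCL_DEBUG_SUBSYS"] = "INIT,NET"  # focus on setup and network transport
    argv = [
        "vllm", "serve", "Qwen/Qwen3-30B-A3B",
        "--tensor-parallel-size", "2",
        "--distributed-executor-backend", "ray",
        "--quantization", "fp8",
        "--kv-cache-dtype", "fp8",
        "--enforce-eager",
    ]
    return env, argv


# An empty base environment is used here only to keep the printed output short.
env, argv = build_debug_launch({})
print("NCCL_DEBUG=" + env["NCCL_DEBUG"], shlex.join(argv))
```

To actually launch, call `build_debug_launch()` with no argument so the current environment is inherited, then pass the result to `subprocess.run(argv, env=env)`. If the hang is in a collective, the last NCCL lines printed on each node usually show which transport and interface were selected.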

Example

No issue-specific fix can be shown without more detail on the customizations and configuration applied to this vLLM setup.

Notes

The solution may depend on the specific version of vLLM and its compatibility with the distributed setup and hardware. The fact that model weights load successfully but the process hangs afterward suggests a potential issue with resource allocation, memory, or communication between nodes.

Recommendation

Apply workaround: Adjust the --gpu-memory-utilization setting and monitor the system's memory and resource utilization closely to identify any bottlenecks, as this seems to be a critical point of failure given the successful loading of model weights but subsequent hang.
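Monitoring memory on each node can be scripted around `nvidia-smi`. The following is a rough sketch, assuming `nvidia-smi` is on the PATH of every node; the helper names are hypothetical and the sample values are invented for illustration, not taken from the report. Some platforms report a non-numeric placeholder for memory fields, so the parser tolerates that.

```python
import subprocess

QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def _to_int(field):
    """Convert a CSV field to int, or None if the driver reports a placeholder."""
    try:
        return int(field)
    except ValueError:
        return None


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV output into one dict per GPU (values in MiB)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        used_mib, total_mib = _to_int(used), _to_int(total)
        fraction = used_mib / total_mib if used_mib is not None and total_mib else None
        rows.append({"gpu": int(index), "used_mib": used_mib,
                     "total_mib": total_mib, "fraction": fraction})
    return rows


def query_gpu_memory():
    """Run nvidia-smi on the local node and parse its output."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_memory(out)


# Invented sample output: one GPU with numeric values, one with placeholders.
sample = "0, 15000, 100000\n1, [N/A], [N/A]\n"
for row in parse_gpu_memory(sample):
    print(row)
```

Calling `query_gpu_memory()` on both nodes while the server is stuck shows whether the used fraction is still changing, which helps separate a memory-related stall from a communication hang.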
