vllm - [Bug]: Sync EPLB rearrangement hangs indefinitely with DP8 + EP on B200

vllm-project/vllm#38986 (fetched 2026-04-08 02:44:31)

Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Collecting environment information...
==============================
        System Info
==============================
OS                           : Ubuntu 24.04.4 LTS (x86_64)
GCC version                  : (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
CMake version                : version 3.28.3
Libc version                 : glibc-2.39

==============================
       PyTorch Info
==============================
PyTorch version              : 2.10.0+cu128
Is debug build               : False
CUDA used to build PyTorch   : 12.8
ROCM used to build PyTorch   : N/A

==============================
      Python Environment
==============================
Python version               : 3.12.12 (main, Feb  3 2026, 22:51:04) [Clang 21.1.4 ] (64-bit runtime)
Python platform              : Linux-6.8.0-94-generic-x86_64-with-glibc2.39

==============================
       CUDA / GPU Info
==============================
Is CUDA available            : True
CUDA runtime version         : 12.8.61
GPU models and configuration :
GPU 0: NVIDIA B200
GPU 1: NVIDIA B200
GPU 2: NVIDIA B200
GPU 3: NVIDIA B200
GPU 4: NVIDIA B200
GPU 5: NVIDIA B200
GPU 6: NVIDIA B200
GPU 7: NVIDIA B200

Nvidia driver version        : 580.126.09
Is XNNPACK available         : True

==============================
          CPU Info
==============================
Architecture:                         x86_64
Model name:                           INTEL(R) XEON(R) PLATINUM 8570
CPU(s):                               224
Socket(s):                            2
Core(s) per socket:                   56
Thread(s) per core:                   2
NUMA node(s):                         2

==============================
Versions of relevant libraries
==============================
[pip3] flashinfer-python==0.6.7
[pip3] numpy==2.2.6
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] torch==2.10.0
[pip3] transformers==4.57.5
[pip3] triton==3.6.0

==============================
         vLLM Info
==============================
vLLM Version                 : 0.18.2rc1.dev78+g2021f494a (git sha: 2021f494a)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled
GPU Topology:
  GPU0-GPU7: All NV18 interconnected (8x NVIDIA B200)
</details>

🐛 Describe the bug

Sync EPLB rearrangement hangs indefinitely during serving with DP8 + expert parallel on 8xB200, causing all EngineCore processes to stall.

Timeline from logs:

  • Steps 75-99: EPLB runs normally, balancedness ~0.46
  • Step 99 (18:07:19): the log shows "Rearranging experts sync mode ..." and the rearrangement starts
  • 18:08:20 (about 60s later): all 8 EngineCore processes report "No available shared memory broadcast block found in 60 seconds"
  • The warning repeats every 60 seconds until the crash, roughly 11 minutes after the rearrangement started
  • 18:18:30: all EngineCore processes crash with "RuntimeError: cancelled" from shm_broadcast.py:677
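
The gaps in this timeline can be checked with a few lines of Python. This is a minimal sketch that uses only the three timestamps quoted above and assumes they fall on the same day; the dictionary keys are illustrative names, not identifiers from vLLM.

```python
from datetime import datetime

# Timestamps copied from the timeline in the report (same day assumed).
EVENTS = {
    "rearrangement_start": "18:07:19",
    "first_shm_warning": "18:08:20",
    "engine_crash": "18:18:30",
}


def seconds_between(start: str, end: str) -> int:
    """Return whole seconds from start to end, both given as HH:MM:SS."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


first_warning_after = seconds_between(
    EVENTS["rearrangement_start"], EVENTS["first_shm_warning"]
)
crash_after = seconds_between(
    EVENTS["rearrangement_start"], EVENTS["engine_crash"]
)

print(f"first warning after {first_warning_after}s")
print(f"crash after {crash_after}s ({crash_after // 60}m {crash_after % 60}s)")
```

The first warning lands one 60-second broadcast timeout after the rearrangement begins, which is consistent with the workers being stuck from the moment the rearrangement starts.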

The rearrangement never completes; it appears to deadlock on the NCCL collective inside rearrange(). The first rearrangement during model loading (profile mode) works fine (3.80s). The hang occurs on the first real rearrangement triggered by serving load.

Note: the balancedness before the hang is very poor (~0.46). Not sure if that's related.

Server command:

vllm serve nvidia/Qwen3.5-397B-A17B-NVFP4 \
  --port 8000 -tp 1 -pp 1 -dp 8 \
  --enable-expert-parallel --language-model-only \
  --reasoning-parser qwen3 --stream-interval 100 \
  --enable-eplb \
  --eplb-config '{"num_redundant_experts": 32, "window_size": 100, "step_interval": 100, "log_balancedness": true, "log_balancedness_interval": 1}' \
  --gpu-memory-utilization 0.80
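
When varying the EPLB settings between runs, it may be easier to build the --eplb-config value programmatically than to edit the JSON by hand. The sketch below uses only the keys and values from the command above; it makes no claim about which other keys vLLM accepts.

```python
import json
import shlex

# Values copied from the --eplb-config argument in the server command above.
eplb_config = {
    "num_redundant_experts": 32,
    "window_size": 100,
    "step_interval": 100,
    "log_balancedness": True,
    "log_balancedness_interval": 1,
}

eplb_json = json.dumps(eplb_config)

# shlex.quote wraps the JSON so a POSIX shell passes it as a single argument.
flag = f"--eplb-config {shlex.quote(eplb_json)}"
print(flag)
```

Changing one value in the dictionary (for example step_interval) and regenerating the flag keeps the quoting correct across experiments.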

Benchmark command (triggers the hang):

vllm bench serve \
  --backend vllm --model nvidia/Qwen3.5-397B-A17B-NVFP4 \
  --port 8000 --endpoint /v1/completions \
  --dataset-name random --random-input 8192 --random-output 1 \
  --max-concurrency 128 --num-prompt 128 --ignore-eos --temperature 0.0
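
For a sense of the load this benchmark places on the server, the nominal token counts follow directly from the flags. The sketch assumes every request uses exactly the requested input and output lengths; actual lengths from the random dataset may differ.

```python
# Values copied from the benchmark command above.
num_prompts = 128
input_len = 8192
output_len = 1
max_concurrency = 128

total_input_tokens = num_prompts * input_len
total_output_tokens = num_prompts * output_len
# With max_concurrency equal to num_prompts, every request can be in flight at once.
all_in_flight = max_concurrency >= num_prompts

print(f"input tokens:  {total_input_tokens}")
print(f"output tokens: {total_output_tokens}")
print(f"all requests concurrent: {all_in_flight}")
```

The workload is almost entirely prefill: about one million input tokens against 128 output tokens.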

Key observations:

  • Profile rearrangement during startup completes fine (3.80s)
  • The hang occurs on the first real rearrangement at step 100 (after the first 25 steps of actual serving traffic, since the initial step is set to 75)
  • All 8 GPU workers are at 100% utilization during the hang (busy-spinning on NCCL?)
  • Memory is not the issue (~17 GB free per GPU)
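
The "25 steps" figure in the second observation is simple arithmetic on the two numbers given in the report. The sketch below assumes the step counter starts at the initial step and that a rearrangement fires when the counter reaches step_interval; the function name is hypothetical and not part of vLLM.

```python
def steps_until_first_rearrangement(initial_step: int, step_interval: int) -> int:
    """Serving steps before the counter reaches step_interval.

    Assumption: the counter starts at initial_step and a rearrangement
    is triggered once it reaches step_interval.
    """
    if not 0 <= initial_step < step_interval:
        raise ValueError("initial_step must be in [0, step_interval)")
    return step_interval - initial_step


# Initial step 75 and step_interval 100 are the values from the report.
remaining = steps_until_first_rearrangement(initial_step=75, step_interval=100)
print(remaining)
```

Under that assumption, raising step_interval would delay the first real rearrangement, which could help separate load-related effects from the rearrangement itself.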

Related: #32478 (EPLB hangs in several cases) -- that issue covers async EPLB + DeepEP/specific backends. This is sync EPLB with standard NCCL on B200.

Full server log attached as sync_eplb_failure_log.txt.


extent analysis

TL;DR

No confirmed fix exists yet. The sync EPLB rearrangement appears to deadlock in the NCCL collective inside rearrange(), so the first step is to confirm where the workers are stuck; adjusting the EPLB configuration or NCCL settings is a possible workaround, not a verified solution.

Guidance

  • Investigate the NCCL collective inside the rearrange() function to identify the cause of the deadlock, considering the high GPU utilization and lack of memory issues.
  • Review the EPLB configuration, particularly the num_redundant_experts, window_size, and step_interval parameters, to determine if adjustments can help prevent the hang.
  • As an experiment, lower --max-concurrency and --num-prompt in the benchmark command to check whether the hang depends on load; a lighter load may avoid the deadlock but would not fix its cause.
  • Examine the server log and attached sync_eplb_failure_log.txt file for additional clues about the cause of the hang.
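
As a starting point for the log review, the sketch below counts how many stall warnings follow the most recent rearrangement start. It matches only on the two message fragments quoted in the report; the exact log line format is not known here, so the matching is by substring, and the function and counter names are illustrative.

```python
from collections import Counter
from typing import Iterable

# Message fragments quoted in the issue's timeline.
STALL_MESSAGE = "No available shared memory broadcast block found"
REARRANGE_MESSAGE = "Rearranging experts"


def summarize_stall(lines: Iterable[str]) -> Counter:
    """Count stall warnings seen after the last rearrangement start line."""
    counts: Counter = Counter()
    for line in lines:
        if REARRANGE_MESSAGE in line:
            counts["rearrangements_started"] += 1
            # Reset so the count always refers to the latest rearrangement.
            counts["stall_warnings_since_last_start"] = 0
        elif STALL_MESSAGE in line:
            counts["stall_warnings_since_last_start"] += 1
    return counts
```

To use it on the attached log, open sync_eplb_failure_log.txt and pass the file object to summarize_stall. A non-zero warning count after the final rearrangement start, with no later sign of completion, would match the hang described in the issue.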

Example

No specific code snippet is provided, as the issue is related to a complex system configuration and requires a more in-depth investigation.

Notes

The hang occurs only during the first real rearrangement at step 100, after the initial profile rearrangement completes successfully. The poor balancedness before the hang (~0.46) may be related to the issue, but its impact is unclear.

Recommendation

The root cause is not yet clear and needs further investigation. Until it is identified, treat changes to the EPLB configuration or NCCL settings as workarounds to try rather than confirmed fixes.
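
One NCCL setting that helps with diagnosis, though it is not a fix, is NCCL's own debug logging. The sketch below prepares an environment with NCCL_DEBUG and NCCL_DEBUG_SUBSYS set (both standard NCCL environment variables) alongside the server command from the report. The helper name is hypothetical, and the sketch only builds the environment and argument list; it does not launch anything.

```python
import os
from typing import Dict, Mapping, Optional


def nccl_debug_env(base_env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with NCCL collective logging enabled."""
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"          # print NCCL informational messages
    env["NCCL_DEBUG_SUBSYS"] = "COLL"   # restrict output to collective operations
    return env


# Server command copied from the report, split into an argument list.
SERVE_CMD = [
    "vllm", "serve", "nvidia/Qwen3.5-397B-A17B-NVFP4",
    "--port", "8000", "-tp", "1", "-pp", "1", "-dp", "8",
    "--enable-expert-parallel", "--language-model-only",
    "--reasoning-parser", "qwen3", "--stream-interval", "100",
    "--enable-eplb",
    "--eplb-config",
    '{"num_redundant_experts": 32, "window_size": 100, "step_interval": 100, '
    '"log_balancedness": true, "log_balancedness_interval": 1}',
    "--gpu-memory-utilization", "0.80",
]
```

Passing SERVE_CMD and nccl_debug_env() to subprocess.run(cmd, env=env) would start the server with the extra logging. If the hang reproduces, the last collective logged by each rank may show whether all eight ranks entered the same operation.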
