vllm - 💡(How to fix) Fix [Bug] custom_all_reduce IPC handle fails with expandable_segments when DP>1 AND TP>1

vllm · 2026-05-14 07:31:29

With PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, starting vllm with both DP>1 and TP>1 crashes during worker initialization at custom_all_reduce.cuh:455, where cudaIpcGetMemHandle returns 'invalid argument'. DP>1 alone or TP>1 alone works.

Error Message

Failed: Cuda error custom_all_reduce.cuh:455 'invalid argument'

Fix Action

Workaround

Either drop expandable_segments from PYTORCH_CUDA_ALLOC_CONF, or pass --disable-custom-all-reduce to vllm serve.
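For the first workaround, a launch script can remove only the expandable_segments entry and keep any other allocator options. The helper below is a hypothetical sketch (strip_expandable_segments is not part of vllm or PyTorch). It assumes PYTORCH_CUDA_ALLOC_CONF is a comma-separated list of option:value pairs, which is the format PyTorch documents for this variable.

```sh
# Hypothetical helper (not part of vllm or PyTorch): print the given
# PYTORCH_CUDA_ALLOC_CONF value with every expandable_segments:<value>
# entry removed, keeping all other options in their original order.
strip_expandable_segments() (
  # The body runs in a subshell, so IFS, set -f and variables do not leak.
  IFS=,
  set -f                         # no pathname expansion of the split fields
  out=
  for item in $1; do             # unquoted on purpose: split on commas
    case $item in
      expandable_segments:*) ;;  # drop the expandable_segments option
      '') ;;                     # ignore empty entries such as in "a,,b"
      *) out=${out:+$out,}$item ;;
    esac
  done
  printf '%s\n' "$out"
)
```

In a launch script it could be applied as export PYTORCH_CUDA_ALLOC_CONF="$(strip_expandable_segments "$PYTORCH_CUDA_ALLOC_CONF")" before running vllm serve; if nothing is left, unsetting the variable is the simpler choice.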

Code Example

PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
vllm serve Qwen/Qwen3-0.6B --data-parallel-size 2 --tensor-parallel-size 2

---

Failed: Cuda error custom_all_reduce.cuh:455 'invalid argument'
EngineCore failed to start: Worker proc VllmWorker-* died unexpectedly

Environment

vllm 0.20.2
4× H200 (1 node)

Matrix (Qwen3-0.6B, expandable_segments enabled)

DP	TP	result
2	1	✓
1	2	✓
2	2	✗ (crash as above)

The crash is independent of --enable-sleep-mode: it reproduces both with and without the flag.
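Based only on the matrix above, a launch script could detect the failing combination (expandable_segments:True together with DP>1 and TP>1) and apply the second workaround automatically. This is a hypothetical sketch: needs_custom_ar_workaround is not a vllm function, and it encodes only the combinations reported here (vllm 0.20.2, one node), not a verified rule for other setups.

```sh
# Hypothetical guard (not part of vllm): exit status 0 if a launch matches the
# failing combination reported in this issue, i.e. expandable_segments:True
# together with data-parallel size > 1 and tensor-parallel size > 1.
# Arguments: <PYTORCH_CUDA_ALLOC_CONF value> <DP size> <TP size>
needs_custom_ar_workaround() {
  # Pad with commas so the option matches at any position in the list.
  case ",$1," in
    *,expandable_segments:True,*) ;;  # expandable segments enabled
    *) return 1 ;;
  esac
  # Only DP>1 combined with TP>1 crashed in the reported matrix.
  [ "$2" -gt 1 ] && [ "$3" -gt 1 ]
}
```

When the function succeeds, the script would append --disable-custom-all-reduce to its vllm serve command; otherwise it can leave custom all-reduce enabled.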

Root cause (best guess)

expandable_segments reserves virtual address ranges via cuMemAddressReserve and backs them with physical memory lazily through cuMemCreate/cuMemMap. The base pointer returned by cuPointerGetAttribute(..., rangeStartAddrAttr, ...) in get_graph_buffer_ipc_meta is therefore the head of such a VA range, which is not a valid argument for cudaIpcGetMemHandle: that legacy IPC API only accepts the base of a cudaMalloc allocation, while memory mapped through the virtual memory management API has to be shared with cuMemExportToShareableHandle instead. DP=1 or TP=1 doesn't trigger the path because cross-process IPC isn't needed.

Suggested fix

PR #40812 already disables expandable_segments temporarily around the cumem sleep-mode pool. The custom_all_reduce IPC graph buffer registration needs the same treatment: either switch off expandable segments temporarily while the graph buffers are being allocated and registered, or detect the cumem-backed pointer and fall back by skipping custom all-reduce IPC for it.

@youkaichao could you take a look?
