[Bug]: vLLM serve with tensor-parallel-size=8 on Kubernetes + vGPU fails: NCCL TCPStore broken pipe, EngineCore initialization failed

vllm · 2026-05-12 07:48:11


Error Message

Exception: WorkerProc initialization failed due to an exception in a background process.
...
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}



Your current environment

🐛 Describe the bug

vLLM version: image vllm/vllm-openai:deepseekv4-cu129 (used as the Dockerfile FROM base)

Python version: 3.12

Hardware/Cluster: Kubernetes cluster, nodes with H100 vGPUs (virtual GPUs), requesting 8 vGPUs via nvidia.com/vgpu resource

Operating System: Container image based on Ubuntu (inferred)

Full launch command:

vllm serve /shared/models/huggingface/hub/models--deepseek-ai--DeepSeek-V4-Flash/snapshots/6976c7ff1b30a1b2cb7805021b8ba4684041f136 \
  --host 0.0.0.0 \
  --port 5180 \
  --enable-prefix-caching \
  --trust-remote-code \
  --enable-auto-tool-choice \
  --tensor-parallel-size 8 \
  --max-model-len 8192 \
  --tokenizer-mode deepseek_v4 \
  --tool-call-parser deepseek_v4 \
  --kv-cache-dtype fp8

Kubernetes resource configuration:

Limits/requests: nvidia.com/vgpu: 8, CPU 32/16, memory 320Gi/160Gi
hostIPC: true and a /dev/shm volume were not set initially. We added them later, but whether vLLM is compatible with vGPUs still needs confirmation.
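Because the report mentions that /dev/shm was not mounted at first, it may help to confirm what the container actually sees after the change. The following is a minimal sketch, not part of the original report: the helper name `shm_report` and the 8 GiB threshold are illustrative assumptions, not vLLM or NCCL requirements. Container runtimes commonly default /dev/shm to 64 MiB unless a memory-backed volume is mounted over it.

```python
import shutil

GIB = 1024 ** 3


def shm_report(path="/dev/shm", min_gib=8.0):
    """Report the size of a mount in GiB and whether it meets a threshold.

    min_gib is an illustrative threshold (an assumption), not a documented
    vLLM or NCCL requirement.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    total_gib = usage.total / GIB
    free_gib = usage.free / GIB
    return {
        "path": path,
        "total_gib": round(total_gib, 2),
        "free_gib": round(free_gib, 2),
        "ok": total_gib >= min_gib,
    }


if __name__ == "__main__":
    # Run inside the pod, e.g. via `kubectl exec`, to inspect the real mount.
    print(shm_report())
```

If `total_gib` is still tiny after adding the volume, the mount is probably not taking effect in the pod spec.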

Problem Description

When starting the DeepSeek-V4-Flash model with --tensor-parallel-size 8 to run inference across 8 vGPUs, the service fails during initialization with the following two types of errors:

EngineCore process exception:

Exception: WorkerProc initialization failed due to an exception in a background process.
...
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}

NCCL communication error (critical):

[rank6]:[W512 14:13:55.762631223 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=97, addr=[localhost]:46188, remote=[::ffff:127.0.0.1]:43715): Broken pipe
[rank6]:[W512 14:13:55.765336703 ProcessGroupNCCL.cpp:1826] [PG ID 0 PG GUID 0 Rank 6] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe

Multiple ranks output similar errors, accompanied by a KeyboardInterrupt that eventually shuts down the entire APIServer.
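Since several ranks print similar warnings, and the log itself suggests the TCPStore server may have shut down too early, the broken-pipe lines may be a consequence of an earlier failure rather than its origin. The sketch below is one hypothetical way to find the earliest c10d message per rank in a captured log. The regular expression is inferred only from the two lines quoted above, and the function name `first_message_per_rank` is an assumption for illustration.

```python
import re

# Pattern inferred from the two log lines quoted in this report, e.g.
# [rank6]:[W512 14:13:55.762631223 TCPStore.cpp:106] [c10d] sendBytes failed ...
LINE_RE = re.compile(
    r"\[rank(?P<rank>\d+)\]:\[(?P<level>[A-Z])(?P<date>\d+) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d+) (?P<src>[\w.]+):(?P<line>\d+)\]\s*(?P<msg>.*)"
)

SAMPLE = """\
[rank6]:[W512 14:13:55.762631223 TCPStore.cpp:106] [c10d] sendBytes failed on SocketImpl(fd=97, addr=[localhost]:46188, remote=[::ffff:127.0.0.1]:43715): Broken pipe
   [rank6]:[W512 14:13:55.765336703 ProcessGroupNCCL.cpp:1826] [PG ID 0 PG GUID 0 Rank 6] Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Broken pipe
"""


def first_message_per_rank(log_text):
    """Map each rank to (time, source file, message) of its earliest line.

    Times are compared as strings, which assumes a fixed-width
    HH:MM:SS.fraction format and a log that does not cross midnight.
    """
    earliest = {}
    for raw in log_text.splitlines():
        m = LINE_RE.search(raw)
        if not m:
            continue  # skip lines that are not rank-prefixed c10d messages
        rank = int(m["rank"])
        entry = (m["time"], m["src"], m["msg"])
        if rank not in earliest or entry[0] < earliest[rank][0]:
            earliest[rank] = entry
    return earliest


if __name__ == "__main__":
    for rank, (time, src, msg) in sorted(first_message_per_rank(SAMPLE).items()):
        print(rank, time, src, msg[:60])
```

Any traceback printed before the earliest of these timestamps is a more likely candidate for the real failure than the broken-pipe warnings themselves.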

The logs also contain many environment variable warnings (e.g., VLLM_QWQ_SERVICE_*); these appear harmless and can be ignored.

However, an earlier deployment of the Qwen80B model (Qwen3-Next-80B-A3B-Instruct, tensor-parallel-size 4) on the same GPUs did not hit these errors.

vLLM version for that deployment: vllm/vllm-openai:v0.11.2

Pod template for that deployment:

template:
  metadata:
    labels:
      app: ${APP_NAME}
    annotations:
      nvidia.com/use-gputype: H100
  spec:
    tolerations:
    - key: "node/has-no-internet"
      operator: "Exists"
      effect: "NoSchedule"
    containers:
    - name: ${APP_NAME}
      image: ${DOCKER_PULL_IMAGE}
      command: ["vllm", "serve", "/${SHARE}/models/huggingface/hub/models--Qwen--Qwen3-Next-80B-A3B-Instruct/snapshots/3eb90afa4e2fff2db323f75999fd31dd2bae4021", "--host", "0.0.0.0", "--port", "${PORT}", "--swap-space", "16", "--gpu-memory-utilization", "0.95", "--max-num-seqs", "128", "--max-num-batched-tokens", "32768", "--max-model-len", "5120", "--tensor-parallel-size", "4"]
      ports:
      - containerPort: ${PORT}
      resources:
        limits:
          nvidia.com/vgpu: 4
          cpu: 32
          memory: 160Gi
        requests:
          nvidia.com/vgpu: 4
          cpu: 16
          memory: 80Gi

The NCCL version in the DeepSeek image is 2.28.9, while the Qwen80B image ships 2.27.5. Could the newer NCCL version be the cause? When I use the newer image to deploy other, non-DeepSeek models (such as Qwen3.6 27B) with tensor parallelism, I still hit NCCL errors, and the same problem has been present in the vllm/vllm-openai:nightly images for a long time.
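One way to gather evidence for or against the NCCL-version hypothesis is to relaunch with NCCL's own logging enabled, so the loaded NCCL version and the transports it selects are printed at startup. The sketch below only builds an environment for such a diagnostic run. `NCCL_DEBUG` and `NCCL_P2P_DISABLE` are standard NCCL environment variables, but the helper name `nccl_debug_env` is hypothetical, and disabling peer-to-peer transport is an experiment to narrow the problem on vGPUs, not a confirmed fix.

```python
import os


def nccl_debug_env(base_env=None, disable_p2p=False):
    """Return a copy of the environment with NCCL diagnostics enabled.

    disable_p2p=True is a diagnostic experiment (an assumption that vGPU
    peer-to-peer access might be involved), not a confirmed remedy.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["NCCL_DEBUG"] = "INFO"  # NCCL logs its version and transport choices
    if disable_p2p:
        env["NCCL_P2P_DISABLE"] = "1"  # turn off the GPU peer-to-peer transport
    return env


if __name__ == "__main__":
    env = nccl_debug_env(disable_p2p=True)
    print({k: v for k, v in env.items() if k.startswith("NCCL_")})
```

In a pod, the same variables would normally be set as `env:` entries on the container rather than from Python; the returned dictionary can also be passed as `env=` to `subprocess.run` when launching `vllm serve` by hand.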

