vllm - ✅(Solved) Fix [Bug] External LB test_external_lb_dp[4] failing since shutdown timeout PR #34730 [5 pull requests, 3 comments, 2 participants]

vllm · 2026-03-10 09:28:40


vllm-project/vllm#36624•Fetched 2026-04-08 00:35:54

Author

elvircrn

Participants

elvircrn

markmc

Timeline (top)

cross-referenced ×5, commented ×3, closed ×1, mentioned ×1

test_external_lb_dp.py with api_server_count=4 has been failing consistently in every nightly/daily build since PR #34730 (27066d1b) landed on Mar 6.

The [1] variant (1 API server per DP rank) passes; only [4] (4 API servers per DP rank) fails.

Error Message

EngineCore_DP1: WorkerProc initialization failed due to an exception in a background process.
Worker_DP0_TP0: Parent process exited, terminating worker queues
Worker_DP0_TP0: BrokenPipeError: [Errno 32] Broken pipe
Failed to start server rank 0: Server exited unexpectedly
Exception: Servers failed to start

Root Cause

PR #34730 changes the server startup/shutdown chain:

run_multi_api_server(): New SIGTERM/SIGINT signal handler raising SystemExit, close() → shutdown(timeout=...)
run_engine_core(): Direct SystemExit signal handler replaced with SignalCallback thread + shutdown_state enum
EngineProcessManager: close() → shutdown() with _finalizer.detach() semantics

The [4] variant is likely more sensitive to timing because it launches 4 API servers per DP rank. During this heavier initialization, the logs show Worker_DP0_TP0 hitting BrokenPipeError after its parent process exited, and EngineCore_DP1 reporting a WorkerProc initialization failure, which suggests the parent process exits before the workers finish initializing.
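The BrokenPipeError in the log is what Python raises when a process writes to a pipe whose reader has already gone away. The sketch below reproduces that error mode in a single process by closing the read end before writing. It is only an illustration, not vLLM code, and the function name is made up for the example.

```python
import errno
import os


def write_after_reader_closed() -> int:
    """Write to a pipe whose read end is closed and return the errno raised."""
    read_fd, write_fd = os.pipe()
    os.close(read_fd)  # stands in for the parent process exiting early
    try:
        os.write(write_fd, b"worker ready")
    except BrokenPipeError as exc:
        return exc.errno
    finally:
        os.close(write_fd)
    raise AssertionError("expected BrokenPipeError")


if __name__ == "__main__":
    # EPIPE is the errno behind "[Errno 32] Broken pipe" on Linux.
    print(write_after_reader_closed() == errno.EPIPE)
```

In the failing test the two ends live in different processes, so the same error shows up whenever the reader side exits while a worker is still reporting its startup status.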

Fix Action

Fixed

Fixed by PR (merged): [Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270) (https://github.com/vllm-project/vllm/pull/36628)
Related PR (test only, not merged): [DO NOT MERGE] Test revert "Remove busy loop from idle buffer readers" (#28053 and #36068) (https://github.com/vllm-project/vllm/pull/36619)
Related PR (test only, not merged): [DO NOT MERGE] Reapply "[BugFix] Fix engine hanging after KV cache initialization failure #35478" (https://github.com/vllm-project/vllm/pull/36650)
Related PR (test only, not merged): [DO NOT MERGE][Core] Revert "Fix benign error log during normal shutdown (#36270)" (https://github.com/vllm-project/vllm/pull/36646)
Follow-up PR (merged): [Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish (https://github.com/vllm-project/vllm/pull/36666)

PR fix notes

PR #36628: [Frontend][Core] Revert "Add shutdown timeout" (#34730 and #36270)

Repository: vllm-project/vllm
Author: markmc
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/36628

Description (problem / solution / changelog)

Distributed Test 4 GPUs is still failing. Testing whether these reverts fix it

Fixes #36624

Changed files

tests/entrypoints/openai/test_shutdown.py (modified, +0/-459)
tests/entrypoints/test_api_server_process_manager.py (modified, +7/-15)
vllm/config/vllm.py (modified, +0/-6)
vllm/engine/arg_utils.py (modified, +0/-11)
vllm/engine/protocol.py (modified, +0/-5)
vllm/entrypoints/cli/serve.py (modified, +6/-42)
vllm/entrypoints/launcher.py (modified, +5/-23)
vllm/v1/engine/__init__.py (modified, +0/-2)
vllm/v1/engine/async_llm.py (modified, +3/-2)
vllm/v1/engine/coordinator.py (modified, +2/-4)
vllm/v1/engine/core.py (modified, +43/-127)
vllm/v1/engine/core_client.py (modified, +12/-12)
vllm/v1/engine/utils.py (modified, +5/-34)
vllm/v1/utils.py (modified, +12/-19)

PR #36619: [DO NOT MERGE] Test revert "Remove busy loop from idle buffer readers" (#28053 and #36068)

Repository: vllm-project/vllm
Author: markmc
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/36619

Description (problem / solution / changelog)

Distributed Test 4 GPUs is still failing. Testing whether these reverts fix it

Changed files

tests/basic_correctness/test_basic_correctness.py (modified, +2/-0)
tests/distributed/test_shm_broadcast.py (modified, +8/-285)
vllm/distributed/device_communicators/shm_broadcast.py (modified, +64/-194)
vllm/envs.py (modified, +5/-0)
vllm/v1/executor/multiproc_executor.py (modified, +64/-109)

PR #36650: [DO NOT MERGE] Reapply "[BugFix] Fix engine hanging after KV cache initialization failure #35478"

Repository: vllm-project/vllm
Author: markmc
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/36650

Description (problem / solution / changelog)

Distributed Test 4 GPUs is still failing. Testing whether this revert fixes it.

Fixes https://github.com/vllm-project/vllm/issues/36624

See https://github.com/vllm-project/vllm/pull/36628#issuecomment-4030961404

The series of relevant PRs are:

[Core] Remove busy loop from idle buffer readers (#28053)
- Passed before merging - https://buildkite.com/vllm/ci/builds/54275
.... (omitting a bunch here)
[BugFix] Fix engine hanging after KV cache initialization failure (#35478)
[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish (#34730)
- Failed - https://buildkite.com/vllm/ci/builds/54863
[Bugfix] Fix inner_dp_world initialization order for multi-node TP (#35892)
- Failed - https://buildkite.com/vllm/ci/builds/54864
[Bugfix] Quickfix followups to busy loop removal in #28053 (#36068)
- Passed (on PR branch!) (triggered today) - https://buildkite.com/vllm/ci/builds/54834
- Failed (on main) (triggered today) - https://buildkite.com/vllm/ci/builds/54959
Revert "[BugFix] Fix engine hanging after KV cache initialization fai… (#36262)
- Not tested on PR branch
- Failed (on main) (triggered after it merged?) - https://buildkite.com/vllm/ci/builds/54961
[Core] Fix benign error log during normal shutdown (#36270)
- Failed (on PR branch) (triggered today) - https://buildkite.com/vllm/ci/builds/55015
- Not tested (on main) (just triggered now) - https://buildkite.com/vllm/ci/builds/55026

So it appears that reverting #36262 might be sufficient

Changed files

vllm/v1/engine/core.py (modified, +55/-25)
vllm/v1/engine/utils.py (modified, +5/-0)

PR #36646: [DO NOT MERGE][Core] Revert "Fix benign error log during normal shutdown (#36270)"

Repository: vllm-project/vllm
Author: markmc
State: closed | merged: False
Link: https://github.com/vllm-project/vllm/pull/36646

Description (problem / solution / changelog)

Possible simpler alternative to #36628 fixing #36624

Changed files

vllm/v1/engine/core_client.py (modified, +9/-6)

PR #36666: [Frontend][Core] Re-add shutdown timeout - allowing in-flight requests to finish

Repository: vllm-project/vllm
Author: markmc
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/36666

Description (problem / solution / changelog)

Re-apply #34730 and #36270 which were reverted by #36628 because of test_external_lb_dp.py failures in Distributed Tests (4 GPUs)

Relates to #24885 and supersedes a subset of #32420

🤖 Co-authored with Claude Code
Co-authored-by: @njhill

Overview

This PR adds support for a shutdown timeout in the API server and engine core, allowing controlled handling of a SIGTERM signal - either immediate termination by default (--shutdown-timeout=0) or wait for in-flight requests to finish (--shutdown-timeout=SECONDS).

Background

Previously, SIGTERM to the API server or multi-API server launcher would cause immediate process termination without coordinating request handling. This PR addresses that by:

Rejecting new requests when shutdown is initiated (not queuing them like pause does)
Handling in-flight requests according to --shutdown-timeout:
- Default, no timeout - abort all in-flight requests immediately
- With timeout - Wait for in-flight requests to complete (up to --shutdown-timeout seconds)
Using a separate shutdown state machine in the engine core to manage the shutdown lifecycle

Scope and Constraints

This PR is intentionally scoped to the API server use case with SIGTERM handling. The following are explicitly out of scope:

KV transfers: unlike #32420, prefill instances are not (yet) waiting for KV transfers to complete before shutting down
Library use case: No changes to direct LLM or AsyncLLM usage (though shutdown() methods are added for future use)
ScalingMiddleware reuse: Request rejection happens at engine core level, not middleware
Signal handling redesign: No new shutdown pipe mechanism; minimal changes to existing handlers
Ctrl-C behavior: Process groups and Ctrl-C handling remain unchanged; only SIGTERM is handled
/health endpoint changes: No new /live endpoint or changes to health check behavior
External/hybrid load balancer validation: Assumes parallel SIGTERM to multiple API servers works (untested, similar to pause behavior)

Implementation Approach

Core Mechanism: EngineShutdownState State Machine

The implementation uses a separate EngineShutdownState enum to manage shutdown lifecycle independently from pause state:

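from enum import IntEnum

class EngineShutdownState(IntEnum):
    RUNNING = 0        # normal operation
    REQUESTED = 1      # SIGTERM received, shutdown not yet started
    SHUTTING_DOWN = 2  # new work rejected; exit when no work remains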

Why a separate state machine:

Shutdown has different semantics than pause (permanent vs temporary)
Shutdown needs request rejection, pause needs request queuing
Different lifecycle: pause can be resumed, shutdown cannot
Cleaner separation of concerns

How it works:

SIGTERM signal handler sets engine_core.shutdown_state = EngineShutdownState.REQUESTED
Engine core's _handle_shutdown() method checks state each iteration:
- RUNNING: Continue normal operation
- REQUESTED: Abort/wait based on config, transition to SHUTTING_DOWN, reject new requests
- SHUTTING_DOWN: Exit when no work remaining
Engine core's _handle_client_request() calls _reject_add_in_shutdown() and _reject_utility_in_shutdown() which check shutdown state and reject new work
Engine core's run_busy_loop() exits when _handle_shutdown() returns False
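The state transitions described above can be sketched with a toy engine. This is a simplified illustration rather than vLLM's implementation: ToyEngine, its methods, and run_until_exit are hypothetical names, and timeout expiry is left out so that only the RUNNING → REQUESTED → SHUTTING_DOWN flow is shown.

```python
from enum import IntEnum


class EngineShutdownState(IntEnum):
    RUNNING = 0
    REQUESTED = 1
    SHUTTING_DOWN = 2


class ToyEngine:
    """Hypothetical stand-in for the engine core; not vLLM's class."""

    def __init__(self, wait_for_inflight: bool) -> None:
        self.shutdown_state = EngineShutdownState.RUNNING
        self.wait_for_inflight = wait_for_inflight
        self.inflight: list[str] = []

    def add_request(self, request_id: str) -> bool:
        # New work is rejected (not queued) once shutdown has been requested.
        if self.shutdown_state != EngineShutdownState.RUNNING:
            return False
        self.inflight.append(request_id)
        return True

    def handle_shutdown(self) -> bool:
        """Return False when the busy loop should exit."""
        if self.shutdown_state == EngineShutdownState.REQUESTED:
            if not self.wait_for_inflight:
                self.inflight.clear()  # abort everything immediately
            self.shutdown_state = EngineShutdownState.SHUTTING_DOWN
        if self.shutdown_state == EngineShutdownState.SHUTTING_DOWN:
            return bool(self.inflight)  # keep looping only while work remains
        return True

    def step(self) -> None:
        # Pretend one in-flight request finishes per iteration.
        if self.inflight:
            self.inflight.pop(0)


def run_until_exit(engine: ToyEngine) -> int:
    """Run the loop and return how many iterations it took to exit."""
    iterations = 0
    while engine.handle_shutdown():
        engine.step()
        iterations += 1
    return iterations
```

With wait_for_inflight=True the loop drains the outstanding requests before exiting; with False it clears them and exits on the first check, which mirrors the default of aborting in-flight requests immediately.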

Idle Engine Wake-up

When the engine is idle and blocked on input_queue.get(), it cannot detect shutdown state changes. To deal with this, we:

Add WAKEUP sentinel to EngineCoreRequestType for waking idle engine
Add a new signal-context-safe SignalCallback to allow triggering a callback from a signal handler
Add a signal callback that pushes WAKEUP to input_queue
Trigger the signal callback from the signal handler
Main loop wakes from blocking get(), checks shutdown_state, proceeds
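One way to picture the wake-up path is a helper thread that a signal handler can trigger by writing a single byte to a pipe, which avoids taking queue locks inside the handler itself. The sketch below is an assumption about how such a mechanism could look, not the SignalCallback from the PR: PipeSignalCallback, WAKEUP as a bare object, and demo() are all hypothetical.

```python
import os
import queue
import threading

# Hypothetical sentinel; the PR adds WAKEUP to EngineCoreRequestType instead.
WAKEUP = object()


class PipeSignalCallback:
    """Runs a callback on a helper thread when trigger() is called.

    trigger() only writes one byte to a pipe, so it does not touch any
    Python-level locks and is reasonable to call from a signal handler.
    """

    def __init__(self, callback) -> None:
        self._callback = callback
        self._read_fd, self._write_fd = os.pipe()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        os.read(self._read_fd, 1)  # blocks until trigger() writes a byte
        self._callback()

    def trigger(self) -> None:
        os.write(self._write_fd, b"\0")

    def close(self, timeout: float = 5.0) -> None:
        self._thread.join(timeout)
        os.close(self._read_fd)
        os.close(self._write_fd)


def demo() -> bool:
    input_queue: queue.Queue = queue.Queue()
    shutdown_requested = threading.Event()

    def on_signal() -> None:
        shutdown_requested.set()
        input_queue.put(WAKEUP)  # unblocks the idle loop's get()

    callback = PipeSignalCallback(on_signal)
    callback.trigger()  # a real SIGTERM handler would make this call
    item = input_queue.get(timeout=5)  # the idle engine wakes up here
    callback.close()
    return item is WAKEUP and shutdown_requested.is_set()
```

The main loop treats WAKEUP as a no-op request: its only job is to return control from the blocking get() so the shutdown state can be checked.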

Shutdown Flow

Single API Server:

SIGTERM → signal_handler() sets shutdown_event
       → _handle_shutdown() coroutine wakes up
       → Retrieves timeout from vllm_config.shutdown_config.wait_timeout
       → Calls engine_client.shutdown(timeout=timeout)
       → Shutdown propagates through client hierarchy:
          - AsyncLLM.shutdown() → EngineCoreClient.shutdown()
          - MPClient detaches finalizer, calls engine_manager.shutdown()
          - CoreEngineProcManager.shutdown() terminates/waits for processes
       → Then cancels server tasks and stops SSL refresher
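The single-server flow above can be approximated with asyncio primitives. In this sketch FakeEngineClient, serve_forever, and main are hypothetical stand-ins, and the event is set directly instead of from a real SIGTERM handler; it only shows the ordering of "wait for event, shut the engine down with the configured timeout, then cancel the server task".

```python
import asyncio


class FakeEngineClient:
    """Hypothetical stand-in for the engine client."""

    def __init__(self) -> None:
        self.shutdown_timeout = None

    def shutdown(self, timeout: float) -> None:
        self.shutdown_timeout = timeout


async def serve_forever() -> None:
    # The event is never set; this stands in for the running HTTP server.
    await asyncio.Event().wait()


async def handle_shutdown(shutdown_event, engine_client, wait_timeout, server_task):
    await shutdown_event.wait()  # wakes when the signal handler sets the event
    engine_client.shutdown(timeout=wait_timeout)
    server_task.cancel()  # server tasks are cancelled only after engine shutdown


async def main(wait_timeout: float):
    shutdown_event = asyncio.Event()
    engine_client = FakeEngineClient()
    server_task = asyncio.create_task(serve_forever())
    watcher = asyncio.create_task(
        handle_shutdown(shutdown_event, engine_client, wait_timeout, server_task)
    )
    # A real server would set this from a handler registered with
    # loop.add_signal_handler(signal.SIGTERM, shutdown_event.set).
    shutdown_event.set()
    await watcher
    try:
        await server_task
    except asyncio.CancelledError:
        pass
    return engine_client.shutdown_timeout, server_task.cancelled()
```

Running asyncio.run(main(30.0)) exercises the path with a 30-second timeout, standing in for the value read from vllm_config.shutdown_config.wait_timeout.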

Multi-API Server Launcher:

SIGTERM → signal_handler() sets shutdown_requested flag, raises SystemExit
       → finally block executes
       → Computes shared deadline from configured wait_timeout
       → Calls shutdown(deadline=deadline) on all managers:
          - api_server_manager.shutdown(deadline=deadline)
          - local_engine_manager.shutdown(deadline=deadline) if present
          - coordinator.shutdown(deadline=deadline) if present
       → Each manager terminates/waits for processes within deadline
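The point of a shared deadline is that the managers split one time budget instead of each receiving the full timeout. A minimal sketch under that assumption follows; FakeManager and shutdown_all are hypothetical and use sleeps in place of real process termination.

```python
import time


class FakeManager:
    """Hypothetical process manager whose shutdown takes `work_seconds`."""

    def __init__(self, name: str, work_seconds: float) -> None:
        self.name = name
        self.work_seconds = work_seconds
        self.finished_cleanly = False

    def shutdown(self, deadline: float) -> None:
        remaining = max(0.0, deadline - time.monotonic())
        # Wait for at most the time left before the shared deadline.
        time.sleep(min(self.work_seconds, remaining))
        self.finished_cleanly = self.work_seconds <= remaining


def shutdown_all(managers: list[FakeManager], wait_timeout: float) -> float:
    """Shut managers down in order against one deadline; return elapsed seconds."""
    start = time.monotonic()
    deadline = start + wait_timeout  # computed once and shared by every manager
    for manager in managers:
        manager.shutdown(deadline=deadline)
    return time.monotonic() - start
```

Because the deadline is absolute, a slow manager early in the list shrinks the budget of the ones after it, and the total wait stays close to wait_timeout no matter how many managers there are.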

Headless Mode:

SIGTERM → signal_handler() sets shutdown_requested flag, raises SystemExit
       → finally block executes
       → Retrieves timeout from vllm_config.shutdown_config.wait_timeout
       → Calls engine_manager.shutdown(timeout=timeout)
       → Process manager terminates/waits for processes with timeout


Affected tests

test_external_lb_server_info[4]
test_external_lb_single_completion[4-ibm-research/PowerMoE-3b]
test_external_lb_completion_streaming[4-ibm-research/PowerMoE-3b]

Bisection

Last passing build: #54783 (daily Mar 5, commit a97954b6)
First failing build: #54882 (nightly Mar 6, commit 5afb387b)
Culprit: PR #34730 (27066d1b) — "[Frontend][Core] Add shutdown timeout - allowing in-flight requests to finish"
Failing in every nightly/daily since: #54882, #55011, #55067, #55110, #55120, #55158, #55214, #55354, #55437

Note

This was already flagged on Mar 6 by @markmc:

Watch out for shutdown related regressions that may have been caused by 34730

Failing build examples

cc @markmc

extent analysis

Fix Plan

The change that resolved the issue upstream was a revert: PR #36628 reverted #34730 and #36270, and PR #36666 later re-added the shutdown timeout. The steps below are a speculative alternative that none of the linked PRs confirm. They assume the failure is a startup race that longer timeouts would tolerate:

Increase the shutdown timeout in run_multi_api_server() so the API servers have more time to initialize.
Make the SignalCallback thread in run_engine_core() wait for worker initialization to complete before shutting down.
Have EngineProcessManager use a longer timeout when calling shutdown() so worker processes have time to finish initializing.

Example code changes (illustrative pseudocode; wait_for_worker_init(), _shutdown_timeout, and _shutdown() are hypothetical names that do not appear in the linked PRs):

# In run_multi_api_server()
shutdown(timeout=30)  # hypothetical: raise the timeout to 30 seconds

# In run_engine_core()
signal_callback = SignalCallback()
signal_callback.wait_for_worker_init()  # hypothetical method: block until workers are initialized

# In EngineProcessManager
def shutdown(self):
    self._finalizer.detach()
    self._shutdown_timeout = 30  # hypothetical attribute: 30-second shutdown timeout
    self._shutdown()  # hypothetical helper

Verification

To verify that the fix worked, run the failing tests (test_external_lb_server_info[4], test_external_lb_single_completion[4-ibm-research/PowerMoE-3b], and test_external_lb_completion_streaming[4-ibm-research/PowerMoE-3b]) and check that they pass without errors.

Extra Tips

Monitor the tests for any shutdown-related regressions and adjust the timeouts as needed.
Consider adding logging to track the initialization time of the API servers and workers to help identify any future issues.
