vllm - 💡(How to fix) [Bug] WSL2: Ctrl+C shows shutdown complete but requires pkill -9 (lingering processes) [1 comment, 2 participants]

GitHub stats: vllm-project/vllm#39093 (fetched 2026-04-08 03:02:01) · Comments: 1 · Participants: 2 · Timeline events: 2 (commented ×1, renamed ×1) · Reactions: 0

On WSL2, pressing Ctrl+C triggers a seemingly clean shutdown (EngineCore reports "Shutdown complete" and FastAPI reports "Application shutdown complete"), but vLLM-related processes sometimes remain alive afterwards. To fully stop vLLM and return to a clean state, I have to manually run pkill -9 on the remaining processes.

This suggests SIGINT shutdown is not reliably tearing down all child/worker/distributed processes on WSL2, even though the high-level shutdown logs indicate success.

Environment

  • OS: Windows 11 with WSL2
  • WSL kernel: Linux 6.6.87.2-microsoft-standard-WSL2
  • vLLM version: 0.19.0 (from logs)
  • Model: Qwen/Qwen3.5-27B-FP8
  • Python: 3.12 (virtualenv)
  • NCCL: nccl==2.27.5 (from logs)
  • Launch method: bash script (start_vllm_FP8.sh) running vLLM OpenAI-compatible API server

Steps to Reproduce

  1. Start the vLLM OpenAI-compatible API server on WSL2 (example model: Qwen/Qwen3.5-27B-FP8).
  2. Wait for startup to complete (FastAPI prints "Application startup complete").
  3. Press Ctrl+C in the same terminal.
  4. Observe that the shutdown logs report completion.
  5. Check the running processes: some vLLM-related processes remain, and I have to kill them manually with pkill -9.
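Step 5 can be scripted. The following is a minimal sketch that lists candidate leftovers by reading /proc directly (WSL2 runs a real Linux kernel, so /proc is available). The match pattern `vllm` and the helper names `read_proc_table` and `find_lingering` are hypothetical; worker processes started through multiprocessing may not mention vLLM on their own command line, so the sketch also collects descendants of every match.

```python
import os
from typing import Iterable, List, Optional, Tuple

ProcRow = Tuple[int, int, str]  # (pid, ppid, command line)

# Assumption: the server's top-level process mentions "vllm" in its command line.
# The heuristic is over-inclusive (it also matches e.g. a virtualenv path named
# after vLLM), so review the output before killing anything.
PATTERN = "vllm"


def read_proc_table(proc_root: str = "/proc") -> List[ProcRow]:
    """Return (pid, ppid, cmdline) for every process readable under /proc."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process exited meanwhile, or access was denied
        # /proc/<pid>/stat is "pid (comm) state ppid ..."; split after the last
        # ")" because comm itself may contain spaces or parentheses.
        ppid = int(stat.rsplit(")", 1)[1].split()[1])
        cmdline = raw.replace(b"\0", b" ").decode(errors="replace").strip()
        rows.append((int(entry), ppid, cmdline))
    return rows


def find_lingering(rows: Iterable[ProcRow], pattern: str = PATTERN,
                   self_pid: Optional[int] = None) -> List[ProcRow]:
    """Processes whose command line contains `pattern`, plus their descendants."""
    if self_pid is None:
        self_pid = os.getpid()
    candidates = [row for row in rows if row[0] != self_pid]
    selected = {pid for pid, _, cmd in candidates if pattern in cmd}
    # Follow parent links downward until no new descendants are found.
    changed = True
    while changed:
        changed = False
        for pid, ppid, _ in candidates:
            if ppid in selected and pid not in selected:
                selected.add(pid)
                changed = True
    return sorted(row for row in candidates if row[0] in selected)


if __name__ == "__main__":
    for pid, ppid, cmd in find_lingering(read_proc_table()):
        # A PPID of 1 (or of another init/subreaper process) suggests the parent
        # already exited and this process was left behind.
        print(f"pid={pid} ppid={ppid} {cmd}")
```

Running it right after Ctrl+C shows what is left before resorting to pkill -9. Descendants are found through their parent PID, so an orphan whose parent has already exited is only caught if its own command line matches.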

Expected Behavior

After Ctrl+C, once the logs report that shutdown is complete, the main server and all worker/child processes should have exited, with no manual pkill -9 required.

Actual Behavior

The shutdown logs look clean, but processes sometimes linger and I have to kill them manually with pkill -9.

Relevant log excerpt

(APIServer pid=13250) INFO:     Started server process [13250]
(APIServer pid=13250) INFO:     Waiting for application startup.
(APIServer pid=13250) INFO:     Application startup complete.

^C(Worker_TP1 pid=15694) WARNING 04-06 17:17:54 [multiproc_executor.py:871] WorkerProc was terminated
(Worker_TP0 pid=15692) WARNING 04-06 17:17:54 [multiproc_executor.py:871] WorkerProc was terminated
(EngineCore pid=15405) INFO 04-06 17:17:54 [core.py:1210] Shutdown initiated (timeout=0)
(EngineCore pid=15405) INFO 04-06 17:17:54 [core.py:1233] Shutdown complete
(APIServer pid=13250) INFO:     Shutting down
(APIServer pid=13250) INFO 04-06 17:17:54 [launcher.py:137] Shutting down FastAPI HTTP server.
(APIServer pid=13250) INFO:     Waiting for application shutdown.
(APIServer pid=13250) INFO:     Application shutdown complete.

(vllm-qwen3-5-27b) user@host:~/vLLM$ [rank1]:[W406 17:20:09.946156099 TCPStore.cpp:125] [c10d] recvValue failed on SocketImpl(fd=22, addr=[localhost]:59102, remote=[localhost]:45095): Failed to recv, got 0 bytes
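The timestamps in this excerpt already point at a lingering process: the c10d warning from [rank1] is logged after the shell prompt has returned. A small sketch that measures the gap, assuming the vLLM prefix `MM-DD HH:MM:SS` and a PyTorch prefix made of a severity letter, the month and a two-digit day (so `W406` is read as a warning on 04-06, consistent with the surrounding vLLM lines). The function names are hypothetical.

```python
import re
from datetime import datetime

# Lines copied from the excerpt (the shell-prompt prefix is dropped).
SHUTDOWN_LINE = ("(EngineCore pid=15405) INFO 04-06 17:17:54 [core.py:1233] "
                 "Shutdown complete")
WARNING_LINE = ("[rank1]:[W406 17:20:09.946156099 TCPStore.cpp:125] [c10d] "
                "recvValue failed on SocketImpl(fd=22, addr=[localhost]:59102, "
                "remote=[localhost]:45095): Failed to recv, got 0 bytes")

# Neither format includes a year, so a placeholder year is used; that is fine
# for measuring gaps between lines logged on the same day.
PLACEHOLDER_YEAR = 2000


def vllm_timestamp(line: str) -> datetime:
    """Parse vLLM's 'MM-DD HH:MM:SS' prefix."""
    m = re.search(r"(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})", line)
    if m is None:
        raise ValueError(f"no vLLM timestamp in {line!r}")
    month, day, hour, minute, second = map(int, m.groups())
    return datetime(PLACEHOLDER_YEAR, month, day, hour, minute, second)


def torch_timestamp(line: str) -> datetime:
    """Parse a '[<severity><month><DD> HH:MM:SS' prefix such as '[W406 17:20:09'."""
    # Assumed layout: 'W' = severity, '4' = month, '06' = day.
    m = re.search(r"\[[IWEF](\d{1,2})(\d{2}) (\d{2}):(\d{2}):(\d{2})", line)
    if m is None:
        raise ValueError(f"no PyTorch timestamp in {line!r}")
    month, day, hour, minute, second = map(int, m.groups())
    return datetime(PLACEHOLDER_YEAR, month, day, hour, minute, second)


def gap_seconds(earlier: datetime, later: datetime) -> float:
    """Seconds elapsed between two parsed timestamps."""
    return (later - earlier).total_seconds()


if __name__ == "__main__":
    gap = gap_seconds(vllm_timestamp(SHUTDOWN_LINE), torch_timestamp(WARNING_LINE))
    print(f"c10d warning logged {gap:.0f} s after 'Shutdown complete'")
```

For these two lines the computed gap is 135 s (2 min 15 s).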

Questions

  • Is WSL2 a known problematic environment for signal/shutdown handling with multiprocess + distributed components?
  • Is there a recommended clean shutdown procedure on WSL2 (SIGTERM vs SIGINT, endpoint, flags)?
  • Can the shutdown path be hardened so that, once "Application shutdown complete" is logged, all vLLM-related processes are guaranteed to have exited?

Extent analysis

TL;DR

The issue may be mitigated with a more robust shutdown sequence: send SIGTERM instead of SIGINT, and wait until every child process has actually exited before treating the shutdown as complete.

Guidance

  • Investigate using SIGTERM instead of SIGINT for shutdown; it may give child processes a cleaner exit, provided the processes that receive it handle SIGTERM.
  • Add a wait step to the shutdown code so that "Application shutdown complete" is logged only after every child process has been reaped (joined), not merely signalled.
  • Review the multiprocess and distributed (c10d) handling code for gaps in signal propagation and process termination.
  • Add a timeout with retry or escalation to the shutdown procedure (for example, SIGTERM first, then SIGKILL for processes still running at the deadline). The EngineCore log line "Shutdown initiated (timeout=0)" is a reasonable place to start checking how much grace period the workers actually get.
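Inside a supervising process, the wait step and the timeout with escalation from these bullets can be sketched with the standard library. This sketch assumes `subprocess.Popen` children and a hypothetical `stop_children` helper; vLLM's executor code is not shown here, and for `multiprocessing.Process` objects the analogous calls are `terminate()`, `join(timeout)` and `kill()`.

```python
import subprocess
import time
from typing import List


def stop_children(children: List[subprocess.Popen], timeout: float = 10.0) -> List[int]:
    """SIGTERM all children, wait until a shared deadline, then SIGKILL stragglers.

    Returns the PIDs that had to be killed. Every child is reaped (waited on)
    before returning, so none is left behind as a zombie.
    """
    for child in children:
        if child.poll() is None:
            child.terminate()  # SIGTERM on POSIX
    deadline = time.monotonic() + timeout
    killed = []
    for child in children:
        remaining = max(0.0, deadline - time.monotonic())
        try:
            child.wait(timeout=remaining)
        except subprocess.TimeoutExpired:
            child.kill()  # SIGKILL on POSIX: cannot be caught or ignored
            child.wait()
            killed.append(child.pid)
    return killed
```

A caller would log its "shutdown complete" message only after `stop_children` returns, which is the ordering the second bullet asks for.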

Example

No verified code fix is available: the problem description contains no code snippets, and the root cause has not been identified.

Notes

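The issue may be specific to WSL2 and to how signal handling interacts with vLLM's multiprocess and distributed components. The log already shows that something outlived the reported shutdown: the c10d TCPStore warning from [rank1] is printed at 17:20:09, after the shell prompt has returned and more than two minutes after the shutdown messages at 17:17:54, so at least one rank-1 process was still running. Further investigation is needed to determine the root cause and develop a reliable fix.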

Recommendation

As a workaround, modify the launch/shutdown script to send SIGTERM and wait until every child process has exited before considering the shutdown complete, escalating to SIGKILL only for processes that outlast a timeout. This may be more reliable than Ctrl+C alone, but further testing is needed to confirm it.
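The workaround in this recommendation can be sketched as a hypothetical Python wrapper around the launch script. It starts the script in its own session; on Ctrl+C it sends SIGTERM to the whole process group, waits up to a grace period, and only then escalates to SIGKILL. `stop_group`, `group_alive` and `main` are illustrative names, not vLLM APIs, and the sketch assumes every vLLM process stays in the script's process group (true unless something calls setsid() or setpgid() to leave it).

```python
import os
import signal
import subprocess
import sys
import time
from typing import List


def group_alive(pgid: int) -> bool:
    """True while any process in the group exists (signal 0 only probes)."""
    try:
        os.killpg(pgid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # members exist but belong to another user
    return True


def stop_group(proc: subprocess.Popen, grace: float = 30.0) -> bool:
    """SIGTERM the child's whole process group, wait, then SIGKILL leftovers.

    Assumes `proc` was started with start_new_session=True, so its PID is also
    its process-group ID. Returns True if SIGKILL was needed.
    """
    pgid = proc.pid
    try:
        os.killpg(pgid, signal.SIGTERM)
    except ProcessLookupError:
        return False  # the group is already gone
    deadline = time.monotonic() + grace
    while time.monotonic() < deadline:
        proc.poll()  # reap the leader once it exits so its zombie leaves the group
        if not group_alive(pgid):
            return False
        time.sleep(0.1)
    try:
        os.killpg(pgid, signal.SIGKILL)
    except ProcessLookupError:
        pass
    proc.wait()
    return True


def main(command: List[str]) -> int:
    # The server runs in its own session, so the terminal's Ctrl+C reaches only
    # this wrapper, which then decides how to stop the whole group.
    proc = subprocess.Popen(command, start_new_session=True)
    try:
        return proc.wait()
    except KeyboardInterrupt:
        if stop_group(proc):
            print("some processes did not exit within the grace period and were killed",
                  file=sys.stderr)
        return 130


# Act as a wrapper only when a command is given, e.g. ./start_vllm_FP8.sh
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as, say, `vllm_wrapper.py` (a hypothetical file name), it would be run as `python vllm_wrapper.py ./start_vllm_FP8.sh`. Signalling the group instead of a single PID is the key design choice: a process keeps its process-group ID when it is orphaned, so leftovers whose parent already exited are still reached.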
