vllm - 💡(How to fix) Fix [Bug]: V1 Scheduler hard-fails on stale req_id in `_update_after_schedule` (defensive guard missing) [1 pull requests]


Error Message

File "vllm/v1/core/sched/scheduler.py", line 990, in _update_after_schedule
    request = self.requests[req_id]
              ~~~~~~~~~~~~~^^^^^^^^
KeyError: 'chatcmpl-8a6f39cb01952274'

Fix Action

Fixed


Environment

  • vllm 0.19.1 (verified via vllm-omni's vendored copy)
  • Affected: v0.19.1, v0.20.0, v0.20.1, v0.20.2, main HEAD (sha 530d3713 as of 2026-05-09)
  • Model: MiniCPM-o-4.5 (multi-stage omni — Thinker / Talker / Code2Wav via vllm-omni)

Bug

The V1 scheduler's _update_after_schedule() does an unchecked self.requests[req_id] lookup and hard-fails the engine core process with a KeyError when the request was finished or aborted concurrently, between building the schedule and running the post-schedule update. The crash kills every active client connection.

This issue is about the hard-fail (defensive guard missing), not the underlying race itself. The race condition (abort/finish ordering) is a separate concern; see #26400 for one related thread.

Stack trace

File "vllm/v1/core/sched/scheduler.py", line 990, in _update_after_schedule
    request = self.requests[req_id]
              ~~~~~~~~~~~~~^^^^^^^^
KeyError: 'chatcmpl-8a6f39cb01952274'

Existing inconsistency in the same file

scheduler.py already uses the defensive .get(req_id) pattern in many places:

  • self.requests.get(req_id) at lines 1235, 1296, 1606, 1633, 1715
  • self.requests[req_id] (unchecked) at lines 944 (this bug), 1446, 2049, 2054

So the defensive pattern is already an established convention in this file — _update_after_schedule() just isn't following it.

Reproducer

Run a vllm-omni multi-stage setup (backend=vllm_omni, MiniCPM-o-Demo) and start a realtime WebSocket voice conversation, then either:

  • start a new turn before the previous response finishes, or
  • disconnect mid-stream and immediately reconnect.

finish_requests() synchronously removes the request from self.requests (via _free_blocks() and del self.requests[...] around lines 1755 / 1836). The _update_after_schedule() call of the same step then looks up the stale id and crashes.
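The ordering described above can be illustrated with a toy model. This is a minimal sketch, not vLLM code: ToyScheduler, its methods, and the request ids are hypothetical stand-ins that mirror only the sequence "schedule, remove, update".

```python
# Toy model of the failure mode. ToyScheduler is hypothetical and only
# mimics the ordering described in the issue; it is not vLLM's scheduler.
class ToyScheduler:
    def __init__(self):
        self.requests = {}  # req_id -> per-request state

    def add_request(self, req_id):
        self.requests[req_id] = {"num_computed_tokens": 0}

    def schedule(self):
        # Snapshot of req_id -> number of tokens scheduled for this step.
        return {req_id: 1 for req_id in self.requests}

    def finish_requests(self, req_id):
        # Synchronous removal, as the issue describes for finish_requests().
        del self.requests[req_id]

    def update_after_schedule(self, num_scheduled_tokens):
        for req_id, n in num_scheduled_tokens.items():
            request = self.requests[req_id]  # unchecked lookup
            request["num_computed_tokens"] += n


def reproduce():
    sched = ToyScheduler()
    sched.add_request("req-a")
    sched.add_request("req-b")
    scheduled = sched.schedule()
    # The abort lands after the schedule was built but before the update.
    sched.finish_requests("req-a")
    try:
        sched.update_after_schedule(scheduled)
    except KeyError as exc:
        return exc.args[0]  # the stale req_id
    return None


if __name__ == "__main__":
    print("KeyError on stale id:", reproduce())
```

In the toy model the exception is caught for demonstration purposes; in the reported case nothing catches it, so it propagates and takes down the engine core process.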

Proposed fix (defensive guard)

Use self.requests.get(req_id) and, when it returns None, skip the request and emit a debug log. This does not fix the race; it only prevents the engine from hard-failing when the race occurs. PR coming.
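A minimal sketch of what such a guard could look like, written against a plain dict rather than the real scheduler. The function name, the logger name, the request state layout, and the returned list of skipped ids are all assumptions made for illustration; the actual PR may differ.

```python
import logging

# Hypothetical logger name, chosen for this sketch only.
logger = logging.getLogger("toy_scheduler")


def update_after_schedule(requests, num_scheduled_tokens):
    """Apply per-step token counts, skipping ids no longer in `requests`.

    `requests` maps req_id -> state dict; `num_scheduled_tokens` maps
    req_id -> tokens scheduled this step. Both shapes are assumptions.
    Returns the stale ids that were skipped (illustration only).
    """
    skipped = []
    for req_id, n in num_scheduled_tokens.items():
        request = requests.get(req_id)  # guarded lookup, no KeyError
        if request is None:
            # The request was finished or aborted after scheduling.
            logger.debug("Skipping stale req_id %s in post-schedule update", req_id)
            skipped.append(req_id)
            continue
        request["num_computed_tokens"] += n
    return skipped


if __name__ == "__main__":
    live = {"req-b": {"num_computed_tokens": 0}}
    print(update_after_schedule(live, {"req-a": 1, "req-b": 2}))
```

The guard only changes what happens once the stale id is seen: the step continues for the remaining requests, while the ordering problem that produced the stale id is left untouched.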

Related

  • #26400 (closed) — abort/finish ordering at the engine loop level. Different layer, doesn't prevent this KeyError.
  • #25991 (open) — V1 KeyError on concurrent embedding requests. Different surface (gpu_model_runner.py), same broader robustness pattern.
