vllm: [Bug]: V1 Scheduler hard-fails on stale req_id in `_update_after_schedule` (defensive guard missing)

vllm, 2026-05-09 10:22:47

Error Message

File "vllm/v1/core/sched/scheduler.py", line 990, in _update_after_schedule
    request = self.requests[req_id]
              ~~~~~~~~~~~~~^^^^^^^^
KeyError: 'chatcmpl-8a6f39cb01952274'

Fix Action

Fixed

Fixed by PR: [Bugfix][V1] Defensive guard for stale req_id in _update_after_schedule (https://github.com/vllm-project/vllm/pull/42158)

Environment

vllm 0.19.1 (verified via vllm-omni's vendored copy)
Affected: v0.19.1, v0.20.0, v0.20.1, v0.20.2, main HEAD (sha 530d3713 as of 2026-05-09)
Model: MiniCPM-o-4.5 (multi-stage omni — Thinker / Talker / Code2Wav via vllm-omni)

Bug

V1 scheduler's _update_after_schedule() does an unchecked self.requests[req_id] lookup and hard-fails the engine core process with KeyError when the request was concurrently finished/aborted between schedule build and post-schedule update. This kills every active client connection.

This issue is about the hard-fail (defensive guard missing), not the underlying race itself. The race condition (abort/finish ordering) is a separate concern; see #26400 for one related thread.

Stack trace

File "vllm/v1/core/sched/scheduler.py", line 990, in _update_after_schedule
    request = self.requests[req_id]
              ~~~~~~~~~~~~~^^^^^^^^
KeyError: 'chatcmpl-8a6f39cb01952274'

Existing inconsistency in the same file

scheduler.py already uses the defensive .get(req_id) pattern in many places:

✅ self.requests.get(req_id) at lines 1235, 1296, 1606, 1633, 1715
❌ self.requests[req_id] (unchecked) at lines 944 (this bug), 1446, 2049, 2054

So the defensive pattern is already an established convention in this file — _update_after_schedule() just isn't following it.

Reproducer

vllm-omni multi-stage setup (backend=vllm_omni, MiniCPM-o-Demo). Realtime WebSocket voice conversation, then either:

start a new turn before the previous response finishes, or
disconnect mid-stream and immediately reconnect.

finish_requests() synchronously removes the request from self.requests (via _free_blocks() and del self.requests[...] around lines 1755 / 1836). The _update_after_schedule() call in the same step then looks up the stale id and raises KeyError.
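The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not vLLM's scheduler: the class ToyScheduler, its method bodies, and the request ids are hypothetical and only mirror the ordering of the three steps (build the schedule, finish a request, run the post-schedule update).

```python
class ToyScheduler:
    """Hypothetical stand-in that mimics only the lookup pattern in question."""

    def __init__(self):
        self.requests = {}

    def add_request(self, req_id):
        self.requests[req_id] = {"num_computed_tokens": 0}

    def schedule(self):
        # Snapshot of the ids scheduled for this step (one token each).
        return {req_id: 1 for req_id in self.requests}

    def finish_requests(self, req_id):
        # Synchronous removal, as in the report.
        del self.requests[req_id]

    def update_after_schedule(self, num_scheduled_tokens):
        for req_id, num_tokens in num_scheduled_tokens.items():
            request = self.requests[req_id]  # unchecked lookup
            request["num_computed_tokens"] += num_tokens


sched = ToyScheduler()
sched.add_request("req-a")
sched.add_request("req-b")

plan = sched.schedule()          # the plan still names req-a
sched.finish_requests("req-a")   # req-a is aborted before the update runs

try:
    sched.update_after_schedule(plan)
    crashed_on = None
except KeyError as exc:
    crashed_on = exc.args[0]     # the stale id

print("KeyError on:", crashed_on)
```

In this toy the exception is caught; in the reported case it propagates and takes down the engine core process.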

Proposed fix (defensive guard)

Use self.requests.get(req_id) and skip + debug-log on None. This does not fix the race — it only prevents the engine from hard-failing when the race occurs. PR coming.
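A minimal sketch of the guard, assuming the post-schedule update iterates over a mapping of req_id to scheduled token count. The function name, the dict-based request records, and the logger name are hypothetical and are not taken from the linked PR; only the .get() plus skip plus debug-log shape comes from the proposal above.

```python
import logging

logger = logging.getLogger("toy_scheduler")


def update_after_schedule(requests, num_scheduled_tokens):
    """Advance per-request counters, skipping ids that are no longer tracked.

    Returns the list of skipped (stale) ids so callers or tests can inspect it.
    """
    skipped = []
    for req_id, num_tokens in num_scheduled_tokens.items():
        request = requests.get(req_id)
        if request is None:
            # The request was finished or aborted after the schedule was built.
            logger.debug("Skipping stale req_id %s in post-schedule update", req_id)
            skipped.append(req_id)
            continue
        request["num_computed_tokens"] += num_tokens
    return skipped


# req-a was removed after scheduling; req-b is still live.
requests = {"req-b": {"num_computed_tokens": 0}}
plan = {"req-a": 4, "req-b": 2}

skipped = update_after_schedule(requests, plan)
print(skipped, requests)
```

The stale id is skipped and the remaining request is still updated, so one aborted request no longer affects the others in the same step. As stated above, this only contains the failure; the ordering race itself is untouched.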

Related

#26400 (closed): abort/finish ordering at the engine loop level. Different layer; doesn't prevent this KeyError.
#25991 (open): V1 KeyError on concurrent embedding requests. Different surface (gpu_model_runner.py), same broader robustness pattern.
