vllm - 💡(How to fix) Fix [Bug]: EngineCore crash — assert req_id in self.requests in _update_from_kv_xfer_finished when an async KV connector reports a finished transfer for an aborted/freed request

🐛 Describe the bug

Scheduler._update_from_kv_xfer_finished asserts that every connector-reported finished transfer still belongs to a live request:

# vllm/v1/core/sched/scheduler.py  (present on main as of 2026-05-20)
for req_id in kv_connector_output.finished_recving or ():
    logger.debug("Finished recving KV transfer for request %s", req_id)
    assert req_id in self.requests          # <-- crashes the EngineCore
    req = self.requests[req_id]
    ...
for req_id in kv_connector_output.finished_sending or ():
    logger.debug("Finished sending KV transfer for request %s", req_id)
    assert req_id in self.requests          # <-- same
    self._free_blocks(self.requests[req_id])

With an asynchronous KV connector (KV transfer completion is reported some steps after it is initiated), a request can be aborted/finished and freed by the scheduler while a store/recv transfer for it is still in flight. When that transfer later completes, the connector's get_finished() reports the now-unknown req_id, the assertion fails, and the entire EngineCore dies with EngineDeadError — taking down the server, not just the one request.

File ".../vllm/v1/core/sched/scheduler.py", line ~2124, in _update_from_kv_xfer_finished
    assert req_id in self.requests
AssertionError  ->  vllm.v1.engine.exceptions.EngineDeadError

This is the classic "abort races an in-flight async KV transfer" double-free/stale-id scenario.
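
The ordering that triggers the failure can be reproduced in a few lines with a toy model. This is only a sketch: ToyScheduler, ToyAsyncConnector, and run_race are hypothetical stand-ins written for this illustration, not vLLM classes, and the only thing they share with the real code is the shape of the assert.

```python
from dataclasses import dataclass, field


@dataclass
class ToyScheduler:
    """Hypothetical stand-in for the scheduler; not vLLM code."""

    requests: dict[str, str] = field(default_factory=dict)

    def add_request(self, req_id: str) -> None:
        self.requests[req_id] = "RUNNING"

    def abort(self, req_id: str) -> None:
        # The abort drops the request right away, even though a
        # transfer for it may still be in flight in the connector.
        self.requests.pop(req_id, None)

    def on_finished_sending(self, finished: set[str]) -> None:
        for req_id in finished:
            assert req_id in self.requests  # same shape as the failing assert
            del self.requests[req_id]


class ToyAsyncConnector:
    """Reports a transfer as finished a fixed number of steps after it starts."""

    def __init__(self, delay_steps: int) -> None:
        self.delay_steps = delay_steps
        self.in_flight: dict[str, int] = {}

    def start_send(self, req_id: str) -> None:
        self.in_flight[req_id] = self.delay_steps

    def get_finished(self) -> set[str]:
        done: set[str] = set()
        for req_id in list(self.in_flight):
            self.in_flight[req_id] -= 1
            if self.in_flight[req_id] <= 0:
                done.add(req_id)
                del self.in_flight[req_id]
        return done


def run_race() -> bool:
    """Return True if the late completion trips the assert."""
    sched = ToyScheduler()
    conn = ToyAsyncConnector(delay_steps=2)
    sched.add_request("req-0")
    conn.start_send("req-0")
    sched.on_finished_sending(conn.get_finished())  # step 1: still in flight
    sched.abort("req-0")                            # abort lands mid-transfer
    try:
        sched.on_finished_sending(conn.get_finished())  # step 2: late completion
    except AssertionError:
        return True
    return False
```

In the toy model the AssertionError is caught by the caller; in the report above the same failure propagates out of the scheduler step and surfaces as EngineDeadError.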

Why #33377 does not fully fix this

PR #33377 ("avoid vllm-side double free during async scheduling + request abort + async KV cache transfer", merged 2026-02-03) addressed exactly this class of race — but only in update_from_output, which now tolerates already-gone requests:

# update_from_output (post-#33377)
request = self.requests.get(req_id)
if request is None or request.is_finished():
    # The request is already finished. This can happen if the
    # request is aborted while the model is executing it ...
    # NOTE: When delay_free_blocks=True (for async KV cache transfer
    # in KV connector) ...
    continue

The sibling path _update_from_kv_xfer_finished was left with the hard assert req_id in self.requests. So the same abort-vs-async-transfer race that #33377 made survivable in one function is still fatal in the function that consumes finished_recving / finished_sending. Confirmed: the assert is byte-identical and unguarded on v0.20.1, v0.21.0, and main.

(Note: #42831 / #42841 are a different trigger of the same assert — #42841 fixes MultiConnector mis-routing transfers to non-owning sub-connectors. That fix is connector-specific and does not cover a single async connector reporting a finished transfer for an already-freed request. This issue is the general scheduler-side gap.)

Proposed fix

Mirror #33377's existing tolerance inside _update_from_kv_xfer_finished: look the request up with .get() and skip ids that are no longer in self.requests instead of asserting. A late completion for a request that has already been removed is benign, because its blocks were freed when it was removed. A request that is still tracked but already finished gets its blocks freed at this point.

for req_id in kv_connector_output.finished_recving or ():
    req = self.requests.get(req_id)
    if req is None:
        # Transfer completed after the request was aborted/freed
        # (async connector). Nothing to do.
        logger.debug("Ignoring finished_recving for unknown request %s", req_id)
        continue
    if req.status == RequestStatus.WAITING_FOR_REMOTE_KVS:
        self.finished_recving_kv_req_ids.add(req_id)
    elif RequestStatus.is_finished(req.status):
        self._free_blocks(req)

for req_id in kv_connector_output.finished_sending or ():
    req = self.requests.get(req_id)
    if req is None:
        logger.debug("Ignoring finished_sending for unknown request %s", req_id)
        continue
    self._free_blocks(req)

This is connector-agnostic (fixes Nixl, LMCache, any async/disaggregated connector) and consistent with the precedent #33377 already set two functions away. Happy to send a PR.
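
A regression test for the tolerant lookup could look roughly like the following. It is a minimal sketch that runs against a toy scheduler rather than the real one: ToyStatus, ToyRequest, ToyScheduler, and demo are hypothetical names, and the toy _free_blocks also drops the request from the table, which is an assumption made to keep the example small.

```python
import enum
import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


class ToyStatus(enum.Enum):
    WAITING_FOR_REMOTE_KVS = enum.auto()
    RUNNING = enum.auto()
    FINISHED_ABORTED = enum.auto()


@dataclass
class ToyRequest:
    request_id: str
    status: ToyStatus


class ToyScheduler:
    """Hypothetical stand-in that mirrors the proposed control flow."""

    def __init__(self) -> None:
        self.requests: dict[str, ToyRequest] = {}
        self.finished_recving_kv_req_ids: set[str] = set()
        self.freed: list[str] = []

    def _free_blocks(self, req: ToyRequest) -> None:
        # Assumption for the toy: freeing also removes the request.
        self.freed.append(req.request_id)
        del self.requests[req.request_id]

    def update_from_kv_xfer_finished(self, finished_recving, finished_sending):
        for req_id in finished_recving or ():
            req = self.requests.get(req_id)
            if req is None:
                # Late completion for a request that is already gone.
                logger.debug("Ignoring finished_recving for unknown request %s", req_id)
                continue
            if req.status is ToyStatus.WAITING_FOR_REMOTE_KVS:
                self.finished_recving_kv_req_ids.add(req_id)
            elif req.status is ToyStatus.FINISHED_ABORTED:
                self._free_blocks(req)

        for req_id in finished_sending or ():
            req = self.requests.get(req_id)
            if req is None:
                logger.debug("Ignoring finished_sending for unknown request %s", req_id)
                continue
            self._free_blocks(req)


def demo() -> ToyScheduler:
    sched = ToyScheduler()
    sched.requests["live"] = ToyRequest("live", ToyStatus.WAITING_FOR_REMOTE_KVS)
    sched.requests["sent"] = ToyRequest("sent", ToyStatus.FINISHED_ABORTED)
    # "ghost" was never registered, standing in for an aborted and freed request.
    sched.update_from_kv_xfer_finished({"live", "ghost"}, {"sent", "ghost"})
    return sched
```

The point of the test is that the unknown id is skipped in both loops while the known ids are still handled exactly as before, so the change only affects the path that currently crashes.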

Reproduction context

Observed in production with LMCacheMPConnector (LMCache multi-process mode) on GLM-5.1-FP8, TP=8, vLLM v0.20.1 — ~9.5h MTBF under real traffic (the crash needs an abort to coincide with an in-flight async KV transfer, so it's a low-rate tail event). The connector's get_finished() builds its finished sets from completing async futures without re-checking engine ownership, but per the analysis above the robust, connector-agnostic fix belongs in the scheduler.
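
For illustration, a connector that derives its finished set purely from completed futures behaves like the sketch below. ToyFutureConnector and its live_req_ids parameter are hypothetical and are not part of LMCache or the vLLM connector interface; the optional filter only shows what a connector-side ownership check would amount to, and it does not replace the scheduler-side fix.

```python
from concurrent.futures import Future
from typing import Optional


class ToyFutureConnector:
    """Hypothetical connector that tracks one future per in-flight transfer."""

    def __init__(self) -> None:
        self._pending: dict[str, Future] = {}

    def start_transfer(self, req_id: str) -> Future:
        fut: Future = Future()
        self._pending[req_id] = fut
        return fut

    def get_finished(self, live_req_ids: Optional[set[str]] = None) -> set[str]:
        # Collect every id whose future has completed, regardless of
        # whether the engine still tracks that request.
        done = {rid for rid, fut in self._pending.items() if fut.done()}
        for rid in done:
            del self._pending[rid]
        # Hypothetical defensive filter: drop ids the engine no longer owns.
        if live_req_ids is not None:
            done &= live_req_ids
        return done
```

Without the filter, an id whose request was aborted mid-transfer is still reported once its future completes, which is the input that reaches the assert in the scheduler.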

Environment

vLLM v0.20.1 (assert also present unchanged on v0.21.0 and main). KV connector: async multi-process connector (LMCache LMCacheMPConnector). TP=8, CUDA 12.9.
