vllm - ✅ (Solved) Fix [Bug]: parity with CUDA & parity with ROCm SGLang: vLLM router doesn't currently support MoRI kvcache connector [1 pull request, 7 comments, 4 participants]

GitHub stats
vllm-project/vllm#38692 (fetched 2026-04-08 02:23:30)
Comments: 7
Participants: 4
Timeline events: 36
Reactions: 0
Timeline (top): mentioned ×11, subscribed ×11, commented ×7, labeled ×3

Fix Action

Fixed

PR fix notes

PR #948: [Draft, no merge] MVP for vLLM Disagg

Description (problem / solution / changelog)

We prototyped Prefill/Decode (PD) disaggregation on DeepSeek. So far, we have done the following:

  • Framework Integration: Successfully integrated vLLM’s Prefill/Decode disaggregated architecture into the InferenceX framework.
  • MoRI/Deep EP Integration on DeepSeek V3 (DPSK V3): used for MoE All-to-All and for KV cache transfer.
  • Confirmed that MoRI-IO outperforms the standard NixlConnector in TTFT (Time to First Token).
  • Stability & Bug Fixes:
      • Resolved hang issues at high concurrency (CONC512) by fixing KV cache leaks and optimizing the "Reaper" logic for memory block release.
      • Fixed hardware compatibility issues, including PCI topology failures on Broadcom switches and error handling for mlx5 NICs.
  • Cluster Validation: Verified the multi-node deployment recipe on the SA-9N (mia1) cluster.
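
The PR description does not show the "Reaper" itself. As a minimal sketch of the general idea only, assuming a reaper is a periodic sweep that frees KV cache blocks whose transfer was never acknowledged, the following may help picture how such logic can prevent leaks under high concurrency. `BlockReaper`, `PendingTransfer`, and every method name here are hypothetical and are not taken from the PR or from vLLM.

```python
import time
from dataclasses import dataclass


@dataclass
class PendingTransfer:
    """KV cache blocks held for a request until the decode side confirms receipt."""
    block_ids: list[int]
    deadline: float


class BlockReaper:
    """Hypothetical reaper: frees blocks on ack, or after a timeout if no ack arrives."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the sweep can be tested deterministically
        self.pending: dict[str, PendingTransfer] = {}
        self.free_blocks: list[int] = []

    def hold(self, request_id: str, block_ids: list[int]) -> None:
        # Keep the blocks alive while the KV transfer is in flight.
        deadline = self.clock() + self.ttl
        self.pending[request_id] = PendingTransfer(list(block_ids), deadline)

    def ack(self, request_id: str) -> None:
        # Normal path: the transfer finished, so return the blocks to the pool.
        transfer = self.pending.pop(request_id, None)
        if transfer is not None:
            self.free_blocks.extend(transfer.block_ids)

    def reap(self) -> list[str]:
        # Leak guard: release blocks for requests whose ack never arrived in time.
        now = self.clock()
        expired = [rid for rid, t in self.pending.items() if t.deadline <= now]
        for rid in expired:
            self.ack(rid)
        return expired
```

In a design like this, `reap()` would be called on a timer. The timeout has to be longer than the slowest legitimate transfer, otherwise blocks could be freed while still being read.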

co-authors:

  • @ichbinblau
  • @ChuanLi1101
  • @billishyahao

Thanks to @functionstackx

Changed files

  • .github/configs/amd-master.yaml (modified, +75/-0)
  • benchmarks/multi_node/dsr1_fp8_mi355x_vllm-disagg.sh (added, +79/-0)
  • benchmarks/multi_node/vllm_disagg_utils/bench.sh (added, +75/-0)
  • benchmarks/multi_node/vllm_disagg_utils/env.sh (added, +98/-0)
  • benchmarks/multi_node/vllm_disagg_utils/job.slurm (added, +358/-0)
  • benchmarks/multi_node/vllm_disagg_utils/models.yaml (added, +41/-0)
  • benchmarks/multi_node/vllm_disagg_utils/moriio_proxy.py (added, +326/-0)
  • benchmarks/multi_node/vllm_disagg_utils/server.sh (added, +490/-0)
  • benchmarks/multi_node/vllm_disagg_utils/setup_deps.sh (added, +848/-0)
  • benchmarks/multi_node/vllm_disagg_utils/start_etcd.sh (added, +47/-0)
  • benchmarks/multi_node/vllm_disagg_utils/submit.sh (added, +166/-0)
  • benchmarks/multi_node/vllm_disagg_utils/sync.py (added, +201/-0)
  • runners/launch_mi355x-amds.sh (modified, +12/-3)
  • utils/bench_serving/backend_request_func.py (modified, +104/-66)
  • utils/bench_serving/benchmark_serving.py (modified, +43/-15)

Your current environment

All the nightly images in https://hub.docker.com/r/vllm/vllm-openai-rocm/tags as of April 1st, 2026

vllm/vllm-openai-rocm:v0.18.1

vllm/vllm-openai-rocm:v0.18.0

🐛 Describe the bug

hi @hongxiayang

+viz @powderluv @chunfangamd @andyluo7 @ChuanLi1101

The vLLM router does not currently work with the MoRI kvcache connector for ROCm disagg. On the CUDA side, by contrast, the vLLM router works with NIXL, the NVIDIA equivalent of the MoRI kvcache connector.

On the ROCm vLLM stack, users can currently only use RIXL (the second-class fork of NIXL, which unfortunately isn't included out of the box in the Docker image) or the MORIIO kvcache toy proxy server, which is not prod-ready.

On the ROCm SGLang stack, the SGLang equivalent of the vLLM router, called the sglang model gateway server, already supports MoRI kvcache transfer. Let's ensure that the ROCm vLLM experience is at parity with the ROCm SGLang experience: https://github.com/sgl-project/sglang/pull/14626

Can you look into having the vLLM router support MoRI kvcache transfer? Thanks.

The expected user experience is that the Python wheel from upstream PyPI and the Docker image should support it: https://hub.docker.com/r/vllm/vllm-router/tags

https://github.com/vllm-project/vllm/blob/main/examples/online_serving/disaggregated_serving/moriio_toy_proxy_server.py

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.

extent analysis

TL;DR

The vLLM router needs to be updated to support MoRI kvcache transfer on ROCm to reach parity with the ROCm SGLang experience.

Guidance

  • Review the sglang model gateway server implementation to understand how it supports MoRI kvcache transfer, as seen in https://github.com/sgl-project/sglang/pull/14626.
  • Investigate modifying the vLLM router to support the MoRI kvcache connector, potentially using the MORIIO kvcache toy proxy server as a reference.
  • Consider including RIXL in the Docker image or providing clear setup instructions for users, as it is currently the only alternative to the MORIIO kvcache toy proxy server.
  • Verify that the Python wheel from upstream PyPI and the Docker image support the updated vLLM router with MoRI kvcache transfer.
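
As a rough illustration of what a prefill/decode router has to do, the sketch below plans the two hops of a disaggregated request: one to a prefill endpoint and one to a decode endpoint, each chosen round-robin. It is not the vLLM router's or the toy proxy's actual code. `PDRouter`, the endpoint URLs, and the choice to cap the prefill hop at `max_tokens=1` are assumptions made for illustration, and the KV cache transfer itself (the connector's job) is out of scope.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class PDRouter:
    """Hypothetical round-robin planner over prefill and decode endpoints."""
    prefill_urls: list[str]
    decode_urls: list[str]
    _prefill_cycle: Iterator[str] = field(init=False, repr=False)
    _decode_cycle: Iterator[str] = field(init=False, repr=False)

    def __post_init__(self) -> None:
        if not self.prefill_urls or not self.decode_urls:
            raise ValueError("need at least one prefill and one decode endpoint")
        self._prefill_cycle = itertools.cycle(self.prefill_urls)
        self._decode_cycle = itertools.cycle(self.decode_urls)

    def plan(self, request: dict) -> list[tuple[str, dict]]:
        """Return the (url, payload) hops for one request, in order."""
        # Assumption: the prefill hop only needs to populate the KV cache,
        # so it is limited to a single generated token and not streamed.
        prefill_payload = {**request, "max_tokens": 1, "stream": False}
        # The decode hop carries the caller's original generation settings.
        decode_payload = dict(request)
        return [
            (next(self._prefill_cycle), prefill_payload),
            (next(self._decode_cycle), decode_payload),
        ]


router = PDRouter(
    prefill_urls=["http://prefill-0:8000"],
    decode_urls=["http://decode-0:8000", "http://decode-1:8000"],
)
first = router.plan({"prompt": "hello", "max_tokens": 64})
second = router.plan({"prompt": "world", "max_tokens": 32})
```

A production router would add health checks, load-aware selection, and connector-specific handshake metadata on top of this. Those parts differ between connectors such as NIXL and MoRI, which is where the requested support would need to be implemented.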

Notes

The vLLM router does not currently support the MoRI kvcache connector. On ROCm, users must therefore rely on either RIXL or the MORIIO kvcache toy proxy server, which is not production-ready, and this limits the user experience on ROCm.

Recommendation

Modify the vLLM router to support MoRI kvcache transfer. This brings the ROCm vLLM experience to parity with ROCm SGLang and gives users a more robust option than the toy proxy server.
