vllm · 2026-05-31 02:35:01

AMD Development Roadmap (2026 Q3)

Draft for review, 2026-05-30. Target repo: vllm-project/vllm. Modeled on the SGLang AMD roadmap (sgl-project/sglang#23494) and aligned with the AMD vLLM roadmap deck (2026-05-29).

Contributions and feedback are welcome.

⚠️ Draft / work in progress. Items, owners, linked tickets, and dates are tentative and subject to change.

This is the ROCm counterpart to the overall vLLM Roadmap Q2 2026 and the DeepSeek V4 roadmap.

Legend: ✓ done · ▶ in progress · ○ planned. Where one exists, each item links to a tracked vLLM issue or PR.

Focus

- Day-0 enablement template: a repeatable path so the next DeepSeek / Llama / Qwen release lands on ROCm at launch.
- Decode parity: match SGLang / ATOM decode performance at concurrency 128 and 512 on MI355X.
- vLLM V1 engine migration: move ROCm backends onto the new engine API and harden disaggregated serving.
- Public no-regression coverage: promote the regression CI to a public-facing dashboard with automatic release gating.

Feature and Performance Improvements (Next · Q3)

Models · PoC: @tjtanaa @ChuanLi1101 @kliuae · Goal: full DSv4 release plus a repeatable day-0 path for the next frontier model drop.
- ○ DeepSeek-V4 full release on ROCm: fix the bring-up gap reported in #42876 (v0.21.0 claims support, but it fails on MI350X)
- ○ DSv4 FP8 base on MI300X
- ○ Stand up a day-0 enablement template (next DeepSeek / Llama / Qwen)
- ○ Kimi-Linear 64M-ctx upstream · enable >100M-ctx long-context path
- ○ Enable next VLM (Qwen-VL / Llama-4 multimodal)
Performance · PoC: @maeehart @frida-andersson · Goal: reach decode parity and mature FP4 performance.
- ○ Decode parity at concurrency 128/512 vs SGLang / ATOM on MI355X
- ○ FlyDSL MoE a8w4 replaces torch matmul_ogs
- ○ Add DSv4 decode perf (vs SGLang) to the nightly runs
- ○ Tune NVFP4 / MXFP4 quant perf
- ○ Cross-node TP=16 + KV-offload
CI & Quality · PoC: @AndreasKaratzas · Goal: public no-regression coverage with auto-gating.
- ○ AITER accuracy + regression CI
- ○ Public no-regression dashboard + alerts #43916
- ○ DI e2e test once NIC firmware and nodes are ready
- ○ Add a per-model regression case with each model enablement
- ○ Auto-gate releases on perf SLA
Platform & API · PoC: @ChuanLi1101 @dllehr-amd · Goal: migrate ROCm backends to the new engine API and harden disagg.
- ○ Validate NVFP4 / Quark upstream on MI300 / MI355
- ○ Pin llm-d + vLLM image for DI · enable llm-d MI355 gating
- ○ ROCm Kimi K2.5 disagg PD + wide-EP recipe #34781
- ○ vLLM router should have ROCm CI #38693
- ○ Harden KV-connector / disagg API · migrate ROCm backends to new engine API
- ○ MI450 / gfx1250 enablement and validation on vLLM (next-gen CDNA), gated on kernel + platform readiness

Milestones

Q3 (target): DSv4 full release · next frontier day-0 · long-ctx scaling · public no-regression dashboard.

TODO: WIP. Dates are targets and will be refined after upstream alignment.