vllm - AMD Development Roadmap (2026 Q2)

StepCodex · 2026-05-31T02:35:17Z

# AMD Development Roadmap (2026 Q2)

*Draft for review (2026-05-30). Target repo: vllm-project/vllm. Modeled on the SGLang AMD roadmap (sgl-project/sglang#23494). Aligns to the AMD vLLM roadmap deck (2026-05-29).*

*Contributions and feedback are welcome.*

> **⚠️ Draft / work-in-progress.** This roadmap is a draft and may still be updated. Items, owners, linked tickets, and dates are tentative and subject to change.

This is the ROCm counterpart to the overall [vLLM Roadmap Q2 2026](https://github.com/vllm-project/vllm/issues/39749).

Legend: ✓ done · ▶ in progress · ○ planned. Each item links a tracked vLLM issue/PR.

## Focus

- **Day-0 → Day-N model enablement**: Frontier models (DeepSeek, Llama, Qwen, Kimi) running on ROCm at launch.
- **Close the perf gap**: Decode parity vs SGLang / ATOM at concurrency 128/512, on MoE and MLA.
- **vLLM V1 engine**: Adopt the V1 engine on ROCm: full-cudagraph, spec-decode, llm-d / P-D.
- **No regression**: e2e accuracy + perf regression CI with nightly gating, answering the public "no perf/accuracy regression" ask (#43916).

## Feature and Performance Improvements (Now · June)

- **Models**
  PoC: @tjtanaa @ChuanLi1101
  Goal: Land the frontier MoE models on ROCm with AITER acceleration.
  - ▶ AITER-accelerated MLA decode for DeepSeek-V4 on MI355X (FP4 e2e, TP=8) #40889
  - ▶ Wire Kimi-Linear FlyDSL gated delta rule decode kernel #40697
- **Performance**
  PoC: @maeehart @frida-andersson
  Goal: Close the decode gap on MLA / DSv4 hot paths.
  - ▶ AITER MLA decode int64 source fix
  - ▶ DeepSeek-V4 CSA multistream decode overlap #43718
  - ▶ Skip head repeat_interleave for AITER MLA decode with BF16 KV cache #37353
- **CI & Quality**
  PoC: @AndreasKaratzas
  Goal: Stand up e2e accuracy + perf regression gating on ROCm.
  - ✓ MoRI CI green
  - ▶ Add MI355 / MI300 into perf & accuracy regression testing & dashboard #43916 (plan below)
  - ▶ ROCm CI infrastructure improvements #34994 · CI failure tracker #40554
- **Platform & API**
  PoC: @ChuanLi1101 @dllehr-amd
  Goal: Adopt the V1 engine and stand up llm-d gates on ROCm.
  - ▶ First stage of enabling torch stable on ROCm (V1 full-cudagraph) #39513
  - ▶ llm-d v0.8.0 well-lit gates on MI300 (MoRI · P/D · Wide-EP)
  - ▶ PD disaggregation recipes on vLLM #40421

## e2e Accuracy & Perf Regression CI: Plan (#43916)

Directly answers the public "no perf/accuracy regression" ask.

- **Accuracy gate**: lm-eval gsm8k + mmlu within abs tolerance vs reference; greedy token-match on fixed prompts/seeds; >5% drift = fail.
- **Perf gate**: TPOT / TTFT / output tok-s vs rolling baseline; fail if regression > threshold (start 5%, tune).
- **Coverage (initial)**: DeepSeek-V3/V4, Llama-3.x, Qwen-3, Mixtral · BF16 / FP8 / FP4·NVFP4 · TP = 1/2/4/8 (representative subset) · MI300X + MI355X first, MI325X as capacity allows.
- **Reuse**: seed from AITER accuracy gates + existing lm-eval-small + regression suites, not from scratch.
- **Baseline**: captured from last green nightly, versioned + stored, refreshed on intentional perf changes.
- **Reporting**: dashboard + alert on regression; weekly summary rolled into this roadmap.
- **Phases**: (1) Jun wk1-2 smoke + baseline capture (non-blocking) → (2) Jun gate top models, block on regression → (3) end-Jun/Q3 public-facing no-regression coverage.

## Milestones

- **Jun 1**: Share roadmap + e2e regression-CI plan & baselines.
- **June**: CI rollout in progress · llm-d MI300 gates · V1 cudagraph.

------

Q3 plan: see the [2026 Q3 roadmap](https://github.com/vllm-project/vllm/issues/44091). Dates are targets and will be refined.
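To make the perf gate in the regression-CI plan concrete (TPOT / TTFT / output tok-s compared against a rolling baseline, failing above a threshold that starts at 5%), here is a minimal Python sketch. It is an illustration under stated assumptions, not the CI implementation tracked in #43916: the metric keys (`tpot_ms`, `ttft_ms`, `output_tok_s`), the 7-run window, and the use of a median as the rolling aggregate are all hypothetical choices made for this example.

```python
from statistics import median

# Direction of "better" per metric. TPOT and TTFT are latencies (lower is
# better); output tokens/s is throughput (higher is better).
# The key names are hypothetical, chosen only for this sketch.
LOWER_IS_BETTER = {"tpot_ms": True, "ttft_ms": True, "output_tok_s": False}


def rolling_baseline(history, window=7):
    """Median of each metric over the most recent `window` green runs.

    `history` is a non-empty list of dicts, oldest first. The median and the
    window size are assumptions; the roadmap only says "rolling baseline".
    """
    recent = history[-window:]
    return {m: median(run[m] for run in recent) for m in LOWER_IS_BETTER}


def regression_fraction(metric, baseline, current):
    """Return how much worse `current` is than `baseline`, as a fraction.

    Positive means a regression, negative means an improvement.
    """
    delta = (current - baseline) / baseline
    return delta if LOWER_IS_BETTER[metric] else -delta


def perf_gate(history, current, threshold=0.05, window=7):
    """Return {metric: regression_fraction} for every metric over threshold.

    An empty dict means the gate passes.
    """
    baseline = rolling_baseline(history, window)
    failures = {}
    for metric in LOWER_IS_BETTER:
        regression = regression_fraction(metric, baseline[metric], current[metric])
        if regression > threshold:
            failures[metric] = regression
    return failures


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    history = [
        {"tpot_ms": 10.0, "ttft_ms": 100.0, "output_tok_s": 1000.0},
        {"tpot_ms": 9.0, "ttft_ms": 110.0, "output_tok_s": 1100.0},
        {"tpot_ms": 11.0, "ttft_ms": 90.0, "output_tok_s": 900.0},
    ]
    current = {"tpot_ms": 10.8, "ttft_ms": 103.0, "output_tok_s": 960.0}
    print(perf_gate(history, current))
```

Handling direction per metric matters here: a single signed percentage would treat a throughput gain as a regression. The threshold is a parameter because the plan expects it to be tuned after the initial 5%.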
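The accuracy gate in the regression-CI plan combines two checks: benchmark scores (gsm8k, mmlu) within an absolute tolerance of a reference, and a greedy token-match on fixed prompts and seeds, with more than 5% drift counted as a failure. The Python sketch below shows one way these could fit together. It rests on assumptions the roadmap does not spell out: "drift" is read here as the fraction of prompts whose greedy token IDs differ from the reference, the 0.02 absolute tolerance is a placeholder value, and every function name is hypothetical.

```python
def scores_within_tolerance(reference, measured, abs_tol):
    """Map each task to True if |measured - reference| <= abs_tol."""
    return {
        task: abs(measured[task] - reference[task]) <= abs_tol
        for task in reference
    }


def token_drift(reference_outputs, current_outputs):
    """Fraction of prompts whose greedy token-ID sequence differs.

    Both arguments are lists of token-ID lists, aligned by prompt index.
    Reading "drift" as a per-prompt mismatch rate is an assumption.
    """
    if not reference_outputs:
        raise ValueError("no reference outputs to compare against")
    if len(reference_outputs) != len(current_outputs):
        raise ValueError("reference and current outputs differ in length")
    mismatched = sum(
        1 for ref, cur in zip(reference_outputs, current_outputs) if ref != cur
    )
    return mismatched / len(reference_outputs)


def accuracy_gate(ref_scores, scores, ref_tokens, tokens,
                  abs_tol=0.02, max_drift=0.05):
    """Return (passed, details) for one model/config under test.

    abs_tol is a placeholder; max_drift mirrors the ">5% drift = fail" rule.
    """
    per_task = scores_within_tolerance(ref_scores, scores, abs_tol)
    drift = token_drift(ref_tokens, tokens)
    passed = all(per_task.values()) and drift <= max_drift
    return passed, {"per_task": per_task, "token_drift": drift}


if __name__ == "__main__":
    # Made-up scores and token IDs, for illustration only.
    ref_scores = {"gsm8k": 0.90, "mmlu": 0.80}
    scores = {"gsm8k": 0.89, "mmlu": 0.79}
    ref_tokens = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    tokens = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
    print(accuracy_gate(ref_scores, scores, ref_tokens, tokens))
```

Returning the per-task results and the measured drift alongside the pass/fail flag keeps the gate usable for reporting, since a dashboard or alert needs to show which check failed and by how much.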
