vllm - 💡(How to fix) Fix AMD Development Roadmap (2026 Q2)


AMD Development Roadmap (2026 Q2)

Draft for review — 2026-05-30. Target repo: vllm-project/vllm. Modeled on the SGLang AMD roadmap (sgl-project/sglang#23494). Aligns to the AMD vLLM roadmap deck (2026-05-29).

Contributions and feedback are welcome.

⚠️ Draft / work-in-progress. Items, owners, linked tickets, and dates are tentative and subject to change.

This is the ROCm counterpart to the overall vLLM Roadmap Q2 2026.

Legend: ✓ done · ▶ in progress · ○ planned. Items link to a tracked vLLM issue/PR where one exists.

Focus

  • Day-0 → Day-N model enablement: Frontier models (DeepSeek, Llama, Qwen, Kimi) running on ROCm at launch.
  • Close the perf gap: Decode parity vs SGLang / ATOM at concurrency 128/512, on MoE and MLA.
  • vLLM V1 engine: Adopt the V1 engine on ROCm — full-cudagraph, spec-decode, llm-d / P-D.
  • No regression: e2e accuracy + perf regression CI with nightly gating, answering the public "no perf/accuracy regression" ask (#43916).

Feature and Performance Improvements (Now · June)

  • Models · PoC: @tjtanaa @ChuanLi1101 · Goal: Land the frontier MoE models on ROCm with AITER acceleration.

    • ▶ AITER-accelerated MLA decode for DeepSeek-V4 on MI355X (FP4 e2e, TP=8) #40889
    • ▶ Wire Kimi-Linear FlyDSL gated delta rule decode kernel #40697
  • Performance · PoC: @maeehart @frida-andersson · Goal: Close the decode gap on MLA / DSv4 hot paths.

    • ▶ AITER MLA decode int64 source fix
    • ▶ DeepSeek-V4 CSA multistream decode overlap #43718
    • ▶ Skip head repeat_interleave for AITER MLA decode with BF16 KV cache #37353
  • CI & Quality · PoC: @AndreasKaratzas · Goal: Stand up e2e accuracy + perf regression gating on ROCm.

    • ✓ MoRI CI green
    • ▶ Add MI355 / MI300 into perf & accuracy regression testing & dashboard #43916 (plan below)
    • ▶ ROCm CI infrastructure improvements #34994 · CI failure tracker #40554
  • Platform & API · PoC: @ChuanLi1101 @dllehr-amd · Goal: Adopt the V1 engine and stand up llm-d gates on ROCm.

    • ▶ First stage of enabling torch stable on ROCm (V1 full-cudagraph) #39513
    • ▶ llm-d v0.8.0 well-lit gates on MI300 (MoRI · P/D · Wide-EP)
    • ▶ PD disaggregation recipes on vLLM #40421

e2e Accuracy & Perf Regression CI — Plan (#43916)

Directly answers the public "no perf/accuracy regression" ask.

  • Accuracy gate: lm-eval gsm8k + mmlu within abs tolerance vs reference; greedy token-match on fixed prompts/seeds; >5% drift = fail.
  • Perf gate: TPOT / TTFT / output tok-s vs rolling baseline; fail if regression > threshold (start 5%, tune).
  • Coverage (initial): DeepSeek-V3/V4, Llama-3.x, Qwen-3, Mixtral · BF16 / FP8 / FP4·NVFP4 · TP = 1/2/4/8 (representative subset) · MI300X + MI355X first, MI325X as capacity allows.
  • Reuse: seed from AITER accuracy gates + existing lm-eval-small + regression suites — not from scratch.
  • Baseline: captured from last green nightly, versioned + stored, refreshed on intentional perf changes.
  • Reporting: dashboard + alert on regression; weekly summary rolled into this roadmap.
  • Phases: (1) Jun wk1-2 smoke + baseline capture (non-blocking) → (2) Jun gate top models, block on regression → (3) end-Jun/Q3 public-facing no-regression coverage.
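
As an illustration of the accuracy and perf gates described above, here is a minimal sketch in Python. It is not the planned CI implementation: the function names, metric keys (`tpot_ms`, `ttft_ms`, `output_tok_s`), and the reading of ">5% drift" as an absolute difference in score on a 0 to 1 scale are all assumptions made for this example. Only the 5% starting thresholds come from the plan.

```python
# Hypothetical gate checks; names and metric keys are illustrative only.

# Starting thresholds from the plan text (both expected to be tuned).
PERF_THRESHOLD = 0.05        # max tolerated fractional perf regression
ACCURACY_TOLERANCE = 0.05    # assumed: absolute score difference on a 0..1 scale

# Direction of each metric: latencies should go down, throughput should go up.
LOWER_IS_BETTER = {"tpot_ms": True, "ttft_ms": True, "output_tok_s": False}


def perf_regression(metric: str, baseline: float, current: float) -> float:
    """Return the fractional regression; positive means worse than baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    change = (current - baseline) / baseline
    return change if LOWER_IS_BETTER[metric] else -change


def perf_gate(baseline: dict, current: dict,
              threshold: float = PERF_THRESHOLD) -> list:
    """Return the metrics whose regression exceeds the threshold."""
    return [
        metric
        for metric in LOWER_IS_BETTER
        if perf_regression(metric, baseline[metric], current[metric]) > threshold
    ]


def accuracy_gate(reference: float, measured: float,
                  tolerance: float = ACCURACY_TOLERANCE) -> bool:
    """Return True when the measured score is within tolerance of the reference."""
    return abs(measured - reference) <= tolerance


if __name__ == "__main__":
    # Made-up numbers, purely to exercise the checks.
    baseline = {"tpot_ms": 20.0, "ttft_ms": 200.0, "output_tok_s": 1000.0}
    current = {"tpot_ms": 21.5, "ttft_ms": 205.0, "output_tok_s": 990.0}
    failed = perf_gate(baseline, current)
    print("perf regressions:", failed)
    print("accuracy ok:", accuracy_gate(reference=0.92, measured=0.90))
```

An empty list from `perf_gate` means the run passes. In the non-blocking smoke phase the result would only be reported; in the gating phase a non-empty list or a `False` from `accuracy_gate` would fail the job. The baseline dictionary stands in for the versioned values captured from the last green nightly.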

Milestones

  • Jun 1: Share roadmap + e2e regression-CI plan & baselines.
  • June: CI rollout in progress · llm-d MI300 gates · V1 cudagraph.

Q3 plan: see the 2026 Q3 roadmap. Dates are targets and will be refined.
