vllm - [Bug][Perf Regression]: AMD MI355X Kimi K2.5/2.6 arch 38% perf regression

GitHub stats
vllm-project/vllm#43153 · Fetched 2026-05-20 03:39:39
Comments: 3 · Participants: 3 · Timeline: 22 · Reactions: 0
Timeline (top): mentioned ×7, subscribed ×7, commented ×3, labeled ×2

Your current environment

vLLM v0.21.0 (ROCm)

🐛 Describe the bug

Human

hi @hongxiayang

+viz @powderluv @chunfangamd @andyluo7

Kimi K2.5 INT4 is regressing a lot on MI355X. Can you take a look? Thanks.

Live: https://inferencex.semianalysis.com/inference?g_model=Kimi-K2.5&g_rundate=2026-05-18&g_runid=26067557908&i_prec=int4&i_gpus=mi355x_vllm&i_dstart=2026-03-27&i_dend=2026-05-17&i_dates=2026-03-27&i_linelabel=1

<img width="1356" height="847" alt="Image" src="https://github.com/user-attachments/assets/ddca9781-9b1f-4266-b3e3-9f602c3583df" />

AI Summary

We're observing a large throughput regression for Kimi K2.5 (INT4) on AMD MI355X between vLLM ROCm v0.18.0 and v0.21.0.

| Date | vLLM ROCm | Token Throughput / GPU @ ~17 tok/s/user | InferenceX run |
|---|---|---|---|
| 2026-03-27 | v0.18.0 | ~1,223 tok/s/gpu (peak ~1,300) | Actions run 23626527425 · config 23669977901 |
| 2026-05-17 | v0.21.0 | ~753 tok/s/gpu | Actions run 25956503845 · config 25984517560 |

Net regression: ~−38% (≈470 tok/s/gpu absolute) at mid-interactivity (~17 tok/s/user).
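The headline figure can be re-derived from the two throughput values in the table. This is a minimal sketch; the function name `regression` is illustrative and not part of any InferenceX or vLLM tooling.

```python
def regression(baseline, current):
    """Return (absolute drop, relative drop) between two throughput readings."""
    absolute = baseline - current
    relative = absolute / baseline
    return absolute, relative


# Values taken from the table above (tok/s/gpu at ~17 tok/s/user).
absolute, relative = regression(1223.0, 753.0)
print(f"{absolute:.0f} tok/s/gpu lost, {relative:.1%} regression")
# prints "470 tok/s/gpu lost, 38.4% regression"
```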

The regression is consistent across the full interactivity sweep: the 2026-05-17 curve sits below the 2026-03-27 baseline at every point and falls off faster as interactivity increases, so the gap is not a constant offset but widens along the sweep. The only known config change between the two runs is the vLLM ROCm image upgrade from v0.18.0 to v0.21.0, which points at one or more commits in that range degrading INT4 performance on gfx950 (MI355X).

Range to bisect: v0.18.0...v0.21.0
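One way to automate a bisect over this range is a predicate for `git bisect run`, which treats exit code 0 as good, 1 as bad, and 125 as "skip this commit". The sketch below only covers the classification step. The threshold (the midpoint of the two reported figures) and the idea that some wrapper supplies a measured tok/s/gpu value are assumptions, not something stated in the issue.

```python
# Sketch of the decision logic for a `git bisect run` predicate.
# The threshold and the source of the measurement are assumptions.

GOOD_TOK_S_GPU = 1223.0  # v0.18.0 figure reported in the issue
BAD_TOK_S_GPU = 753.0    # v0.21.0 figure reported in the issue

# Hypothetical cut-off: halfway between the two reported data points.
THRESHOLD = (GOOD_TOK_S_GPU + BAD_TOK_S_GPU) / 2


def bisect_exit_code(measured_tok_s_gpu):
    """Map a measured throughput to a `git bisect run` exit code.

    0   -> commit is good
    1   -> commit is bad
    125 -> commit cannot be tested (build or benchmark failed), so skip it
    """
    if measured_tok_s_gpu is None:
        return 125
    return 0 if measured_tok_s_gpu >= THRESHOLD else 1
```

A wrapper script would build vLLM at the checked-out commit, run the same 8K/1K benchmark, and call `sys.exit(bisect_exit_code(value))`. The bisect itself would start with `git bisect start v0.21.0 v0.18.0` (bad revision first, then good). How to build and benchmark each intermediate commit on MI355X is left open here.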

Chart

Live: https://inferencex.semianalysis.com/inference?g_model=Kimi-K2.5&g_rundate=2026-05-18&g_runid=26067557908&i_prec=int4&i_gpus=mi355x_vllm&i_dstart=2026-03-27&i_dend=2026-05-17&i_dates=2026-03-27&i_linelabel=1

<img width="1356" height="847" alt="Image" src="https://github.com/user-attachments/assets/e725f05f-9cf5-4866-a1e6-5a43a6b81d0d" />

Configuration

| Field | Value |
|---|---|
| Model | Kimi K2.5 |
| Precision | INT4 |
| Hardware | MI355X |
| Engine | vLLM (ROCm) |
| ISL / OSL | 8K / 1K |
| Source | SemiAnalysis InferenceX™ |

Full SemiAnalysis InferenceX benchmark configs for the two data points:

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
