vllm - 💡(How to fix) Fix [Bug][Perf Regression]: AMD MI355X Kimi K2.5/2.6 arch 38% perf regression [3 comments, 3 participants]

vllm · 2026-05-19 21:59:25

vllm-project/vllm#43153 • Fetched 2026-05-20 03:39:39

Your current environment

0.21

🐛 Describe the bug

Human

hi @hongxiayang

+viz @powderluv @chunfangamd @andyluo7

kimi k2.5 int4 is regressing on mi355 a lot, can u take a look? thanks

Live: https://inferencex.semianalysis.com/inference?g_model=Kimi-K2.5&g_rundate=2026-05-18&g_runid=26067557908&i_prec=int4&i_gpus=mi355x_vllm&i_dstart=2026-03-27&i_dend=2026-05-17&i_dates=2026-03-27&i_linelabel=1

AI Summary

We're observing a large throughput regression for Kimi K2.5 (INT4) on AMD MI355X between vLLM ROCm v0.18.0 and v0.21.0.

Date	vLLM ROCm	Token Throughput / GPU @ ~17 tok/s/user	InferenceX run
2026-03-27	v0.18.0	~1,223 tok/s/gpu (peak ~1,300)	Actions run 23626527425 · config 23669977901
2026-05-17	v0.21.0	~753 tok/s/gpu	Actions run 25956503845 · config 25984517560

Net regression: ~−38% (≈470 tok/s/gpu absolute) at mid-interactivity (~17 tok/s/user).
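The percentage above can be reproduced from the two table values. The snippet below is a minimal arithmetic check; the variable and function names are illustrative, and the inputs are the approximate figures quoted in the table, not fresh measurements.

```python
# Approximate throughput figures copied from the table above (tok/s/gpu).
baseline_tps = 1223.0   # vLLM ROCm v0.18.0, 2026-03-27
regressed_tps = 753.0   # vLLM ROCm v0.21.0, 2026-05-17


def relative_change(before: float, after: float) -> float:
    """Signed fractional change going from `before` to `after`."""
    return (after - before) / before


absolute_drop = baseline_tps - regressed_tps
pct_change = 100.0 * relative_change(baseline_tps, regressed_tps)

print(f"absolute drop:   {absolute_drop:.0f} tok/s/gpu")  # 470
print(f"relative change: {pct_change:.1f}%")              # -38.4%
```

Because both inputs are approximate readings at ~17 tok/s/user, the result is best quoted as roughly −38% rather than to a decimal place.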

The regression is consistent across the full interactivity sweep: the 2026-05-17 curve sits below the 2026-03-27 baseline at every point and falls off faster as interactivity increases, which suggests the gap is not a constant offset but widens toward the high-interactivity end of the sweep. The only known config change between the two runs is the vLLM ROCm image upgrade from v0.18.0 to v0.21.0, which points to one or more commits in that range degrading INT4 performance on gfx950 (MI355X).

Range to bisect: v0.18.0...v0.21.0
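A bisect over that range could be organized roughly as follows. This is only a sketch under stated assumptions: the commit list, the `measure` callable, and the pass/fail threshold are all hypothetical placeholders, and the search assumes a single point where throughput drops below the threshold. If several commits each contribute part of the loss, the search would need to be repeated on the sub-ranges.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary-search for the first bad commit.

    Assumes commits are in chronological order, commits[0] is known good,
    commits[-1] is known bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# Hypothetical threshold: midpoint between the two throughputs quoted in the
# issue (tok/s/gpu). A real bisect would pick this from repeated runs.
THRESHOLD = (1223.0 + 753.0) / 2


def make_is_bad(measure: Callable[[str], float]) -> Callable[[str], bool]:
    """Wrap a benchmark function into a good/bad predicate."""
    return lambda commit: measure(commit) < THRESHOLD


if __name__ == "__main__":
    # Synthetic stand-in for a real benchmark: placeholder commit names, with
    # the drop injected at "c6" purely to exercise the search. Not real data.
    commits = [f"c{i}" for i in range(10)]
    fake_measure = lambda c: 1223.0 if int(c[1:]) < 6 else 753.0
    print(first_bad(commits, make_is_bad(fake_measure)))  # c6
```

In practice `measure` would build or pull the image for a given commit and rerun the same 8K / 1K INT4 benchmark, so each probe is expensive; the binary search keeps the number of probes logarithmic in the number of commits between the two releases.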

Chart

Configuration

Field	Value
Model	Kimi K2.5
Precision	INT4
Hardware	MI355X
Engine	vLLM (ROCm)
ISL / OSL	8K / 1K
Source	SemiAnalysis InferenceX™

Full SemiAnalysis InferenceX benchmark configs for the two data points:

Baseline run (v0.18.0): https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23626527425/attempts/1 · config workflow https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23669977901
Regressed run (v0.21.0): https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25956503845/attempts/1 · config workflow https://github.com/SemiAnalysisAI/InferenceX/actions/runs/25984517560

Before submitting a new issue...

Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
