vllm: [Bug][Perf] MiniMax-M2.5 FP8 on MI325X: ~38% throughput regression between vLLM ROCm v0.18.0 and v0.21.0


Human

hi @hongxiayang

+viz @powderluv @chunfangamd @andyluo7

MiniMax has regressed a lot on MI325X, can you take a look? Thanks.

Live: https://inferencex.semianalysis.com/inference?g_model=MiniMax-M2.5&g_rundate=2026-05-18&i_gpus=mi325x_vllm&i_dstart=2026-03-29&i_dend=2026-05-18&i_prec=fp8

<img width="974" height="559" alt="Image" src="https://github.com/user-attachments/assets/c7adda42-baf2-4eef-9382-b8571e982753" />

AI Summary

We're observing a large throughput regression for MiniMax-M2.5 (FP8) on AMD MI325X between vLLM ROCm v0.18.0 and v0.21.0.

| Date | vLLM ROCm | Token Throughput / GPU @ 35 tok/s/user | InferenceX run |
|---|---|---|---|
| 2026-03-29 | v0.18.0 | ~2,600 tok/s/gpu (peak) | commit d43805f · Actions run 23700855432 |
| 2026-05-18 | v0.21.0 | ~1,600 tok/s/gpu | commit a0f295e · Actions run 26062952448 |

Net regression: ~−38% (≈1,000 tok/s/gpu absolute) from peak to current head.
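
The headline figure can be checked with a few lines of arithmetic. The sketch below uses the two approximate throughput values quoted in the table; the helper name `relative_change` is illustrative and not part of vLLM or InferenceX.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change from baseline to current (negative means regression)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


# Approximate values quoted in the issue (tok/s/gpu at 35 tok/s/user).
peak = 2600.0  # v0.18.0 image, 2026-03-29
head = 1600.0  # v0.21.0 image, 2026-05-18

change = relative_change(peak, head)
print(f"absolute: {head - peak:+.0f} tok/s/gpu, relative: {change:+.1%}")
# prints "absolute: -1000 tok/s/gpu, relative: -38.5%"
```

Both inputs are rounded readings from the chart, so the result is only good to about a percentage point, which matches the "~38%" in the title.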

The drop is not a single cliff at the v0.21.0 update — it's a smooth, monotone decline starting immediately after the v0.18.0 upgrade on 2026-03-29 and continuing through to the v0.21.0 image on 2026-05-18. This suggests one or more commits landed between v0.18.0 (bcf2be9, 2026-03-20) and v0.21.0 (ad7125a, 2026-05-15) that progressively degraded MoE/FP8 perf on gfx942 (MI325X).

Range to bisect: bcf2be9...ad7125a
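
One possible way to automate this range is `git bisect run` with a predicate script. The sketch below is a rough outline under stated assumptions, not the workflow used by the reporters: `./run_bench.sh` is a hypothetical wrapper that rebuilds vLLM at the checked-out commit, runs the MiniMax-M2.5 FP8 benchmark, and prints tok/s/gpu as the last line of stdout; the 2,100 tok/s/gpu cut-off is simply the midpoint of the two reported values. The exit codes follow the `git bisect run` convention: 0 marks the commit good, 1 marks it bad, and 125 skips a commit that cannot be tested.

```python
#!/usr/bin/env python3
"""Predicate for `git bisect run`: classify the checked-out commit by throughput."""
import subprocess
import sys
from typing import Optional

GOOD_TOKS = 2600.0  # approx. tok/s/gpu reported for v0.18.0
BAD_TOKS = 1600.0   # approx. tok/s/gpu reported for v0.21.0
THRESHOLD = (GOOD_TOKS + BAD_TOKS) / 2  # assumed cut-off: midpoint of the two

# Hypothetical benchmark wrapper; replace with your own build-and-benchmark command.
BENCH_CMD = ["./run_bench.sh"]

EXIT_GOOD, EXIT_BAD, EXIT_SKIP = 0, 1, 125  # exit codes understood by git bisect run


def classify(tok_per_s_gpu: float, threshold: float = THRESHOLD) -> int:
    """Map a measured throughput to a bisect verdict."""
    return EXIT_GOOD if tok_per_s_gpu >= threshold else EXIT_BAD


def measure() -> Optional[float]:
    """Run the benchmark and parse the last stdout line; None if it cannot be tested."""
    try:
        result = subprocess.run(
            BENCH_CMD, capture_output=True, text=True, check=True, timeout=3600
        )
        return float(result.stdout.strip().splitlines()[-1])
    except (OSError, subprocess.SubprocessError, ValueError, IndexError):
        return None


def main() -> int:
    value = measure()
    return EXIT_SKIP if value is None else classify(value)


# The --run flag is an illustrative guard so that importing or pasting this
# file never launches a benchmark by accident.
if __name__ == "__main__" and "--run" in sys.argv:
    sys.exit(main())
```

Saved as, say, `bisect_perf.py`, it would be driven by `git bisect start ad7125a bcf2be9` followed by `git bisect run python3 bisect_perf.py --run`. Because the reported decline is gradual rather than a single cliff, a single threshold will only find the first commit that crosses the midpoint; repeating the bisect with different thresholds over sub-ranges may be needed to find every contributing commit.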


Full SemiAnalysis InferenceX benchmark configs for the two data points:
