vllm - 💡(How to fix) Fix [Bug][Perf] MiniMax-M2.5 FP8 on MI325X — ~38% throughput regression between vLLM ROCm v0.18.0 and v0.21.0

vllm · 2026-05-19 00:56:47

Human

hi @hongxiayang

+viz @powderluv @chunfangamd @andyluo7

MiniMax has regressed a lot on MI325X; can you take a look? Thanks.

Live: https://inferencex.semianalysis.com/inference?g_model=MiniMax-M2.5&g_rundate=2026-05-18&i_gpus=mi325x_vllm&i_dstart=2026-03-29&i_dend=2026-05-18&i_prec=fp8

AI Summary

We're observing a large throughput regression for MiniMax-M2.5 (FP8) on AMD MI325X between vLLM ROCm v0.18.0 and v0.21.0.

| Date | vLLM ROCm | Token throughput per GPU @ 35 tok/s/user | InferenceX run |
| --- | --- | --- | --- |
| 2026-03-29 | v0.18.0 | ~2,600 tok/s/gpu (peak) | commit `d43805f` · Actions run 23700855432 |
| 2026-05-18 | v0.21.0 | ~1,600 tok/s/gpu | commit `a0f295e` · Actions run 26062952448 |

Net regression: ~−38% (≈1,000 tok/s/gpu absolute) from peak to current head.
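
For anyone re-checking the headline figure, the arithmetic can be reproduced with a minimal sketch like the one below. It uses only the two approximate per-GPU throughput values from the table; the function and variable names are illustrative and not part of any vLLM or InferenceX tooling.

```python
def relative_change(baseline: float, current: float) -> float:
    """Fractional change from baseline to current (negative means regression)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline


# Approximate values read from the table above (tok/s/gpu at 35 tok/s/user).
baseline_tps = 2600.0  # vLLM ROCm v0.18.0, 2026-03-29 (peak)
current_tps = 1600.0   # vLLM ROCm v0.21.0, 2026-05-18

absolute_drop = baseline_tps - current_tps
change = relative_change(baseline_tps, current_tps)

print(f"absolute drop: {absolute_drop:.0f} tok/s/gpu")
print(f"relative change: {change:.1%}")
```

Because both inputs are rounded readings from the chart, the result should be treated as approximate, in line with the "~" in the summary.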

The drop is not a single cliff at the v0.21.0 update — it's a smooth, monotone decline starting immediately after the v0.18.0 upgrade on 2026-03-29 and continuing through to the v0.21.0 image on 2026-05-18. This suggests one or more commits landed between v0.18.0 (bcf2be9, 2026-03-20) and v0.21.0 (ad7125a, 2026-05-15) that progressively degraded MoE/FP8 perf on gfx942 (MI325X).

Range to bisect: bcf2be9...ad7125a
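
Since the decline is described as gradual rather than a single cliff, a plain binary bisect may not land on one culprit commit. One possible approach, sketched here under assumptions, is to benchmark a handful of evenly spaced commits across the range and look at where throughput steps down. The helper names (`list_commits`, `pick_checkpoints`) and the `repo_dir` argument are hypothetical; the sketch also uses git's two-dot range `good..bad` to list commits reachable from the bad end but not the good end, which is an assumption about how the `bcf2be9...ad7125a` range would be walked locally.

```python
import subprocess


def list_commits(repo_dir: str, good: str, bad: str) -> list[str]:
    """Return commits reachable from `bad` but not from `good`, oldest first."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--reverse", f"{good}..{bad}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.split()


def pick_checkpoints(commits: list[str], k: int) -> list[str]:
    """Pick up to k evenly spaced commits, always keeping the first and last."""
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(commits)
    if n <= k:
        return list(commits)
    # Evenly spaced indices over [0, n - 1]; the set removes any duplicates.
    indices = sorted({round(i * (n - 1) / (k - 1)) for i in range(k)})
    return [commits[i] for i in indices]
```

A hypothetical usage would be `pick_checkpoints(list_commits("/path/to/vllm", "bcf2be9", "ad7125a"), 8)`, followed by running the same benchmark config at each returned commit. Each interval that shows a drop can then be subdivided again with the same helper.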

Chart

Live: https://inferencex.semianalysis.com/inference?g_model=MiniMax-M2.5&g_rundate=2026-05-18&i_gpus=mi325x_vllm&i_dstart=2026-03-29&i_dend=2026-05-18&i_prec=fp8

Full SemiAnalysis InferenceX benchmark configs for the two data points:

Baseline run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/23700855432 (commit d43805f)
Regressed run: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/26062952448 (commit a0f295e)
