vllm - ✅(Solved) Fix [Bug]: Triton MXFP4 MoE device capability check < (11, 0) breaks RDNA3.5 (gfx1151) support [1 pull requests, 3 comments, 3 participants]

kyuz0 · 2026-04-19T16:42:16Z



Issue: vllm-project/vllm#40301 (fetched 2026-04-20 11:59:28)


Error Message

NotImplementedError: No MXFP4 MoE backend supports the deployment configuration.

Root Cause

Because vLLM maps gfx1151 to a device capability of (11, 5), the < (11, 0) check fails for the entire RDNA3/RDNA3.5 family. The backend oracle therefore drops the Triton kernels, finds no other MXFP4 backend for ROCm, and raises the NotImplementedError shown above.
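
The failure follows from Python's element-by-element tuple comparison. A minimal sketch, using plain tuples in place of the value returned by vLLM's platform layer; the function name `in_supported_range` is hypothetical and exists only for illustration:

```python
from typing import Tuple


def in_supported_range(cap: Tuple[int, int]) -> bool:
    # Hypothetical stand-in for the range check quoted in the issue.
    # Tuples compare element by element, so (11, 5) < (11, 0) is False.
    return (9, 0) <= cap < (11, 0)


gfx1151 = (11, 5)  # capability the issue reports for gfx1151

print(in_supported_range(gfx1151))   # False: above the (11, 0) ceiling
print(in_supported_range((9, 0)))    # True: lower bound is inclusive
print(in_supported_range((11, 0)))   # False: upper bound is exclusive
```

Because the upper bound is exclusive, any capability whose major version is 11 is rejected, not only (11, 5).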

PR fix notes

PR #40333: [ROCm] Allow Triton MXFP4 MoE support checks on gfx11xx

Repository: vllm-project/vllm
Author: wangrui6
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40333

Description (problem / solution / changelog)

Add a shared helper for Triton MXFP4 device support checks and use it in the GPT-OSS Triton MXFP4 experts support paths.

This keeps the CUDA < 11.0 guard unchanged, while allowing ROCm gfx11xx devices (for example gfx1151 / RDNA3.5) to pass the support check.

Also add unit coverage for the platform-specific capability ceilings.

Tested with:

.venv/bin/python -m pytest tests/kernels/moe/test_mxfp4_support.py -v

AI assistance was used to prepare this change, and I reviewed the final diff.

Co-authored-by: OpenAI Codex

Purpose

This PR fixes the Triton MXFP4 MoE support check for ROCm gfx11xx devices such as gfx1151 / RDNA3.5. Changes:

- add a shared helper for Triton MXFP4 device support checks
- keep the existing CUDA < 11.0 guard unchanged
- allow ROCm gfx11xx devices through the Triton MXFP4 support path
- add unit coverage for the platform-specific capability ceilings
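
The PR's diff is not reproduced on this page, so the following is only a sketch of what a shared helper with platform-specific ceilings could look like. The function name, the `is_rocm` parameter, and the ROCm ceiling of (12, 0) are assumptions made for illustration; only the (9, 0) floor and the CUDA < (11, 0) guard come from the issue text.

```python
from typing import Tuple

# Lower bound quoted in the issue; assumed here to apply on both platforms.
MIN_CAPABILITY = (9, 0)
# The CUDA ceiling is the existing guard described in the PR.
CUDA_CEILING = (11, 0)
# Assumption: a ROCm ceiling chosen so gfx11xx capabilities such as (11, 5) pass.
ROCM_CEILING = (12, 0)


def triton_mxfp4_capability_ok(cap: Tuple[int, int], is_rocm: bool) -> bool:
    """Hypothetical helper: half-open range check with a per-platform ceiling."""
    ceiling = ROCM_CEILING if is_rocm else CUDA_CEILING
    return MIN_CAPABILITY <= cap < ceiling


print(triton_mxfp4_capability_ok((11, 5), is_rocm=True))    # True
print(triton_mxfp4_capability_ok((11, 5), is_rocm=False))   # False
print(triton_mxfp4_capability_ok((10, 0), is_rocm=False))   # True
```

Keeping the ceiling in one helper means the two call sites named in the issue cannot drift apart, which is presumably why the PR describes the helper as shared.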

Why this is not duplicate work

I checked issue #40301 and searched for open PRs referencing #40301 and related terms before opening this PR. I did not find an open PR covering this specific ROCm gfx11xx support-check fix.

Test Plan

Ran:

.venv/bin/python -m pytest tests/kernels/moe/test_mxfp4_support.py -v

Test Result

Result:

3 passed

AI assistance

This is an AI-assisted contribution. I reviewed the final diff and verified the test command above before submitting.

Changed files

tests/kernels/moe/test_mxfp4_support.py (added, +70/-0)
vllm/model_executor/layers/fused_moe/experts/gpt_oss_triton_kernels_moe.py (modified, +6/-19)
vllm/model_executor/layers/fused_moe/utils.py (modified, +25/-0)

Code Example

def _supports_current_device() -> bool:
    ...
    return (9, 0) <= (cap.major, cap.minor) < (11, 0)

---

triton_kernels_supported = has_triton_kernels() and (
    9,
    0,
) <= current_platform.get_device_capability() < (11, 0)

---

NotImplementedError: No MXFP4 MoE backend supports the deployment configuration.


Your current environment

System Info

OS: Linux (e.g., Fedora 43)
Hardware: AMD Strix Halo APU (gfx1151 / RDNA 3.5)
vLLM version: v0.19.2 (and recent nightlies/main)
Model: openai/gpt-oss-20b (or any gpt_oss_mxfp4 quantized MoE model)

🐛 Describe the bug

I am trying to run vLLM on an AMD Strix Halo (gfx1151) using ROCm. The environment is properly configured to compile Triton kernels. Previously, gpt-oss-20b (which initializes using gpt_oss_mxfp4 quantization) worked perfectly fine and used the Triton MXFP4 MoE backend as expected.

However, a recent update explicitly bounded the device_capability checks for the Triton MoE kernels to < (11, 0).

In vllm/model_executor/layers/fused_moe/experts/gpt_oss_triton_kernels_moe.py:

def _supports_current_device() -> bool:
    ...
    return (9, 0) <= (cap.major, cap.minor) < (11, 0)

In vllm/model_executor/layers/fused_moe/oracle/mxfp4.py:

triton_kernels_supported = has_triton_kernels() and (
    9,
    0,
) <= current_platform.get_device_capability() < (11, 0)

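Because vLLM maps gfx1151 to a device capability of (11, 5), the < (11, 0) check completely fails for the entire RDNA3/RDNA3.5 family. As a result, the backend oracle drops the Triton kernels, cannot find any other fallback MXFP4 backends for ROCm, and crashes with:

NotImplementedError: No MXFP4 MoE backend supports the deployment configuration.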
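
How a rejected capability check turns into this exception can be sketched as a first-match backend selection. This is not vLLM's oracle code: `select_backend`, `triton_supported`, and the candidate list are hypothetical, and the sketch assumes Triton is the only MXFP4 MoE candidate on this ROCm setup, as the issue describes.

```python
from typing import Callable, List, Tuple

Capability = Tuple[int, int]


def triton_supported(cap: Capability) -> bool:
    # Range check quoted in the issue.
    return (9, 0) <= cap < (11, 0)


def select_backend(
    cap: Capability,
    candidates: List[Tuple[str, Callable[[Capability], bool]]],
) -> str:
    """Hypothetical selector: return the first backend whose check passes."""
    for name, is_supported in candidates:
        if is_supported(cap):
            return name
    # No candidate accepted the device, so there is nothing to fall back to.
    raise NotImplementedError(
        "No MXFP4 MoE backend supports the deployment configuration."
    )


# Assumption: Triton is the only MXFP4 MoE candidate available here.
rocm_candidates = [("triton", triton_supported)]

print(select_backend((9, 4), rocm_candidates))  # triton
try:
    select_backend((11, 5), rocm_candidates)
except NotImplementedError as exc:
    print(exc)
```

With a single candidate, one overly strict support check is enough to make the whole selection fail.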

Could this check please be widened to (9, 0) <= cap < (12, 0) to allow RDNA3 architectures? Or was there a specific hardware-level bug on Blackwell/future architectures that necessitated this hard < (11,0) roof?

extent analysis

TL;DR

The NotImplementedError on gfx1151 should be resolved once the device capability check for the Triton MXFP4 MoE kernels admits gfx11xx capabilities such as (11, 5).

Guidance

The current check (9, 0) <= (cap.major, cap.minor) < (11, 0) rejects gfx1151, which vLLM reports as device capability (11, 5).
The issue proposes widening the check to (9, 0) <= cap < (12, 0). The same upper bound appears in both gpt_oss_triton_kernels_moe.py and oracle/mxfp4.py, so both locations need the change.
Verify the change by running vLLM with the modified code and confirming that the NotImplementedError no longer occurs.
If the error persists, check whether any other MXFP4 backend is available for ROCm on the RDNA3/RDNA3.5 family.

Example

def _supports_current_device() -> bool:
    ...
    return (9, 0) <= (cap.major, cap.minor) < (12, 0)

Notes

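The widened bound is not platform-specific, so it also admits any CUDA device that reports a capability from (11, 0) up to, but not including, (12, 0). PR #40333 avoids this by keeping the CUDA < 11.0 guard unchanged and relaxing the check only for ROCm gfx11xx devices.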
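
To see exactly which capabilities the widened bound newly admits, the two ranges can be compared over a handful of sample values. The function names and the sample list below are illustrative assumptions, not vLLM code:

```python
from typing import List, Tuple

Capability = Tuple[int, int]


def old_check(cap: Capability) -> bool:
    # Bound quoted in the issue.
    return (9, 0) <= cap < (11, 0)


def widened_check(cap: Capability) -> bool:
    # Bound proposed in the issue.
    return (9, 0) <= cap < (12, 0)


samples: List[Capability] = [(8, 9), (9, 0), (10, 0), (11, 0), (11, 5), (12, 0)]

# Capabilities accepted only after widening the upper bound.
newly_admitted = [c for c in samples if widened_check(c) and not old_check(c)]
print(newly_admitted)  # [(11, 0), (11, 5)]
```

Neither function looks at the platform, so the newly admitted values apply to CUDA and ROCm alike.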

Recommendation

As a local workaround, change the upper bound of the device capability check to (12, 0) in both locations quoted above; this should resolve the immediate issue on RDNA3 architectures until an upstream fix such as PR #40333 is merged.
