vllm - ✅(Solved) Fix [Bug]: Triton MXFP4 MoE device capability check < (11, 0) breaks RDNA3.5 (gfx1151) support [1 pull requests, 3 comments, 3 participants]

GitHub stats
vllm-project/vllm#40301 · Fetched 2026-04-20 11:59:28
Comments: 3 · Participants: 3 · Timeline: 16 · Reactions: 0
Timeline (top): mentioned ×4, subscribed ×4, commented ×3, labeled ×2

Error Message

NotImplementedError: No MXFP4 MoE backend supports the deployment configuration.

Root Cause

Because vLLM maps gfx1151 to a device capability of (11, 5), the < (11, 0) upper bound rejects the entire RDNA3/RDNA3.5 (gfx11xx) family. As a result, the backend oracle drops the Triton kernels, finds no other fallback MXFP4 backend for ROCm, and crashes with the NotImplementedError shown above.
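The rejection follows from Python's lexicographic tuple comparison. The snippet below is a minimal sketch that assumes the capability behaves like a plain (major, minor) tuple; vLLM's actual capability type may differ.

```python
# Assumption: the device capability compares like a plain (major, minor) tuple.
gfx1151_cap = (11, 5)  # value the issue reports for gfx1151

# The chained comparison requires both bounds to hold.
lower_ok = (9, 0) <= gfx1151_cap   # 9 < 11, so the lower bound holds
upper_ok = gfx1151_cap < (11, 0)   # majors tie at 11, then 5 < 0 is false
supported = (9, 0) <= gfx1151_cap < (11, 0)

print(lower_ok, upper_ok, supported)  # True False False
```

Any (11, x) capability fails the same way, because (11, 0) itself is already excluded by the strict upper bound.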

PR fix notes

PR #40333: [ROCm] Allow Triton MXFP4 MoE support checks on gfx11xx

Description (problem / solution / changelog)

Add a shared helper for Triton MXFP4 device support checks and use it in the GPT-OSS Triton MXFP4 experts support paths.

This keeps the CUDA < 11.0 guard unchanged, while allowing ROCm gfx11xx devices (for example gfx1151 / RDNA3.5) to pass the support check.

Also add unit coverage for the platform-specific capability ceilings.

Tested with:

  • .venv/bin/python -m pytest tests/kernels/moe/test_mxfp4_support.py -v

AI assistance was used to prepare this change, and I reviewed the final diff.

Co-authored-by: OpenAI Codex

Purpose

This PR fixes the Triton MXFP4 MoE support check for ROCm gfx11xx devices such as gfx1151 / RDNA3.5. Changes:

  • add a shared helper for Triton MXFP4 device support checks
  • keep the existing CUDA < 11.0 guard unchanged
  • allow ROCm gfx11xx devices through the Triton MXFP4 support path
  • add unit coverage for the platform-specific capability ceilings
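One way to picture a shared helper with platform-specific ceilings is sketched below. The function name, the `is_rocm` parameter, and the ROCm ceiling of (12, 0) are assumptions for illustration; the PR's actual helper is not reproduced on this page and may be structured differently.

```python
from typing import Tuple

Capability = Tuple[int, int]

# Hypothetical values for illustration only.
FLOOR: Capability = (9, 0)
CUDA_CEILING: Capability = (11, 0)  # existing CUDA guard, kept unchanged
ROCM_CEILING: Capability = (12, 0)  # assumed: admits gfx11xx, i.e. (11, x)


def triton_mxfp4_device_supported(cap: Capability, is_rocm: bool) -> bool:
    """Return True if `cap` falls inside the platform's supported range."""
    ceiling = ROCM_CEILING if is_rocm else CUDA_CEILING
    return FLOOR <= cap < ceiling


print(triton_mxfp4_device_supported((11, 5), is_rocm=True))   # True
print(triton_mxfp4_device_supported((11, 5), is_rocm=False))  # False
print(triton_mxfp4_device_supported((9, 0), is_rocm=False))   # True
```

Keeping the ceiling per platform means the ROCm change cannot alter which CUDA devices pass the check.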

Why this is not duplicate work

I checked issue #40301 and searched for open PRs referencing #40301 and related terms before opening this PR. I did not find an open PR covering this specific ROCm gfx11xx support-check fix.

Test Plan

Ran:

  • .venv/bin/python -m pytest tests/kernels/moe/test_mxfp4_support.py -v

Test Result

Result:

  • 3 passed

AI assistance

This is an AI-assisted contribution. I reviewed the final diff and verified the test command above before submitting.



Changed files

  • tests/kernels/moe/test_mxfp4_support.py (added, +70/-0)
  • vllm/model_executor/layers/fused_moe/experts/gpt_oss_triton_kernels_moe.py (modified, +6/-19)
  • vllm/model_executor/layers/fused_moe/utils.py (modified, +25/-0)

Code Example

def _supports_current_device() -> bool:
    ...
    return (9, 0) <= (cap.major, cap.minor) < (11, 0)

---

triton_kernels_supported = has_triton_kernels() and (
    (9, 0) <= current_platform.get_device_capability() < (11, 0)
)

---

NotImplementedError: No MXFP4 MoE backend supports the deployment configuration.

Your current environment

System Info

  • OS: Linux (e.g., Fedora 43)
  • Hardware: AMD Strix Halo APU (gfx1151 / RDNA 3.5)
  • vLLM version: v0.19.2 (and recent nightlies/main)
  • Model: openai/gpt-oss-20b (or any gpt_oss_mxfp4 quantized MoE model)

🐛 Describe the bug

I am trying to run vLLM on an AMD Strix Halo (gfx1151) using ROCm. The environment is properly configured to compile Triton kernels. Previously, gpt-oss-20b (which initializes using gpt_oss_mxfp4 quantization) worked perfectly fine and used the Triton MXFP4 MoE backend as expected.

However, a recent update explicitly bounded the device_capability checks for the Triton MoE kernels to < (11, 0).

  • In vllm/model_executor/layers/fused_moe/experts/gpt_oss_triton_kernels_moe.py:
    def _supports_current_device() -> bool:
        ...
        return (9, 0) <= (cap.major, cap.minor) < (11, 0)
  • In vllm/model_executor/layers/fused_moe/oracle/mxfp4.py:
    triton_kernels_supported = has_triton_kernels() and (
        9,
        0,
    ) <= current_platform.get_device_capability() < (11, 0)

Because vLLM maps gfx1151 to a device capability of (11, 5), the < (11, 0) check completely fails for the entire RDNA3/RDNA3.5 family. As a result, the backend oracle drops the Triton kernels, cannot find any other fallback MXFP4 backends for ROCm, and crashes with:

NotImplementedError: No MXFP4 MoE backend supports the deployment configuration.
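The fall-through described above can be modeled with a simplified backend selector. This is a hypothetical sketch: `select_mxfp4_backend` and the `backends` list are illustrative names and do not mirror vLLM's real oracle code.

```python
from typing import Callable, List, Tuple

Capability = Tuple[int, int]
# Each backend is a (name, support predicate) pair. Hypothetical structure.
Backend = Tuple[str, Callable[[Capability], bool]]


def select_mxfp4_backend(cap: Capability, backends: List[Backend]) -> str:
    """Return the first backend whose predicate accepts `cap`."""
    for name, supports in backends:
        if supports(cap):
            return name
    # No candidate accepted the device, so selection fails outright.
    raise NotImplementedError(
        "No MXFP4 MoE backend supports the deployment configuration."
    )


# Only the Triton backend is registered, guarded by the bounded check.
backends: List[Backend] = [
    ("triton", lambda cap: (9, 0) <= cap < (11, 0)),
]

try:
    select_mxfp4_backend((11, 5), backends)
except NotImplementedError as exc:
    message = str(exc)
    print(message)
```

With a single candidate backend, one overly strict predicate is enough to turn a supported device into a hard failure.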

Could this check please be widened to (9, 0) <= cap < (12, 0) to allow RDNA3 architectures? Or was there a specific hardware-level bug on Blackwell/future architectures that necessitated this hard < (11,0) roof?


extent analysis

TL;DR

The issue can be resolved by updating the device capability check in the Triton MoE kernels to support RDNA3 architectures.

Guidance

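  • The current check (9, 0) <= (cap.major, cap.minor) < (11, 0) fails for the RDNA3/RDNA3.5 family: gfx1151, for example, is mapped to a device capability of (11, 5), which lies above the (11, 0) upper bound.
  • To fix this, the check can be widened to (9, 0) <= cap < (12, 0) to allow RDNA3 architectures. PR #40333 takes a narrower route: it keeps the CUDA < 11.0 guard unchanged and lets only ROCm gfx11xx devices through.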
  • Verify that the updated check resolves the NotImplementedError issue by running vLLM with the modified code.
  • If the issue persists, investigate other potential fallback MXFP4 backends for ROCm that may be compatible with the RDNA3/RDNA3.5 family.

Example

def _supports_current_device() -> bool:
    ...
    return (9, 0) <= (cap.major, cap.minor) < (12, 0)
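To see exactly which capabilities the widened bound changes, the two predicates can be compared over a handful of sample values. This is a minimal sketch; the sample list is arbitrary and the capabilities are assumed to be plain (major, minor) tuples.

```python
def old_check(cap):
    # Bound reported in the issue.
    return (9, 0) <= cap < (11, 0)


def widened_check(cap):
    # Bound proposed in the issue.
    return (9, 0) <= cap < (12, 0)


# Arbitrary sample capabilities around both boundaries.
samples = [(8, 9), (9, 0), (10, 0), (11, 0), (11, 5), (12, 0)]

# Capabilities whose result differs between the two checks.
changed = [cap for cap in samples if old_check(cap) != widened_check(cap)]
print(changed)  # [(11, 0), (11, 5)]
```

Only the 11.x range flips from rejected to accepted. The comparison is platform-agnostic, so it says nothing about whether a given 11.x device is ROCm or CUDA.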

Notes

Widening the bound for every platform also admits any non-ROCm device that reports a capability in the 11.x range, so monitor vLLM's behavior with the modified code. PR #40333 avoids this by keeping the CUDA < 11.0 guard unchanged.

Recommendation

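As a local workaround, update the device capability check to (9, 0) <= cap < (12, 0) in both locations listed in the bug report; this should resolve the immediate issue on RDNA3 architectures. PR #40333 addresses the same problem upstream with a ROCm-specific support check.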
