vllm - ✅(Solved) Fix [Bug]: SM120 / RTX 5090 source build still registers unsupported FlashMLA / FA targets and uses non-SM120 Marlin defaults. [1 pull requests, 1 participants]

CristyNel · 2026-03-12T09:15:33Z




GitHub stats

vllm-project/vllm#36865•Fetched 2026-04-08 00:34:07


Author

CristyNel

Participants

CristyNel

Timeline (top)

labeled ×1

Fix Action

Fixed

Fixed by PR: [ROCm] Fix build issues with cub:: namespace and missing headers (https://github.com/vllm-project/vllm/pull/36873)

PR fix notes

PR #36873: [ROCm] Fix build issues with cub:: namespace and missing headers

Repository: vllm-project/vllm
Author: xueliangyang-oeuler
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/36873

Description (problem / solution / changelog)

This commit fixes build issues on ROCm platform by replacing manual includes of cub/cub.cuh with cub_helpers.h. cub_helpers.h correctly handles the hipcub namespace aliasing and includes necessary headers for HIP.

Affected files:

csrc/sampler.cu: Replaced manual #ifndef USE_ROCM block.
csrc/moe/moe_align_sum_kernels.cu: Replaced #include <cub/cub.cuh>.
csrc/moe/moeTopKFuncs.cuh: Replaced #include <cub/cub.cuh>.
csrc/moe/permute_unpermute_kernels/moe_permute_unpermute_kernel.h: Replaced cub includes.

Fixes #36865

Purpose

Test Plan

Test Result

<details> <summary> Essential Elements of an Effective PR Description Checklist </summary>

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

</details>

Changed files

csrc/moe/moeTopKFuncs.cuh (modified, +1/-1)
csrc/moe/moe_align_sum_kernels.cu (modified, +1/-1)
csrc/moe/permute_unpermute_kernels/moe_permute_unpermute_kernel.h (modified, +1/-3)
csrc/sampler.cu (modified, +1/-5)


System Info

GPU: NVIDIA GeForce RTX 5090 (Blackwell SM120 / compute capability 12.0)
Driver 595.45.04 - CUDA 13.2
OS: Ubuntu 24.04.x LTS x86_64
Python: 3.11.14 (venv) - Python 3.12.12 (system)
vLLM: 0.17.2.dev0+g95c0f928c.d20260312.cu132
PyTorch: 2.10.0+cu130
Flashinfer: 0.6.6+sm120

✔️ SM120 (OK)

CUTLASS SM120 kernels
NVFP4 SM120
MLA SM120
moe_data SM120
general SM120 codegen (compute_120f, sm_120f)

❌ Not SM120

Marlin → need SM120 forcing / 8.0+PTX
Marlin‑MOE → need SM120 forcing / 8.0+PTX
FlashMLA → should be skipped/disabled on SM120. FlashMLA kernels only work on Hopper and require CUDA 12.3+, so is the correct action on SM120 to guard it out and disable its targets?
FA2_ARCHS → pip/setuptools still asks Ninja for _vllm_fa2_C / _vllm_fa3_C / _vllm_fa4_cutedsl_C
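
To make the "8.0+PTX" entries in the list above concrete, here is a minimal sketch of how such an arch entry relates to a device's compute capability. It is a simplified model, not vLLM's build logic: `parse_arch` and `runs_on` are hypothetical helper names, same-major binary compatibility is ignored, and the only rules encoded are that an exact match runs natively, that plain PTX can be JIT-compiled for a newer device, and that arch-specific entries (a trailing letter such as `9.0a`) are not forward-compatible.

```python
from typing import Tuple


def parse_arch(entry: str) -> Tuple[Tuple[int, int], str, bool]:
    """Split an entry such as '8.0+PTX' into ((major, minor), suffix, has_ptx)."""
    has_ptx = entry.upper().endswith("+PTX")
    version = entry[:-4] if has_ptx else entry
    # A trailing letter (e.g. '9.0a', '12.0f') marks an arch- or family-specific target.
    suffix = version[-1] if version[-1].isalpha() else ""
    if suffix:
        version = version[:-1]
    major, minor = version.split(".")
    return (int(major), int(minor)), suffix, has_ptx


def runs_on(entry: str, device: Tuple[int, int]) -> str:
    """Classify how code built for `entry` relates to `device` (simplified model)."""
    cc, suffix, has_ptx = parse_arch(entry)
    if cc == device:
        return "native"
    # Plain PTX for an older architecture can be JIT-compiled on a newer device.
    if has_ptx and not suffix and cc < device:
        return "ptx-jit"
    return "unsupported"


SM120 = (12, 0)
for entry in ("12.0", "8.0+PTX", "9.0", "9.0a"):
    print(entry, "->", runs_on(entry, SM120))
```

Under this model, a Marlin build with 8.0+PTX would still load on an RTX 5090 through JIT compilation, while a Hopper-only target would not.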

extent analysis

Fix Plan

To resolve the compatibility issues with SM120, we need to make the following changes:

Force SM120 support for Marlin and Marlin-MOE by using 8.0+PTX.
Disable FlashMLA on SM120 as its kernels only work on Hopper and require CUDA 12.3+.
Update FA2_ARCHS to exclude SM120 incompatible targets.
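
The three changes above amount to deciding, per build target, whether it should be requested at all on a given device. A minimal sketch of that decision follows. The `_vllm_fa*` names are the ones quoted in the issue; `flashmla` and `core_ext` are placeholder labels, and `plan_targets` is a hypothetical helper rather than anything in vLLM's build system. The skip list simply mirrors the issue's report and is an assumption, not a verified compatibility table.

```python
from typing import List, Tuple

# Targets the issue reports as not applicable on SM120 (assumption taken from
# the issue text; 'flashmla' is a placeholder label, not a real target name).
SKIP_ON_SM120 = {"flashmla", "_vllm_fa2_C", "_vllm_fa3_C", "_vllm_fa4_cutedsl_C"}


def plan_targets(
    requested: List[str], capability: Tuple[int, int]
) -> Tuple[List[str], List[str]]:
    """Return (targets to build, targets skipped) for the given compute capability."""
    if capability != (12, 0):
        # Not SM120: leave the requested list untouched.
        return list(requested), []
    build = [t for t in requested if t not in SKIP_ON_SM120]
    skipped = [t for t in requested if t in SKIP_ON_SM120]
    return build, skipped


build, skipped = plan_targets(["core_ext", "flashmla", "_vllm_fa3_C"], (12, 0))
print("build:", build)
print("skipped:", skipped)
```

Keeping the policy in a single data table makes it easy to review and to adjust if one of the targets later gains SM120 support.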

Step-by-Step Solution

1. **Update Marlin and Marlin-MOE**:
   ```python
   # In setup.py or build script
   import os

   # Force SM120 support: select the target architecture through the
   # TORCH_CUDA_ARCH_LIST environment variable, which PyTorch's extension
   # build tooling reads. It must be set before the build starts.
   os.environ["TORCH_CUDA_ARCH_LIST"] = "12.0"

   # Build Marlin and Marlin-MOE with 8.0+PTX
   # (where this arch list is configured is not shown here)
   ```

2. **Disable FlashMLA on SM120**:
   ```python
   # In FlashMLA code
   import torch

   if torch.cuda.get_device_capability() == (12, 0):  # SM120
       # Disable FlashMLA
       print("FlashMLA disabled on SM120")
   else:
       # Enable FlashMLA
       print("FlashMLA enabled")
   ```

3. **Update FA2_ARCHS**:
   ```python
   # In setup.py or build script
   import sys

   import torch

   # Exclude SM120 incompatible targets
   if sys.platform == 'linux' and torch.cuda.get_device_capability() == (12, 0):  # SM120
       FA2_ARCHS = ['sm_80', 'sm_90']
   else:
       FA2_ARCHS = ['sm_80', 'sm_90', 'sm_120']
   ```

### Verification
After applying these changes, verify that:
* Marlin and Marlin-MOE are built with SM120 support.
* FlashMLA is disabled on SM120.
* FA2_ARCHS excludes SM120 incompatible targets.
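
One way to check the last two points is to scan the build's target listing for the extension names the issue mentions. The sketch below assumes the listing has one target per line in a `name: rule` shape, as `ninja -t targets all` prints; `find_unwanted_targets` is a hypothetical helper and the sample listing is made up for illustration.

```python
from typing import List, Sequence

# Extension targets named in the issue as still being requested on SM120.
UNWANTED = ("_vllm_fa2_C", "_vllm_fa3_C", "_vllm_fa4_cutedsl_C")


def find_unwanted_targets(listing: str, unwanted: Sequence[str] = UNWANTED) -> List[str]:
    """Return the unwanted names that appear in any target of the listing."""
    found: List[str] = []
    for line in listing.splitlines():
        # Keep only the target name, dropping the ': rule' part if present.
        target = line.split(":", 1)[0].strip()
        for name in unwanted:
            if name in target and name not in found:
                found.append(name)
    return found


# Made-up sample listing, for illustration only.
sample = "core_ext: phony\n_vllm_fa3_C: phony\nother_target: phony\n"
leftover = find_unwanted_targets(sample)
print("unwanted targets still present:", leftover)
```

An empty result on a real listing would suggest the guards took effect; a non-empty one points at the targets that are still registered.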

### Extra Tips
* Ensure that the CUDA version is compatible with the SM120 architecture.
* Test the changes thoroughly to ensure that they do not introduce any regressions.
