vllm - ✅ (Solved) Fix [Bug]: SM120 / RTX 5090 source build still registers unsupported FlashMLA / FA targets and uses non-SM120 Marlin defaults [1 pull request, 1 participant]

GitHub stats

vllm-project/vllm#36865 (fetched 2026-04-08 00:34:07)

  • Comments: 0
  • Participants: 1
  • Timeline events: 1
  • Reactions: 2
  • Timeline (top): labeled ×1

Fix Action: Fixed

PR fix notes

PR #36873: [ROCm] Fix build issues with cub:: namespace and missing headers

Description (problem / solution / changelog)

This commit fixes build issues on the ROCm platform by replacing manual includes of cub/cub.cuh with cub_helpers.h, which correctly handles the hipcub namespace aliasing and includes the headers HIP needs.

Affected files:

  • csrc/sampler.cu: Replaced manual #ifndef USE_ROCM block.
  • csrc/moe/moe_align_sum_kernels.cu: Replaced #include <cub/cub.cuh>.
  • csrc/moe/moeTopKFuncs.cuh: Replaced #include <cub/cub.cuh>.
  • csrc/moe/permute_unpermute_kernels/moe_permute_unpermute_kernel.h: Replaced cub includes.

Fixes #36865


Changed files

  • csrc/moe/moeTopKFuncs.cuh (modified, +1/-1)
  • csrc/moe/moe_align_sum_kernels.cu (modified, +1/-1)
  • csrc/moe/permute_unpermute_kernels/moe_permute_unpermute_kernel.h (modified, +1/-3)
  • csrc/sampler.cu (modified, +1/-5)

System Info

  • GPU: NVIDIA GeForce RTX 5090 (Blackwell SM120 / compute capability 12.0)
  • Driver: 595.45.04, CUDA 13.2
  • OS: Ubuntu 24.04.x LTS x86_64
  • Python: 3.11.14 (venv), 3.12.12 (system)
  • vLLM: 0.17.2.dev0+g95c0f928c.d20260312.cu132
  • PyTorch: 2.10.0+cu130
  • Flashinfer: 0.6.6+sm120
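The fields above can be gathered with a short script. This is a minimal sketch using only `platform` and public PyTorch calls (`torch.cuda.get_device_capability`, `torch.cuda.get_arch_list`). The `collect_env` helper name is hypothetical, and vLLM's own bug template asks for the output of its more complete `collect_env.py`. On an RTX 5090 the compute capability should read 12.0, and `torch_arch_list` shows whether the installed PyTorch wheel itself contains `sm_120` kernels.

```python
import platform


def collect_env() -> dict[str, str]:
    """Gather the System Info fields listed above (hypothetical helper)."""
    info = {"os": platform.platform(), "python": platform.python_version()}
    try:
        import torch  # optional: the report still works without PyTorch
    except ImportError:
        info["torch"] = "not installed"
        return info

    info["torch"] = str(torch.__version__)
    info["torch_cuda"] = str(torch.version.cuda)  # CUDA version PyTorch was built with
    if torch.cuda.is_available():
        major, minor = torch.cuda.get_device_capability()
        info["gpu"] = torch.cuda.get_device_name()
        info["compute_capability"] = f"{major}.{minor}"  # 12.0 on an RTX 5090
        info["torch_arch_list"] = " ".join(torch.cuda.get_arch_list())
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```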

✔️ SM120 (OK)

  • CUTLASS SM120 kernels
  • NVFP4 SM120
  • MLA SM120
  • moe_data SM120
  • general SM120 codegen (compute_120f, sm_120f)

❌ Not SM120

  • Marlin → needs to be built for SM120, or fall back to 8.0+PTX
  • Marlin-MOE → needs to be built for SM120, or fall back to 8.0+PTX
  • FlashMLA → should be skipped on SM120: its kernels only work on Hopper and require CUDA 12.3+, so the correct action appears to be guarding it out and disabling its targets
  • FA2_ARCHS → pip/setuptools still asks Ninja to build _vllm_fa2_C / _vllm_fa3_C / _vllm_fa4_cutedsl_C
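The 8.0+PTX fallback works because of CUDA's compatibility rules: a cubin built for compute capability X.y runs only on devices X.z with z ≥ y, while PTX can be JIT-compiled by the driver for any device of equal or higher compute capability. The sketch below applies those two rules to a `TORCH_CUDA_ARCH_LIST`-style string. `parse_arch_list` and `runs_on` are hypothetical helpers; they accept numeric entries only and deliberately ignore arch-specific variants such as `sm_90a` or `sm_120f`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arch:
    major: int
    minor: int
    ptx: bool  # "+PTX": compute_XY PTX is embedded alongside the sm_XY cubin


def parse_arch_list(spec: str) -> list[Arch]:
    """Parse numeric entries such as "8.0+PTX;12.0" (';' or spaces as separators)."""
    archs = []
    for token in spec.replace(";", " ").split():
        ptx = token.endswith("+PTX")
        major, minor = token.removesuffix("+PTX").split(".")
        archs.append(Arch(int(major), int(minor), ptx))
    return archs


def runs_on(archs: list[Arch], device: tuple[int, int]) -> bool:
    """True if at least one target in the build can execute on `device`."""
    for arch in archs:
        # cubin (SASS): same major version, device minor >= built minor
        if arch.major == device[0] and arch.minor <= device[1]:
            return True
        # PTX: JIT-compiled for any device at or above the PTX compute capability
        if arch.ptx and (arch.major, arch.minor) <= device:
            return True
    return False


sm120 = (12, 0)
print(runs_on(parse_arch_list("8.0;9.0"), sm120))       # False: no SM120 cubin, no PTX
print(runs_on(parse_arch_list("8.0+PTX"), sm120))       # True: compute_80 PTX is JIT-compiled
print(runs_on(parse_arch_list("8.0;9.0;12.0"), sm120))  # True: native SM120 cubin
```

The PTX path costs a one-time JIT compile at load and may miss SM120-specific instructions, which is why a native 12.0 entry is usually preferable when the kernel supports it.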

Extended analysis

Fix Plan

To resolve the SM120 compatibility issues, the following changes are needed:

  • Build Marlin and Marlin-MOE for SM120, either by adding SM120 to their target list or by falling back to 8.0+PTX so the driver can JIT-compile the PTX on SM120.
  • Disable FlashMLA on SM120, since its kernels only work on Hopper and require CUDA 12.3+.
  • Update FA2_ARCHS to exclude SM120-incompatible targets.
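Per the issue title, the source build still registers FlashMLA and FlashAttention targets that SM120 cannot use, so Ninja is asked to build them. A minimal sketch of the alternative is to register a target only when the requested arch list shares a major version with what that target supports. The table is hypothetical: only the FlashMLA row follows the issue (Hopper only), `flashmla_ext` is a placeholder target name, and the FlashAttention arch sets are placeholders to be replaced with the real lists from vLLM's CMake files.

```python
import os

# Hypothetical table: extension target -> compute-capability majors it supports.
# Only the FlashMLA row follows the issue (Hopper only); the others are placeholders.
SUPPORTED_MAJORS: dict[str, set[int]] = {
    "flashmla_ext": {9},             # placeholder target name
    "_vllm_fa2_C": {8, 9},           # placeholder arch set
    "_vllm_fa3_C": {9},              # placeholder arch set
    "_vllm_fa4_cutedsl_C": {9, 10},  # placeholder arch set
}


def requested_majors(spec: str) -> set[int]:
    """Major versions in a TORCH_CUDA_ARCH_LIST-style string, e.g. '8.0+PTX;12.0' -> {8, 12}."""
    return {int(tok.removesuffix("+PTX").split(".")[0]) for tok in spec.replace(";", " ").split()}


def targets_to_register(spec: str, table: dict[str, set[int]] = SUPPORTED_MAJORS) -> list[str]:
    """Keep only targets that share at least one major version with the requested archs."""
    majors = requested_majors(spec)
    return [name for name, supported in table.items() if supported & majors]


if __name__ == "__main__":
    spec = os.environ.get("TORCH_CUDA_ARCH_LIST", "12.0")
    print(spec, "->", targets_to_register(spec))
```

Deciding from the requested arch list rather than from the build host's GPU keeps the result correct on build machines that have no GPU at all.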

Step-by-Step Solution

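  1. Update Marlin and Marlin-MOE:

Set the target architectures through the `TORCH_CUDA_ARCH_LIST` environment variable, which PyTorch's extension tooling and vLLM's CMake build read, then build from the source tree:

```python
# Build vLLM from the source tree with SM120 in the target list.
import os
import subprocess
import sys

# "12.0" emits native SM120 code; "8.0+PTX" also embeds compute_80 PTX that the
# driver can JIT-compile on SM120. Whether Marlin / Marlin-MOE actually pick up an
# arch still depends on the arch lists vLLM's CMakeLists.txt intersects this with.
os.environ["TORCH_CUDA_ARCH_LIST"] = "8.0+PTX;12.0"
subprocess.run([sys.executable, "-m", "pip", "install", "-e", ".", "--no-build-isolation"], check=True)
```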

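  2. Disable FlashMLA on SM120:

```python
# Runtime guard: FlashMLA kernels only work on Hopper (SM90), so skip them elsewhere.
import torch


def flashmla_usable() -> bool:
    if not torch.cuda.is_available():
        return False
    major, _minor = torch.cuda.get_device_capability()
    return major == 9  # SM120 reports (12, 0), so FlashMLA is disabled there


print("FlashMLA enabled" if flashmla_usable() else "FlashMLA disabled on this GPU")
```

This only covers the runtime path; the build must also stop registering the FlashMLA targets when no Hopper architecture is requested.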
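  3. Update FA2_ARCHS:

In setup.py or a build script, exclude SM120-incompatible targets:

```python
import sys

import torch

# Detect an SM120 build host (guarded so CPU-only build machines do not raise).
on_sm120 = torch.cuda.is_available() and torch.cuda.get_device_capability() == (12, 0)

if sys.platform == "linux" and on_sm120:
    FA2_ARCHS = ["sm_80", "sm_90"]  # drop sm_120 on SM120 hosts
else:
    FA2_ARCHS = ["sm_80", "sm_90", "sm_120"]
```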


### Verification
After applying these changes, verify that:
* Marlin and Marlin-MOE are built with SM120 support.
* FlashMLA is disabled on SM120.
* FA2_ARCHS excludes SM120 incompatible targets.
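One way to check the Marlin and FA2 points is to list the cubins and PTX embedded in the built extension libraries with `cuobjdump` from the CUDA toolkit (`--list-elf` and `--list-ptx`). This is a sketch under assumptions: the regex only assumes entry names contain an `sm_XX` token, and the issue does not say which `.so` under the installed `vllm` package holds the Marlin kernels, so the helper scans every shared library below a given directory. The `embedded_archs` and `inspect` names are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path

ARCH_RE = re.compile(r"sm_\d+[af]?")  # e.g. sm_80, sm_120, sm_120f


def embedded_archs(listing: str) -> set[str]:
    """Collect sm_XX names from cuobjdump --list-elf / --list-ptx output."""
    return set(ARCH_RE.findall(listing))


def inspect(so_path: Path) -> dict[str, set[str]]:
    """Return the SASS and PTX architectures embedded in one shared library."""
    result = {}
    for kind, flag in (("sass", "--list-elf"), ("ptx", "--list-ptx")):
        proc = subprocess.run(["cuobjdump", flag, str(so_path)], capture_output=True, text=True)
        result[kind] = embedded_archs(proc.stdout)
    return result


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_archs.py /path/to/site-packages/vllm
    for so in sorted(Path(sys.argv[1]).rglob("*.so")):
        found = inspect(so)
        print(so.name, "SASS:", sorted(found["sass"]), "PTX:", sorted(found["ptx"]))
```

A library that lists neither `sm_120` SASS nor PTX at or below 12.0 cannot run on the RTX 5090.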

### Extra Tips
* Ensure that the CUDA toolkit supports SM120: nvcc needs CUDA 12.8 or newer to target sm_120 (this report uses CUDA 13.2).
* Test the changes thoroughly to ensure that they do not introduce any regressions.
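For the CUDA-version tip: nvcc gained Blackwell targets, including `sm_120`, in CUDA 12.8, and family-specific targets such as `compute_120f` / `sm_120f` (listed under the SM120 OK items) need CUDA 12.9 or newer. A small sketch that reads the toolkit version from `nvcc --version`; the helper names and the `MIN_CUDA` table are this example's own.

```python
import re
import shutil
import subprocess

# Minimum CUDA toolkit versions for the SM120 targets used in this build.
MIN_CUDA = {"sm_120": (12, 8), "sm_120f": (12, 9)}


def parse_nvcc_version(output: str) -> tuple[int, int]:
    """Extract (major, minor) from text like 'Cuda compilation tools, release 13.2, ...'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("no 'release X.Y' in nvcc output")
    return int(match.group(1)), int(match.group(2))


def missing_targets(version: tuple[int, int]) -> list[str]:
    """Targets from MIN_CUDA that this toolkit version cannot generate."""
    return [target for target, minimum in MIN_CUDA.items() if version < minimum]


if __name__ == "__main__":
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        print("nvcc not found on PATH")
    else:
        out = subprocess.run([nvcc, "--version"], capture_output=True, text=True, check=True).stdout
        version = parse_nvcc_version(out)
        print(f"CUDA {version[0]}.{version[1]}; unsupported targets: {missing_targets(version) or 'none'}")
```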
