vllm - ✅(Solved) Fix [CI Failure]: Test Eval Marlin Qwen3-30B-A3B-Fp8 [1 pull requests, 2 comments, 2 participants]

vllm · 2026-03-25 11:17:44


GitHub stats

vllm-project/vllm#38101 • Fetched 2026-04-08 01:26:38


Author

ilmarkov

Participants

ilmarkov

jikunshang

Timeline (top)

mentioned ×5, subscribed ×5, commented ×2, added_to_project_v2 ×1

Root Cause

Flaky test
Can reproduce locally
Caused by external libraries (e.g. bug in transformers)

PR fix notes

PR #32929: [FP8]add FP8 WoQ kernel abstraction.

Repository: vllm-project/vllm
Author: jikunshang
State: closed | merged: True
Link: https://github.com/vllm-project/vllm/pull/32929

Description (problem / solution / changelog)

Purpose

This PR refactors the FP8 linear kernel stack to integrate the Marlin kernel into the FP8 kernel abstraction and to centralize kernel selection. After this change, the FP8 execution path has a single intentional divergence: block-scaled scaled_mm, which should not use Marlin.

Changes

Centralized FP8 kernel selection via init_fp8_linear_kernel(): uses init_fp8_linear_kernel() to select the appropriate FP8 kernel implementation (e.g., W8A16 vs. W8A8) based on configuration and platform capability.
Added MarlinFP8ScaledMMLinearKernel: introduces a Marlin-backed FP8 kernel implementation under the scaled-mm kernel abstraction, enabling Marlin to be selected and used through the unified FP8 kernel interface.
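To make the idea of centralized selection concrete, here is a minimal sketch of a single selector function. It is not the code behind init_fp8_linear_kernel(); the Fp8Config fields, the select_fp8_kernel name, the returned labels, and the specific rule ordering are all hypothetical and chosen only to mirror the description above (W8A16 vs. W8A8, platform capability, and block-scaled scaled_mm never using Marlin).

```python
from dataclasses import dataclass


# Hypothetical stand-in for the inputs a selector might look at.
# None of these names come from the vLLM source.
@dataclass(frozen=True)
class Fp8Config:
    weight_only: bool     # True means W8A16 (FP8 weights, 16-bit activations)
    block_scaled: bool    # block-scaled scaled_mm must not use Marlin
    has_native_fp8: bool  # assumed platform capability flag


def select_fp8_kernel(cfg: Fp8Config) -> str:
    """Return a label for the kernel family a central selector might pick."""
    if cfg.block_scaled:
        # The one intentional divergence: never routed to Marlin.
        return "block_scaled_mm"
    if cfg.weight_only or not cfg.has_native_fp8:
        # Assumed rule: weight-only or no native FP8 falls back to Marlin.
        return "marlin_w8a16"
    return "scaled_mm_w8a8"
```

Keeping the decision in one function means every quantization scheme gets the same kernel for the same configuration, which is the property the refactor is after.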

Follow-up (post-merge)

Add XPU W8A16 GEMM kernel support in the FP8 linear path once this refactor is merged.

Test Plan

Test Result

lm-eval result for qwen3-4B on a 3090 (screenshot attached to the PR: https://github.com/user-attachments/assets/a7bbd3bf-308b-445b-be56-42d8da1cdb41)


Changed files

vllm/model_executor/kernels/linear/__init__.py (modified, +4/-0)
vllm/model_executor/kernels/linear/scaled_mm/__init__.py (modified, +4/-0)
vllm/model_executor/kernels/linear/scaled_mm/marlin.py (added, +120/-0)
vllm/model_executor/layers/quantization/fbgemm_fp8.py (modified, +0/-12)
vllm/model_executor/layers/quantization/fp8.py (modified, +49/-71)


Name of failing test

tests/evals/gsm8k/test_gsm8k_correctness.py::test_gsm8k_correctness[Qwen3-30B-A3B-Fp8-CT-Channel-marlin]

Basic information

Flaky test
Can reproduce locally
Caused by external libraries (e.g. bug in transformers)

🧪 Describe the failing test

pytest tests/evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/moe-refactor/config-h100.txt -k "Qwen3-30B-A3B-Fp8-CT-Channel-marlin" -v -s

The test fails with the error 'QKVParallelLinear' object has no attribute 'workspace', raised in vllm/vllm/model_executor/kernels/linear/scaled_mm/marlin.py, line 101, in apply_weights, which is called from schemes/compressed_tensors_w8a8_fp8.py. The attribute is missing because CompressedTensorsW8A8Fp8 doesn't call fp8_linear.process_weights_after_loading(layer).

Introduced in https://github.com/vllm-project/vllm/pull/32929
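
The failure mode described above can be reproduced in miniature without vLLM. The sketch below uses toy classes (ToyMarlinKernel, ToyLayer, and run are hypothetical names, and the workspace contents are placeholders) to show why skipping the post-load hook surfaces later as a missing attribute inside apply_weights.

```python
class ToyMarlinKernel:
    """Toy kernel whose apply_weights depends on state set by the post-load hook."""

    def process_weights_after_loading(self, layer):
        # Stand-in for kernel setup; the real kernel's work is not shown here.
        layer.workspace = [0] * 4

    def apply_weights(self, layer, x):
        # Reads layer.workspace, so this raises if the hook never ran.
        _ = layer.workspace
        return [v * layer.scale for v in x]


class ToyLayer:
    def __init__(self, scale):
        self.scale = scale


def run(call_hook):
    """Run one forward call, optionally skipping the post-load hook."""
    kernel, layer = ToyMarlinKernel(), ToyLayer(scale=2)
    if call_hook:
        kernel.process_weights_after_loading(layer)
    return kernel.apply_weights(layer, [1, 2, 3])
```

Calling run(True) succeeds, while run(False) raises an AttributeError naming workspace, which matches the shape of the error reported in CI.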

📝 History of failing test

https://buildkite.com/vllm/ci/builds/57706/steps/canvas?sid=019d1e6f-1784-460b-b631-fcea0f90d7ff&tab=output

CC List.

@jikunshang @robertgshaw2-redhat @mgoin @tjtanaa

extent analysis

Fix Plan

To fix the issue, we need to ensure that fp8_linear.process_weights_after_loading(layer) is called for CompressedTensorsW8A8Fp8.

Here are the steps:

Modify schemes/compressed_tensors_w8a8_fp8.py to call fp8_linear.process_weights_after_loading(layer) after loading the weights.
Make sure the layer ends up with the workspace attribute that the Marlin kernel's apply_weights expects.

Example code:

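# schemes/compressed_tensors_w8a8_fp8.py
# Sketch only. Assumes the scheme holds its FP8 kernel as self.fp8_linear and
# exposes a process_weights_after_loading(layer) hook; check the actual source.

class CompressedTensorsW8A8Fp8:
    # ...
    def process_weights_after_loading(self, layer):
        # ... existing weight post-processing ...
        # Delegate to the kernel so it can finish its own setup on the layer
        # (the missing 'workspace' attribute is expected to come from here).
        self.fp8_linear.process_weights_after_loading(layer)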

Verification

To verify the fix, run the failing test again:

pytest tests/evals/gsm8k/test_gsm8k_correctness.py --config-list-file=configs/moe-refactor/config-h100.txt -k "Qwen3-30B-A3B-Fp8-CT-Channel-marlin" -v -s

If the test passes, the fix is successful.

Extra Tips

Check that workspace is set on every layer type that reaches apply_weights (the failure surfaced on QKVParallelLinear), not only on the one used while debugging.
Test the fix thoroughly to ensure it doesn't introduce any regressions.
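
One way to make this class of bug easier to spot is a fail-fast check before the kernel is applied. The helper below is a hypothetical sketch, not part of vLLM: require_post_load_state and its default attribute list are assumptions based only on the attribute named in the error message.

```python
def require_post_load_state(layer, names=("workspace",)):
    """Raise a descriptive error if any expected post-load attribute is missing."""
    missing = [name for name in names if not hasattr(layer, name)]
    if missing:
        raise RuntimeError(
            f"{type(layer).__name__} is missing {missing}; "
            "was process_weights_after_loading(layer) called?"
        )
```

A check like this turns a bare missing-attribute error deep in the kernel into a message that points at the skipped hook.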
