vllm: [Hybrid SSM] Investigate accuracy divergence between `mamba_chunk_scan` and `selective_state_update` kernels

Context

PR #43186 lowered the gsm8k accuracy threshold for ibm-granite/granite-4.0-h-tiny under NIXL PD disagg after #42430 rerouted the D-side 1-token recompute from mamba_chunk_scan_combined_varlen to selective_state_update (SSU).

The two kernels produce different bf16 outputs, supposedly because of different reduction orders and intermediate precision:

  • mamba_chunk_scan_combined_varlen goes through bf16 matmul with intermediate bf16 casts (tl.dot)
  • selective_state_update keeps the SSM step in fp32 throughout

As noted in this comment, this seems like a large accuracy drop for something meant to be algebraically equivalent.
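To illustrate how two algebraically equivalent paths can drift apart, here is a minimal pure-Python sketch. It is not either vLLM kernel: it emulates a scalar recurrence h_t = a_t * h_(t-1) + bx_t and rounds after every operation, once to fp32 and once to bf16. The helper names (to_fp32, to_bf16, run_recurrence) and the choice of rounding points are assumptions made for illustration only.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to bf16 with round-to-nearest-even (finite, moderate inputs only)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bias = 0x7FFF + ((bits >> 16) & 1)   # ties go to the even upper half
    bits = (bits + bias) & 0xFFFF0000    # keep sign, exponent, 7 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def run_recurrence(h0, a, bx, rnd):
    """Toy SSM step: h_t = rnd(rnd(a_t * h_prev) + bx_t)."""
    h = rnd(h0)
    for a_t, bx_t in zip(a, bx):
        h = rnd(rnd(a_t * h) + bx_t)
    return h


steps = 4
a = [1.0] * steps
bx = [2.0 ** -9] * steps  # each update is below half a bf16 ulp at 1.0

h_fp32 = run_recurrence(1.0, a, bx, to_fp32)
h_bf16 = run_recurrence(1.0, a, bx, to_bf16)
print(h_fp32, h_bf16, h_fp32 - h_bf16)
```

In this toy case the fp32 path accumulates every update and ends at 1.0078125, while the bf16 path rounds each small update away and stays at 1.0. Whether the real kernels lose precision at comparable points is exactly what the investigation would need to confirm.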

Goal

Investigate the kernel-level numerical divergence in isolation to:

  1. Quantify the magnitude: Run both kernels on identical inputs for a single prompt (prefill + decode) and measure the output divergence (max abs diff, relative error distribution).
  2. Determine hardware dependence: Test on at least Ampere (A100) and Hopper (H100) to see if the divergence magnitude or direction is architecture-dependent.
  3. Recommend a fix or acceptable tolerance: Based on findings, either align the kernels' numeric behavior or document the expected tolerance.
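The metrics named in step 1 could be collected with a small helper along these lines. This is a sketch under assumptions: divergence_report is a hypothetical name, the inputs are flat sequences of Python floats (for a PyTorch tensor, something like tensor.float().flatten().tolist() would produce one), and the eps floor for near-zero reference values is an arbitrary choice.

```python
import math
from typing import Dict, Sequence


def divergence_report(
    out: Sequence[float], ref: Sequence[float], eps: float = 1e-6
) -> Dict[str, float]:
    """Summarize element-wise divergence of `out` from `ref`."""
    if len(out) != len(ref) or not out:
        raise ValueError("inputs must be non-empty and the same length")

    abs_err = [abs(o - r) for o, r in zip(out, ref)]
    # Relative error, with eps guarding against division by tiny references.
    rel_err = sorted(e / max(abs(r), eps) for e, r in zip(abs_err, ref))

    def pct(q: float) -> float:
        # Nearest-rank percentile on the sorted relative errors.
        idx = max(0, math.ceil(q * len(rel_err)) - 1)
        return rel_err[idx]

    return {
        "max_abs": max(abs_err),
        "mean_abs": sum(abs_err) / len(abs_err),
        "rel_p50": pct(0.50),
        "rel_p99": pct(0.99),
        "rel_max": rel_err[-1],
    }


report = divergence_report([1.0, 2.0, 4.0], [1.0, 2.5, 3.0])
print(report)
```

Reporting percentiles alongside the maximum helps distinguish a few outlier elements from a uniform shift, which matters when choosing a tolerance.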

Reproduction Plan

  • Isolate the two kernels (mamba_chunk_scan_combined_varlen and selective_state_update) with a single-prompt workload (1 prefill + N decode steps).
  • Compare outputs element-wise against each other and against an fp64 reference.
  • Run on Ampere (A100) and Hopper (H100) hardware.
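The plan above could be organized around a harness like the following sketch. It assumes each code path is wrapped as a step function (state, x) -> (new_state, y) and keeps a separate state per path, so that errors can compound across decode steps as they would in practice. The names trace_errors, exact, and lossy are hypothetical, and the scalar stand-ins below replace the real kernels and the fp64 reference purely for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# (state, x) -> (new_state, y); a real wrapper would carry tensors instead.
Step = Callable[[float, float], Tuple[float, float]]


def trace_errors(
    paths: Dict[str, Step], reference: Step, xs: Sequence[float], h0: float = 0.0
) -> List[Dict[str, float]]:
    """Per step, record each path's absolute output error vs the reference."""
    states = {name: h0 for name in paths}
    ref_state = h0
    rows: List[Dict[str, float]] = []
    for t, x in enumerate(xs):
        ref_state, ref_y = reference(ref_state, x)
        row: Dict[str, float] = {"step": t}
        for name, step in paths.items():
            # Each path advances its own state, so drift accumulates.
            states[name], y = step(states[name], x)
            row[name] = abs(y - ref_y)
        rows.append(row)
    return rows


def exact(h: float, x: float) -> Tuple[float, float]:
    h = 0.5 * h + x
    return h, h


def lossy(h: float, x: float) -> Tuple[float, float]:
    h = float(int(0.5 * h + x))  # crude stand-in for low-precision rounding
    return h, h


rows = trace_errors({"lossy": lossy}, exact, [1.0, 1.0, 1.0])
for row in rows:
    print(row)
```

Comparing both kernels against the same reference, rather than only against each other, shows which path is further from the true value and whether the gap grows with the number of decode steps.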

References

  • PR that lowered threshold: #43186
  • PR that rerouted to SSU: #42430
