[Feature]: Batch-invariant support for GDN_ATTN (Qwen3-Next / Qwen3.6 hybrid Mamba+GDN MoE models)

vllm, 2026-05-18 09:51:15

Error Message

RuntimeError: VLLM batch_invariant mode is not supported for GDN_ATTN.

Fix / Workaround

Both v0.21.0 and the nightly image (pulled 2026-05-18) fail with the same error. The check fires as soon as the engine selects the Mamba/GDN attention backend during initialization, before any AWQ-kernel logic runs. It is therefore independent of --quantization, --attention-backend, VLLM_ATTENTION_BACKEND (not recognized in 0.21.0), and other workaround knobs.

Your current environment

<details> <summary>Versions</summary>

vLLM: 0.21.0 (Docker image vllm/vllm-openai:v0.21.0) and nightly (digest sha256:d1bd760bf6630f67378206c7945afb6ab9bc046064a51fe421461e91261dcd7b, pulled 2026-05-18)
PyTorch: 2.11.0+cu130
CUDA: 13.0
GPU: NVIDIA A100-SXM4-80GB (compute capability 8.0, SM80)
Model: cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit (Qwen3-Next-style hybrid Mamba + Gated-Delta-Net + softmax-attention MoE; quantization: compressed-tensors)
TP: 1

</details>

🐛 Describe the bug / 🛠 Feature request

Setting VLLM_BATCH_INVARIANT=1 on a model that contains GDN (Gated-Delta-Net) linear-attention layers causes engine startup to abort with:

RuntimeError: VLLM batch_invariant mode is not supported for GDN_ATTN.

Source: vllm/v1/attention/selector.py:154 in _cached_get_mamba_attn_backend.

This is a hard incompatibility — no fallback, no partial mode. It blocks reproducibility work for all Qwen3-Next / Qwen3.6-style models (and any other hybrid Mamba/GDN architecture).
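To make the "no fallback, no partial mode" behaviour concrete, here is a minimal sketch of the kind of guard that produces this error. It is not the vLLM source: the function names, the empty allow-list, and the way the environment variable is parsed are all assumptions made for illustration. Only the error text and the `VLLM_BATCH_INVARIANT` variable name come from this issue.

```python
import os
from typing import Mapping

# Hypothetical allow-list of linear-attention backends with batch-invariant
# kernels. It is empty here to mirror the behaviour reported in this issue.
BATCH_INVARIANT_LINEAR_BACKENDS: frozenset = frozenset()


def batch_invariant_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Return True when VLLM_BATCH_INVARIANT is "1" (parsing is an assumption)."""
    return env.get("VLLM_BATCH_INVARIANT", "0") == "1"


def check_linear_attn_backend(name: str, env: Mapping[str, str] = os.environ) -> str:
    """Illustrative guard: reject unsupported backends in batch-invariant mode."""
    if batch_invariant_enabled(env) and name not in BATCH_INVARIANT_LINEAR_BACKENDS:
        # Hard failure: no fallback backend is tried and no partial mode exists.
        raise RuntimeError(
            f"VLLM batch_invariant mode is not supported for {name}."
        )
    return name
```

Under this sketch, supporting GDN would mean adding batch-invariant kernels for the backend and then listing it in the allow-list, rather than relaxing the guard itself.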

Reproduction

docker run --rm --gpus all --ipc host \
  -e HUGGING_FACE_HUB_TOKEN=... \
  -e VLLM_BATCH_INVARIANT=1 \
  vllm/vllm-openai:v0.21.0 \
  --model cyankiwi/Qwen3.6-35B-A3B-AWQ-4bit \
  --trust-remote-code \
  --max-model-len 20480
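When repeating this reproduction across several image tags, it can help to detect the failure automatically from captured container logs. The helper below is a small sketch that assumes the error line appears in the log exactly as quoted in this issue; the function name `find_unsupported_backend` is hypothetical.

```python
import re
from typing import Optional

# Pattern built from the error text quoted in this issue.
_ERROR_RE = re.compile(
    r"RuntimeError: VLLM batch_invariant mode is not supported for (\w+)\."
)


def find_unsupported_backend(log_text: str) -> Optional[str]:
    """Return the backend named in the batch-invariant error, or None if absent."""
    match = _ERROR_RE.search(log_text)
    return match.group(1) if match else None
```

Feeding it the saved output of `docker logs` for each tag gives a quick yes/no on whether a given build still rejects the backend.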

Related

#42456: added SM80 batch-invariant support for the regular attention path. This issue requests the analogous coverage for the linear-attention (Mamba/GDN) path.
#29581: closed; addressed batch-invariance for AWQ-Marlin kernels on the non-hybrid Qwen3-30B-A3B. Different layer family (no GDN), so it doesn't cover this case.
#32992: closed; B200 / Blackwell + torch.compile. Different hardware.

Happy to test patches on A100 + Qwen3.6-A3B if helpful.
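For testing a candidate patch, one simple acceptance check is to compare the output for a prompt run alone against the same prompt run inside a larger batch. The sketch below assumes a hypothetical `generate` callable that takes a list of prompts and returns one deterministic (greedy-decoded) output per prompt, in order; it is not a vLLM API, and in practice it would wrap whatever client or engine call is being tested.

```python
from typing import Callable, Hashable, List, Sequence

# Hypothetical signature: prompts in, one comparable output per prompt out.
GenerateFn = Callable[[List[str]], Sequence[Hashable]]


def is_batch_invariant(
    generate: GenerateFn, prompt: str, fillers: Sequence[str]
) -> bool:
    """Return True if `prompt` yields the same output alone and in a batch."""
    alone = generate([prompt])[0]
    # Same prompt in position 0, now sharing the batch with filler prompts.
    batched = generate([prompt, *fillers])[0]
    return alone == batched
```

Comparing token IDs rather than decoded text, where available, is the stricter check. A single passing run does not prove invariance, so repeating it over several batch sizes and prompt lengths is advisable.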
