ollama - Fuse Qwen3Next Gated Delta Net autoregressive decode [1 pull request]


Fix Action

Fixed


Qwen3Next and Qwen35 use Gated Delta Net recurrent layers. In autoregressive decode, the current graph expands this recurrent update into many smaller tensor operations. llama.cpp has since added a fused ggml_gated_delta_net op for this path.
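For orientation, here is a minimal pure-Python sketch of a single-token gated delta rule update for one head, written the way the rule is commonly published. It is an illustration only: the function name, the row-major state layout (one row per value channel) and the scalar gate `g` applied as `exp(g)` are assumptions made for this sketch, not the layout or signature of the ggml op.

```python
import math

def gated_delta_step(state, q, k, v, g, beta):
    """One decode step of a gated delta rule for a single head (illustrative).

    state: d_v rows, each a list of d_k floats (the recurrent memory)
    q, k:  lists of d_k floats; v: list of d_v floats
    g:     scalar log-decay (assumed <= 0); beta: scalar write strength
    Returns (new_state, out) without mutating the input state.
    """
    decay = math.exp(g)
    new_state, out = [], []
    for row, v_i in zip(state, v):
        # 1. Decay the old memory.
        decayed = [decay * s for s in row]
        # 2. Read what the memory currently predicts for key k.
        pred = sum(s * k_j for s, k_j in zip(decayed, k))
        # 3. Write back only the correction (the "delta"), scaled by beta.
        delta = beta * (v_i - pred)
        updated = [s + delta * k_j for s, k_j in zip(decayed, k)]
        new_state.append(updated)
        # 4. Query the updated memory.
        out.append(sum(s * q_j for s, q_j in zip(updated, q)))
    return new_state, out

# Usage: starting from an empty memory, writing v under key k and then
# querying with q == k returns v.
empty = [[0.0, 0.0], [0.0, 0.0]]
state1, out1 = gated_delta_step(empty, [1.0, 0.0], [1.0, 0.0], [3.0, 4.0], 0.0, 1.0)
```

Expressed as graph nodes, each numbered step above becomes one or more separate tensor operations per layer per token, which is the overhead a fused op is meant to remove.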

This issue proposes bringing that fused autoregressive decode path into Ollama while keeping behaviour unchanged for unsupported backends.

Proposed shape:

  • Backport the upstream ggml Gated Delta Net op from llama.cpp for as long as Ollama's pinned llama.cpp commit predates that implementation.
  • Use the fused op only for autoregressive decode.
  • Keep chunked prefill on the existing explicit graph.
  • Gate the fused path by backend capability, so unsupported backends continue to use the existing path rather than silently falling back to CPU.
  • Add CPU reference tests, CUDA parity tests for supported head dimensions, and multi-step recurrent state tests.
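The selection rule in the list above can be sketched as a small decision function. Everything here is hypothetical: `Backend`, `supports_fused_gdn` and `choose_gdn_path` are names invented for illustration, and treating "one new token per sequence" as the definition of autoregressive decode is an assumption of this sketch rather than something the issue states.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Backend:
    name: str
    supports_fused_gdn: bool  # hypothetical capability flag

def choose_gdn_path(backend: Backend, tokens_per_sequence: int) -> str:
    """Pick the graph variant for a Gated Delta Net layer (illustrative).

    Returns "fused" only for single-token decode on a capable backend.
    Everything else, including chunked prefill and backends without the
    op, stays on the existing explicit graph on the same backend.
    """
    is_decode = tokens_per_sequence == 1  # assumption: decode == one new token
    if is_decode and backend.supports_fused_gdn:
        return "fused"
    return "explicit"

cuda = Backend("cuda", supports_fused_gdn=True)
other = Backend("other", supports_fused_gdn=False)
print(choose_gdn_path(cuda, 1))    # decode on a capable backend
print(choose_gdn_path(cuda, 512))  # chunked prefill
print(choose_gdn_path(other, 1))   # decode on a backend without the op
```

The point of returning "explicit" rather than emitting the fused node unconditionally is that an unsupported node is never placed in the graph, so there is nothing for a scheduler to reroute to CPU.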

The change should not add a user-facing option. It is an internal execution optimisation for existing Qwen3Next and Qwen35 models.

Validation expected before merge:

  • ggml CPU reference parity.
  • CUDA versus CPU parity for supported S_v (value head dimension) values.
  • Multi-step recurrent state continuity.
  • Native ggml, ggml-cpu and ggml-cuda builds.
  • End-to-end smoke or benchmark evidence on a Qwen3Next or Qwen35 GGUF.
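The first and third checks above can be prototyped without any ggml build. The sketch below is a pure-Python stand-in, not the project's test suite: it compares a step-by-step formulation of the gated delta rule against the explicit matrix form S' = exp(g) · S · (I − β k kᵀ) + β v kᵀ, then confirms that splitting a sequence and carrying the state across the split gives the same result as one uninterrupted run. All function names, the random input ranges and the unit-normalised q and k are assumptions made for illustration.

```python
import math
import random

def step_fused(state, q, k, v, g, beta):
    # Row-wise form: decay, read, correct, query.
    decay = math.exp(g)
    new_state, out = [], []
    for row, v_i in zip(state, v):
        decayed = [decay * s for s in row]
        pred = sum(s * kj for s, kj in zip(decayed, k))
        delta = beta * (v_i - pred)
        updated = [s + delta * kj for s, kj in zip(decayed, k)]
        new_state.append(updated)
        out.append(sum(s * qj for s, qj in zip(updated, q)))
    return new_state, out

def step_explicit(state, q, k, v, g, beta):
    # Matrix form: S' = exp(g) * S (I - beta k k^T) + beta v k^T.
    d_k = len(k)
    decay = math.exp(g)
    proj = [[(1.0 if i == j else 0.0) - beta * k[i] * k[j]
             for j in range(d_k)] for i in range(d_k)]
    new_state = []
    for row, v_i in zip(state, v):
        mixed = [decay * sum(row[i] * proj[i][j] for i in range(d_k))
                 for j in range(d_k)]
        new_state.append([m + beta * v_i * k[j] for j, m in enumerate(mixed)])
    out = [sum(s * qj for s, qj in zip(row, q)) for row in new_state]
    return new_state, out

def run(step, state, inputs):
    # Thread the recurrent state through a sequence of single-token steps.
    outs = []
    for q, k, v, g, beta in inputs:
        state, out = step(state, q, k, v, g, beta)
        outs.append(out)
    return state, outs

def unit(vec):
    n = math.sqrt(sum(x * x for x in vec))
    return [x / n for x in vec]

def make_inputs(n_steps, d_k, d_v, seed=0):
    rng = random.Random(seed)
    inputs = []
    for _ in range(n_steps):
        q = unit([rng.uniform(-1, 1) for _ in range(d_k)])
        k = unit([rng.uniform(-1, 1) for _ in range(d_k)])
        v = [rng.uniform(-1, 1) for _ in range(d_v)]
        inputs.append((q, k, v, -rng.uniform(0.0, 0.5), rng.uniform(0.0, 1.0)))
    return inputs

def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

inputs = make_inputs(8, d_k=4, d_v=3)
zero = [[0.0] * 4 for _ in range(3)]

# Parity: both formulations must agree on final state and every output.
s_fused, o_fused = run(step_fused, zero, inputs)
s_ref, o_ref = run(step_explicit, zero, inputs)
parity_err = max(max_abs_diff(s_fused, s_ref), max_abs_diff(o_fused, o_ref))

# Continuity: 4 steps + 4 steps with carried state equals 8 steps in one go.
s_mid, o_head = run(step_fused, zero, inputs[:4])
s_split, o_tail = run(step_fused, s_mid, inputs[4:])
continuity_ok = (s_split == s_fused) and (o_head + o_tail == o_fused)
```

A real CUDA versus CPU comparison would need a tolerance chosen for the kernel's accumulation order and data type; the check here only illustrates the structure of such a test.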
