ollama - ✅(Solved) Fix qwen35moe architecture missing from vendored llama.cpp -- mmproj/vision loading fails [1 pull request, 2 comments, 1 participant]

ollama, 2026-04-30 13:17:47


GitHub stats

ollama/ollama#15898•Fetched 2026-05-01 05:33:26


Author

ArkaD171717

Participants

ArkaD171717

Timeline (top)

commented ×2, cross-referenced ×1, referenced ×1

Error Message

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'

ollama create succeeds, but any /api/generate call returns "unable to load model". The server log shows the architecture error above.

#14730 (same error, closed as dup of #14575)
#15747 (same error on Ollama 0.21.0)
#15499 (same error, closed as dup of #14575)

Root Cause

PR #14517 added qwen35moe to the Go engine's text runner. But the Go engine does not support split vision models for this architecture -- when projectors are present, it falls back to the C++ llama.cpp runner. Ollama's vendored llama.cpp fork does not have qwen35 or qwen35moe in its architecture table, so the fallback fails.
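The fallback path described above can be modelled in a few lines. This is only a sketch of the described behaviour, not Ollama's actual code: the function name `select_runner` and both sets are hypothetical stand-ins for the Go engine's supported architectures and the vendored llama.cpp architecture table.

```python
# Hypothetical model of the runner selection described in the root cause.
# Neither the names nor the sets come from Ollama's source.
GO_ENGINE_TEXT_ARCHS = {"qwen35moe"}      # assumption: text runner added by PR #14517
VENDORED_LLAMA_CPP_ARCHS = {"qwen3next"}  # assumption: no qwen35 / qwen35moe entry


def select_runner(arch: str, has_projector: bool) -> str:
    """Return which runner would load the model, or raise if none can."""
    # The Go engine handles the architecture only when no projector is attached.
    if arch in GO_ENGINE_TEXT_ARCHS and not has_projector:
        return "go-engine"
    # Otherwise fall back to the C++ runner, which consults its own table.
    if arch in VENDORED_LLAMA_CPP_ARCHS:
        return "llama.cpp"
    raise ValueError(f"unknown model architecture: '{arch}'")
```

Under these assumptions, `select_runner("qwen35moe", False)` picks the Go engine, while the same call with `has_projector=True` raises, which mirrors why attaching an mmproj breaks text inference as well.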

Upstream ggml-org/llama.cpp already supports both architectures (LLM_ARCH_QWEN35, LLM_ARCH_QWEN35MOE).

Fix Action

Fix proposed (the PR below was still open and unmerged when this page was fetched)

Proposed fix PR: llama: add qwen35/qwen35moe architecture support for community GGUFs (https://github.com/ollama/ollama/pull/15899)

PR fix notes

PR #15899: llama: add qwen35/qwen35moe architecture support for community GGUFs

Repository: ollama/ollama
Author: ArkaD171717
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/15899

Description (problem / solution / changelog)

Community GGUFs (e.g. bartowski) are produced with upstream llama.cpp's converter, which writes "qwen35moe" as the architecture string. Ollama's vendored llama.cpp only recognizes "qwen3next", which causes "unknown model architecture: 'qwen35moe'" errors when loading these files.
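To see which architecture string a given GGUF carries, the `general.architecture` metadata key can be read straight from the file header. The following is a minimal sketch, assuming GGUF version 2 or later (little-endian, 64-bit counts); the function names are illustrative and it is not a full GGUF parser.

```python
import struct
from typing import BinaryIO, Optional

# GGUF metadata value type ids mapped to struct formats (fixed-size types only).
_SCALAR_FMT = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
               6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise EOFError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    length = _read(f, "<Q")  # strings are a uint64 length followed by UTF-8 bytes
    return f.read(length).decode("utf-8")


def _read_value(f: BinaryIO, vtype: int):
    if vtype in _SCALAR_FMT:
        return _read(f, _SCALAR_FMT[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        elem_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unsupported GGUF value type {vtype}")


def read_gguf_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of general.architecture, or None if the key is absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch assumes GGUF version 2 or later")
    _read(f, "<Q")             # tensor count, unused here
    kv_count = _read(f, "<Q")  # number of metadata key/value pairs
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    return None
```

Usage would look like `with open(path, "rb") as f: print(read_gguf_architecture(f))`, where `path` points at the text GGUF.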

This adds full graph-building support for the qwen35 and qwen35moe architectures. The key differences from qwen3next are:

Separate attn_qkv (QKV) + attn_gate (Z) projections instead of combined ssm_in (QKVZ)
Separate ssm_alpha and ssm_beta tensors instead of combined ssm_beta_alpha
IMROPE (ggml_rope_multi with sections) instead of NEOX (ggml_rope_ext)

The delta-net chunked/autoregressive math, conv1d pipeline, gated normalization, and MoE FFN logic are identical to qwen3next.
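The tensor-layout differences listed above suggest a quick way to tell the two layouts apart from tensor base names alone. The sketch below relies only on the names mentioned in the PR description; the function name and the idea of classifying by name are illustrative assumptions, and real files may contain further tensors that this ignores.

```python
# Tensor base names taken from the PR description's list of differences.
QWEN3NEXT_MARKERS = {"ssm_in", "ssm_beta_alpha"}                       # combined projections
QWEN35_MARKERS = {"attn_qkv", "attn_gate", "ssm_alpha", "ssm_beta"}    # separate projections


def guess_layout(base_names) -> str:
    """Classify a collection of tensor base names by which markers it contains."""
    names = set(base_names)
    has_next = bool(names & QWEN3NEXT_MARKERS)
    has_35 = bool(names & QWEN35_MARKERS)
    if has_35 and not has_next:
        return "qwen35-style"
    if has_next and not has_35:
        return "qwen3next-style"
    if has_next and has_35:
        return "mixed"
    return "unknown"
```

For example, a file whose layers expose `ssm_alpha` and `ssm_beta` separately would be reported as `qwen35-style`, which is the layout the vendored qwen3next graph cannot consume directly.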

Fixes #15898

Note: Will be superseded by #15122 when it lands, intended as a stopgap for users blocked on this

Changed files

llama/llama.cpp/src/llama-arch.cpp (modified, +67/-0)
llama/llama.cpp/src/llama-arch.h (modified, +4/-0)
llama/llama.cpp/src/llama-model.cpp (modified, +162/-0)
llama/llama.cpp/src/llama-model.h (modified, +5/-0)
llama/llama.cpp/src/models/models.h (modified, +51/-0)
llama/llama.cpp/src/models/qwen35.cpp (added, +749/-0)

Code Example

llama_model_load: error loading model: error loading model architecture:
unknown model architecture: 'qwen35moe'

---

FROM /path/to/Qwen_Qwen3.6-35B-A3B-IQ2_XS.gguf
FROM /path/to/mmproj-F16.gguf


Bug

Attaching an mmproj (vision projector) GGUF to a qwen35moe model fails with:

llama_model_load: error loading model: error loading model architecture:
unknown model architecture: 'qwen35moe'

This blocks ALL inference (text + vision) when an mmproj is attached via a dual-FROM Modelfile.

Reproduction

Reproduced on Kaggle T4x2 (2026-04-30) using:

Text GGUF: bartowski/Qwen_Qwen3.6-35B-A3B-GGUF (IQ2_XS)
mmproj: Youseff1987/Qwen3.6-35B-A3B-Claude-4.6-Opus-Reasoning-Distilled-GGUF-with-mmproj (mmproj-F16)

Modelfile:

FROM /path/to/Qwen_Qwen3.6-35B-A3B-IQ2_XS.gguf
FROM /path/to/mmproj-F16.gguf

ollama create succeeds, but any /api/generate call returns "unable to load model". The server log shows the architecture error above.
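The failing call can be reproduced with a short script against a local Ollama server. This is a sketch under assumptions: the server listens on the default `http://localhost:11434`, and the model name passed in is whatever name was given to `ollama create` (the issue does not state it).

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> dict:
    """Send the request; requires a running Ollama server."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())
```

When the model cannot be loaded, `urlopen` raises `urllib.error.HTTPError`, so catching that exception and printing its body is one way to surface the "unable to load model" response described above.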

Full reproduction notebook: https://github.com/ArkaD171717/Qwen3.6-Compat/blob/main/ollama/test_mmproj_clip_runner.ipynb

Root cause

Upstream ggml-org/llama.cpp already supports both architectures (LLM_ARCH_QWEN35, LLM_ARCH_QWEN35MOE).

Proposed fix

Sync qwen35/qwen35moe architecture support from upstream ggml-org/llama.cpp into:

llama/llama.cpp/src/llama-arch.h (enum entries)
llama/llama.cpp/src/llama-arch.cpp (name map + tensor maps)
llama/llama.cpp/src/llama-model.cpp (hparams + graph building)

Related issues

#14730 (same error, closed as dup of #14575)
#14575 (open, Qwen3.5 loading failures)
#15747 (same error on Ollama 0.21.0)
#15499 (same error, closed as dup of #14575)
#14517 (text runner fix, merged)

Extended analysis

TL;DR

Syncing qwen35 and qwen35moe architecture support from upstream ggml-org/llama.cpp into the vendored llama.cpp fork should resolve the model loading issue.

Guidance

Verify that the llama.cpp fork used by Ollama does not support qwen35 and qwen35moe architectures by checking the llama-arch.h and llama-arch.cpp files.
Update the llama-arch.h file to include LLM_ARCH_QWEN35 and LLM_ARCH_QWEN35MOE enum entries.
Update the llama-arch.cpp file to include name maps and tensor maps for qwen35 and qwen35moe architectures.
Update the llama-model.cpp file to include hparams and graph building support for qwen35 and qwen35moe architectures.
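The first verification step can be automated with a small text search. This is a rough sketch: it assumes the architecture names appear in `llama-arch.cpp` as quoted string literals such as `"qwen35moe"`, and the function name is illustrative.

```python
import re


def missing_architectures(arch_cpp_source: str,
                          names=("qwen35", "qwen35moe")) -> list:
    """Return the names that never appear as a quoted string literal in the source."""
    return [
        name for name in names
        if not re.search('"' + re.escape(name) + '"', arch_cpp_source)
    ]
```

Reading `llama/llama.cpp/src/llama-arch.cpp` from an Ollama checkout and passing its text to `missing_architectures` returns an empty list once both names are present; a non-empty result means the vendored table still lacks them.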

Example

No code snippet for the fix itself is provided, because the necessary changes are specific to the vendored llama.cpp fork and require careful updates to multiple files.

Notes

The proposed fix assumes that syncing the architecture support from upstream ggml-org/llama.cpp is sufficient. That has not been confirmed here; if other problems exist in the fallback path, loading could still fail after the sync.

Recommendation

Sync the qwen35 and qwen35moe architecture support from upstream ggml-org/llama.cpp into the vendored llama.cpp fork; this is the most direct fix for the problem.
