ollama - Any way to run large model locally instead of cloud? GLM, Deepseek v4, minimax, etc. [1 participant]

GitHub stats
ollama/ollama#15824 (fetched 2026-04-27 05:29:02)
Comments: 0
Participants: 1
Timeline: 1 (labeled ×1)
Reactions: 0

Error Message

It always results in an error, such as:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm-dsa'

Code Example

ollama run hf.co/<username>/<model-name>
ollama run hf.co/unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL
ollama run hf.co/0xSero/GLM-5-REAP-50pct-UD-IQ2_M-GGUF:UD-IQ2_M

---

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
#or
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm-dsa'

I have a high-performance server equipped with 8x NVIDIA A100 (40GB) GPUs, totaling 320GB of VRAM. I want to run large models like GLM and DeepSeek V4 locally, utilizing my full VRAM capacity, rather than using the cloud mode.

I have downloaded the quantized versions of these models from Hugging Face (HF), which are specifically optimized for local inference.

However, when I try to run them using commands such as:

ollama run hf.co/<username>/<model-name>
ollama run hf.co/unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL
ollama run hf.co/0xSero/GLM-5-REAP-50pct-UD-IQ2_M-GGUF:UD-IQ2_M

It always results in an error, such as:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
#or
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm-dsa'

My Question: Is there a way to force Ollama to load these models locally instead of attempting to use cloud mode?

Has anyone successfully run such large models locally using Ollama with Hugging Face checkpoints?

extent analysis

TL;DR

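The 'unknown model architecture' errors are printed by the local model loader (llama_model_load), which suggests Ollama is already attempting local inference and that the installed build does not recognize the 'qwen35moe' and 'glm-dsa' architectures. Check whether a newer Ollama release supports them, or use a different loader that does.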

Guidance

  • Confirm that the run is local. The llama_model_load messages are emitted while loading model files on the machine itself, so the failure most likely concerns architecture support rather than cloud mode.
  • Check the Hugging Face model cards for 'Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL' and 'GLM-5-REAP-50pct-UD-IQ2_M' for any stated runtime or version requirements, and for whether Ollama is listed as a supported loader.
  • Search the Ollama issue tracker for the architecture names 'qwen35moe' and 'glm-dsa' to see whether support is planned, in progress, or already released.
  • If nothing turns up, ask the Ollama community or developers about support for these specific architectures.
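
When searching the issue tracker it helps to know exactly which architecture names the loader rejected. The following is a minimal sketch, assuming the log lines have the same shape as the two errors quoted in the issue; the function name unsupported_architectures is hypothetical and not part of Ollama.

```python
import re

# Matches the loader message quoted in the issue and captures the quoted name.
ARCH_ERROR = re.compile(r"unknown model architecture: '([^']+)'")


def unsupported_architectures(log_text: str) -> list[str]:
    """Return the distinct architecture names reported as unknown, in first-seen order."""
    seen: dict[str, None] = {}
    for match in ARCH_ERROR.finditer(log_text):
        seen.setdefault(match.group(1), None)
    return list(seen)


if __name__ == "__main__":
    sample = (
        "llama_model_load: error loading model: error loading model architecture: "
        "unknown model architecture: 'qwen35moe'\n"
        "llama_model_load: error loading model: error loading model architecture: "
        "unknown model architecture: 'glm-dsa'\n"
    )
    print(unsupported_architectures(sample))
```

Each returned name can then be used as a search term in the issue tracker or in release notes.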

Notes

The error messages indicate that the loader does not recognize the architecture names declared by the model files. This points to missing architecture support in the installed Ollama build rather than to a problem with quantization or with the Hugging Face download itself.
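
The architecture name in the error comes from the model file: a GGUF file stores it in its metadata under the key 'general.architecture'. The sketch below reads that key directly from a local file, so the declared architecture can be checked without starting Ollama. It assumes a little-endian GGUF file of format version 2 or later and a local file path; the function name gguf_architecture is hypothetical. It only reports what the file declares and cannot tell which architectures a given Ollama build supports.

```python
import struct
from typing import BinaryIO, Optional

# Fixed-width GGUF metadata value types: type id -> struct format (little-endian).
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one fixed-width value described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """Read a GGUF string: uint64 byte length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8")


def _read_value(f: BinaryIO, vtype: int):
    """Read one metadata value of the given type id (arrays are read recursively)."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")


def gguf_architecture(path: str) -> Optional[str]:
    """Return the 'general.architecture' string of a GGUF file, or None if absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        version = _read(f, "<I")
        if version < 2:
            raise ValueError("GGUF version 1 uses a different header layout")
        _read(f, "<Q")             # tensor count, not needed here
        kv_count = _read(f, "<Q")  # number of metadata key/value pairs
        for _ in range(kv_count):
            key = _read_string(f)
            value = _read_value(f, _read(f, "<I"))
            if key == "general.architecture":
                return value
    return None
```

Calling gguf_architecture on a downloaded file returns the same string that appears in the quotes of the error message, which confirms that the name originates in the file's metadata.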

Recommendation

Apply workaround: update Ollama to the latest version, or use a different model loader that supports these architectures. The installed version of Ollama does not appear to support 'qwen35moe' or 'glm-dsa'.
