ollama - 💡(How to fix) Any way to run large model locally instead of cloud? GLM, Deepseek v4, minimax, etc. [1 participant]

ollama · 2026-04-26 14:32:49


ollama/ollama#15824•Fetched 2026-04-27 05:29:02


Author

berlin2123

Participants

berlin2123

Timeline (top)

labeled ×1

Error Message

It always results in an error, such as:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm-dsa'

Code Example

ollama run hf.co/<username>/<model-name>
ollama run hf.co/unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL
ollama run hf.co/0xSero/GLM-5-REAP-50pct-UD-IQ2_M-GGUF:UD-IQ2_M

---

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
# or
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm-dsa'


I have a high-performance server equipped with 8x NVIDIA A100 (40GB) GPUs, totaling 320GB of VRAM. I want to run large models like GLM and DeepSeek V4 locally, utilizing my full VRAM capacity, rather than using the cloud mode.

I have downloaded the quantized versions of these models from Hugging Face (HF), which are specifically optimized for local inference.

However, when I try to run them with commands such as:

ollama run hf.co/<username>/<model-name>
ollama run hf.co/unsloth/Qwen3.6-35B-A3B-GGUF:UD-Q8_K_XL
ollama run hf.co/0xSero/GLM-5-REAP-50pct-UD-IQ2_M-GGUF:UD-IQ2_M

It always results in an error, such as:

llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35moe'
# or
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'glm-dsa'

My Question: Is there a way to force Ollama to load these models locally instead of attempting to use cloud mode?

Has anyone successfully run such large models locally using Ollama with Hugging Face checkpoints?
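As a rough sanity check on the hardware described above, the following sketch estimates whether a quantized model's weights would fit in 320GB of VRAM. The helper names, the 8.5 bits-per-weight figure, and the 20% headroom are assumptions for illustration only; they are not taken from the model cards or from Ollama.

```python
def weights_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(weights_gb: float, total_vram_gb: float, headroom: float = 0.2) -> bool:
    """True if the weights fit after reserving a fraction of VRAM
    for the KV cache and runtime buffers (headroom is an assumed value)."""
    return weights_gb <= total_vram_gb * (1 - headroom)


total_vram_gb = 8 * 40  # 8x A100 (40GB), as stated in the issue

# 35B comes from the model name; 8.5 bits per weight is an assumed
# average for an 8-bit quantization, not a published figure.
qwen_gb = weights_size_gb(35, 8.5)

if __name__ == "__main__":
    print(f"estimated weights: {qwen_gb:.1f} GB of {total_vram_gb} GB")
    print("fits with headroom:", fits(qwen_gb, total_vram_gb))
```

Under these assumptions the Qwen checkpoint is far below the available VRAM, which is consistent with the failure being an architecture error rather than a capacity problem.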

extent analysis

TL;DR

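The errors carry the `llama_model_load` prefix, which suggests the models are already being loaded locally; the failure is that the loader does not recognize the 'qwen35moe' and 'glm-dsa' architectures. Check whether your Ollama version supports these architectures, and consider updating Ollama or using a different loader that does.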

Guidance

Confirm that the failure is not a local/cloud setting: the `llama_model_load` errors are raised while the GGUF file is being loaded on the local machine, so the blocker appears to be architecture support rather than cloud mode.
Check the Hugging Face model cards for 'unsloth/Qwen3.6-35B-A3B-GGUF' (tag UD-Q8_K_XL) and '0xSero/GLM-5-REAP-50pct-UD-IQ2_M-GGUF' (tag UD-IQ2_M) to see which loaders and versions they list as compatible.
Search the Ollama issue tracker for known limitations around the 'qwen35moe' and 'glm-dsa' architectures.
Consider reaching out to the Ollama community or developers to ask about support for these specific models and architectures.
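The architecture name in the error is read from the GGUF file's metadata key `general.architecture`, so it can be inspected before handing a file to any loader. The sketch below parses a GGUF header directly, assuming a little-endian file of version 2 or later; the function name `gguf_architecture` and the file path in the usage note are hypothetical.

```python
import struct
from typing import BinaryIO, Optional

# Fixed-size scalar value types in GGUF metadata: type id -> struct format.
_SCALARS = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
            6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
_STRING, _ARRAY = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one fixed-size value described by a struct format."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("unexpected end of file")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """Read a GGUF string: uint64 length followed by UTF-8 bytes."""
    length = _read(f, "<Q")
    data = f.read(length)
    if len(data) != length:
        raise ValueError("unexpected end of file")
    return data.decode("utf-8")


def _read_value(f: BinaryIO, vtype: int):
    """Read one metadata value of the given type id."""
    if vtype in _SCALARS:
        return _read(f, _SCALARS[vtype])
    if vtype == _STRING:
        return _read_string(f)
    if vtype == _ARRAY:
        item_type = _read(f, "<I")
        count = _read(f, "<Q")
        return [_read_value(f, item_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type: {vtype}")


def gguf_architecture(f: BinaryIO) -> Optional[str]:
    """Return the value of `general.architecture`, or None if absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError("this sketch only handles GGUF version 2 or later")
    _read(f, "<Q")  # tensor count, not needed here
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value = _read_value(f, _read(f, "<I"))
        if key == "general.architecture":
            return value
    return None
```

For a locally downloaded file this could be called as `gguf_architecture(open("model.gguf", "rb"))`, where `model.gguf` is a placeholder path. If the returned name matches the one in the error, the loader version is what needs to change, not the file.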

Notes

The error messages suggest that the loader in the installed Ollama version does not recognize the architecture names declared by these GGUF files. This points to missing architecture support rather than a problem with the quantization itself.
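When the error is buried in a longer server log, a small helper can pull out the architecture names to search for in the issue tracker. This is a minimal sketch that matches only the exact message format quoted in this issue; the function name `unknown_architectures` is hypothetical.

```python
import re
from typing import List

# Matches the message format quoted in the issue, e.g.
# "... unknown model architecture: 'qwen35moe'"
_ARCH_ERROR = re.compile(r"unknown model architecture: '([^']+)'")


def unknown_architectures(log_text: str) -> List[str]:
    """Return distinct architecture names reported as unknown,
    in order of first appearance."""
    seen: List[str] = []
    for name in _ARCH_ERROR.findall(log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    sample = (
        "llama_model_load: error loading model: error loading model "
        "architecture: unknown model architecture: 'qwen35moe'\n"
        "llama_model_load: error loading model: error loading model "
        "architecture: unknown model architecture: 'glm-dsa'\n"
    )
    print(unknown_architectures(sample))
```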

Recommendation

Apply workaround: update Ollama to the latest version, or use a different model loader that supports these architectures, since the installed version of Ollama does not appear to support 'qwen35moe' and 'glm-dsa'.


#latency issue #model loading #dependency error #configuration error #environment variable
