ollama: Fix 500 Internal Server Error when loading hf.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M via ollama run

GitHub stats
ollama/ollama#14631 (fetched 2026-04-08 00:33:34)
Comments: 0 · Participants: 1 · Reactions: 0
Timeline: 2 events (closed ×1, labeled ×1)

Error Message

I'm encountering a 500 Internal Server Error when attempting to run a quantized GGUF model from Hugging Face using Ollama. The server log reports:

arch ollama[1158318]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'


What is the issue?

I'm encountering a 500 Internal Server Error when attempting to run a quantized GGUF model from Hugging Face using Ollama.

arch ollama[1158318]: llama_model_loader: - type q4_K:  203 tensors
arch ollama[1158318]: llama_model_loader: - type q5_K:   24 tensors
arch ollama[1158318]: llama_model_loader: - type q6_K:   22 tensors
arch ollama[1158318]: print_info: file format = GGUF V3 (latest)
arch ollama[1158318]: print_info: file type   = Q4_K - Medium
arch ollama[1158318]: print_info: file size   = 2.51 GiB (5.13 BPW)
arch ollama[1158318]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
arch ollama[1158318]: llama_model_load_from_file_impl: failed to load model
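The line that matters is the `unknown model architecture` message. The following is a minimal sketch for pulling the rejected architecture name out of a log like the one above. It matches only the exact loader message shown in this issue. The helper name `unknown_architectures` is hypothetical.

```python
import re

# Matches the loader message from the log above, e.g.
# "unknown model architecture: 'qwen35'", and captures the name.
UNKNOWN_ARCH = re.compile(r"unknown model architecture: '([^']+)'")


def unknown_architectures(log_text: str) -> list[str]:
    """Return each distinct rejected architecture name, in order of first appearance."""
    names: list[str] = []
    for match in UNKNOWN_ARCH.finditer(log_text):
        if match.group(1) not in names:
            names.append(match.group(1))
    return names


sample = (
    "arch ollama[1158318]: print_info: file size   = 2.51 GiB (5.13 BPW)\n"
    "arch ollama[1158318]: llama_model_load: error loading model: "
    "error loading model architecture: unknown model architecture: 'qwen35'\n"
)
print(unknown_architectures(sample))  # ['qwen35']
```

To scan a real log, pass in the text of, for example, `journalctl -u ollama --no-pager`. This assumes Ollama runs as a systemd service, which the `ollama[1158318]` log prefix suggests but does not prove.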

Ollama version

0.17.6


Fix Plan

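The installed Ollama (0.17.6) does not recognize the `qwen35` (Qwen3.5) architecture declared in this GGUF file, so loading fails and the API returns a 500 error. The fix is to upgrade the Ollama server to a release that supports `qwen35`. The Ollama release notes show which release added it.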

Steps to Fix

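  • Upgrade the Ollama server. `pip install --upgrade ollama` only upgrades the Python client library and does not touch the server that loads models. On Linux, rerun the official install script and restart the service:
curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
  On macOS and Windows, install the latest release from https://ollama.com/download.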
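  • If the error persists, confirm that the running server was actually upgraded (`ollama -v` reports the server version and warns if the CLI version differs), then retry. Ollama has no option to force a model architecture, so there is nothing extra to pass on the command line:
ollama -v
ollama run hf.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M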
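The upgrade only helps if the server process itself is the new build. The following is a minimal sketch that asks the server for its version through the `/api/version` endpoint and compares it numerically against a minimum. It assumes the server listens on the default `localhost:11434`. The function names are hypothetical, and the minimum release with `qwen35` support is not stated in the issue, so any threshold you pass is your own assumption taken from the release notes.

```python
import json
import urllib.request

# Default address of a local Ollama server (assumption: OLLAMA_HOST is unchanged).
OLLAMA_URL = "http://localhost:11434"


def server_version(base_url: str = OLLAMA_URL) -> str:
    """Ask the running server (not the CLI or the pip client) for its version."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.17.6' or 'v0.17.7-rc1' into (0, 17, 6) / (0, 17, 7) for comparison."""
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def is_at_least(version: str, minimum: str) -> bool:
    # Tuple comparison is numeric per component, so "0.17.10" counts as newer than "0.17.6".
    return parse_version(version) >= parse_version(minimum)
```

Typical use is `is_at_least(server_version(), "X.Y.Z")`, where `X.Y.Z` is a placeholder for the first release whose notes mention `qwen35`. Comparing parsed tuples avoids the string-comparison trap where "0.17.10" sorts before "0.17.6".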

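Editing configuration will not help either. Ollama does not read an `ollama_config.json` file, and no setting or Modelfile instruction registers a new architecture. The loader takes the architecture name from the `general.architecture` key in the GGUF metadata and can only load architectures whose code is built into the installed release. Relabeling the file with another architecture name is not a supported workaround.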
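Ollama takes the architecture name from the `general.architecture` key in the GGUF metadata, so you can check what a file declares before loading it. The following is a minimal sketch that assumes the GGUF v2/v3 header layout from the GGUF specification: the magic `GGUF`, a uint32 version, uint64 tensor and key/value counts, then typed key/value pairs with uint64-length strings. All function names are hypothetical, and the file path is whatever local copy of the model you have.

```python
import io
import struct
import sys
from typing import BinaryIO, Optional

# GGUF metadata value types with a fixed size, mapped to little-endian
# struct formats (type ids as listed in the GGUF specification).
SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
STRING_TYPE, ARRAY_TYPE = 8, 9


def _read(fmt: str, f: BinaryIO):
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    # GGUF v2+ strings: uint64 byte length followed by UTF-8 bytes.
    length = _read("<Q", f)
    data = f.read(length)
    if len(data) != length:
        raise ValueError("truncated GGUF string")
    return data.decode("utf-8")


def _skip_value(f: BinaryIO, value_type: int) -> None:
    if value_type in SCALAR_FORMATS:
        f.seek(struct.calcsize(SCALAR_FORMATS[value_type]), io.SEEK_CUR)
    elif value_type == STRING_TYPE:
        _read_string(f)
    elif value_type == ARRAY_TYPE:
        elem_type = _read("<I", f)
        count = _read("<Q", f)
        if elem_type in SCALAR_FORMATS:
            # Fixed-size elements: jump over the whole array at once.
            f.seek(count * struct.calcsize(SCALAR_FORMATS[elem_type]), io.SEEK_CUR)
        else:
            for _ in range(count):
                _skip_value(f, elem_type)
    else:
        raise ValueError(f"unknown GGUF value type {value_type}")


def gguf_architecture(f: BinaryIO) -> Optional[str]:
    """Return general.architecture from a GGUF v2/v3 header, or None if absent."""
    if f.read(4) != b"GGUF":
        raise ValueError("not a GGUF file")
    version = _read("<I", f)
    if version < 2:
        raise ValueError(f"GGUF v{version} is not handled by this sketch")
    _read("<Q", f)  # tensor count (unused here)
    kv_count = _read("<Q", f)
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read("<I", f)
        if key == "general.architecture" and value_type == STRING_TYPE:
            return _read_string(f)
        _skip_value(f, value_type)
    return None


def architecture_from_bytes(header: bytes) -> Optional[str]:
    """Same check on bytes already in memory, e.g. the start of a download."""
    return gguf_architecture(io.BytesIO(header))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python gguf_arch.py /path/to/model.gguf
    with open(sys.argv[1], "rb") as fh:
        print(gguf_architecture(fh))
```

Run on the file from this issue, the script should print `qwen35`, the same name the loader rejected. The sketch reads only the metadata header and stops at the first matching key, so it does not load any tensors.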

Verification

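After upgrading, verify that the model loads by sending it a short prompt with the Ollama Python client. `pip install ollama` installs this client library, not the server:

import ollama
MODEL = "hf.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M"
try:
    reply = ollama.generate(model=MODEL, prompt="Reply with one word: ready?")
    print(reply["response"])
except ollama.ResponseError as err:
    print(err.status_code, err.error)

If the model loads, its reply is printed. If the server still cannot load it, the client raises `ollama.ResponseError`. A 500 status here, together with `unknown model architecture: 'qwen35'` in the server log, means the server has not been upgraded to a release that supports the architecture.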

Extra Tips

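  • Before pulling a GGUF for a newly released model family, check the Ollama release notes on GitHub. A file only loads if the installed release supports its architecture.
  • GGUF files pulled from Hugging Face with `hf.co/<user>/<repo>` are loaded as-is. A third-party quantization of an unsupported architecture fails with this same error regardless of its quantization type.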
