ollama - 💡(How to fix) Fix 500 Internal Server Error when loading hf.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M via ollama run [1 participants]

ollama · 2026-03-05 04:03:43


GitHub stats

Issue: ollama/ollama#14631 (fetched 2026-04-08 00:33:34)

Author: pinghe

Participants: pinghe

Timeline (top): closed ×1, labeled ×1

Error Message

Running a quantized GGUF model from Hugging Face with ollama run fails with a 500 Internal Server Error. The server log reports:

arch ollama[1158318]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'

Code Example

arch ollama[1158318]: llama_model_loader: - type q4_K:  203 tensors
arch ollama[1158318]: llama_model_loader: - type q5_K:   24 tensors
arch ollama[1158318]: llama_model_loader: - type q6_K:   22 tensors
arch ollama[1158318]: print_info: file format = GGUF V3 (latest)
arch ollama[1158318]: print_info: file type   = Q4_K - Medium
arch ollama[1158318]: print_info: file size   = 2.51 GiB (5.13 BPW)
arch ollama[1158318]: llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
arch ollama[1158318]: llama_model_load_from_file_impl: failed to load model
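When the journal holds more than these few lines, a small filter can pull out the facts that matter: the quantization, the file size, and the architecture name the loader rejected. This is a hypothetical helper, not part of Ollama, and its regular expressions assume the journald line format shown above.

```python
import re

# Hypothetical patterns; they assume lines shaped like the log excerpt above
# ("<host> ollama[<pid>]: <message>").
UNKNOWN_ARCH = re.compile(r"unknown model architecture: '([^']+)'")
FILE_TYPE = re.compile(r"print_info: file type\s*=\s*(.+)$")
FILE_SIZE = re.compile(r"print_info: file size\s*=\s*(.+)$")


def summarize_load_failure(log_text: str) -> dict:
    """Collect file type, file size and the unrecognized architecture from a log dump."""
    summary: dict = {"file_type": None, "file_size": None, "unknown_arch": None}
    for line in log_text.splitlines():
        if (m := UNKNOWN_ARCH.search(line)):
            summary["unknown_arch"] = m.group(1)
        elif (m := FILE_TYPE.search(line)):
            summary["file_type"] = m.group(1).strip()
        elif (m := FILE_SIZE.search(line)):
            summary["file_size"] = m.group(1).strip()
    return summary
```

Feeding it the output of journalctl -u ollama --no-pager for the failing run should yield qwen35 as the unknown architecture, alongside Q4_K - Medium and 2.51 GiB (5.13 BPW).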

Ollama version

0.17.6

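Analysis

The log shows that the file itself was read correctly (GGUF V3, Q4_K - Medium, 2.51 GiB), but the llama.cpp-based loader bundled with Ollama 0.17.6 did not recognize the architecture name 'qwen35' stored in the model's metadata. Loading therefore failed, and the server answered the run request with a 500 error.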

Fix Plan

Upgrade the Ollama server to a release that recognizes the qwen35 architecture; version 0.17.6 does not.

Steps to Fix

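Update the Ollama server itself. pip install --upgrade ollama only upgrades the Python client library; it does not change the server that loads models. On Linux, rerun the official install script (or upgrade through your distribution's package manager if that is how Ollama was installed), then restart the service and confirm the new version:

curl -fsSL https://ollama.com/install.sh | sh
sudo systemctl restart ollama
ollama --version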
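To confirm from a script that the running server (not the Python client) was upgraded, the server can be asked for its version. A minimal sketch, assuming the server listens on Ollama's default address http://localhost:11434 and exposes GET /api/version; the helper names are hypothetical, and the minimum version is left as a parameter on purpose: take it from the release notes, not from this sketch.

```python
import json
import urllib.request

# Ollama's default listen address; adjust if OLLAMA_HOST points elsewhere.
DEFAULT_BASE_URL = "http://localhost:11434"


def parse_version(text: str) -> tuple:
    """Turn '0.17.6' (or 'v0.17.6-rc1') into a comparable tuple like (0, 17, 6)."""
    core = text.strip().lstrip("v").split("-")[0]
    parts = [int(p) for p in core.split(".")]
    while len(parts) < 3:  # treat '0.18' as '0.18.0'
        parts.append(0)
    return tuple(parts)


def is_at_least(installed: str, required: str) -> bool:
    """True if the installed version equals or is newer than the required one."""
    return parse_version(installed) >= parse_version(required)


def server_version(base_url: str = DEFAULT_BASE_URL, timeout: float = 5.0) -> str:
    """Query the running server's version via GET /api/version."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=timeout) as resp:
        return json.load(resp)["version"]
```

On the machine from this report, server_version() should keep returning 0.17.6 until the server binary itself is replaced, which makes it a quick way to catch an upgrade that only touched the Python package.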

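If the error persists, note that Ollama has no option to force a model architecture at load time. The loader reads the architecture name from the GGUF file's general.architecture metadata key, and the installed version must already support that name. Check the release notes to confirm that your new version supports the qwen35 architecture.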
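The architecture name the loader rejects is stored in the GGUF file's own metadata under general.architecture, so it can be read directly from the header. The sketch below follows the published GGUF layout (magic bytes, version, tensor count, then typed key/value pairs) and stops at that key. It is a hypothetical helper, not an Ollama or llama.cpp API, it only handles GGUF v2/v3 headers, and the path to the downloaded .gguf blob is an assumption you supply.

```python
import struct
from typing import BinaryIO

GGUF_MAGIC = b"GGUF"
# Byte sizes of fixed-width GGUF value types: uint8, int8, uint16, int16,
# uint32, int32, float32, bool, uint64, int64, float64.
SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1, 10: 8, 11: 8, 12: 8}
STRING_TYPE, ARRAY_TYPE = 8, 9


def _read(f: BinaryIO, fmt: str):
    """Read one little-endian value described by a struct format string."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if len(data) != size:
        raise ValueError("truncated GGUF header")
    return struct.unpack(fmt, data)[0]


def _read_string(f: BinaryIO) -> str:
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    return f.read(_read(f, "<Q")).decode("utf-8")


def _skip_value(f: BinaryIO, value_type: int) -> None:
    """Advance past a metadata value we are not interested in."""
    if value_type in SCALAR_SIZES:
        f.read(SCALAR_SIZES[value_type])
    elif value_type == STRING_TYPE:
        f.read(_read(f, "<Q"))
    elif value_type == ARRAY_TYPE:
        item_type, count = _read(f, "<I"), _read(f, "<Q")
        for _ in range(count):
            _skip_value(f, item_type)
    else:
        raise ValueError(f"unknown GGUF value type {value_type}")


def gguf_architecture(f: BinaryIO) -> str:
    """Return the general.architecture string from a GGUF v2/v3 stream."""
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version = _read(f, "<I")
    if version < 2:
        raise ValueError(f"GGUF v{version} uses 32-bit counts; not handled here")
    _read(f, "<Q")  # tensor count, unused here
    kv_count = _read(f, "<Q")
    for _ in range(kv_count):
        key = _read_string(f)
        value_type = _read(f, "<I")
        if key == "general.architecture" and value_type == STRING_TYPE:
            return _read_string(f)
        _skip_value(f, value_type)
    raise KeyError("general.architecture not found")


def architecture_of_file(path: str) -> str:
    """Convenience wrapper for a .gguf file on disk."""
    with open(path, "rb") as f:
        return gguf_architecture(f)
```

On the file from this report, architecture_of_file would be expected to return 'qwen35', the same name that appears in the loader error.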

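Editing configuration will not add the missing architecture either: Ollama has no ollama_config.json (or Modelfile instruction) for registering architectures, because support for each architecture is built into the runtime. The tensor counts in the log (203 q4_K, 24 q5_K and 22 q6_K tensors) describe the file's quantization mix, not an architecture definition. If you cannot upgrade, use a model whose architecture your installed version already supports.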

Verification

After upgrading, verify that the model loads by running it again:

ollama run hf.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M

If the model loads, ollama run opens an interactive prompt instead of failing with a 500 error, and journalctl -u ollama no longer shows "unknown model architecture".
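The same check can be scripted against the REST API. A minimal sketch, assuming the default server address and the non-streaming form of POST /api/generate; the function names are hypothetical and the prompt is arbitrary.

```python
import json
import urllib.error
import urllib.request

MODEL = "hf.co/HauhauCS/Qwen3.5-4B-Uncensored-HauhauCS-Aggressive:Q4_K_M"


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming POST /api/generate request (data present => POST)."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(f"{base_url}/api/generate", data=body,
                                  headers={"Content-Type": "application/json"})


def model_loads(model: str = MODEL, timeout: float = 300.0) -> tuple:
    """Return (ok, detail); a generous timeout allows for the first model load."""
    try:
        with urllib.request.urlopen(build_generate_request(model, "Say hi."),
                                    timeout=timeout) as resp:
            return True, json.load(resp).get("response", "")
    except urllib.error.HTTPError as err:
        # A load failure surfaces as an HTTP error status (500 in this report);
        # the body carries the server's error message.
        return False, f"HTTP {err.code}: {err.read().decode('utf-8', 'replace')}"
```

Once the architecture is supported, model_loads() should return True with the model's reply; on 0.17.6 it would be expected to return False with the 500 status and the server's message.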

Extra Tips

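Before pulling a GGUF for a newly released model family, check the Ollama release notes for support of its architecture.
For community GGUF conversions on Hugging Face, the architecture is whatever the converter wrote into the file, so a new model may need a newer Ollama release than the one you have installed.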
