ollama - 💡(How to fix) Fix ollama run hf.co/lmstudio-community/Qwen3.5-9B-GGUF:Q8_0 - Error: 500 Internal Server Error: unable to load model [1 participant]

ollama 2026-03-05 09:58:35

GitHub stats

ollama/ollama#14636 • Fetched 2026-04-08 00:33:28

Author: DjceUo

Participants: DjceUo

Timeline (top): closed ×1, labeled ×1

Error Message

time=2026-03-05T12:56:27.925+03:00 level=ERROR source=ui.go:1524 msg="failed to get inference info" error="timeout scanning server log for inference compute details"
time=2026-03-05T12:56:27.925+03:00 level=ERROR source=ui.go:241 msg=site.serveHTTP error="failed to get inference info: timeout scanning server log for inference compute details" http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=500 http.d=502.9763ms request_id=1772704587422361700 version=0.17.6
time=2026-03-05T12:56:27.209+03:00 level=WARN source=runner.go:485 msg="user overrode visible devices" CUDA_VISIBLE_DEVICES=0
time=2026-03-05T12:56:27.209+03:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
time=2026-03-05T12:56:35.697+03:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=e:.ollama\models\blobs\sha256-096a84703a334662086e4b46b6a4dd896bad10a46f67cd84a53cfe1420b24717 error="unable to load model: e:\.ollama\models\blobs\sha256-096a84703a334662086e4b46b6a4dd896bad10a46f67cd84a53cfe1420b24717"

What is the issue?

app.log
time=2026-03-05T12:56:26.020+03:00 level=INFO source=app_windows.go:282 msg="starting Ollama" app=C:\Ollama version=0.17.6 OS=Windows/10.0.19044
time=2026-03-05T12:56:26.046+03:00 level=INFO source=app.go:239 msg="initialized tools registry" tool_count=0
time=2026-03-05T12:56:26.061+03:00 level=INFO source=app.go:254 msg="starting ollama server"
time=2026-03-05T12:56:26.061+03:00 level=INFO source=app.go:285 msg="starting ui server" port=58498
time=2026-03-05T12:56:27.413+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=1.1008ms request_id=1772704587412289300 version=0.17.6
time=2026-03-05T12:56:27.420+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/cloud http.pattern="GET /api/v1/cloud" http.status=200 http.d=503.9µs request_id=1772704587419737900 version=0.17.6
time=2026-03-05T12:56:27.430+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/chats http.pattern="GET /api/v1/chats" http.status=200 http.d=999.9µs request_id=1772704587429359300 version=0.17.6
time=2026-03-05T12:56:27.847+03:00 level=INFO source=ui.go:159 msg="configuring ollama proxy" target=http://127.0.0.1:11434
time=2026-03-05T12:56:27.925+03:00 level=ERROR source=ui.go:1524 msg="failed to get inference info" error="timeout scanning server log for inference compute details"
time=2026-03-05T12:56:27.925+03:00 level=ERROR source=ui.go:241 msg=site.serveHTTP error="failed to get inference info: timeout scanning server log for inference compute details" http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=500 http.d=502.9763ms request_id=1772704587422361700 version=0.17.6
time=2026-03-05T12:56:28.934+03:00 level=INFO source=server.go:362 msg=Matched "inference compute"="{Library:CUDA Variant: Compute:8.6 Driver:12.7 Name:CUDA0 VRAM:24.0 GiB}"
time=2026-03-05T12:56:28.936+03:00 level=INFO source=server.go:373 msg="Matched default context length" default_num_ctx=32768
time=2026-03-05T12:56:28.936+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/inference-compute http.pattern="GET /api/v1/inference-compute" http.status=200 http.d=1.7921ms request_id=1772704588934752100 version=0.17.6
time=2026-03-05T12:56:28.948+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=0s request_id=1772704588948846200 version=0.17.6
time=2026-03-05T12:56:28.949+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=303.4µs request_id=1772704588949359800 version=0.17.6
time=2026-03-05T12:56:28.949+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=0s request_id=1772704588949663200 version=0.17.6
time=2026-03-05T12:56:28.950+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=44.7µs request_id=1772704588950168100 version=0.17.6
time=2026-03-05T12:56:28.950+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=37.2µs request_id=1772704588950175600 version=0.17.6
time=2026-03-05T12:56:28.952+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=POST http.path=/api/v1/settings http.pattern="POST /api/v1/settings" http.status=200 http.d=577.1µs request_id=1772704588952348400 version=0.17.6
time=2026-03-05T12:56:28.957+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1772704588957892800 version=0.17.6
time=2026-03-05T12:56:28.962+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1772704588962655400 version=0.17.6
time=2026-03-05T12:56:28.965+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1772704588965601700 version=0.17.6
time=2026-03-05T12:56:28.968+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=297µs request_id=1772704588968113100 version=0.17.6
time=2026-03-05T12:56:28.971+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=114.5µs request_id=1772704588970951300 version=0.17.6
time=2026-03-05T12:56:28.973+03:00 level=INFO source=ui.go:241 msg=site.serveHTTP http.method=GET http.path=/api/v1/settings http.pattern="GET /api/v1/settings" http.status=200 http.d=0s request_id=1772704588973674900 version=0.17.6
time=2026-03-05T12:56:29.062+03:00 level=INFO source=updater.go:296 msg="beginning update checker" interval=1h0m0s

server.log
time=2026-03-05T12:56:27.143+03:00 level=INFO source=routes.go:1664 msg="server config" env="map[CUDA_VISIBLE_DEVICES:0 GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:true OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:12h0m0s OLLAMA_KV_CACHE_TYPE:q8_0 OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:1 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:e:\.ollama\models\ OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[* http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-05T12:56:27.165+03:00 level=INFO source=routes.go:1666 msg="Ollama cloud disabled: false"
time=2026-03-05T12:56:27.178+03:00 level=INFO source=images.go:477 msg="total blobs: 180"
time=2026-03-05T12:56:27.186+03:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-05T12:56:27.191+03:00 level=INFO source=routes.go:1719 msg="Listening on 127.0.0.1:11434 (version 0.17.6)"
time=2026-03-05T12:56:27.191+03:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-05T12:56:27.209+03:00 level=WARN source=runner.go:485 msg="user overrode visible devices" CUDA_VISIBLE_DEVICES=0
time=2026-03-05T12:56:27.209+03:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2026-03-05T12:56:27.218+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Ollama\ollama.exe runner --ollama-engine --port 58603"
time=2026-03-05T12:56:27.318+03:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled. To enable, set OLLAMA_VULKAN=1"
time=2026-03-05T12:56:27.319+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Ollama\ollama.exe runner --ollama-engine --port 58614"
time=2026-03-05T12:56:27.522+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Ollama\ollama.exe runner --ollama-engine --port 58628"
time=2026-03-05T12:56:27.661+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Ollama\ollama.exe runner --ollama-engine --port 58635"
time=2026-03-05T12:56:27.847+03:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-43d944cd-e7c6-c3e9-4414-6ab0ef6870b2 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v12 driver=12.7 pci_id=0000:01:00.0 type=discrete total="24.0 GiB" available="22.5 GiB"
time=2026-03-05T12:56:27.847+03:00 level=INFO source=routes.go:1769 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768
[GIN] 2026/03/05 - 12:56:27 | 200 | 0s | 127.0.0.1 | GET "/api/version"
[GIN] 2026/03/05 - 12:56:27 | 200 | 0s | 127.0.0.1 | GET "/api/version"
[GIN] 2026/03/05 - 12:56:27 | 200 | 0s | 127.0.0.1 | GET "/api/version"
[GIN] 2026/03/05 - 12:56:27 | 200 | 12.5085ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2026/03/05 - 12:56:28 | 401 | 215.3313ms | 127.0.0.1 | POST "/api/me"
[GIN] 2026/03/05 - 12:56:28 | 401 | 226.9811ms | 127.0.0.1 | POST "/api/me"
[GIN] 2026/03/05 - 12:56:28 | 404 | 6.3582ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/03/05 - 12:56:34 | 200 | 0s | 127.0.0.1 | HEAD "/"
[GIN] 2026/03/05 - 12:56:34 | 200 | 155.5336ms | 127.0.0.1 | POST "/api/show"
[GIN] 2026/03/05 - 12:56:35 | 200 | 152.2485ms | 127.0.0.1 | POST "/api/show"
time=2026-03-05T12:56:35.311+03:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Ollama\ollama.exe runner --ollama-engine --port 58658"
time=2026-03-05T12:56:35.497+03:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-05T12:56:35.497+03:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16
llama_model_loader: loaded meta data with 34 key-value pairs and 427 tensors from e:.ollama\models\blobs\sha256-096a84703a334662086e4b46b6a4dd896bad10a46f67cd84a53cfe1420b24717 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen35
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen_Qwen3.5 9B
llama_model_loader: - kv 3: general.basename str = Qwen_Qwen3.5
llama_model_loader: - kv 4: general.size_label str = 9B
llama_model_loader: - kv 5: qwen35.block_count u32 = 32
llama_model_loader: - kv 6: qwen35.context_length u32 = 262144
llama_model_loader: - kv 7: qwen35.embedding_length u32 = 4096
llama_model_loader: - kv 8: qwen35.feed_forward_length u32 = 12288
llama_model_loader: - kv 9: qwen35.attention.head_count u32 = 16
llama_model_loader: - kv 10: qwen35.attention.head_count_kv u32 = 4
llama_model_loader: - kv 11: qwen35.rope.dimension_sections arr[i32,4] = [11, 11, 10, 0]
llama_model_loader: - kv 12: qwen35.rope.freq_base f32 = 10000000.000000
llama_model_loader: - kv 13: qwen35.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 14: qwen35.attention.key_length u32 = 256
llama_model_loader: - kv 15: qwen35.attention.value_length u32 = 256
llama_model_loader: - kv 16: qwen35.ssm.conv_kernel u32 = 4
llama_model_loader: - kv 17: qwen35.ssm.state_size u32 = 128
llama_model_loader: - kv 18: qwen35.ssm.group_count u32 = 16
llama_model_loader: - kv 19: qwen35.ssm.time_step_rank u32 = 32
llama_model_loader: - kv 20: qwen35.ssm.inner_size u32 = 4096
llama_model_loader: - kv 21: qwen35.full_attention_interval u32 = 4
llama_model_loader: - kv 22: qwen35.rope.dimension_count u32 = 64
llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 24: tokenizer.ggml.pre str = qwen35
llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,248320] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,248320] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 27: tokenizer.ggml.merges arr[str,247587] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 28: tokenizer.ggml.eos_token_id u32 = 248046
llama_model_loader: - kv 29: tokenizer.ggml.padding_token_id u32 = 248044
llama_model_loader: - kv 30: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 31: tokenizer.chat_template str = {%- set image_count = namespace(value...
llama_model_loader: - kv 32: general.quantization_version u32 = 2
llama_model_loader: - kv 33: general.file_type u32 = 7
llama_model_loader: - type f32: 177 tensors
llama_model_loader: - type q8_0: 250 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 8.86 GiB (8.50 BPW)
llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'
llama_model_load_from_file_impl: failed to load model
time=2026-03-05T12:56:35.697+03:00 level=INFO source=sched.go:471 msg="NewLlamaServer failed" model=e:.ollama\models\blobs\sha256-096a84703a334662086e4b46b6a4dd896bad10a46f67cd84a53cfe1420b24717 error="unable to load model: e:\.ollama\models\blobs\sha256-096a84703a334662086e4b46b6a4dd896bad10a46f67cd84a53cfe1420b24717"
[GIN] 2026/03/05 - 12:56:35 | 500 | 568.7333ms | 127.0.0.1 | POST "/api/generate"

Relevant log output

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.17.6

extent analysis

Fix Plan

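The model fails to load because the loader in Ollama 0.17.6 does not recognize the architecture named in the GGUF metadata (general.architecture = qwen35) and aborts with "unknown model architecture: 'qwen35'". The server log shows the file's metadata being read (34 key-value pairs, 427 tensors) before the failure, which suggests the download is intact and the architecture is simply unsupported by this loader. The 500 on POST "/api/generate" is a consequence of that load failure.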
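To confirm that a given server.log shows this same failure, the loader message can be searched for directly. The following is a minimal sketch, assuming the message format seen in the Error Message section; the function name unknownArchitectures is hypothetical and not part of Ollama.

```go
package main

import (
	"fmt"
	"regexp"
)

// archErr matches the loader message seen in server.log and captures
// the architecture name between the single quotes.
var archErr = regexp.MustCompile(`unknown model architecture: '([^']+)'`)

// unknownArchitectures returns each distinct architecture name that the
// log text reports as unknown, in order of first appearance.
func unknownArchitectures(logText string) []string {
	var found []string
	seen := map[string]bool{}
	for _, m := range archErr.FindAllStringSubmatch(logText, -1) {
		if !seen[m[1]] {
			seen[m[1]] = true
			found = append(found, m[1])
		}
	}
	return found
}

func main() {
	sample := "llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'qwen35'"
	fmt.Println(unknownArchitectures(sample)) // prints [qwen35]
}
```

In practice the log text would come from reading server.log from disk; an empty result means the load failed for some other reason.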

There are two ways to resolve this:

Run the model with a loader that supports the 'qwen35' architecture:
- Check whether a newer Ollama release adds support for this architecture, and upgrade if it does.
- If you build Ollama from source, the loader itself has to be extended to handle the new architecture.
Use a different model that is supported:
- Choose a model whose architecture the installed loader recognizes.
- Point the ollama run command (or your configuration) at that model instead.

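Illustrative sketch of how a loader might dispatch on the architecture name. This is a simplified, hypothetical example, not Ollama's actual loader code:

package main

import (
	"errors"
	"fmt"
)

// loadModel rejects any architecture name it has no case for.
func loadModel(architecture string) error {
	switch architecture {
	case "qwen35":
		// Architecture-specific loading for 'qwen35' would go here.
		return nil
	default:
		return errors.New("unknown model architecture: " + architecture)
	}
}

func main() {
	fmt.Println(loadModel("qwen35"))
}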

Verification

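To verify the fix, restart the Ollama server and run ollama run hf.co/lmstudio-community/Qwen3.5-9B-GGUF:Q8_0 again. The server already starts cleanly in the failing case, so a clean start proves nothing; the check is the model load itself. In server.log, the "unknown model architecture: 'qwen35'" line and the "NewLlamaServer failed" entry should no longer appear, and POST "/api/generate" should return 200 instead of 500.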
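The same check can be scripted against the local API. This is a minimal sketch, assuming the server listens on 127.0.0.1:11434 as in the log and that error responses carry a JSON "error" field; the helper names describe and checkModel are hypothetical, and the five-minute timeout simply mirrors OLLAMA_LOAD_TIMEOUT from the server config.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"time"
)

// generateRequest holds the /api/generate request fields used in this sketch.
type generateRequest struct {
	Model  string `json:"model"`
	Prompt string `json:"prompt"`
	Stream bool   `json:"stream"`
}

// describe turns an HTTP status and response body into a one-line verdict.
func describe(status int, body []byte) string {
	if status == http.StatusOK {
		return "ok: model loaded and responded"
	}
	var e struct {
		Error string `json:"error"`
	}
	if json.Unmarshal(body, &e) == nil && e.Error != "" {
		return fmt.Sprintf("failed (%d): %s", status, e.Error)
	}
	return fmt.Sprintf("failed (%d)", status)
}

// checkModel sends one short, non-streaming prompt and reports the outcome.
func checkModel(baseURL, model string) (string, error) {
	payload, err := json.Marshal(generateRequest{Model: model, Prompt: "hi", Stream: false})
	if err != nil {
		return "", err
	}
	client := &http.Client{Timeout: 5 * time.Minute}
	resp, err := client.Post(baseURL+"/api/generate", "application/json", bytes.NewReader(payload))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return describe(resp.StatusCode, body), nil
}

func main() {
	verdict, err := checkModel("http://127.0.0.1:11434", "hf.co/lmstudio-community/Qwen3.5-9B-GGUF:Q8_0")
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	fmt.Println(verdict)
}
```

A "failed (500)" verdict that still mentions "unable to load model" means the loader in use still rejects the file.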

Extra Tips

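The CUDA_VISIBLE_DEVICES warnings and the early "timeout scanning server log for inference compute details" error are not the cause of this failure: the log shows the RTX 3090 was discovered through CUDA, and the same inference-compute request returned 200 about a second later.
Note the installed version (0.17.6 in this report) before and after any upgrade so you know which loader you are testing.
If the error persists, search server.log for "unknown model architecture" to confirm it is still the same failure and not a different load error.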
