ollama - 💡(How to fix) Fix Qwen3.5:9b rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98 ggml_cuda_compute_forward: SOLVE_TRI failed ROCm error: invalid device function [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
ollama/ollama#15343Fetched 2026-04-08 02:52:30
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Participants
Timeline (top)
closed ×1cross-referenced ×1labeled ×1

Error Message

time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:264 msg="refreshing free memory" time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery" time=2026-03-30T13:42:49.544-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 56996" time=2026-03-30T13:42:49.544-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\Program Files\AMD\ROCm\6.4\" HIP_PATH_64="C:\Program Files\AMD\ROCm\6.4\" HIP_PATH_71="C:\Program Files\AMD\ROCm\7.1\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Users\DaveyBoneZ\AppData\Local\AMD\AI_Bundle\VSCode\bin;C:\Program Files\Git\cmd;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\PowerShell\7\;C:\Program Files\AMD\ROCm\7.1\bin;C:\Users\DaveyBoneZ\AppData\Local\Programs\Python\Launcher\;C:\Users\DaveyBoneZ\AppData\Local\Microsoft\WindowsApps;C:\Users\DaveyBoneZ\AppData\Local\AMD\AI_Bundle\Ollama;C:\Users\DaveyBoneZ\.lmstudio\bin;C:\Users\DaveyBoneZ\AppData\Local\Python\bin;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0 time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=367.4097ms OLLAMA_LIBRARY_PATH="[C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:0] time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=367.9302ms time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:148 msg=packages count=1 time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16 time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:229 msg="loading first model" model=C:\Users\DaveyBoneZ.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c time=2026-03-30T13:42:49.977-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-30T13:42:50.013-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0 time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0 time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0 time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default="" time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default="" time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-03-30T13:42:50.017-04:00 level=INFO source=server.go:247 msg="enabling flash attention" time=2026-03-30T13:42:50.018-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 57002" time=2026-03-30T13:42:50.018-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\Program Files\AMD\ROCm\6.4\" HIP_PATH_64="C:\Program Files\AMD\ROCm\6.4\" HIP_PATH_71="C:\Program Files\AMD\ROCm\7.1\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Users\DaveyBoneZ\AppData\Local\AMD\AI_Bundle\VSCode\bin;C:\Program Files\Git\cmd;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\PowerShell\7\;C:\Program Files\AMD\ROCm\7.1\bin;C:\Users\DaveyBoneZ\AppData\Local\Programs\Python\Launcher\;C:\Users\DaveyBoneZ\AppData\Local\Microsoft\WindowsApps;C:\Users\DaveyBoneZ\AppData\Local\AMD\AI_Bundle\Ollama;C:\Users\DaveyBoneZ\.lmstudio\bin;C:\Users\DaveyBoneZ\AppData\Local\Python\bin;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0 time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:484 msg="system memory" total="31.9 GiB" free="22.8 GiB" free_swap="25.3 GiB" time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=ROCm available="14.4 GiB" free="14.8 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-03-30T13:42:50.021-04:00 level=INFO source=server.go:759 msg="loading model" "model layers"=33 requested=-1 time=2026-03-30T13:42:50.051-04:00 level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-30T13:42:50.052-04:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:57002" time=2026-03-30T13:42:50.063-04:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:33[ID:0 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-03-30T13:42:50.100-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-30T13:42:50.103-04:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52 time=2026-03-30T13:42:50.103-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama load_backend: loaded CPU backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll time=2026-03-30T13:42:50.116-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 ROCm devices: Device 0: AMD Radeon RX 9060 XT, gfx1200 (0x1200), VMM: no, Wave Size: 32, ID: 0 load_backend: loaded ROCm backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll time=2026-03-30T13:42:50.142-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang) time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default="" time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default="" time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-03-30T13:42:50.544-04:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1 rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98 ggml_cuda_compute_forward: SOLVE_TRI failed ROCm error: invalid device function current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882 err C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post "http://127.0.0.1:57002/load\": read tcp 127.0.0.1:57007->127.0.0.1:57002: wsarecv: An existing connection was forcibly closed by the remote host." time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post "http://127.0.0.1:57002/load\": dial tcp 127.0.0.1:57002: connectex: No connection could be made because the target machine actively refused it." time=2026-03-30T13:42:51.743-04:00 level=INFO source=sched.go:511 msg="Load failed" model=C:\Users\DaveyBoneZ.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details" time=2026-03-30T13:42:51.743-04:00 level=DEBUG source=server.go:1832 msg="stopping llama server" pid=2444 [GIN] 2026/03/30 - 13:42:51 | 500 | 2.3317189s | 127.0.0.1 | POST "/api/chat" time=2026-03-30T13:42:51.764-04:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1" [GIN] 2026/03/30 - 13:43:19 | 200 | 1.5188ms | 127.0.0.1 | GET "/api/tags" [GIN] 2026/03/30 - 13:43:50 | 200 | 505.2µs | 127.0.0.1 | GET "/api/tags"

Root Cause

time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-30T13:42:49.544-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 56996"
time=2026-03-30T13:42:49.544-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=367.4097ms OLLAMA_LIBRARY_PATH="[C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:0]
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=367.9302ms
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:229 msg="loading first model" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c
time=2026-03-30T13:42:49.977-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.013-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.017-04:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-03-30T13:42:50.018-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\DaveyBoneZ\\.ollama\\models\\blobs\\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 57002"
time=2026-03-30T13:42:50.018-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:484 msg="system memory" total="31.9 GiB" free="22.8 GiB" free_swap="25.3 GiB"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=ROCm available="14.4 GiB" free="14.8 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=server.go:759 msg="loading model" "model layers"=33 requested=-1
time=2026-03-30T13:42:50.051-04:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-30T13:42:50.052-04:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:57002"
time=2026-03-30T13:42:50.063-04:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:33[ID:0 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-30T13:42:50.100-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-30T13:42:50.103-04:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
time=2026-03-30T13:42:50.103-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2026-03-30T13:42:50.116-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 9060 XT, gfx1200 (0x1200), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-03-30T13:42:50.142-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.544-04:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": read tcp 127.0.0.1:57007->127.0.0.1:57002: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": dial tcp 127.0.0.1:57002: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-30T13:42:51.743-04:00 level=INFO source=sched.go:511 msg="Load failed" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-30T13:42:51.743-04:00 level=DEBUG source=server.go:1832 msg="stopping llama server" pid=2444
[GIN] 2026/03/30 - 13:42:51 | 500 |    2.3317189s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-30T13:42:51.764-04:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/30 - 13:43:19 | 200 |      1.5188ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/30 - 13:43:50 | 200 |       505.2µs |       127.0.0.1 | GET      "/api/tags"

Code Example

time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-30T13:42:49.544-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 56996"
time=2026-03-30T13:42:49.544-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=367.4097ms OLLAMA_LIBRARY_PATH="[C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:0]
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=367.9302ms
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:229 msg="loading first model" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c
time=2026-03-30T13:42:49.977-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.013-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.017-04:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-03-30T13:42:50.018-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\DaveyBoneZ\\.ollama\\models\\blobs\\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 57002"
time=2026-03-30T13:42:50.018-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:484 msg="system memory" total="31.9 GiB" free="22.8 GiB" free_swap="25.3 GiB"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=ROCm available="14.4 GiB" free="14.8 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=server.go:759 msg="loading model" "model layers"=33 requested=-1
time=2026-03-30T13:42:50.051-04:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-30T13:42:50.052-04:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:57002"
time=2026-03-30T13:42:50.063-04:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:33[ID:0 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-30T13:42:50.100-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-30T13:42:50.103-04:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
time=2026-03-30T13:42:50.103-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2026-03-30T13:42:50.116-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 9060 XT, gfx1200 (0x1200), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-03-30T13:42:50.142-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.544-04:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": read tcp 127.0.0.1:57007->127.0.0.1:57002: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": dial tcp 127.0.0.1:57002: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-30T13:42:51.743-04:00 level=INFO source=sched.go:511 msg="Load failed" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-30T13:42:51.743-04:00 level=DEBUG source=server.go:1832 msg="stopping llama server" pid=2444
[GIN] 2026/03/30 - 13:42:51 | 500 |    2.3317189s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-30T13:42:51.764-04:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/30 - 13:43:19 | 200 |      1.5188ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/30 - 13:43:50 | 200 |       505.2µs |       127.0.0.1 | GET      "/api/tags"
RAW_BUFFERClick to expand / collapse

What is the issue?

Error 500 Internal Server Error: model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details

windows 11 GPU: RX 9060 XT only happens with Qwen models.

Relevant log output

time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-30T13:42:49.541-04:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-30T13:42:49.544-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 56996"
time=2026-03-30T13:42:49.544-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=367.4097ms OLLAMA_LIBRARY_PATH="[C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[HIP_VISIBLE_DEVICES:0]
time=2026-03-30T13:42:49.909-04:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=367.9302ms
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-30T13:42:49.910-04:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=8 efficiency=0 threads=16
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-03-30T13:42:49.910-04:00 level=DEBUG source=sched.go:229 msg="loading first model" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c
time=2026-03-30T13:42:49.977-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.013-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.016-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.017-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.017-04:00 level=INFO source=server.go:247 msg="enabling flash attention"
time=2026-03-30T13:42:50.018-04:00 level=INFO source=server.go:432 msg="starting runner" cmd="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\DaveyBoneZ\\.ollama\\models\\blobs\\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 57002"
time=2026-03-30T13:42:50.018-04:00 level=DEBUG source=server.go:433 msg=subprocess HIP_PATH="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_64="C:\\Program Files\\AMD\\ROCm\\6.4\\" HIP_PATH_71="C:\\Program Files\\AMD\\ROCm\\7.1\\" OLLAMA_CONTEXT_LENGTH=32768 OLLAMA_DEBUG=1 OLLAMA_FLASH_ATTENTION=1 OLLAMA_HOST=0.0.0.0 OLLAMA_KV_CACHE_TYPE=q8_0 PATH="C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\VSCode\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\PowerShell\\7\\;C:\\Program Files\\AMD\\ROCm\\7.1\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Python\\Launcher\\;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\DaveyBoneZ\\AppData\\Local\\AMD\\AI_Bundle\\Ollama;C:\\Users\\DaveyBoneZ\\.lmstudio\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Python\\bin;C:\\Users\\DaveyBoneZ\\AppData\\Local\\Programs\\Ollama" OLLAMA_LIBRARY_PATH=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm HIP_VISIBLE_DEVICES=0
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:484 msg="system memory" total="31.9 GiB" free="22.8 GiB" free_swap="25.3 GiB"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=ROCm available="14.4 GiB" free="14.8 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-30T13:42:50.021-04:00 level=INFO source=server.go:759 msg="loading model" "model layers"=33 requested=-1
time=2026-03-30T13:42:50.051-04:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-30T13:42:50.052-04:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:57002"
time=2026-03-30T13:42:50.063-04:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType:q8_0 NumThreads:8 GPULayers:33[ID:0 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-30T13:42:50.100-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-30T13:42:50.102-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-30T13:42:50.103-04:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
time=2026-03-30T13:42:50.103-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-haswell.dll
time=2026-03-30T13:42:50.116-04:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon RX 9060 XT, gfx1200 (0x1200), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\DaveyBoneZ\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-03-30T13:42:50.142-04:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-30T13:42:50.146-04:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-30T13:42:50.544-04:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
rocBLAS error from hip error code: 'hipErrorInvalidDeviceFunction':98
ggml_cuda_compute_forward: SOLVE_TRI failed
ROCm error: invalid device function
  current device: 0, in function ggml_cuda_compute_forward at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:2882
  err
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": read tcp 127.0.0.1:57007->127.0.0.1:57002: wsarecv: An existing connection was forcibly closed by the remote host."
time=2026-03-30T13:42:51.742-04:00 level=ERROR source=server.go:1207 msg="do load request" error="Post \"http://127.0.0.1:57002/load\": dial tcp 127.0.0.1:57002: connectex: No connection could be made because the target machine actively refused it."
time=2026-03-30T13:42:51.743-04:00 level=INFO source=sched.go:511 msg="Load failed" model=C:\Users\DaveyBoneZ\.ollama\models\blobs\sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c error="model failed to load, this may be due to resource limitations or an internal error, check ollama server logs for details"
time=2026-03-30T13:42:51.743-04:00 level=DEBUG source=server.go:1832 msg="stopping llama server" pid=2444
[GIN] 2026/03/30 - 13:42:51 | 500 |    2.3317189s |       127.0.0.1 | POST     "/api/chat"
time=2026-03-30T13:42:51.764-04:00 level=ERROR source=server.go:304 msg="llama runner terminated" error="exit status 1"
[GIN] 2026/03/30 - 13:43:19 | 200 |      1.5188ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/03/30 - 13:43:50 | 200 |       505.2µs |       127.0.0.1 | GET      "/api/tags"

OS

Win 11 pro

GPU

RX 9060 XT

CPU

AMD Ryzen 7 5800X3D

Ollama version

v0.18.3 > Current

extent analysis

TL;DR

The issue is likely due to resource limitations or an internal error when loading Qwen models with Ollama version v0.18.3 on Windows 11 with an RX 9060 XT GPU.

Guidance

  • Check the Ollama server logs for more detailed error messages to identify the root cause of the issue.
  • Verify that the system has sufficient resources (CPU, memory, and GPU) to load the Qwen models.
  • Try reducing the model complexity or splitting the model into smaller parts to alleviate potential resource constraints.
  • Consider updating Ollama to a newer version, if available, to ensure the latest bug fixes and improvements.

Example

No specific code snippet is applicable in this case, as the issue appears to be related to system resources or internal errors within the Ollama application.

Notes

The provided log output indicates a ROCm error with an invalid device function, which may be related to the GPU or driver configuration. However, without more specific information about the error, it is difficult to provide a precise solution.

Recommendation

Apply a workaround by reducing model complexity or splitting the model into smaller parts to alleviate potential resource constraints, as updating to a fixed version is not explicitly implied in the given information.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING