ollama - 💡(How to fix) Fix Ryzen 395+ Running Qwen3.5:122b [1 participants]

GitHub stats
ollama/ollama#15709 · Fetched 2026-04-20 11:59:08
Comments: 0 · Participants: 1 · Timeline: 1 (labeled ×1) · Reactions: 0


What is the issue?

Got a new Ryzen 395+ Setup with an 8060S GPU.

I used the included AMD configuration tool to set the GPU memory to 96GB out of the 128GB available.

I was testing different models I've used with my NVIDIA card, and most seem to be fine (Gemma4 was a little slow). I decided to try qwen3.5:122b. At first, from the CLI, it crashed (I don't have the log for that run). I think I then restarted Ollama and ran the same model from the GUI. It loaded, but only put about 50GB on the "GPU", used almost all of the reserved "system" RAM, and then seemed to use swap for the rest.

Since this is my first time with the Ryzen 395+: is this expected, or is some other configuration needed so that it is all loaded into "GPU" RAM? ollama ps seems to say it should only be using about 97GB in total:

qwen3.5:122b 8b9d11d807c5 97 GB 39%/61% CPU/GPU 262144 4 minutes from now
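
The 97 GB and 39%/61% figures above can be cross-checked against the placement sizes Ollama reports in the device.go lines of the log output. A small sketch, assuming that ollama ps prints decimal gigabytes while the log prints GiB (that unit assumption is not stated in the issue):

```python
# Sizes in GiB, copied from the device.go lines in the log output.
gpu_gib = 45.4 + 6.2 + 3.9   # weights + KV cache + compute graph on ROCm0
cpu_gib = 30.4 + 3.4 + 1.2   # weights + KV cache + compute graph on CPU
total_gib = 90.6             # "total memory" line

GIB = 2**30
total_gb = total_gib * GIB / 1e9          # assumption: ps shows decimal GB
cpu_share = cpu_gib / (cpu_gib + gpu_gib)

print(f"GPU {gpu_gib:.1f} GiB, CPU {cpu_gib:.1f} GiB")
print(f"total {total_gb:.1f} GB, CPU share {cpu_share:.0%}")
```

Under that assumption the 90.6 GiB total comes to roughly 97 GB and the CPU share to roughly 39%, so the ollama ps line and the log describe the same split rather than a model that fits entirely in GPU memory.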

Relevant log output

time=2026-04-19T15:09:32.904-05:00 level=INFO source=routes.go:1752 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:INFO OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\XXXX\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-04-19T15:09:32.904-05:00 level=INFO source=routes.go:1754 msg="Ollama cloud disabled: false"
time=2026-04-19T15:09:32.906-05:00 level=INFO source=images.go:517 msg="total blobs: 14"
time=2026-04-19T15:09:32.907-05:00 level=INFO source=images.go:524 msg="total unused blobs removed: 0"
time=2026-04-19T15:09:32.907-05:00 level=INFO source=routes.go:1810 msg="Listening on 127.0.0.1:11434 (version 0.21.0)"
time=2026-04-19T15:09:32.908-05:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-04-19T15:09:32.916-05:00 level=INFO source=server.go:444 msg="starting runner" cmd="C:\\Users\\XXXXX\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54833"
time=2026-04-19T15:09:33.169-05:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-04-19T15:09:33.170-05:00 level=INFO source=server.go:444 msg="starting runner" cmd="C:\\Users\\XXXXX\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54838"
time=2026-04-19T15:09:33.241-05:00 level=INFO source=server.go:444 msg="starting runner" cmd="C:\\Users\\XXXXX\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54843"
time=2026-04-19T15:09:33.359-05:00 level=INFO source=server.go:444 msg="starting runner" cmd="C:\\Users\\XXXXX\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 54848"
time=2026-04-19T15:09:33.774-05:00 level=INFO source=types.go:42 msg="inference compute" id=0 filter_id=0 library=ROCm compute=gfx1151 name=ROCm0 description="AMD Radeon(TM) 8060S Graphics" libdirs=ollama,rocm driver=60551.38 pci_id=0000:f4:00.0 type=iGPU total="96.0 GiB" available="95.2 GiB"
time=2026-04-19T15:09:33.774-05:00 level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="96.0 GiB" default_num_ctx=262144
[GIN] 2026/04/19 - 15:09:33 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/04/19 - 15:09:33 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/04/19 - 15:09:33 | 200 |            0s |       127.0.0.1 | GET      "/api/version"
[GIN] 2026/04/19 - 15:09:33 | 200 |      2.5985ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/04/19 - 15:09:33 | 200 |    151.8137ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/19 - 15:09:33 | 401 |     164.061ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/04/19 - 15:09:33 | 401 |    177.0969ms |       127.0.0.1 | POST     "/api/me"
[GIN] 2026/04/19 - 15:09:41 | 200 |    127.1828ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/19 - 15:10:03 | 200 |      2.6512ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/04/19 - 15:10:20 | 200 |       1.561ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/04/19 - 15:10:20 | 200 |    131.4056ms |       127.0.0.1 | POST     "/api/show"
[GIN] 2026/04/19 - 15:10:20 | 200 |    108.6315ms |       127.0.0.1 | POST     "/api/show"
time=2026-04-19T15:10:20.796-05:00 level=INFO source=server.go:444 msg="starting runner" cmd="C:\\Users\\XXXXX\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 60960"
time=2026-04-19T15:10:21.038-05:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-04-19T15:10:21.038-05:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=16 efficiency=0 threads=32
time=2026-04-19T15:10:21.115-05:00 level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-04-19T15:10:21.116-05:00 level=INFO source=server.go:444 msg="starting runner" cmd="C:\\Users\\XXXXX\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\XXXXX\\.ollama\\models\\blobs\\sha256-93c83617a40560a61cda911ee327efdb5b5fbd39caa8b777a4ec565c0af1af3d --port 60965"
time=2026-04-19T15:10:21.121-05:00 level=INFO source=sched.go:484 msg="system memory" total="31.6 GiB" free="20.3 GiB" free_swap="27.1 GiB"
time=2026-04-19T15:10:21.121-05:00 level=INFO source=sched.go:491 msg="gpu memory" id=0 library=ROCm available="94.7 GiB" free="95.1 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-04-19T15:10:21.121-05:00 level=INFO source=server.go:771 msg="loading model" "model layers"=49 requested=-1
time=2026-04-19T15:10:21.160-05:00 level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-04-19T15:10:21.160-05:00 level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:60965"
time=2026-04-19T15:10:21.163-05:00 level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-19T15:10:21.194-05:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35moe file_type=Q4_K_M name="" description="" num_tensors=2105 num_key_values=57
load_backend: loaded CPU backend from C:\Users\XXXX\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-icelake.dll
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon(TM) 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32, ID: 0
load_backend: loaded ROCm backend from C:\Users\XXXX\AppData\Local\Programs\Ollama\lib\ollama\rocm\ggml-hip.dll
time=2026-04-19T15:10:21.233-05:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 ROCm.0.NO_VMM=1 ROCm.0.NO_PEER_COPY=1 ROCm.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
time=2026-04-19T15:10:22.479-05:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:49[ID:0 Layers:49(0..48)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 77169.60 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate ROCm0 buffer of size 80918192768
time=2026-04-19T15:10:22.518-05:00 level=INFO source=server.go:893 msg="model layout did not fit, applying backoff" backoff=0.10
time=2026-04-19T15:10:22.518-05:00 level=INFO source=server.go:893 msg="model layout did not fit, applying backoff" backoff=0.20
time=2026-04-19T15:10:22.519-05:00 level=INFO source=server.go:893 msg="model layout did not fit, applying backoff" backoff=0.30
time=2026-04-19T15:10:22.519-05:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:43[ID:0 Layers:43(5..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 66616.32 MiB on device 0: cudaMalloc failed: out of memory
alloc_tensor_range: failed to allocate ROCm0 buffer of size 69852278784
time=2026-04-19T15:10:22.552-05:00 level=INFO source=server.go:893 msg="model layout did not fit, applying backoff" backoff=0.40
time=2026-04-19T15:10:22.553-05:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:37[ID:0 Layers:37(11..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
ggml_backend_cuda_buffer_type_alloc_buffer: allocating 256.00 MiB on device 0: cudaMalloc failed: out of memory
time=2026-04-19T15:10:49.466-05:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:32[ID:0 Layers:32(16..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[GIN] 2026/04/19 - 15:10:50 | 200 |      2.6653ms |       127.0.0.1 | GET      "/api/tags"
time=2026-04-19T15:11:13.404-05:00 level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:30[ID:0 Layers:30(18..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
[GIN] 2026/04/19 - 15:11:20 | 200 |      4.2938ms |       127.0.0.1 | GET      "/api/tags"
time=2026-04-19T15:11:33.611-05:00 level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:262144 KvCacheType: NumThreads:16 GPULayers:30[ID:0 Layers:30(18..47)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-04-19T15:11:33.612-05:00 level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="45.4 GiB"
time=2026-04-19T15:11:33.612-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="30.4 GiB"
time=2026-04-19T15:11:33.612-05:00 level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="6.2 GiB"
time=2026-04-19T15:11:33.612-05:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="3.4 GiB"
time=2026-04-19T15:11:33.612-05:00 level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="3.9 GiB"
time=2026-04-19T15:11:33.612-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="1.2 GiB"
time=2026-04-19T15:11:33.612-05:00 level=INFO source=device.go:272 msg="total memory" size="90.6 GiB"
time=2026-04-19T15:11:33.613-05:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-04-19T15:11:33.612-05:00 level=INFO source=ggml.go:482 msg="offloading 30 repeating layers to GPU"
time=2026-04-19T15:11:33.613-05:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-04-19T15:11:33.613-05:00 level=INFO source=ggml.go:494 msg="offloaded 30/49 layers to GPU"
time=2026-04-19T15:11:33.613-05:00 level=INFO source=server.go:1364 msg="waiting for llama runner to start responding"
time=2026-04-19T15:11:33.614-05:00 level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
[GIN] 2026/04/19 - 15:11:51 | 200 |    278.1814ms |       127.0.0.1 | GET      "/api/tags"
[GIN] 2026/04/19 - 15:12:22 | 200 |    146.3501ms |       127.0.0.1 | GET      "/api/tags"
time=2026-04-19T15:12:25.709-05:00 level=INFO source=server.go:1402 msg="llama runner started in 124.59 seconds"
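
The placement lines near the end of the log can be tallied per device to see where the memory went. The following is a minimal sketch that parses those lines with a regular expression; the pattern is written for the exact line shape seen in this log and is not an official Ollama log format specification.

```python
import re
from collections import defaultdict

# Placement lines copied from the log (timestamps trimmed for brevity).
LOG = '''
source=device.go:240 msg="model weights" device=ROCm0 size="45.4 GiB"
source=device.go:245 msg="model weights" device=CPU size="30.4 GiB"
source=device.go:251 msg="kv cache" device=ROCm0 size="6.2 GiB"
source=device.go:256 msg="kv cache" device=CPU size="3.4 GiB"
source=device.go:262 msg="compute graph" device=ROCm0 size="3.9 GiB"
source=device.go:267 msg="compute graph" device=CPU size="1.2 GiB"
'''

PATTERN = re.compile(r'msg="([^"]+)" device=(\S+) size="([\d.]+) GiB"')

def tally(log_text):
    """Sum the reported GiB per device."""
    totals = defaultdict(float)
    for _msg, device, size in PATTERN.findall(log_text):
        totals[device] += float(size)
    return dict(totals)

totals = tally(LOG)
SYSTEM_RAM_TOTAL_GIB = 31.6  # from the sched.go "system memory" line
SYSTEM_RAM_FREE_GIB = 20.3   # same line, free at load time

for device, gib in totals.items():
    print(f"{device}: {gib:.1f} GiB")
print("CPU share exceeds total system RAM:", totals["CPU"] > SYSTEM_RAM_TOTAL_GIB)
```

The CPU-side total is larger than the system memory the log reports, which is consistent with the swap usage described in the report.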

OS

Windows

GPU

AMD

CPU

AMD

Ollama version

0.21.0

extent analysis

TL;DR

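Ollama's attempts to place the whole model on the GPU fail with out-of-memory errors even though the log reports 94.7 GiB of GPU memory available. It backs off to 30 of 49 layers on the GPU and puts roughly 35 GiB on the CPU side, which exceeds the 31.6 GiB of system RAM and spills into swap. Possible workarounds are to reduce the model's memory footprint (for example a context smaller than the default 262144) or to revisit the split between GPU and system memory; neither is confirmed in the issue.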

Guidance

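  • The log shows Ollama first trying to place all 49 layers on the GPU. A 77169.60 MiB buffer (about 75.4 GiB) and then a 66616.32 MiB buffer (about 65.1 GiB) both end in cudaMalloc failed: out of memory. The messages are CUDA-named, but the log shows the ROCm backend is the one in use.
  • After each failure Ollama logs "model layout did not fit, applying backoff" and retries with fewer GPU layers (49, 43, 37, 32, then 30). The load commits with 30/49 layers offloaded and the output layer on the CPU.
  • The sched.go:491 "gpu memory" line reports available="94.7 GiB" and free="95.1 GiB", which is more than either failed request. The failures are therefore not explained by the requested size exceeding the reported GPU memory, and the log does not show why the allocation is refused.
  • The committed layout places 45.4 + 6.2 + 3.9 GiB on ROCm0 and 30.4 + 3.4 + 1.2 = 35.0 GiB on the CPU, while the sched.go:484 line reports only 31.6 GiB of system memory (20.3 GiB free). That matches the reported RAM exhaustion and swap use.
  • OLLAMA_GPU_OVERHEAD is already 0 in the server config, so it is not reserving GPU memory here. Settings that lower the memory requirement, such as a context smaller than the 262144 default (OLLAMA_CONTEXT_LENGTH) or a quantized KV cache (OLLAMA_KV_CACHE_TYPE), may reduce the footprint; whether they let all layers fit is untested in this issue.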
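
The sizes of the failed allocations can be compared with the GPU memory the log reports as available. The sketch below only restates figures from the log in a common unit; it does not explain why the allocator refused the requests.

```python
MIB_PER_GIB = 1024

available_gib = 94.7  # sched.go:491 "gpu memory" line
# cudaMalloc failures in the order they appear in the log.
failed_requests_mib = [77169.60, 66616.32, 256.00]

for mib in failed_requests_mib:
    gib = mib / MIB_PER_GIB
    verdict = "below" if gib < available_gib else "above"
    print(f"{mib:>9.2f} MiB = {gib:6.2f} GiB ({verdict} the reported {available_gib} GiB)")
```

Every failed request is smaller than the reported available memory. The 256.00 MiB failure occurs during the 37-layer attempt, presumably after other buffers for that attempt had already been allocated, though the log does not state this.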

Example

This issue concerns configuration and memory allocation rather than application code, and the original report contains no code of its own.

Notes

The issue may be specific to the Windows operating system and the AMD GPU and CPU configuration. Further debugging and testing may be required to determine the root cause and optimal solution.

Recommendation

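Try a workaround that lowers the memory requirement, such as a context length smaller than the 262144 default, or adjust how the 128GB is split between GPU and system memory. Then check the load log again to see how many of the 49 layers are offloaded. The issue has no comments yet, so no fix is confirmed.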
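
One way to experiment with a smaller context without changing the server configuration is to pass num_ctx in the options of a request to Ollama's /api/generate endpoint. This is a hedged sketch: the host and port come from the server log, the helper name build_request and the value 32768 are illustrative choices of mine, and nothing in the issue confirms that a smaller context lets all layers fit on this hardware.

```python
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # host/port from the server log

def build_request(model, prompt, num_ctx):
    """Build (but do not send) a generate request with an explicit context size."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # requested context instead of the 262144 default
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# 32768 is an arbitrary illustrative value, not a recommendation from the issue.
req = build_request("qwen3.5:122b", "Hello", num_ctx=32768)
print(req.full_url, req.data.decode("utf-8"))

# To actually send it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```

After sending such a request, the "kv cache", "compute graph", and "offloaded N/49 layers to GPU" lines in the server log show whether the smaller context changed the placement.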
