ollama - How to fix: Ollama is stuck in a loop processing a simple request

Error Message

time=2026-05-09T15:53:08.395Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2026-05-09T16:02:45.124Z level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation"
time=2026-05-09T16:07:42.688Z level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation"

Code Example

curl http://10.30.0.105:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:9b",
    "messages": [
      {
        "role": "user", "content": "Say hello in one sentence"
      }
    ],
    "options": {
      "temperature": 0
    }
  }'

---

curl http://10.30.0.105:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:9b",
    "messages": [
      {
        "role": "user",
        "content": "Say hello in one sentence"
      }
    ],
    "temperature": 0
  }'

---

sudo docker container exec -it ollama ollama --version
ollama version is 0.21.1

I also tried 0.23.1 and saw the same issue.

I tried the following NVIDIA GPU driver versions on an A30 GPU:
535.288.01
580.126.20

When command-02 finally completed, it had used 49,173 tokens:
"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":49158,"total_tokens":49173}}

---

sudo docker logs -f ollama
time=2026-05-09T15:51:52.289Z level=INFO source=routes.go:1752 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-05-09T15:51:52.289Z level=INFO source=routes.go:1754 msg="Ollama cloud disabled: false"
time=2026-05-09T15:51:52.293Z level=INFO source=images.go:517 msg="total blobs: 49"
time=2026-05-09T15:51:52.294Z level=INFO source=images.go:524 msg="total unused blobs removed: 0"
time=2026-05-09T15:51:52.295Z level=INFO source=routes.go:1810 msg="Listening on [::]:11434 (version 0.21.1)"
time=2026-05-09T15:51:52.295Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-05-09T15:51:52.296Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-05-09T15:51:52.296Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38761"
time=2026-05-09T15:51:52.296Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-05-09T15:51:52.446Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=150.290886ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-05-09T15:51:52.446Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-05-09T15:51:52.447Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34523"
time=2026-05-09T15:51:52.447Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-05-09T15:51:52.721Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=274.989771ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-05-09T15:51:52.721Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
time=2026-05-09T15:51:52.721Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA A30" compute=8.0 id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 pci_id=0000:9b:00.0
time=2026-05-09T15:51:52.722Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36321"
time=2026-05-09T15:51:52.722Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 CUDA_VISIBLE_DEVICES=GPU-fadf48eb-b869-8019-51d2-215713c98e04 GGML_CUDA_INIT=1
time=2026-05-09T15:51:52.984Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=262.685574ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-fadf48eb-b869-8019-51d2-215713c98e04 GGML_CUDA_INIT:1]"
time=2026-05-09T15:51:52.984Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=689.01194ms
time=2026-05-09T15:51:52.984Z level=INFO source=types.go:42 msg="inference compute" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 filter_id="" library=CUDA compute=8.0 name=CUDA0 description="NVIDIA A30" libdirs=ollama,cuda_v12 driver=12.2 pci_id=0000:9b:00.0 type=discrete total="24.0 GiB" available="23.5 GiB"
time=2026-05-09T15:51:52.984Z level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768


time=2026-05-09T15:53:08.084Z level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-05-09T15:53:08.084Z level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-05-09T15:53:08.084Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41247"
time=2026-05-09T15:53:08.084Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-05-09T15:53:08.372Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=288.155632ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-05-09T15:53:08.372Z level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=288.415391ms
time=2026-05-09T15:53:08.395Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2026-05-09T15:53:08.395Z level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-05-09T15:53:08.395Z level=DEBUG source=sched.go:229 msg="loading first model" model=/root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c
time=2026-05-09T15:53:08.578Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:08.722Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:08.724Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-05-09T15:53:08.725Z level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-05-09T15:53:08.725Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 36275"
time=2026-05-09T15:53:08.725Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-05-09T15:53:08.726Z level=INFO source=sched.go:484 msg="system memory" total="251.4 GiB" free="251.2 GiB" free_swap="8.0 GiB"
time=2026-05-09T15:53:08.726Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 library=CUDA available="23.0 GiB" free="23.5 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-05-09T15:53:08.726Z level=INFO source=server.go:771 msg="loading model" "model layers"=33 requested=-1
time=2026-05-09T15:53:08.750Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-05-09T15:53:08.751Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:36275"
time=2026-05-09T15:53:08.761Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType: NumThreads:20 GPULayers:33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-05-09T15:53:08.878Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:08.879Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default=""
time=2026-05-09T15:53:08.879Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default=""
time=2026-05-09T15:53:08.879Z level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values=52
time=2026-05-09T15:53:08.879Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-05-09T15:53:08.925Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA A30, compute capability 8.0, VMM: yes, ID: GPU-fadf48eb-b869-8019-51d2-215713c98e04
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-05-09T15:53:09.092Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-05-09T15:53:10.088Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
time=2026-05-09T15:53:10.690Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=16775 splits=4
time=2026-05-09T15:53:10.701Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2463 splits=2
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="5.6 GiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="563.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.2 GiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="955.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="31.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:272 msg="total memory" size="9.3 GiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=server.go:796 msg=memory success=true required.InputWeights=591052800 required.CPU.Graph=33210368 required.CUDA0.ID=GPU-fadf48eb-b869-8019-51d2-215713c98e04 required.CUDA0.Weights="[268028672 135971584 135971584 132057088 122995456 122995456 135971584 119080960 122995456 135971584 122995456 119080960 135971584 122995456 122995456 132057088 122995456 122995456 135971584 119080960 122995456 135971584 122995456 119080960 135971584 122995456 122995456 132057088 135971584 135971584 135971584 132057088 1747210880]" required.CUDA0.Cache="[54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 0]" required.CUDA0.Graph=1002160128
time=2026-05-09T15:53:10.703Z level=DEBUG source=server.go:990 msg="available gpu" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 library=CUDA "available layer vram"="22.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="955.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=server.go:807 msg="new layout created" layers="33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)]"
time=2026-05-09T15:53:10.704Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType: NumThreads:20 GPULayers:33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-05-09T15:53:10.806Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-05-09T15:53:11.417Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
time=2026-05-09T15:53:12.176Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=16775 splits=4
time=2026-05-09T15:53:12.197Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2463 splits=2
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="5.6 GiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="563.7 MiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.2 GiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="955.7 MiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="31.7 MiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:272 msg="total memory" size="9.3 GiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=server.go:796 msg=memory success=true required.InputWeights=591052800 required.CPU.Graph=33210368 required.CUDA0.ID=GPU-fadf48eb-b869-8019-51d2-215713c98e04 required.CUDA0.Weights="[268028672 135971584 135971584 132057088 122995456 122995456 135971584 119080960 122995456 135971584 122995456 119080960 135971584 122995456 122995456 132057088 122995456 122995456 135971584 119080960 122995456 135971584 122995456 119080960 135971584 122995456 122995456 132057088 135971584 135971584 135971584 132057088 1747210880]" required.CUDA0.Cache="[54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 0]" required.CUDA0.Graph=1002160128
time=2026-05-09T15:53:12.198Z level=DEBUG source=server.go:990 msg="available gpu" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 library=CUDA "available layer vram"="22.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="955.7 MiB"
time=2026-05-09T15:53:12.198Z level=DEBUG source=server.go:807 msg="new layout created" layers="33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)]"
time=2026-05-09T15:53:12.198Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:32768 KvCacheType: NumThreads:20 GPULayers:33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="5.6 GiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:245 msg="model weights" device=CPU size="563.7 MiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="2.2 GiB"
time=2026-05-09T15:53:12.199Z level=INFO source=ggml.go:482 msg="offloading 32 repeating layers to GPU"
time=2026-05-09T15:53:12.199Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-05-09T15:53:12.199Z level=INFO source=ggml.go:494 msg="offloaded 33/33 layers to GPU"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="955.7 MiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="31.7 MiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:272 msg="total memory" size="9.3 GiB"
time=2026-05-09T15:53:12.199Z level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-05-09T15:53:12.199Z level=INFO source=server.go:1364 msg="waiting for llama runner to start responding"
time=2026-05-09T15:53:12.201Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
time=2026-05-09T15:53:12.201Z level=DEBUG source=server.go:1408 msg="model load progress 0.00"
time=2026-05-09T15:53:12.452Z level=DEBUG source=server.go:1408 msg="model load progress 0.08"
time=2026-05-09T15:53:12.704Z level=DEBUG source=server.go:1408 msg="model load progress 0.15"
time=2026-05-09T15:53:12.955Z level=DEBUG source=server.go:1408 msg="model load progress 0.22"
time=2026-05-09T15:53:13.206Z level=DEBUG source=server.go:1408 msg="model load progress 0.28"
time=2026-05-09T15:53:13.458Z level=DEBUG source=server.go:1408 msg="model load progress 0.34"
time=2026-05-09T15:53:13.709Z level=DEBUG source=server.go:1408 msg="model load progress 0.40"
time=2026-05-09T15:53:13.960Z level=DEBUG source=server.go:1408 msg="model load progress 0.45"
time=2026-05-09T15:53:14.212Z level=DEBUG source=server.go:1408 msg="model load progress 0.51"
time=2026-05-09T15:53:14.463Z level=DEBUG source=server.go:1408 msg="model load progress 0.58"
time=2026-05-09T15:53:14.715Z level=DEBUG source=server.go:1408 msg="model load progress 0.69"
time=2026-05-09T15:53:14.966Z level=DEBUG source=server.go:1408 msg="model load progress 0.79"
time=2026-05-09T15:53:15.217Z level=DEBUG source=server.go:1408 msg="model load progress 0.85"
time=2026-05-09T15:53:15.468Z level=DEBUG source=server.go:1408 msg="model load progress 0.93"
time=2026-05-09T15:53:15.668Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:15.720Z level=INFO source=server.go:1402 msg="llama runner started in 6.99 seconds"
time=2026-05-09T15:53:15.720Z level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:9b runner.inference="[{ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Library:CUDA}]" runner.size="9.3 GiB" runner.vram="9.3 GiB" runner.parallel=1 runner.pid=70 runner.model=/root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c runner.num_ctx=32768
time=2026-05-09T15:53:15.781Z level=DEBUG source=server.go:1550 msg="completion request" images=0 prompt=83 format=""
time=2026-05-09T15:53:15.822Z level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=15 used=0 remaining=15
time=2026-05-09T16:02:45.124Z level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=32768 input=32768 keep=4 discard=16382
time=2026-05-09T16:02:45.124Z level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation"
time=2026-05-09T16:07:42.688Z level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=32768 input=32768 keep=4 discard=16382
time=2026-05-09T16:07:42.688Z level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation"
time=2026-05-09T16:07:52.179Z level=DEBUG source=sched.go:581 msg="context for request finished"
[GIN] 2026/05/09 - 16:07:52 | 200 |        14m44s |      10.30.0.79 | POST     "/v1/chat/completions"
time=2026-05-09T16:07:52.179Z level=DEBUG source=sched.go:309 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3.5:9b runner.inference="[{ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Library:CUDA}]" runner.size="9.3 GiB" runner.vram="9.3 GiB" runner.parallel=1 runner.pid=70 runner.model=/root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c runner.num_ctx=32768 duration=5m0s
time=2026-05-09T16:07:52.180Z level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:9b runner.inference="[{ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Library:CUDA}]" runner.size="9.3 GiB" runner.vram="9.3 GiB" runner.parallel=1 runner.pid=70 runner.model=/root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c runner.num_ctx=32768 refCount=0
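
The `context limit hit - shifting` entries near the end of the log are the clearest sign that generation ran past the 32768-token context. A minimal sketch of pulling those events out of a saved log follows; the sample lines are copied from the log above, and the variable names are only illustrative. Against a live container, the same filter could presumably be fed from `sudo docker logs ollama 2>&1` instead of the sample.

```shell
#!/usr/bin/env bash
# Sample lines copied from the log above, trimmed to the relevant entries.
log_sample='time=2026-05-09T15:53:15.822Z level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=15 used=0 remaining=15
time=2026-05-09T16:02:45.124Z level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=32768 input=32768 keep=4 discard=16382
time=2026-05-09T16:07:42.688Z level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=32768 input=32768 keep=4 discard=16382
time=2026-05-09T16:07:52.179Z level=DEBUG source=sched.go:581 msg="context for request finished"'

# Keep only the shift events and reduce each one to its timestamp.
shift_times=$(printf '%s\n' "$log_sample" \
  | grep 'context limit hit - shifting' \
  | sed -E 's/^time=([^ ]+) .*/\1/')

# Count the non-empty timestamp lines.
shift_count=$(printf '%s\n' "$shift_times" | grep -c .)

printf '%s\n' "$shift_times"
echo "context shifts: $shift_count"
```

On the sample this lists the two shift timestamps (16:02:45 and 16:07:42) and reports two context shifts for a single request.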

What is the issue?

command-01 works properly, but command-02 appears to hang indefinitely. I waited more than 10 minutes, and Ollama was still busy processing without returning any reply.

command-01

curl http://10.30.0.105:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:9b",
    "messages": [
      {
        "role": "user", "content": "Say hello in one sentence"
      }
    ],
    "options": {
      "temperature": 0
    }
  }'

command-02

curl http://10.30.0.105:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5:9b",
    "messages": [
      {
        "role": "user",
        "content": "Say hello in one sentence"
      }
    ],
    "temperature": 0
  }'
Ollama version check:

sudo docker container exec -it ollama ollama --version
ollama version is 0.21.1

I also tried 0.23.1 and hit the same issue.

I tried the following NVIDIA GPU driver versions on an A30 GPU:
535.288.01
580.126.20

When command-02 finally completed, it had used 49,173 tokens:
"finish_reason":"stop"}],"usage":{"prompt_tokens":15,"completion_tokens":49158,"total_tokens":49173}}

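One way to keep a runaway generation like this bounded while debugging is to cap the output length. The sketch below is command-02 with a `max_tokens` field added, which Ollama's OpenAI-compatible endpoint accepts. The cap of 256 and the 120-second curl timeout are arbitrary assumptions, and `send_request` is only an illustrative helper name. This limits how long the request can run; it is not known to fix the underlying loop.

```shell
#!/bin/sh
# Host and model are copied from the report.
# MAX_TOKENS is a hypothetical cap chosen for illustration.
OLLAMA_URL="http://10.30.0.105:11434/v1/chat/completions"
MAX_TOKENS=256

# Build the request body once so it can be inspected before sending.
payload=$(cat <<EOF
{
  "model": "qwen3.5:9b",
  "messages": [
    {"role": "user", "content": "Say hello in one sentence"}
  ],
  "temperature": 0,
  "max_tokens": ${MAX_TOKENS}
}
EOF
)

# Illustrative helper: sends the request and lets curl give up after
# 120 seconds instead of waiting indefinitely.
send_request() {
  curl --max-time 120 "$OLLAMA_URL" \
    -H "Content-Type: application/json" \
    -d "$payload"
}
```

Calling `send_request` after this block sends the capped request. Comparing the `usage` block of the reply with the 49,158 completion tokens above shows whether the cap took effect.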
Relevant log output

  sudo docker logs -f ollama
time=2026-05-09T15:51:52.289Z level=INFO source=routes.go:1752 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:0 OLLAMA_DEBUG:DEBUG OLLAMA_DEBUG_LOG_REQUESTS:false OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/root/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2026-05-09T15:51:52.289Z level=INFO source=routes.go:1754 msg="Ollama cloud disabled: false"
time=2026-05-09T15:51:52.293Z level=INFO source=images.go:517 msg="total blobs: 49"
time=2026-05-09T15:51:52.294Z level=INFO source=images.go:524 msg="total unused blobs removed: 0"
time=2026-05-09T15:51:52.295Z level=INFO source=routes.go:1810 msg="Listening on [::]:11434 (version 0.21.1)"
time=2026-05-09T15:51:52.295Z level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-05-09T15:51:52.296Z level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-05-09T15:51:52.296Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 38761"
time=2026-05-09T15:51:52.296Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v13
time=2026-05-09T15:51:52.446Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=150.290886ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v13]" extra_envs=map[]
time=2026-05-09T15:51:52.446Z level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-05-09T15:51:52.447Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 34523"
time=2026-05-09T15:51:52.447Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12
time=2026-05-09T15:51:52.721Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=274.989771ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-05-09T15:51:52.721Z level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=1
time=2026-05-09T15:51:52.721Z level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=/usr/lib/ollama/cuda_v12 description="NVIDIA A30" compute=8.0 id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 pci_id=0000:9b:00.0
time=2026-05-09T15:51:52.722Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 36321"
time=2026-05-09T15:51:52.722Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12 CUDA_VISIBLE_DEVICES=GPU-fadf48eb-b869-8019-51d2-215713c98e04 GGML_CUDA_INIT=1
time=2026-05-09T15:51:52.984Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=262.685574ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /usr/lib/ollama/cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-fadf48eb-b869-8019-51d2-215713c98e04 GGML_CUDA_INIT:1]"
time=2026-05-09T15:51:52.984Z level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=689.01194ms
time=2026-05-09T15:51:52.984Z level=INFO source=types.go:42 msg="inference compute" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 filter_id="" library=CUDA compute=8.0 name=CUDA0 description="NVIDIA A30" libdirs=ollama,cuda_v12 driver=12.2 pci_id=0000:9b:00.0 type=discrete total="24.0 GiB" available="23.5 GiB"
time=2026-05-09T15:51:52.984Z level=INFO source=routes.go:1860 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768


time=2026-05-09T15:53:08.084Z level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-05-09T15:53:08.084Z level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-05-09T15:53:08.084Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 41247"
time=2026-05-09T15:53:08.084Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=
1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/l
ib/ollama:/usr/lib/ollama/cuda_v12
time=2026-05-09T15:53:08.372Z level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=288.155632ms OLLAMA_LIBRARY_PATH="[/usr/lib/ollama /us
r/lib/ollama/cuda_v12]" extra_envs=map[]
time=2026-05-09T15:53:08.372Z level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=288.415391ms
time=2026-05-09T15:53:08.395Z level=WARN source=cpu_linux.go:130 msg="failed to parse CPU allowed micro secs" error="strconv.ParseInt: parsing \"max\": invalid syntax"
time=2026-05-09T15:53:08.395Z level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-05-09T15:53:08.395Z level=DEBUG source=sched.go:229 msg="loading first model" model=/root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee2
5846eed4f6f0b936278e3a3c900bb99d37c
time=2026-05-09T15:53:08.578Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:08.722Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:08.724Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974
752427e-07
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-05-09T15:53:08.725Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-05-09T15:53:08.725Z level=INFO source=server.go:259 msg="enabling flash attention"
time=2026-05-09T15:53:08.725Z level=INFO source=server.go:444 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --model /root/.ollama/models/
blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c --port 36275"
time=2026-05-09T15:53:08.725Z level=DEBUG source=server.go:445 msg=subprocess PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin OLLAMA_DEBUG=
1 LD_LIBRARY_PATH=/usr/lib/ollama:/usr/lib/ollama/cuda_v12:/usr/local/nvidia/lib:/usr/local/nvidia/lib64 OLLAMA_HOST=0.0.0.0:11434 OLLAMA_LIBRARY_PATH=/usr/l
ib/ollama:/usr/lib/ollama/cuda_v12
time=2026-05-09T15:53:08.726Z level=INFO source=sched.go:484 msg="system memory" total="251.4 GiB" free="251.2 GiB" free_swap="8.0 GiB"
time=2026-05-09T15:53:08.726Z level=INFO source=sched.go:491 msg="gpu memory" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 library=CUDA available="23.0 GiB" f
ree="23.5 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-05-09T15:53:08.726Z level=INFO source=server.go:771 msg="loading model" "model layers"=33 requested=-1
time=2026-05-09T15:53:08.750Z level=INFO source=runner.go:1417 msg="starting ollama engine"
time=2026-05-09T15:53:08.751Z level=INFO source=runner.go:1452 msg="Server listening on 127.0.0.1:36275"
time=2026-05-09T15:53:08.761Z level=INFO source=runner.go:1290 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled K
vSize:32768 KvCacheType: NumThreads:20 GPULayers:33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainGPU
:0 UseMmap:false}"
time=2026-05-09T15:53:08.878Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:08.879Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.name default=""
time=2026-05-09T15:53:08.879Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.description default=""
time=2026-05-09T15:53:08.879Z level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q4_K_M name="" description="" num_tensors=883 num_key_values
=52
time=2026-05-09T15:53:08.879Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama
load_backend: loaded CPU backend from /usr/lib/ollama/libggml-cpu-icelake.so
time=2026-05-09T15:53:08.925Z level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=/usr/lib/ollama/cuda_v12
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA A30, compute capability 8.0, VMM: yes, ID: GPU-fadf48eb-b869-8019-51d2-215713c98e04
load_backend: loaded CUDA backend from /usr/lib/ollama/cuda_v12/libggml-cuda.so
time=2026-05-09T15:53:09.092Z level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI
2=1 CPU.0.AVX512=1 CPU.0.AVX512_VBMI=1 CPU.0.AVX512_VNNI=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=500,520,600,610,700,750,800,860,890,900,1200 CUDA
.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(gcc)
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974
752427e-07
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-05-09T15:53:09.098Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-05-09T15:53:10.088Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
time=2026-05-09T15:53:10.690Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=16775 splits=4
time=2026-05-09T15:53:10.701Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2463 splits=2
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="5.6 GiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="563.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.2 GiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="955.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="31.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=device.go:272 msg="total memory" size="9.3 GiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=server.go:796 msg=memory success=true required.InputWeights=591052800 required.CPU.Graph=33210368 required.C
UDA0.ID=GPU-fadf48eb-b869-8019-51d2-215713c98e04 required.CUDA0.Weights="[268028672 135971584 135971584 132057088 122995456 122995456 135971584 119080960 122
995456 135971584 122995456 119080960 135971584 122995456 122995456 132057088 122995456 122995456 135971584 119080960 122995456 135971584 122995456 119080960
135971584 122995456 122995456 132057088 135971584 135971584 135971584 132057088 1747210880]" required.CUDA0.Cache="[54886400 54886400 54886400 134217728 5488
6400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 5488
6400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 0]" required.CUDA0.Graph=1002160128
time=2026-05-09T15:53:10.703Z level=DEBUG source=server.go:990 msg="available gpu" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 library=CUDA "available layer
vram"="22.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="955.7 MiB"
time=2026-05-09T15:53:10.703Z level=DEBUG source=server.go:807 msg="new layout created" layers="33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..3
2)]"
time=2026-05-09T15:53:10.704Z level=INFO source=runner.go:1290 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled
 KvSize:32768 KvCacheType: NumThreads:20 GPULayers:33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)] MultiUserCache:false ProjectorPath: MainG
PU:0 UseMmap:false}"
time=2026-05-09T15:53:10.806Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=general.alignment default=32
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974
752427e-07
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-05-09T15:53:10.814Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-05-09T15:53:11.417Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1258 splits=1
time=2026-05-09T15:53:12.176Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=16775 splits=4
time=2026-05-09T15:53:12.197Z level=DEBUG source=ggml.go:852 msg="compute graph" nodes=2463 splits=2
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="5.6 GiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:245 msg="model weights" device=CPU size="563.7 MiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="2.2 GiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="955.7 MiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="31.7 MiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=device.go:272 msg="total memory" size="9.3 GiB"
time=2026-05-09T15:53:12.197Z level=DEBUG source=server.go:796 msg=memory success=true required.InputWeights=591052800 required.CPU.Graph=33210368 required.C
UDA0.ID=GPU-fadf48eb-b869-8019-51d2-215713c98e04 required.CUDA0.Weights="[268028672 135971584 135971584 132057088 122995456 122995456 135971584 119080960 122
995456 135971584 122995456 119080960 135971584 122995456 122995456 132057088 122995456 122995456 135971584 119080960 122995456 135971584 122995456 119080960
135971584 122995456 122995456 132057088 135971584 135971584 135971584 132057088 1747210880]" required.CUDA0.Cache="[54886400 54886400 54886400 134217728 5488
6400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 5488
6400 54886400 134217728 54886400 54886400 54886400 134217728 54886400 54886400 54886400 134217728 0]" required.CUDA0.Graph=1002160128
time=2026-05-09T15:53:12.198Z level=DEBUG source=server.go:990 msg="available gpu" id=GPU-fadf48eb-b869-8019-51d2-215713c98e04 library=CUDA "available layer
vram"="22.1 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="955.7 MiB"
time=2026-05-09T15:53:12.198Z level=DEBUG source=server.go:807 msg="new layout created" layers="33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..3
2)]"
time=2026-05-09T15:53:12.198Z level=INFO source=runner.go:1290 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enable
d KvSize:32768 KvCacheType: NumThreads:20 GPULayers:33[ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Layers:33(0..32)] MultiUserCache:false ProjectorPath: Main
GPU:0 UseMmap:false}"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="5.6 GiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:245 msg="model weights" device=CPU size="563.7 MiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="2.2 GiB"
time=2026-05-09T15:53:12.199Z level=INFO source=ggml.go:482 msg="offloading 32 repeating layers to GPU"
time=2026-05-09T15:53:12.199Z level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-05-09T15:53:12.199Z level=INFO source=ggml.go:494 msg="offloaded 33/33 layers to GPU"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="955.7 MiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:267 msg="compute graph" device=CPU size="31.7 MiB"
time=2026-05-09T15:53:12.199Z level=INFO source=device.go:272 msg="total memory" size="9.3 GiB"
time=2026-05-09T15:53:12.199Z level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-05-09T15:53:12.199Z level=INFO source=server.go:1364 msg="waiting for llama runner to start responding"
time=2026-05-09T15:53:12.201Z level=INFO source=server.go:1398 msg="waiting for server to become available" status="llm server loading model"
time=2026-05-09T15:53:12.201Z level=DEBUG source=server.go:1408 msg="model load progress 0.00"
time=2026-05-09T15:53:12.452Z level=DEBUG source=server.go:1408 msg="model load progress 0.08"
time=2026-05-09T15:53:12.704Z level=DEBUG source=server.go:1408 msg="model load progress 0.15"
time=2026-05-09T15:53:12.955Z level=DEBUG source=server.go:1408 msg="model load progress 0.22"
time=2026-05-09T15:53:13.206Z level=DEBUG source=server.go:1408 msg="model load progress 0.28"
time=2026-05-09T15:53:13.458Z level=DEBUG source=server.go:1408 msg="model load progress 0.34"
time=2026-05-09T15:53:13.709Z level=DEBUG source=server.go:1408 msg="model load progress 0.40"
time=2026-05-09T15:53:13.960Z level=DEBUG source=server.go:1408 msg="model load progress 0.45"
time=2026-05-09T15:53:14.212Z level=DEBUG source=server.go:1408 msg="model load progress 0.51"
time=2026-05-09T15:53:14.463Z level=DEBUG source=server.go:1408 msg="model load progress 0.58"
time=2026-05-09T15:53:14.715Z level=DEBUG source=server.go:1408 msg="model load progress 0.69"
time=2026-05-09T15:53:14.966Z level=DEBUG source=server.go:1408 msg="model load progress 0.79"
time=2026-05-09T15:53:15.217Z level=DEBUG source=server.go:1408 msg="model load progress 0.85"
time=2026-05-09T15:53:15.468Z level=DEBUG source=server.go:1408 msg="model load progress 0.93"
time=2026-05-09T15:53:15.668Z level=DEBUG source=ggml.go:325 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-05-09T15:53:15.720Z level=INFO source=server.go:1402 msg="llama runner started in 6.99 seconds"
time=2026-05-09T15:53:15.720Z level=DEBUG source=sched.go:573 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:9b runner.inference="[
{ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Library:CUDA}]" runner.size="9.3 GiB" runner.vram="9.3 GiB" runner.parallel=1 runner.pid=70 runner.model=/root/.
ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c runner.num_ctx=32768
time=2026-05-09T15:53:15.781Z level=DEBUG source=server.go:1550 msg="completion request" images=0 prompt=83 format=""
time=2026-05-09T15:53:15.822Z level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=15 used=0 remaining=15
time=2026-05-09T16:02:45.124Z level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=32768 input=32768 keep=4 discard=16382
time=2026-05-09T16:02:45.124Z level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation"
time=2026-05-09T16:07:42.688Z level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=32768 input=32768 keep=4 discard=16382
time=2026-05-09T16:07:42.688Z level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation"
time=2026-05-09T16:07:52.179Z level=DEBUG source=sched.go:581 msg="context for request finished"
[GIN] 2026/05/09 - 16:07:52 | 200 |        14m44s |      10.30.0.79 | POST     "/v1/chat/completions"
time=2026-05-09T16:07:52.179Z level=DEBUG source=sched.go:309 msg="runner with non-zero duration has gone idle, adding timer" runner.name=registry.ollama.ai/library/qwen3.5:9b runner.inference="[{ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Library:CUDA}]" runner.size="9.3 GiB" runner.vram="9.3 GiB" runner.parallel=1 runner.pid=70 runner.model=/root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c runner.num_ctx=32768 duration=5m0s
time=2026-05-09T16:07:52.180Z level=DEBUG source=sched.go:327 msg="after processing request finished event" runner.name=registry.ollama.ai/library/qwen3.5:9b runner.inference="[{ID:GPU-fadf48eb-b869-8019-51d2-215713c98e04 Library:CUDA}]" runner.size="9.3 GiB" runner.vram="9.3 GiB" runner.parallel=1 runner.pid=70 runner.model=/root/.ollama/models/blobs/sha256-dec52a44569a2a25341c4e4d3fee25846eed4f6f0b936278e3a3c900bb99d37c runner.num_ctx=32768 refCount=0
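
As a rough cross-check, the two "context limit hit - shifting" entries above appear consistent with the reported usage of 49,173 total tokens. The sketch below uses a simplified model that is an assumption, not Ollama's actual implementation: the first shift happens when the sequence reaches `limit`, and each later shift happens after another `discard` tokens have been generated.

```shell
#!/bin/sh
# Values copied from the log lines and the usage block in this report.
limit=32768          # limit= in "context limit hit - shifting"
discard=16382        # discard= in the same log line
total_tokens=49173   # usage.total_tokens from the response

# Simplified model (an assumption): one shift when the sequence first
# reaches the limit, then one more for every further $discard tokens.
if [ "$total_tokens" -ge "$limit" ]; then
  shifts=$(( (total_tokens - limit) / discard + 1 ))
else
  shifts=0
fi

echo "expected context shifts: $shifts"
```

With these numbers the script prints 2, which matches the two shift entries logged at 16:02:45 and 16:07:42. Under this model the generation ran past the 32768-token context window instead of stopping after a one-sentence answer.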

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.21.1
