ollama - 💡(How to fix) Qwen3.5 0.8b keeps printing "kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation", then eventually times out and crashes [1 comment, 2 participants]

GitHub stats
ollama/ollama#14939 (fetched 2026-04-08 00:58:25)
Comments: 1
Participants: 2
Timeline events: 3
Reactions: 0
Timeline (top): closed ×1, commented ×1, labeled ×1

Error Message

C:\Users\Administrator>ollama serve
Error: listen tcp 0.0.0.0:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.

C:\Users\Administrator>ollama serve time=2026-03-18T23:52:24.556+08:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:8192 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\Users\Administrator\.ollama\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:4 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]" time=2026-03-18T23:52:24.568+08:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false" time=2026-03-18T23:52:24.582+08:00 level=INFO source=images.go:477 msg="total blobs: 72" time=2026-03-18T23:52:24.589+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0" time=2026-03-18T23:52:24.593+08:00 level=INFO source=routes.go:1782 msg="Listening on [::]:11434 (version 0.18.1)" time=2026-03-18T23:52:24.593+08:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler" time=2026-03-18T23:52:24.594+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..." 
time=2026-03-18T23:52:24.622+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Administrator\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 61432" time=2026-03-18T23:52:24.622+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" CUDA_PATH_V12_1="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" CUDA_PATH_V12_2="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12;C:\ProgramData\anaconda3\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;C:\Program Files\Git\cmd;C:\Program Files\Git LFS;C:\Program Files\LibreOffice\program;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.0\;C:\Program Files\Docker\Docker\resources\bin;C:\Program 
Files\nodejs\;C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Users\Administrator\AppData\Local\Programs\Ollama;C:\ProgramData\anaconda3;C:\ProgramData\anaconda3\Scripts;C:\ProgramData\anaconda3\Library\bin;C:\ProgramData\anaconda3\Library\mingw-w64\bin;C:\ProgramData\anaconda3\Library\usr\bin;C:\Users\Administrator\AppData\Roaming\npm" OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 time=2026-03-18T23:52:24.886+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=272.9297ms OLLAMA_LIBRARY_PATH="[C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12]" extra_envs=map[] time=2026-03-18T23:52:24.888+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Administrator\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 61447" time=2026-03-18T23:52:24.888+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" CUDA_PATH_V12_1="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" CUDA_PATH_V12_2="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13;C:\ProgramData\anaconda3\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing 
Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;C:\Program Files\Git\cmd;C:\Program Files\Git LFS;C:\Program Files\LibreOffice\program;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.0\;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\nodejs\;C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Users\Administrator\AppData\Local\Programs\Ollama;C:\ProgramData\anaconda3;C:\ProgramData\anaconda3\Scripts;C:\ProgramData\anaconda3\Library\bin;C:\ProgramData\anaconda3\Library\mingw-w64\bin;C:\ProgramData\anaconda3\Library\usr\bin;C:\Users\Administrator\AppData\Roaming\npm" OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 time=2026-03-18T23:52:25.134+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=247.62ms OLLAMA_LIBRARY_PATH="[C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13]" extra_envs=map[] time=2026-03-18T23:52:25.135+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Administrator\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 61461" time=2026-03-18T23:52:25.135+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" CUDA_PATH_V12_1="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" CUDA_PATH_V12_2="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 
OLLAMA_NUM_PARALLEL=4 PATH="C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\rocm;C:\ProgramData\anaconda3\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;C:\Program Files\Git\cmd;C:\Program Files\Git LFS;C:\Program Files\LibreOffice\program;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.0\;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\nodejs\;C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Users\Administrator\AppData\Local\Programs\Ollama;C:\ProgramData\anaconda3;C:\ProgramData\anaconda3\Scripts;C:\ProgramData\anaconda3\Library\bin;C:\ProgramData\anaconda3\Library\mingw-w64\bin;C:\ProgramData\anaconda3\Library\usr\bin;C:\Users\Administrator\AppData\Roaming\npm" OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\rocm time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=179.8718ms OLLAMA_LIBRARY_PATH="[C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\rocm]" extra_envs=map[] time=2026-03-18T23:52:25.314+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan 
support disabled. To enable, set OLLAMA_VULKAN=1" time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2 time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 description="NVIDIA GeForce RTX 3090" compute=8.6 id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 pci_id=0000:01:00.0 time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 description="NVIDIA GeForce RTX 3090" compute=8.6 id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 pci_id=0000:01:00.0 time=2026-03-18T23:52:25.316+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Administrator\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 61476" time=2026-03-18T23:52:25.316+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Administrator\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 61475" time=2026-03-18T23:52:25.316+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" CUDA_PATH_V12_1="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" CUDA_PATH_V12_2="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13;C:\ProgramData\anaconda3\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program 
Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;C:\Program Files\Git\cmd;C:\Program Files\Git LFS;C:\Program Files\LibreOffice\program;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.0\;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\nodejs\;C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Users\Administrator\AppData\Local\Programs\Ollama;C:\ProgramData\anaconda3;C:\ProgramData\anaconda3\Scripts;C:\ProgramData\anaconda3\Library\bin;C:\ProgramData\anaconda3\Library\mingw-w64\bin;C:\ProgramData\anaconda3\Library\usr\bin;C:\Users\Administrator\AppData\Roaming\npm" OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 CUDA_VISIBLE_DEVICES=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT=1 time=2026-03-18T23:52:25.316+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" CUDA_PATH_V12_1="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" CUDA_PATH_V12_2="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12;C:\ProgramData\anaconda3\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp;C:\Program Files\Common 
Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;C:\Program Files\Git\cmd;C:\Program Files\Git LFS;C:\Program Files\LibreOffice\program;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.0\;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\nodejs\;C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Users\Administrator\AppData\Local\Programs\Ollama;C:\ProgramData\anaconda3;C:\ProgramData\anaconda3\Scripts;C:\ProgramData\anaconda3\Library\bin;C:\ProgramData\anaconda3\Library\mingw-w64\bin;C:\ProgramData\anaconda3\Library\usr\bin;C:\Users\Administrator\AppData\Roaming\npm" OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 GGML_CUDA_INIT=1 CUDA_VISIBLE_DEVICES=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 time=2026-03-18T23:52:25.585+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=271.3152ms OLLAMA_LIBRARY_PATH="[C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT:1]" time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=303.0219ms OLLAMA_LIBRARY_PATH="[C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama 
C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT:1]" time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:401 msg="filtering device with overlapping libraries" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 delete_index=0 kept_library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=1.0241201s time=2026-03-18T23:52:25.617+08:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v13 driver=13.0 pci_id=0000:01:00.0 type=discrete total="24.0 GiB" available="22.5 GiB" time=2026-03-18T23:52:25.617+08:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768 time=2026-03-18T23:52:25.732+08:00 level=DEBUG source=runner.go:264 msg="refreshing free memory" time=2026-03-18T23:52:25.732+08:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery" time=2026-03-18T23:52:25.734+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Administrator\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --port 61509" time=2026-03-18T23:52:25.734+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" CUDA_PATH_V12_1="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" CUDA_PATH_V12_2="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 
PATH="C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13;C:\ProgramData\anaconda3\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;C:\Program Files\Git\cmd;C:\Program Files\Git LFS;C:\Program Files\LibreOffice\program;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.0\;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\nodejs\;C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Users\Administrator\AppData\Local\Programs\Ollama;C:\ProgramData\anaconda3;C:\ProgramData\anaconda3\Scripts;C:\ProgramData\anaconda3\Library\bin;C:\ProgramData\anaconda3\Library\mingw-w64\bin;C:\ProgramData\anaconda3\Library\usr\bin;C:\Users\Administrator\AppData\Roaming\npm" OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 time=2026-03-18T23:52:25.974+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=242.5026ms OLLAMA_LIBRARY_PATH="[C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13]" extra_envs=map[] time=2026-03-18T23:52:25.974+08:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery 
took" duration=242.5026ms time=2026-03-18T23:52:25.974+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1 time=2026-03-18T23:52:25.974+08:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1 time=2026-03-18T23:52:25.975+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=24 efficiency=16 threads=32 time=2026-03-18T23:52:25.975+08:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1 time=2026-03-18T23:52:25.993+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-18T23:52:25.994+08:00 level=DEBUG source=sched.go:256 msg="loading first model" model=C:\Users\Administrator.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 time=2026-03-18T23:52:26.021+08:00 level=WARN source=sched.go:450 msg="model architecture does not currently support parallel requests" architecture=qwen35 time=2026-03-18T23:52:26.044+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default="" time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default="" time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1 time=2026-03-18T23:52:26.045+08:00 level=DEBUG 
source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0 time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07 time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000 time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304 time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0 time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}" time=2026-03-18T23:52:26.046+08:00 level=INFO source=server.go:246 msg="enabling flash attention" time=2026-03-18T23:52:26.047+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\Users\Administrator\AppData\Local\Programs\Ollama\ollama.exe runner --ollama-engine --model 
C:\Users\Administrator\.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 --port 61523" time=2026-03-18T23:52:26.047+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" CUDA_PATH_V12_1="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1" CUDA_PATH_V12_2="C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13;C:\ProgramData\anaconda3\condabin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.2\libnvvp;C:\Program Files\Common Files\Oracle\Java\javapath;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\libnvvp;C:\WINDOWS\system32;C:\WINDOWS;C:\WINDOWS\System32\Wbem;C:\WINDOWS\System32\WindowsPowerShell\v1.0\;C:\WINDOWS\System32\OpenSSH\;C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include;C:\Program Files\Git\cmd;C:\Program Files\Git LFS;C:\Program Files\LibreOffice\program;C:\Program Files\dotnet\;C:\Program Files (x86)\Windows Kits\10\Windows Performance Toolkit\;C:\Program Files\NVIDIA Corporation\Nsight Compute 2023.2.0\;C:\Program Files\Docker\Docker\resources\bin;C:\Program Files\nodejs\;C:\Users\Administrator\AppData\Local\Microsoft\WindowsApps;C:\Users\Administrator\AppData\Local\Programs\Ollama;C:\ProgramData\anaconda3;C:\ProgramData\anaconda3\Scripts;C:\ProgramData\anaconda3\Library\bin;C:\ProgramData\anaconda3\Library\mingw-w64\bin;C:\ProgramData\anaconda3\Library\usr\bin;C:\Users\Administrator\AppData\Roaming\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 time=2026-03-18T23:52:26.062+08:00 level=INFO source=sched.go:489 msg="system memory" total="127.8 GiB" free="98.9 GiB" free_swap="89.6 GiB" time=2026-03-18T23:52:26.062+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=CUDA available="22.1 GiB" free="22.5 GiB" minimum="457.0 MiB" overhead="0 B" time=2026-03-18T23:52:26.062+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=25 requested=-1 time=2026-03-18T23:52:26.182+08:00 level=INFO source=runner.go:1411 msg="starting ollama engine" time=2026-03-18T23:52:26.193+08:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:61523" time=2026-03-18T23:52:26.201+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}" time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32 time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default="" time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default="" time=2026-03-18T23:52:26.228+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q8_0 name="" description="" num_tensors=536 num_key_values=52 time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama load_backend: loaded CPU backend from 
C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll time=2026-03-18T23:52:26.242+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no ggml_cuda_init: found 1 CUDA devices: Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 load_backend: loaded CUDA backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll time=2026-03-18T23:52:26.293+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)
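
The dump above is hard to scan because every record has been flattened onto a few very long lines. The records look like `key=value` text (logfmt style), each starting with `time=<timestamp>`, so one way to triage it is to split on that marker and keep only the `WARN` and `ERROR` records. The sketch below assumes that format; the helper names (`split_records`, `parse_record`, `notable`) are hypothetical and not part of Ollama. Run on this log, it would surface the scheduler warning that the `qwen35` architecture does not currently support parallel requests, which is worth comparing against the `OLLAMA_NUM_PARALLEL=4` setting and the `Parallel:1` value in the load request. Whether that mismatch relates to the KV cache message is not established by the log itself.

```python
import re

# key=value, where value is a double-quoted string or a bare token.
# Assumption: records follow the logfmt-style layout seen in the dump above.
PAIR = re.compile(r'(\w[\w.]*)=("(?:[^"\\]|\\.)*"|\S*)')


def split_records(blob: str) -> list[str]:
    """Split a flattened dump into records at each 'time=<date>T' marker."""
    parts = re.split(r'(?=time=\d{4}-\d{2}-\d{2}T)', blob)
    return [p.strip() for p in parts if p.strip().startswith("time=")]


def parse_record(record: str) -> dict[str, str]:
    """Parse one record into a dict; surrounding quotes are stripped,
    backslash escapes inside values are left untouched."""
    fields: dict[str, str] = {}
    for key, value in PAIR.findall(record):
        if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
            value = value[1:-1]
        fields[key] = value
    return fields


def notable(blob: str, levels: tuple[str, ...] = ("WARN", "ERROR")) -> list[dict[str, str]]:
    """Return parsed records whose level is in `levels`."""
    records = (parse_record(r) for r in split_records(blob))
    return [r for r in records if r.get("level") in levels]


if __name__ == "__main__":
    # Two records copied from the log above, flattened onto one line.
    sample = (
        'time=2026-03-18T23:52:24.593+08:00 level=INFO source=routes.go:1782 '
        'msg="Listening on [::]:11434 (version 0.18.1)" '
        'time=2026-03-18T23:52:26.021+08:00 level=WARN source=sched.go:450 '
        'msg="model architecture does not currently support parallel requests" '
        'architecture=qwen35'
    )
    for rec in notable(sample):
        print(rec["time"], rec["level"], rec["source"], rec["msg"])
```

Splitting on the timestamp marker rather than on newlines is a deliberate choice here, since the pasted log has lost its original line breaks.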

Code Example

C:\Users\Administrator>ollama serve
Error: listen tcp 0.0.0.0:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.

C:\Users\Administrator>ollama serve
time=2026-03-18T23:52:24.556+08:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:8192 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Administrator\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:4 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-18T23:52:24.568+08:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
time=2026-03-18T23:52:24.582+08:00 level=INFO source=images.go:477 msg="total blobs: 72"
time=2026-03-18T23:52:24.589+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-18T23:52:24.593+08:00 level=INFO source=routes.go:1782 msg="Listening on [::]:11434 (version 0.18.1)"
time=2026-03-18T23:52:24.593+08:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-18T23:52:24.594+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-18T23:52:24.622+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61432"
time=2026-03-18T23:52:24.622+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12
time=2026-03-18T23:52:24.886+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=272.9297ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=map[]
time=2026-03-18T23:52:24.888+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61447"
time=2026-03-18T23:52:24.888+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:25.134+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=247.62ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[]
time=2026-03-18T23:52:25.135+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61461"
time=2026-03-18T23:52:25.135+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=179.8718ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[]
time=2026-03-18T23:52:25.314+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 description="NVIDIA GeForce RTX 3090" compute=8.6 id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 pci_id=0000:01:00.0
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 description="NVIDIA GeForce RTX 3090" compute=8.6 id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 pci_id=0000:01:00.0
time=2026-03-18T23:52:25.316+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61476"
time=2026-03-18T23:52:25.316+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61475"
time=2026-03-18T23:52:25.316+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 CUDA_VISIBLE_DEVICES=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT=1
time=2026-03-18T23:52:25.316+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 GGML_CUDA_INIT=1 CUDA_VISIBLE_DEVICES=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178
time=2026-03-18T23:52:25.585+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=271.3152ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT:1]"
time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=303.0219ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT:1]"
time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:401 msg="filtering device with overlapping libraries" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 delete_index=0 kept_library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=1.0241201s
time=2026-03-18T23:52:25.617+08:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v13 driver=13.0 pci_id=0000:01:00.0 type=discrete total="24.0 GiB" available="22.5 GiB"
time=2026-03-18T23:52:25.617+08:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768
time=2026-03-18T23:52:25.732+08:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-18T23:52:25.732+08:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-18T23:52:25.734+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61509"
time=2026-03-18T23:52:25.734+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:25.974+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=242.5026ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[]
time=2026-03-18T23:52:25.974+08:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=242.5026ms
time=2026-03-18T23:52:25.974+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-18T23:52:25.974+08:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-03-18T23:52:25.975+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=24 efficiency=16 threads=32
time=2026-03-18T23:52:25.975+08:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-03-18T23:52:25.993+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-18T23:52:25.994+08:00 level=DEBUG source=sched.go:256 msg="loading first model" model=C:\Users\Administrator\.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5
time=2026-03-18T23:52:26.021+08:00 level=WARN source=sched.go:450 msg="model architecture does not currently support parallel requests" architecture=qwen35
time=2026-03-18T23:52:26.044+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-18T23:52:26.046+08:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-18T23:52:26.047+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 --port 61523"
time=2026-03-18T23:52:26.047+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:26.062+08:00 level=INFO source=sched.go:489 msg="system memory" total="127.8 GiB" free="98.9 GiB" free_swap="89.6 GiB"
time=2026-03-18T23:52:26.062+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=CUDA available="22.1 GiB" free="22.5 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-18T23:52:26.062+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=25 requested=-1
time=2026-03-18T23:52:26.182+08:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-18T23:52:26.193+08:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:61523"
time=2026-03-18T23:52:26.201+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-18T23:52:26.228+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q8_0 name="" description="" num_tensors=536 num_key_values=52
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2026-03-18T23:52:26.242+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-b9245cc3-269c-28c7-2c20-c18ad4876178
load_backend: loaded CUDA backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-03-18T23:52:26.293+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)

What is the issue?

Qwen3.5 0.8b keeps printing the message "kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation", and eventually times out and crashes.
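
To gauge how often the warning repeats before the timeout, the server output can be saved to a file and scanned. The following is a minimal Python sketch, not part of the original report: it assumes the log keeps the `key=value` layout seen in the lines above, and the helper names `parse_fields` and `count_messages` are hypothetical. The second sample line is assembled from the message quoted in this issue; its `level=WARN` prefix is an assumption.

```python
import re
from collections import Counter

# A field is key=value, where the value is a double-quoted string or a bare token.
FIELD_RE = re.compile(r'([\w.]+)=("(?:[^"\\]|\\.)*"|\S*)')


def parse_fields(line: str) -> dict:
    """Split one key=value log line into a dict, stripping surrounding quotes."""
    fields = {}
    for key, value in FIELD_RE.findall(line):
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        # Keep the first occurrence if a key appears more than once.
        fields.setdefault(key, value)
    return fields


def count_messages(lines, needle="kv cache removal unsupported"):
    """Count log lines whose msg field contains `needle`, grouped by level."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if needle in fields.get("msg", ""):
            counts[fields.get("level", "?")] += 1
    return counts


# First line is copied from the log above; second line is a hypothetical
# reconstruction of the reported warning (the level is assumed).
sample = [
    'time=2026-03-18T23:52:26.021+08:00 level=WARN source=sched.go:450 '
    'msg="model architecture does not currently support parallel requests" '
    'architecture=qwen35',
    'level=WARN msg="kv cache removal unsupported, clearing cache and '
    'returning inputs for reprocessing" id=0 error="model does not support operation"',
]

if __name__ == "__main__":
    print(dict(count_messages(sample)))
```

For a real run, pass an open file instead of `sample`, for example `count_messages(open("server.log", encoding="utf-8"))`, where `server.log` is whatever file the output of `ollama serve` was redirected to.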

time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-18T23:52:26.294+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-18T23:52:26.546+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=583 splits=1
time=2026-03-18T23:52:26.681+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=12549 splits=4
time=2026-03-18T23:52:26.714+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1813 splits=2
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="974.5 MiB"
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=device.go:245 msg="model weights" device=CPU size="259.8 MiB"
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="577.6 MiB"
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="498.1 MiB"
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="7.9 MiB"
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=device.go:272 msg="total memory" size="2.3 GiB"
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=server.go:782 msg=memory success=true required.InputWeights=272412672 required.CPU.Graph=8302592 required.CUDA0.ID=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 required.CUDA0.Weights="[42488576 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 471614464]" required.CUDA0.Cache="[28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 0]" required.CUDA0.Graph=522301440
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=server.go:976 msg="available gpu" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=CUDA "available layer vram"="21.6 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="498.1 MiB"
time=2026-03-18T23:52:26.716+08:00 level=DEBUG source=server.go:793 msg="new layout created" layers="25[ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Layers:25(0..24)]"
time=2026-03-18T23:52:26.716+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-18T23:52:26.738+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-18T23:52:26.742+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-18T23:52:26.917+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=583 splits=1
time=2026-03-18T23:52:27.040+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=12549 splits=4
time=2026-03-18T23:52:27.044+08:00 level=DEBUG source=ggml.go:852 msg="compute graph" nodes=1813 splits=2
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=device.go:240 msg="model weights" device=CUDA0 size="974.5 MiB"
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=device.go:245 msg="model weights" device=CPU size="259.8 MiB"
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=device.go:251 msg="kv cache" device=CUDA0 size="577.6 MiB"
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=device.go:262 msg="compute graph" device=CUDA0 size="498.1 MiB"
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=device.go:267 msg="compute graph" device=CPU size="7.9 MiB"
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=device.go:272 msg="total memory" size="2.3 GiB"
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=server.go:782 msg=memory success=true required.InputWeights=272412672 required.CPU.Graph=8302592 required.CUDA0.ID=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 required.CUDA0.Weights="[42488576 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 22981376 22981376 22981376 19507200 471614464]" required.CUDA0.Cache="[28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 28057600 28057600 28057600 16777216 0]" required.CUDA0.Graph=522301440
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=server.go:976 msg="available gpu" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=CUDA "available layer vram"="21.6 GiB" backoff=0.00 minimum="457.0 MiB" overhead="0 B" graph="498.1 MiB"
time=2026-03-18T23:52:27.045+08:00 level=DEBUG source=server.go:793 msg="new layout created" layers="25[ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Layers:25(0..24)]"
time=2026-03-18T23:52:27.045+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-18T23:52:27.045+08:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-03-18T23:52:27.045+08:00 level=INFO source=ggml.go:489 msg="offloading output layer to GPU"
time=2026-03-18T23:52:27.045+08:00 level=INFO source=ggml.go:494 msg="offloaded 25/25 layers to GPU"
time=2026-03-18T23:52:27.045+08:00 level=INFO source=device.go:240 msg="model weights" device=CUDA0 size="974.5 MiB"
time=2026-03-18T23:52:27.045+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="259.8 MiB"
time=2026-03-18T23:52:27.045+08:00 level=INFO source=device.go:251 msg="kv cache" device=CUDA0 size="577.6 MiB"
time=2026-03-18T23:52:27.046+08:00 level=INFO source=device.go:262 msg="compute graph" device=CUDA0 size="498.1 MiB"
time=2026-03-18T23:52:27.046+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="7.9 MiB"
time=2026-03-18T23:52:27.046+08:00 level=INFO source=device.go:272 msg="total memory" size="2.3 GiB"
time=2026-03-18T23:52:27.046+08:00 level=INFO source=sched.go:565 msg="loaded runners" count=1
time=2026-03-18T23:52:27.046+08:00 level=DEBUG source=sched.go:729 msg="evaluating already loaded" model=C:\Users\Administrator\.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5
time=2026-03-18T23:52:27.046+08:00 level=INFO source=server.go:1350 msg="waiting for llama runner to start responding"
time=2026-03-18T23:52:27.046+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-18T23:52:27.046+08:00 level=DEBUG source=server.go:1394 msg="model load progress 0.00"
time=2026-03-18T23:52:27.251+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" 
key=qwen35.pooling_type default=0 time=2026-03-18T23:52:27.297+08:00 level=INFO source=server.go:1388 msg="llama runner started in 1.23 seconds" time=2026-03-18T23:52:27.297+08:00 level=DEBUG source=sched.go:577 msg="finished setting up" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.inference="[{ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Library:CUDA}]" runner.size="2.3 GiB" runner.vram="2.3 GiB" runner.parallel=1 runner.pid=73136 runner.model=C:\Users\Administrator.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 runner.num_ctx=8192 time=2026-03-18T23:52:27.297+08:00 level=DEBUG source=sched.go:729 msg="evaluating already loaded" model=C:\Users\Administrator.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 time=2026-03-18T23:52:27.298+08:00 level=DEBUG source=sched.go:729 msg="evaluating already loaded" model=C:\Users\Administrator.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 time=2026-03-18T23:52:27.318+08:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=984 format="" time=2026-03-18T23:52:27.318+08:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=969 format="" time=2026-03-18T23:52:27.318+08:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=997 format="" time=2026-03-18T23:52:27.318+08:00 level=DEBUG source=server.go:1536 msg="completion request" images=0 prompt=1160 format="" time=2026-03-18T23:52:27.336+08:00 level=DEBUG source=cache.go:151 msg="loading cache slot" id=0 cache=0 prompt=228 used=0 remaining=228 time=2026-03-18T23:52:59.323+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:52:59.323+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does 
not support operation" time=2026-03-18T23:53:16.441+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:53:16.441+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:53:33.563+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:53:33.563+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:53:50.655+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:53:50.655+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:54:07.745+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:54:07.745+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:54:24.773+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:54:24.773+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:54:41.875+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:54:41.875+08:00 level=DEBUG source=cache.go:301 
msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:54:58.948+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:54:58.948+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:55:16.062+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:55:16.062+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:55:33.131+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:55:33.131+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:55:50.262+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:55:50.262+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:56:07.325+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:56:07.325+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:56:24.404+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 
limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:56:24.404+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:56:41.476+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:56:41.476+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:56:58.541+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:56:58.541+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:57:15.760+08:00 level=DEBUG source=cache.go:295 msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 time=2026-03-18T23:57:15.760+08:00 level=DEBUG source=cache.go:301 msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" id=0 error="model does not support operation" time=2026-03-18T23:57:24.659+08:00 level=INFO source=server.go:1568 msg="aborting completion request due to client closing the connection" time=2026-03-18T23:57:24.659+08:00 level=DEBUG source=sched.go:431 msg="context for request finished" runner.name=registry.ollama.ai/library/qwen3.5:0.8b runner.inference="[{ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Library:CUDA}]" runner.size="2.3 GiB" runner.vram="2.3 GiB" runner.parallel=1 runner.pid=73136
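To see the pattern in the log above without reading every entry, a small script can pull out the "context limit hit - shifting" events and measure how far apart they are. The sketch below is only an illustration: `summarize_shifts` and `SHIFT_RE` are made-up names, and the regular expression assumes the exact field order shown in this log. The check that `discard` equals `(limit - keep) // 2` is an observation that happens to match the logged values (8192, 4, 4094); it is not a confirmed description of how Ollama computes the shift.

```python
import re
from datetime import datetime

# Matches the "context limit hit - shifting" entries as they appear in this log.
# The field order is assumed from the pasted output, not from Ollama's source.
SHIFT_RE = re.compile(
    r'time=(?P<ts>\S+) level=DEBUG source=cache\.go:\d+ '
    r'msg="context limit hit - shifting" id=\d+ '
    r'limit=(?P<limit>\d+) input=\d+ keep=(?P<keep>\d+) discard=(?P<discard>\d+)'
)


def summarize_shifts(log_text):
    """Count context-shift events and the seconds between consecutive ones.

    Works on one-entry-per-line logs and on logs flattened into a single line,
    because it scans the whole text instead of splitting on newlines.
    """
    events = [
        (
            datetime.fromisoformat(m["ts"]),
            int(m["limit"]),
            int(m["keep"]),
            int(m["discard"]),
        )
        for m in SHIFT_RE.finditer(log_text)
    ]
    gaps = [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]
    return {
        "count": len(events),
        "gaps_seconds": gaps,
        # True when every event satisfies discard == (limit - keep) // 2.
        "half_window_discard": all(
            discard == (limit - keep) // 2 for _, limit, keep, discard in events
        ),
    }


# Two consecutive events copied from the log above, joined on one line.
SAMPLE = (
    'time=2026-03-18T23:52:59.323+08:00 level=DEBUG source=cache.go:295 '
    'msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094 '
    'time=2026-03-18T23:52:59.323+08:00 level=DEBUG source=cache.go:301 '
    'msg="kv cache removal unsupported, clearing cache and returning inputs for reprocessing" '
    'id=0 error="model does not support operation" '
    'time=2026-03-18T23:53:16.441+08:00 level=DEBUG source=cache.go:295 '
    'msg="context limit hit - shifting" id=0 limit=8192 input=8192 keep=4 discard=4094'
)

if __name__ == "__main__":
    print(summarize_shifts(SAMPLE))
```

Run against the full log, this should report 16 shift events roughly 17 seconds apart, which is the loop that continues until the client closes the connection at 23:57:24.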

log.txt

Relevant log output

C:\Users\Administrator>ollama serve
Error: listen tcp 0.0.0.0:11434: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted.

C:\Users\Administrator>ollama serve
time=2026-03-18T23:52:24.556+08:00 level=INFO source=routes.go:1727 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:8192 OLLAMA_DEBUG:DEBUG OLLAMA_EDITOR: OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://0.0.0.0:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:C:\\Users\\Administrator\\.ollama\\models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NO_CLOUD:false OLLAMA_NUM_PARALLEL:4 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES:]"
time=2026-03-18T23:52:24.568+08:00 level=INFO source=routes.go:1729 msg="Ollama cloud disabled: false"
time=2026-03-18T23:52:24.582+08:00 level=INFO source=images.go:477 msg="total blobs: 72"
time=2026-03-18T23:52:24.589+08:00 level=INFO source=images.go:484 msg="total unused blobs removed: 0"
time=2026-03-18T23:52:24.593+08:00 level=INFO source=routes.go:1782 msg="Listening on [::]:11434 (version 0.18.1)"
time=2026-03-18T23:52:24.593+08:00 level=DEBUG source=sched.go:145 msg="starting llm scheduler"
time=2026-03-18T23:52:24.594+08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2026-03-18T23:52:24.622+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61432"
time=2026-03-18T23:52:24.622+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12
time=2026-03-18T23:52:24.886+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=272.9297ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs=map[]
time=2026-03-18T23:52:24.888+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61447"
time=2026-03-18T23:52:24.888+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:25.134+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=247.62ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[]
time=2026-03-18T23:52:25.135+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61461"
time=2026-03-18T23:52:25.135+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\rocm
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=179.8718ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\rocm]" extra_envs=map[]
time=2026-03-18T23:52:25.314+08:00 level=INFO source=runner.go:106 msg="experimental Vulkan support disabled.  To enable, set OLLAMA_VULKAN=1"
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:124 msg="evaluating which, if any, devices to filter out" initial_count=2
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 description="NVIDIA GeForce RTX 3090" compute=8.6 id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 pci_id=0000:01:00.0
time=2026-03-18T23:52:25.314+08:00 level=DEBUG source=runner.go:146 msg="verifying if device is supported" library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 description="NVIDIA GeForce RTX 3090" compute=8.6 id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 pci_id=0000:01:00.0
time=2026-03-18T23:52:25.316+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61476"
time=2026-03-18T23:52:25.316+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61475"
time=2026-03-18T23:52:25.316+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13 CUDA_VISIBLE_DEVICES=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT=1
time=2026-03-18T23:52:25.316+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 GGML_CUDA_INIT=1 CUDA_VISIBLE_DEVICES=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178
time=2026-03-18T23:52:25.585+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=271.3152ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT:1]"
time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=303.0219ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v12]" extra_envs="map[CUDA_VISIBLE_DEVICES:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 GGML_CUDA_INIT:1]"
time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:401 msg="filtering device with overlapping libraries" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v12 delete_index=0 kept_library=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:25.617+08:00 level=DEBUG source=runner.go:40 msg="GPU bootstrap discovery took" duration=1.0241201s
time=2026-03-18T23:52:25.617+08:00 level=INFO source=types.go:42 msg="inference compute" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 filter_id="" library=CUDA compute=8.6 name=CUDA0 description="NVIDIA GeForce RTX 3090" libdirs=ollama,cuda_v13 driver=13.0 pci_id=0000:01:00.0 type=discrete total="24.0 GiB" available="22.5 GiB"
time=2026-03-18T23:52:25.617+08:00 level=INFO source=routes.go:1832 msg="vram-based default context" total_vram="24.0 GiB" default_num_ctx=32768
time=2026-03-18T23:52:25.732+08:00 level=DEBUG source=runner.go:264 msg="refreshing free memory"
time=2026-03-18T23:52:25.732+08:00 level=DEBUG source=runner.go:328 msg="unable to refresh all GPUs with existing runners, performing bootstrap discovery"
time=2026-03-18T23:52:25.734+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --port 61509"
time=2026-03-18T23:52:25.734+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:25.974+08:00 level=DEBUG source=runner.go:437 msg="bootstrap discovery took" duration=242.5026ms OLLAMA_LIBRARY_PATH="[C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13]" extra_envs=map[]
time=2026-03-18T23:52:25.974+08:00 level=DEBUG source=runner.go:40 msg="overall device VRAM discovery took" duration=242.5026ms
time=2026-03-18T23:52:25.974+08:00 level=INFO source=cpu_windows.go:148 msg=packages count=1
time=2026-03-18T23:52:25.974+08:00 level=INFO source=cpu_windows.go:164 msg="efficiency cores detected" maxEfficiencyClass=1
time=2026-03-18T23:52:25.975+08:00 level=INFO source=cpu_windows.go:195 msg="" package=0 cores=24 efficiency=16 threads=32
time=2026-03-18T23:52:25.975+08:00 level=DEBUG source=sched.go:220 msg="updating default concurrency" OLLAMA_MAX_LOADED_MODELS=3 gpu_count=1
time=2026-03-18T23:52:25.993+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-18T23:52:25.994+08:00 level=DEBUG source=sched.go:256 msg="loading first model" model=C:\Users\Administrator\.ollama\models\blobs\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5
time=2026-03-18T23:52:26.021+08:00 level=WARN source=sched.go:450 msg="model architecture does not currently support parallel requests" architecture=qwen35
time=2026-03-18T23:52:26.044+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.pooling_type default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.head_count_kv default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.type default=""
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.type default=""
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.factor default=1
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.rope.scaling.original_context_length default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.attention.scale default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_count default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.expert_used_count default=0
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.norm_top_k_prob default=true
time=2026-03-18T23:52:26.045+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.mrope_interleaved default=false
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.attention.layer_norm_epsilon default=9.999999974752427e-07
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.rope.freq_base default=10000
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=qwen35.vision.num_positional_embeddings default=2304
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.add_bos_token default=false
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.bos_token_id default=0
time=2026-03-18T23:52:26.046+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=tokenizer.ggml.eos_token_ids default="&{size:0 values:[]}"
time=2026-03-18T23:52:26.046+08:00 level=INFO source=server.go:246 msg="enabling flash attention"
time=2026-03-18T23:52:26.047+08:00 level=INFO source=server.go:430 msg="starting runner" cmd="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\ollama.exe runner --ollama-engine --model C:\\Users\\Administrator\\.ollama\\models\\blobs\\sha256-afb707b6b8fac6e475acc42bc8380fc0b8d2e0e4190be5a969fbf62fcc897db5 --port 61523"
time=2026-03-18T23:52:26.047+08:00 level=DEBUG source=server.go:431 msg=subprocess CUDA_PATH="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" CUDA_PATH_V12_1="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1" CUDA_PATH_V12_2="C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2" OLLAMA_CONTEXT_LENGTH=8192 OLLAMA_DEBUG=1 OLLAMA_HOST=0.0.0.0 OLLAMA_NUM_PARALLEL=4 PATH="C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama\\lib\\ollama\\cuda_v13;C:\\ProgramData\\anaconda3\\condabin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.2\\libnvvp;C:\\Program Files\\Common Files\\Oracle\\Java\\javapath;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\libnvvp;C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\;C:\\WINDOWS\\System32\\OpenSSH\\;C:\\Program Files (x86)\\NVIDIA Corporation\\PhysX\\Common;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\lib;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.1\\include;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Git LFS;C:\\Program Files\\LibreOffice\\program;C:\\Program Files\\dotnet\\;C:\\Program Files (x86)\\Windows Kits\\10\\Windows Performance Toolkit\\;C:\\Program Files\\NVIDIA Corporation\\Nsight Compute 2023.2.0\\;C:\\Program Files\\Docker\\Docker\\resources\\bin;C:\\Program Files\\nodejs\\;C:\\Users\\Administrator\\AppData\\Local\\Microsoft\\WindowsApps;C:\\Users\\Administrator\\AppData\\Local\\Programs\\Ollama;C:\\ProgramData\\anaconda3;C:\\ProgramData\\anaconda3\\Scripts;C:\\ProgramData\\anaconda3\\Library\\bin;C:\\ProgramData\\anaconda3\\Library\\mingw-w64\\bin;C:\\ProgramData\\anaconda3\\Library\\usr\\bin;C:\\Users\\Administrator\\AppData\\Roaming\\npm" 
OLLAMA_LIBRARY_PATH=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama;C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
time=2026-03-18T23:52:26.062+08:00 level=INFO source=sched.go:489 msg="system memory" total="127.8 GiB" free="98.9 GiB" free_swap="89.6 GiB"
time=2026-03-18T23:52:26.062+08:00 level=INFO source=sched.go:496 msg="gpu memory" id=GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 library=CUDA available="22.1 GiB" free="22.5 GiB" minimum="457.0 MiB" overhead="0 B"
time=2026-03-18T23:52:26.062+08:00 level=INFO source=server.go:757 msg="loading model" "model layers"=25 requested=-1
time=2026-03-18T23:52:26.182+08:00 level=INFO source=runner.go:1411 msg="starting ollama engine"
time=2026-03-18T23:52:26.193+08:00 level=INFO source=runner.go:1446 msg="Server listening on 127.0.0.1:61523"
time=2026-03-18T23:52:26.201+08:00 level=INFO source=runner.go:1284 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:Enabled KvSize:8192 KvCacheType: NumThreads:8 GPULayers:25[ID:GPU-b9245cc3-269c-28c7-2c20-c18ad4876178 Layers:25(0..24)] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.alignment default=32
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.name default=""
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:324 msg="key with type not found" key=general.description default=""
time=2026-03-18T23:52:26.228+08:00 level=INFO source=ggml.go:136 msg="" architecture=qwen35 file_type=Q8_0 name="" description="" num_tensors=536 num_key_values=52
time=2026-03-18T23:52:26.228+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama
load_backend: loaded CPU backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\ggml-cpu-alderlake.dll
time=2026-03-18T23:52:26.242+08:00 level=DEBUG source=ggml.go:94 msg="ggml backend load all from path" path=C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, ID: GPU-b9245cc3-269c-28c7-2c20-c18ad4876178
load_backend: loaded CUDA backend from C:\Users\Administrator\AppData\Local\Programs\Ollama\lib\ollama\cuda_v13\ggml-cuda.dll
time=2026-03-18T23:52:26.293+08:00 level=INFO source=ggml.go:104 msg=system CPU.0.SSE3=1 CPU.0.SSSE3=1 CPU.0.AVX=1 CPU.0.AVX_VNNI=1 CPU.0.AVX2=1 CPU.0.F16C=1 CPU.0.FMA=1 CPU.0.BMI2=1 CPU.0.LLAMAFILE=1 CPU.1.LLAMAFILE=1 CUDA.0.ARCHS=750,800,860,870,890,900,1000,1030,1100,1200,1210 CUDA.0.USE_GRAPHS=1 CUDA.0.PEER_MAX_BATCH_SIZE=128 compiler=cgo(clang)

OS

Windows

GPU

Nvidia

CPU

Intel

Ollama version

0.18.1

extent analysis

Fix Plan

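The log repeats "kv cache removal unsupported, clearing cache and returning inputs for reprocessing" with error="model does not support operation". Taken at face value, the runner tried to remove part of the KV cache, the qwen35 architecture does not support that operation, and the runner fell back to clearing the cache and reprocessing the whole input. If that fallback repeats on every request, it would explain the eventual timeout. No confirmed fix is recorded here, so treat the steps below as things to check: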

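  • Check the KV cache setting: OLLAMA_KV_CACHE_TYPE selects the quantization type of the KV cache (for example f16 or q8_0). It does not switch the cache off, and ollama serve has no --ollama-kv-cache-type flag; the setting is read from the environment. The load request in the log already shows KvCacheType empty, so the default is in use and changing it is unlikely to affect this error.
  • Update Ollama: The report is against 0.18.1. Check the installed version with ollama --version and install the latest release from ollama.com or the GitHub releases page. There is no ollama update command.
  • Check model compatibility: Run ollama list to see the installed models and ollama show <model> to inspect one. There is no ollama models command. The log already warns that the qwen35 architecture "does not currently support parallel requests", and the runner loads it with Parallel:1 even though OLLAMA_NUM_PARALLEL is 4.

Example of clearing the variable before starting the server from Windows cmd. This leaves the cache type at its default; it does not disable the cache:

set OLLAMA_KV_CACHE_TYPE=
ollama serve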
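
The version check from the list above can also be done against the running server. The following is a minimal sketch, assuming the server listens on the default 127.0.0.1:11434 and exposes GET /api/version as described in the Ollama API documentation. The function names are illustrative, not part of Ollama.

```python
import json
import urllib.request


def parse_version(text):
    """Turn a version string such as '0.18.1' into a tuple of ints for comparison."""
    # Drop a leading 'v' and any '-rc...' style suffix before splitting.
    core = text.strip().lstrip("v").split("-")[0]
    return tuple(int(part) for part in core.split("."))


def fetch_server_version(base_url="http://127.0.0.1:11434"):
    """Ask a running Ollama server for its version via GET /api/version."""
    with urllib.request.urlopen(f"{base_url}/api/version", timeout=5) as resp:
        return json.load(resp)["version"]


if __name__ == "__main__":
    reported = "0.18.1"  # version stated in the issue
    running = fetch_server_version()
    print("running:", running)
    if parse_version(running) <= parse_version(reported):
        print("same as or older than the version in the report; consider upgrading")
```

Comparing tuples rather than strings matters here because "0.9.0" sorts after "0.18.1" as text but before it as a version.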

Verification

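To verify, restart ollama serve with debug logging enabled (OLLAMA_DEBUG=1, as in the log above), send the same request, and watch the output. The fix worked if the "kv cache removal unsupported" message no longer repeats and the request completes instead of timing out. Do not expect a message saying the KV cache was disabled; the cache is not switched off by any of the steps above.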
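
For long logs, counting the message is easier than reading for it. This is a small sketch, assuming the server output was saved to a text file; the file name passed on the command line and the function name are hypothetical.

```python
from collections import Counter

# Substring of the message quoted in the issue title.
MARKER = "kv cache removal unsupported"


def summarize_log(lines):
    """Count how often the cache-removal message appears and tally lines by level."""
    levels = Counter()
    marker_hits = 0
    for line in lines:
        if MARKER in line:
            marker_hits += 1
        # Ollama log lines carry a 'level=INFO' style token; other lines are skipped.
        for token in line.split():
            if token.startswith("level="):
                levels[token[len("level="):]] += 1
                break
    return marker_hits, levels


if __name__ == "__main__":
    import sys

    # Usage: python check_log.py saved_server_output.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        hits, levels = summarize_log(fh)
    print(f"cache-removal messages: {hits}")
    print(f"lines by level: {dict(levels)}")
```

Run it on a log captured before and after a change; a count that drops to zero while the request still completes is the signal to look for.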

Extra Tips

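  • The log shows Ollama loading its own bundled CUDA backend (lib\ollama\cuda_v13\ggml-cuda.dll) rather than the system CUDA toolkit v12.1/v12.2, so updating the toolkit is unlikely to matter. Keeping the NVIDIA driver up to date is the more relevant check.
  • There is no ollama config reset command. Ollama is configured through environment variables, so review the OLLAMA_* values printed in the server log and remove any you did not intend to set.
  • You can also try increasing OLLAMA_CONTEXT_LENGTH (8192 in the log) in case the cache removal is triggered when the context fills up. This is a guess rather than a confirmed cause, and a larger context increases memory usage.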
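
A larger context can also be tried for a single request without restarting the server. The sketch below assumes the default address 127.0.0.1:11434 and uses the num_ctx option of POST /api/generate from the Ollama API. The model tag and the helper names are placeholders, so substitute the name shown by ollama list.

```python
import json
import urllib.request


def build_generate_request(model, prompt, num_ctx):
    """Build the JSON body for POST /api/generate with a per-request context size."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"num_ctx": num_ctx},
    }


def generate(body, base_url="http://127.0.0.1:11434", timeout=300):
    """Send the request to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__":
    # Placeholder model tag; replace it with the one installed locally.
    body = build_generate_request("qwen3.5:0.8b", "Say hello.", num_ctx=16384)
    print(generate(body))
```

If the cache-removal message stops appearing at the larger size, that would point to context overflow as the trigger, though it would not by itself confirm the cause.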
