ollama - 💡(How to fix) Fix [Windows 10] Older NVIDIA GPUs (Maxwell) force fallback to CPU mode, returns 500 error [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
ollama/ollama#15167Fetched 2026-04-08 01:58:31
View on GitHub
Comments
1
Participants
2
Timeline
3
Reactions
0
Author
Timeline (top)
labeled ×2commented ×1

Error Message

  • Chat requests return 500 Internal Server Error

Code Example

time=2026-03-31T16:56:18.686+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="3.0 GiB"
time=2026-03-31T16:56:18.686+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
time=2026-03-31T16:56:18.690+08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="3.5 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="1.2 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:272 msg="total memory" size="7.7 GiB"
time=2026-03-31T16:56:18.695+08:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-03-31T16:56:18.697+08:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-03-31T16:56:18.699+08:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-31T16:56:21.454+08:00 level=INFO source=server.go:1390 msg="llama runner started in 7.87 seconds"
[GIN] 2026/03/31 - 16:57:42 | 500 |         1m28s |       127.0.0.1 | POST     "/api/chat"
RAW_BUFFERClick to expand / collapse

What is the issue?

Problem Description Ollama 0.19.0 on Windows 10 fails to detect the older NVIDIA GPU and forces fallback to CPU mode despite correct CUDA 11.8 setup.

  • Log shows id=cpu and total_vram="0 B".
  • GPU usage is 0% (Task Manager).
  • Chat requests return 500 Internal Server Error

Expected behavior Ollama should detect the GPU, use CUDA acceleration, and handle requests normally.

Relevant log output

time=2026-03-31T16:56:18.686+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="3.0 GiB"
time=2026-03-31T16:56:18.686+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
time=2026-03-31T16:56:18.690+08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="3.5 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="1.2 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:272 msg="total memory" size="7.7 GiB"
time=2026-03-31T16:56:18.695+08:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-03-31T16:56:18.697+08:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-03-31T16:56:18.699+08:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-31T16:56:21.454+08:00 level=INFO source=server.go:1390 msg="llama runner started in 7.87 seconds"
[GIN] 2026/03/31 - 16:57:42 | 500 |         1m28s |       127.0.0.1 | POST     "/api/chat"

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

extent analysis

TL;DR

  • The issue might be resolved by ensuring the NVIDIA GPU is properly recognized and utilized by Ollama, potentially through environment variable configuration or updating CUDA drivers.

Guidance

  • Verify that the CUDA 11.8 setup is correctly configured and compatible with the older NVIDIA GPU.
  • Check the environment variables to ensure that the GPU is properly set up for use with Ollama, potentially setting CUDA_VISIBLE_DEVICES to the correct device ID.
  • Investigate if there are any specific requirements or configurations needed for Ollama to work with older NVIDIA GPUs.
  • Review the log output for any hints about why the GPU is not being utilized, such as errors or warnings related to CUDA or GPU initialization.

Example

  • Setting the CUDA_VISIBLE_DEVICES environment variable before running Ollama might help, for example: CUDA_VISIBLE_DEVICES=0 ollama

Notes

  • The exact solution may depend on the specifics of the NVIDIA GPU model and the version of Ollama being used.
  • There might be limitations or incompatibilities between Ollama, CUDA 11.8, and the older NVIDIA GPU that need to be addressed.

Recommendation

  • Apply workaround: Setting environment variables or configuring CUDA settings might help resolve the issue, as it seems like a configuration or compatibility problem rather than a need for an upgrade.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING