ollama - 💡(How to fix) Fix [Windows 10] Older NVIDIA GPUs (Maxwell) force fallback to CPU mode, returns 500 error [1 comments, 2 participants]

ollama2026-03-31 09:20:56

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

ollama/ollama#15167•Fetched 2026-04-08 01:58:31

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Haru95572

Participants

Haru95572

rick-github

Timeline (top)

labeled ×2commented ×1

Error Message

Chat requests return 500 Internal Server Error

Code Example

time=2026-03-31T16:56:18.686+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="3.0 GiB"
time=2026-03-31T16:56:18.686+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
time=2026-03-31T16:56:18.690+08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="3.5 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="1.2 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:272 msg="total memory" size="7.7 GiB"
time=2026-03-31T16:56:18.695+08:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-03-31T16:56:18.697+08:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-03-31T16:56:18.699+08:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-31T16:56:21.454+08:00 level=INFO source=server.go:1390 msg="llama runner started in 7.87 seconds"
[GIN] 2026/03/31 - 16:57:42 | 500 |         1m28s |       127.0.0.1 | POST     "/api/chat"

RAW_BUFFERClick to expand / collapse

What is the issue?

Problem Description Ollama 0.19.0 on Windows 10 fails to detect the older NVIDIA GPU and forces fallback to CPU mode despite correct CUDA 11.8 setup.

Log shows id=cpu and total_vram="0 B".
GPU usage is 0% (Task Manager).
Chat requests return 500 Internal Server Error

Expected behavior Ollama should detect the GPU, use CUDA acceleration, and handle requests normally.

Relevant log output

time=2026-03-31T16:56:18.686+08:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="3.0 GiB"
time=2026-03-31T16:56:18.686+08:00 level=INFO source=ggml.go:482 msg="offloading 0 repeating layers to GPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-03-31T16:56:18.687+08:00 level=INFO source=ggml.go:494 msg="offloaded 0/25 layers to GPU"
time=2026-03-31T16:56:18.690+08:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="3.5 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="1.2 GiB"
time=2026-03-31T16:56:18.693+08:00 level=INFO source=device.go:272 msg="total memory" size="7.7 GiB"
time=2026-03-31T16:56:18.695+08:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-03-31T16:56:18.697+08:00 level=INFO source=server.go:1352 msg="waiting for llama runner to start responding"
time=2026-03-31T16:56:18.699+08:00 level=INFO source=server.go:1386 msg="waiting for server to become available" status="llm server loading model"
time=2026-03-31T16:56:21.454+08:00 level=INFO source=server.go:1390 msg="llama runner started in 7.87 seconds"
[GIN] 2026/03/31 - 16:57:42 | 500 |         1m28s |       127.0.0.1 | POST     "/api/chat"

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

extent analysis

TL;DR

The issue might be resolved by ensuring the NVIDIA GPU is properly recognized and utilized by Ollama, potentially through environment variable configuration or updating CUDA drivers.

Guidance

Verify that the CUDA 11.8 setup is correctly configured and compatible with the older NVIDIA GPU.
Check the environment variables to ensure that the GPU is properly set up for use with Ollama, potentially setting CUDA_VISIBLE_DEVICES to the correct device ID.
Investigate if there are any specific requirements or configurations needed for Ollama to work with older NVIDIA GPUs.
Review the log output for any hints about why the GPU is not being utilized, such as errors or warnings related to CUDA or GPU initialization.

Example

Setting the CUDA_VISIBLE_DEVICES environment variable before running Ollama might help, for example: CUDA_VISIBLE_DEVICES=0 ollama

Notes

The exact solution may depend on the specifics of the NVIDIA GPU model and the version of Ollama being used.
There might be limitations or incompatibilities between Ollama, CUDA 11.8, and the older NVIDIA GPU that need to be addressed.

Recommendation

Apply workaround: Setting environment variables or configuring CUDA settings might help resolve the issue, as it seems like a configuration or compatibility problem rather than a need for an upgrade.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #mixed precision #training loop #device allocation #model download

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - 💡(How to fix) Fix [Windows 10] Older NVIDIA GPUs (Maxwell) force fallback to CPU mode, returns 500 error [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

ollama - 💡(How to fix) Fix [Windows 10] Older NVIDIA GPUs (Maxwell) force fallback to CPU mode, returns 500 error [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

What is the issue?

Relevant log output

OS

GPU

CPU

Ollama version

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING