ollama: Silent CPU fallback after driver update: CUDA forward compatibility error causes 100% CPU usage with no user warning

ollama/ollama#15918 (fetched 2026-05-02 05:27:43)

What is the issue?

After updating NVIDIA drivers today, ollama silently fell back to CPU-only inference instead of failing fast or warning the user. The model (qwen3.5:9b, Q4_K_M, 6.1 GiB) loaded entirely on CPU, consuming 766% CPU and 16 GB RAM for ~38 minutes, causing the system to reach 101°C, while the API kept returning 500 errors after 60s timeouts.

The fallback happened 3 times at 30-minute intervals (12:28, 12:58, 13:28) triggered by an external tool (OpenAI Codex) calling POST /api/chat. No warning was shown anywhere — ollama appeared to be running normally.

Expected behavior: ollama should either refuse to load the model with a clear error, or emit a visible WARNING log when falling back to CPU due to a CUDA initialization failure.
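Until ollama itself warns about the fallback, an external check can make it visible. The following is a minimal sketch, not part of the original report. It assumes the server listens on the default port 11434 and that the `/api/ps` response lists loaded models with `size` and `size_vram` fields; verify both against the API documentation for your version. The function names are hypothetical. A model that timed out before it finished loading may not appear in the list at all.

```python
import json
import urllib.request


def cpu_only_models(ps_payload):
    """Return names of loaded models that report no VRAM usage.

    ps_payload is the parsed JSON body of GET /api/ps. The field names
    ("models", "name", "size", "size_vram") are assumptions to verify.
    """
    names = []
    for model in ps_payload.get("models", []):
        if model.get("size", 0) > 0 and model.get("size_vram", 0) == 0:
            names.append(model.get("name", "<unnamed>"))
    return names


def fetch_running_models(base_url="http://localhost:11434"):
    """Fetch the list of currently loaded models from a local ollama server."""
    with urllib.request.urlopen(base_url + "/api/ps", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    for name in cpu_only_models(fetch_running_models()):
        print("WARNING: model %s is loaded without any VRAM (CPU-only)" % name)
```

Run from a cron job or a monitoring agent, a check like this would have flagged the first fallback rather than the third.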

Ollama version: 0.20.6

NVIDIA driver: updated today (NVML library version: 580.159)

nvidia-smi itself fails with "Driver/library version mismatch": the kernel module from the previous driver version is still loaded and a reboot is pending.

Relevant log output

ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
time=... source=ggml.go:494 msg="offloaded 0/33 layers to GPU"
time=... source=device.go:245 msg="model weights" device=CPU size="6.1 GiB"
time=... source=device.go:256 msg="kv cache" device=CPU size="9.2 GiB"
time=... source=device.go:267 msg="compute graph" device=CPU size="1.5 GiB"
time=... source=device.go:272 msg="total memory" size="16.8 GiB"
[GIN] 2026/05/01 - 13:29:04 | 500 | 59.999378025s | 192.168.0.38 | POST "/api/chat"
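
The two log lines that give the fallback away are the `ggml_cuda_init` failure and the `offloaded 0/33 layers to GPU` summary. As a sketch only, a small filter can turn them into explicit warnings. The patterns are copied from the excerpt above and may not match other ollama versions; the function name is hypothetical.

```python
import re
import sys

# Patterns taken from the log excerpt in this report; other versions
# may word these lines differently.
CUDA_INIT_FAILURE = re.compile(
    r"ggml_cuda_init: failed to initialize CUDA: (?P<reason>.+)"
)
OFFLOAD_SUMMARY = re.compile(
    r'msg="offloaded (?P<gpu>\d+)/(?P<total>\d+) layers to GPU"'
)


def find_cpu_fallback(log_lines):
    """Return human-readable findings that indicate CPU-only inference."""
    findings = []
    for line in log_lines:
        failure = CUDA_INIT_FAILURE.search(line)
        if failure:
            findings.append("CUDA init failed: " + failure.group("reason").strip())
            continue
        offload = OFFLOAD_SUMMARY.search(line)
        if offload:
            gpu_layers = int(offload.group("gpu"))
            total_layers = int(offload.group("total"))
            # Zero of N layers on the GPU means the whole model runs on the CPU.
            if total_layers > 0 and gpu_layers == 0:
                findings.append("no layers offloaded to GPU (0/%d)" % total_layers)
    return findings


if __name__ == "__main__":
    # Reads log lines from standard input and prints one warning per finding.
    for finding in find_cpu_fallback(sys.stdin):
        print("WARNING:", finding)
```

Pipe the server log into the script on standard input. Where the log lives depends on how ollama was installed.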

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

extent analysis

TL;DR

Reboot so that the kernel module of the updated NVIDIA driver is loaded. This should clear the CUDA initialization failure that makes ollama fall back to CPU-only inference.

Guidance

  • The "Driver/library version mismatch" error from nvidia-smi indicates that the kernel module from the previous driver version is still loaded. This mismatch is the likely cause of the CUDA initialization failure.
  • Rebooting loads the updated kernel module and should remove the mismatch.
  • After rebooting, confirm that nvidia-smi runs without error and that the driver version it reports matches the NVML library version (580.159 in this report).
  • Then check the ollama logs. Because ollama emits no WARNING for this fallback, look for the "ggml_cuda_init: failed to initialize CUDA" line and for "offloaded 0/33 layers to GPU". Neither should appear once the GPU is usable again.
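
The nvidia-smi check above can be scripted as a preflight step. This is a minimal sketch under assumptions: it treats any non-zero exit status as a failure and looks for the message text quoted in the report. The function names and the three result labels are hypothetical.

```python
import subprocess

# Message text quoted in the report; nvidia-smi prints it when the loaded
# kernel module and the userspace libraries come from different versions.
MISMATCH_MARKER = "Driver/library version mismatch"


def classify_nvidia_smi(returncode, output):
    """Classify an nvidia-smi run as 'ok', 'mismatch', or 'error'."""
    if returncode == 0:
        return "ok"
    if MISMATCH_MARKER in output:
        return "mismatch"
    return "error"


def check_driver():
    """Run nvidia-smi and classify the result."""
    try:
        proc = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, timeout=30
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # nvidia-smi is missing or hung; treat both as a generic error.
        return "error"
    return classify_nvidia_smi(proc.returncode, proc.stdout + proc.stderr)


if __name__ == "__main__":
    state = check_driver()
    if state == "mismatch":
        print("WARNING: driver/library version mismatch, reboot pending")
    elif state == "error":
        print("WARNING: nvidia-smi failed, GPU may be unavailable")
    else:
        print("nvidia-smi OK")
```

Running this before starting ollama, or before sending it requests, would catch the reboot-pending state described in the report.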

Notes

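The root cause appears to be the NVIDIA driver update: the userspace libraries were replaced while the old kernel module stayed loaded. A reboot addresses that local cause only. It does not change the behavior the report asks for, since ollama would still fall back to CPU without a visible warning on any later CUDA initialization failure. If the failure persists after rebooting, investigate the CUDA initialization error further.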

Recommendation

Apply workaround: reboot the system to load the updated NVIDIA driver's kernel module. This is likely to resolve the CUDA initialization failure and stop ollama from falling back to CPU-only inference.
