ollama - 💡(How to fix) Silent CPU fallback after driver update: CUDA forward compatibility error causes 100% CPU usage with no user warning [1 participant]

ollama · 2026-05-01 16:45:31

ollama/ollama#15918•Fetched 2026-05-02 05:27:43

Author

0xpietri

Participants

0xpietri

What is the issue?

After updating NVIDIA drivers today, ollama silently fell back to CPU-only inference instead of failing fast or warning the user. The model (qwen3.5:9b, Q4_K_M, 6.1 GiB) loaded entirely on CPU, consuming 766% CPU and 16 GB RAM for ~38 minutes, causing the system to reach 101°C, while the API kept returning 500 errors after 60s timeouts.

The fallback happened 3 times at 30-minute intervals (12:28, 12:58, 13:28) triggered by an external tool (OpenAI Codex) calling POST /api/chat. No warning was shown anywhere — ollama appeared to be running normally.

Expected behavior: ollama should either refuse to load the model with a clear error, or emit a visible WARNING log when falling back to CPU due to a CUDA initialization failure.
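
Until the server warns on its own, a client can check where a loaded model actually resides before trusting it with long requests. The sketch below is not part of ollama; it assumes the server's /api/ps endpoint reports `size` and `size_vram` for each loaded model and that the server listens on the default http://localhost:11434. The helper names `gpu_fraction`, `cpu_only_models`, and `fetch_ps` are hypothetical.

```python
import json
import urllib.request


def gpu_fraction(model):
    """Fraction of a loaded model's memory that resides in VRAM (0.0 to 1.0)."""
    size = model.get("size", 0)
    return model.get("size_vram", 0) / size if size else 0.0


def cpu_only_models(ps_response):
    """Names of loaded models that have nothing in VRAM."""
    return [
        m.get("name", "?")
        for m in ps_response.get("models", [])
        if gpu_fraction(m) == 0.0
    ]


def fetch_ps(base_url="http://localhost:11434"):
    """Fetch the list of loaded models (assumed endpoint: GET /api/ps)."""
    with urllib.request.urlopen(f"{base_url}/api/ps", timeout=5) as resp:
        return json.load(resp)
```

A caller could run `cpu_only_models(fetch_ps())` after the first request and stop sending work if the expected model appears in the result.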

Ollama version: 0.20.6

NVIDIA driver updated today (NVML library version: 580.159).

nvidia-smi itself fails with "Driver/library version mismatch": the kernel module is still loaded from the previous driver version, and a reboot is pending.

Relevant log output

ggml_cuda_init: failed to initialize CUDA: forward compatibility was attempted on non supported HW
time=... source=ggml.go:494 msg="offloaded 0/33 layers to GPU"
time=... source=device.go:245 msg="model weights" device=CPU size="6.1 GiB"
time=... source=device.go:256 msg="kv cache" device=CPU size="9.2 GiB"
time=... source=device.go:267 msg="compute graph" device=CPU size="1.5 GiB"
time=... source=device.go:272 msg="total memory" size="16.8 GiB"
[GIN] 2026/05/01 - 13:29:04 | 500 | 59.999378025s | 192.168.0.38 | POST "/api/chat"
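
The fallback is visible in these lines only to someone who knows what to look for: a CUDA initialization failure followed by an offload count of zero. As a rough sketch, the two patterns can be scanned for automatically. The patterns are taken from the log excerpt above and may not match other ollama versions; the function name `detect_cpu_fallback` is hypothetical.

```python
import re

# Patterns copied from the log excerpt in this report; other versions may differ.
CUDA_INIT_FAILURE = re.compile(
    r"ggml_cuda_init: failed to initialize CUDA: (?P<reason>.+)"
)
OFFLOAD = re.compile(r'msg="offloaded (?P<gpu>\d+)/(?P<total>\d+) layers to GPU"')


def detect_cpu_fallback(log_lines):
    """Return a summary dict if a model loaded with zero GPU layers, else None.

    "cuda_error" holds the most recent CUDA init failure seen before the
    offload line, or None if there was none (for example, when the user
    deliberately disabled GPU offload).
    """
    reason = None
    for line in log_lines:
        failure = CUDA_INIT_FAILURE.search(line)
        if failure:
            reason = failure.group("reason").strip()
            continue
        offload = OFFLOAD.search(line)
        if offload and int(offload.group("gpu")) == 0 and int(offload.group("total")) > 0:
            return {
                "offloaded_layers": 0,
                "total_layers": int(offload.group("total")),
                "cuda_error": reason,
            }
    return None
```

Fed the excerpt above, the function reports 0 of 33 layers offloaded together with the forward compatibility error, which is the combination worth alerting on.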

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

extent analysis

TL;DR

Reboot the system so that the updated NVIDIA driver's kernel module is loaded. The module currently loaded still belongs to the previous driver version, which is the likely cause of the CUDA initialization failure that pushes ollama onto CPU-only inference.

Guidance

The "Driver/library version mismatch" error from nvidia-smi indicates that the kernel module from the previous driver version is still loaded while the user-space libraries are already at the new version. This mismatch is the likely cause of the CUDA initialization failure.
Rebooting the system loads the updated kernel module and should clear the mismatch.
After rebooting, verify that nvidia-smi runs without the mismatch error and that the loaded driver version matches the NVML library version (580.159 in this report).
Check the ollama logs after rebooting: the ggml_cuda_init failure should be gone, and the "offloaded N/33 layers to GPU" line should report more than 0 layers.
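
The version comparison in the guidance above can be sketched as a small check. It assumes a Linux host where the loaded kernel module reports its version in /proc/driver/nvidia/version, and that a failing nvidia-smi prints an "NVML library version:" line as quoted in the issue. The function names are hypothetical, and the exact text formats may vary between driver releases.

```python
import re
import subprocess
from pathlib import Path


def kernel_module_version(proc_text):
    """Version of the loaded kernel module, parsed from /proc/driver/nvidia/version."""
    m = re.search(r"Kernel Module(?: for \S+)?\s+(\d+(?:\.\d+)+)", proc_text)
    return m.group(1) if m else None


def nvml_library_version(smi_output):
    """User-space NVML version, as printed by nvidia-smi when it fails to initialize."""
    m = re.search(r"NVML library version:\s*(\d+(?:\.\d+)+)", smi_output)
    return m.group(1) if m else None


def reboot_pending(proc_text, smi_output):
    """True if nvidia-smi reports a mismatch or the two parsed versions differ."""
    if "Driver/library version mismatch" in smi_output:
        return True
    kernel = kernel_module_version(proc_text)
    library = nvml_library_version(smi_output)
    return kernel is not None and library is not None and kernel != library


def main():
    try:
        proc_text = Path("/proc/driver/nvidia/version").read_text()
    except OSError:
        proc_text = ""  # module not loaded, or not a Linux host
    try:
        result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
        smi_output = result.stdout + result.stderr
    except OSError:
        smi_output = ""  # nvidia-smi not installed
    print("kernel module:", kernel_module_version(proc_text))
    print("NVML library:", nvml_library_version(smi_output))
    print("reboot pending:", reboot_pending(proc_text, smi_output))


if __name__ == "__main__":
    main()
```

Running this before starting ollama after a driver update gives an early signal that GPU inference will not work until the machine is rebooted.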

Notes

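The CUDA failure appears to stem from the NVIDIA driver update: the user-space libraries were updated, but the running kernel module was not. Rebooting should fix this. If the problem persists, the CUDA initialization failure needs further investigation. Note that a reboot addresses only the driver mismatch; the behavior the issue reports, a fallback to CPU with no visible warning, is a separate matter that a reboot does not change.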

Recommendation

Apply workaround: Reboot the system to load the updated NVIDIA driver's kernel module, as this is likely to resolve the CUDA initialization failure and prevent ollama from falling back to CPU-only inference.

#api #model compatibility #GPU setup #container setup #orchestration issue
