ollama - 💡(How to fix) Fix `CUDA_VISIBLE_DEVICES` ignored with eGPU (RTX 5060 Ti)

What is the issue?

Ollama v0.24.0 logs CUDA_VISIBLE_DEVICES=1 at startup and even emits a WARN about the override, yet it loads every model and runs all inference on GPU 0 (RTX 500 Ada, 4 GB), ignoring GPU 1, my RTX 5060 Ti eGPU (16 GB).


Environment

| Component | Value |
| --- | --- |
| OS | Ubuntu Linux (kernel 7.0.0-15-generic) |
| CPU | Intel Core Ultra 7 155H |
| Ollama version | 0.24.0 |
| NVIDIA driver | 595.58.03 |
| CUDA version | 13.2 |

GPUs

  • NVIDIA RTX 500 Ada Generation Laptop GPU | 4 GB GDDR6
  • NVIDIA GeForce RTX 5060 Ti | 16 GB GDDR7 eGPU (Thunderbolt 3)

/etc/systemd/system/ollama.service

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=..."

[Install]
WantedBy=default.target

/etc/systemd/system/ollama.service.d/override.conf

[Service]
Environment="CUDA_VISIBLE_DEVICES=1"

Expected Behaviour

With CUDA_VISIBLE_DEVICES=1 set in the service environment, Ollama should:

  1. Discover only GPU 1 (RTX 5060 Ti, 16 GB) as the available CUDA device
  2. Load models onto the RTX 5060 Ti
  3. Run all inference on the RTX 5060 Ti

Actual Behaviour

Ollama ignores CUDA_VISIBLE_DEVICES=1 and uses GPU 0 (RTX 500 Ada, 4 GB) instead.
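
One possible explanation, which this issue does not confirm: the CUDA runtime orders devices FASTEST_FIRST by default, whereas nvidia-smi lists them by PCI bus ID, so index 1 can refer to a different card in each. The sketch below prints a drop-in that avoids the index ambiguity by pinning the GPU by UUID (a form CUDA_VISIBLE_DEVICES accepts) and requesting PCI bus ordering. `make_override` is a hypothetical helper name, and whether this changes the behaviour on Ollama 0.24.0 is untested.

```shell
#!/bin/sh
# Hypothetical helper: print a systemd drop-in that pins the service to one GPU.
# $1 is the GPU UUID as reported by `nvidia-smi -L`.
make_override() {
  printf '[Service]\n'
  printf 'Environment="CUDA_DEVICE_ORDER=PCI_BUS_ID"\n'
  printf 'Environment="CUDA_VISIBLE_DEVICES=%s"\n' "$1"
}

# UUID of the RTX 5060 Ti as quoted in this issue.
make_override "GPU-4f2cde50-643c-cc25-229e-c68abe8775bd"
```

To try it, the output would replace /etc/systemd/system/ollama.service.d/override.conf, followed by sudo systemctl daemon-reload and sudo systemctl restart ollama so that systemd picks up the changed unit.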


Evidence

1 — Ollama startup log acknowledges CUDA_VISIBLE_DEVICES=1 then ignores it

# Server config — CUDA_VISIBLE_DEVICES:1 is present and correct:
time=2026-05-18T08:02:19.179 level=INFO source=routes.go:1802 msg="server config"
  env="map[CUDA_VISIBLE_DEVICES:1 ...]"

# Ollama sees the override and emits a WARN:
time=2026-05-18T08:02:19.191 level=WARN source=runner.go:536
  msg="user overrode visible devices" CUDA_VISIBLE_DEVICES=1
time=2026-05-18T08:02:19.191 level=WARN source=runner.go:540
  msg="if GPUs are not correctly discovered, unset and try again"

# Despite the above, Ollama selects GPU-802e459c = RTX 500 Ada (GPU 0):
time=2026-05-18T08:02:20.686 level=INFO source=types.go:42 msg="inference compute"
  id=GPU-802e459c-a2ec-5e73-5ba8-825cc61760cf
  filter_id=""
  library=CUDA compute=8.9
  name=CUDA0
  description="NVIDIA RTX 500 Ada Generation Laptop GPU"
  pci_id=0000:01:00.0
  total="4.0 GiB" available="3.6 GiB"

The RTX 5060 Ti (GPU 1, GPU-4f2cde50-643c-cc25-229e-c68abe8775bd) never appears in the inference compute log line.
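
To check which device Ollama settled on without reading the whole log, the "inference compute" entries can be filtered out. This is a minimal sketch: `selected_gpus` is a hypothetical helper, and it assumes each log entry sits on a single line (the excerpt above is wrapped for readability).

```shell
#!/bin/sh
# Hypothetical helper: from Ollama log text on stdin, keep only the
# "inference compute" entries and print each selected device's description.
selected_gpus() {
  grep 'inference compute' | grep -o 'description="[^"]*"'
}

# Sample line assembled from the fields quoted in this issue.
printf '%s\n' 'level=INFO source=types.go:42 msg="inference compute" name=CUDA0 description="NVIDIA RTX 500 Ada Generation Laptop GPU" pci_id=0000:01:00.0' \
  | selected_gpus
```

On a live system the input would come from the service journal, for example journalctl -u ollama -b --no-pager piped into selected_gpus.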

Reproduction Steps

  1. System has two NVIDIA GPUs: GPU 0 = smaller/internal, GPU 1 = larger/eGPU.
  2. Set CUDA_VISIBLE_DEVICES=1 in the Ollama systemd service environment (override.conf or inline).
  3. Restart the Ollama service: sudo systemctl restart ollama.
  4. Confirm the env var is in the process environment: sudo cat /proc/$(pidof ollama)/environ | tr '\0' '\n' | grep CUDA.
  5. Run any inference: ollama run ministral-3:3b "Hello".
  6. Monitor GPU usage: nvidia-smi --query-gpu=index,name,memory.used,utilization.gpu --format=csv.

Observed: GPU 0 activates. GPU 1 stays idle.
Expected: GPU 1 activates. GPU 0 stays idle.
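
Because the index shown by nvidia-smi is not guaranteed to match the index the CUDA runtime uses, it can help to look up the UUID of the intended card before setting CUDA_VISIBLE_DEVICES. A minimal sketch, where `uuid_for_gpu` is a hypothetical helper and the "5060 Ti" search string is only an example:

```shell
#!/bin/sh
# Hypothetical helper: read "index, uuid, name" CSV rows on stdin and print the
# UUID of the first GPU whose name contains the given substring.
uuid_for_gpu() {
  awk -F', ' -v want="$1" 'index($3, want) { print $2; exit }'
}

# On a machine with the NVIDIA driver installed, query the real devices.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,uuid,name --format=csv,noheader \
    | uuid_for_gpu "5060 Ti" || true
fi
```

A side note on step 4: pidof ollama may return more than one PID once a runner process has started, in which case systemctl show -p MainPID --value ollama is a more reliable way to get the service's main PID.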


Models tested

| Model | Size |
| --- | --- |
| ministral-3:3b | 3.0 GB |
| mistral-nemo:latest | 7.1 GB |

Both were loaded onto GPU 0 (4 GB VRAM) even though GPU 1, the eGPU, has 16 GB available.

Relevant log output

OS

Linux

GPU

Nvidia

CPU

Intel

Ollama version

0.24.0
