ollama: AMD Radeon RX 9070 partial offload segfaults in ggml_backend_tensor_set

What is the issue?

Running gemma4:26b (Q4_K_M) on an AMD Radeon RX 9070 (gfx1201, RDNA4, 16 GB VRAM) with partial ROCm offload crashes with a SIGSEGV in ggml_backend_tensor_set during model load. The crash is immediate and reproduces on every attempt.

The model weights (16.6 GiB) exceed available VRAM (~13–14 GiB after display driver reservation), so ollama falls back to partial offload (24/31 layers on GPU). The crash occurs within a few seconds of the load beginning, before the model is ready to serve requests. For testing I set PARAMETER num_ctx 2048 to minimize how much has to overflow to the CPU.

Two environment variables are required to reach this path: HSA_OVERRIDE_GFX_VERSION=12.0.1 (gfx1201 is not in ollama's supported GPU list) and GGML_VK_VISIBLE_DEVICES="". Without the latter, ollama prefers my Intel iGPU via Vulkan (which reports ~47 GiB "available" because it counts system RAM) over the AMD dGPU via ROCm.
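The memory split can be checked against the figures ollama itself logs. The following is a minimal Python sketch, not part of ollama: it copies the six memory lines from the log output in this report and sums them per device. The regular expression and the helper name per_device_gib are illustrative assumptions.

```python
import re

# Memory lines copied from the log output in this report.
LOG = '''\
msg="model weights" device=ROCm0 size="11.9 GiB"
msg="model weights" device=CPU size="5.4 GiB"
msg="kv cache" device=ROCm0 size="352.0 MiB"
msg="kv cache" device=CPU size="88.0 MiB"
msg="compute graph" device=ROCm0 size="423.7 MiB"
msg="compute graph" device=CPU size="8.0 MiB"
'''

UNIT_TO_GIB = {"GiB": 1.0, "MiB": 1.0 / 1024}
LINE_RE = re.compile(r'device=(\S+) size="([\d.]+) (GiB|MiB)"')


def per_device_gib(log_text):
    """Sum the reported allocation sizes per device, in GiB."""
    totals = {}
    for device, value, unit in LINE_RE.findall(log_text):
        totals[device] = totals.get(device, 0.0) + float(value) * UNIT_TO_GIB[unit]
    return totals


totals = per_device_gib(LOG)
for device, gib in sorted(totals.items()):
    print(f"{device}: {gib:.2f} GiB")
print(f"total: {sum(totals.values()):.2f} GiB")
```

With these inputs the planned ROCm0 allocation comes to roughly 12.7 GiB and the overall sum to roughly 18.15 GiB, which agrees with the logged "total memory" of 18.2 GiB after rounding. The GPU share is below the ~13–14 GiB estimated as available, so the planned split looks plausible on paper; the arithmetic alone does not explain the crash.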

Relevant log output

time=2026-05-21T15:51:50.820-05:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=ggml.go:494 msg="offloaded 24/31 layers to GPU"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="11.9 GiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="5.4 GiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="352.0 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="88.0 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="423.7 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="8.0 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:272 msg="total memory" size="18.2 GiB"

SIGSEGV: segmentation violation
PC=0x7fe47550254f m=34 sigcode=1 addr=0x7fdaadd53b92
signal arrived during cgo execution

goroutine 580 gp=0xc000583880 m=34 mp=0xc001f12808 [syscall]:
runtime.cgocall(0x55ab5556e850, 0xc000dd7bd0)
        runtime/cgocall.go:167 +0x4b fp=0xc000dd7ba8 sp=0xc000dd7b70 pc=0x55ab5444becb
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_tensor_set(0x7fe47e2ebce0, 0xc002b88000, 0x7e40000, 0x20000)
        _cgo_gotypes.go:1097 +0x45 fp=0xc000dd7bd0 sp=0xc000dd7ba8 pc=0x55ab5496f825
github.com/ollama/ollama/ml/backend/ggml.(*Backend).Load.func3.3(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:613
github.com/ollama/ollama/ml/backend/ggml.(*Backend).Load.func3()
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:613 +0xb3c fp=0xc000dd7f78 sp=0xc000dd7bd0 pc=0x55ab5497c05c
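The hex values in the crashing frame can be decoded to see where in the copy the fault occurred. This is a hedged Python sketch: it assumes the four values printed in the _Cfunc_ggml_backend_tensor_set frame map directly onto the C parameters (tensor, data, offset, size) of ggml_backend_tensor_set, and that sigcode follows the Linux si_code values for SIGSEGV. The helper name describe_copy is hypothetical.

```python
# Values copied from the _Cfunc_ggml_backend_tensor_set frame in the trace above.
# Assumed argument order, following ggml's C declaration:
#   ggml_backend_tensor_set(tensor, data, offset, size)
TENSOR_PTR = 0x7FE47E2EBCE0
DATA_PTR = 0xC002B88000
OFFSET = 0x7E40000
SIZE = 0x20000
SIGCODE = 1

# Linux si_code values for SIGSEGV.
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped)",
    2: "SEGV_ACCERR (invalid permissions)",
}


def describe_copy(offset, size):
    """Express the offset and size of one tensor_set call in readable units."""
    chunk_index, remainder = divmod(offset, size)
    return {
        "chunk_kib": size / 1024,
        "offset_mib": offset / (1024 * 1024),
        "chunk_index": chunk_index,
        "chunk_aligned": remainder == 0,
    }


info = describe_copy(OFFSET, SIZE)
print(f"chunk size:  {info['chunk_kib']:.0f} KiB")
print(f"offset:      {info['offset_mib']:.2f} MiB into the tensor")
print(f"chunk index: {info['chunk_index']} (aligned: {info['chunk_aligned']})")
print(f"sigcode {SIGCODE}: {SEGV_CODES.get(SIGCODE, 'unknown')}")
```

Under those assumptions the faulting call was writing a 128 KiB chunk at 126.25 MiB into a tensor, that is, chunk index 1010 if the loader copies fixed-size chunks in order. That would suggest earlier chunks of the same tensor were written before the fault, although the trace alone cannot confirm it. A sigcode of 1 indicates an access to an unmapped address rather than a permissions error.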

System information

  • Ollama version: 0.21.1 (official release)
  • OS: Gentoo Linux, kernel 6.18.26-gentoo-x86_64
  • GPU: AMD Radeon RX 9070 (gfx1201 / RDNA4), 16 GB VRAM
  • GPU firmware: gc_12_0_1, psp_14_0_3, dcn_4_0_1, sdma_7_0_1, vcn_5_0_0
  • CPU: Intel (with integrated UHD 630)
  • HSA Runtime: 1.1
  • Model: gemma4:26b Q4_K_M

Reproduction

# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="GGML_VK_VISIBLE_DEVICES="
Environment="HSA_OVERRIDE_GFX_VERSION=12.0.1"

Then restart the service and load the model:

systemctl restart ollama
ollama run gemma4:26b

Additional context

  • CPU-only inference (PARAMETER num_gpu 0) works correctly
  • Full GPU offload is not possible as the model exceeds available VRAM
  • gfx1201 is not in ollama's supported GPU list, hence the HSA_OVERRIDE_GFX_VERSION requirement
  • GGML_VK_VISIBLE_DEVICES="" is needed to prevent ollama from routing to the Intel iGPU via Vulkan instead of the AMD dGPU via ROCm
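Since num_gpu 0 works and the default 24-layer split crashes, one possible way to narrow the problem down is to pin the offload count per request and step it upward. The sketch below only builds the HTTP request and does not send it. It assumes ollama is listening on its default local address and uses the num_gpu and num_ctx fields of the options object in the generate API; the helper name build_request and the prompt text are illustrative.

```python
import json
import urllib.request

# Assumption: ollama is reachable on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, num_gpu, num_ctx=2048, prompt="hi"):
    """Build (but do not send) a generate request that pins the layer offload count."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_gpu": num_gpu, "num_ctx": num_ctx},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("gemma4:26b", num_gpu=0)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it: urllib.request.urlopen(req, timeout=600).read()
```

Repeating the request with num_gpu set to 1, 2, and so on up to 24 would show whether any partial offload triggers the segfault or only larger splits do. Expect the request to fail with an error once the load crashes.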

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.21.1
