ollama - 💡(How to fix) Fix AMD Radeon RX 9070 partial offload segfaults in ggml_backend_tensor_set

What is the issue?

Running gemma4:26b (Q4_K_M) on an AMD Radeon RX 9070 (gfx1201, RDNA4, 16GB VRAM) with partial ROCm offload causes a SIGSEGV in ggml_backend_tensor_set during model load, consistently and immediately.

The model weights (16.6 GiB) exceed available VRAM (~13–14 GiB after display driver reservation), so ollama falls back to partial offload (24/31 layers on GPU). The crash occurs within a few seconds of the load starting, before the model is ready to serve requests. For testing I set PARAMETER num_ctx 2048 to keep the KV cache small and minimize how much has to spill to the CPU.
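
As a cross-check of the memory split described above, the per-device figures from the log in this report can be summed. This is a minimal sketch: the `LOG` string holds only the `msg=... device=... size=...` portion of each logged line, and the helper name `per_device_gib` is hypothetical, not part of ollama.

```python
import re

# Memory lines copied from the log in this report (timestamp prefix dropped).
LOG = """\
msg="model weights" device=ROCm0 size="11.9 GiB"
msg="model weights" device=CPU size="5.4 GiB"
msg="kv cache" device=ROCm0 size="352.0 MiB"
msg="kv cache" device=CPU size="88.0 MiB"
msg="compute graph" device=ROCm0 size="423.7 MiB"
msg="compute graph" device=CPU size="8.0 MiB"
"""

GIB_PER_UNIT = {"MiB": 1 / 1024, "GiB": 1.0}
LINE = re.compile(r'msg="([^"]+)" device=(\S+) size="([\d.]+) ([MG]iB)"')


def per_device_gib(log: str) -> dict[str, float]:
    """Sum every logged allocation per device, converted to GiB."""
    totals: dict[str, float] = {}
    for _kind, device, number, unit in LINE.findall(log):
        totals[device] = totals.get(device, 0.0) + float(number) * GIB_PER_UNIT[unit]
    return totals


totals = per_device_gib(LOG)
for device, gib in sorted(totals.items()):
    print(f"{device}: {gib:.2f} GiB")
print(f"total: {sum(totals.values()):.1f} GiB")
```

The sum reproduces the logged "total memory" of 18.2 GiB, with roughly 12.7 GiB planned for ROCm0. That is below the ~13–14 GiB estimated as available, so the planned split itself does not look oversubscribed on paper.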

Two environment variables are required to reach this code path. HSA_OVERRIDE_GFX_VERSION=12.0.1 is needed because gfx1201 is not in ollama's supported GPU list. GGML_VK_VISIBLE_DEVICES="" is needed because, without it, ollama selects my Intel iGPU via Vulkan (which reports ~47 GiB "available" by counting system RAM) over the AMD dGPU via ROCm.

Relevant log output

time=2026-05-21T15:51:50.820-05:00 level=INFO source=ggml.go:482 msg="offloading 24 repeating layers to GPU"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=ggml.go:486 msg="offloading output layer to CPU"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=ggml.go:494 msg="offloaded 24/31 layers to GPU"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:240 msg="model weights" device=ROCm0 size="11.9 GiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:245 msg="model weights" device=CPU size="5.4 GiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:251 msg="kv cache" device=ROCm0 size="352.0 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:256 msg="kv cache" device=CPU size="88.0 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:262 msg="compute graph" device=ROCm0 size="423.7 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:267 msg="compute graph" device=CPU size="8.0 MiB"
time=2026-05-21T15:51:50.820-05:00 level=INFO source=device.go:272 msg="total memory" size="18.2 GiB"

SIGSEGV: segmentation violation
PC=0x7fe47550254f m=34 sigcode=1 addr=0x7fdaadd53b92
signal arrived during cgo execution

goroutine 580 gp=0xc000583880 m=34 mp=0xc001f12808 [syscall]:
runtime.cgocall(0x55ab5556e850, 0xc000dd7bd0)
        runtime/cgocall.go:167 +0x4b fp=0xc000dd7ba8 sp=0xc000dd7b70 pc=0x55ab5444becb
github.com/ollama/ollama/ml/backend/ggml._Cfunc_ggml_backend_tensor_set(0x7fe47e2ebce0, 0xc002b88000, 0x7e40000, 0x20000)
        _cgo_gotypes.go:1097 +0x45 fp=0xc000dd7bd0 sp=0xc000dd7ba8 pc=0x55ab5496f825
github.com/ollama/ollama/ml/backend/ggml.(*Backend).Load.func3.3(...)
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:613
github.com/ollama/ollama/ml/backend/ggml.(*Backend).Load.func3()
        github.com/ollama/ollama/ml/backend/ggml/ggml.go:613 +0xb3c fp=0xc000dd7f78 sp=0xc000dd7bd0 pc=0x55ab5497c05c
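
The numbers in the trace above can be decoded to see where in the load the crash happened. The sketch below assumes the four cgo arguments map to `(tensor, data, offset, size)` as in ggml's C prototype for `ggml_backend_tensor_set`, and uses the Linux `si_code` values for SIGSEGV. The idea that the loader copies each tensor in fixed-size chunks is an inference from the numbers, not something the log states.

```python
# Arguments from the _Cfunc_ggml_backend_tensor_set frame in the trace.
# Assumed order (tensor, data, offset, size), per ggml's C prototype.
TENSOR, DATA, OFFSET, SIZE = 0x7FE47E2EBCE0, 0xC002B88000, 0x7E40000, 0x20000
FAULT_ADDR = 0x7FDAADD53B92  # "addr=" on the SIGSEGV line
SIGCODE = 1                  # "sigcode=" on the SIGSEGV line

# Linux si_code values for SIGSEGV.
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}

offset_mib = OFFSET / 2**20     # byte offset into the destination tensor
size_kib = SIZE / 2**10         # bytes written by this single call
chunk_index = OFFSET // SIZE    # zero-based, if chunks are uniform (assumption)
fault_in_source = DATA <= FAULT_ADDR < DATA + SIZE

print(f"signal code : {SEGV_CODES[SIGCODE]}")
print(f"write offset: {offset_mib} MiB into the tensor")
print(f"write size  : {size_kib} KiB")
print(f"chunk index : {chunk_index} (offset % size == {OFFSET % SIZE})")
print(f"fault inside Go source buffer: {fault_in_source}")
```

Read this way, the failing call writes 128 KiB at 126.25 MiB into a tensor, and the offset is an exact multiple of the size, which is consistent with earlier chunks of the same tensor having been written first. The faulting address is unmapped and lies outside the Go-side source buffer, which suggests (but does not prove) that the bad access is on the destination or backend side rather than in reading the source data.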

System information

Ollama version: 0.21.1 (official release)
OS: Gentoo Linux, kernel 6.18.26-gentoo-x86_64
GPU: AMD Radeon RX 9070 (gfx1201 / RDNA4), 16 GB VRAM
GPU firmware: gc_12_0_1, psp_14_0_3, dcn_4_0_1, sdma_7_0_1, vcn_5_0_0
CPU: Intel (with integrated UHD 630)
HSA Runtime: 1.1
Model: gemma4:26b Q4_K_M

Reproduction

# /etc/systemd/system/ollama.service.d/override.conf
[Service]
Environment="GGML_VK_VISIBLE_DEVICES="
Environment="HSA_OVERRIDE_GFX_VERSION=12.0.1"

# reload unit files so the drop-in above takes effect, then restart
systemctl daemon-reload
systemctl restart ollama
ollama run gemma4:26b

Additional context

CPU-only inference (PARAMETER num_gpu 0) works correctly
Full GPU offload is not possible as the model exceeds available VRAM
gfx1201 is not in ollama's supported GPU list, hence the HSA_OVERRIDE_GFX_VERSION requirement
GGML_VK_VISIBLE_DEVICES="" is needed to prevent ollama from routing to the Intel iGPU via Vulkan instead of the AMD dGPU via ROCm
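
Since num_gpu 0 works and the automatic 24-layer split crashes, one way to narrow the problem down is to try intermediate layer counts. The sketch below only generates Modelfile text using the two PARAMETER settings already mentioned in this report. The helper name, the chosen layer counts, and the `gemma4-gpuN` tags are hypothetical, and whether any intermediate value avoids the crash is untested.

```python
def modelfile(base: str, num_gpu: int, num_ctx: int = 2048) -> str:
    """Return Modelfile text that pins the number of GPU-offloaded layers."""
    return (
        f"FROM {base}\n"
        f"PARAMETER num_ctx {num_ctx}\n"
        f"PARAMETER num_gpu {num_gpu}\n"
    )


# Layer counts to probe, from CPU-only (0) up to the 24 layers ollama chose.
LAYER_COUNTS = [0, 6, 12, 18, 24]

# Map a hypothetical model tag to its Modelfile contents.
variants = {f"gemma4-gpu{n}": modelfile("gemma4:26b", n) for n in LAYER_COUNTS}

for tag, text in variants.items():
    print(f"# {tag}")
    print(text)
```

Each variant can be saved to a file and registered with `ollama create <tag> -f <file>`, then loaded with `ollama run <tag>`. The smallest layer count that still crashes would be a useful data point for the report.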

OS

Linux

GPU

AMD

CPU

Intel

Ollama version

0.21.1
