ollama - 💡(How to fix) Fix ROCm error: invalid device function in ggml_cuda_mul_mat_q on RX 6750 GRE (gfx1031) with ollama-for-amd patch on Windows [2 comments, 3 participants]

ollama · 2026-03-08 07:42:39


ollama/ollama#14702 • Fetched 2026-04-08 00:32:44

Timeline: commented ×2, closed ×1, cross-referenced ×1, labeled ×1

Error Message

Running ollama run qwen2.5:7b (or any model) crashes with the following error:

ROCm error: invalid device function
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-08T07:24:03.618+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server error"
time=2026-03-08T07:24:03.774+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
time=2026-03-08T07:24:03.870+08:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\Administrator.ollama\models\blobs\sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 error="llama runner process has terminated: ROCm error"

Expected: the model loads and offloads layers to ROCm without a kernel execution error.



What is the issue?

The crash occurs with both HSA_OVERRIDE_GFX_VERSION=10.3.0 and HSA_OVERRIDE_GFX_VERSION=10.3.1.

Environment

OS: Windows 10/11 (assuming from path C:/a/ollama/...)
Ollama version: 0.17.7 (also tested v0.11.4, same crash)
GPU: AMD Radeon RX 6750 GRE 10GB (gfx1031, Navi 22)
Driver: AMD Adrenalin 60450.10 (or latest, same issue)
Patch used: ollama-for-amd (from https://github.com/likelovewant/ollama-for-amd) + ROCm libs (tried rocm.gfx1031.for.hip.sdk.6.1.2, 6.2.4, littlewu's logic variants)
Model: qwen2.5:7b (Q4_K_M), also tested smaller models
Environment variables: HSA_OVERRIDE_GFX_VERSION=10.3.0 (or 10.3.1), OLLAMA_DEBUG=1

Steps to reproduce

1. Install Ollama for Windows.
2. Replace the rocm folder and the rocblas library with the gfx1031 patch from likelovewant/ollama-for-amd or ROCmLibs-for-gfx1103-AMD780M-APU.
3. Set HSA_OVERRIDE_GFX_VERSION=10.3.0.
4. Run ollama serve.
5. Run ollama run qwen2.5:7b (or any model). The runner crashes with ROCm error: invalid device function.
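The environment setup in the steps above can be scripted so that every test run starts the server with the same variables. This is a minimal sketch, assuming the ollama executable is on PATH; the helper names build_serve_env and start_server are hypothetical and not part of Ollama.

```python
import os
import subprocess


def build_serve_env(gfx_override="10.3.0", debug=True, base_env=None):
    """Return a copy of the environment with the variables from the report set.

    base_env defaults to the current process environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_override
    if debug:
        env["OLLAMA_DEBUG"] = "1"
    return env


def start_server(env):
    """Start `ollama serve` as a child process (assumes ollama is on PATH)."""
    return subprocess.Popen(["ollama", "serve"], env=env)
```

Calling start_server(build_serve_env("10.3.1")) would start the server with the second override value from the report; ollama run qwen2.5:7b can then be issued from another terminal.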

Logs snippet

llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context: ROCm_Host output buffer size = 0.59 MiB
llama_kv_cache: ROCm0 KV buffer size = 224.00 MiB
llama_kv_cache: size = 224.00 MiB ( 4096 cells, 28 layers, 1/1 seqs), K (f16): 112.00 MiB, V (f16): 112.00 MiB
llama_context: Flash Attention was auto, set to enabled
ROCm error: invalid device function
current device: 0, in function ggml_cuda_mul_mat_q at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/mmq.cu:128 hipGetLastError()
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-08T07:24:03.618+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server error"
time=2026-03-08T07:24:03.774+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
time=2026-03-08T07:24:03.870+08:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\Administrator.ollama\models\blobs\sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 error="llama runner process has terminated: ROCm error"
[GIN] 2026/03/08 - 07:24:03 | 500 | 7.3890475s | 127.0.0.1 | POST "/api/generate"
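When comparing logs across Ollama versions or patched libraries, it can help to extract just the failing call. The sketch below assumes the error keeps the two-line shape seen in the log above (an error line followed by a "current device" line); the regular expressions and the function name find_gpu_errors are illustrative, not part of Ollama or ggml.

```python
import re

# Error line as printed in the log above, e.g. "ROCm error: invalid device function".
ERROR_RE = re.compile(r"^(?:ROCm|CUDA) error: (?P<message>.+)$")
# Following line, e.g. "... in function ggml_cuda_mul_mat_q at <file>:<line> ...".
LOCATION_RE = re.compile(
    r"in function (?P<function>\w+) at (?P<file>.+?):(?P<line>\d+)"
)


def find_gpu_errors(log_text):
    """Return one dict per GPU error found, with the failing function and source location."""
    errors = []
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        match = ERROR_RE.match(line.strip())
        if not match:
            continue
        entry = {"message": match.group("message"),
                 "function": None, "file": None, "line": None}
        # The location is expected on the next line; leave the fields as None if absent.
        if i + 1 < len(lines):
            loc = LOCATION_RE.search(lines[i + 1])
            if loc:
                entry.update(function=loc.group("function"),
                             file=loc.group("file"),
                             line=int(loc.group("line")))
        errors.append(entry)
    return errors
```

Applied to the snippet above, this reports a single error in ggml_cuda_mul_mat_q at mmq.cu line 128, which is the detail worth quoting when asking for help.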

Additional info

Expected behavior

Model loads and offloads layers to ROCm without kernel execution error.

Possible cause

Likely a ggml-cuda/HIP kernel regression in newer Ollama versions that is incompatible with the gfx1031-patched rocblas library (perhaps the Tensile kernels were not rebuilt for the new ggml interface?).
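One way to narrow this hypothesis down: HIP typically reports invalid device function when the loaded binary contains no kernel compiled for the architecture the runtime reports. The sketch below assumes the usual convention that an override of major.minor.stepping corresponds to the target name gfx followed by those three fields, with minor and stepping written in hex. The built_targets list is a hypothetical input; the report does not say which targets the Ollama build or the patched libraries include.

```python
def override_to_gfx(version):
    """Convert an override string such as '10.3.0' into a target name such as 'gfx1030'."""
    major, minor, stepping = (int(part) for part in version.split("."))
    # Assumed convention: minor and stepping appear as single hex digits in gfx names.
    return f"gfx{major}{minor:x}{stepping:x}"


def has_kernels_for(version, built_targets):
    """Return True if the target implied by the override is in the list of built targets."""
    return override_to_gfx(version) in set(built_targets)
```

If a library was built only for gfx1030, for example, has_kernels_for("10.3.1", ["gfx1030"]) is False, which would be consistent with a missing-kernel failure under that override.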

Thanks for the great project! Happy to provide more logs or test patches.

OS: Windows 10
GPU: AMD Radeon RX 6750 GRE 10GB
CPU: i5-13400F
Ollama version: 0.17.7

extent analysis

Fix Plan

If the error is caused by an incompatibility between the ggml-cuda/HIP kernels and the gfx1031-patched rocblas library, as the report suggests, the following steps may resolve it:

Rebuild Tensile kernels:
- Clone the Tensile repository: git clone https://github.com/ROCm/Tensile.git
- Check out the branch or tag that matches your ROCm version (for example release/rocm-rel-6.1; confirm the exact name in the repository)
- Configure and build Tensile:
```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build .
```
Update ggml-cuda:
- Clone the ggml repository: git clone https://github.com/ggerganov/ggml.git
- Check out the latest branch (or a branch known to work with your setup)
- Apply any necessary patches for ROCm compatibility
- Build ggml with the HIP backend enabled for gfx1031 (newer ggml/ROCm versions may name the target option GPU_TARGETS instead of AMDGPU_TARGETS):
```
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1031
cmake --build .
```
Integrate with Ollama:
- Replace the existing ggml-cuda library in your Ollama installation with the newly built one
- Ensure that rocblas can find the rebuilt Tensile kernels, for example by placing them in the rocblas library folder replaced in the reproduction steps
Test Ollama:
- Run ollama serve and then ollama run qwen2.5:7b to verify that the model loads and runs without the ROCm error

Verification

After applying these steps, verify that:

The ROCm error: invalid device function message no longer appears in the logs
The model loads and runs without kernel execution errors
System logs and Ollama output show no signs of instability or performance issues
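The checks above can be partly automated by repeating the request that returned HTTP 500 in the original log (POST /api/generate). This is a rough sketch, assuming the server listens on Ollama's default address 127.0.0.1:11434; the helper names build_generate_request and model_responds are hypothetical.

```python
import json
import urllib.request

# Default Ollama address; adjust if the server was started on another host or port.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def build_generate_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming POST request for the /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def model_responds(model="qwen2.5:7b", prompt="Say hi.", timeout=300):
    """Return True if the server answers with HTTP 200 and a non-empty response text."""
    try:
        request = build_generate_request(model, prompt)
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            payload = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and bool(payload.get("response"))
    except (OSError, ValueError):
        # Connection failures and HTTP errors (such as the 500 in the log) are OSError
        # subclasses; ValueError covers a body that is not valid JSON.
        return False
```

A True result from model_responds() only shows that one generation completed; the server log should still be checked for ROCm errors.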

Extra Tips

Regularly check for updates to Tensile, ggml-cuda, and ROCm to ensure compatibility and optimal performance
Consider contributing back any patches or fixes to the respective open-source projects to help the community
If issues persist, provide detailed logs and steps to reproduce to the Ollama and ROCm communities for further assistance
