ollama - 💡(How to fix) Fix ROCm error: invalid device function in ggml_cuda_mul_mat_q on RX 6750 GRE (gfx1031) with ollama-for-amd patch on Windows [2 comments, 3 participants]

GitHub stats
ollama/ollama#14702 (fetched 2026-04-08 00:32:44)
Comments: 2 · Participants: 3 · Timeline events: 5 · Reactions: 0
Timeline (top): commented ×2, closed ×1, cross-referenced ×1, labeled ×1

Error Message

Running ollama run qwen2.5:7b (or any other model) crashes with:

ROCm error: invalid device function
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-08T07:24:03.618+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server error"
time=2026-03-08T07:24:03.774+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
time=2026-03-08T07:24:03.870+08:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\Administrator\.ollama\models\blobs\sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 error="llama runner process has terminated: ROCm error"
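The server log lines in this report are logfmt-style key=value pairs, which makes it easy to compare runs programmatically. A minimal sketch, assuming each value is either a bare token or a double-quoted string without escaped quotes (an assumption about the format, not a documented Ollama specification); the function name is hypothetical:

```python
import re

# key=value pairs: the value is either a double-quoted string (simplified:
# escaped quotes inside values are not handled) or a bare non-space token.
_PAIR = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def parse_log_line(line: str) -> dict:
    """Parse one logfmt-style Ollama server log line into a dict."""
    fields = {}
    for match in _PAIR.finditer(line):
        key, raw, quoted = match.group(1), match.group(2), match.group(3)
        # Use the unquoted content for quoted values, the raw token otherwise.
        fields[key] = quoted if quoted is not None else raw
    return fields


line = ('time=2026-03-08T07:24:03.774+08:00 level=ERROR source=server.go:303 '
        'msg="llama runner terminated" error="exit status 1"')
record = parse_log_line(line)
print(record["level"], "|", record["msg"], "|", record["error"])
# ERROR | llama runner terminated | exit status 1
```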

Environment

  • OS: Windows 10/11 (assuming from path C:/a/ollama/...)
  • Ollama version: 0.17.7 (also tested v0.11.4, same crash)
  • GPU: AMD Radeon RX 6750 GRE 10GB (gfx1031, Navi 22)
  • Driver: AMD Adrenalin 60450.10 (or latest, same issue)
  • Patch used: ollama-for-amd (from https://github.com/likelovewant/ollama-for-amd) + ROCm libs (tried rocm.gfx1031.for.hip.sdk.6.1.2, 6.2.4, littlewu's logic variants)
  • Model: qwen2.5:7b (Q4_K_M), also tested smaller models
  • Environment variables: HSA_OVERRIDE_GFX_VERSION=10.3.0 (or 10.3.1), OLLAMA_DEBUG=1
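The two override values above correspond to LLVM target names: HSA_OVERRIDE_GFX_VERSION is written as major.minor.stepping, so 10.3.1 is gfx1031 (the card's native target) and 10.3.0 asks the runtime to treat the card as gfx1030. A small helper that performs this mapping, assuming the minor and stepping digits are decimal (the function name is hypothetical). Note that Ollama's GPU documentation describes this override in its Linux section, so it may not take effect on Windows.

```python
def gfx_to_hsa_override(target: str) -> str:
    """Convert an LLVM AMDGPU target such as 'gfx1031' into the
    major.minor.stepping form used by HSA_OVERRIDE_GFX_VERSION.

    Assumption: the last two characters are the minor version and the
    stepping, both decimal; hex steppings (e.g. 'gfx90a') are rejected.
    """
    if not target.startswith("gfx"):
        raise ValueError(f"not a gfx target: {target!r}")
    digits = target[3:]
    if len(digits) < 3 or not digits.isdigit():
        raise ValueError(f"unsupported target format: {target!r}")
    major, minor, stepping = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{minor}.{stepping}"


for t in ("gfx1030", "gfx1031", "gfx1100"):
    print(t, "->", gfx_to_hsa_override(t))
# gfx1030 -> 10.3.0
# gfx1031 -> 10.3.1
# gfx1100 -> 11.0.0
```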

Steps to reproduce

  1. Install Ollama for Windows
  2. Replace the rocm folder and the rocblas library with the gfx1031 patch from likelovewant/ollama-for-amd or ROCmLibs-for-gfx1103-AMD780M-APU
  3. Set HSA_OVERRIDE_GFX_VERSION=10.3.0
  4. Run ollama serve
  5. Run ollama run qwen2.5:7b or any model → crash with above error
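The reproduction steps can be scripted so each patch variant is tested the same way. A minimal sketch, assuming `ollama` is on PATH, no server is already running, and the server listens on Ollama's default address http://127.0.0.1:11434; the helper names are hypothetical:

```python
import os
import subprocess
import time
import urllib.request
from typing import Dict

# Ollama's default listen address (assumption: OLLAMA_HOST is not overridden).
OLLAMA_URL = "http://127.0.0.1:11434/"


def repro_env(gfx_override: str = "10.3.0") -> Dict[str, str]:
    """Copy the current environment and set the variables used in the report."""
    env = dict(os.environ)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_override
    env["OLLAMA_DEBUG"] = "1"
    return env


def wait_for_server(timeout_s: float = 30.0) -> bool:
    """Poll the server's root URL until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(OLLAMA_URL, timeout=2):
                return True
        except OSError:  # URLError and connection errors subclass OSError
            time.sleep(0.5)
    return False


def run_repro(model: str = "qwen2.5:7b", gfx_override: str = "10.3.0") -> int:
    """Start `ollama serve`, send one prompt via `ollama run`, return its exit code."""
    env = repro_env(gfx_override)
    server = subprocess.Popen(["ollama", "serve"], env=env)
    try:
        if not wait_for_server():
            raise RuntimeError("ollama serve did not become reachable")
        result = subprocess.run(["ollama", "run", model, "hello"], env=env)
        return result.returncode
    finally:
        # Always stop the server so the next variant starts from a clean state.
        server.terminate()
        server.wait()
```

Calling run_repro() (or run_repro(gfx_override="10.3.1")) should return a non-zero exit code when the runner crashes; with OLLAMA_DEBUG=1 the server's log is printed to the console.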

Possible cause

Likely a ggml-cuda/HIP kernel regression in newer Ollama versions that is not compatible with the gfx1031-patched rocblas library (Tensile kernels not rebuilt for the new ggml interface?).
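One mechanism worth keeping in mind: the failing function in this report, ggml_cuda_mul_mat_q, comes from ggml's own mmq.cu (compiled into ggml's HIP backend), and HIP's "invalid device function" generally means the loaded binary has no kernel code object for the GPU's architecture. The sketch below is a deliberately simplified, hypothetical model of that lookup (exact ISA match, with an override replacing the reported ISA), not HIP's actual loader logic:

```python
from typing import Optional, Set


def select_code_object(device_isa: str, compiled_isas: Set[str],
                       override_isa: Optional[str] = None) -> str:
    """Return the ISA whose kernel code object would be used, or raise.

    Simplified model: the runtime looks for an exact ISA match among the
    targets the binary was built for; an override replaces the device's ISA.
    """
    effective = override_isa if override_isa is not None else device_isa
    if effective in compiled_isas:
        return effective
    # No matching code object: this is the situation HIP reports as
    # "invalid device function" at kernel launch.
    raise RuntimeError(f"invalid device function: no code object for {effective}")


built_for = {"gfx1030", "gfx1100"}  # hypothetical build targets
print(select_code_object("gfx1031", built_for, override_isa="gfx1030"))
try:
    select_code_object("gfx1031", built_for)
except RuntimeError as err:
    print(err)
```

In this model an override only helps if the binary actually contains code for the overridden target and the runtime honours the override.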

What is the issue?

This happens on both HSA_OVERRIDE_GFX_VERSION=10.3.0 and 10.3.1.

Logs snippet

llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_seq = 4096
llama_context: n_batch = 512
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
llama_context: ROCm_Host output buffer size = 0.59 MiB
llama_kv_cache: ROCm0 KV buffer size = 224.00 MiB
llama_kv_cache: size = 224.00 MiB ( 4096 cells, 28 layers, 1/1 seqs), K (f16): 112.00 MiB, V (f16): 112.00 MiB
llama_context: Flash Attention was auto, set to enabled
ROCm error: invalid device function
current device: 0, in function ggml_cuda_mul_mat_q at C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/mmq.cu:128 hipGetLastError()
C:/a/ollama/ollama/ml/backend/ggml/ggml/src/ggml-cuda/ggml-cuda.cu:94: ROCm error
time=2026-03-08T07:24:03.618+08:00 level=INFO source=server.go:1384 msg="waiting for server to become available" status="llm server error"
time=2026-03-08T07:24:03.774+08:00 level=ERROR source=server.go:303 msg="llama runner terminated" error="exit status 1"
time=2026-03-08T07:24:03.870+08:00 level=INFO source=sched.go:516 msg="Load failed" model=C:\Users\Administrator\.ollama\models\blobs\sha256-96c415656d377afbff962f6cdb2394ab092ccbcbaab4b82525bc4ca800fe8a49 error="llama runner process has terminated: ROCm error"
[GIN] 2026/03/08 - 07:24:03 | 500 | 7.3890475s | 127.0.0.1 | POST "/api/generate"
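The llama_kv_cache line in the log can be checked by hand: each of the K and V caches holds cells × layers × KV heads × head dimension elements, at 2 bytes per f16 element. A sketch, assuming Qwen2.5-7B uses 4 KV heads with head dimension 128 (taken from the model's published configuration, not from this log); the function name is hypothetical:

```python
def kv_cache_mib(n_cells: int, n_layers: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """Size in MiB of ONE of the K or V caches (f16 = 2 bytes per element)."""
    n_bytes = n_cells * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return n_bytes / (1024 * 1024)


# 4096 cells and 28 layers come from the log; 4 KV heads x 128 dims is assumed.
k_mib = kv_cache_mib(n_cells=4096, n_layers=28, n_kv_heads=4, head_dim=128)
print(f"K: {k_mib:.2f} MiB, K+V: {2 * k_mib:.2f} MiB")
# K: 112.00 MiB, K+V: 224.00 MiB
```

The result matches the logged 112.00 MiB per cache and 224.00 MiB total, which suggests the buffer allocation itself succeeded and the crash happens later, at the first quantized matmul kernel launch.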

Additional info

Expected behavior

Model loads and offloads layers to ROCm without kernel execution error.

Thanks for the great project! Happy to provide more logs or test patches.

Relevant log output

OS

Windows 10

GPU

AMD Radeon RX 6750 GRE 10GB

CPU

i5-13400F

Ollama version

0.17.7

Extended analysis

Fix Plan

To resolve the ROCm error, which the report attributes to an incompatibility between the ggml-cuda/HIP kernels and the gfx1031-patched rocblas library, follow these steps:

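  1. Rebuild the Tensile kernels:
    • Clone the Tensile repository: git clone https://github.com/ROCm/Tensile.git
    • Check out the release tag or branch that matches your ROCm version (for example, a rocm-6.1.x release tag)
    • Configure and build Tensile:
      mkdir build
      cd build
      cmake .. -DCMAKE_BUILD_TYPE=Release
      cmake --build .
    • Note that the Tensile kernels used by rocBLAS are normally generated as part of a rocBLAS build that targets gfx1031, rather than by building Tensile on its own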
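  2. Update ggml-cuda:
    • Clone the ggml repository: git clone https://github.com/ggerganov/ggml.git
    • Check out a revision known to work with your setup, ideally one matching the ggml vendored by your Ollama version (the log paths show it under ml/backend/ggml/ggml)
    • Apply any necessary patches for ROCm compatibility
    • Build the HIP backend (compiled from the ggml-cuda sources) with gfx1031 as a GPU target. ggml_cuda_mul_mat_q is ggml's own kernel from mmq.cu, and "invalid device function" generally means the loaded binary contains no code for the GPU's architecture:
      mkdir build
      cd build
      cmake .. -DCMAKE_BUILD_TYPE=Release -DGGML_HIP=ON -DAMDGPU_TARGETS=gfx1031
      cmake --build . --config Release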
  3. Integrate with Ollama:
    • Replace the existing ggml HIP backend library in your Ollama installation with the newly built one
    • Make sure the rebuilt Tensile kernel files replace the ones in the rocblas library folder that your Ollama installation loads (the same folder the gfx1031 patch replaces)
  4. Test Ollama:
    • Run ollama serve and then ollama run qwen2.5:7b to verify that the model loads and runs without the ROCm error
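Step 2's build can be scripted so the GPU target is always explicit. A sketch, assuming upstream ggml's CMake options GGML_HIP and AMDGPU_TARGETS (option names have changed across ggml versions, so check the tree you build) and a HIP-capable compiler already configured; setting up the HIP SDK on Windows is not covered, and the function names are hypothetical:

```python
import subprocess
from pathlib import Path
from typing import List


def ggml_hip_build_commands(source_dir: str, gpu_target: str = "gfx1031",
                            build_dir: str = "build") -> List[List[str]]:
    """Return the configure and build argv lists for a ggml HIP build."""
    src = Path(source_dir)
    build = src / build_dir
    configure = [
        "cmake", "-S", str(src), "-B", str(build),
        "-DCMAKE_BUILD_TYPE=Release",
        "-DGGML_HIP=ON",
        # The kernels must be compiled for the card's ISA, otherwise launches
        # fail with "invalid device function".
        f"-DAMDGPU_TARGETS={gpu_target}",
    ]
    # --config matters for multi-config generators such as Visual Studio.
    compile_step = ["cmake", "--build", str(build), "--config", "Release"]
    return [configure, compile_step]


def run_build(source_dir: str, gpu_target: str = "gfx1031") -> None:
    """Run each step, stopping at the first failure."""
    for cmd in ggml_hip_build_commands(source_dir, gpu_target):
        print(" ".join(cmd))
        subprocess.run(cmd, check=True)
```

Generating argv lists (instead of shell strings) avoids the missing-space pitfalls of hand-typed commands such as cmake.. and keeps the target in one place.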

Verification

After applying these steps, verify that:

  • The ROCm error no longer appears in the logs
  • The model successfully loads and runs without kernel execution errors
  • System logs and Ollama output show no signs of instability or performance issues
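The first verification check can be automated by scanning captured server output for the failure markers seen in this report's log. A sketch; the marker list is taken from that log and is not exhaustive, and the last sample line below is hypothetical:

```python
import re
from typing import Iterable, List, Tuple

# Substrings that indicated the failure in this report's log.
FAILURE_MARKERS = ("ROCm error", "invalid device function", "level=ERROR")
# A GIN access-log line whose status column is a 5xx code.
GIN_5XX = re.compile(r"^\[GIN\].*\|\s*5\d\d\s*\|")


def find_failures(log_lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (1-based line number, line) for every line that looks like a failure."""
    hits = []
    for lineno, line in enumerate(log_lines, start=1):
        if any(marker in line for marker in FAILURE_MARKERS) or GIN_5XX.search(line):
            hits.append((lineno, line.rstrip()))
    return hits


sample = [
    "llama_context: Flash Attention was auto, set to enabled",
    "ROCm error: invalid device function",
    '[GIN] 2026/03/08 - 07:24:03 | 500 | 7.3890475s | 127.0.0.1 | POST "/api/generate"',
    '[GIN] 2026/03/08 - 07:25:10 | 200 | 1.2s | 127.0.0.1 | POST "/api/generate"',
]
for lineno, line in find_failures(sample):
    print(lineno, line)
```

An empty result for a full load-and-generate run is a quick signal that the fix worked; it does not replace checking that layers were actually offloaded to ROCm.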

Extra Tips

  • Regularly check for updates to Tensile, ggml-cuda, and ROCm to ensure compatibility and optimal performance
  • Consider contributing back any patches or fixes to the respective open-source projects to help the community
  • If issues persist, provide detailed logs and steps to reproduce to the Ollama and ROCm communities for further assistance
