ollama - 💡(How to fix) Fix Vulkan/i915: gemma4:26b produces garbled output on Intel Arc Arrow Lake-P iGPU (regression); gemma4:e4b alloc_tensor_range failure [1 participant]

GitHub stats
ollama/ollama#15248 (fetched 2026-04-08 02:33:42)
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 1
Timeline: cross-referenced ×1, labeled ×1, subscribed ×1

What is the issue?

Environment

  • OS: Ubuntu 24.04.4 LTS, kernel 6.17.0-19-generic
  • CPU: Intel Core Ultra 5 225H
  • GPU: Intel Arc (Arrow Lake-P) iGPU, device 0x7d51
  • Driver: i915 (not xe)
  • Vulkan: Mesa ANV 25.2.8, Intel(R) Graphics (ARL), Vulkan 1.4
  • Ollama: ollama/ollama:latest (container, via Podman)
  • Vulkan ICD: mounted from host with -v /usr/share/vulkan:/usr/share/vulkan:ro

Issue 1 — gemma4:e4b: hard buffer allocation failure

gemma4:e4b fails to load on Vulkan with a hard allocation error:

alloc_tensor_range: failed to allocate Vulkan0 buffer of size 5637144576
offloading output layer to CPU
offloaded 42/43 layers to GPU

The failed request is 5637144576 bytes (5.25 GiB). Model weights fall back to CPU while the KV cache remains on GPU. All output is garbage (multilingual/corrupted tokens), and the model is unusable.
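To spot this failure pattern in server logs without reading them by hand, a minimal sketch like the one below pulls out failed Vulkan allocations (converted to GiB) and the layer offload ratio. It assumes the log lines are worded exactly as quoted above; other Ollama versions may phrase them differently, and the names `scan_log` and `LoadReport` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Tuple

# Patterns copied from the log lines quoted in the report (assumed wording).
ALLOC_FAIL = re.compile(r"alloc_tensor_range: failed to allocate (\S+) buffer of size (\d+)")
OFFLOAD = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")

GIB = 1024 ** 3


@dataclass
class LoadReport:
    # (device name, requested size in GiB) for every failed allocation
    failed_allocs: List[Tuple[str, float]] = field(default_factory=list)
    # (layers on GPU, total layers) from the last offload line seen
    offloaded: Optional[Tuple[int, int]] = None

    @property
    def partial_offload(self) -> bool:
        """True when some layers stayed on the CPU."""
        return self.offloaded is not None and self.offloaded[0] < self.offloaded[1]


def scan_log(lines: Iterable[str]) -> LoadReport:
    """Collect allocation failures and the offload ratio from log lines."""
    report = LoadReport()
    for line in lines:
        m = ALLOC_FAIL.search(line)
        if m:
            report.failed_allocs.append((m.group(1), int(m.group(2)) / GIB))
        m = OFFLOAD.search(line)
        if m:
            report.offloaded = (int(m.group(1)), int(m.group(2)))
    return report


sample = [
    "alloc_tensor_range: failed to allocate Vulkan0 buffer of size 5637144576",
    "offloading output layer to CPU",
    "offloaded 42/43 layers to GPU",
]
r = scan_log(sample)
print(r.failed_allocs)    # [('Vulkan0', 5.25)]
print(r.partial_offload)  # True
```

Using `search` rather than `match` lets the patterns tolerate timestamp or level prefixes in front of each message.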

Issue 2 — gemma4:26b: worked then regressed

gemma4:26b initially loaded and ran correctly:

offloaded 31/31 layers to GPU
model weights device=Vulkan0

Output was clean at ~2.6 tok/s. The following day, with no host driver or kernel changes (confirmed via dpkg logs), the same model in the same container produced garbled output identical to the e4b failure pattern. The model was then removed.

qwen2.5-coder:7b (dense, non-MoE) continues to work correctly on the same setup.
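Because gemma4:26b went from clean to garbled between two days, recording speed and output together for each run gives a baseline to compare against. The sketch below assumes the default local endpoint `http://localhost:11434/api/generate` and a non-streaming request; it computes tokens per second from the response's `eval_count` (generated tokens) and `eval_duration` (nanoseconds). The helper names are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if the container maps another port (assumption).
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def tokens_per_second(resp: dict) -> float:
    """Generation speed from a non-streaming /api/generate response.

    eval_count is the number of generated tokens; eval_duration is in nanoseconds.
    """
    duration_ns = resp.get("eval_duration", 0)
    if duration_ns <= 0:
        raise ValueError("response has no eval_duration")
    return resp["eval_count"] * 1e9 / duration_ns


def run_once(model: str, prompt: str) -> dict:
    """Send one non-streaming request and return the parsed JSON response."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_GENERATE_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)


def baseline_record(model: str, prompt: str) -> dict:
    """Keep speed and text together so a later run can be diffed against it."""
    resp = run_once(model, prompt)
    return {
        "model": model,
        "tok_per_s": round(tokens_per_second(resp), 2),
        "response": resp.get("response", ""),
    }
```

For example, `baseline_record("gemma4:26b", "Reply with the word OK.")` could be saved daily; a sudden change in the text for a fixed prompt points at the day the regression appeared.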

Relevant log output

OS: Linux
GPU: Intel
CPU: Intel
Ollama version: 0.20.0

extent analysis

TL;DR

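The gemma4:e4b allocation failure may be mitigated by reducing how much memory the model asks the Vulkan device for (a smaller model, fewer GPU layers, or a smaller context). This does not by itself explain the gemma4:26b regression, which initially ran cleanly with all 31 layers offloaded.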

Guidance

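  • Investigate the alloc_tensor_range failure: the request is for a single 5637144576-byte (5.25 GiB) Vulkan0 buffer. Check whether that exceeds the device's limits, for example the maxMemoryAllocationSize value reported by vulkaninfo.
  • Consider a smaller model or a lower-memory configuration. qwen2.5-coder:7b works correctly on the same setup, but gemma4:26b also loaded fully at first and only later produced garbage, so memory pressure alone may not explain the failures.
  • Rule out the Vulkan driver and kernel by testing other driver versions (the system uses i915 rather than xe). Since dpkg logs show no host changes between the working and failing gemma4:26b runs, also check whether the ollama/ollama:latest image changed in between, for example by comparing image digests.
  • Test with a shorter prompt and a smaller context window to see whether the corruption depends on input length or KV-cache size.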
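The last two suggestions, a smaller context and fewer GPU layers, can be tried systematically by generating one request body per combination. This is a sketch under two assumptions: that Ollama's `options` accept `num_ctx` (context window, which bounds KV-cache size) and `num_gpu` (number of layers to offload). The layer counts below are hypothetical picks below the 43 layers reported for gemma4:e4b.

```python
import itertools
import json
from typing import Iterable, List


def sweep_payloads(model: str, prompt: str,
                   ctx_sizes: Iterable[int],
                   gpu_layers: Iterable[int]) -> List[dict]:
    """Build one /api/generate request body per (num_ctx, num_gpu) pair.

    num_ctx bounds the context window; num_gpu is assumed to be the number
    of layers Ollama offloads to the GPU.
    """
    payloads = []
    for ctx, layers in itertools.product(ctx_sizes, gpu_layers):
        payloads.append({
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": ctx, "num_gpu": layers},
        })
    return payloads


# Hypothetical sweep for gemma4:e4b: two context sizes, three offload levels.
for p in sweep_payloads("gemma4:e4b", "Reply with the word OK.", [2048, 8192], [43, 30, 20]):
    print(json.dumps(p["options"]))
```

Each body can be POSTed to the generate endpoint in turn; the first combination that both loads and answers cleanly marks roughly where the memory limit sits.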

Example

No code example specific to alloc_tensor_range is given, because the issue provides no detail about that function or the model implementation.
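A generic check, not tied to alloc_tensor_range, can still help catch the "multilingual/corrupted tokens" symptom automatically. The heuristic below is an assumption of ours, not part of Ollama: it labels each letter by the first word of its Unicode name (LATIN, CYRILLIC, CJK, HANGUL, ...) and flags text whose letters spread across too many scripts.

```python
import unicodedata
from collections import Counter


def script_of(ch: str) -> str:
    """Rough script label: the first word of the character's Unicode name."""
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def looks_garbled(text: str, max_scripts: int = 2, min_dominant: float = 0.9) -> bool:
    """Heuristic: flag output whose letters are spread across many scripts.

    Returns True if more than max_scripts scripts appear, or if the most
    common script covers less than min_dominant of all letters.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    counts = Counter(script_of(c) for c in letters)
    dominant_share = counts.most_common(1)[0][1] / len(letters)
    return len(counts) > max_scripts or dominant_share < min_dominant


print(looks_garbled("The quick brown fox jumps over the lazy dog."))  # False
print(looks_garbled("The 答案 является ответ 정답 xyz"))  # True
```

Legitimately multilingual output, or languages that mix scripts such as Japanese, can trip this check, so the thresholds should be tuned to the prompts actually used.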

Notes

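The issue may be related to the model architecture or to the Vulkan driver implementation: both failing models are gemma4 variants, while qwen2.5-coder:7b, which the reporter describes as dense and non-MoE, works on the same setup. Further investigation is required to determine the root cause.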
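One way to separate those two candidates is to run each model once on the GPU and once CPU-only with the same prompt. If output is bad only on the GPU, the Vulkan path is the likelier culprit; if it is bad on both, the model or runtime is. This sketch assumes the default local endpoint and that `num_gpu: 0` forces a CPU-only run; `run_matrix` and `diagnose` are hypothetical helpers, and the bad-output predicate is passed in by the caller.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Optional

OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"  # assumed local default


def ollama_generate(model: str, prompt: str, num_gpu: Optional[int]) -> str:
    """One non-streaming request; num_gpu=None lets Ollama choose offload."""
    options = {} if num_gpu is None else {"num_gpu": num_gpu}
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False, "options": options}).encode("utf-8")
    req = urllib.request.Request(OLLAMA_GENERATE_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)["response"]


def diagnose(gpu_bad: bool, cpu_bad: bool) -> str:
    """Map the GPU and CPU observations to the more likely layer at fault."""
    if gpu_bad and not cpu_bad:
        return "Vulkan/driver path suspected"
    if gpu_bad and cpu_bad:
        return "model or runtime suspected"
    if cpu_bad:
        return "CPU path only (unexpected)"
    return "ok"


def run_matrix(models: Iterable[str], prompt: str,
               generate: Callable[[str, str, Optional[int]], str],
               is_bad: Callable[[str], bool]) -> Dict[str, str]:
    """Run each model on GPU and CPU and summarise where the output breaks."""
    report = {}
    for model in models:
        gpu_bad = is_bad(generate(model, prompt, None))  # default offload
        cpu_bad = is_bad(generate(model, prompt, 0))     # assumed CPU-only
        report[model] = diagnose(gpu_bad, cpu_bad)
    return report
```

A possible invocation is `run_matrix(["gemma4:26b", "gemma4:e4b", "qwen2.5-coder:7b"], "Reply with the word OK.", ollama_generate, lambda t: "OK" not in t)`, with qwen2.5-coder:7b acting as the known-good control.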

Recommendation

As a workaround, use a model with a smaller memory footprint (qwen2.5-coder:7b works on this setup). This is safer than upgrading the driver or kernel without confirmation that a newer version fixes the problem.
