ollama - 💡(How to fix) Fix Vulkan/i915: gemma4:26b produces garbled output on Intel Arc Arrow Lake-P iGPU (regression); gemma4:e4b alloc_tensor_range failure [1 participants]

ollama • 2026-04-03 00:02:32


GitHub stats

ollama/ollama#15248 • Fetched 2026-04-08 02:33:42


Author

fjwood69

Participants

fjwood69

Timeline (top)

cross-referenced ×1, labeled ×1, subscribed ×1

Error Message

gemma4:e4b fails to load on Vulkan with a hard allocation error:

alloc_tensor_range: failed to allocate Vulkan0 buffer of size 5637144576

What is the issue?

Environment

OS: Ubuntu 24.04.4 LTS, kernel 6.17.0-19-generic
CPU: Intel Core Ultra 5 225H
GPU: Intel Arc (Arrow Lake-P) iGPU, device 0x7d51
Driver: i915 (not xe)
Vulkan: Mesa ANV 25.2.8, Intel(R) Graphics (ARL), Vulkan 1.4
Ollama: ollama/ollama:latest (container, via Podman)
Vulkan ICD: mounted from host with -v /usr/share/vulkan:/usr/share/vulkan:ro

Issue 1 — gemma4:e4b: hard buffer allocation failure

gemma4:e4b fails to load on Vulkan with a hard allocation error:

alloc_tensor_range: failed to allocate Vulkan0 buffer of size 5637144576
offloading output layer to CPU
offloaded 42/43 layers to GPU

Model weights fall back to CPU while the KV cache remains on the GPU. All output is garbage (multilingual/corrupted tokens), so the model is unusable.
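To put the failed allocation in perspective, the byte count from the log can be converted to binary units. The sketch below is illustrative only: the helper names `bytes_to_gib` and `parse_alloc_failure` and the regular expression are assumptions made for this example, not part of Ollama, and the real log format may differ between versions.

```python
import re

# Log line copied from the report above.
LOG_LINE = "alloc_tensor_range: failed to allocate Vulkan0 buffer of size 5637144576"

# Hypothetical pattern for this one message; the real log format may vary by version.
ALLOC_FAILURE = re.compile(r"failed to allocate (\S+) buffer of size (\d+)")


def bytes_to_gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (1 GiB = 2**30 bytes)."""
    return n_bytes / 2**30


def parse_alloc_failure(line: str):
    """Return (device, size_in_bytes) for an allocation failure line, else None."""
    match = ALLOC_FAILURE.search(line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


device, size = parse_alloc_failure(LOG_LINE)
print(f"{device}: {size} bytes = {bytes_to_gib(size):.2f} GiB")
```

The failing request is therefore for a single 5.25 GiB buffer on Vulkan0.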

Issue 2 — gemma4:26b: worked then regressed

gemma4:26b initially loaded and ran correctly:

offloaded 31/31 layers to GPU
model weights device=Vulkan0

Output was clean at ~2.6 tok/s. The following day (no host driver or kernel changes, confirmed via dpkg logs), the same model in the same container produced garbled output identical to the e4b failure pattern. The model has since been removed.
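The two cases differ in how many layers were offloaded: 42/43 for gemma4:e4b and 31/31 for the initial, working gemma4:26b run. When comparing runs, it may help to extract that ratio from the server log automatically. A minimal sketch, assuming the log lines look exactly like the ones quoted in this report; the function name `offload_status` and the pattern are hypothetical.

```python
import re

# Hypothetical pattern matching the "offloaded N/M layers to GPU" lines quoted in this report.
OFFLOAD = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def offload_status(log_text: str):
    """Return (offloaded, total, fully_offloaded) from the first matching line, else None."""
    match = OFFLOAD.search(log_text)
    if match is None:
        return None
    offloaded, total = int(match.group(1)), int(match.group(2))
    return offloaded, total, offloaded == total


# Log excerpts from the two cases in this report.
e4b_log = "offloading output layer to CPU\noffloaded 42/43 layers to GPU"
b26_log = "offloaded 31/31 layers to GPU\nmodel weights device=Vulkan0"

print(offload_status(e4b_log))  # partial offload
print(offload_status(b26_log))  # full offload
```

The report does not include the log from the regressed gemma4:26b run, so whether that run was still fully offloaded is unknown; capturing this line on both a good and a bad run would narrow that down.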

qwen2.5-coder:7b (dense, non-MoE) continues to work correctly on the same setup.

Relevant log output

OS

Linux

GPU

Intel

CPU

Intel

Ollama version

0.20.0

extent analysis

TL;DR

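The gemma4:e4b load failure is a single Vulkan buffer allocation of 5637144576 bytes (5.25 GiB) that fails. It may be mitigated by reducing how much has to fit in that buffer, for example by offloading fewer layers to the GPU or by using a smaller model. The gemma4:26b regression has no confirmed cause.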

Guidance

Investigate the alloc_tensor_range failure to understand why the 5637144576-byte Vulkan0 buffer could not be allocated, and look for workarounds that avoid a single allocation of that size.
Consider a smaller model to reduce memory requirements; qwen2.5-coder:7b works correctly on the same setup.
Rule the Vulkan driver and kernel in or out by checking for updates and testing with a different driver version (the report uses i915, not xe, with Mesa ANV 25.2.8).
Test with a shorter prompt to see whether the garbled output depends on input size.
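One way to try the memory-reduction ideas above without changing the model is to pass per-request options to Ollama. The sketch below builds a request for the `/api/generate` endpoint with the `num_gpu` (layers to offload) and `num_ctx` (context size) options. The endpoint URL assumes Ollama's default port on localhost, and the specific values are illustrative guesses; whether they avoid the failure on this hardware is untested.

```python
import json
import urllib.request

# Assumed default local endpoint for Ollama's REST API; adjust for a Podman port mapping.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str, num_gpu: int, num_ctx: int) -> dict:
    """Build a non-streaming generate request with reduced GPU offload and context."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "num_gpu": num_gpu,  # number of layers to offload to the GPU
            "num_ctx": num_ctx,  # context window size in tokens
        },
    }


def send(payload: dict, url: str = OLLAMA_URL) -> str:
    """POST the payload to a running Ollama server and return the response text."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


# Illustrative values only: 20 layers and a 2048-token context are guesses, not tested settings.
payload = build_request("gemma4:e4b", "Say hello.", num_gpu=20, num_ctx=2048)
print(json.dumps(payload, indent=2))
# Uncomment on a host with Ollama running:
# print(send(payload))
```

Stepping `num_gpu` down from the full layer count while watching the server log for the alloc_tensor_range message would show at what point, if any, the allocation succeeds.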

Example

No specific code example is provided due to the lack of detailed information about the alloc_tensor_range function or the model implementation.

Notes

The issue may be related to the specific model architecture or the Vulkan driver implementation. Further investigation is required to determine the root cause.

Recommendation

Apply a workaround by reducing the model size or complexity to decrease memory requirements, as this is a safer approach than upgrading the driver or kernel without confirmation of a fix.
