ollama: Vulkan backend ignores maxComputeWorkGroupInvocations and creates 512-invocation pipelines (breaks Raspberry Pi 5 / V3D, cap = 256)

Q: Expected behavior

Ollama's Vulkan backend should respect `VkPhysicalDeviceLimits::maxComputeWorkGroupInvocations` for every pipeline it creates, the same way it already does for argsort.

ollama2026-04-29 11:51:27

ollama/ollama#15877•Fetched 2026-04-30 06:18:50

Author

txenoo

Ollama's Vulkan backend hardcodes {512, 1, 1} workgroup sizes in dozens of ggml_vk_create_pipeline(...) calls and in the corresponding .comp shaders (layout(local_size_x = 512, ...)). On Raspberry Pi 5 (Broadcom V3D, V3DV driver) the hardware advertises VkPhysicalDeviceLimits::maxComputeWorkGroupInvocations = 256, so every one of those pipelines exceeds the device limit and is rejected by the V3D shader compiler:

nir_to_vir.c:  assert(c->local_invocation_index_bits <= 8);

(8 bits = 256 invocations.)
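The arithmetic behind the assertion can be checked with a small sketch. The helper name `invocation_index_bits` is hypothetical and is not taken from Mesa; it only computes how many bits are needed to index every invocation in a workgroup, which is the quantity the assertion appears to bound.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not Mesa code): number of bits needed to hold every
// local invocation index in the range [0, invocations - 1].
constexpr uint32_t invocation_index_bits(uint32_t invocations) {
    uint32_t bits = 0;
    while (bits < 32 && (1u << bits) < invocations) {
        ++bits;
    }
    return bits;
}

// 256 invocations fit in 8 bits, which satisfies the V3D assertion.
static_assert(invocation_index_bits(256) == 8, "256 invocations need 8 bits");
// 512 invocations need 9 bits, one more than the assertion allows.
static_assert(invocation_index_bits(512) == 9, "512 invocations need 9 bits");
```

Under this model, any pipeline declared with 512 invocations needs a ninth index bit, which matches the failure described above.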

This is the same root cause behind the "Vulkan doesn't work on Raspberry Pi 5" reports against llama.cpp:

ggml-org/llama.cpp#9801 — "Bug: [vulkan] llama.cpp not work on Raspberry Pi 5"
ggml-org/llama.cpp#5237 — original Jan 2024 report

This issue identifies the specific bug and points to the V3DV-side workaround pending to land in Mesa.

On a Raspberry Pi 5 with stock Mesa V3DV (no workaround applied), the runner crashes during shader pipeline creation on the first compute dispatch. The affected ops (cpy, get_rows, add/mul, gelu, soft_max, rope, …) are on the hot path of every transformer forward pass, so any model fails immediately.

What is the issue?

Environment

Hardware: Raspberry Pi 5 (Broadcom BCM2712, V3D 7.1.7)
OS: Debian Trixie (aarch64)
Driver: Mesa V3DV (Vulkan 1.3)
Ollama: built with -DGGML_VULKAN=ON, server launched with OLLAMA_VULKAN=1
Device limits (relevant ones):
- maxComputeWorkGroupInvocations = 256
- maxComputeWorkGroupSize = [256, 256, 256]
- subgroupSize = 16

What's happening

ggml-vulkan.cpp:4478 correctly reads the device limit:

device->max_workgroup_size_log2 =
    uint32_t(log2f(float(device->properties.limits.maxComputeWorkGroupInvocations)));

On V3D this yields log2(256) = 8.

A handful of code paths honor that cap, but most pipelines hardcode {512, 1, 1} and ignore max_workgroup_size_log2. The matching .comp shaders declare the same hardcoded size: abs.comp, concat.comp, copy.comp, … all use local_size_x = 512.
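One way the hardcoded size could be capped is sketched below. This is a minimal illustration, not the backend's actual code: `pick_workgroup_size` is a hypothetical helper, and the only name borrowed from the issue is the `max_workgroup_size_log2` exponent.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper: choose the x-dimension workgroup size for a 1D pipeline.
// `preferred` is the size the backend would like (512 in the hardcoded calls);
// `max_workgroup_size_log2` is the exponent derived from
// maxComputeWorkGroupInvocations (8 on V3D).
constexpr uint32_t pick_workgroup_size(uint32_t preferred,
                                       uint32_t max_workgroup_size_log2) {
    // Convert the exponent back into an invocation count before comparing.
    const uint32_t device_cap = 1u << max_workgroup_size_log2;
    return std::min(preferred, device_cap);
}

// On V3D (exponent 8) the preferred 512 is reduced to 256.
static_assert(pick_workgroup_size(512, 8) == 256, "capped on V3D");
// On a device that allows 1024 invocations the preferred size is kept.
static_assert(pick_workgroup_size(512, 10) == 512, "unchanged elsewhere");
```

The host-side value is only half of the change: the local size declared in each shader would have to agree with whatever this helper returns, and how that is wired up is outside the scope of the sketch.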

Steps to reproduce

On a Raspberry Pi 5 with stock Mesa V3DV (no workaround applied):

# 1. Start the server with Vulkan enabled (it's off by default).
OLLAMA_VULKAN=1 ollama serve

# 2. In another shell, run any model.
ollama run gemma3:1b

Expected behavior

Ollama's Vulkan backend should respect VkPhysicalDeviceLimits::maxComputeWorkGroupInvocations for every pipeline it creates, the same way it already does for argsort.
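The expected behavior amounts to a check that every pipeline could pass before creation. The sketch below is an assumption-laden illustration: `ComputeLimits` is a stand-in struct so the code compiles without the Vulkan headers, and `workgroup_fits` is a hypothetical name, not a function in ggml-vulkan.cpp.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the two relevant fields of VkPhysicalDeviceLimits.
struct ComputeLimits {
    uint32_t max_invocations;          // maxComputeWorkGroupInvocations
    std::array<uint32_t, 3> max_size;  // maxComputeWorkGroupSize
};

// Returns true when a local size respects both the per-dimension limits and
// the limit on the total number of invocations (x * y * z).
bool workgroup_fits(const std::array<uint32_t, 3>& local_size,
                    const ComputeLimits& limits) {
    uint64_t total = 1;
    for (std::size_t i = 0; i < 3; ++i) {
        if (local_size[i] == 0 || local_size[i] > limits.max_size[i]) {
            return false;
        }
        total *= local_size[i];
        // Bail out early so the running product can never overflow 64 bits.
        if (total > limits.max_invocations) {
            return false;
        }
    }
    return true;
}
```

With the V3D limits listed under Environment (256 invocations, [256, 256, 256]), `workgroup_fits({512, 1, 1}, ...)` is false and `workgroup_fits({256, 1, 1}, ...)` is true.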

Driver-side workaround (V3DV)

We therefore implemented a driver-side workaround for applications that don't honour maxComputeWorkGroupInvocations: the driver lowers oversized workgroups itself so the shader compiler no longer asserts. Mesa MRs:

https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41257 - Likely sufficient on its own; lowers oversized compute workgroups to 256 invocations so ggml-vulkan pipelines compile on V3DV.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41256 - Fixes a shader compilation bug.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/41255 - Needed for performance; enables v8dot HW.

With these three Mesa MRs, an RPi5 can run Ollama, although there are many opportunities for performance improvements on both the Mesa and ggml sides, such as tuning for the Broadcom architecture as is already done for other vendors.

These let RPi5 users run Ollama today, but the wrapping loop the driver inserts is pure overhead — fixing it in the backend would let V3D run dispatches at native size.
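To make the overhead concrete, here is a conceptual model of such a wrapping loop. It is not Mesa's implementation and ignores barriers and shared memory; `run_lowered` and `loop_iterations` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Conceptual model only: execute `requested` logical invocations on hardware
// that hosts at most `hw_max` invocations per workgroup, by letting each
// hardware invocation loop over several logical indices.
// Returns how many times each logical invocation index was executed.
std::vector<uint32_t> run_lowered(uint32_t requested, uint32_t hw_max) {
    std::vector<uint32_t> hits(requested, 0);
    const uint32_t hw_size = requested < hw_max ? requested : hw_max;
    for (uint32_t hw_id = 0; hw_id < hw_size; ++hw_id) {
        // The wrapping loop: iterations a native-size workgroup would not need.
        for (uint32_t logical = hw_id; logical < requested; logical += hw_max) {
            ++hits[logical];
        }
    }
    return hits;
}

// Wrapping-loop iterations per hardware invocation (ceiling division).
constexpr uint32_t loop_iterations(uint32_t requested, uint32_t hw_max) {
    return (requested + hw_max - 1) / hw_max;
}
```

In this model a 512-invocation workgroup on a 256-invocation device runs every logical invocation exactly once but needs two loop iterations per hardware invocation, whereas a pipeline created at 256 needs one.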

OS: Linux
GPU: Other
CPU: Other
Ollama version: 0.21.2

extent analysis

TL;DR

The most likely fix is to update the Vulkan backend to respect the VkPhysicalDeviceLimits::maxComputeWorkGroupInvocations limit for every pipeline creation.

Guidance

Review ggml-vulkan.cpp and replace the hardcoded {512, 1, 1} workgroup sizes in the ggml_vk_create_pipeline calls with a size derived from max_workgroup_size_log2. That field is an exponent, so the invocation count is 1 << max_workgroup_size_log2.
Update the corresponding .comp shaders so their declared local size matches, instead of a fixed local_size_x = 512.
Verify that every pipeline's invocation count is at most maxComputeWorkGroupInvocations.
Until the backend is fixed, consider running a Mesa build that includes the V3DV MRs (41257, 41256, and 41255). Per the issue, 41257 is the one that lets the pipelines compile and 41255 is the performance-related one.

Example

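// Derive the workgroup size from the device limit instead of hardcoding 512
device->max_workgroup_size_log2 = uint32_t(log2f(float(device->properties.limits.maxComputeWorkGroupInvocations)));
// max_workgroup_size_log2 is an exponent (8 on V3D), so convert it back to an invocation count and cap the preferred 512
const uint32_t wg_size = std::min<uint32_t>(512, 1u << device->max_workgroup_size_log2);
// ...
ggml_vk_create_pipeline(..., {wg_size, 1, 1}, ...);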

Notes

The fix requires updating the Vulkan backend to respect the device limits, which may involve modifying multiple code paths. The driver-side workaround can provide a temporary solution, but fixing the backend is recommended for optimal performance.

Recommendation

Use a Mesa build that includes the MRs (41257, 41256, and 41255) to run Ollama on Raspberry Pi 5 today, then update the Vulkan backend to respect the device limits as the permanent fix.
