ollama: [Bug] 500 Error with OLLAMA_FLASH_ATTENTION=true on Intel iGPU (Vulkan) when processing high-res images

GitHub stats: ollama/ollama#15993 (fetched 2026-05-07 03:32:03)
Comments: 0 · Participants: 1 · Timeline events: 1 (labeled ×1) · Reactions: 0


What is the issue?

Describe the bug

When using the Vulkan backend on an Intel iGPU, setting OLLAMA_FLASH_ATTENTION=true causes a 500 Error when attempting to process high-resolution images with multimodal models.

If I set OLLAMA_FLASH_ATTENTION=false, the models can process the images without crashing, but the generation quality and output accuracy drop significantly.
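
The on/off comparison described above can be sketched as a pair of server environments that differ only in `OLLAMA_FLASH_ATTENTION`. This is a minimal sketch, not the reporter's actual setup: the helper names `server_env` and `ab_configs` are hypothetical, and the report does not say how the Vulkan backend was selected, so nothing is set for that here.

```python
import os
from typing import Dict, Mapping, Optional

# Command that starts the Ollama server process.
SERVE_CMD = ["ollama", "serve"]


def server_env(flash_attention: bool,
               base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with OLLAMA_FLASH_ATTENTION set.

    `base` defaults to the current process environment; it is copied,
    never modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["OLLAMA_FLASH_ATTENTION"] = "true" if flash_attention else "false"
    return env


def ab_configs(base: Optional[Mapping[str, str]] = None) -> Dict[str, Dict[str, str]]:
    """Build the two environments compared in the report (hypothetical helper)."""
    return {
        "fa_on": server_env(True, base),
        "fa_off": server_env(False, base),
    }
```

One server at a time could then be started with `subprocess.Popen(SERVE_CMD, env=server_env(True))`, stopped, and restarted with `server_env(False)` so that the same image prompt is sent under both settings.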

Steps to reproduce

  1. Set the environment variable: OLLAMA_FLASH_ATTENTION=true
  2. Run Ollama using the Vulkan backend on an Intel Core Ultra hardware setup.
  3. Load a multimodal model (e.g., qwen3.5:9b or qwen3.6:35b).
  4. Input a high-resolution image in the prompt.
  5. Observe that the server immediately returns a 500 Internal Server Error.
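
The steps above can be approximated with a small script that posts an image to the server's `/api/generate` endpoint and reports the HTTP status. This is a sketch under assumptions: the report does not say which client was used, the helper names are hypothetical, and the host is assumed to be Ollama's default `http://localhost:11434`.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Any, Dict, Tuple


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> Dict[str, Any]:
    """Build a non-streaming /api/generate request body with one image.

    Ollama expects images as base64-encoded strings in the "images" list.
    """
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def post_generate(payload: Dict[str, Any],
                  host: str = "http://localhost:11434") -> Tuple[int, str]:
    """POST the payload and return (status_code, body_text).

    HTTP errors such as 500 are returned instead of raised, so the
    failure described in the report can be inspected by the caller.
    """
    request = urllib.request.Request(
        f"{host}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", errors="replace")
```

With a server running, reading a high-resolution image file into `image_bytes` and calling `post_generate(build_generate_payload("qwen3.5:9b", "Describe this image.", image_bytes))` would be expected to return status 500 on the affected setup; the response body may help identify the failing stage.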

Expected behavior

The model should successfully process the high-resolution image with Flash Attention enabled, similar to how it behaves on other GPU architectures.

Environment

  • OS: Windows 11 25H2
  • Hardware: Intel Core Ultra 7 358H + iGPU (B390)
  • Ollama Version: 0.19.0 ~ 0.23.1
  • Backend: Vulkan
  • Models Tested: qwen3.5:9b, qwen3.6:35b
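
To check whether a given installation falls inside the version range listed above, a comparison along these lines may help. It is only a sketch: the bounds are the versions the reporter tested, not a confirmed list of affected releases, and the helper names are hypothetical. The version string is assumed to be supplied by the reader, for example from `ollama --version`.

```python
from typing import Tuple

# Bounds taken from the report ("0.19.0 ~ 0.23.1"). Versions outside this
# range were not tested by the reporter, so they may or may not be affected.
REPORTED_FIRST = (0, 19, 0)
REPORTED_LAST = (0, 23, 1)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse 'X.Y.Z' (optionally prefixed with 'v') into a tuple of ints.

    Any suffix after a '-' (such as a pre-release tag) is ignored.
    """
    core = text.strip().lstrip("v").split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def in_reported_range(version: str) -> bool:
    """Return True if the version lies within the range the reporter tested."""
    return REPORTED_FIRST <= parse_version(version) <= REPORTED_LAST
```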

Additional context & Troubleshooting

I have done some isolation testing to narrow down the issue:

  • Upstream Testing: I tested the exact same workflow using the latest version of llama.cpp (Vulkan build) directly. It works perfectly fine with Flash Attention enabled on this Intel hardware.
  • Alternative Hardware: I tested Ollama with OLLAMA_FLASH_ATTENTION=true on an AMD iGPU environment, and it works perfectly without any 500 errors.

Proposed Solution

Given that the latest upstream llama.cpp handles this correctly, this appears to be an Intel-specific Vulkan bug that has already been resolved upstream. Syncing/updating the ggml-vulkan backend in Ollama to the latest version should fix this issue.

Relevant log output

(No log output was provided.)

  • OS: Windows
  • GPU: Intel
  • CPU: Intel
  • Ollama version: 0.19.0 ~ 0.23.1
