ollama - [Bug] 500 Error with OLLAMA_FLASH_ATTENTION=true on Intel iGPU (Vulkan) when processing high-res images [1 participant]

ollama • 2026-05-06 11:37:24

ollama/ollama#15993 • Fetched 2026-05-07 03:32:03

Author

crackerfly

Participants

crackerfly

Error Message

When using the Vulkan backend on an Intel iGPU with OLLAMA_FLASH_ATTENTION=true, the server immediately returns a 500 Internal Server Error when a multimodal model is asked to process a high-resolution image.

What is the issue?

Describe the bug

When using the Vulkan backend on an Intel iGPU, setting OLLAMA_FLASH_ATTENTION=true causes a 500 Error when attempting to process high-resolution images with multimodal models.

If I set OLLAMA_FLASH_ATTENTION=false, the models can process the images without crashing, but the generation quality and output accuracy drop significantly.
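To compare the two settings side by side, the server can be started with the flag set explicitly. The following is a minimal sketch, assuming the `ollama` executable is on PATH and that no other Ollama instance (for example the Windows tray app) is already running. The helper names `server_env` and `start_server` are hypothetical, not part of Ollama.

```python
import os
import subprocess


def server_env(flash_attention: bool) -> dict:
    """Return a copy of the current environment with the flag set explicitly."""
    env = dict(os.environ)
    # The issue uses the literal values "true" and "false" for this variable.
    env["OLLAMA_FLASH_ATTENTION"] = "true" if flash_attention else "false"
    return env


def start_server(flash_attention: bool) -> subprocess.Popen:
    """Launch `ollama serve` with the chosen setting (assumes `ollama` is on PATH)."""
    return subprocess.Popen(["ollama", "serve"], env=server_env(flash_attention))


if __name__ == "__main__":
    # Workaround run: Flash Attention disabled. Switch to True to reproduce the error.
    proc = start_server(flash_attention=False)
    try:
        proc.wait()
    except KeyboardInterrupt:
        proc.terminate()
```

Only the child process sees the changed variable, so the two configurations can be tested without editing system-wide settings.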

Steps to reproduce

1. Set the environment variable: OLLAMA_FLASH_ATTENTION=true
2. Run Ollama using the Vulkan backend on an Intel Core Ultra hardware setup.
3. Load a multimodal model (e.g., qwen3.5:9b or qwen3.6:35b).
4. Input a high-resolution image in the prompt.
5. The server immediately returns a 500 Internal Server Error.
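The steps above can be scripted against the local HTTP API. This is a hedged sketch, not the reporter's exact workflow: the issue does not say which client was used, so the use of `/api/generate` on the default port 11434, the file name `high_res.jpg`, and the prompt text are all assumptions. The server must already be running with OLLAMA_FLASH_ATTENTION=true.

```python
import base64
import json
import urllib.error
import urllib.request
from typing import Tuple

# Default local Ollama endpoint (assumed; adjust if the server listens elsewhere).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, image_bytes: bytes) -> bytes:
    """Build a non-streaming generate request with one base64-encoded image."""
    body = {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(body).encode("utf-8")


def send(payload: bytes, url: str = OLLAMA_URL) -> Tuple[int, str]:
    """POST the payload and return (HTTP status, response body), even on errors."""
    req = urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # A 500 from the server arrives here; keep the body for the bug report.
        return err.code, err.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # "high_res.jpg" is a hypothetical file name for the high-resolution test image.
    with open("high_res.jpg", "rb") as fh:
        payload = build_payload("qwen3.5:9b", "Describe this image.", fh.read())
    status, text = send(payload)
    print(status)
    print(text[:500])
```

Catching `HTTPError` instead of letting it propagate keeps the status code and the error body together, which makes it easier to attach both to the issue.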

Expected behavior

The model should successfully process the high-resolution image with Flash Attention enabled, similar to how it behaves on other GPU architectures.

Environment

OS: Windows 11 25H2
Hardware: Intel Core Ultra 7 358H + iGPU (B390)
Ollama Version: 0.19.0 ~ 0.23.1
Backend: Vulkan
Models Tested: qwen3.5:9b, qwen3.6:35b

Additional context & Troubleshooting

I have done some isolation testing to narrow down the issue:

Upstream Testing: I tested the exact same workflow using the latest version of llama.cpp (Vulkan build) directly. It works perfectly fine with Flash Attention enabled on this Intel hardware.
Alternative Hardware: I tested Ollama with OLLAMA_FLASH_ATTENTION=true on an AMD iGPU environment, and it works perfectly without any 500 errors.

Proposed Solution

Given that the latest upstream llama.cpp handles this correctly, this appears to be an Intel-specific Vulkan bug that has already been resolved upstream. Syncing/updating the ggml-vulkan backend in Ollama to the latest version should fix this issue.

Relevant log output

OS

Windows

GPU

Intel

CPU

Intel

Ollama version

0.19.0 ~ 0.23.1
