ollama - 💡(How to fix) Fix llava:7b / llava-llama3 crash on Windows: "Assertion failed: found, file llama-sampling.cpp, line 660"

StepCodex · 2026-05-27T14:01:31Z

[ollama] All llava-based vision models (llava:7b, llava-phi3, llava-llama3) crash or produce garbage output when processing images on Windows with CUDA driver 13.0. moondream works fine on the same system.


Description

All llava-based vision models (llava:7b, llava-phi3, llava-llama3) crash or produce garbage output when processing images on Windows with CUDA driver 13.0. moondream works fine on the same system.

Environment

- OS: Windows 11 Pro 10.0.26200
- GPU: NVIDIA GeForce RTX 2060 (6 GB VRAM)
- CUDA Driver: 13.0 (Driver 581.57)
- Ollama versions tested: 0.22.1 and 0.24.0 (same crash on both)

Steps to reproduce

1. `ollama pull llava:7b`
2. Send any image via `/api/generate`
3. Model runner crashes
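
Step 2 can be sketched in Python using only the standard library. This is a minimal sketch, not the reporter's actual client: it assumes Ollama is listening on its default local address and that the request is sent non-streaming. The helper names `build_generate_payload` and `send_generate` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_payload(model: str, prompt: str, image_bytes: bytes) -> dict:
    """Build a non-streaming /api/generate body carrying one base64-encoded image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def send_generate(payload: dict, url: str = OLLAMA_URL, timeout: float = 300.0) -> dict:
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Usage (requires a running server and a local image file):
#   with open("test.png", "rb") as f:
#       payload = build_generate_payload("llava:7b", "Describe this image.", f.read())
#   print(send_generate(payload)["response"])
```

On an affected system, the call to `send_generate` would be expected to fail with an HTTP or connection error once the runner stops, rather than return a response.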

Server log output

image_tokens->nx = 576
image_tokens->ny = 1
batch_f32 size = 1
Assertion failed: found, file llama-sampling.cpp, line 660
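
To check a saved server log for this failure, the assertion line can be matched with a regular expression. The sketch below assumes the `Assertion failed: <expr>, file <file>, line <n>` shape seen in the excerpt above; `find_assertion` and `AssertionHit` are hypothetical names, and other builds may format the message differently.

```python
import re
from typing import NamedTuple, Optional

# Pattern modeled on the single log line in this report.
ASSERTION_RE = re.compile(
    r"Assertion failed: (?P<expr>.+?), file (?P<file>.+?), line (?P<line>\d+)"
)


class AssertionHit(NamedTuple):
    expression: str
    file: str
    line: int


def find_assertion(log_text: str) -> Optional[AssertionHit]:
    """Return the first assertion failure found in the log text, or None."""
    match = ASSERTION_RE.search(log_text)
    if match is None:
        return None
    return AssertionHit(match["expr"], match["file"], int(match["line"]))
```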

Models tested

| Model | Result |
|---|---|
| llava:7b | Assertion crash (llama-sampling.cpp:660) |
| llava-phi3 | Runner unexpectedly stopped |
| llava-llama3 | No crash but garbage output instead of text |
| moondream | ✅ Works perfectly |
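
A comparison like the one above can be re-run with a small loop. This is a sketch under assumptions: `generate` is a hypothetical callable that sends the same image to the given model (for example through `/api/generate`) and returns the response text, raising an exception if the runner stops. Garbage output cannot be detected automatically here, so the beginning of each response is kept for manual inspection.

```python
from typing import Callable, Dict, Iterable

MODELS = ["llava:7b", "llava-phi3", "llava-llama3", "moondream"]


def run_matrix(models: Iterable[str], generate: Callable[[str], str]) -> Dict[str, str]:
    """Call generate(model) for each model and record whether it failed or responded."""
    results: Dict[str, str] = {}
    for model in models:
        try:
            text = generate(model)
        except Exception as error:  # a crashed runner surfaces as a request error
            results[model] = f"failed: {type(error).__name__}: {error}"
        else:
            # Keep a short preview so garbage output can be judged by eye.
            results[model] = f"responded: {text[:80]!r}"
    return results
```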

Additional notes

- Crash happens both with GPU and with `OLLAMA_NUM_GPU=0` (CPU-only), so it is not purely a CUDA/GPU issue
- Text-only inference with llava:7b (no image) works fine
- Image tokens are processed correctly (`image_tokens->nx = 576`); the crash happens during text generation, after image embedding
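
The three conditions in these notes (text-only, image with default settings, image on CPU only) can be expressed as request bodies for side-by-side testing. This is a sketch, not the reporter's setup: it uses the per-request `num_gpu` option (the number of layers offloaded to the GPU) as a stand-in for the `OLLAMA_NUM_GPU=0` setting mentioned above, and whether the two behave identically is an assumption. `build_variants` is a hypothetical helper name.

```python
import base64
from typing import Dict


def build_variants(model: str, prompt: str, image_bytes: bytes) -> Dict[str, dict]:
    """Return /api/generate bodies for the three conditions described in the notes."""
    image_b64 = base64.b64encode(image_bytes).decode("ascii")
    base = {"model": model, "prompt": prompt, "stream": False}
    return {
        # No image: reported to work.
        "text_only": dict(base),
        # Image with default GPU offload: reported to crash.
        "image_default": {**base, "images": [image_b64]},
        # Image with no layers offloaded (assumed CPU-only equivalent): reported to crash.
        "image_cpu_only": {**base, "images": [image_b64], "options": {"num_gpu": 0}},
    }
```

Sending each body in turn and noting which ones fail gives a quick check of whether the image path, rather than GPU offload, is the common factor.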
