ollama: llava:7b / llava-llama3 crash on Windows: "Assertion failed: found, file llama-sampling.cpp, line 660"

Description

All llava-based vision models (llava:7b, llava-phi3, llava-llama3) either crash or produce garbage output when processing images on Windows with CUDA driver 13.0. The moondream model works fine on the same system.

Environment

  • OS: Windows 11 Pro 10.0.26200
  • GPU: NVIDIA GeForce RTX 2060 (6 GB VRAM)
  • CUDA Driver: 13.0 (Driver 581.57)
  • Ollama versions tested: 0.22.1 and 0.24.0 (same crash on both)

Steps to reproduce

  1. ollama pull llava:7b
  2. Send any image via /api/generate
  3. Model runner crashes
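
The reproduction steps can be scripted roughly as follows. This is a minimal sketch, not the reporter's exact request: it assumes Ollama is listening on its default local address (http://localhost:11434), and the image path `test.jpg`, the prompt text, and the helper names `build_payload` and `send` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: default local Ollama endpoint; adjust host/port if yours differs.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, model: str = "llava:7b",
                  prompt: str = "Describe this image.") -> dict:
    """Build a non-streaming /api/generate request carrying one base64 image."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def send(payload: dict, url: str = OLLAMA_URL, timeout: float = 300.0) -> dict:
    """POST the payload as JSON and return the decoded JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    with open("test.jpg", "rb") as f:  # hypothetical image path
        reply = send(build_payload(f.read()))
    print(reply.get("response"))
```

If the model runner dies while handling the request, the call will most likely raise an HTTP or connection error rather than return a reply, so the server log is still the place to look for the assertion message.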

Server log output

image_tokens->nx = 576
image_tokens->ny = 1
batch_f32 size = 1
Assertion failed: found, file llama-sampling.cpp, line 660
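
When comparing several runs, it can help to pull the assertion details out of the server log automatically. The sketch below assumes the log line has exactly the shape shown above ("Assertion failed: <expr>, file <file>, line <n>"); the function name `find_assertions` is hypothetical.

```python
import re

# Matches lines such as:
#   Assertion failed: found, file llama-sampling.cpp, line 660
ASSERT_RE = re.compile(
    r"Assertion failed: (?P<expr>.+?), file (?P<file>.+?), line (?P<line>\d+)"
)


def find_assertions(log_text: str) -> list:
    """Return (expression, file, line) tuples for each assertion failure found."""
    return [
        (m["expr"], m["file"], int(m["line"]))
        for m in ASSERT_RE.finditer(log_text)
    ]


if __name__ == "__main__":
    sample = (
        "image_tokens->nx = 576\n"
        "image_tokens->ny = 1\n"
        "batch_f32 size = 1\n"
        "Assertion failed: found, file llama-sampling.cpp, line 660\n"
    )
    print(find_assertions(sample))
```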

Models tested

| Model | Result |
| --- | --- |
| llava:7b | Assertion crash (llama-sampling.cpp:660) |
| llava-phi3 | Runner unexpectedly stopped |
| llava-llama3 | No crash, but garbage output instead of text |
| moondream | ✅ Works perfectly |

Additional notes

  • The crash occurs both on the GPU and with OLLAMA_NUM_GPU=0 (CPU-only), so it is not purely a CUDA/GPU issue
  • Text-only inference with llava:7b (no image) works fine
  • Image tokens are processed correctly (image_tokens->nx = 576); the crash happens during text generation, after the image has been embedded
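
To narrow the failure down, the observations above can be turned into a small test matrix: every model, with and without an image, with and without GPU offload. This is only a sketch under stated assumptions. It uses the per-request `num_gpu` option of /api/generate to disable GPU offload, which is a different mechanism from the OLLAMA_NUM_GPU setting mentioned in the report and is assumed here to have a comparable effect. The function name `build_matrix` and the prompt are hypothetical.

```python
import itertools

# Models from the report's results table.
MODELS = ["llava:7b", "llava-phi3", "llava-llama3", "moondream"]


def build_matrix(image_b64: str, prompt: str = "Describe this image.") -> list:
    """Build one /api/generate payload per (model, with_image, cpu_only) case."""
    cases = []
    for model, with_image, cpu_only in itertools.product(
        MODELS, (True, False), (True, False)
    ):
        payload = {"model": model, "prompt": prompt, "stream": False}
        if with_image:
            payload["images"] = [image_b64]
        if cpu_only:
            # Assumption: num_gpu=0 keeps all layers on the CPU for this request.
            payload["options"] = {"num_gpu": 0}
        cases.append(payload)
    return cases


if __name__ == "__main__":
    for case in build_matrix("<base64 image data>"):
        label = (
            case["model"],
            "image" if "images" in case else "text-only",
            "cpu" if "options" in case else "default",
        )
        print(label)
```

Each payload can then be posted to /api/generate and the outcome (reply, garbage text, or runner crash) recorded per case, which should show whether the failure tracks the image input rather than the device.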
