ollama - 💡(How to fix) Fix Imported Qwen3-VL-8B GGUF + mmproj registers as vision-capable but crashes on first image request on Apple Silicon (exit status 2)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

  • post predict error="Post \"http://127.0.0.1:<port>/completion\": EOF"
  • llama runner terminated" error="exit status 2" HTTP/1.1 500 Internal Server Error {"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"} HTTP/1.1 500 Internal Server Error {"error":{"message":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details","type":"api_error","param":null,"code":null}} time=2026-05-22T10:36:54.042-04:00 level=ERROR source=server.go:1654 msg="post predict" error="Post "http://127.0.0.1:64595/completion\": EOF" time=2026-05-22T10:36:54.042-04:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2" time=2026-05-22T10:37:13.646-04:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2" time=2026-05-22T10:37:13.646-04:00 level=ERROR source=server.go:1654 msg="post predict" error="Post "http://127.0.0.1:64608/completion\": EOF"

Code Example

FROM /path/to/Qwen_Qwen3-VL-8B-Instruct-Q8_0.gguf
FROM /path/to/mmproj-Qwen_Qwen3-VL-8B-Instruct-f16.gguf

PARAMETER num_ctx 32768
PARAMETER num_gpu 99
PARAMETER temperature 0.7
PARAMETER top_p 0.9

---

ollama create qwen3-vl-8b-instruct -f Modelfile

---

ollama show qwen3-vl-8b-instruct:latest

---

Model
  architecture        qwen3vl
  parameters          8.2B
  context length      262144
  embedding length    4096
  quantization        Q8_0

Capabilities
  completion
  vision

Projector
  architecture        clip
  parameters          576.39M
  embedding length    1152
  dimensions          4096

---

env OLLAMA_HOST=127.0.0.1:11434 /Applications/Ollama.app/Contents/Resources/ollama serve

---

curl http://127.0.0.1:11434/v1/models

---

IMG=$(base64 < test.png | tr -d '\n')
curl -sS -D - http://127.0.0.1:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"qwen3-vl-8b-instruct:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"Describe this image briefly.\",\"images\":[\"$IMG\"]}],\"stream\":false}"

---

HTTP/1.1 500 Internal Server Error
{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}

---

IMG=$(base64 < test.png | tr -d '\n')
curl -sS -D - http://127.0.0.1:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"qwen3-vl-8b-instruct:latest\",\"messages\":[{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe this image briefly.\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"data:image/png;base64,$IMG\"}}]}],\"stream\":false}"

---

HTTP/1.1 500 Internal Server Error
{"error":{"message":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details","type":"api_error","param":null,"code":null}}

---

time=2026-05-22T10:36:54.042-04:00 level=ERROR source=server.go:1654 msg="post predict" error="Post \"http://127.0.0.1:64595/completion\": EOF"
[GIN] 2026/05/22 - 10:36:54 | 500 |  4.553328875s | 127.0.0.1 | POST "/api/chat"
time=2026-05-22T10:36:54.042-04:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2"

---

time=2026-05-22T10:37:13.646-04:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2"
time=2026-05-22T10:37:13.646-04:00 level=ERROR source=server.go:1654 msg="post predict" error="Post \"http://127.0.0.1:64608/completion\": EOF"
[GIN] 2026/05/22 - 10:37:13 | 500 | 4.599645s | 127.0.0.1 | POST "/v1/chat/completions"
RAW_BUFFERClick to expand / collapse

What is the issue?

On Apple Silicon / Metal, an imported Qwen3-VL-8B-Instruct GGUF + mmproj model registers successfully, is reported as vision-capable by ollama show, but the runner crashes on the first real image request.

I reproduced the crash through both API surfaces:

  • POST /api/chat
  • POST /v1/chat/completions

Both return HTTP 500, and the server logs show:

  • post predict error="Post \"http://127.0.0.1:<port>/completion\": EOF"
  • llama runner terminated" error="exit status 2"

This does not look like a bad import or a Merlin integration bug:

  • ollama create succeeds
  • /v1/models lists the model
  • ollama show qwen3-vl-8b-instruct:latest reports:
    • architecture qwen3vl
    • capability vision
    • projector architecture clip
  • the same Ollama instance serves a separate text-only qwen3-coder-30b-a3b-instruct:latest model successfully

Environment

  • Ollama client/server version: 0.24.0
  • OS: macOS 26.5 (25F71)
  • Hardware: Apple Silicon M4 Max
  • Runtime: Metal

Model import

Modelfile used for import:

FROM /path/to/Qwen_Qwen3-VL-8B-Instruct-Q8_0.gguf
FROM /path/to/mmproj-Qwen_Qwen3-VL-8B-Instruct-f16.gguf

PARAMETER num_ctx 32768
PARAMETER num_gpu 99
PARAMETER temperature 0.7
PARAMETER top_p 0.9

Create command:

ollama create qwen3-vl-8b-instruct -f Modelfile

After import:

ollama show qwen3-vl-8b-instruct:latest

reported:

Model
  architecture        qwen3vl
  parameters          8.2B
  context length      262144
  embedding length    4096
  quantization        Q8_0

Capabilities
  completion
  vision

Projector
  architecture        clip
  parameters          576.39M
  embedding length    1152
  dimensions          4096

Reproduction

  1. Start Ollama:
env OLLAMA_HOST=127.0.0.1:11434 /Applications/Ollama.app/Contents/Resources/ollama serve
  1. Confirm model is listed:
curl http://127.0.0.1:11434/v1/models
  1. Send a native vision request:
IMG=$(base64 < test.png | tr -d '\n')
curl -sS -D - http://127.0.0.1:11434/api/chat \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"qwen3-vl-8b-instruct:latest\",\"messages\":[{\"role\":\"user\",\"content\":\"Describe this image briefly.\",\"images\":[\"$IMG\"]}],\"stream\":false}"

Observed response:

HTTP/1.1 500 Internal Server Error
{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}
  1. Send the OpenAI-compatible vision request:
IMG=$(base64 < test.png | tr -d '\n')
curl -sS -D - http://127.0.0.1:11434/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d "{\"model\":\"qwen3-vl-8b-instruct:latest\",\"messages\":[{\"role\":\"user\",\"content\":[{\"type\":\"text\",\"text\":\"Describe this image briefly.\"},{\"type\":\"image_url\",\"image_url\":{\"url\":\"data:image/png;base64,$IMG\"}}]}],\"stream\":false}"

Observed response:

HTTP/1.1 500 Internal Server Error
{"error":{"message":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details","type":"api_error","param":null,"code":null}}

Relevant log output

Server log excerpts from the two failing requests:

time=2026-05-22T10:36:54.042-04:00 level=ERROR source=server.go:1654 msg="post predict" error="Post \"http://127.0.0.1:64595/completion\": EOF"
[GIN] 2026/05/22 - 10:36:54 | 500 |  4.553328875s | 127.0.0.1 | POST "/api/chat"
time=2026-05-22T10:36:54.042-04:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2"
time=2026-05-22T10:37:13.646-04:00 level=ERROR source=server.go:316 msg="llama runner terminated" error="exit status 2"
time=2026-05-22T10:37:13.646-04:00 level=ERROR source=server.go:1654 msg="post predict" error="Post \"http://127.0.0.1:64608/completion\": EOF"
[GIN] 2026/05/22 - 10:37:13 | 500 | 4.599645s | 127.0.0.1 | POST "/v1/chat/completions"

The runner also emitted a fatal native crash dump immediately before those lines. I can add the full dump if useful, but the key observable is that both API surfaces trigger the same runner termination as soon as an image is actually processed.

Expected behavior

If the imported model is accepted, listed, and advertised as vision, then real image requests should execute successfully.

If this import shape is not actually supported, ollama create or model load should fail earlier and clearly instead of advertising a working vision model and then crashing on first use.

Related issues

This looks related to existing Qwen3-VL crash reports, but I did not find an exact Apple Silicon / Metal report for:

  • imported Qwen3-VL-8B-Instruct GGUF + mmproj
  • successful registration + vision capability detection
  • crash on first real image request through both /api/chat and /v1/chat/completions

Possibly related:

  • #13150
  • #13113
  • #15898

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

If the imported model is accepted, listed, and advertised as vision, then real image requests should execute successfully.

If this import shape is not actually supported, ollama create or model load should fail earlier and clearly instead of advertising a working vision model and then crashing on first use.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

ollama - 💡(How to fix) Fix Imported Qwen3-VL-8B GGUF + mmproj registers as vision-capable but crashes on first image request on Apple Silicon (exit status 2)