ollama - 💡 How to fix qwen3.5:9b model runner crashes on Apple M4 16GB: "model runner has unexpectedly stopped" [1 comment, 2 participants]

GitHub stats
ollama/ollama#14748 (fetched 2026-04-08 00:32:10)
Comments: 1 · Participants: 2 · Timeline events: 5 · Reactions: 0
Timeline (top): labeled ×2, closed ×1, commented ×1, subscribed ×1

The qwen3.5:9b model (and the custom modelfile qwen-vialetto-fix based on qwen2.5vl:7b) crashes immediately when loaded on an Apple M4 with 16 GB of unified memory, with the error:

"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"

Error Message

{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}

Root Cause

Not identified in the issue. The server log records only the Metal device detection (11.8 GiB available) and the default context length of 4096, with no attempt to load weights, which suggests the runner exits before loading starts.

Fix Action

Workaround

Using qwen2.5vl:7b instead as a vision model works correctly.
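The workaround can be scripted as a fallback: try the preferred model first and switch to qwen2.5vl:7b if the server replies with an error. This is a minimal sketch, assuming send is a hypothetical caller-supplied function that POSTs a request body to /api/generate and returns the decoded JSON reply; the model order is also an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical sender: takes a request body, returns the decoded JSON reply.
Sender = Callable[[dict], dict]


def generate_with_fallback(
    send: Sender,
    prompt: str,
    models: Sequence[str] = ("qwen3.5:9b", "qwen2.5vl:7b"),
) -> Tuple[Optional[str], dict]:
    """Try each model in order and return (model, reply) for the first reply
    without an "error" field; otherwise return (None, last reply)."""
    reply: dict = {}
    for model in models:
        reply = send({"model": model, "prompt": prompt, "stream": False})
        if "error" not in reply:
            return model, reply
    return None, reply
```

Keeping the sender as a parameter lets the same logic run against the live server or a stub.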


What is the issue?

Description

The qwen3.5:9b model (and the custom modelfile qwen-vialetto-fix based on qwen2.5vl:7b) crashes immediately when loaded on an Apple M4 with 16 GB of unified memory, with the error:

"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"

Environment

  • Hardware: Apple Mac mini M4
  • Unified Memory: 16 GB
  • OS: macOS (latest)
  • Ollama version: latest (started via Ollama.app)
  • GPU: Apple Metal, 11.8 GiB available VRAM (as reported by Ollama)

Steps to Reproduce

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "qwen-vialetto-fix",
  "prompt": "ciao",
  "keep_alive": "24h"
}'
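The same request can be sent from Python with only the standard library. This is a minimal sketch; the function names are hypothetical, and "stream": false is an addition not present in the curl command, so that the server returns a single JSON object instead of a stream of chunks.

```python
import json
import urllib.error
import urllib.request

# Default local endpoint, as in the curl command above.
GENERATE_URL = "http://localhost:11434/api/generate"


def build_body(model: str, prompt: str, keep_alive: str = "24h") -> bytes:
    """Encode the curl example's JSON body, plus "stream": False (an addition)."""
    payload = {"model": model, "prompt": prompt,
               "keep_alive": keep_alive, "stream": False}
    return json.dumps(payload).encode("utf-8")


def is_runner_crash(reply: dict) -> bool:
    """True if the reply carries the 'model runner has unexpectedly stopped' error."""
    return "model runner has unexpectedly stopped" in reply.get("error", "")


def generate(model: str, prompt: str) -> dict:
    """POST to /api/generate and return the decoded JSON, error replies included."""
    request = urllib.request.Request(
        GENERATE_URL,
        data=build_body(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())
    except urllib.error.HTTPError as err:
        # Assumes error replies arrive with a non-2xx status and a JSON body.
        return json.loads(err.read())
```

Usage: is_runner_crash(generate("qwen-vialetto-fix", "ciao")) is True when the server replies with the error shown under Actual Behavior.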

Expected Behavior

Model loads and responds normally.

Actual Behavior

{"error":"model runner has unexpectedly stopped, this may be due to resource limitations
or an internal error, check ollama server logs for details"}

Server Log

time=2026-03-09T22:49:55.794+01:00 level=INFO source=types.go:42
msg="inference compute" id=0 library=Metal name=Metal description="Apple M4"
total="11.8 GiB" available="11.8 GiB"
time=2026-03-09T22:49:55.794+01:00 level=INFO source=routes.go:1763
msg="vram-based default context" total_vram="11.8 GiB" default_num_ctx=4096

No further log entries appear when the crash occurs; the runner stops silently.
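The two log entries above are logfmt-style key=value lines. This minimal sketch parses them (including the wrapped lines as pasted here) to read the available memory and the default context length. It assumes every entry starts with a time= field, and the function names are hypothetical.

```python
import re
from typing import Dict, List

# key=value where the value is either "double quoted" or a bare token.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_entries(text: str) -> List[Dict[str, str]]:
    """Split logfmt-style text into entries, starting a new one at each time= key."""
    entries: List[Dict[str, str]] = []
    for key, raw in PAIR.findall(text):
        value = raw[1:-1] if raw.startswith('"') else raw
        if key == "time":
            entries.append({})
        if entries:  # ignore fields that appear before the first time=
            entries[-1][key] = value
    return entries


def memory_summary(text: str) -> Dict[str, float]:
    """Pull available GPU memory (GiB) and default_num_ctx from the startup entries."""
    summary: Dict[str, float] = {}
    for entry in parse_entries(text):
        if entry.get("msg") == "inference compute":
            summary["available_gib"] = float(entry["available"].split()[0])
        elif entry.get("msg") == "vram-based default context":
            summary["default_num_ctx"] = int(entry["default_num_ctx"])
    return summary
```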

Additional Notes

  • qwen2.5:7b (text only) loads and runs correctly on the same machine ✅
  • qwen2.5vl:7b (vision) also loads and runs correctly ✅
  • The crash happens specifically with qwen3.5:9b family models
  • The model size is 8.6 GB, which should fit within the available 11.8 GiB VRAM
  • No other models are loaded simultaneously when the crash occurs (ollama ps is empty)
  • The server log does not show any loading attempt before the crash, suggesting the runner exits before even starting to load weights
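The note that 8.6 GB "should fit" in 11.8 GiB compares different units and counts only the model file. A rough budget, as a sketch: convert GB (10^9 bytes) to GiB (2^30 bytes), then add a KV-cache estimate for standard attention (2 × layers × context length × KV heads × head size × bytes per element). The layer and head counts below are hypothetical placeholders, not qwen3.5:9b's real configuration, and compute buffers are left as a caller-supplied allowance.

```python
GIB = 2**30  # bytes in one GiB


def gb_to_gib(gb: float) -> float:
    """Convert decimal gigabytes (10**9 bytes) to GiB (2**30 bytes)."""
    return gb * 1e9 / GIB


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """K and V tensors for every layer and context position (f16 = 2 bytes)."""
    return 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem


# HYPOTHETICAL architecture values, for illustration only.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128


def estimated_gib(weights_gb: float, n_ctx: int, overhead_gib: float = 0.0) -> float:
    """Model file + KV cache + an allowance for compute buffers, in GiB."""
    kv = kv_cache_bytes(N_LAYERS, n_ctx, N_KV_HEADS, HEAD_DIM) / GIB
    return gb_to_gib(weights_gb) + kv + overhead_gib
```

With these placeholder values, the 8.6 GB file comes to about 8.0 GiB, the KV cache at the default num_ctx of 4096 adds 0.5 GiB, and halving num_ctx to 2048 saves 0.25 GiB.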

Workaround

Using qwen2.5vl:7b instead as a vision model works correctly.


Extent Analysis

Fix Plan

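The plan treats the crash as a resource limitation and reduces the memory the model runner needs, so that the weights, KV cache, and compute buffers fit in the 11.8 GiB that Ollama reports as available to Metal. The 8.6 GB model file is only part of that total; the KV cache grows with the context length, so a larger context needs more memory, not less.

  • Lower the server's default context length with the OLLAMA_CONTEXT_LENGTH environment variable so that the KV cache shrinks.
  • Lower num_ctx (context length) and num_batch (prompt-processing batch size) per request through the "options" object of /api/generate.
  • Keep other models unloaded (check with ollama ps) and close memory-heavy apps, since unified memory is shared with macOS.

Example code snippet that starts the server with a smaller default context (quit Ollama.app first so that port 11434 is free):

import os
import subprocess
env = dict(os.environ, OLLAMA_CONTEXT_LENGTH="2048")  # smaller default context
subprocess.Popen(["ollama", "serve"], env=env)

Example configuration update for Ollama.app (restart the app afterwards):

launchctl setenv OLLAMA_CONTEXT_LENGTH 2048

Example per-request override with a smaller context and batch:

curl http://localhost:11434/api/generate -d '{"model": "qwen3.5:9b", "prompt": "ciao", "options": {"num_ctx": 2048, "num_batch": 128}}'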
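A client can also search for a context length that loads by retrying with successively smaller num_ctx values, passed in the request's "options" object. This is a minimal sketch: send is a hypothetical caller-supplied function that POSTs the body to /api/generate and returns the decoded JSON, and the candidate values are assumptions.

```python
from typing import Callable, Iterable, Optional

# Hypothetical sender: takes a request body, returns the decoded JSON reply.
Sender = Callable[[dict], dict]


def request_body(model: str, prompt: str, num_ctx: int,
                 num_batch: Optional[int] = None) -> dict:
    """Build a non-streaming /api/generate body with per-request options."""
    options = {"num_ctx": num_ctx}
    if num_batch is not None:
        options["num_batch"] = num_batch
    return {"model": model, "prompt": prompt, "stream": False, "options": options}


def find_working_ctx(send: Sender, model: str, prompt: str,
                     candidates: Iterable[int] = (4096, 2048, 1024)) -> Optional[int]:
    """Return the first num_ctx whose reply has no "error" field, else None."""
    for num_ctx in candidates:
        reply = send(request_body(model, prompt, num_ctx))
        if "error" not in reply:
            return num_ctx
    return None
```

Because the issue's log shows no weight-loading attempt, this search only helps if memory is in fact the limiting factor.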

Verification

To verify that the fix worked, load the qwen3.5:9b model and check for any errors.

  • Run the curl command to load the model and generate text.
  • Check the server log (~/.ollama/logs/server.log on macOS) for error messages.
  • Verify that the model responds normally and generates text correctly.
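Besides the generate call, the server's /api/ps endpoint (the data behind ollama ps) lists loaded models with their total size and the part held in GPU memory (the size and size_vram fields). This is a minimal sketch that fetches and summarizes it; the function names are hypothetical.

```python
import json
import urllib.request
from typing import List, Tuple

GIB = 2**30


def fetch_ps(base_url: str = "http://localhost:11434") -> dict:
    """GET /api/ps and return the decoded JSON."""
    with urllib.request.urlopen(base_url + "/api/ps") as response:
        return json.loads(response.read())


def summarize_ps(ps: dict) -> List[Tuple[str, float, float]]:
    """Return (name, size in GiB, fraction held in GPU memory) per loaded model."""
    rows = []
    for model in ps.get("models", []):
        size = model.get("size", 0)
        in_gpu = model.get("size_vram", 0)
        rows.append((model.get("name", "?"), round(size / GIB, 2),
                     in_gpu / size if size else 0.0))
    return rows
```

A fraction of 1.0 means the model sits entirely in GPU memory; an empty list right after a request with keep_alive set suggests the runner did not stay loaded.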

Example verification command:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "qwen-vialetto-fix",
  "prompt": "ciao",
  "keep_alive": "24h"
}'

Extra Tips

  • Monitor the server log and lower num_ctx (or OLLAMA_CONTEXT_LENGTH) if loading fails for lack of memory.
  • Update Ollama to the latest release in case the crash is a known issue that has already been fixed.
  • Test the model with different num_batch and num_ctx values to find settings that load reliably and perform well.
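For monitoring, the macOS app writes the server log to ~/.ollama/logs/server.log. This minimal sketch reads the tail of that file and keeps WARN and ERROR entries; the level filter and function names are assumptions, and runner output that lacks a level= field will not be matched.

```python
from pathlib import Path
from typing import Iterable, List, Sequence

# Server log written by the Ollama macOS app.
LOG_PATH = Path.home() / ".ollama" / "logs" / "server.log"


def problem_lines(lines: Iterable[str],
                  levels: Sequence[str] = ("WARN", "ERROR")) -> List[str]:
    """Keep lines containing a level=<LEVEL> field for one of `levels`."""
    wanted = {f"level={level}" for level in levels}
    return [line.rstrip("\n") for line in lines if wanted & set(line.split())]


def tail_problems(path: Path = LOG_PATH, last_n: int = 500) -> List[str]:
    """Scan the last `last_n` lines of the log for WARN/ERROR entries."""
    lines = path.read_text(encoding="utf-8", errors="replace").splitlines()
    return problem_lines(lines[-last_n:])
```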
