ollama - 💡(How to fix) Fix qwen3.5:9b model runner crashes on Apple M4 16GB - "model runner has unexpectedly stopped" [1 comment, 2 participants]

ollama · 2026-03-09 22:47:02


GitHub stats

ollama/ollama#14748 • Fetched 2026-04-08 00:32:10


Author

vpoma777

Participants

rick-github

vpoma777

Timeline (top)

labeled ×2 · closed ×1 · commented ×1 · subscribed ×1

The qwen3.5:9b model (and the custom modelfile qwen-vialetto-fix based on qwen2.5vl:7b) crashes immediately when loaded on Apple M4 16GB unified memory with error:

"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"

Error Message

{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}

Root Cause

No root cause is identified in the issue. The reporter observes that the 8.6 GB model should fit within the 11.8 GiB of VRAM that Ollama reports, and that the server log shows no loading attempt before the crash, which suggests the runner exits before it starts loading weights.

Fix Action

Workaround

Using qwen2.5vl:7b instead as a vision model works correctly.


What is the issue?

Description

The qwen3.5:9b model (and the custom modelfile qwen-vialetto-fix based on qwen2.5vl:7b) crashes immediately when loaded on Apple M4 16GB unified memory with error:

"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"

Environment

Hardware: Apple Mac mini M4
Unified Memory: 16 GB
OS: macOS (latest)
Ollama version: latest (started via Ollama.app)
GPU: Apple Metal, 11.8 GiB available VRAM (as reported by Ollama)

Steps to Reproduce

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "qwen-vialetto-fix",
  "prompt": "ciao",
  "keep_alive": "24h"
}'
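The same request can also be sent from a script, which makes it easier to capture the error body when the runner stops. This is a minimal sketch using only the Python standard library. The helper names `build_generate_request` and `send` are illustrative, and `"stream": false` is an addition made here so that the reply arrives as a single JSON object.

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # endpoint from the curl command above


def build_generate_request(model, prompt, keep_alive="24h"):
    """Build the POST request used in the reproduction steps."""
    payload = {
        "model": model,
        "prompt": prompt,
        "keep_alive": keep_alive,
        "stream": False,  # assumption: ask for one JSON object instead of a stream
    }
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=300):
    """Send the request and return the decoded JSON body, including error bodies."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # A failed load is reported as a JSON object with an "error" key.
        return json.loads(err.read().decode("utf-8"))
```

Calling `send(build_generate_request("qwen-vialetto-fix", "ciao"))` against a running server would return either the generated reply or the error object quoted under Actual Behavior.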

Expected Behavior

Model loads and responds normally.

Actual Behavior

{"error":"model runner has unexpectedly stopped, this may be due to resource limitations or an internal error, check ollama server logs for details"}

Server Log

time=2026-03-09T22:49:55.794+01:00 level=INFO source=types.go:42 msg="inference compute" id=0 library=Metal name=Metal description="Apple M4" total="11.8 GiB" available="11.8 GiB"
time=2026-03-09T22:49:55.794+01:00 level=INFO source=routes.go:1763 msg="vram-based default context" total_vram="11.8 GiB" default_num_ctx=4096
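The log entries are key=value pairs with double-quoted values, so the memory figures and the default context length can be pulled out mechanically when comparing runs. The sketch below assumes that format holds for every entry of interest. `parse_log_line` and `gib` are hypothetical helper names, not part of Ollama.

```python
import shlex


def parse_log_line(line):
    """Split a key=value log line into a dict, honouring double-quoted values."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip any token that is not a key=value pair
            fields[key] = value
    return fields


def gib(value):
    """Convert a string such as '11.8 GiB' to a float number of GiB."""
    number, unit = value.split()
    if unit != "GiB":
        raise ValueError(f"unexpected unit: {unit}")
    return float(number)


# Second log entry from the issue, joined onto one line.
line = (
    'time=2026-03-09T22:49:55.794+01:00 level=INFO source=routes.go:1763 '
    'msg="vram-based default context" total_vram="11.8 GiB" default_num_ctx=4096'
)
fields = parse_log_line(line)
print(fields["msg"], gib(fields["total_vram"]), int(fields["default_num_ctx"]))
```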

No further log entries appear when the crash occurs — the runner stops silently.

Additional Notes

qwen2.5:7b (text only) loads and runs correctly on the same machine ✅
qwen2.5vl:7b (vision) also loads and runs correctly ✅
The crash happens specifically with qwen3.5:9b family models
The model size is 8.6 GB, which should fit within the available 11.8 GiB VRAM
No other models are loaded simultaneously when the crash occurs (ollama ps is empty)
The server log does not show any loading attempt before the crash, suggesting the runner exits before even starting to load weights
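The claim that the model should fit can be checked with a little arithmetic. The sketch below computes how much of the reported 11.8 GiB is left once the weights are loaded. Whether "8.6 GB" means decimal gigabytes or GiB is not stated in the issue, so both readings are shown. The remainder still has to hold the context cache and working buffers, so fitting on paper does not by itself rule out a resource limit.

```python
GIB = 2**30  # bytes in one GiB
GB = 10**9   # bytes in one decimal GB


def headroom_gib(model_size, model_unit_bytes, available_gib):
    """Return the GiB left over after the model weights are loaded."""
    model_gib = model_size * model_unit_bytes / GIB
    return available_gib - model_gib


# Figures from the issue: 8.6 GB model, 11.8 GiB reported as available.
as_decimal_gb = headroom_gib(8.6, GB, 11.8)   # if "8.6 GB" means 8.6e9 bytes
as_binary_gib = headroom_gib(8.6, GIB, 11.8)  # if "8.6 GB" really means GiB

print(f"headroom if decimal GB: {as_decimal_gb:.2f} GiB")
print(f"headroom if GiB:        {as_binary_gib:.2f} GiB")
```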

Workaround

Using qwen2.5vl:7b instead as a vision model works correctly.

Relevant log output

OS

No response

GPU

No response

CPU

No response

Ollama version

No response

extent analysis

Fix Plan

The issue does not record a confirmed fix, so the steps below are things to try rather than a verified solution. The error message names two possible causes, resource limitations or an internal runner error, and each step targets one of them.

Update Ollama to the latest release, since a newly released model family may need a newer runner than the one installed.
Lower the context length instead of raising it. num_ctx is the context window in tokens, and a larger value increases memory use.
Enable debug logging and read the server log to see why the runner exits.

Example request with a smaller context window:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "qwen3.5:9b",
  "prompt": "ciao",
  "options": {"num_ctx": 2048}
}'

Example debug logging setup for Ollama.app (restart the app afterwards):

launchctl setenv OLLAMA_DEBUG 1

Example command to read the server log on macOS:

tail -n 100 ~/.ollama/logs/server.log

Verification

To verify that the fix worked, load the qwen3.5:9b model and check for any errors.

Run the curl command to load the model and generate text.
Check the server logs for any error messages.
Verify that the model responds normally and generates text correctly.

Example verification command:

curl -X POST http://localhost:11434/api/generate -d '{
  "model": "qwen-vialetto-fix",
  "prompt": "ciao",
  "keep_alive": "24h"
}'
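The verification steps come down to checking whether the reply carries an `error` key. The following sketch classifies a decoded reply under that assumption. `check_reply` is a hypothetical helper, and the failing body is the one quoted in the issue.

```python
import json


def check_reply(body):
    """Classify a decoded /api/generate reply as ('ok', text) or ('error', message)."""
    if "error" in body:
        return "error", body["error"]
    return "ok", body.get("response", "")


# The failing reply quoted in the issue.
failing = json.loads(
    '{"error":"model runner has unexpectedly stopped, this may be due to '
    'resource limitations or an internal error, check ollama server logs for details"}'
)
status, detail = check_reply(failing)
print(status, "-", detail)
```

A reply that reaches the `ok` branch with non-empty text would indicate that the model loaded and generated output.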

Extra Tips

Monitor the server logs and adjust the context length (num_ctx) as needed to stay within available memory.
Consider updating the ollama version to the latest release to ensure that any known issues are fixed.
Test the model with different num_ctx and num_batch values to find a configuration that loads.
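The tip about adjusting the context length can be tried systematically by stepping the value down until the model loads. The sketch below only builds the request payloads. `num_ctx` is passed per request through the `options` field of the generate API. The starting value of 4096 matches the default in the server log, while the halving schedule and the lower bound of 512 are arbitrary choices for illustration, not values from the issue.

```python
def candidate_payloads(model, prompt, start_ctx=4096, min_ctx=512):
    """Yield request payloads with num_ctx halved each step, largest first."""
    ctx = start_ctx
    while ctx >= min_ctx:
        yield {
            "model": model,
            "prompt": prompt,
            "stream": False,
            "options": {"num_ctx": ctx},  # context window in tokens
        }
        ctx //= 2


for payload in candidate_payloads("qwen3.5:9b", "ciao"):
    print(payload["options"]["num_ctx"])
```

Each payload can be posted to `/api/generate` in turn, stopping at the first one that returns a reply without an `error` key.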
