ollama - 💡(How to fix) Fix Apple M5 backend/runtime compatibility issue in Ollama [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
ollama/ollama#15643Fetched 2026-04-18 05:52:06
View on GitHub
Comments
0
Participants
1
Timeline
2
Reactions
0
Author
Participants
Timeline (top)
renamed ×1subscribed ×1

On Apple M5, Ollama fails to run qwen2.5-coder:14b due to a Metal backend initialization crash. The server starts and /api/tags works, but /api/generate fails when loading the model.

Error Message

Returns server error:

  • {"error":"llama runner process has terminated: %!w(<nil>)"}
  • ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3
  • ggml_metal_init: error: failed to initialize the Metal library

Root Cause

On Apple M5, Ollama fails to run qwen2.5-coder:14b due to a Metal backend initialization crash. The server starts and /api/tags works, but /api/generate fails when loading the model.

RAW_BUFFERClick to expand / collapse

Summary

On Apple M5, Ollama fails to run qwen2.5-coder:14b due to a Metal backend initialization crash. The server starts and /api/tags works, but /api/generate fails when loading the model.

Environment

  • Ollama: 0.20.7 (also reproduced with 0.21.0 binary from cached updater bundle)
  • OS: macOS 15.2 (Darwin 25.2.0)
  • Hardware: Apple M5
  • RAM: 32 GB
  • Model: qwen2.5-coder:14b

Repro Steps

  1. Ensure no existing instance: pkill -f "Ollama|ollama" || true
  2. Start server: OLLAMA_DEBUG=1 ollama serve > /tmp/ollama-m5.log 2>&1 &
  3. Confirm server readiness: curl -s http://127.0.0.1:11434/api/tags
  4. Trigger generation: curl -s http://127.0.0.1:11434/api/generate -d '{"model":"qwen2.5-coder:14b","prompt":"hello","stream":false}'

Expected

Model should load and return a normal response.

Actual

Returns server error:

  • {"error":"llama runner process has terminated: %!w(<nil>)"}

Key Log Evidence

  • ggml_metal_init: picking default device: Apple M5
  • ggml_metal_init: the device does not have a precompiled Metal library - this is unexpected
  • ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3
  • static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
  • ggml_metal_init: error: failed to initialize the Metal library
  • llama_init_from_model: failed to initialize the context: failed to initialize Metal backend
  • panic: unable to create llama context
  • llama runner terminated (exit status 2)

Additional Notes

  • OLLAMA_LLM_LIBRARY=cpu was also tested, but the run path still attempts Metal initialization and fails with the same signature.
  • This appears to be a backend compatibility/regression issue on Apple M5 Metal path.

Control Tests

  • Same model works outside Ollama:
    • Using llama.cpp llama-server with the exact same GGUF blob from ~/.ollama/models/blobs/... serves requests successfully on /v1/chat/completions.
    • This indicates the model artifact itself is valid.
  • Different model also fails in Ollama:
    • tinyllama:latest was tested in Ollama and failed with the same llama runner process has terminated behavior.
    • This suggests the issue is runtime/backend-wide in Ollama on this machine, not specific to qwen2.5-coder:14b.

extent analysis

TL;DR

The most likely fix is to disable Metal backend initialization in Ollama by forcing it to use the CPU library.

Guidance

  • Verify that the issue is indeed related to the Metal backend by checking the log evidence, which indicates a failure to initialize the Metal library.
  • Try setting the OLLAMA_LLM_LIBRARY environment variable to cpu before starting the Ollama server to force it to use the CPU library instead of Metal.
  • If the issue persists, investigate the compatibility of the Metal backend with the Apple M5 hardware and macOS 15.2.
  • Test with a different model to confirm that the issue is not specific to the qwen2.5-coder:14b model.

Example

No code snippet is provided as the issue seems to be related to configuration and compatibility rather than code.

Notes

The issue appears to be a backend compatibility issue on Apple M5 Metal path, and the provided control tests suggest that the issue is not specific to the model being used.

Recommendation

Apply workaround: Set OLLAMA_LLM_LIBRARY to cpu to force Ollama to use the CPU library instead of Metal, as this has been shown to work in control tests with the llama.cpp server.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING