ollama - Fix Metal backend crash on Apple M5 - bfloat/half type mismatch in MPPTensorOpsMatMul2d (v0.21.2)

GitHub stats
ollama/ollama#15813 (fetched 2026-04-26 05:06:04)
Comments: 0
Participants: 1
Timeline: 0
Reactions: 1

Ollama fails to run any model on Apple M5 hardware. The Metal shader compilation fails with a bfloat/half type mismatch in MetalPerformancePrimitives.

Error Message

From ~/.ollama/logs/server.log:

ggml_metal_init: the device does not have a precompiled Metal library - this is unexpected
ggml_metal_init: will try to compile it on the fly
ggml_metal_library_init: error: Error Domain=MTLLibraryErrorDomain Code=3 static_assert failed due to requirement '__tensor_ops_detail::__is_same_v<bfloat, half>' "Input types must match cooperative tensor types"
ggml_metal_init: error: failed to initialize the Metal library
ggml_backend_metal_device_init: error: failed to allocate context
llama_init_from_model: failed to initialize the context: failed to initialize Metal backend
panic: unable to create llama context

The server reports the GPU as MTLGPUFamilyApple10 (1010), which is M5-specific. Ollama's embedded Metal backend has no precompiled library for this GPU family, and on-the-fly compilation fails due to a bfloat16/half type mismatch in the MetalPerformancePrimitives framework.
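To check whether a given log shows this specific failure, a small shell helper can search for the two signature strings quoted in the error message. This is a minimal sketch: the function name has_metal_bf16_failure is made up for illustration, and the default path is simply the log location mentioned in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ollama): succeeds (exit 0) when the log
# contains both signature strings from the error quoted in this report.
# Returns 1 if the signature is absent and 2 if the log cannot be read.
has_metal_bf16_failure() {
    log_file="${1:-$HOME/.ollama/logs/server.log}"
    [ -r "$log_file" ] || return 2
    # -F treats the patterns as fixed strings, so '<' and '>' are literal.
    grep -qF "MTLLibraryErrorDomain" "$log_file" &&
        grep -qF "__is_same_v<bfloat, half>" "$log_file"
}
```

Calling has_metal_bf16_failure with no argument checks the default log; passing a path checks a saved copy instead.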

Root Cause

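No precompiled Metal library is available for MTLGPUFamilyApple10 (M5), so the Metal backend compiles its shaders on the fly. That compilation fails a static_assert in MetalPerformancePrimitives ("Input types must match cooperative tensor types") because bfloat and half types are mixed, so no model can load.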

Fix Action

Workaround

Setting GGML_METAL_BF16_DISABLE=1 before starting the server allows models to load and run (CPU fallback, no GPU acceleration):

GGML_METAL_BF16_DISABLE=1 ollama serve
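The command above only affects a server started from that shell. When Ollama runs as the macOS application instead, Ollama's FAQ describes setting environment variables with launchctl and then restarting the app. The sketch below applies that mechanism to this variable; whether the app passes GGML_METAL_BF16_DISABLE through to the runner is an assumption that has not been confirmed in the issue, and the function names are made up for illustration.

```shell
#!/bin/sh
# LAUNCHCTL can be overridden (for example LAUNCHCTL=echo) to preview the
# command without changing anything.
LAUNCHCTL="${LAUNCHCTL:-launchctl}"

# Set the variable for GUI-launched processes; restart the Ollama app afterwards.
enable_bf16_workaround() {
    "$LAUNCHCTL" setenv GGML_METAL_BF16_DISABLE 1
}

# Remove the variable again once a fixed release is installed.
disable_bf16_workaround() {
    "$LAUNCHCTL" unsetenv GGML_METAL_BF16_DISABLE
}
```

Run enable_bf16_workaround once, quit and reopen the Ollama app, and use disable_bf16_workaround to undo the change.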

Environment

  • Ollama version: 0.21.2
  • macOS: Sequoia 15.3+ (Apple M5)
  • GPU: Apple M5 (MTLGPUFamilyApple10)
  • VRAM: ~19GB available

Impact

All models fail with:

500 Internal Server Error: llama runner process has terminated

OLLAMA_NO_METAL=1 is silently ignored in v0.21.2.

Expected Behavior

Ollama should either:

  1. Ship a precompiled Metal library for MTLGPUFamilyApple10 (M5), or
  2. Correctly fall back to CPU when Metal compilation fails, or
  3. Honor OLLAMA_NO_METAL=1 to explicitly disable Metal

Extent Analysis

TL;DR

Setting the environment variable GGML_METAL_BF16_DISABLE=1 before starting the Ollama server allows models to load and run, albeit with CPU fallback and no GPU acceleration.

Guidance

  • The error is caused by a bfloat16/half type mismatch in the MetalPerformancePrimitives framework, which prevents Metal shader compilation on Apple M5 hardware.
  • To verify the workaround, set GGML_METAL_BF16_DISABLE=1 and restart the Ollama server, then check if models can be loaded and run successfully.
  • The workaround can be applied by running the command GGML_METAL_BF16_DISABLE=1 ollama serve to start the server with CPU fallback.
  • Note that this workaround disables GPU acceleration, so performance may be impacted.
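The verification step can be sketched with `ollama ps`, which lists loaded models along with a PROCESSOR column. The helper below reads that output on stdin and reports whether a model is running entirely on the CPU. It assumes the column prints "100% CPU" as in the examples in Ollama's FAQ, which may differ between versions; the function name and the MODEL placeholder are illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: reads `ollama ps` output on stdin and succeeds if any
# loaded model is reported as running entirely on the CPU.
# Assumption: the PROCESSOR column contains the literal text "100% CPU".
runs_on_cpu_only() {
    grep -q "100% CPU"
}

# Usage, in a second terminal after starting the server with the workaround
# (MODEL is a placeholder for a model that is already pulled):
#   ollama run MODEL "hello"
#   ollama ps | runs_on_cpu_only && echo "CPU only" || echo "GPU in use or nothing loaded"
```

If the check reports CPU only, the performance impact noted above applies to that model.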

Example

GGML_METAL_BF16_DISABLE=1 ollama serve

Notes

The provided workaround is a temporary solution until a precompiled Metal library for MTLGPUFamilyApple10 (M5) is available or the Metal compilation fallback issue is resolved.

Recommendation

Apply the workaround by setting GGML_METAL_BF16_DISABLE=1 so that models can run on the CPU fallback. It is the only workaround reported in the issue, since OLLAMA_NO_METAL=1 is ignored in v0.21.2, and it remains necessary until the underlying issue is addressed.
