ollama - 💡(How to fix) Fix M1 Mac: z-image-turbo:fp8 VRAM regression 0.1.8.1→0.1.8.2 & 0.1.8.3 ... (11.9>11.3 GiB) [2 comments, 2 participants]

GitHub stats
ollama/ollama#15110 (fetched 2026-04-08 01:41:19)
Comments: 2 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×3, commented ×2, subscribed ×2

Error Message

Error: 500 Internal Server Error: model requires 11.9 GiB but only 11.3 GiB are available (after 512.0 MiB overhead)

What is the issue?

x/z-image-turbo:fp8 ran stably on v0.1.8.1 (M1 Mac Mini, 16 GB). After the update: "model requires 11.9 GiB but only 11.3 GiB available (512 MiB overhead)", even at 512x512.

Expected behavior

The model loads as it did before (flexible M1 unified memory).

Operating system

macOS Sonoma/Ventura (M1 Mac Mini 16GB unified memory)

GPU

Apple M1 (16GB)

Steps to reproduce

  1. ollama run x/z-image-turbo:fp8 "test" --width 512 --height 512
  2. Error despite the small resolution

Error: 500 Internal Server Error: model requires 11.9 GiB but only 11.3 GiB are available (after 512.0 MiB overhead)

Additional information

  • v0.1.8.1: works
  • After the update: fails


Fix Plan

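The error is a memory check failure: Ollama estimated that x/z-image-turbo:fp8 needs 11.9 GiB, but only 11.3 GiB were available after reserving 512 MiB of overhead. The fix is therefore to make more memory available to the model or to reduce how much memory it needs.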
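The numbers in the error message show how large the gap actually is. Below is a minimal sketch that extracts them; the function name memory_shortfall is hypothetical, and the regular expression assumes the exact message wording quoted in this issue, which may differ in other Ollama versions.

```python
import re

# Assumes the exact wording of the error quoted in this issue.
PATTERN = re.compile(
    r"requires ([\d.]+) GiB but only ([\d.]+) GiB are available "
    r"\(after ([\d.]+) MiB overhead\)"
)


def memory_shortfall(message: str) -> dict[str, float]:
    """Hypothetical helper: pull the memory figures out of the error text."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("message does not match the expected format")
    required, available, overhead_mib = (float(g) for g in match.groups())
    return {
        "required_gib": required,
        "available_gib": available,  # already net of the overhead, per the wording
        "overhead_gib": overhead_mib / 1024,
        "shortfall_gib": round(required - available, 3),
    }


msg = ("Error: 500 Internal Server Error: model requires 11.9 GiB but only "
       "11.3 GiB are available (after 512.0 MiB overhead)")
print(memory_shortfall(msg))
```

For this issue the gap is 0.6 GiB. Because the message reports 11.3 GiB as available after the 512 MiB overhead, the sketch does not subtract the overhead a second time.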

Steps to Fix

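  • Increase the memory limit. The suggestion here is a --memory-limit flag, as shown below, but this flag is not confirmed to exist for ollama run, so check ollama run --help for your version before relying on it. Note also that a limit cannot create memory: the check reports only 11.3 GiB available, so raising a cap to 12 GiB does not by itself satisfy the 11.9 GiB requirement. The suggested invocation was:
ollama run x/z-image-turbo:fp8 "test" --width 512 --height 512 --memory-limit 12GiB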
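  • Alternatively, reduce the model's memory requirement, for example by using a smaller model. Because the issue reports that x/z-image-turbo:fp8 worked on v0.1.8.1, staying on or returning to that version is another workaround until the regression is fixed.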
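  • If Ollama runs inside a Docker container, make sure the container has enough memory by setting the -m/--memory flag. The image passed to docker run must be a Docker image such as ollama/ollama, not an Ollama model name; the model is then run inside the container. Docker on macOS does not expose the Apple GPU to containers, so this mainly applies to Linux hosts. For example:
docker run -d -m 12g -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run x/z-image-turbo:fp8 "test" --width 512 --height 512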
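The sizing arithmetic behind these steps can be checked directly. Assuming the error message means "available = free memory minus 512 MiB overhead" (an interpretation based only on its wording), any memory limit has to cover the 11.9 GiB estimate plus the overhead. A minimal sketch; the function names minimum_limit_gib and limit_is_sufficient are hypothetical:

```python
GIB = 1024 ** 3  # bytes per GiB
MIB = 1024 ** 2  # bytes per MiB


def minimum_limit_gib(required_gib: float, overhead_mib: float) -> float:
    """Smallest limit (GiB) covering the model estimate plus reserved overhead.

    Assumption: the overhead is reserved on top of the model requirement, as
    the wording "available (after 512.0 MiB overhead)" suggests.
    """
    needed_bytes = required_gib * GIB + overhead_mib * MIB
    return round(needed_bytes / GIB, 3)


def limit_is_sufficient(limit_gib: float, required_gib: float,
                        overhead_mib: float) -> bool:
    """True if a memory limit of limit_gib covers requirement plus overhead."""
    return limit_gib >= minimum_limit_gib(required_gib, overhead_mib)


# Values from the error message in this issue
print(minimum_limit_gib(11.9, 512))        # 12.4
print(limit_is_sufficient(12, 11.9, 512))  # False
print(limit_is_sufficient(13, 11.9, 512))  # True
```

Under this reading, a 12 GiB cap falls just short of the roughly 12.4 GiB the check would need, even before any other process uses memory.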

Verification

To verify the fix, rerun the command from the reproduction steps and confirm that the model loads and generates an image without the 500 Internal Server Error about insufficient memory.

Example Code

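The same command can be run from a Python script, for example to detect the memory error in automation. The --memory-limit flag from the first step is left out because it is not a confirmed ollama run option:

import subprocess

# Reproduction command from the issue (flags as reported by the user)
cmd = [
    "ollama", "run", "x/z-image-turbo:fp8", "test",
    "--width", "512", "--height", "512",
]

# Capture output instead of raising, so the memory error can be inspected
result = subprocess.run(cmd, capture_output=True, text=True)
output = result.stdout + result.stderr

if result.returncode != 0 and "GiB are available" in output:
    print("Memory check failed:", output.strip())
else:
    print("Exit code:", result.returncode)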
