ollama - 💡(How to fix) Fix MLX - nvfp4 models on MacOs extremely slow [1 participants]

ollama · 2026-05-06 23:07:59


ollama/ollama#16030•Fetched 2026-05-07 03:31:27


Author

alemata2006



What is the issue?

Up to version 0.20.0, nvfp4 models (such as qwen3.6:27b-nvfp4, qwen3.5:27b-coding-nvfp4, and qwen3.6:35b-a3b-nvfp4) worked fine. A simple prompt took around 2 minutes to produce a result: ollama run MODEL "give me a definition for strategy and for tactics. Provide at least 3 examples to differentiate the concepts."

Ollama 0.20.1: all models took between 1 minute 50 seconds and 2 minutes 25 seconds to answer (responses of 120 to 250 lines of content). Ollama 0.23.1: these models take 45 (presumably seconds) to write a single line.

The other models run the same test in essentially the same time and with the same results as before (no notable changes); only the MLX models seem affected.
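To turn the wall-clock comparison above into a number that can be compared across Ollama versions, generation throughput can be read from the timing fields that Ollama's `/api/generate` endpoint returns (`eval_count`, and `eval_duration` in nanoseconds). The following is a minimal sketch, not something taken from the report: the helper names `tokens_per_second` and `run_prompt` are hypothetical, the default local port 11434 is assumed, and it is assumed (not verified here) that the MLX runner populates the same timing fields as the other runners.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(stats: dict) -> float:
    """Generation throughput from an /api/generate response.

    eval_count is the number of generated tokens; eval_duration is the
    time spent generating them, in nanoseconds.
    """
    return stats["eval_count"] * 1e9 / stats["eval_duration"]


def run_prompt(model: str, prompt: str) -> dict:
    """Send one non-streaming request and return the parsed JSON reply."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example use (requires a running Ollama server and the model pulled):
#   stats = run_prompt("qwen3.5:27b-coding-nvfp4", "give me a definition ...")
#   print(round(tokens_per_second(stats), 2), "tokens/s")
```

Running the same prompt on both versions and comparing the resulting tokens/s figures would separate a real generation slowdown from differences in answer length. `ollama run MODEL --verbose` prints comparable timing statistics from the command line.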

Relevant log output

No visible errors; it seems to work OK, but it is extremely slow.

time=2026-05-06T19:47:02.529-03:00 level=INFO source=client.go:359 msg="starting mlx runner subprocess" model=qwen3.5:27b-coding-nvfp4 port=58467
time=2026-05-06T19:47:02.532-03:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-05-06T19:47:02.587-03:00 level=INFO source=server.go:44 msg="MLX engine initialized" "MLX version"=0.31.2 device=gpu
time=2026-05-06T19:47:02.667-03:00 level=INFO source=base.go:110 msg="Model architecture" arch=Qwen3_5ForConditionalGeneration
time=2026-05-06T19:47:02.966-03:00 level=INFO source=runner.go:159 msg="Loaded tensors from manifest" count=1584
time=2026-05-06T19:47:09.859-03:00 level=INFO source=runner.go:194 msg="Starting HTTP server" host=127.0.0.1 port=58467
time=2026-05-06T19:47:09.980-03:00 level=INFO source=server.go:213 msg=ServeHTTP method=GET path=/v1/status took=10.853875ms status="200 OK"
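The log excerpt above covers only runner startup, so it can show how long loading took but not where generation time goes. As a rough sketch (the helper names `parse` and `gaps` are hypothetical, and the regular expression assumes the `time=... msg=...` layout seen in these lines), the gaps between consecutive log events can be computed like this:

```python
import re
from datetime import datetime

# The seven log lines from the report, copied verbatim.
LOG = '''time=2026-05-06T19:47:02.529-03:00 level=INFO source=client.go:359 msg="starting mlx runner subprocess" model=qwen3.5:27b-coding-nvfp4 port=58467
time=2026-05-06T19:47:02.532-03:00 level=INFO source=sched.go:561 msg="loaded runners" count=1
time=2026-05-06T19:47:02.587-03:00 level=INFO source=server.go:44 msg="MLX engine initialized" "MLX version"=0.31.2 device=gpu
time=2026-05-06T19:47:02.667-03:00 level=INFO source=base.go:110 msg="Model architecture" arch=Qwen3_5ForConditionalGeneration
time=2026-05-06T19:47:02.966-03:00 level=INFO source=runner.go:159 msg="Loaded tensors from manifest" count=1584
time=2026-05-06T19:47:09.859-03:00 level=INFO source=runner.go:194 msg="Starting HTTP server" host=127.0.0.1 port=58467
time=2026-05-06T19:47:09.980-03:00 level=INFO source=server.go:213 msg=ServeHTTP method=GET path=/v1/status took=10.853875ms status="200 OK"'''

# Capture the timestamp and the message (quoted or bare) from each line.
LINE_RE = re.compile(r'time=(\S+) .*?msg=("[^"]*"|\S+)')


def parse(log: str) -> list:
    """Return (timestamp, message) pairs for every line that matches."""
    events = []
    for line in log.splitlines():
        m = LINE_RE.search(line)
        if m:
            events.append((datetime.fromisoformat(m.group(1)), m.group(2).strip('"')))
    return events


def gaps(events: list) -> list:
    """Seconds elapsed before each event, measured from the previous one."""
    return [
        (later[1], (later[0] - earlier[0]).total_seconds())
        for earlier, later in zip(events, events[1:])
    ]


events = parse(LOG)
for message, seconds in gaps(events):
    print(f"{seconds:7.3f}s  before  {message}")
print("total:", (events[-1][0] - events[0][0]).total_seconds(), "s")
```

On this excerpt the largest gap is about 6.9 s between "Loaded tensors from manifest" and "Starting HTTP server", and the whole startup takes about 7.5 s. That suggests, without proving it, that the slowdown is in token generation, which these lines do not cover.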

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.23.1
