ollama - 💡(How to fix) Fix Poor performance of the nvfp4 models on MacBook Pro M3 [1 participants]

ollama · 2026-05-13 11:56:51

GitHub stats

ollama/ollama#16127 • Fetched 2026-05-14 03:29:04

Author

johnkea

Participants

johnkea

Timeline (top)

closed ×1, labeled ×1

Root Cause

This feels like a bug - either in the article, the software, or somewhere deep in the universe where things are supposed to make sense but clearly don't. Because right now, NVFP4 on MLX makes zero practical sense, and that's both confusing and honestly pretty frustrating.

Code Example

% ollama run gemma4:31b-nvfp4
>>> /set verbose
>>> what can you do?
...
total duration:       2m45.425681083s
load duration:        85.741208ms
prompt eval count:    21 token(s)
prompt eval duration: 2.164352083s
prompt eval rate:     9.70 tokens/s
eval count:           1026 token(s)
eval duration:        2m43.175126458s
eval rate:            6.29 tokens/s


% ollama run gemma4:31b-it-q4_K_M
>>> /set verbose
>>> what can you do?
...
total duration:       2m57.964217083s
load duration:        184.854791ms
prompt eval count:    21 token(s)
prompt eval duration: 479.860375ms
prompt eval rate:     43.76 tokens/s
eval count:           1171 token(s)
eval duration:        2m56.923986858s
eval rate:            6.62 tokens/s
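The two verbose blocks above can be compared mechanically rather than by eye. The following is a minimal sketch, not part of the original report: the helper names `parse_duration` and `parse_stats` are hypothetical, and the input strings are copied from the logs above. Each model was run once with a 21-token prompt, so the ratios describe this single run only.

```python
import re

NVFP4_LOG = """
total duration:       2m45.425681083s
load duration:        85.741208ms
prompt eval count:    21 token(s)
prompt eval duration: 2.164352083s
prompt eval rate:     9.70 tokens/s
eval count:           1026 token(s)
eval duration:        2m43.175126458s
eval rate:            6.29 tokens/s
"""

Q4_K_M_LOG = """
total duration:       2m57.964217083s
load duration:        184.854791ms
prompt eval count:    21 token(s)
prompt eval duration: 479.860375ms
prompt eval rate:     43.76 tokens/s
eval count:           1171 token(s)
eval duration:        2m56.923986858s
eval rate:            6.62 tokens/s
"""

# Seconds per unit for Go-style duration strings such as "2m45.4s" or "85.7ms".
UNITS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 1e-3, "µs": 1e-6, "us": 1e-6, "ns": 1e-9}


def parse_duration(text):
    """Convert a duration string like '2m45.425681083s' to seconds."""
    # Two-letter units are listed before 'm' and 's' so that 'ms' is not read as 'm'.
    parts = re.findall(r"([\d.]+)(ms|µs|us|ns|h|m|s)", text)
    return sum(float(value) * UNITS[unit] for value, unit in parts)


def parse_stats(block):
    """Turn one verbose stats block into a dict keyed by the label text."""
    stats = {}
    for line in block.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            continue
        if key.endswith("duration"):
            stats[key] = parse_duration(value)
        elif key.endswith("count"):
            stats[key] = int(value.split()[0])
        elif key.endswith("rate"):
            stats[key] = float(value.split()[0])
    return stats


nvfp4 = parse_stats(NVFP4_LOG)
q4km = parse_stats(Q4_K_M_LOG)

# How many times faster q4_K_M was than nvfp4 in this single run.
prompt_ratio = q4km["prompt eval rate"] / nvfp4["prompt eval rate"]
eval_ratio = q4km["eval rate"] / nvfp4["eval rate"]

print(f"prompt eval: q4_K_M / nvfp4 = {prompt_ratio:.2f}x")
print(f"generation:  q4_K_M / nvfp4 = {eval_ratio:.2f}x")
```

On these numbers the generation rates are within about 5% of each other, while the prompt eval rate differs by roughly 4.5x. With only 21 prompt tokens and one run per model, the prompt figure in particular may be dominated by warm-up cost rather than steady-state throughput.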

What is the issue?

If I'm reading this article correctly, NVFP4 models are supposed to be the shiny new turbo boost for MLX on Apple Silicon - faster inference, happy developers, rainbows and unicorns. Great! Except… my benchmarks are telling a very different story. Not only are these models NOT faster, some of them are actually slower.

Please tell me I'm missing something obvious here. I would love nothing more than to be wrong about this. Thanks!
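One way to check whether the gap is real is to repeat the measurement several times instead of relying on one interactive run. The sketch below assumes a local Ollama server on the default port 11434 and uses the `/api/generate` endpoint, whose non-streaming response reports `prompt_eval_count`, `prompt_eval_duration`, `eval_count`, and `eval_duration` (durations in nanoseconds). The function names `rates_from_response` and `bench` are hypothetical, and whether the MLX-backed models report these fields identically is an assumption.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
NS_PER_SECOND = 1e9


def rates_from_response(resp):
    """Compute tokens per second from the nanosecond timing fields of a response."""
    rates = {"eval_tps": resp["eval_count"] / (resp["eval_duration"] / NS_PER_SECOND)}
    # The prompt fields may be missing or zero when the prompt was served from cache.
    prompt_ns = resp.get("prompt_eval_duration", 0)
    if prompt_ns:
        rates["prompt_tps"] = resp.get("prompt_eval_count", 0) / (prompt_ns / NS_PER_SECOND)
    return rates


def bench(model, prompt, runs=3):
    """Send the same prompt several times and collect the per-run rates."""
    results = []
    for _ in range(runs):
        body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
        request = urllib.request.Request(
            OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
        )
        with urllib.request.urlopen(request) as response:
            results.append(rates_from_response(json.load(response)))
    return results


# Example usage (requires a running server and both models pulled):
# for model in ("gemma4:31b-nvfp4", "gemma4:31b-it-q4_K_M"):
#     print(model, bench(model, "what can you do?"))
```

Repeated runs with an identical prompt may hit the prompt cache, so the generation rate (`eval_tps`) is the more trustworthy number here. A longer prompt would give a more meaningful prompt eval comparison than the 21 tokens used in the report.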

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.23.3
