ollama - ✅(Solved) Fix Misrepresented parameter_size for gemma4:26b MLX models [1 pull requests, 1 participants]

ollama, 2026-04-18 19:46:49


GitHub stats

ollama/ollama#15679 • Fetched 2026-04-19 15:04:19

Author: Archiklein
Participants: Archiklein
Timeline (top): cross-referenced ×1, labeled ×1, referenced ×1

Fix Action

Fixed

Fixed by PR: server: preserve thinking in /api/generate and populate parameter_size in /api/tags for safetensors (https://github.com/ollama/ollama/pull/15683)

PR fix notes

PR #15683: server: preserve thinking in /api/generate and populate parameter_size in /api/tags for safetensors

Repository: ollama/ollama
Author: serenposh
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/15683

Description (problem / solution / changelog)

Summary

Fixes two independent bugs that surfaced on gemma4:26b-mxfp8 / gemma4:26b-nvfp4.

1. `/api/generate` silently drops thinking for models that think by default (#15681)

GenerateHandler initialized the built-in parser before applying the capability-gated default for req.Think. Parsers that gate thinking output on the value passed to Init (notably Gemma4Parser, which has an explicit `// When thinking is disabled, silently discard channel content` branch) therefore saw thinkValue == nil and dropped the reasoning, even though the model was emitting it (visible as a large eval_count paired with a short response).

This PR moves the capability check and default above the parser Init call, so the parser sees the resolved req.Think value. This matches ChatHandler, which already performs the two steps in the correct order, and explains why /api/chat and /v1/chat/completions return reasoning correctly on the same model.

Callers that explicitly set think: false are unaffected — the default only kicks in when req.Think == nil.
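The ordering problem can be reproduced in miniature. This is a sketch, not Ollama's code: ThinkValue, toyParser, applyThinkDefault, buggyOrder, and fixedOrder are hypothetical stand-ins for req.Think, Gemma4Parser, and the capability-gated default in GenerateHandler. It only shows why a parser that captures its think value at Init loses reasoning when the default is resolved afterwards.

```go
package main

// ThinkValue is a hypothetical stand-in for the request's optional think
// setting; a nil *ThinkValue means the caller did not set it.
type ThinkValue struct{ Enabled bool }

// toyParser is a hypothetical parser that, like Gemma4Parser as described
// in the PR, decides at Init time whether thinking output is kept.
type toyParser struct{ think *ThinkValue }

func (p *toyParser) Init(think *ThinkValue) { p.think = think }

// Add returns the thinking text surfaced for one chunk; when thinking is
// disabled or unset, the channel content is silently discarded.
func (p *toyParser) Add(thinking string) string {
	if p.think == nil || !p.think.Enabled {
		return ""
	}
	return thinking
}

// applyThinkDefault enables thinking only when the caller left it unset and
// the model advertises the "thinking" capability; explicit values win.
func applyThinkDefault(think *ThinkValue, capabilities []string) *ThinkValue {
	if think != nil {
		return think
	}
	for _, c := range capabilities {
		if c == "thinking" {
			return &ThinkValue{Enabled: true}
		}
	}
	return nil
}

// buggyOrder initializes the parser first and resolves the default after:
// the parser has already captured nil, so reasoning is dropped.
func buggyOrder(think *ThinkValue, caps []string, chunk string) string {
	p := &toyParser{}
	p.Init(think)
	_ = applyThinkDefault(think, caps) // resolved too late to matter
	return p.Add(chunk)
}

// fixedOrder resolves the default first, so Init sees the final value.
func fixedOrder(think *ThinkValue, caps []string, chunk string) string {
	resolved := applyThinkDefault(think, caps)
	p := &toyParser{}
	p.Init(resolved)
	return p.Add(chunk)
}
```

With capabilities containing "thinking" and no explicit think value, buggyOrder returns an empty string while fixedOrder returns the chunk. An explicit `&ThinkValue{Enabled: false}` yields an empty string in both orders, consistent with the note that think: false callers are unaffected.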

2. `/api/tags` returns empty `parameter_size` for safetensors models (#15679)

ListHandler populated Details purely from the manifest's ConfigV2, whose ModelType and FileType are not written for safetensors models during create. /api/show already works around this by reading the safetensors headers via xserver.GetSafetensorsLLMInfo / GetSafetensorsDtype; this PR mirrors the same enrichment in ListHandler so the two endpoints stay consistent.
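A minimal sketch of the enrichment idea, assuming a simplified details struct and a parameter count that has already been read from the safetensors headers. ModelDetails and enrichDetails are hypothetical names; the header reader the PR names (xserver.GetSafetensorsLLMInfo) is not called here, and the "%.1fB" rendering is an assumption that only covers counts in the billions.

```go
package main

import "fmt"

// ModelDetails is a hypothetical subset of the "details" object; field
// meanings follow the JSON keys shown in this issue.
type ModelDetails struct {
	Format            string
	ParameterSize     string
	QuantizationLevel string
}

// enrichDetails fills an empty ParameterSize for safetensors models from a
// parameter count assumed to come from the safetensors headers (reading
// them is not shown). Non-safetensors details, and details that already
// carry a value, pass through unchanged.
func enrichDetails(d ModelDetails, parameterCount uint64) ModelDetails {
	if d.Format != "safetensors" || d.ParameterSize != "" || parameterCount == 0 {
		return d
	}
	// Assumed rendering: one decimal place in billions, e.g. "8.7B".
	d.ParameterSize = fmt.Sprintf("%.1fB", float64(parameterCount)/1e9)
	return d
}
```

Leaving non-safetensors details untouched reflects the expectation that GGUF-only machines keep using the manifest config.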

Note: the separate observation in #15679 that the reported count (e.g. 8.7B) is the active-parameter count for MoE variants, rather than a "26B-A4B"-style total, is a deeper metadata question and is out of scope for this PR. At a minimum, this change stops /api/tags from returning an empty string and makes it match the existing /api/show value.
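For reference, a "26B-A4B"-style label combines a total and an active parameter count. The hypothetical moeLabel helper below sketches that rendering. It is not an Ollama function, and the example counts in the test are made up rather than Gemma 4's published figures.

```go
package main

import (
	"fmt"
	"math"
)

// moeLabel is a hypothetical helper (not part of Ollama) that renders a
// mixture-of-experts size label such as "26B-A4B" from total and active
// parameter counts, each rounded to whole billions. When there is no
// separate active count (dense model), only the total is shown.
func moeLabel(total, active uint64) string {
	t := math.Round(float64(total) / 1e9)
	if active == 0 || active >= total {
		return fmt.Sprintf("%.0fB", t)
	}
	a := math.Round(float64(active) / 1e9)
	return fmt.Sprintf("%.0fB-A%.0fB", t, a)
}
```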

Verified locally

go vet ./server/ — clean
go build ./server/ — clean
go test ./server/ — all pass (2.5s)
go test ./model/parsers/ — all pass

Needs manual verification by reviewer

Neither author has a machine with gemma4:26b-mxfp8 / a safetensors model available, so these runtime checks weren't performed:

curl /api/generate -d '{"model":"gemma4:26b-mxfp8","prompt":"Moin","stream":false}' → the response now includes a populated thinking field.
Same request with "think": false → thinking is empty (no regression for an explicit opt-out).
curl /api/generate -d '{"model":"llama3.2","prompt":"hi"}' (non-thinking model) → unchanged behaviour.
curl /api/tags on a machine with a safetensors model → details.parameter_size is populated (matches /api/show).
curl /api/tags on a machine with GGUF-only models → unchanged (manifest config still used).
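The /api/tags checks above can be scripted. This sketch decodes a response body and lists models whose details.parameter_size is empty; tagsResponse and missingParameterSize are hypothetical names, and only the JSON keys come from the sample output in this issue.

```go
package main

import "encoding/json"

// tagsResponse decodes only the fields of an /api/tags response that this
// check needs; the JSON keys match the sample output in this issue.
type tagsResponse struct {
	Models []struct {
		Name    string `json:"name"`
		Details struct {
			Format        string `json:"format"`
			ParameterSize string `json:"parameter_size"`
		} `json:"details"`
	} `json:"models"`
}

// missingParameterSize returns the names of models whose
// details.parameter_size is empty.
func missingParameterSize(body []byte) ([]string, error) {
	var resp tagsResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	var missing []string
	for _, m := range resp.Models {
		if m.Details.ParameterSize == "" {
			missing = append(missing, m.Name)
		}
	}
	return missing, nil
}
```

Against a live server, the body can come from `http.Get("http://localhost:11434/api/tags")` followed by `io.ReadAll(resp.Body)`. With the sample output in this issue it returns ["gemma4:26b-mxfp8"]; after the fix the list should be empty.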

Changed files

server/routes.go (modified, +40/-16)



What is the issue?

Neither gemma4:26b-mxfp8 nor gemma4:26b-nvfp4 seems to report its parameter size correctly. /api/tags returns an empty 'parameter_size' value, while /api/show reports 'parameter_size' as "8.7B" and "6.3B" respectively.

Even with MLX / safetensors models, shouldn't the parameter size still be "26B" or "26B-A4B" or similar?

Relevant log output

curl http://localhost:11434/api/tags

{
    "models":[
        {
            "name": "gemma4:26b-mxfp8",
            "model": "gemma4:26b-mxfp8",
            "modified_at": "2026-04-18T20:14:14.536292905+02:00",
            "size": 26812605336,
            "digest": "3950c545841fdff310cc84e187ff0538e4a4962fff507e60caf2965cd3749a04",
            "details":{
                "parent_model": "",
                "format": "safetensors",
                "family": "",
                "families": null,
                "parameter_size": "",
                "quantization_level": "mxfp8"
            }
        }
    ]
}


curl http://localhost:11434/api/show -d '{
  "model": "gemma4:26b-mxfp8"  
}'

{
    "license": "Apache License ...",
    "parameters": "temperature 1\ntop_k 64\ntop_p 0.95",
    "template": "{{ .Prompt }}",
    "details": {
        "parent_model": "",
        "format": "safetensors",
        "family": "gemma4",
        "families": null,
        "parameter_size": "8.7B",
        "quantization_level": "mxfp8"
    },
    "model_info":{
        "gemma4.block_count":30,
        "gemma4.context_length":262144,
        "gemma4.embedding_length":2816,
        "general.architecture":"gemma4",
        "general.parameter_count":8677362766
    },
    "capabilities":[
        "completion",
        "tools",
        "thinking"
    ],
    "modified_at":"2026-04-18T20:14:14.536292905+02:00",
    "requires":"0.19.0"
}

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.21.0

extent analysis

TL;DR

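The empty 'parameter_size' in /api/tags and the unexpectedly low value in /api/show (8.7B for gemma4:26b-mxfp8, 6.3B for gemma4:26b-nvfp4) are two separate problems. The first comes from /api/tags reading details only from the manifest config, which is not populated for safetensors models, and is what PR #15683 addresses; the second is a deeper question about how the parameter count is derived for these models.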

Guidance

Check how 'parameter_size' is derived for safetensors models: the same 26B model reports 8.7B as mxfp8 and 6.3B as nvfp4, so the count appears to depend on the quantization format.
Check if the 'parameter_size' is being overwritten or incorrectly updated somewhere in the code, especially for MLX/safetensors models.
Compare the 'parameter_size' with the 'general.parameter_count' in the model_info to see if there's a discrepancy that could indicate where the issue lies.
Consider logging or debugging the specific values used to calculate 'parameter_size' to identify any potential errors or inconsistencies.

Notes

In the log output, 'general.parameter_count' (8,677,362,766) rounds to the same 8.7B that /api/show reports as 'parameter_size', so those two fields agree with each other. The discrepancy is between that count and the 26B in the model name.
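Because 8,677,362,766 already rounds to the reported 8.7B, a more telling check compares general.parameter_count with the size in the model tag. The helpers below are hypothetical: tagBillions assumes tags of the form "<name>:<n>b-<suffix>" and gives up on anything else, and the tolerance in countMatchesTag is an arbitrary illustration value.

```go
package main

import (
	"math"
	"strconv"
	"strings"
)

// tagBillions is a hypothetical helper that extracts the "<n>b" size from a
// tag such as "gemma4:26b-mxfp8" (the part after ":" and before the first
// "-") and returns n. ok is false when the tag carries no such size.
func tagBillions(model string) (float64, bool) {
	_, tag, found := strings.Cut(model, ":")
	if !found {
		return 0, false
	}
	size, _, _ := strings.Cut(tag, "-")
	if !strings.HasSuffix(size, "b") {
		return 0, false
	}
	v, err := strconv.ParseFloat(strings.TrimSuffix(size, "b"), 64)
	if err != nil {
		return 0, false
	}
	return v, true
}

// countMatchesTag reports whether a raw general.parameter_count lies within
// a relative tolerance tol (e.g. 0.15 for 15%) of the size named in the tag.
func countMatchesTag(model string, count uint64, tol float64) bool {
	want, ok := tagBillions(model)
	if !ok {
		return false
	}
	got := float64(count) / 1e9
	return math.Abs(got-want) <= tol*want
}
```

For gemma4:26b-mxfp8 with the reported count of 8,677,362,766, countMatchesTag returns false, which isolates the mismatch to the count itself rather than to how it is formatted.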

Recommendation

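As a stopgap for the empty /api/tags value, read 'parameter_size' (or 'general.parameter_count' from 'model_info') from /api/show, which already populates it for safetensors models. Note that this value (8.7B for gemma4:26b-mxfp8) is still well below the 26B in the model name, so the workaround does not resolve the separate question of which count should be reported.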
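A minimal sketch of reading both values from an /api/show response body. showResponse and parameterInfo are hypothetical names; only the JSON keys (details.parameter_size and model_info["general.parameter_count"]) come from this issue. model_info is kept as raw JSON per key because its values mix strings and numbers.

```go
package main

import (
	"encoding/json"
	"errors"
)

// showResponse decodes the parts of an /api/show response used here.
type showResponse struct {
	Details struct {
		ParameterSize string `json:"parameter_size"`
	} `json:"details"`
	// model_info mixes value types, so each value is kept as raw JSON.
	ModelInfo map[string]json.RawMessage `json:"model_info"`
}

// parameterInfo returns details.parameter_size and
// model_info["general.parameter_count"] from an /api/show body.
func parameterInfo(body []byte) (string, uint64, error) {
	var resp showResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return "", 0, err
	}
	raw, ok := resp.ModelInfo["general.parameter_count"]
	if !ok {
		return resp.Details.ParameterSize, 0, errors.New("general.parameter_count not present")
	}
	var count uint64
	if err := json.Unmarshal(raw, &count); err != nil {
		return "", 0, err
	}
	return resp.Details.ParameterSize, count, nil
}
```

The body would come from a POST to http://localhost:11434/api/show with {"model": "gemma4:26b-mxfp8"}, as in the curl command in the log output. For the sample response in this issue it yields "8.7B" and 8677362766.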
