ollama - ✅ (Solved) Fix misrepresented parameter_size for gemma4:26b MLX models [1 pull request, 1 participant]

GitHub stats
ollama/ollama#15679 (fetched 2026-04-19 15:04:19)
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1, referenced ×1

Fix Action

Fixed

PR fix notes

PR #15683: server: preserve thinking in /api/generate and populate parameter_size in /api/tags for safetensors

Description (problem / solution / changelog)

Summary

Fixes two independent bugs surfaced on gemma4:26b-mxfp8 / gemma4:26b-nvfp4.

1. /api/generate silently drops thinking for models that think by default (#15681)

GenerateHandler initialized the builtin parser before the capability-gated default for req.Think was applied. Parsers that gate thinking output on the value passed to Init (notably Gemma4Parser, which has an explicit "// When thinking is disabled, silently discard channel content" branch) therefore saw thinkValue == nil and dropped the reasoning, even though the model was emitting it (visible as a large eval_count paired with a short response).

This PR moves the capability check and default above the parser Init call so the parser sees the resolved req.Think value. This matches ChatHandler, which already performs the two steps in the correct order, and explains why /api/chat and /v1/chat/completions return reasoning correctly on the same model.

Callers that explicitly set think: false are unaffected; the default only applies when req.Think == nil.
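The ordering fix can be illustrated with a minimal, self-contained sketch. The types and functions below (request, thinkParser, resolveThink, generate) are hypothetical stand-ins, not ollama's server code; the sketch only assumes what the PR describes: a parser that captures the think value at Init time, and a default that applies only when req.Think is nil and the model has the thinking capability.

```go
package main

import "fmt"

// request is a hypothetical stand-in for the /api/generate request;
// Think is nil when the caller did not set "think".
type request struct {
	Think *bool
}

// thinkParser is a hypothetical parser that, like the one described in
// the PR, decides at Init time whether to keep thinking output.
type thinkParser struct {
	keepThinking bool
}

// Init captures the think value; nil is treated as "thinking disabled".
func (p *thinkParser) Init(think *bool) {
	p.keepThinking = think != nil && *think
}

// Add returns the thinking text only if thinking was enabled at Init.
func (p *thinkParser) Add(thinking, content string) (string, string) {
	if !p.keepThinking {
		return "", content // thinking silently discarded
	}
	return thinking, content
}

// resolveThink applies the capability-gated default: only when the caller
// left Think unset and the model supports thinking.
func resolveThink(req *request, supportsThinking bool) {
	if req.Think == nil && supportsThinking {
		enabled := true
		req.Think = &enabled
	}
}

// generate uses the corrected order: resolve the default first, then Init.
func generate(req request, supportsThinking bool) (string, string) {
	resolveThink(&req, supportsThinking)
	var p thinkParser
	p.Init(req.Think)
	return p.Add("reasoning...", "answer")
}

func main() {
	thinking, _ := generate(request{}, true)
	fmt.Printf("default: thinking=%q\n", thinking)

	off := false
	thinking, _ = generate(request{Think: &off}, true)
	fmt.Printf("think=false: thinking=%q\n", thinking)
}
```

Swapping the two calls in generate (Init before resolveThink) reproduces the bug: the parser sees nil and discards the reasoning even though the default would later enable it.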

2. /api/tags returns empty parameter_size for safetensors models (#15679)

ListHandler populated Details purely from the manifest's ConfigV2, whose ModelType and FileType fields are not written for safetensors models during create. /api/show already works around this by reading the safetensors headers via xserver.GetSafetensorsLLMInfo / GetSafetensorsDtype; this PR mirrors the same enrichment in ListHandler so the two endpoints stay consistent.
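A rough sketch of the enrichment idea: keep the manifest-derived values and fall back to header-derived information only for safetensors models whose fields are empty. Everything here (details, headerInfo, enrichDetails, humanCount) is hypothetical; in ollama the header data comes from xserver.GetSafetensorsLLMInfo / GetSafetensorsDtype, whose signatures are not shown in the PR notes and are not reproduced here. Filling Family as well as ParameterSize is an assumption based on the /api/show output in this issue, and the "%.1f" rounding is chosen only to match the "8.7B" value shown there.

```go
package main

import "fmt"

// details mirrors the "details" object in /api/tags and /api/show responses.
type details struct {
	Format            string
	Family            string
	ParameterSize     string
	QuantizationLevel string
}

// headerInfo is a hypothetical summary of what the safetensors headers provide.
type headerInfo struct {
	Architecture   string
	ParameterCount uint64
}

// humanCount formats a raw parameter count, e.g. 8677362766 -> "8.7B".
func humanCount(n uint64) string {
	switch {
	case n >= 1_000_000_000:
		return fmt.Sprintf("%.1fB", float64(n)/1e9)
	case n >= 1_000_000:
		return fmt.Sprintf("%.1fM", float64(n)/1e6)
	default:
		return fmt.Sprintf("%d", n)
	}
}

// enrichDetails fills empty fields for safetensors models from header info,
// leaving GGUF models (and already-populated fields) untouched.
func enrichDetails(d details, info headerInfo) details {
	if d.Format != "safetensors" {
		return d
	}
	if d.Family == "" {
		d.Family = info.Architecture
	}
	if d.ParameterSize == "" && info.ParameterCount > 0 {
		d.ParameterSize = humanCount(info.ParameterCount)
	}
	return d
}

func main() {
	// Manifest-only details, as returned by /api/tags in this issue.
	fromManifest := details{Format: "safetensors", QuantizationLevel: "mxfp8"}
	info := headerInfo{Architecture: "gemma4", ParameterCount: 8677362766}
	fmt.Printf("%+v\n", enrichDetails(fromManifest, info))
}
```

Because the fallback only touches empty fields on safetensors models, GGUF entries keep their manifest values, which is the "unchanged" behaviour the reviewer checklist asks for.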

Note: the separate observation in #15679, that the reported count (e.g. 8.7B) is the active-parameter count for MoE variants rather than a "26B-A4B"-style total, is a deeper metadata question and out of scope for this PR. At minimum, this change stops /api/tags from returning an empty string and makes it match the existing /api/show value.

Verified locally

  • go vet ./server/ — clean
  • go build ./server/ — clean
  • go test ./server/ — all pass (2.5s)
  • go test ./model/parsers/ — all pass

Needs manual verification by reviewer

Neither author has a machine with gemma4:26b-mxfp8 or another safetensors model available, so these runtime checks weren't performed:

  • curl /api/generate -d '{"model":"gemma4:26b-mxfp8","prompt":"Moin","stream":false}' → response now includes populated thinking field.
  • Same request with "think": false → thinking empty (no regression for explicit opt-out).
  • curl /api/generate -d '{"model":"llama3.2","prompt":"hi"}' (non-thinking model) → unchanged behaviour.
  • curl /api/tags on a machine with a safetensors model → details.parameter_size is populated (matches /api/show).
  • curl /api/tags on a machine with GGUF-only models → unchanged (manifest config still used).
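For the manual checks above, a small helper can flag the two failure modes in captured responses. It parses JSON bodies (for example, saved from the curl commands) rather than calling the server. The function names are hypothetical, and the only fields assumed are those visible in this issue (models[].name, models[].details.parameter_size) plus the thinking field mentioned in the PR notes.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// tagsResponse covers only the /api/tags fields this check needs.
type tagsResponse struct {
	Models []struct {
		Name    string `json:"name"`
		Details struct {
			Format        string `json:"format"`
			ParameterSize string `json:"parameter_size"`
		} `json:"details"`
	} `json:"models"`
}

// generateResponse covers the non-streamed /api/generate fields used here.
type generateResponse struct {
	Response string `json:"response"`
	Thinking string `json:"thinking"`
}

// modelsMissingParameterSize returns the names of models whose
// details.parameter_size is empty.
func modelsMissingParameterSize(body []byte) ([]string, error) {
	var tags tagsResponse
	if err := json.Unmarshal(body, &tags); err != nil {
		return nil, err
	}
	var missing []string
	for _, m := range tags.Models {
		if m.Details.ParameterSize == "" {
			missing = append(missing, m.Name)
		}
	}
	return missing, nil
}

// hasThinking reports whether a non-streamed /api/generate response
// carries a non-empty thinking field.
func hasThinking(body []byte) (bool, error) {
	var gen generateResponse
	if err := json.Unmarshal(body, &gen); err != nil {
		return false, err
	}
	return gen.Thinking != "", nil
}

func main() {
	// Trimmed copy of the /api/tags output from this issue (before the fix).
	tags := []byte(`{"models":[{"name":"gemma4:26b-mxfp8",
		"details":{"format":"safetensors","parameter_size":""}}]}`)
	missing, err := modelsMissingParameterSize(tags)
	fmt.Println(missing, err)

	// Made-up sample body illustrating the pre-fix symptom (empty thinking).
	gen := []byte(`{"response":"Hello","thinking":""}`)
	ok, err := hasThinking(gen)
	fmt.Println(ok, err)
}
```

After the fix, the first check should return no names for safetensors models, and the second should report true for gemma4:26b-mxfp8 unless "think": false was sent.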

Changed files

  • server/routes.go (modified, +40/-16)


What is the issue?

Both gemma4:26b-mxfp8 and gemma4:26b-nvfp4 don't seem to report their parameter size correctly. /api/tags returns an empty 'parameter_size' value, while /api/show reports 'parameter_size' as "8.7B" and "6.3B" respectively.

Even with MLX / safetensors models, shouldn't the parameter size still be "26B" or "26B-A4B" or similar?

Relevant log output

curl http://localhost:11434/api/tags

{
    "models":[
        {
            "name": "gemma4:26b-mxfp8",
            "model": "gemma4:26b-mxfp8",
            "modified_at": "2026-04-18T20:14:14.536292905+02:00",
            "size": 26812605336,
            "digest": "3950c545841fdff310cc84e187ff0538e4a4962fff507e60caf2965cd3749a04",
            "details":{
                "parent_model": "",
                "format": "safetensors",
                "family": "",
                "families": null,
                "parameter_size": "",
                "quantization_level": "mxfp8"
            }
        }
    ]
}


curl http://localhost:11434/api/show -d '{
  "model": "gemma4:26b-mxfp8"  
}'

{
    "license": "Apache License ...",
    "parameters": "temperature 1\ntop_k 64\ntop_p 0.95",
    "template": "{{ .Prompt }}",
    "details": {
        "parent_model": "",
        "format": "safetensors",
        "family": "gemma4",
        "families": null,
        "parameter_size": "8.7B",
        "quantization_level": "mxfp8"
    },
    "model_info":{
        "gemma4.block_count":30,
        "gemma4.context_length":262144,
        "gemma4.embedding_length":2816,
        "general.architecture":"gemma4",
        "general.parameter_count":8677362766
    },
    "capabilities":[
        "completion",
        "tools",
        "thinking"
    ],
    "modified_at":"2026-04-18T20:14:14.536292905+02:00",
    "requires":"0.19.0"
}

OS

macOS

GPU

Apple

CPU

Apple

Ollama version

0.21.0

extent analysis

TL;DR

The incorrect parameter size reporting for gemma4:26b-mxfp8 and gemma4:26b-nvfp4 most likely originates in where and how the 'parameter_size' value is sourced for safetensors models in each API response, so that is the place to start investigating.

Guidance

  • Verify the calculation of 'parameter_size' in the API to ensure it correctly accounts for the model's parameters, considering the quantization level and format.
  • Check if the 'parameter_size' is being overwritten or incorrectly updated somewhere in the code, especially for MLX/safetensors models.
  • Compare the 'parameter_size' with the 'general.parameter_count' in the model_info to see if there's a discrepancy that could indicate where the issue lies.
  • Consider logging or debugging the specific values used to calculate 'parameter_size' to identify any potential errors or inconsistencies.
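The comparison suggested in the guidance can be scripted. This is a minimal sketch that parses an /api/show body, re-derives a display value from general.parameter_count, and compares it with details.parameter_size. The function name is hypothetical, and the "%.1fB" rounding is an assumption chosen to match the "8.7B" value in this issue, not necessarily ollama's own formatting.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// showResponse covers only the /api/show fields compared here.
type showResponse struct {
	Details struct {
		ParameterSize string `json:"parameter_size"`
	} `json:"details"`
	ModelInfo map[string]any `json:"model_info"`
}

// compareParameterSize returns the reported parameter_size, the value
// re-derived from general.parameter_count, and whether they agree.
func compareParameterSize(body []byte) (reported, derived string, match bool, err error) {
	var show showResponse
	if err = json.Unmarshal(body, &show); err != nil {
		return "", "", false, err
	}
	// encoding/json decodes JSON numbers into float64 when the target is any.
	count, ok := show.ModelInfo["general.parameter_count"].(float64)
	if !ok {
		return show.Details.ParameterSize, "", false, fmt.Errorf("general.parameter_count missing or not a number")
	}
	derived = fmt.Sprintf("%.1fB", count/1e9)
	return show.Details.ParameterSize, derived, derived == show.Details.ParameterSize, nil
}

func main() {
	// Trimmed copy of the /api/show output from this issue.
	body := []byte(`{"details":{"parameter_size":"8.7B"},
		"model_info":{"general.parameter_count":8677362766}}`)
	reported, derived, match, err := compareParameterSize(body)
	fmt.Println(reported, derived, match, err)
}
```

With the values from this issue the two agree, which points away from the display formatting and toward how general.parameter_count itself is computed for these models.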

Notes

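The provided log output shows that 'general.parameter_count' (8677362766) and the /api/show 'parameter_size' ("8.7B") agree with each other, so the problem is not in how the count is formatted for display. The gap to the expected ~26B total lies in how the count itself is derived for these models, which the PR notes treat as a separate, out-of-scope metadata question for MoE variants.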

Recommendation

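Until the root cause is fixed, clients that need a value missing from /api/tags can fall back to /api/show, where 'parameter_size' is populated and can be cross-checked against 'general.parameter_count'. For these models the count itself (8677362766) is still well below the expected ~26B total, so this workaround only fills the empty field; it does not correct the underlying count.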
