ollama: Expose `max_soft_tokens` (image token budget) as a runtime parameter for Gemma 4 models

GitHub stats
ollama/ollama#15626 · Fetched 2026-04-17 08:27:08
Comments: 0 · Participants: 1 · Timeline: 2 · Reactions: 0

Gemma 4's vision encoder supports a variable-resolution token budget via max_soft_tokens, but this value is currently hardcoded to 280 in model/models/gemma4/process_image.go (see L25–31). There is no way to override it at runtime, either through the API or via the ollama-python library.
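To make the request concrete, here is a minimal Python sketch of how a per-request budget could be resolved, with a fallback to the current default. It is an illustration only: the `max_soft_tokens` options key and the `resolve_max_soft_tokens` helper are hypothetical and are not part of Ollama or ollama-python today.

```python
from typing import Any, Mapping, Optional

# The value the issue reports as hardcoded in process_image.go.
DEFAULT_MAX_SOFT_TOKENS = 280


def resolve_max_soft_tokens(
    options: Optional[Mapping[str, Any]],
    default: int = DEFAULT_MAX_SOFT_TOKENS,
) -> int:
    """Return the image token budget for one request.

    Hypothetical: assumes a per-request options mapping that may carry a
    "max_soft_tokens" key. No such option exists in Ollama at present.
    """
    if not options or "max_soft_tokens" not in options:
        return default  # keep today's behavior when nothing is set
    value = options["max_soft_tokens"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        raise ValueError(f"max_soft_tokens must be a positive integer, got {value!r}")
    return value


if __name__ == "__main__":
    print(resolve_max_soft_tokens(None))
    print(resolve_max_soft_tokens({"max_soft_tokens": 560}))
```

Falling back to 280 when the key is absent would leave existing callers unaffected, which is one reason an opt-in parameter seems a low-risk way to address the request.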

Google's own documentation for Gemma 4 explicitly recommends tuning this budget for OCR tasks that require higher image resolution: https://ai.google.dev/gemma/docs/capabilities/vision/image#variable_resolution_token_budget

The default budget of 280 tokens is insufficient for fine-grained visual tasks. In practice, this causes measurable accuracy regressions. A concrete example using license plate recognition with gemma4-e4b:

| Configuration | Result | Expected |
| --- | --- | --- |
| Ollama default (280 token budget) | YRSGNB | YRSGNBY |
| HuggingFace Transformers with max_soft_tokens=560 | YRSGNBY | YRSGNBY |

The same degradation is reproducible with the unquantized model when the budget is left at the default, and it also affects higher-parameter model variants such as 27B. For details, see this discussion: https://github.com/ollama/ollama-python/issues/651.
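The gap between the two configurations can be quantified with a small scoring sketch. The readings below are taken from the table above (result `YRSGNB` versus expected `YRSGNBY` at the 280-token default); the helper names are illustrative, and edit distance is just one reasonable metric for plate strings.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


# (configuration, result, expected), as read from the issue's table.
READINGS = [
    ("Ollama default (280)", "YRSGNB", "YRSGNBY"),
    ("Transformers (560)", "YRSGNBY", "YRSGNBY"),
]


def score(readings):
    """Map each configuration to its exact-match flag and edit distance."""
    return {
        name: {"exact": result == expected, "edits": edit_distance(result, expected)}
        for name, result, expected in readings
    }


if __name__ == "__main__":
    for name, s in score(READINGS).items():
        print(f"{name}: exact={s['exact']} edits={s['edits']}")
```

For plate recognition, exact match is usually the metric that matters, since a single dropped character makes the reading unusable; the edit distance mainly helps show how close a miss was.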

extent analysis

TL;DR

Raising max_soft_tokens above the default of 280 improves accuracy in fine-grained visual tasks such as OCR, but Ollama currently hardcodes the value and offers no runtime option to change it.

Guidance

  • Review the Google documentation for Gemma 4 to understand the recommended approach for tuning the variable-resolution token budget.
  • Consider changing the hardcoded max_soft_tokens value in model/models/gemma4/process_image.go from 280 to a higher value, such as 560, to improve accuracy.
  • Test the impact of different max_soft_tokens values on your specific use case, such as license plate recognition, to determine the optimal value.
  • Refer to the discussion in https://github.com/ollama/ollama-python/issues/651 for more information on the issue and potential workarounds.
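The testing step in the guidance above can be sketched as a small sweep harness. This is a sketch under stated assumptions: `recognize` stands for whatever function runs your model at a given budget (for example a patched Ollama build or a Transformers pipeline), and `fake_recognize` is a stub that only mimics the behavior reported in the issue so the example runs without a model.

```python
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

# Hypothetical signature: (image_path, max_soft_tokens) -> predicted text.
Recognizer = Callable[[str, int], str]


def sweep_budgets(
    recognize: Recognizer,
    samples: Sequence[Tuple[str, str]],
    budgets: Iterable[int],
) -> Dict[int, float]:
    """Return exact-match accuracy for each token budget."""
    if not samples:
        raise ValueError("samples must not be empty")
    accuracy = {}
    for budget in budgets:
        hits = sum(recognize(path, budget) == expected for path, expected in samples)
        accuracy[budget] = hits / len(samples)
    return accuracy


def smallest_sufficient_budget(accuracy: Dict[int, float], target: float) -> Optional[int]:
    """Pick the cheapest budget whose accuracy meets the target, if any."""
    for budget in sorted(accuracy):
        if accuracy[budget] >= target:
            return budget
    return None


def fake_recognize(image_path: str, budget: int) -> str:
    """Stub, not a real model: drops the last character below 560 tokens."""
    truth = {"plate_1.jpg": "YRSGNBY"}[image_path]
    return truth if budget >= 560 else truth[:-1]


if __name__ == "__main__":
    samples = [("plate_1.jpg", "YRSGNBY")]
    acc = sweep_budgets(fake_recognize, samples, [280, 560])
    print(acc, smallest_sufficient_budget(acc, target=1.0))
```

Choosing the smallest budget that meets the target keeps the image token count, and with it prompt length and latency, as low as the task allows.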

Example

The issue itself provides no code snippet. Because the budget is hardcoded, raising it in Ollama currently means editing the source value rather than setting a runtime option.

Notes

The optimal value for max_soft_tokens may vary depending on the specific use case and model variant. It is recommended to experiment with different values to find the best trade-off between accuracy and performance.

Recommendation

Apply the workaround: raise max_soft_tokens from the default of 280 to a higher value, such as 560, for fine-grained visual tasks. In the license plate example, the 280-token budget dropped the final character of the plate, while a budget of 560 read it correctly.
