ollama - 💡(How to fix) Fix Expose `max_soft_tokens` (image token budget) as a runtime parameter for Gemma 4 models [1 participants]

ollama2026-04-16 14:25:27

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

ollama/ollama#15626•Fetched 2026-04-17 08:27:08

View on GitHub

Comments

Participants

Timeline

Reactions

Author

somthing3000

Participants

somthing3000

Timeline (top)

labeled ×1subscribed ×1

RAW_BUFFERClick to expand / collapse

Gemma 4's vision encoder supports a variable-resolution token budget via max_soft_tokens, but this value is currently hardcoded to 280 in model/models/gemma4/process_image.go (see L25–31). There is no way to override it at runtime through the API or via ollama-python library.

Google's own documentation for Gemma 4 explicitly recommends tuning this budget for OCR tasks that require higher image resolution: https://ai.google.dev/gemma/docs/capabilities/vision/image#variable_resolution_token_budget

The default budget of 280 tokens is insufficient for fine-grained visual tasks. In practice, this causes measurable accuracy regressions. A concrete example using license plate recognition with gemma4-e4b:

Configuration	Result	Expected
Ollama default (280 token budget)	`YRSGNB`	`YRSGNBY`
HuggingFace Transformers with `max_soft_tokens=560`	`YRSGNBY` ✓	`YRSGNBY`

The same degradation is reproducible with the unquantized model when the budget is left at the default, even affecting higher parameter model variants such as 27B. Please refer to this discussion https://github.com/ollama/ollama-python/issues/651.

extent analysis

TL;DR

Increase the max_soft_tokens value to improve accuracy in fine-grained visual tasks, such as OCR.

Guidance

Review the Google documentation for Gemma 4 to understand the recommended approach for tuning the variable-resolution token budget.
Consider overriding the hardcoded max_soft_tokens value of 280 in model/models/gemma4/process_image.go to a higher value, such as 560, to improve accuracy.
Test the impact of different max_soft_tokens values on your specific use case, such as license plate recognition, to determine the optimal value.
Refer to the discussion in https://github.com/ollama/ollama-python/issues/651 for more information on the issue and potential workarounds.

Example

No code snippet is provided as the issue is more related to configuration and tuning rather than code changes.

Notes

The optimal value for max_soft_tokens may vary depending on the specific use case and model variant. It is recommended to experiment with different values to find the best trade-off between accuracy and performance.

Recommendation

Apply workaround: Increase the max_soft_tokens value to a higher value, such as 560, to improve accuracy in fine-grained visual tasks. This is recommended as the default value of 280 is insufficient for certain tasks, and increasing the value has been shown to improve accuracy in the provided example.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #memory management #API rate limit #retriever error #indexing error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

ollama - 💡(How to fix) Fix Expose `max_soft_tokens` (image token budget) as a runtime parameter for Gemma 4 models [1 participants]

Recommended Tools

GitHub issue graph ai analysis

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

ollama - 💡(How to fix) Fix Expose `max_soft_tokens` (image token budget) as a runtime parameter for Gemma 4 models [1 participants]

Recommended Tools

GitHub issue graph ai analysis

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING