ollama - ✅(Solved) Fix Go sampler (ollamarunner) silently ignores repeat_penalty, frequency_penalty, and presence_penalty [1 pull requests, 1 participants]

GitHub stats
ollama/ollama#15783 (fetched 2026-04-24 10:36:09)
Comments: 0 · Participants: 1 · Timeline events: 3 · Reactions: 0
Timeline (top): unsubscribed ×2, cross-referenced ×1

Error Message

This affects all models using the ollamarunner path. It's most visible on audio transcription (Gemma 4 e4b) where the lack of repeat penalty causes severe repetition loops — 84-93% word error rate on longer utterances due to the model generating the same phrases in a loop.

Root Cause

Even with repeat_penalty: 5.0, the model produces identical output to repeat_penalty: 1.0 because the value is never read by the sampler.

Fix Action

Fixed

PR fix notes

PR #15784: sample: implement repeat, frequency, and presence penalties in Go sampler

Description (problem / solution / changelog)

Summary

The Go-native sampler (sample/samplers.go) used by models on the ollamarunner path accepts repeat_penalty, frequency_penalty, and presence_penalty via the API but silently ignores them. This PR implements the missing penalties, matching llama.cpp's sampling_repetition_penalties behavior.

Changes:

  • Add repeatPenalize() transform to sample/transforms.go — penalizes logits for recently generated tokens
  • Add penalty fields and token history ring buffer to the Sampler struct
  • Wire repeat_penalty, repeat_last_n, frequency_penalty, and presence_penalty from API options through to the sampler in runner/ollamarunner/runner.go
  • Add unit tests for the transform and integration tests for the full sampling pipeline
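
The token history ring buffer mentioned above could look roughly like the following. This is a minimal sketch, not the PR's code: the type name `tokenHistory` and its methods are hypothetical, and the only assumption carried over from the description is that the history is capped at `repeat_last_n` entries and the oldest token is overwritten first.

```go
package main

import "fmt"

// tokenHistory is a hypothetical fixed-capacity ring buffer of recent token IDs.
// The capacity plays the role of repeat_last_n.
type tokenHistory struct {
	buf  []int32
	next int  // index where the next token will be written
	full bool // true once the buffer has wrapped at least once
}

func newTokenHistory(capacity int) *tokenHistory {
	if capacity < 0 {
		capacity = 0
	}
	return &tokenHistory{buf: make([]int32, capacity)}
}

// push records a newly sampled token, overwriting the oldest entry when full.
func (h *tokenHistory) push(id int32) {
	if len(h.buf) == 0 {
		return // a zero-capacity history keeps nothing
	}
	h.buf[h.next] = id
	h.next = (h.next + 1) % len(h.buf)
	if h.next == 0 {
		h.full = true
	}
}

// tokens returns a copy of the stored IDs, oldest first.
func (h *tokenHistory) tokens() []int32 {
	if !h.full {
		return append([]int32(nil), h.buf[:h.next]...)
	}
	out := make([]int32, 0, len(h.buf))
	out = append(out, h.buf[h.next:]...)
	out = append(out, h.buf[:h.next]...)
	return out
}

func main() {
	h := newTokenHistory(3)
	for _, id := range []int32{10, 11, 12, 13, 14} {
		h.push(id)
	}
	fmt.Println(h.tokens()) // only the three most recent tokens remain
}
```

A ring buffer keeps memory constant regardless of generation length, which matters because the sampler runs once per generated token.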

No API changes needed: api/types.go already defines these fields with defaults (repeat_penalty: 1.1, repeat_last_n: 64).

Impact

Most visible on audio transcription with Gemma 4 e4b, where the lack of repeat penalty caused severe repetition loops:

| Sentence type | Before (no penalty) | After (penalty working) |
|---|---|---|
| Long paragraph (25s) | 84.7% WER | 30.6% WER |
| Technical jargon | 60.6% WER | 33.3% WER |
| Simple/short sentences | 0.0% WER | 0.0% WER |

Test plan

  • go test ./sample/ — all existing + new tests pass
  • TestRepeatPenalize — verifies transform math (positive/negative logits, frequency, presence, combined, edge cases)
  • TestRepeatPenaltyIntegration — verifies penalty changes greedy token selection, history accumulates, ring buffer caps at repeatLastN
  • Benchmarked against a 7-sentence voice corpus with Gemma 4 e4b audio transcription
  • Verified no regression on models using the llamarunner path

Fixes #15783
Related: #9278

🤖 Generated with Claude Code

Changed files

  • runner/ollamarunner/runner.go (modified, +4/-0)
  • sample/samplers.go (modified, +40/-13)
  • sample/samplers_benchmark_test.go (modified, +4/-4)
  • sample/samplers_test.go (modified, +54/-6)
  • sample/transforms.go (modified, +32/-0)
  • sample/transforms_test.go (modified, +90/-0)

Code Example

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [{"role": "user", "content": "Transcribe:", "images": ["<base64 audio>"]}],
  "options": {"repeat_penalty": 5.0}
}'

What happened

The Go-native sampler used by models on the ollamarunner path (Gemma 4, and other newer models) accepts repeat_penalty, frequency_penalty, and presence_penalty via the API but silently ignores them. Only temperature, top_k, top_p, and min_p are implemented.

The llamarunner path (used by older models) delegates these to llama.cpp's C++ sampler where they work correctly.

How to reproduce

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [{"role": "user", "content": "Transcribe:", "images": ["<base64 audio>"]}],
  "options": {"repeat_penalty": 5.0}
}'

Even with repeat_penalty: 5.0, the model produces identical output to repeat_penalty: 1.0 because the value is never read by the sampler.

Where the bug is

sample/samplers.go — the Sampler struct and NewSampler() only accept temperature, topK, topP, minP, seed, and grammar. The repeat/frequency/presence penalty fields defined in api/types.go (lines 593-597) with defaults (lines 1066-1069) are never passed through.

runner/ollamarunner/runner.go line ~890 — the NewSampler() call only passes 6 arguments, omitting the penalty options from req.Options.

Impact

This affects all models using the ollamarunner path. It's most visible on audio transcription (Gemma 4 e4b) where the lack of repeat penalty causes severe repetition loops — 84-93% word error rate on longer utterances due to the model generating the same phrases in a loop.

Benchmark results from a 7-sentence voice corpus (actual voice recordings):

| Sentence type | Without penalty (current) | With penalty (fix) |
|---|---|---|
| Simple sentence | 0.0% WER | 0.0% WER |
| Long paragraph (25s) | 84.7% WER | 30.6% WER |
| Technical jargon | 60.6% WER | 33.3% WER |
| Paragraph (15s) | 2.5% WER | 0.0% WER |
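
The issue does not say how WER was computed. Assuming the standard definition (word-level edit distance between reference and transcript, divided by the number of reference words), a minimal sketch looks like this. The function name `wordErrorRate` is illustrative, and real benchmarks usually also normalize case and punctuation, which this sketch skips.

```go
package main

import (
	"fmt"
	"strings"
)

func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// wordErrorRate returns (substitutions + deletions + insertions) / reference words.
// It returns 0 for an empty reference, where WER is undefined; that is a
// convention chosen for this sketch.
func wordErrorRate(reference, hypothesis string) float64 {
	ref := strings.Fields(reference)
	hyp := strings.Fields(hypothesis)
	if len(ref) == 0 {
		return 0
	}
	// Two-row Levenshtein distance over words.
	prev := make([]int, len(hyp)+1)
	curr := make([]int, len(hyp)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ref); i++ {
		curr[0] = i
		for j := 1; j <= len(hyp); j++ {
			cost := 1
			if ref[i-1] == hyp[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	return float64(prev[len(hyp)]) / float64(len(ref))
}

func main() {
	// A looping transcript adds insertions, which inflates WER.
	fmt.Println(wordErrorRate("the cat sat", "the cat sat sat sat"))
}
```

Under this definition, every repeated phrase in a loop counts as inserted words, which is consistent with the high error rates reported for the longer utterances.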

Proposed fix

  1. Add repeatPenalty, frequencyPenalty, presencePenalty, and repeatLastN fields to the Sampler struct
  2. Implement a repeatPenalize() transform in sample/transforms.go matching llama.cpp's algorithm
  3. Maintain a token history ring buffer (capped at repeatLastN)
  4. Apply the penalty before topK/temperature/softmax in the sampling pipeline
  5. Wire the options through in runner/ollamarunner/runner.go
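
Steps 2 and 4 can be sketched as follows. This is an illustration of the commonly described llama.cpp-style rule (scale the logit of each recently seen token by the repeat penalty, then subtract count × frequency penalty plus the presence penalty), not the code from the PR. The `penaltyConfig` type, the function signature, and the `argmax` helper are assumptions made for the example.

```go
package main

import "fmt"

// penaltyConfig groups the three penalty options. Names are illustrative.
type penaltyConfig struct {
	repeat    float32 // 1.0 disables the multiplicative penalty
	frequency float32 // 0.0 disables the per-occurrence penalty
	presence  float32 // 0.0 disables the flat seen-at-least-once penalty
}

// repeatPenalize adjusts logits in place for every token ID found in history.
// logits is indexed by token ID; history holds the most recent token IDs.
func repeatPenalize(logits []float32, history []int32, cfg penaltyConfig) {
	counts := make(map[int32]int, len(history))
	for _, id := range history {
		counts[id]++
	}
	for id, n := range counts {
		if id < 0 || int(id) >= len(logits) {
			continue // ignore IDs outside the vocabulary
		}
		l := logits[id]
		// Dividing a negative logit would raise it, so negatives are multiplied.
		if l <= 0 {
			l *= cfg.repeat
		} else {
			l /= cfg.repeat
		}
		l -= float32(n)*cfg.frequency + cfg.presence
		logits[id] = l
	}
}

// argmax stands in for greedy selection after the penalty has been applied.
func argmax(logits []float32) int {
	best := 0
	for i, l := range logits {
		if l > logits[best] {
			best = i
		}
	}
	return best
}

func main() {
	logits := []float32{2.0, 1.5, -1.0}
	history := []int32{0, 0, 2} // token 0 seen twice, token 2 once
	repeatPenalize(logits, history, penaltyConfig{repeat: 2.0, frequency: 0.25, presence: 0.5})
	fmt.Println(logits, argmax(logits))
}
```

In this example the greedy choice moves from token 0 to token 1 once the penalty is applied. Running the transform before top-k, temperature, and softmax, as step 4 proposes, means a penalized token can drop out of the candidate set entirely.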

I have a working implementation with tests ready to submit as a PR. ~224 lines across 6 files (including tests).

Related: #9278 (sampler interface TODO)

extent analysis

TL;DR

To fix the issue, update the Sampler struct and NewSampler() function to accept and implement repeat_penalty, frequency_penalty, and presence_penalty options.

Guidance

  • Review the proposed fix steps to ensure all necessary changes are included, such as adding penalty fields to the Sampler struct and implementing the repeatPenalize() transform.
  • Verify that the NewSampler() call in runner/ollamarunner/runner.go is updated to pass the penalty options from req.Options.
  • Test the changes using the provided benchmark results to ensure the word error rate is improved, especially for longer utterances.
  • Consider reviewing related issues, such as #9278, to ensure the sampler interface is properly updated.

Example

No code snippet is provided: the issue describes the proposed fix step by step but does not include the implementation itself.

Notes

The fix requires updating multiple files, including sample/samplers.go, sample/transforms.go, and runner/ollamarunner/runner.go, and adding tests to ensure the changes are correct.

Recommendation

Apply the proposed fix (PR #15784), which adds the penalty fields to the Sampler struct and implements the repeatPenalize() transform; the reported benchmarks show a lower word error rate on longer utterances.
