ollama - ✅ (Solved) Fix Go sampler (ollamarunner) silently ignores repeat_penalty, frequency_penalty, and presence_penalty [1 pull request, 1 participant]

42euge · 2026-04-24T04:20:10Z


ollama/ollama#15783•Fetched 2026-04-24 10:36:09

View on GitHub

Comments

Participants

Timeline

Reactions

Author

42euge

Participants

42euge

Timeline (top)

unsubscribed ×2cross-referenced ×1

Error Message

This affects all models using the ollamarunner path. It's most visible on audio transcription (Gemma 4 e4b) where the lack of repeat penalty causes severe repetition loops — 84-93% word error rate on longer utterances due to the model generating the same phrases in a loop.

Root Cause

The Go sampler never reads repeat_penalty (or frequency_penalty and presence_penalty), so even with repeat_penalty: 5.0 the model produces output identical to repeat_penalty: 1.0.

Fix Action

Fixed

Fixed by PR: sample: implement repeat, frequency, and presence penalties in Go sampler (https://github.com/ollama/ollama/pull/15784)

PR fix notes

PR #15784: sample: implement repeat, frequency, and presence penalties in Go sampler

Repository: ollama/ollama
Author: 42euge
State: open | merged: False
Link: https://github.com/ollama/ollama/pull/15784

Description (problem / solution / changelog)

Summary

The Go-native sampler (sample/samplers.go) used by models on the ollamarunner path accepts repeat_penalty, frequency_penalty, and presence_penalty via the API but silently ignores them. This PR implements the missing penalties, matching llama.cpp's sampling_repetition_penalties behavior.
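Assuming the reference behavior is llama.cpp's penalties sampler, the rule it applies can be summarized as follows. Here $\ell_t$ is the logit of token $t$, $c_t$ is the number of times $t$ occurs among the last `repeat_last_n` generated tokens, and $p_{\text{rep}}$, $p_{\text{freq}}$, $p_{\text{pres}}$ are the repeat, frequency, and presence penalties. Tokens with $c_t = 0$ are left unchanged. This is an orientation aid, not a transcription of the PR's code.

```latex
\ell'_t =
\begin{cases}
  \ell_t \, p_{\text{rep}} & \text{if } \ell_t \le 0 \\
  \ell_t / p_{\text{rep}}  & \text{if } \ell_t > 0
\end{cases}
\; - \; c_t \, p_{\text{freq}} \; - \; p_{\text{pres}},
\qquad \text{for every token } t \text{ with } c_t > 0
```

For example, with $p_{\text{rep}} = 1.1$ and the other two penalties at 0, a repeated token with logit 2.0 drops to about 1.82, while one with logit -1.0 drops to -1.1.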

Changes:

Add repeatPenalize() transform to sample/transforms.go — penalizes logits for recently generated tokens
Add penalty fields and token history ring buffer to the Sampler struct
Wire repeat_penalty, repeat_last_n, frequency_penalty, and presence_penalty from API options through to the sampler in runner/ollamarunner/runner.go
Add unit tests for the transform and integration tests for the full sampling pipeline
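The token history mentioned above can be kept in a fixed-size ring buffer so memory stays bounded at repeat_last_n entries and each new token costs O(1). The sketch below is a hypothetical illustration: the type and method names are not taken from the PR, and negative capacities are simply treated as zero here.

```go
package main

import "fmt"

// tokenHistory is a hypothetical fixed-capacity ring buffer of recently
// generated token IDs; the names are illustrative, not taken from the PR.
type tokenHistory struct {
	buf   []int32
	next  int // slot that the next Push will overwrite
	count int // number of valid entries, at most len(buf)
}

// newTokenHistory allocates room for capacity tokens (repeat_last_n).
// A capacity of zero (or, in this sketch, a negative one) records nothing.
func newTokenHistory(capacity int) *tokenHistory {
	if capacity < 0 {
		capacity = 0
	}
	return &tokenHistory{buf: make([]int32, capacity)}
}

// Push records tok, overwriting the oldest entry once the buffer is full.
func (h *tokenHistory) Push(tok int32) {
	if len(h.buf) == 0 {
		return
	}
	h.buf[h.next] = tok
	h.next = (h.next + 1) % len(h.buf)
	if h.count < len(h.buf) {
		h.count++
	}
}

// Tokens returns the stored tokens ordered from oldest to newest.
func (h *tokenHistory) Tokens() []int32 {
	out := make([]int32, 0, h.count)
	if h.count == 0 {
		return out
	}
	start := h.next - h.count
	if start < 0 {
		start += len(h.buf)
	}
	for i := 0; i < h.count; i++ {
		out = append(out, h.buf[(start+i)%len(h.buf)])
	}
	return out
}

func main() {
	h := newTokenHistory(3) // e.g. repeat_last_n = 3
	for _, tok := range []int32{10, 11, 12, 13} {
		h.Push(tok)
	}
	fmt.Println(h.Tokens()) // [11 12 13]
}
```

Pushing four tokens into a three-slot history keeps only the last three, which is the "ring buffer caps at repeatLastN" behavior the test plan describes.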

No API changes needed — api/types.go already defines these fields with defaults (repeat_penalty: 1.1, repeat_last_n: 64).
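Because these defaults already exist, a request that omits the keys still carries repeat_penalty 1.1 over the last 64 tokens; once the options are wired through, ollamarunner models would presumably receive that mild penalty by default, not only when a client sets it. The sketch below illustrates how defaults and per-request overrides combine using encoding/json with a pre-filled struct. Assumptions: `penaltyOptions` is a hypothetical stand-in rather than Ollama's real options type, Ollama's actual option handling may differ, and frequency/presence default to zero here because the source quotes only the first two defaults.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// penaltyOptions is a hypothetical stand-in for the penalty fields of a
// request's "options" object; it is not Ollama's api.Options type.
type penaltyOptions struct {
	RepeatLastN      int     `json:"repeat_last_n"`
	RepeatPenalty    float32 `json:"repeat_penalty"`
	FrequencyPenalty float32 `json:"frequency_penalty"`
	PresencePenalty  float32 `json:"presence_penalty"`
}

// decodeWithDefaults pre-fills the defaults quoted in the PR, then decodes the
// request on top; json.Unmarshal leaves fields absent from the JSON untouched.
func decodeWithDefaults(raw []byte) (penaltyOptions, error) {
	opts := penaltyOptions{RepeatLastN: 64, RepeatPenalty: 1.1}
	if err := json.Unmarshal(raw, &opts); err != nil {
		return penaltyOptions{}, err
	}
	return opts, nil
}

func main() {
	opts, err := decodeWithDefaults([]byte(`{"repeat_penalty": 5.0}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", opts)
	// {RepeatLastN:64 RepeatPenalty:5 FrequencyPenalty:0 PresencePenalty:0}
}
```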

Impact

Most visible on audio transcription with Gemma 4 e4b, where the lack of repeat penalty caused severe repetition loops:

Sentence type	Before (no penalty)	After (penalty working)
Long paragraph (25s)	84.7% WER	30.6% WER
Technical jargon	60.6% WER	33.3% WER
Simple/short sentences	0.0% WER	0.0% WER
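The source does not say which tool computed these WER figures. For reference, a minimal sketch of the conventional definition, (substitutions + deletions + insertions) divided by the number of reference words, computed as a word-level edit distance. No case or punctuation normalization is applied here, which real WER tooling usually does. Because insertions count against the score, a transcript stuck in a repetition loop can reach or exceed 100% WER even when it contains every reference word.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// wordErrorRate computes (substitutions + deletions + insertions) / N, where N
// is the number of reference words, via word-level Levenshtein distance.
// No text normalization (case, punctuation) is applied in this sketch.
func wordErrorRate(reference, hypothesis string) float64 {
	ref := strings.Fields(reference)
	hyp := strings.Fields(hypothesis)
	if len(ref) == 0 {
		if len(hyp) == 0 {
			return 0
		}
		return math.Inf(1) // WER is undefined for an empty reference
	}
	// At the start of row i, prev[j] is the distance between ref[:i-1] and hyp[:j].
	prev := make([]int, len(hyp)+1)
	cur := make([]int, len(hyp)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ref); i++ {
		cur[0] = i
		for j := 1; j <= len(hyp); j++ {
			sub := prev[j-1]
			if ref[i-1] != hyp[j-1] {
				sub++
			}
			// deletion, insertion, or substitution/match
			cur[j] = min3(prev[j]+1, cur[j-1]+1, sub)
		}
		prev, cur = cur, prev
	}
	return float64(prev[len(hyp)]) / float64(len(ref))
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	ref := "the cat sat"
	fmt.Println(wordErrorRate(ref, "the cat sat"))             // 0
	fmt.Println(wordErrorRate(ref, "the cat sat sat sat sat")) // 1
}
```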

Test plan

go test ./sample/ — all existing + new tests pass
TestRepeatPenalize — verifies transform math (positive/negative logits, frequency, presence, combined, edge cases)
TestRepeatPenaltyIntegration — verifies penalty changes greedy token selection, history accumulates, ring buffer caps at repeatLastN
Benchmarked against a 7-sentence voice corpus with Gemma 4 e4b audio transcription
Verified no regression on models using the llamarunner path

Fixes #15783. Related: #9278

🤖 Generated with Claude Code

Changed files

runner/ollamarunner/runner.go (modified, +4/-0)
sample/samplers.go (modified, +40/-13)
sample/samplers_benchmark_test.go (modified, +4/-4)
sample/samplers_test.go (modified, +54/-6)
sample/transforms.go (modified, +32/-0)
sample/transforms_test.go (modified, +90/-0)

Code Example

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [{"role": "user", "content": "Transcribe:", "images": ["<base64 audio>"]}],
  "options": {"repeat_penalty": 5.0}
}'


What happened

The Go-native sampler used by models on the ollamarunner path (Gemma 4 and other newer models) accepts repeat_penalty, frequency_penalty, and presence_penalty via the API but silently ignores them. Only temperature, top_k, top_p, and min_p are implemented.

The llamarunner path (used by older models) delegates these to llama.cpp's C++ sampler where they work correctly.

How to reproduce

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [{"role": "user", "content": "Transcribe:", "images": ["<base64 audio>"]}],
  "options": {"repeat_penalty": 5.0}
}'

Even with repeat_penalty: 5.0, the model produces identical output to repeat_penalty: 1.0 because the value is never read by the sampler.

Where the bug is

sample/samplers.go — the Sampler struct and NewSampler() only accept temperature, topK, topP, minP, seed, and grammar. The repeat/frequency/presence penalty fields defined in api/types.go (lines 593-597) with defaults (lines 1066-1069) are never passed through.

runner/ollamarunner/runner.go line ~890 — the NewSampler() call only passes 6 arguments, omitting the penalty options from req.Options.
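When repeat_last_n is wired through, its special values need a decision. Ollama's Modelfile parameter documentation describes repeat_last_n as 0 = disabled and -1 = num_ctx; the source does not say whether the PR handles -1. The helper below is hypothetical and assumes the context length is known when the sampler is constructed.

```go
package main

import "fmt"

// historyCapacity is a hypothetical helper that maps repeat_last_n to the
// number of recent tokens the sampler should remember:
//
//	0  -> 0 (no history, so the penalties are effectively disabled)
//	-1 -> numCtx (look back over the whole context window)
//	n  -> n
//
// Other negative values are treated like -1 in this sketch.
func historyCapacity(repeatLastN, numCtx int) int {
	if repeatLastN < 0 {
		return numCtx
	}
	return repeatLastN
}

func main() {
	for _, n := range []int{64, 0, -1} {
		fmt.Println(n, "->", historyCapacity(n, 4096))
	}
}
```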

Impact

Benchmark results from a 7-sentence voice corpus (actual voice recordings):

Sentence type	Without penalty (current)	With penalty (fix)
Simple sentence	0.0% WER	0.0% WER
Long paragraph (25s)	84.7% WER	30.6% WER
Technical jargon	60.6% WER	33.3% WER
Paragraph (15s)	2.5% WER	0.0% WER

Proposed fix

Add repeatPenalty, frequencyPenalty, presencePenalty, and repeatLastN fields to the Sampler struct
Implement a repeatPenalize() transform in sample/transforms.go matching llama.cpp's algorithm
Maintain a token history ring buffer (capped at repeatLastN)
Apply the penalty before topK/temperature/softmax in the sampling pipeline
Wire the options through in runner/ollamarunner/runner.go
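The transform described in the list above can be sketched as follows. This is a hypothetical Go version of the llama.cpp-style rule, not the PR's code: the function name matches the one the issue proposes, but its signature, the map-based counting, and the `argmax` helper are assumptions. It operates on raw logits, which is why the proposed fix applies it before top-k, temperature, and softmax.

```go
package main

import "fmt"

// repeatPenalize lowers the logits of tokens that appear in history, following
// the llama.cpp-style rule: scale by repeatPenalty (divide positive logits,
// multiply non-positive ones), then subtract count*frequencyPenalty and a flat
// presencePenalty. logits is indexed by token ID and modified in place.
// Hypothetical signature for illustration; not the PR's implementation.
func repeatPenalize(logits []float32, history []int32, repeatPenalty, frequencyPenalty, presencePenalty float32) {
	// Skip the work when there is no history or every penalty is neutral.
	if len(history) == 0 || (repeatPenalty == 1 && frequencyPenalty == 0 && presencePenalty == 0) {
		return
	}
	// Count occurrences so the frequency penalty grows with repetition.
	counts := make(map[int32]int, len(history))
	for _, tok := range history {
		counts[tok]++
	}
	for tok, c := range counts {
		if tok < 0 || int(tok) >= len(logits) {
			continue // ignore IDs outside the vocabulary
		}
		l := logits[tok]
		// For repeatPenalty > 1, both branches make the token less likely.
		if l <= 0 {
			l *= repeatPenalty
		} else {
			l /= repeatPenalty
		}
		logits[tok] = l - float32(c)*frequencyPenalty - presencePenalty
	}
}

// argmax returns the index of the largest logit (greedy decoding).
func argmax(logits []float32) int {
	best := 0
	for i, v := range logits {
		if v > logits[best] {
			best = i
		}
	}
	return best
}

func main() {
	logits := []float32{2.0, 1.8, -1.0}
	history := []int32{0, 0, 2} // token 0 appeared twice, token 2 once
	fmt.Println("greedy before:", argmax(logits)) // greedy before: 0
	repeatPenalize(logits, history, 1.2, 0, 0)
	fmt.Println("greedy after:", argmax(logits)) // greedy after: 1
}
```

With repeatPenalty 1.2, token 0's logit falls from 2.0 to about 1.67, below token 1's 1.8, so greedy selection switches tokens. This is the kind of change the PR's integration test is described as checking.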

I have a working implementation with tests ready to submit as a PR: about 224 lines across 6 files, including tests.

Related: #9278 (sampler interface TODO)

extent analysis

TL;DR

To fix the issue, update the Sampler struct and NewSampler() function to accept and implement repeat_penalty, frequency_penalty, and presence_penalty options.

Guidance

Review the proposed fix steps to ensure all necessary changes are included, such as adding penalty fields to the Sampler struct and implementing the repeatPenalize() transform.
Verify that the NewSampler() call in runner/ollamarunner/runner.go is updated to pass the penalty options from req.Options.
Re-run the voice-corpus benchmark and compare against the reported results to confirm that the word error rate drops, especially for longer utterances.
Consider reviewing related issues, such as #9278, to ensure the sampler interface is properly updated.

Example

No code snippet is provided as the issue already includes a clear proposed fix and the implementation details are not explicitly stated.

Notes

The fix requires updating multiple files, including sample/samplers.go, sample/transforms.go, and runner/ollamarunner/runner.go, and adding tests to ensure the changes are correct.

Recommendation

Apply the proposed fix: add the penalty fields to the Sampler struct and implement the repeatPenalize() transform. The author reports that it substantially lowers the word error rate on longer utterances (for example, from 84.7% to 30.6% on a 25-second paragraph).
