ollama - ✅(Solved) Fix gemma4:31b doesn't output some Polish characters [1 pull request, 1 comment, 1 participant]

ollama 2026-04-02 18:32:07

GitHub stats

ollama/ollama#15231 • Fetched 2026-04-08 02:34:03

Author

szmarczak

Participants

szmarczak

Assignees

dhiltgen

Timeline (top)

cross-referenced ×4, referenced ×2, subscribed ×2, assigned ×1

Fix Action

Fixed

Fixed by PR: tokenizer: add byte fallback for SentencePiece BPE encoding (https://github.com/ollama/ollama/pull/15232)
Closed with commit: de9673ac3fb1c57fbf6e5e194f1f3dc5a8b48668

PR fix notes

PR #15232: tokenizer: add byte fallback for SentencePiece BPE encoding

Repository: ollama/ollama
Author: dhiltgen
State: closed | merged: True
Link: https://github.com/ollama/ollama/pull/15232

Description (problem / solution / changelog)

When BPE merging produces tokens not in the vocabulary, fall back to encoding each UTF-8 byte as <0xHH> byte tokens instead of silently dropping the character. Also teach Decode to convert <0xHH> tokens back to raw bytes.

Fixes #15229, fixes #15231
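
The byte fallback described above can be sketched as follows. This is a minimal illustration, not the code from tokenizer/bytepairencoding.go: the names `encodePiece`, `decode`, and `parseByteToken` and the toy vocabulary are hypothetical, and the sketch assumes the `<0xHH>` byte tokens are always available, which a real tokenizer would have to check against its vocabulary.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// encodePiece returns the token strings for one merged piece.
// Hypothetical helper: if the piece is in the vocabulary it is kept as-is;
// otherwise each UTF-8 byte becomes a <0xHH> token instead of being dropped.
func encodePiece(vocab map[string]bool, piece string) []string {
	if vocab[piece] {
		return []string{piece}
	}
	out := make([]string, 0, len(piece))
	for i := 0; i < len(piece); i++ { // iterate over bytes, not runes
		out = append(out, fmt.Sprintf("<0x%02X>", piece[i]))
	}
	return out
}

// parseByteToken reports whether t has the form <0xHH> and, if so,
// returns the byte it encodes.
func parseByteToken(t string) (byte, bool) {
	if len(t) != 6 || !strings.HasPrefix(t, "<0x") || t[5] != '>' {
		return 0, false
	}
	n, err := strconv.ParseUint(t[3:5], 16, 8)
	if err != nil {
		return 0, false
	}
	return byte(n), true
}

// decode joins tokens back into text, converting <0xHH> tokens to raw bytes
// so that multi-byte characters are reassembled.
func decode(tokens []string) string {
	var sb strings.Builder
	for _, t := range tokens {
		if b, ok := parseByteToken(t); ok {
			sb.WriteByte(b)
			continue
		}
		sb.WriteString(t)
	}
	return sb.String()
}

func main() {
	// Toy vocabulary (an assumption): only three of the nine characters exist.
	vocab := map[string]bool{"ś": true, "ż": true, "ź": true}

	var tokens []string
	for _, r := range "ąęśćżźółń" {
		tokens = append(tokens, encodePiece(vocab, string(r))...)
	}
	fmt.Println(tokens)
	fmt.Println(decode(tokens)) // round-trips to the original string
}
```

With this toy vocabulary, "ą" (U+0105) is encoded as the two tokens `<0xC4>` and `<0x85>` rather than vanishing, and decoding restores the full string.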

Changed files

model/models/gemma4/tokenizer_reference_test.go (added, +341/-0)
tokenizer/bytepairencoding.go (modified, +55/-14)
tokenizer/bytepairencoding_test.go (modified, +117/-35)

Code Example

>>> Say "ąęśćżźółń" nothing more. Do not think.
Thinking...
The user wants me to say the string "śżź" and nothing else.
They explicitly said "nothing more" and "Do not think."

Constraint Checklist & Confidence Score:
1. Say "śżź"? Yes.
2. Nothing more? Yes.
3. Do not think? (This is a stylistic instruction to the AI, usually meant to bypass internal reasoning/chain of thought, though the system still processes it).

Confidence Score: 5/5

Mental Sandbox:
- If I output "śżź", I follow all instructions.
- If I output "Here is the text: śżź", I violate "nothing more".
- If I include a preamble, I violate "nothing more".

Conclusion: Output exactly "śżź".
...done thinking.

śżź

What is the issue?

gemma4:31b loses some Polish characters. Asked to repeat "ąęśćżźółń", the model quotes the request in its reasoning as "śżź" and outputs only "śżź"; the characters ą, ę, ć, ó, ł, and ń are missing. The full session is the transcript shown under Code Example.

Relevant log output

OS

Linux

GPU

Nvidia

CPU

AMD

Ollama version

0.20.0-rc0

extent analysis

TL;DR

The missing characters come from the tokenizer, not from how the model follows instructions. According to the fix PR, when BPE merging produced a token that was not in the vocabulary, the encoder silently dropped the character. The model's reasoning quotes the prompt as "śżź", which suggests that ą, ę, ć, ó, ł, and ń were lost before the model saw the input. PR #15232 fixes this by falling back to <0xHH> byte tokens.

Guidance

Run an Ollama build that includes commit de9673ac3fb1c57fbf6e5e194f1f3dc5a8b48668 (PR #15232); the issue was reported against 0.20.0-rc0.
Re-run the reproduction prompt and confirm that all nine characters of "ąęśćżźółń" appear in the reply.
Try other text with characters that may lack a vocabulary entry; the PR description ties the bug to out-of-vocabulary merge results rather than to Polish specifically.

Notes

The issue includes no log output, so this analysis rests on the transcript and the PR description. The same PR also fixes #15229.

Recommendation

Apply the fix: upgrade to a build that contains PR #15232. The issue does not describe a workaround.
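
To check a reply for dropped characters, one option is to compare it against the prompt text rune by rune. The sketch below is only an illustration of that check; the helper name `missingRunes` is hypothetical, and the strings are the ones from the transcript in this issue.

```go
package main

import "fmt"

// missingRunes returns the runes of want that do not occur anywhere in got,
// in the order they appear in want.
func missingRunes(want, got string) []rune {
	seen := make(map[rune]bool)
	for _, r := range got {
		seen[r] = true
	}
	var missing []rune
	for _, r := range want {
		if !seen[r] {
			missing = append(missing, r)
		}
	}
	return missing
}

func main() {
	// Prompt text and model reply taken from the issue transcript.
	fmt.Println(string(missingRunes("ąęśćżźółń", "śżź"))) // prints "ąęćółń"
}
```

An empty result means every character of the prompt text made it into the reply.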
