ollama: gemma4:e4b drops accented/Unicode characters, producing garbled French text [10 comments, 9 participants]

GitHub stats
ollama/ollama#15234 · Fetched 2026-04-08 02:34:00
Comments: 10 · Participants: 9 · Timeline: 20 · Reactions: 7
Timeline (top): commented ×10, subscribed ×5, unsubscribed ×3, mentioned ×1


What is the issue?

gemma4:e4b silently drops accented and multi-byte UTF-8 characters from its output, producing garbled text in French. Words lose their accented characters entirely (not just the diacritics — the whole character is removed), making the output largely incoherent.

Prompt: "écris un texte en français"

Actual output (excerpt):

Puisque vous n'avez pas spifi sujet, je vais un texte qui que le temps, la beaut et la n s'arr.

l'art de l'arr. Cet art n'exige ni ticket de transport, ni itin pr ; il ne demande qu'une pause, un simple moment o l'on cesse de courir pour commencer .

le grlement lointain du caf bout, le chant hitant d'un oiseau qui teste sa modie, le bruissement l d'une feuille qui danse au gr'une brise ti.

Expected output should have proper accented characters:

  • "spifi" → "spécifié"
  • "beaut" → "beauté"
  • "s'arr" → "s'arrête"
  • "grlement" → "grondement" (or "grésillement")
  • "modie" → "mélodie"
  • etc.

Every é, è, ê, ë, à, ù, ç, ô, î, and similar character is stripped from the output.

Screenshot

[Image: gemma4-unicode-bug]

(Ollama macOS app, gemma4:e4b, French text generation with widespread character drops)

Steps to reproduce

curl -s http://localhost:11434/api/chat -d '{
  "model": "gemma4:e4b",
  "messages": [{"role":"user","content":"Écris un texte en français"}],
  "stream": false
}'
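
To turn the reproduction into a pass/fail check, one option is to count the non-ASCII bytes in the reply: a French paragraph should contain some, and a count of zero would match the behavior reported here. The sketch below is not part of the original report. `count_non_ascii` and `ask_french` are hypothetical helper names, and the second helper assumes `jq` is installed and that the non-streaming `/api/chat` response carries the text in `.message.content`.

```shell
# Hypothetical helper: count the bytes on stdin that fall outside 7-bit ASCII.
# In the C locale tr works byte by byte, so deleting \000-\177 leaves only
# the bytes that belong to multi-byte UTF-8 sequences.
count_non_ascii() {
  LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' '
}

# Hypothetical wrapper around the request from the report.
# Assumes jq is installed and Ollama listens on localhost:11434.
ask_french() {
  curl -s http://localhost:11434/api/chat -d '{
    "model": "gemma4:e4b",
    "messages": [{"role":"user","content":"Écris un texte en français"}],
    "stream": false
  }' | jq -r '.message.content'
}

# Usage: a result of 0 suggests every accented character was dropped.
#   ask_french | count_non_ascii
```

A byte count is a coarse signal: it shows that accented characters are missing but not which ones, so comparing against the expected words listed above is still needed.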

OS

macOS (Apple M4 Max, 64 GB)

Ollama version

0.20.0-rc0

GPU

Apple M4 Max (integrated)

Related issues

  • #15229 — same bug on gemma4:31b (drops Unicode diacritics)
  • #15231 — same bug on gemma4:31b with Polish characters

This appears to be a tokenizer/detokenizer issue in the Gemma 4 architecture affecting all model sizes, not just 31b.

extent analysis

TL;DR

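The symptoms point to the tokenizer/detokenizer path for Gemma 4 rather than to the prompt or the client: accented and other multi-byte UTF-8 characters vanish from the output across model sizes. The fix most likely lies in how that path encodes or decodes such characters, so that is where to investigate first.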

Guidance

  1. Verify Character Encoding: Ensure that the input and output character encoding is set to UTF-8 to support accented and multi-byte characters.
  2. Inspect Tokenizer/Detokenizer: Examine the tokenizer and detokenizer components of the Gemma 4 architecture for any potential issues with handling Unicode characters, as the problem seems to stem from this area.
  3. Test with Different Models: Although the issue appears in multiple model sizes, testing with other models or versions might help isolate if the problem is specific to certain configurations or if it's a broader architectural issue.
  4. Check for Similar Issues: Review related issues (#15229, #15231) for any insights or fixes that might apply to this scenario, especially since they mention similar problems with Unicode diacritics and characters in other languages.
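
Steps 1 and 2 can be partly checked from the shell before touching the tokenizer. As a rough sketch, the hypothetical `is_valid_utf8` helper below uses `iconv` to test whether a byte stream is well-formed UTF-8. If a garbled reply still validates, whole characters were dropped cleanly, which is consistent with the excerpt in the report. If it fails, multi-byte sequences are being cut mid-character. The issue confirms neither outcome; the check only narrows where to look.

```shell
# Hypothetical helper: succeed only if stdin is well-formed UTF-8.
# iconv exits non-zero when it meets an illegal or incomplete byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Usage: save the model reply to a file (reply.txt is a placeholder name), then
#   is_valid_utf8 < reply.txt && echo "valid UTF-8" || echo "broken sequences"
```

Running the same check on the request body sent by the client helps rule out an encoding problem on the input side.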

Notes

The exact solution might depend on the specifics of the Gemma 4 architecture and its handling of Unicode characters, which are not fully detailed in the provided information. Therefore, a thorough investigation of the tokenizer, detokenizer, and any relevant encoding/decoding processes is necessary.

Recommendation

The provided context does not mention a fixed version, so upgrading cannot be recommended on this evidence. Until a fix is identified, focus on how the Gemma 4 tokenizer/detokenizer path handles multi-byte characters, since the related reports for gemma4:31b suggest the problem lies in character processing rather than in one model size.
