hermes - 💡(How to fix) Fix feat: Language Auto-Correction — enforce user-defined grammar/spelling standard regardless of LLM training

hermes2026-05-24 14:17:40

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Root Cause

Portuguese users (BR/PT) frequently report that AI assistants quietly switch between pt-BR and pt-PT
Brazilian Portuguese has distinct grammar rules (second-person vs. third-person address, specific accent patterns, different vocabulary) that LLMs often confuse
A Brazilian Portuguese user asking for a "relatório" should not receive a "relatório" with European Portuguese spelling
This feature would also benefit any non-English user whose LLM was primarily English-trained

Code Example

language_standard: pt-BR  # BCP 47: en-US, pt-BR, es-MX, etc.
strict_orthography: false

---

User Input → [Pre-generation: inject language tag] → LLM → [Post-generation: grammar check] → Response

RAW_BUFFERClick to expand / collapse

Problem

Every LLM has baked-in linguistic biases — it tends to output in whatever language it was most trained on, regardless of what the user actually wrote in. When a user writes in Brazilian Portuguese, the model often mixes European Portuguese, drops accents, or silently "corrects" grammar in ways that do not match the user's established standard. The same happens with any non-primary-language user.

There is no way to tell Hermes: "my working language is pt-BR — maintain it faithfully, fix my grammar within that language, and never switch to pt-PT or English."

Proposed Solution

A language standard enforcement layer that:

User declares their target language/variant at startup or via config:
- language_standard: pt-BR (with optional region/country tag)
- Stored in profile/config, not per-session
Pre-generation hook (prompt injection): Before sending user input to the LLM, prepend a short system instruction:

"Always respond in the same language and linguistic variant the user is writing in. If the user writes in Brazilian Portuguese, respond in Brazilian Portuguese — never switch to European Portuguese, English, or any other variant. Maintain consistent grammar, spelling, and accentuation."
Post-generation grammar/spell check: After the LLM responds, run a lightweight grammar-check pass using the user's declared standard. This is critical for models that were not fine-tuned on the user's variant — it catches common errors even when the model tried to stay in the right language.
- Tools: after-generation check with spell-check-pt-br / lingua-correction or similar
- Can be toggled on/off via config flag
Orthography mode toggle:
- strict_orthography: true — enforce accentuation, hyphenation, and grammar of the declared standard
- strict_orthography: false (default) — lightweight corrections only

Why This Matters

Portuguese users (BR/PT) frequently report that AI assistants quietly switch between pt-BR and pt-PT
Brazilian Portuguese has distinct grammar rules (second-person vs. third-person address, specific accent patterns, different vocabulary) that LLMs often confuse
A Brazilian Portuguese user asking for a "relatório" should not receive a "relatório" with European Portuguese spelling
This feature would also benefit any non-English user whose LLM was primarily English-trained

Implementation Suggestion

Config fields (in config.yaml or .env)

language_standard: pt-BR  # BCP 47: en-US, pt-BR, es-MX, etc.
strict_orthography: false

Hook flow

User Input → [Pre-generation: inject language tag] → LLM → [Post-generation: grammar check] → Response

File ownership

agent/hooks/ or agent/language_standard/ for the pre/post hooks
hermes_cli/ for the config loading

Alternatives Considered

Relying on system prompt alone — does not work reliably across all providers/models
Per-session language declaration — too verbose, should be persistent in profile
Waiting for all LLMs to be equally good in all languages — not realistic

Priority: Medium — not blocking, but meaningfully improves UX for the majority of non-English users.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering