hermes - ✅(Solved) Fix [Bug]: Ollama/GLM stop-to-length heuristic false-triggers on responses ending with emoji sign-offs [1 pull request, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#14572 · Fetched 2026-04-24 06:16:28
Comments: 1 · Participants: 2 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×4, commented ×1, cross-referenced ×1

The heuristic added in commit 8011aa31 (fix(agent): continue ollama glm truncation replies) treats any Ollama-hosted GLM finish_reason=stop response as truncated if it doesn't end with a character in its terminal-punctuation whitelist. The whitelist only covers ASCII punctuation and CJK punctuation — emojis (💛, ✨, 🙌, …) are not included. Any agent that habitually signs off with an emoji has 100% of its replies falsely reclassified as truncated, producing a continuation loop that exhausts max_turns (up to 60 turns × 3 continuation attempts = ~180 wasted API calls per single user message) without ever delivering a response to the user.

Root Cause

run_agent.py::_has_natural_response_ending (line ~2557) whitelists only:

. ! ? : ) " ' ] } 。！？：）】」』》

Responses ending with any of the 100,000+ other Unicode code points, including every emoji, are considered "not naturally ended" and are force-reclassified as truncated. The only other conditions are that the conversation contains prior tool messages (run_agent.py:2589) and that the content is at least 20 characters long and contains whitespace (run_agent.py:2602), so the heuristic fires on virtually any substantive reply from a persona with emoji sign-offs.
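To make the gap concrete, here is a minimal sketch of a last-character whitelist check of the kind described above. It is not the code from run_agent.py: the function name `ends_with_whitelisted_char` and the exact character set (with the CJK marks assumed to be full-width forms) are illustrative assumptions.

```python
# Illustrative sketch only; the real _has_natural_response_ending may differ.
# Assumed whitelist: ASCII terminators plus full-width/CJK terminators.
WHITELIST = frozenset('.!?:)"\']}' + "。！？：）】」』》")


def ends_with_whitelisted_char(text: str) -> bool:
    """Return True if the last non-whitespace character is whitelisted."""
    stripped = text.rstrip()
    return bool(stripped) and stripped[-1] in WHITELIST


if __name__ == "__main__":
    for reply in ("The report is attached.", "The report is attached! 💛"):
        # The second reply is complete, but its last character is an emoji.
        print(repr(reply), ends_with_whitelisted_char(reply))
```

Under these assumptions the emoji-terminated reply fails the check even though it is a complete message, which is the false positive reported here.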

In our case: a personal-assistant agent that ends every Telegram message with 💛 had ~89 continuation attempts logged after a single user message before the user noticed the hang.

Fix Action

Fix / Workaround

  1. Expand the natural-ending whitelist to include Unicode emoji ranges (\p{Emoji_Presentation} / \p{Extended_Pictographic}) and common sign-off glyphs.
  2. Raise the minimum-length gate (currently 20 chars) to something like 500+ chars so only genuinely long truncations are suspicious. Short/medium replies are unlikely to be truncated in practice.
  3. Add a negative signal — e.g., if the reply is short enough to fit well within max_tokens, the "stop" is probably real.
  4. Make the heuristic opt-in via config flag (agent.glm_truncation_heuristic: true|false) so users facing false-positives can disable it without a local patch.
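Options 2 to 4 could be combined into a single gate along the following lines. This is a sketch under stated assumptions, not the project's implementation: `HeuristicConfig`, `estimate_tokens`, `looks_truncated`, the 4-characters-per-token estimate, and the 0.9 budget fraction are all hypothetical, and the `enabled` field merely stands in for the proposed agent.glm_truncation_heuristic flag.

```python
from dataclasses import dataclass


@dataclass
class HeuristicConfig:
    # Hypothetical stand-in for the proposed agent.glm_truncation_heuristic flag.
    enabled: bool = True
    # Option 2: minimum reply length before a "stop" is considered suspicious.
    min_chars: int = 500
    # Option 3: assumed output budget and the fraction of it that counts as "near the limit".
    max_tokens: int = 4096
    budget_fraction: float = 0.9


def estimate_tokens(text: str) -> int:
    """Crude token estimate (about 4 characters per token); an assumption."""
    return max(1, len(text) // 4)


def looks_truncated(text: str, ends_naturally: bool, cfg: HeuristicConfig) -> bool:
    """Decide whether a finish_reason=stop reply should be treated as truncated."""
    if not cfg.enabled:           # option 4: heuristic switched off
        return False
    if ends_naturally:            # terminal character already looks final
        return False
    if len(text) < cfg.min_chars:  # option 2: short replies are rarely truncated
        return False
    # Option 3: a reply well inside the output budget probably stopped on purpose.
    if estimate_tokens(text) < cfg.max_tokens * cfg.budget_fraction:
        return False
    return True
```

With these defaults a short emoji-terminated reply is never reclassified, while a reply that nearly fills the output budget and stops mid-sentence still is.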

Workaround we're running

Local patch that short-circuits _should_treat_stop_as_truncated to return False. Tracked in our out-of-tree patch set. Happy to submit a PR for any of the fixes above if the maintainers have a preference.

PR fix notes

PR #14574: fix(agent): accept emoji sign-offs as natural endings

Description (problem / solution / changelog)

Summary

  • treat emoji sign-offs as natural response endings for the Ollama/GLM stop-to-length heuristic
  • ignore trailing emoji joiners and modifiers before checking the terminal visible character
  • add a regression test covering a tool-using GLM reply that ends with 💛
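The approach in the summary above (skip trailing joiners and modifiers, then test the last visible character) might look roughly like this. It is a sketch, not the code from PR #14574: the function names are hypothetical, the punctuation set assumes full-width CJK forms, and the emoji test is approximated with the Unicode general category "So" because Python's standard `re` module does not support \p{Extended_Pictographic}.

```python
import unicodedata

# Assumed whitelist of terminal punctuation (ASCII plus full-width/CJK forms).
PUNCTUATION = frozenset('.!?:)"\']}' + "。！？：）】」』》")

# Invisible characters that can trail an emoji: zero-width joiner and the
# text/emoji variation selectors.
INVISIBLE_TAIL = frozenset("\u200d\ufe0e\ufe0f")


def _is_skin_tone_modifier(ch: str) -> bool:
    """True for the Fitzpatrick skin-tone modifiers U+1F3FB..U+1F3FF."""
    return 0x1F3FB <= ord(ch) <= 0x1F3FF


def _looks_like_emoji(ch: str) -> bool:
    # Approximation: category "So" (Symbol, other) covers most pictographic
    # emoji, but also some non-emoji symbols such as the copyright sign.
    return unicodedata.category(ch) == "So"


def ends_naturally(text: str) -> bool:
    """Return True if the last visible character is punctuation or an emoji."""
    stripped = text.rstrip()
    # Peel off joiners, variation selectors, and skin-tone modifiers.
    while stripped and (
        stripped[-1] in INVISIBLE_TAIL or _is_skin_tone_modifier(stripped[-1])
    ):
        stripped = stripped[:-1].rstrip()
    if not stripped:
        return False
    last = stripped[-1]
    return last in PUNCTUATION or _looks_like_emoji(last)
```

The category check is deliberately broad; a stricter variant could use the third-party `regex` package or an explicit table of emoji code points instead.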

Testing

  • python3 -m pytest -o addopts='' tests/run_agent/test_run_agent.py -q -k 'ollama_glm_stop or non_ollama_stop_without_terminal_boundary'

Closes #14572

Changed files

  • run_agent.py (modified, +23/-1)
  • tests/run_agent/test_run_agent.py (modified, +33/-0)

Code Example

model:
  default: glm-5
  provider: ollama-cloud
  base_url: https://ollama.com/v1

---

⚠️  Treating suspicious Ollama/GLM stop response as truncated
⚠️  Response truncated (finish_reason='length') - model hit max output tokens
↻ Requesting continuation (1/3)...

---

. ! ? : ) " ' ] } 。！？：）】」』》

Reproduction

  1. Configure a GLM model via ollama-cloud (or local Ollama):
    model:
      default: glm-5
      provider: ollama-cloud
      base_url: https://ollama.com/v1
  2. Give the agent a persona that ends messages with an emoji sign-off (a common pattern for "warm, friendly" Telegram/Discord assistants — e.g., always end with 💛 or ✨).
  3. Send any message that triggers tool use first, then a response.
  4. Observe gateway log:
    ⚠️  Treating suspicious Ollama/GLM stop response as truncated
    ⚠️  Response truncated (finish_reason='length') - model hit max output tokens
    ↻ Requesting continuation (1/3)...
    This repeats up to max_turns × 3 times per user message. The user sees nothing and the platform integration hangs.
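The worst-case call count can be illustrated with a toy model of the loop. This is not Hermes control flow: `count_api_calls`, its parameters, and the assumption that the model returns the same complete reply on every continuation are all hypothetical, chosen only to show where the figure of roughly 180 calls comes from.

```python
# Toy model only; names and control flow are assumptions, not Hermes code.
ENDINGS = frozenset('.!?:)"\']}' + "。！？：）】」』》")


def count_api_calls(reply: str, max_turns: int = 60, continuations_per_turn: int = 3) -> int:
    """Count model calls made before the reply is accepted or the budget runs out.

    Assumes the model returns the same complete reply every time it is asked.
    """
    calls = 0
    for _turn in range(max_turns):
        for _attempt in range(continuations_per_turn):
            calls += 1
            # Accept the reply as soon as its last character is whitelisted.
            if reply.rstrip()[-1:] in ENDINGS:
                return calls
    return calls


if __name__ == "__main__":
    print(count_api_calls("Here you go."))    # 1
    print(count_api_calls("Here you go 💛"))  # 180
```

In this model a reply that ends with a period is accepted on the first call, while an emoji-terminated reply is never accepted and consumes the full 60 × 3 budget.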

Why existing open PR #13111 doesn't fix this

PR #13111 narrows _is_ollama_glm_backend() to Ollama-specific signatures. That helps sglang/vLLM users, but not Ollama users with emoji sign-offs — ollama.com still matches the post-PR check, so the heuristic still fires.

Suggested fixes (non-exclusive)

  1. Expand the natural-ending whitelist to include Unicode emoji ranges (\p{Emoji_Presentation} / \p{Extended_Pictographic}) and common sign-off glyphs.
  2. Raise the minimum-length gate (currently 20 chars) to something like 500+ chars so only genuinely long truncations are suspicious. Short/medium replies are unlikely to be truncated in practice.
  3. Add a negative signal — e.g., if the reply is short enough to fit well within max_tokens, the "stop" is probably real.
  4. Make the heuristic opt-in via config flag (agent.glm_truncation_heuristic: true|false) so users facing false-positives can disable it without a local patch.

Option 1 alone would fix our case; options 1+2 combined would also address the "conversational reply without period" and "Markdown/code reply ending with a URL" classes of false positives.

Environment

  • Hermes Agent HEAD (as of 2026-04-22)
  • Model: glm-5 via ollama-cloud (https://ollama.com/v1)
  • Platform: Telegram DM agent (persona ends replies with 💛)
  • macOS 14.6 / Python 3.12 / launchd-managed gateway

extent analysis

TL;DR

The most likely fix is to expand the natural-ending whitelist to include Unicode emoji ranges to prevent false positives for agents with emoji sign-offs.

Guidance

  • Review the _has_natural_response_ending function in run_agent.py to understand the current whitelist and consider adding Unicode emoji ranges (\p{Emoji_Presentation} / \p{Extended_Pictographic}) to the whitelist.
  • Evaluate the effectiveness of raising the minimum-length gate (currently 20 chars) to reduce false positives for short/medium replies.
  • Consider adding a negative signal to the heuristic, such as checking if the reply is short enough to fit within max_tokens, to improve accuracy.
  • Weigh the pros and cons of making the heuristic opt-in via a config flag (agent.glm_truncation_heuristic: true|false) to allow users to disable it if needed.

Example

No code example is provided as the issue is more related to the logic and configuration of the heuristic rather than a specific code snippet.

Notes

The suggested fixes are non-exclusive, and a combination of them may be necessary to fully address the issue. The effectiveness of each fix may vary depending on the specific use case and environment.

Recommendation

Expand the natural-ending whitelist to include Unicode emoji ranges, as this is the most direct solution to the problem described. This fix can be implemented by modifying the _has_natural_response_ending function in run_agent.py to include the necessary Unicode ranges.
