hermes - ✅(Solved) Fix Gateway hygiene hard message cap counts tool rows, causing early compaction in tool-heavy Telegram sessions [1 pull requests, 1 comments, 2 participants]

hermes2026-04-24 14:46:49

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#15195•Fetched 2026-04-25 06:23:54

View on GitHub

Comments

Participants

Timeline

Reactions

Author

danielz1z

Participants

alt-glitch

danielz1z

Timeline (top)

labeled ×4commented ×1cross-referenced ×1

Fix Action

Fixed

Fixed by PR: fix(gateway): count only substantive messages for hard message cap, not tool plumbing rows (#15195) (https://github.com/NousResearch/hermes-agent/pull/15439)

PR fix notes

PR #15439: fix(gateway): count only substantive messages for hard message cap, not tool plumbing rows (#15195)

Repository: NousResearch/hermes-agent
Author: Tranquil-Flow
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15439

Description (problem / solution / changelog)

Summary

Fixes #15195.

The gateway hygiene hard message cap uses len(history) to count messages, which includes tool-call wrappers and tool-result rows. A single user exchange with 10 tool calls produces 21+ raw rows. With _HARD_MSG_LIMIT = 400, a session with just 20 real user exchanges that use tools heavily can produce 400+ raw rows, triggering premature compaction and losing conversation context.

Changes

gateway/run.py: Replaced _msg_count = len(history) with a filtered count that only includes user messages and assistant messages with actual content. Tool-call-only assistant messages (no content) and tool role messages are excluded. Added _raw_row_count for logging. Updated the log message to show both substantive message count and raw row count for debugging clarity.

Validation

Tests: tests/gateway/test_session_hygiene.py — 5 new tests in TestHardMessageCapExcludesToolRows:

Tool-heavy session (30 user + 30 assistant + 340 tool = 400 raw) does NOT trigger cap
Substantive-heavy session (200 user + 200 assistant = 400 substantive) DOES trigger
Tool-call-only assistant messages are not counted
Empty-content user messages still count
Raw row count preserved for logging

Tested on macOS (Python 3.11).

Changed files

gateway/run.py (modified, +14/-3)
tests/gateway/test_session_hygiene.py (modified, +112/-0)

Code Example

_msg_count = len(history)
_HARD_MSG_LIMIT = 400
_needs_compress = (
    _approx_tokens >= _compress_token_threshold
    or _msg_count >= _HARD_MSG_LIMIT
)

---

reason=message_count raw_rows=403 tokens=312077 threshold=892500

RAW_BUFFERClick to expand / collapse

Question / possible bug

Gateway session hygiene has a hard message cap:

_msg_count = len(history)
_HARD_MSG_LIMIT = 400
_needs_compress = (
    _approx_tokens >= _compress_token_threshold
    or _msg_count >= _HARD_MSG_LIMIT
)

Is it intentional that _msg_count counts raw transcript rows, including tool result rows and empty assistant tool-call wrapper rows?

In tool-heavy Telegram sessions, this can trigger hygiene compression far below the token threshold.

Example

I saw hygiene compactions like:

400 messages, ~258,155 tokens
409 messages, ~443,475 tokens
403 messages, ~312,077 tokens

while the logged threshold was:

85% of 1,050,000 = 892,500 tokens

In one real session before compaction, the transcript composition was roughly:

21 user messages
19 visible assistant messages
188 tool rows
135 empty assistant/tool-call wrapper rows
364 total raw rows

So the hard cap can be approached after ~20 actual user turns in a tool-heavy session.

Why I’m asking

I understand the 400 cap was added as a safety valve for #2153 / PR #4750 to prevent API-disconnect compression death spirals.

The question is whether the hard cap should count:

raw transcript rows, current behavior
only user + visible assistant messages
weighted/effective messages
or be configurable, e.g. gateway.session_hygiene.max_messages

At minimum, it might help if the log included the trigger reason:

reason=message_count raw_rows=403 tokens=312077 threshold=892500

Currently the log only shows the token threshold, which makes it look like compression fired despite being far below threshold.

extent analysis

TL;DR

The hard message cap in gateway session hygiene may be triggering compression too aggressively in tool-heavy sessions due to counting raw transcript rows, and a more nuanced approach to counting messages may be needed.

Guidance

Review the current implementation of _msg_count and consider alternative methods for counting messages, such as only counting user and visible assistant messages.
Evaluate the effectiveness of the current _HARD_MSG_LIMIT value and consider making it configurable, e.g., via a gateway.session_hygiene.max_messages setting.
Enhance logging to include the trigger reason for compression, such as the example log message provided, to improve visibility into when and why compression is occurring.
Consider implementing a weighted or effective message count to more accurately reflect the actual message volume in tool-heavy sessions.

Example

No code snippet is provided as the issue is more related to the logic and configuration of the gateway session hygiene rather than a specific code implementation.

Notes

The current implementation may lead to premature compression in tool-heavy sessions, and a more nuanced approach to counting messages may be necessary to prevent this. The exact solution will depend on the specific requirements and constraints of the system.

Recommendation

Apply a workaround by modifying the _msg_count calculation to only include user and visible assistant messages, or make the _HARD_MSG_LIMIT value configurable to allow for more fine-grained control over compression triggering. This will help prevent premature compression in tool-heavy sessions.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #memory management #API rate limit #retriever error #indexing error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - ✅(Solved) Fix Gateway hygiene hard message cap counts tool rows, causing early compaction in tool-heavy Telegram sessions [1 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fixed

PR fix notes

PR #15439: fix(gateway): count only substantive messages for hard message cap, not tool plumbing rows (#15195)

Description (problem / solution / changelog)

Summary

Changes

Validation

Changed files

Code Example

Question / possible bug

Example

Why I’m asking

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

hermes - ✅(Solved) Fix Gateway hygiene hard message cap counts tool rows, causing early compaction in tool-heavy Telegram sessions [1 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Fix Action

Fixed

PR fix notes

PR #15439: fix(gateway): count only substantive messages for hard message cap, not tool plumbing rows (#15195)

Description (problem / solution / changelog)

Summary

Changes

Validation

Changed files

Code Example

Question / possible bug

Example

Why I’m asking

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING