hermes - ✅(Solved) Fix [Bug]: Gateway can silently auto-compact at 400 messages even when /usage shows low context pressure [1 pull requests, 1 participants]

SJY051 · 2026-04-19T17:04:50Z

GitHub stats

NousResearch/hermes-agent#12626•Fetched 2026-04-20 12:17:51

Error Message

Warn when auto-compaction is triggered by message count rather than token pressure

Root Cause

From local investigation:

/usage renders context_compressor.last_prompt_tokens / context_length, i.e. the last prompt-token-based view of context pressure.
gateway/run.py also has a separate session hygiene path that runs before the agent turn.
That hygiene path uses its own thresholding and includes a hard _HARD_MSG_LIMIT = 400 safety valve.
So a gateway session can auto-compact at low token percentage if it crosses the message-count limit.

This is understandable as a survivability mechanism (see related issue #2153), but it is not surfaced clearly enough to users.
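The interaction between the two triggers can be sketched as follows. This is a minimal illustration, not the code in `gateway/run.py`: the function name, the `HygieneDecision` type, the order of the checks, and the use of `>=` comparisons are all assumptions. Only the 400-message limit and the 85% token threshold come from the issue and its logs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HygieneDecision:
    compact: bool
    reason: Optional[str]  # "tokens", "messages", or None


def hygiene_decision(message_count, approx_tokens, context_length,
                     token_threshold=0.85, hard_msg_limit=400):
    """Hypothetical pre-turn check: either trigger alone causes compaction."""
    # Trigger 1: token pressure relative to the model's context window.
    if approx_tokens >= token_threshold * context_length:
        return HygieneDecision(True, "tokens")
    # Trigger 2: message-count safety valve, independent of token pressure.
    if message_count >= hard_msg_limit:
        return HygieneDecision(True, "messages")
    return HygieneDecision(False, None)


# Values from the reported session: 406 messages, ~365,433 tokens.
decision = hygiene_decision(406, 365_433, 1_050_000)
print(decision)
```

With the reported values, only the message-count branch fires, which is why `/usage` (a token-based view) gave no hint that compaction was imminent.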

Fix Action

Fixed

Fixed by PR: feat(gateway): make session hygiene message-limit configurable and observable (https://github.com/NousResearch/hermes-agent/pull/12874)

PR fix notes

PR #12874: feat(gateway): make session hygiene message-limit configurable and observable

Repository: NousResearch/hermes-agent
Author: sontianye
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/12874

Description (problem / solution / changelog)

Summary

Fixes #12626.

Gateway session hygiene has a hard-coded _HARD_MSG_LIMIT = 400 safety valve that silently auto-compresses transcripts when message count is reached — regardless of token pressure. Users see /usage reporting 34% context usage and are surprised by unexpected compaction. Three changes fix this:

- Configurable limit: reads gateway.session_hygiene.max_messages from config.yaml (default: 400, 0 = disabled). Fully backward-compatible.
- User-visible notice: when compaction fires due to message count (not token pressure), the platform receives a message explaining what happened, the actual token pressure %, and how to adjust the config.
- /usage shows message count: both the live-agent and fallback branches now show Messages: N / limit (pct%) with a ⚠️ warning at 90%.

Changes

| File | Change |
|---|---|
| `gateway/run.py` | Read limit from config; track `_hygiene_reason`; send compaction notice; add message count to `/usage` (both branches) |
| `cli-config.yaml.example` | Document `gateway.session_hygiene.max_messages` |
| `tests/gateway/test_hygiene_observability.py` | 11 unit tests: config parsing, hygiene reason, `/usage` output |

Config

# gateway.session_hygiene.max_messages controls the message-count safety valve.
# Default: 400. Set to 0 to disable.
gateway:
  session_hygiene:
    max_messages: 400
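
One way the key could be read with the fallback behavior the PR describes (default 400 when absent, 0 disables, invalid values fall back to 400) is sketched below. The function name is hypothetical and the handling of negative numbers and booleans is an assumption; the PR text does not say how those are treated.

```python
DEFAULT_MAX_MESSAGES = 400


def resolve_max_messages(config):
    """Read gateway.session_hygiene.max_messages from an already-parsed config dict."""
    try:
        raw = config["gateway"]["session_hygiene"]["max_messages"]
    except (KeyError, TypeError):
        # Key absent, or an intermediate section is missing/None.
        return DEFAULT_MAX_MESSAGES
    if isinstance(raw, bool):
        # Assumption: YAML true/false is treated as invalid, not as 1/0.
        return DEFAULT_MAX_MESSAGES
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return DEFAULT_MAX_MESSAGES
    # Assumption: negative values count as invalid; 0 means "valve disabled".
    return value if value >= 0 else DEFAULT_MAX_MESSAGES


print(resolve_max_messages({"gateway": {"session_hygiene": {"max_messages": 1000}}}))
```

Callers would then skip the message-count check entirely when the resolved value is 0.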

User-facing behavior

/usage (new):

📊 Session Token Usage
Model: gpt-5.4
...
Context: 360,077 / 1,050,000 (34%)
Compressions: 0
Messages: 362 / 400 (90%)
⚠️ Approaching message limit — session may auto-compact soon
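
A rough sketch of how the `Messages:` line and the 90% warning could be produced is shown here. The function name, the floor-based percentage, and the "limit disabled" wording for a limit of 0 are assumptions; only the line format and the warning text are taken from the sample output above.

```python
WARNING = "⚠️ Approaching message limit \u2014 session may auto-compact soon"


def format_message_usage(message_count, max_messages, warn_pct=90):
    """Return the extra /usage lines for the message-count valve."""
    if max_messages <= 0:
        # Assumption: how a disabled valve is displayed is not specified in the PR.
        return [f"Messages: {message_count} (limit disabled)"]
    pct = message_count * 100 // max_messages  # integer percentage, rounded down
    lines = [f"Messages: {message_count} / {max_messages} ({pct}%)"]
    if pct >= warn_pct:
        lines.append(WARNING)
    return lines


print("\n".join(format_message_usage(362, 400)))
```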

Compaction notice (new, message-count triggered only):

ℹ️ Session auto-compacted — your conversation reached 406 messages (limit: 400).
Earlier context was summarized to keep things running smoothly. Token pressure
was only 34% of the context window.
To adjust the limit: set gateway.session_hygiene.max_messages in config.yaml.
Set to 0 to disable the message-count valve.
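
The notice is only meant for message-count-triggered compaction, so a builder along these lines would return nothing for token-triggered runs. This is an illustrative sketch: the function name and the `reason` values are hypothetical (the PR mentions a `_hygiene_reason` field but not its possible values), and the percentage is rounded down by assumption.

```python
def compaction_notice(reason, message_count, max_messages, approx_tokens, context_length):
    """Build the user-facing notice, or None when token pressure was the trigger."""
    if reason != "messages":
        return None
    pct = approx_tokens * 100 // context_length  # token pressure at compaction time
    return "\n".join([
        f"ℹ️ Session auto-compacted \u2014 your conversation reached "
        f"{message_count} messages (limit: {max_messages}).",
        "Earlier context was summarized to keep things running smoothly. "
        f"Token pressure was only {pct}% of the context window.",
        "To adjust the limit: set gateway.session_hygiene.max_messages in config.yaml.",
        "Set to 0 to disable the message-count valve.",
    ])


print(compaction_notice("messages", 406, 400, 365_433, 1_050_000))
```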

Test plan

- pytest tests/gateway/test_hygiene_observability.py: 11 passed
- Default limit (400) preserved when config key absent
- Custom limit respected; 0 disables the valve
- Invalid config values fall back to 400 gracefully
- /usage shows count/limit in both agent-live and fallback branches
- 90% warning appears correctly

Changed files

cli-config.yaml.example (modified, +11/-1)
gateway/run.py (modified, +71/-10)
tests/gateway/test_hygiene_observability.py (added, +176/-0)

Bug Description

In a long-lived Discord gateway DM, Hermes auto-compacted the session even though /usage showed only 34% context usage (360,077 / 1,050,000).

The compaction turned out to be triggered by gateway session hygiene's hard 400-message safety valve, not by token pressure. In my case the session had 406 messages, so the gateway compacted it before the next turn.

The surprising part is that this is effectively invisible from the user-facing gateway UX:

- /usage reports low context pressure, so the user expects the session to be safe
- there is no warning that message-count-based hygiene is about to fire
- the auto-compaction can interrupt an important ongoing workflow and abruptly change the session's continuity shape

This may be intentional behavior for gateway survivability, but if so it still needs explicit visibility, documentation, and configurability.

Steps to Reproduce

1. Run Hermes via a gateway platform (observed on Discord DM).
2. Keep using one long-lived session until the transcript reaches 400+ messages.
3. Check /usage shortly before the next inbound message. In my case it showed:
   - Context: 360,077 / 1,050,000 (34%)
4. Send the next normal user message.
5. Observe that gateway session hygiene auto-compacts the session even though token usage is still far below the reported context threshold.
6. Check /usage again after the response. In my case it dropped to:
   - Context: 50,007 / 1,050,000 (5%)

Expected Behavior

If gateway session hygiene can compact based on message count independently of token pressure, users should be able to see that clearly.

At minimum, Hermes should do one or more of the following:

1. Expose hygiene triggers in /usage and /status
   - current message count
   - hard message limit (if enabled)
   - whether the next inbound turn is likely to trigger hygiene compaction
2. Warn when auto-compaction is triggered by message count rather than token pressure
   - e.g. “Gateway session hygiene compacted this session because it exceeded the 400-message safety limit, even though token usage was still below the model threshold.”
3. Make the hard message limit configurable
   - config key and/or environment variable
   - personal DM workflows may prefer a much higher limit than public/group gateways
4. Document this behavior explicitly
   - if this is intended design, it should still be discoverable in docs and status commands

Actual Behavior

Gateway session hygiene silently compacts based on message count, while /usage only shows the last prompt-token-based view of context pressure.

This makes the compaction feel random from the user side.

Observed user-facing /usage snapshots:

📊 Session Token Usage
Model: gpt-5.4
Input tokens: 11,275,580
Cache read tokens: 2,227,840
Output tokens: 27,100
Total: 13,530,520
API calls: 42
Cost: $0.0000
Context: 360,077 / 1,050,000 (34%)

📊 Session Token Usage
Model: gpt-5.4
Input tokens: 13,088,788
Cache read tokens: 2,279,040
Output tokens: 30,556
Total: 15,398,384
API calls: 48
Cost: $0.0000
Context: 50,007 / 1,050,000 (5%)

Relevant log evidence:

2026-04-20 01:45:04,833 INFO gateway.run: Session hygiene: 406 messages, ~365,433 tokens (actual) — auto-compressing (threshold: 85% of 1,050,000 = 892,500 tokens)
2026-04-20 01:47:10,331 INFO gateway.run: Session hygiene: compressed 406 → 7 msgs, ~365,433 → ~8,346 tokens
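
The first log line already contains enough numbers to confirm that the token threshold was not the trigger. The snippet below is a small check written for this page, not part of Hermes; the regular expression is an assumption about the log format based on this single sample line.

```python
import re

LOG_LINE = (
    "2026-04-20 01:45:04,833 INFO gateway.run: Session hygiene: 406 messages, "
    "~365,433 tokens (actual) \u2014 auto-compressing "
    "(threshold: 85% of 1,050,000 = 892,500 tokens)"
)

PATTERN = re.compile(
    r"Session hygiene: (?P<msgs>\d+) messages, ~(?P<tokens>[\d,]+) tokens"
    r".*threshold: (?P<pct>\d+)% of (?P<ctx>[\d,]+) = (?P<limit>[\d,]+) tokens"
)


def to_int(text):
    """Parse an integer that uses commas as thousands separators."""
    return int(text.replace(",", ""))


m = PATTERN.search(LOG_LINE)
msgs = int(m["msgs"])
tokens, ctx, limit = to_int(m["tokens"]), to_int(m["ctx"]), to_int(m["limit"])

# The logged threshold is consistent: 85% of 1,050,000 is 892,500.
assert limit == ctx * int(m["pct"]) // 100

token_trigger = tokens >= limit   # False: 365,433 is well below 892,500
message_trigger = msgs >= 400     # True: 406 exceeds the hard limit
print(f"token pressure {tokens / ctx:.1%}, "
      f"token_trigger={token_trigger}, message_trigger={message_trigger}")
```

Note that the log reports the token threshold it would have used, which makes the line easy to misread as a token-triggered compaction.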

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)
Agent Core (conversation loop, context compression, memory)
Configuration (config.yaml, .env, hermes setup)

Messaging Platform (if gateway-related)

Discord

Debug Report

I ran hermes debug share --local and am including the relevant environment summary here instead of a paste link:

version:          0.10.0 (2026.4.16) [d0e1388c]
os:               Linux 6.6.87.2-microsoft-standard-WSL2 x86_64
python:           3.11.15
profile:          default
model:            gpt-5.4
provider:         openai-codex
terminal:         local
memory_provider:  holographic
platforms:        discord
compression.threshold: 0.95

Operating System

Ubuntu 24.04.4 LTS (Noble Numbat) under WSL2

Python Version

3.11.15

Hermes Version

Hermes Agent v0.10.0 (2026.4.16)

Root Cause Analysis

From local investigation:

/usage renders context_compressor.last_prompt_tokens / context_length, i.e. the last prompt-token-based view of context pressure.
gateway/run.py also has a separate session hygiene path that runs before the agent turn.
That hygiene path uses its own thresholding and includes a hard _HARD_MSG_LIMIT = 400 safety valve.
So a gateway session can auto-compact at low token percentage if it crosses the message-count limit.

This is understandable as a survivability mechanism (see related issue #2153), but it is not surfaced clearly enough to users.

Proposed Fix

I think the safest and least disruptive improvement would be:

- Keep the safety valve, but make it visible
- Add message-count / hygiene state to /usage and /status
- Emit a post-compaction reason when hygiene fired due to message count
- Add a config/env override for the hard message limit, for example:
  - gateway.session_hygiene.max_messages
  - or HERMES_GATEWAY_HARD_MSG_LIMIT
- Document that gateway continuity may compact on message count even when token pressure looks low
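
If both overrides were offered, a precedence rule would be needed. The sketch below assumes the environment variable wins over the config key; that ordering, the function names, and the treatment of invalid values are assumptions made for illustration. The linked PR describes only the config key, and `HERMES_GATEWAY_HARD_MSG_LIMIT` is just the name proposed in this issue.

```python
import os

DEFAULT_HARD_MSG_LIMIT = 400


def _parse_limit(raw):
    """Return a non-negative int, or None when the value is missing or invalid."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return None
    return value if value >= 0 else None


def resolve_hard_msg_limit(config, environ=os.environ):
    """Hypothetical resolution order: env var, then config key, then default."""
    env_value = _parse_limit(environ.get("HERMES_GATEWAY_HARD_MSG_LIMIT"))
    if env_value is not None:
        return env_value
    section = ((config or {}).get("gateway") or {}).get("session_hygiene") or {}
    cfg_value = _parse_limit(section.get("max_messages"))
    return cfg_value if cfg_value is not None else DEFAULT_HARD_MSG_LIMIT


cfg = {"gateway": {"session_hygiene": {"max_messages": 800}}}
print(resolve_hard_msg_limit(cfg, environ={"HERMES_GATEWAY_HARD_MSG_LIMIT": "2000"}))
```

Passing `environ` explicitly keeps the function easy to test without mutating the real process environment.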

Related Context

This is related to, but not the same as:

#2153 — gateway context-compression death spiral; contributor added the 400-message hard safety valve
#7317 — gateway-visible context/compression observability gap
#10617 — proposal to expose richer context composition / usage information

My issue here is specifically the user-facing surprise / observability / configurability gap: a gateway session can be compacted in the middle of important work even though /usage appears to say “only 34% used.”

extent analysis

TL;DR

To address the issue of silent session compaction based on message count, make the hard message limit visible and configurable, and add message-count-based hygiene state to /usage and /status commands.

Guidance

- Expose hygiene triggers: Add current message count, hard message limit, and likelihood of triggering hygiene compaction to /usage and /status commands for better user visibility.
- Warn on auto-compaction: Emit a post-compaction reason when hygiene fires due to message count, informing users why the session was compacted.
- Make the hard message limit configurable: Introduce a config key or environment variable (e.g., gateway.session_hygiene.max_messages or HERMES_GATEWAY_HARD_MSG_LIMIT) to allow users to adjust the message limit according to their needs.
- Document the behavior: Clearly document in the official documentation that gateway sessions can compact based on message count, even when token pressure is low, to manage user expectations.

Example

The issue itself includes no code snippet. The linked PR #12874 implements the change in gateway/run.py and documents the new key in cli-config.yaml.example.

Notes

The proposed fix aims to improve user visibility and control over session compaction without removing the safety valve, which is intended for survivability. The changes should be tested to ensure they do not introduce unintended behavior, especially regarding the configurability of the hard message limit.

Recommendation

PR #12874 was still open and unmerged when this page was fetched, so in the reported version (v0.10.0) the 400-message limit remains hard-coded. Once the PR lands, set gateway.session_hygiene.max_messages in config.yaml to raise the limit, or to 0 to disable the valve, and use the Messages line in /usage to see how close a session is to message-count compaction.
