
hermes - 💡(How to fix) Fix [Bug]: Compression token savings ignored when message count is unchanged, causing false context exhaustion [2 pull requests]

hermes · 2026-06-05 03:54:37


Auto/preflight context compression can materially reduce token/request size while leaving the message count unchanged. However, the conversation loop treats len(messages) >= _orig_len as "cannot compress further" and returns a context-exhaustion failure. In gateway sessions this can auto-reset an otherwise viable long-context session even when the post-compression token count is far below the model context window.

Error Message

Context length exceeded: 288,028 tokens. Cannot compress further.

This message misleads users into thinking GPT-5.5 cannot handle ~288k tokens, when the actual failure is in Hermes' compression bookkeeping.

Root Cause

The model context window was 1,000,000 tokens. The compressed request estimate was ~183,180 tokens, well below the configured threshold and far below the model limit, but the session still failed because the message count did not decrease.

Fix Action

Fixed

Fixed by PR: fix(agent): count tokens, not just rows, as preflight compression progress (https://github.com/NousResearch/hermes-agent/pull/39574)
Fixed by PR: fix(agent): improve context compression success criterion to include token reduction (https://github.com/NousResearch/hermes-agent/pull/39673)

Real observed failure

Main/default Telegram profile using GPT-5.5 with explicit 1M context:

model.default: gpt-5.5
model.provider: openai-codex
model.context_length: 1000000
compression.threshold: 0.35
compression.hygiene_hard_message_limit: 220 (at the time of the incident)

Logs:

2026-06-04 23:39:57 context compression started: session=20260604_131128_cd6624 messages=220 tokens=~288,028 model=gpt-5.5 focus=None
2026-06-04 23:41:38 context compression done: session=20260604_234138_2f8759 messages=220->220 tokens=~183,180
2026-06-04 23:41:38 Context length exceeded: 288,028 tokens. Cannot compress further.
2026-06-04 23:41:38 Auto-resetting session 20260604_131128_cd6624 after compression exhaustion.
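The numbers in these log lines can be checked directly. The following is a minimal sketch that parses the two compression log lines shown above. Its regular expressions match only this log format. It then compares the token estimates against a budget computed as `context_length * threshold`. That budget formula is an assumption about how `compression.threshold` is applied; the issue does not spell it out.

```python
import re

# The two compression log lines from the incident, copied verbatim.
STARTED = ("2026-06-04 23:39:57 context compression started: "
           "session=20260604_131128_cd6624 messages=220 tokens=~288,028 "
           "model=gpt-5.5 focus=None")
DONE = ("2026-06-04 23:41:38 context compression done: "
        "session=20260604_234138_2f8759 messages=220->220 tokens=~183,180")

# Profile settings from the incident.
CONTEXT_LENGTH = 1_000_000
THRESHOLD = 0.35
HYGIENE_HARD_MESSAGE_LIMIT = 220


def parse_tokens(line: str) -> int:
    """Extract the '~N,NNN' token estimate from a compression log line."""
    match = re.search(r"tokens=~([\d,]+)", line)
    if match is None:
        raise ValueError(f"no token estimate in: {line!r}")
    return int(match.group(1).replace(",", ""))


def parse_rows(done_line: str) -> tuple[int, int]:
    """Extract (before, after) message counts from a 'compression done' line."""
    match = re.search(r"messages=(\d+)->(\d+)", done_line)
    if match is None:
        raise ValueError(f"no message counts in: {done_line!r}")
    return int(match.group(1)), int(match.group(2))


def summarize(started: str, done: str) -> dict:
    """Compare row counts and token estimates before and after compression."""
    old_tokens, new_tokens = parse_tokens(started), parse_tokens(done)
    old_rows, new_rows = parse_rows(done)
    return {
        "rows": (old_rows, new_rows),
        "tokens": (old_tokens, new_tokens),
        "saved_tokens": old_tokens - new_tokens,
        "saved_pct": 100 * (old_tokens - new_tokens) / old_tokens,
        # Assumption: the token trigger is context_length * threshold.
        "assumed_token_budget": round(CONTEXT_LENGTH * THRESHOLD),
        "at_hygiene_limit": old_rows >= HYGIENE_HARD_MESSAGE_LIMIT,
    }


report = summarize(STARTED, DONE)
for key, value in report.items():
    print(f"{key}: {value}")
```

On these lines the sketch reports 104,848 saved tokens (about 36.4%), with the row count unchanged at 220. Under the assumed budget of 350,000 tokens, neither estimate crosses the token threshold, while the row count already equals the hygiene limit. This suggests that message-count hygiene, not token pressure, triggered this pass.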

Why this is wrong

Compression can succeed by reducing content/tool-result size without reducing the number of message objects. In this case it saved roughly 105k tokens (~36%) but preserved 220 message rows.

The conversation loop appears to use message-count reduction as the success criterion:

_orig_len = len(messages)
messages, active_system_prompt = agent._compress_context(...)
if len(messages) >= _orig_len:
    break  # Cannot compress further

That conflates two different conditions:

1. No-op compression: the transcript is materially unchanged.
2. Effective token compression: the same number of rows, but a much smaller request.

Only (1) should be treated as compression exhaustion.
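The distinction between the two conditions can be sketched as a single check. The name `classify_pass` and the 5% `MATERIAL_SAVINGS` cutoff are hypothetical, because the issue says "materially" without giving a number. A pass counts as a no-op only when neither the row count nor the token estimate moved materially.

```python
MATERIAL_SAVINGS = 0.05  # hypothetical: the issue says "materially" without a number


def classify_pass(old_rows: int, new_rows: int,
                  old_tokens: int, new_tokens: int) -> str:
    """Label one compression pass as condition (1) 'no_op' or (2) 'effective'."""
    rows_reduced = new_rows < old_rows
    tokens_reduced = (
        old_tokens > 0
        and old_tokens - new_tokens >= MATERIAL_SAVINGS * old_tokens
    )
    if rows_reduced or tokens_reduced:
        return "effective"
    return "no_op"  # only this outcome should count as exhaustion


# The incident: 220 -> 220 rows, ~288,028 -> ~183,180 tokens.
print(classify_pass(220, 220, 288_028, 183_180))
```

For the incident numbers this returns `"effective"`, whereas a check based only on row count would have treated the same pass as exhaustion.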

Expected behavior

Compression success should be evaluated by material request-size reduction and the active trigger reason, not only by message count.

Examples:

If compression reduced estimated request tokens below threshold, continue.
If compression reduced estimated request tokens materially but not enough, allow another pass or report token pressure accurately.
If the trigger was message-count hygiene, either run a mode that actually reduces effective message rows or emit a specific message-count-hygiene no-op reason.
Do not report "Context length exceeded" when the model context is 1M and the post-compression estimate is ~183k.
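The expected behavior can be read as a small decision rule. The following sketch is hypothetical: `next_step`, its return labels, and the 5% `MATERIAL_SAVINGS` cutoff are illustrative. The trigger names are borrowed from the `trigger_reason` values the issue proposes. The sketch also assumes that the hygiene trigger fires when the row count reaches the hard limit.

```python
MATERIAL_SAVINGS = 0.05  # hypothetical cutoff for a "material" token reduction


def next_step(trigger: str, old_tokens: int, new_tokens: int,
              token_budget: int, old_rows: int, new_rows: int,
              row_limit: int) -> str:
    """Pick the follow-up after one compression pass (illustrative labels)."""
    if trigger == "message_hygiene":
        if new_rows < row_limit:
            return "continue"
        # Rows were not reduced: give a hygiene-specific reason, not a token error.
        return "hygiene_no_op"
    if new_tokens < token_budget:
        return "continue"  # below threshold, regardless of row count
    if old_tokens - new_tokens >= MATERIAL_SAVINGS * old_tokens:
        return "another_pass"  # real progress, but not enough yet
    return "report_token_pressure"
```

Assuming the incident was hygiene-triggered and using an assumed 350,000-token budget, `next_step("message_hygiene", 288_028, 183_180, 350_000, 220, 220, 220)` returns `"hygiene_no_op"`. That is a reason tied to the row limit, rather than "Context length exceeded".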

Actual behavior

The loop treats unchanged message count as compression failure/exhaustion even when token pressure was substantially improved. The gateway then auto-resets the session.

Related issues

This is adjacent to, but not fully covered by:

#6202 — /compress can report success even when transcript is unchanged
#15195 — gateway hygiene hard message cap counts tool rows in tool-heavy Telegram sessions
#12626 — gateway auto-compacts below token pressure due to message count
#35809 — compression exhaustion / auto-reset loop

This issue is specifically about the success criterion after compression: same message count does not imply no compression.

Suggested fix direction

Return structured compression outcome metadata, e.g.:

changed_messages
old_message_count, new_message_count
old_request_tokens, new_request_tokens
token_savings_pct
trigger_reason (token_threshold, message_hygiene, manual, provider_413)
no_op_reason when applicable
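The following is one possible shape for that metadata, written as a hypothetical dataclass rather than anything Hermes defines. Several choices here are assumptions:

- `changed_messages` is modeled as a flag saying whether any row's content changed.
- `token_savings_pct` is derived from the token counts rather than stored, so it cannot drift out of sync with them.
- The incident's `trigger_reason` is not recorded in the logs.

```python
from dataclasses import dataclass
from typing import Literal, Optional

TriggerReason = Literal["token_threshold", "message_hygiene", "manual", "provider_413"]


@dataclass(frozen=True)
class CompressionOutcome:
    """Hypothetical outcome record mirroring the suggested fields."""
    changed_messages: bool  # assumption: True if any row's content changed
    old_message_count: int
    new_message_count: int
    old_request_tokens: int
    new_request_tokens: int
    trigger_reason: TriggerReason
    no_op_reason: Optional[str] = None

    @property
    def token_savings_pct(self) -> float:
        """Percentage of request tokens removed by this pass."""
        if self.old_request_tokens == 0:
            return 0.0
        saved = self.old_request_tokens - self.new_request_tokens
        return 100.0 * saved / self.old_request_tokens


incident = CompressionOutcome(
    changed_messages=True,
    old_message_count=220, new_message_count=220,
    old_request_tokens=288_028, new_request_tokens=183_180,
    trigger_reason="message_hygiene",  # assumed; the logs do not record it
)
print(f"{incident.token_savings_pct:.1f}%")
```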

Minimum viable fix:

Re-estimate request tokens immediately after _compress_context(...).
Treat compression as successful if request tokens decreased materially and/or fell below threshold, even if len(messages) is unchanged.
Only set compression_exhausted=True when both message count and request size are materially unchanged, or when post-compression request size still exceeds provider/model limits after max passes.
Update the error text to distinguish token overflow from message-count hygiene exhaustion.
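The minimum viable fix can be sketched as a replacement for the `if len(messages) >= _orig_len: break` check. The sketch makes several assumptions:

- `compress` and `estimate_tokens` are hypothetical stand-ins for `agent._compress_context(...)` and whatever token estimator Hermes uses.
- `compress` is simplified to return only the messages, not `(messages, active_system_prompt)`.
- `MAX_PASSES` and the 5% cutoff are illustrative.

```python
from typing import Callable

Message = dict  # stand-in for whatever row type the agent uses
MATERIAL_SAVINGS = 0.05  # hypothetical cutoff for a "material" reduction
MAX_PASSES = 3           # hypothetical pass limit


class CompressionExhausted(Exception):
    """Raised only when compression genuinely cannot help."""


def compress_until_fits(
    messages: list[Message],
    compress: Callable[[list[Message]], list[Message]],
    estimate_tokens: Callable[[list[Message]], int],
    token_budget: int,
    model_limit: int,
) -> list[Message]:
    """Compress until the request fits, judging progress by tokens as well as rows."""
    for _ in range(MAX_PASSES):
        old_rows, old_tokens = len(messages), estimate_tokens(messages)
        messages = compress(messages)
        # Re-estimate right after compressing instead of trusting the row count.
        new_rows, new_tokens = len(messages), estimate_tokens(messages)
        if new_tokens <= token_budget:
            return messages  # success, even when new_rows == old_rows
        rows_not_reduced = new_rows >= old_rows
        tokens_not_reduced = old_tokens - new_tokens < MATERIAL_SAVINGS * old_tokens
        if rows_not_reduced and tokens_not_reduced:
            # Neither rows nor size moved: a true no-op, reported as such.
            raise CompressionExhausted(
                f"compression made no progress: {new_rows} rows, ~{new_tokens:,} tokens"
            )
    if new_tokens > model_limit:
        # Only now is a context-length error accurate.
        raise CompressionExhausted(
            f"context length exceeded after {MAX_PASSES} passes: ~{new_tokens:,} tokens"
        )
    return messages  # over the compression threshold but within the model limit


# Toy usage: "tokens" are characters; compression halves each row but keeps every row.
def estimate(msgs: list[Message]) -> int:
    return sum(len(m["content"]) for m in msgs)


def halve(msgs: list[Message]) -> list[Message]:
    return [{"content": m["content"][: len(m["content"]) // 2]} for m in msgs]


history = [{"content": "x" * 100} for _ in range(4)]
result = compress_until_fits(history, halve, estimate, token_budget=250, model_limit=1000)
print(len(history), "->", len(result), "rows;", estimate(history), "->", estimate(result), "tokens")
```

In the toy run, the row count stays at 4 while the estimate drops from 400 to 200, so the request is accepted. The original row-count check would have stopped at this point. The two error strings keep "no progress" separate from a genuine context-length overflow. A hygiene-specific message would slot in the same way.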

Impact

Long-context models with 1M windows can be reset at roughly 200-400 raw transcript rows even when token usage is far below the context window.
Tool-heavy Telegram sessions are especially vulnerable because tool calls/results inflate raw row count.
The current error message misleads users into thinking GPT-5.5 cannot handle ~288k tokens, when the failure is Hermes' compression bookkeeping.
