hermes - 💡(How to fix) Fix Context compression triggers every turn due to rough token estimate inflating last_prompt_tokens [1 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

File: agent/conversation_compression.py:418

After compression, last_prompt_tokens is set to estimate_request_tokens_rough(compressed_messages) — a rough estimate that overcounts by 30-50% for tool-heavy sessions because tool schemas are counted twice (once in content/4, once as explicit overhead in model_metadata.py:1806).

Mechanism:

  1. Compress finishes → last_prompt_tokens = estimate_request_tokens_rough(compressed) ≈ 177K (overestimated)
  2. User sends one more message (only 1-2 new messages)
  3. Preflight: last_prompt_tokens > 0, so the rough overestimated value is used directly in should_compress() (conversation_loop.py:3303-3309)
  4. should_compress(177K) fires because the value exceeds threshold
  5. Anti-thrashing protection (context_compressor.py:612) requires 2 consecutive compressions with savings < 10%, but the loop triggers before this limit is hit

Fix Action

Fixed

RAW_BUFFERClick to expand / collapse

Bug Description

Context compression fires repeatedly every 1-2 turns, even immediately after compression just ran. This creates a compression loop where sessions never stabilize.

Steps to Reproduce

  1. Start a session with many tools enabled (50+ tool schemas)
  2. Send a few messages — after the first compression
  3. Send one more message
  4. Observe: compression fires again immediately (savings < 10%)

Root Cause

File: agent/conversation_compression.py:418

After compression, last_prompt_tokens is set to estimate_request_tokens_rough(compressed_messages) — a rough estimate that overcounts by 30-50% for tool-heavy sessions because tool schemas are counted twice (once in content/4, once as explicit overhead in model_metadata.py:1806).

Mechanism:

  1. Compress finishes → last_prompt_tokens = estimate_request_tokens_rough(compressed) ≈ 177K (overestimated)
  2. User sends one more message (only 1-2 new messages)
  3. Preflight: last_prompt_tokens > 0, so the rough overestimated value is used directly in should_compress() (conversation_loop.py:3303-3309)
  4. should_compress(177K) fires because the value exceeds threshold
  5. Anti-thrashing protection (context_compressor.py:612) requires 2 consecutive compressions with savings < 10%, but the loop triggers before this limit is hit

Key Files

FileLineRole
agent/model_metadata.py1806estimate_request_tokens_rough() — tool schemas counted twice
agent/conversation_compression.py418Sets last_prompt_tokens to rough (overestimated) value
agent/conversation_loop.py3303-3309Uses last_prompt_tokens directly for should_compress()
agent/context_compressor.py601-621should_compress() + anti-thrashing check

Suggested Fix

Option A (minimal, one line): After compression, set last_prompt_tokens = 0 instead of the rough estimate. Forces the next cycle to wait for actual API-reported prompt_tokens.

Option B (guard): Add a minimum-message-count check in preflight before evaluating compression eligibility (e.g., only check when ≥ 20 messages since last compression).


Reported by Hermes Agent user, root cause traced via systematic-debugging skill.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix Context compression triggers every turn due to rough token estimate inflating last_prompt_tokens [1 pull requests]