hermes - 💡(How to fix) Fix Context compression triggers every turn due to rough token estimate inflating last_prompt_tokens [1 pull requests]

hermes2026-05-17 19:02:18

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Root Cause

File: agent/conversation_compression.py:418

After compression, last_prompt_tokens is set to estimate_request_tokens_rough(compressed_messages) — a rough estimate that overcounts by 30-50% for tool-heavy sessions because tool schemas are counted twice (once in content/4, once as explicit overhead in model_metadata.py:1806).

Mechanism:

Compress finishes → last_prompt_tokens = estimate_request_tokens_rough(compressed) ≈ 177K (overestimated)
User sends one more message (only 1-2 new messages)
Preflight: last_prompt_tokens > 0, so the rough overestimated value is used directly in should_compress() (conversation_loop.py:3303-3309)
should_compress(177K) fires because the value exceeds threshold
Anti-thrashing protection (context_compressor.py:612) requires 2 consecutive compressions with savings < 10%, but the loop triggers before this limit is hit

Fix Action

Fixed

Fixed by PR: fix(compression): prevent compression loop from rough token estimate inflation (#27566) (https://github.com/NousResearch/hermes-agent/pull/27624)

RAW_BUFFERClick to expand / collapse

Bug Description

Context compression fires repeatedly every 1-2 turns, even immediately after compression just ran. This creates a compression loop where sessions never stabilize.

Steps to Reproduce

Start a session with many tools enabled (50+ tool schemas)
Send a few messages — after the first compression
Send one more message
Observe: compression fires again immediately (savings < 10%)

Root Cause

File: agent/conversation_compression.py:418

Mechanism:

Compress finishes → last_prompt_tokens = estimate_request_tokens_rough(compressed) ≈ 177K (overestimated)
User sends one more message (only 1-2 new messages)
Preflight: last_prompt_tokens > 0, so the rough overestimated value is used directly in should_compress() (conversation_loop.py:3303-3309)
should_compress(177K) fires because the value exceeds threshold
Anti-thrashing protection (context_compressor.py:612) requires 2 consecutive compressions with savings < 10%, but the loop triggers before this limit is hit

Key Files

File	Line	Role
`agent/model_metadata.py`	1806	`estimate_request_tokens_rough()` — tool schemas counted twice
`agent/conversation_compression.py`	418	Sets `last_prompt_tokens` to rough (overestimated) value
`agent/conversation_loop.py`	3303-3309	Uses `last_prompt_tokens` directly for `should_compress()`
`agent/context_compressor.py`	601-621	`should_compress()` + anti-thrashing check

Suggested Fix

Option A (minimal, one line): After compression, set last_prompt_tokens = 0 instead of the rough estimate. Forces the next cycle to wait for actual API-reported prompt_tokens.

Option B (guard): Add a minimum-message-count check in preflight before evaluating compression eligibility (e.g., only check when ≥ 20 messages since last compression).

Reported by Hermes Agent user, root cause traced via systematic-debugging skill.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #device allocation #model download #tokenizer error #prompt formatting

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Context compression triggers every turn due to rough token estimate inflating last_prompt_tokens [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

Bug Description

Steps to Reproduce

Root Cause

Key Files

Suggested Fix

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix Context compression triggers every turn due to rough token estimate inflating last_prompt_tokens [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

Bug Description

Steps to Reproduce

Root Cause

Key Files

Suggested Fix

Still need to ship something?

RELATED_DISCOVERY

TRENDING