hermes - ✅(Solved) Fix BUG: Post-compression token estimate excludes tools schema, delaying next compression cycle [2 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#14695Fetched 2026-04-24 06:15:11
View on GitHub
Comments
0
Participants
1
Timeline
5
Reactions
0
Participants
Timeline (top)
labeled ×3cross-referenced ×2

Root Cause

In run_agent.py _compress_context(), the post-compression token estimate uses estimate_tokens_rough() + estimate_messages_tokens_rough(), which only count message content:

# Lines 7598-7602
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)
self.context_compressor.last_prompt_tokens = _compressed_est

However, estimate_messages_tokens_rough() is defined as:

def estimate_messages_tokens_rough(messages):
    total_chars = sum(len(str(msg)) for msg in messages)
    return (total_chars + 3) // 4

This completely omits the tools schema, which estimate_request_tokens_rough() includes:

def estimate_request_tokens_rough(messages, *, system_prompt="", tools=None):
    total_chars = 0
    if system_prompt:
        total_chars += len(system_prompt)
    if messages:
        total_chars += sum(len(str(msg)) for msg in messages)
    if tools:
        total_chars += len(str(tools))  # <-- This is the missing piece
    return (total_chars + 3) // 4

The codebase itself documents this issue in the docstring: "With 50+ tools enabled, schemas alone can add 20-30K tokens — a significant blind spot when only counting messages."

Fix Action

Fix

Replace the post-compression estimate with estimate_request_tokens_rough(), which already exists in the codebase and includes tools schema:

# Before:
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)

estimate_request_tokens_rough is already imported in run_agent.py (line 90) and used for preflight compression checks (lines 8852, 8898), so this is consistent with existing patterns.

PR fix notes

PR #14696: fix(compression): three bugs causing auto-compression to never trigger

Description (problem / solution / changelog)

Summary

Fixes three bugs in the context auto-compression system that collectively cause compression to never trigger for models with context_length at or near MINIMUM_CONTEXT_LENGTH (64000 tokens).

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Closes #14690

When context_length == MINIMUM_CONTEXT_LENGTH == 64000, the floor value in threshold_tokens calculation dominates:

# Before: max(44800, 64000) = 64000 = 100% of context → compression never triggers
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)

Fix: Fall back to percentage-based value when floor >= context_length:

if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Applied in both __init__ and update_model.

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Closes #14694

After 2 consecutive ineffective compressions (<10% savings each), should_compress() returns False forever. No timeout, decay, or auto-recovery mechanism exists.

Fix: Add time-based auto-recovery (300 seconds). If enough time has passed since the last compression attempt, reset the counter:

if self._ineffective_compression_count >= 2:
    _elapsed = time.monotonic() - self._last_compression_time
    if _elapsed > self._ANTI_THRASH_RECOVERY_SECONDS:
        self._ineffective_compression_count = 0
    else:
        return False

Bug 3: Post-compression token estimate excludes tools schema

Closes #14695

After compression, last_prompt_tokens is set using estimate_messages_tokens_rough() which omits tools schema tokens (20-30K with 50+ tools). This causes the next compression cycle to trigger much later than the configured threshold.

Fix: Use estimate_request_tokens_rough() which includes tools schema, consistent with the preflight compression check pattern:

# Before:
_compressed_est = estimate_tokens_rough(new_system_prompt) + estimate_messages_tokens_rough(compressed)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed, system_prompt=new_system_prompt or "", tools=self.tools or None,
)

Testing

Verified with unit-level tests:

  • Bug 1: context_length=64000, threshold=0.7threshold_tokens=44800 (70%), should_compress(44800)=True
  • Bug 2: Anti-thrashing blocks within 300s window, auto-recovers after 300s elapsed
  • Bug 3: estimate_request_tokens_rough includes tools schema in token count

Files Changed

  • agent/context_compressor.py: Bug 1 fix (L320-321, L363-368) + Bug 2 fix (L299, L398-401, L418-436, L1283)
  • run_agent.py: Bug 3 fix (L7596-7607)

Changed files

  • agent/context_compressor.py (modified, +34/-8)
  • run_agent.py (modified, +8/-3)

PR #14882: fix: include tools in post-compression token estimate

Description (problem / solution / changelog)

Summary

  • use estimate_request_tokens_rough(...) when updating last_prompt_tokens after context compression
  • include tool schema tokens in the post-compression estimate so the next compression cycle is scheduled against the real request size
  • add a regression test covering _compress_context() with a large tool schema

Closes #14695

Testing

  • python3 -m pytest -o addopts= tests/run_agent/test_context_token_tracking.py

Changed files

  • run_agent.py (modified, +4/-3)
  • tests/run_agent/test_context_token_tracking.py (modified, +43/-0)

Code Example

# Lines 7598-7602
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)
self.context_compressor.last_prompt_tokens = _compressed_est

---

def estimate_messages_tokens_rough(messages):
    total_chars = sum(len(str(msg)) for msg in messages)
    return (total_chars + 3) // 4

---

def estimate_request_tokens_rough(messages, *, system_prompt="", tools=None):
    total_chars = 0
    if system_prompt:
        total_chars += len(system_prompt)
    if messages:
        total_chars += sum(len(str(msg)) for msg in messages)
    if tools:
        total_chars += len(str(tools))  # <-- This is the missing piece
    return (total_chars + 3) // 4

---

Compression completes
  → last_prompt_tokens = estimate WITHOUT tools schema (e.g., 25,000)
Actual API prompt_tokens = 50,000 (includes 25K tools schema)
Next turn: should_compress(25000) returns False
Context must grow another 20-30K tokens before compression triggers
User experiences delayed compression, potentially hitting context limit

---

# Before:
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)
RAW_BUFFERClick to expand / collapse

Bug Description

After context compression completes, last_prompt_tokens is set to a rough estimate that does not include tools schema tokens. With 50+ tools enabled, schemas alone can add 20-30K tokens. This causes the next compression cycle to trigger much later than the configured threshold, because should_compress() compares against an underestimate.

Root Cause

In run_agent.py _compress_context(), the post-compression token estimate uses estimate_tokens_rough() + estimate_messages_tokens_rough(), which only count message content:

# Lines 7598-7602
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)
self.context_compressor.last_prompt_tokens = _compressed_est

However, estimate_messages_tokens_rough() is defined as:

def estimate_messages_tokens_rough(messages):
    total_chars = sum(len(str(msg)) for msg in messages)
    return (total_chars + 3) // 4

This completely omits the tools schema, which estimate_request_tokens_rough() includes:

def estimate_request_tokens_rough(messages, *, system_prompt="", tools=None):
    total_chars = 0
    if system_prompt:
        total_chars += len(system_prompt)
    if messages:
        total_chars += sum(len(str(msg)) for msg in messages)
    if tools:
        total_chars += len(str(tools))  # <-- This is the missing piece
    return (total_chars + 3) // 4

The codebase itself documents this issue in the docstring: "With 50+ tools enabled, schemas alone can add 20-30K tokens — a significant blind spot when only counting messages."

Impact Chain

Compression completes
  → last_prompt_tokens = estimate WITHOUT tools schema (e.g., 25,000)
  → Actual API prompt_tokens = 50,000 (includes 25K tools schema)
  → Next turn: should_compress(25000) returns False
  → Context must grow another 20-30K tokens before compression triggers
  → User experiences delayed compression, potentially hitting context limit

For a 64K context model with 70% threshold (44,800 tokens):

  • After compression, actual tokens might be 50K (25K messages + 25K tools)
  • But last_prompt_tokens is set to 25K
  • Next compression won't trigger until last_prompt_tokens reaches 44,800
  • That means actual token usage must reach ~70K before compression fires — exceeding the 64K context limit

Fix

Replace the post-compression estimate with estimate_request_tokens_rough(), which already exists in the codebase and includes tools schema:

# Before:
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)

estimate_request_tokens_rough is already imported in run_agent.py (line 90) and used for preflight compression checks (lines 8852, 8898), so this is consistent with existing patterns.

Environment

  • Hermes Agent version: latest main (ce089169)
  • Model: Qwen3.6-35B-A3B (local llama.cpp, 64K per slot)
  • OS: Linux (ROCm)

extent analysis

TL;DR

Replace the post-compression token estimate in run_agent.py with estimate_request_tokens_rough() to include tools schema tokens.

Guidance

  • Identify the current implementation of estimate_tokens_rough() and estimate_messages_tokens_rough() in run_agent.py to understand the existing token estimation logic.
  • Verify that estimate_request_tokens_rough() is already imported and used in run_agent.py for preflight compression checks.
  • Update the post-compression token estimate to use estimate_request_tokens_rough() with the tools parameter to include tools schema tokens.
  • Test the updated implementation to ensure that last_prompt_tokens accurately reflects the total token count, including tools schema tokens.

Example

# Updated code snippet
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)

Notes

This fix assumes that estimate_request_tokens_rough() is correctly implemented and includes the tools schema tokens. Additionally, this solution may not apply to older versions of the Hermes Agent or different model configurations.

Recommendation

Apply the workaround by replacing the post-compression token estimate with estimate_request_tokens_rough() to ensure accurate token counting and prevent delayed compression.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING