hermes - ✅(Solved) Fix BUG: Post-compression token estimate excludes tools schema, delaying next compression cycle [2 pull requests, 1 participants]

hermes2026-04-23 19:02:48

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#14695•Fetched 2026-04-24 06:15:11

View on GitHub

Comments

Participants

Timeline

Reactions

Author

devilardis

Participants

devilardis

Timeline (top)

labeled ×3cross-referenced ×2

Root Cause

In run_agent.py _compress_context(), the post-compression token estimate uses estimate_tokens_rough() + estimate_messages_tokens_rough(), which only count message content:

# Lines 7598-7602
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)
self.context_compressor.last_prompt_tokens = _compressed_est

However, estimate_messages_tokens_rough() is defined as:

def estimate_messages_tokens_rough(messages):
    total_chars = sum(len(str(msg)) for msg in messages)
    return (total_chars + 3) // 4

This completely omits the tools schema, which estimate_request_tokens_rough() includes:

def estimate_request_tokens_rough(messages, *, system_prompt="", tools=None):
    total_chars = 0
    if system_prompt:
        total_chars += len(system_prompt)
    if messages:
        total_chars += sum(len(str(msg)) for msg in messages)
    if tools:
        total_chars += len(str(tools))  # <-- This is the missing piece
    return (total_chars + 3) // 4

The codebase itself documents this issue in the docstring: "With 50+ tools enabled, schemas alone can add 20-30K tokens — a significant blind spot when only counting messages."

Fix Action

Fix

Replace the post-compression estimate with estimate_request_tokens_rough(), which already exists in the codebase and includes tools schema:

# Before:
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)

estimate_request_tokens_rough is already imported in run_agent.py (line 90) and used for preflight compression checks (lines 8852, 8898), so this is consistent with existing patterns.

PR fix notes

PR #14696: fix(compression): three bugs causing auto-compression to never trigger

Repository: NousResearch/hermes-agent
Author: devilardis
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14696

Description (problem / solution / changelog)

Summary

Fixes three bugs in the context auto-compression system that collectively cause compression to never trigger for models with context_length at or near MINIMUM_CONTEXT_LENGTH (64000 tokens).

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Closes #14690

When context_length == MINIMUM_CONTEXT_LENGTH == 64000, the floor value in threshold_tokens calculation dominates:

# Before: max(44800, 64000) = 64000 = 100% of context → compression never triggers
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)

Fix: Fall back to percentage-based value when floor >= context_length:

if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Applied in both __init__ and update_model.

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Closes #14694

After 2 consecutive ineffective compressions (<10% savings each), should_compress() returns False forever. No timeout, decay, or auto-recovery mechanism exists.

Fix: Add time-based auto-recovery (300 seconds). If enough time has passed since the last compression attempt, reset the counter:

if self._ineffective_compression_count >= 2:
    _elapsed = time.monotonic() - self._last_compression_time
    if _elapsed > self._ANTI_THRASH_RECOVERY_SECONDS:
        self._ineffective_compression_count = 0
    else:
        return False

Bug 3: Post-compression token estimate excludes tools schema

Closes #14695

After compression, last_prompt_tokens is set using estimate_messages_tokens_rough() which omits tools schema tokens (20-30K with 50+ tools). This causes the next compression cycle to trigger much later than the configured threshold.

Fix: Use estimate_request_tokens_rough() which includes tools schema, consistent with the preflight compression check pattern:

# Before:
_compressed_est = estimate_tokens_rough(new_system_prompt) + estimate_messages_tokens_rough(compressed)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed, system_prompt=new_system_prompt or "", tools=self.tools or None,
)

Testing

Verified with unit-level tests:

Bug 1: context_length=64000, threshold=0.7 → threshold_tokens=44800 (70%), should_compress(44800)=True
Bug 2: Anti-thrashing blocks within 300s window, auto-recovers after 300s elapsed
Bug 3: estimate_request_tokens_rough includes tools schema in token count

Files Changed

agent/context_compressor.py: Bug 1 fix (L320-321, L363-368) + Bug 2 fix (L299, L398-401, L418-436, L1283)
run_agent.py: Bug 3 fix (L7596-7607)

Changed files

agent/context_compressor.py (modified, +34/-8)
run_agent.py (modified, +8/-3)

PR #14882: fix: include tools in post-compression token estimate

Repository: NousResearch/hermes-agent
Author: LeonSGP43
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14882

Description (problem / solution / changelog)

Summary

use estimate_request_tokens_rough(...) when updating last_prompt_tokens after context compression
include tool schema tokens in the post-compression estimate so the next compression cycle is scheduled against the real request size
add a regression test covering _compress_context() with a large tool schema

Closes #14695

Testing

python3 -m pytest -o addopts= tests/run_agent/test_context_token_tracking.py

Changed files

run_agent.py (modified, +4/-3)
tests/run_agent/test_context_token_tracking.py (modified, +43/-0)

Code Example

# Lines 7598-7602
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)
self.context_compressor.last_prompt_tokens = _compressed_est

---

def estimate_messages_tokens_rough(messages):
    total_chars = sum(len(str(msg)) for msg in messages)
    return (total_chars + 3) // 4

---

def estimate_request_tokens_rough(messages, *, system_prompt="", tools=None):
    total_chars = 0
    if system_prompt:
        total_chars += len(system_prompt)
    if messages:
        total_chars += sum(len(str(msg)) for msg in messages)
    if tools:
        total_chars += len(str(tools))  # <-- This is the missing piece
    return (total_chars + 3) // 4

---

Compression completes
  → last_prompt_tokens = estimate WITHOUT tools schema (e.g., 25,000)
  → Actual API prompt_tokens = 50,000 (includes 25K tools schema)
  → Next turn: should_compress(25000) returns False
  → Context must grow another 20-30K tokens before compression triggers
  → User experiences delayed compression, potentially hitting context limit

---

# Before:
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)

RAW_BUFFERClick to expand / collapse

Bug Description

After context compression completes, last_prompt_tokens is set to a rough estimate that does not include tools schema tokens. With 50+ tools enabled, schemas alone can add 20-30K tokens. This causes the next compression cycle to trigger much later than the configured threshold, because should_compress() compares against an underestimate.

Root Cause

In run_agent.py _compress_context(), the post-compression token estimate uses estimate_tokens_rough() + estimate_messages_tokens_rough(), which only count message content:

# Lines 7598-7602
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)
self.context_compressor.last_prompt_tokens = _compressed_est

However, estimate_messages_tokens_rough() is defined as:

def estimate_messages_tokens_rough(messages):
    total_chars = sum(len(str(msg)) for msg in messages)
    return (total_chars + 3) // 4

This completely omits the tools schema, which estimate_request_tokens_rough() includes:

def estimate_request_tokens_rough(messages, *, system_prompt="", tools=None):
    total_chars = 0
    if system_prompt:
        total_chars += len(system_prompt)
    if messages:
        total_chars += sum(len(str(msg)) for msg in messages)
    if tools:
        total_chars += len(str(tools))  # <-- This is the missing piece
    return (total_chars + 3) // 4

The codebase itself documents this issue in the docstring: "With 50+ tools enabled, schemas alone can add 20-30K tokens — a significant blind spot when only counting messages."

Impact Chain

Compression completes
  → last_prompt_tokens = estimate WITHOUT tools schema (e.g., 25,000)
  → Actual API prompt_tokens = 50,000 (includes 25K tools schema)
  → Next turn: should_compress(25000) returns False
  → Context must grow another 20-30K tokens before compression triggers
  → User experiences delayed compression, potentially hitting context limit

For a 64K context model with 70% threshold (44,800 tokens):

After compression, actual tokens might be 50K (25K messages + 25K tools)
But last_prompt_tokens is set to 25K
Next compression won't trigger until last_prompt_tokens reaches 44,800
That means actual token usage must reach ~70K before compression fires — exceeding the 64K context limit

Fix

Replace the post-compression estimate with estimate_request_tokens_rough(), which already exists in the codebase and includes tools schema:

# Before:
_compressed_est = (
    estimate_tokens_rough(new_system_prompt)
    + estimate_messages_tokens_rough(compressed)
)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)

estimate_request_tokens_rough is already imported in run_agent.py (line 90) and used for preflight compression checks (lines 8852, 8898), so this is consistent with existing patterns.

Environment

Hermes Agent version: latest main (ce089169)
Model: Qwen3.6-35B-A3B (local llama.cpp, 64K per slot)
OS: Linux (ROCm)

extent analysis

TL;DR

Replace the post-compression token estimate in run_agent.py with estimate_request_tokens_rough() to include tools schema tokens.

Guidance

Identify the current implementation of estimate_tokens_rough() and estimate_messages_tokens_rough() in run_agent.py to understand the existing token estimation logic.
Verify that estimate_request_tokens_rough() is already imported and used in run_agent.py for preflight compression checks.
Update the post-compression token estimate to use estimate_request_tokens_rough() with the tools parameter to include tools schema tokens.
Test the updated implementation to ensure that last_prompt_tokens accurately reflects the total token count, including tools schema tokens.

Example

# Updated code snippet
_compressed_est = estimate_request_tokens_rough(
    compressed,
    system_prompt=new_system_prompt or "",
    tools=self.tools or None,
)

Notes

This fix assumes that estimate_request_tokens_rough() is correctly implemented and includes the tools schema tokens. Additionally, this solution may not apply to older versions of the Hermes Agent or different model configurations.

Recommendation

Apply the workaround by replacing the post-compression token estimate with estimate_request_tokens_rough() to ensure accurate token counting and prevent delayed compression.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #parallel task #integration issue #index setup #retrieval issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix BUG: Post-compression token estimate excludes tools schema, delaying next compression cycle [2 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix

PR fix notes

PR #14696: fix(compression): three bugs causing auto-compression to never trigger

Description (problem / solution / changelog)

Summary

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Bug 3: Post-compression token estimate excludes tools schema

Testing

Files Changed

Changed files

PR #14882: fix: include tools in post-compression token estimate

Description (problem / solution / changelog)

Summary

Testing

Changed files

Code Example

Bug Description

Root Cause

Impact Chain

Fix

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING