hermes - ✅(Solved) Fix BUG: Context auto-compression never triggers when context_length == MINIMUM_CONTEXT_LENGTH (64000) [2 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#14690Fetched 2026-04-24 06:15:17
View on GitHub
Comments
0
Participants
1
Timeline
5
Reactions
0
Participants
Timeline (top)
labeled ×3cross-referenced ×2

Root Cause

In agent/context_compressor.py, the threshold_tokens calculation uses MINIMUM_CONTEXT_LENGTH (64000) as an absolute floor:

# Lines 356-359 (__init__) and 316-319 (update_model)
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),  # e.g., int(64000 * 0.7) = 44800
    MINIMUM_CONTEXT_LENGTH,                        # 64000
)

When context_length == 64000 (e.g., a local model with 192K context split across 3 parallel slots = 64K per slot), the floor value dominates:

  • max(44800, 64000) = 64000 → threshold = 100% of context window
  • should_compress() checks prompt_tokens >= threshold_tokens, but the API errors out before prompt_tokens can reach 64000
  • Compression never fires, regardless of the configured threshold percentage

This affects any configuration where context_length <= MINIMUM_CONTEXT_LENGTH / threshold_percent:

  • context_length=64000, threshold=0.7 → threshold_tokens = 64000 (100%) ❌
  • context_length=64000, threshold=0.5 → threshold_tokens = 64000 (100%) ❌
  • context_length=64000, threshold=0.85 → threshold_tokens = 64000 (100%) ❌
  • context_length=80000, threshold=0.7 → threshold_tokens = 64000 (80%) — works but threshold is higher than configured
  • context_length=128000, threshold=0.7 → threshold_tokens = 89600 (70%) ✅ correct

Fix Action

Fix

Add a safety check after the max() calculation — if the floor value pushes the threshold to 100% or beyond, fall back to the percentage-based value:

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

This preserves the original intent of the floor (preventing premature compression on large-context models) while ensuring compression can actually trigger when context_length is at or near the minimum.

PR fix notes

PR #14696: fix(compression): three bugs causing auto-compression to never trigger

Description (problem / solution / changelog)

Summary

Fixes three bugs in the context auto-compression system that collectively cause compression to never trigger for models with context_length at or near MINIMUM_CONTEXT_LENGTH (64000 tokens).

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Closes #14690

When context_length == MINIMUM_CONTEXT_LENGTH == 64000, the floor value in threshold_tokens calculation dominates:

# Before: max(44800, 64000) = 64000 = 100% of context → compression never triggers
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)

Fix: Fall back to percentage-based value when floor >= context_length:

if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Applied in both __init__ and update_model.

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Closes #14694

After 2 consecutive ineffective compressions (<10% savings each), should_compress() returns False forever. No timeout, decay, or auto-recovery mechanism exists.

Fix: Add time-based auto-recovery (300 seconds). If enough time has passed since the last compression attempt, reset the counter:

if self._ineffective_compression_count >= 2:
    _elapsed = time.monotonic() - self._last_compression_time
    if _elapsed > self._ANTI_THRASH_RECOVERY_SECONDS:
        self._ineffective_compression_count = 0
    else:
        return False

Bug 3: Post-compression token estimate excludes tools schema

Closes #14695

After compression, last_prompt_tokens is set using estimate_messages_tokens_rough() which omits tools schema tokens (20-30K with 50+ tools). This causes the next compression cycle to trigger much later than the configured threshold.

Fix: Use estimate_request_tokens_rough() which includes tools schema, consistent with the preflight compression check pattern:

# Before:
_compressed_est = estimate_tokens_rough(new_system_prompt) + estimate_messages_tokens_rough(compressed)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed, system_prompt=new_system_prompt or "", tools=self.tools or None,
)

Testing

Verified with unit-level tests:

  • Bug 1: context_length=64000, threshold=0.7threshold_tokens=44800 (70%), should_compress(44800)=True
  • Bug 2: Anti-thrashing blocks within 300s window, auto-recovers after 300s elapsed
  • Bug 3: estimate_request_tokens_rough includes tools schema in token count

Files Changed

  • agent/context_compressor.py: Bug 1 fix (L320-321, L363-368) + Bug 2 fix (L299, L398-401, L418-436, L1283)
  • run_agent.py: Bug 3 fix (L7596-7607)

Changed files

  • agent/context_compressor.py (modified, +34/-8)
  • run_agent.py (modified, +8/-3)

PR #14878: fix: keep compression reachable at 64k context

Description (problem / solution / changelog)

Summary

  • keep ContextCompressor threshold percentage-based when the 64K floor would otherwise push it to the full context window
  • reuse the same threshold calculation in init and update_model
  • add regression tests for exact-64K initialization and model-update paths

Testing

  • python3 -m pytest -o addopts= tests/agent/test_context_compressor.py -k threshold

Closes #14690

Changed files

  • agent/context_compressor.py (modified, +15/-13)
  • tests/agent/test_context_compressor.py (modified, +22/-0)

Code Example

# Lines 356-359 (__init__) and 316-319 (update_model)
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),  # e.g., int(64000 * 0.7) = 44800
    MINIMUM_CONTEXT_LENGTH,                        # 64000
)

---

model:
  context_length: 64000
compression:
  enabled: true
  threshold: 0.7

---

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)
RAW_BUFFERClick to expand / collapse

Bug Description

Context auto-compression never triggers when the model's context_length equals MINIMUM_CONTEXT_LENGTH (64000 tokens). This causes conversations to grow until they hit the model's context limit and get forcefully degraded, instead of being automatically compressed at the configured threshold.

Root Cause

In agent/context_compressor.py, the threshold_tokens calculation uses MINIMUM_CONTEXT_LENGTH (64000) as an absolute floor:

# Lines 356-359 (__init__) and 316-319 (update_model)
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),  # e.g., int(64000 * 0.7) = 44800
    MINIMUM_CONTEXT_LENGTH,                        # 64000
)

When context_length == 64000 (e.g., a local model with 192K context split across 3 parallel slots = 64K per slot), the floor value dominates:

  • max(44800, 64000) = 64000 → threshold = 100% of context window
  • should_compress() checks prompt_tokens >= threshold_tokens, but the API errors out before prompt_tokens can reach 64000
  • Compression never fires, regardless of the configured threshold percentage

This affects any configuration where context_length <= MINIMUM_CONTEXT_LENGTH / threshold_percent:

  • context_length=64000, threshold=0.7 → threshold_tokens = 64000 (100%) ❌
  • context_length=64000, threshold=0.5 → threshold_tokens = 64000 (100%) ❌
  • context_length=64000, threshold=0.85 → threshold_tokens = 64000 (100%) ❌
  • context_length=80000, threshold=0.7 → threshold_tokens = 64000 (80%) — works but threshold is higher than configured
  • context_length=128000, threshold=0.7 → threshold_tokens = 89600 (70%) ✅ correct

Reproduction

  1. Configure a local model with context_length: 64000 in config.yaml
  2. Set compression.threshold: 0.7
  3. Start a long conversation and observe that context grows past 70% without triggering compression
  4. Conversation eventually hits the context limit and gets forcefully degraded

Config:

model:
  context_length: 64000
compression:
  enabled: true
  threshold: 0.7

Fix

Add a safety check after the max() calculation — if the floor value pushes the threshold to 100% or beyond, fall back to the percentage-based value:

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

This preserves the original intent of the floor (preventing premature compression on large-context models) while ensuring compression can actually trigger when context_length is at or near the minimum.

Related Design Issues (not bugs, but worth noting)

  1. Anti-thrashing has no auto-recovery: _ineffective_compression_count >= 2 causes should_compress() to permanently return False until /new resets the session. No decay or timeout mechanism exists.

  2. Post-compression token estimate excludes tools schema: After compression, last_prompt_tokens is set to a rough estimate (len(str)//4) that does not include tools schema tokens (potentially 20-30K), causing the next compression cycle to trigger later than configured.

Environment

  • Hermes Agent version: latest main (ce089169)
  • Model: Qwen3.6-35B-A3B (local llama.cpp, 192K context / 3 parallel slots = 64K per slot)
  • OS: Linux (ROCm)

extent analysis

TL;DR

The issue can be fixed by adding a safety check to prevent the threshold from being set to 100% or beyond when the context length equals the minimum context length.

Guidance

  • Review the agent/context_compressor.py file and update the threshold_tokens calculation to include the proposed safety check.
  • Verify that the compression triggers correctly by testing with different context lengths and threshold percentages.
  • Consider addressing the related design issues, such as implementing an auto-recovery mechanism for anti-thrashing and improving the post-compression token estimate.

Example

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Notes

The proposed fix assumes that the context_length and threshold_percent values are correctly configured. It's essential to test the updated code with various scenarios to ensure the compression triggers as expected.

Recommendation

Apply the proposed workaround by updating the threshold_tokens calculation with the safety check, as it preserves the original intent of the floor while ensuring compression can trigger when context_length is at or near the minimum.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix BUG: Context auto-compression never triggers when context_length == MINIMUM_CONTEXT_LENGTH (64000) [2 pull requests, 1 participants]