hermes - ✅(Solved) Fix BUG: Context auto-compression never triggers when context_length == MINIMUM_CONTEXT_LENGTH (64000) [2 pull requests, 1 participants]

hermes2026-04-23 18:56:42

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#14690•Fetched 2026-04-24 06:15:17

View on GitHub

Comments

Participants

Timeline

Reactions

Author

devilardis

Participants

devilardis

Timeline (top)

labeled ×3cross-referenced ×2

Root Cause

In agent/context_compressor.py, the threshold_tokens calculation uses MINIMUM_CONTEXT_LENGTH (64000) as an absolute floor:

# Lines 356-359 (__init__) and 316-319 (update_model)
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),  # e.g., int(64000 * 0.7) = 44800
    MINIMUM_CONTEXT_LENGTH,                        # 64000
)

When context_length == 64000 (e.g., a local model with 192K context split across 3 parallel slots = 64K per slot), the floor value dominates:

max(44800, 64000) = 64000 → threshold = 100% of context window
should_compress() checks prompt_tokens >= threshold_tokens, but the API errors out before prompt_tokens can reach 64000
Compression never fires, regardless of the configured threshold percentage

This affects any configuration where context_length <= MINIMUM_CONTEXT_LENGTH / threshold_percent:

context_length=64000, threshold=0.7 → threshold_tokens = 64000 (100%) ❌
context_length=64000, threshold=0.5 → threshold_tokens = 64000 (100%) ❌
context_length=64000, threshold=0.85 → threshold_tokens = 64000 (100%) ❌
context_length=80000, threshold=0.7 → threshold_tokens = 64000 (80%) — works but threshold is higher than configured
context_length=128000, threshold=0.7 → threshold_tokens = 89600 (70%) ✅ correct

Fix Action

Fix

Add a safety check after the max() calculation — if the floor value pushes the threshold to 100% or beyond, fall back to the percentage-based value:

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

This preserves the original intent of the floor (preventing premature compression on large-context models) while ensuring compression can actually trigger when context_length is at or near the minimum.

PR fix notes

PR #14696: fix(compression): three bugs causing auto-compression to never trigger

Repository: NousResearch/hermes-agent
Author: devilardis
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14696

Description (problem / solution / changelog)

Summary

Fixes three bugs in the context auto-compression system that collectively cause compression to never trigger for models with context_length at or near MINIMUM_CONTEXT_LENGTH (64000 tokens).

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Closes #14690

When context_length == MINIMUM_CONTEXT_LENGTH == 64000, the floor value in threshold_tokens calculation dominates:

# Before: max(44800, 64000) = 64000 = 100% of context → compression never triggers
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)

Fix: Fall back to percentage-based value when floor >= context_length:

if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Applied in both __init__ and update_model.

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Closes #14694

After 2 consecutive ineffective compressions (<10% savings each), should_compress() returns False forever. No timeout, decay, or auto-recovery mechanism exists.

Fix: Add time-based auto-recovery (300 seconds). If enough time has passed since the last compression attempt, reset the counter:

if self._ineffective_compression_count >= 2:
    _elapsed = time.monotonic() - self._last_compression_time
    if _elapsed > self._ANTI_THRASH_RECOVERY_SECONDS:
        self._ineffective_compression_count = 0
    else:
        return False

Bug 3: Post-compression token estimate excludes tools schema

Closes #14695

After compression, last_prompt_tokens is set using estimate_messages_tokens_rough() which omits tools schema tokens (20-30K with 50+ tools). This causes the next compression cycle to trigger much later than the configured threshold.

Fix: Use estimate_request_tokens_rough() which includes tools schema, consistent with the preflight compression check pattern:

# Before:
_compressed_est = estimate_tokens_rough(new_system_prompt) + estimate_messages_tokens_rough(compressed)

# After:
_compressed_est = estimate_request_tokens_rough(
    compressed, system_prompt=new_system_prompt or "", tools=self.tools or None,
)

Testing

Verified with unit-level tests:

Bug 1: context_length=64000, threshold=0.7 → threshold_tokens=44800 (70%), should_compress(44800)=True
Bug 2: Anti-thrashing blocks within 300s window, auto-recovers after 300s elapsed
Bug 3: estimate_request_tokens_rough includes tools schema in token count

Files Changed

agent/context_compressor.py: Bug 1 fix (L320-321, L363-368) + Bug 2 fix (L299, L398-401, L418-436, L1283)
run_agent.py: Bug 3 fix (L7596-7607)

Changed files

agent/context_compressor.py (modified, +34/-8)
run_agent.py (modified, +8/-3)

PR #14878: fix: keep compression reachable at 64k context

Repository: NousResearch/hermes-agent
Author: LeonSGP43
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14878

Description (problem / solution / changelog)

Summary

keep ContextCompressor threshold percentage-based when the 64K floor would otherwise push it to the full context window
reuse the same threshold calculation in init and update_model
add regression tests for exact-64K initialization and model-update paths

Testing

python3 -m pytest -o addopts= tests/agent/test_context_compressor.py -k threshold

Closes #14690

Changed files

agent/context_compressor.py (modified, +15/-13)
tests/agent/test_context_compressor.py (modified, +22/-0)

Code Example

# Lines 356-359 (__init__) and 316-319 (update_model)
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),  # e.g., int(64000 * 0.7) = 44800
    MINIMUM_CONTEXT_LENGTH,                        # 64000
)

---

model:
  context_length: 64000
compression:
  enabled: true
  threshold: 0.7

---

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

RAW_BUFFERClick to expand / collapse

Bug Description

Context auto-compression never triggers when the model's context_length equals MINIMUM_CONTEXT_LENGTH (64000 tokens). This causes conversations to grow until they hit the model's context limit and get forcefully degraded, instead of being automatically compressed at the configured threshold.

Root Cause

In agent/context_compressor.py, the threshold_tokens calculation uses MINIMUM_CONTEXT_LENGTH (64000) as an absolute floor:

# Lines 356-359 (__init__) and 316-319 (update_model)
self.threshold_tokens = max(
    int(self.context_length * threshold_percent),  # e.g., int(64000 * 0.7) = 44800
    MINIMUM_CONTEXT_LENGTH,                        # 64000
)

When context_length == 64000 (e.g., a local model with 192K context split across 3 parallel slots = 64K per slot), the floor value dominates:

max(44800, 64000) = 64000 → threshold = 100% of context window
should_compress() checks prompt_tokens >= threshold_tokens, but the API errors out before prompt_tokens can reach 64000
Compression never fires, regardless of the configured threshold percentage

This affects any configuration where context_length <= MINIMUM_CONTEXT_LENGTH / threshold_percent:

context_length=64000, threshold=0.7 → threshold_tokens = 64000 (100%) ❌
context_length=64000, threshold=0.5 → threshold_tokens = 64000 (100%) ❌
context_length=64000, threshold=0.85 → threshold_tokens = 64000 (100%) ❌
context_length=80000, threshold=0.7 → threshold_tokens = 64000 (80%) — works but threshold is higher than configured
context_length=128000, threshold=0.7 → threshold_tokens = 89600 (70%) ✅ correct

Reproduction

Configure a local model with context_length: 64000 in config.yaml
Set compression.threshold: 0.7
Start a long conversation and observe that context grows past 70% without triggering compression
Conversation eventually hits the context limit and gets forcefully degraded

Config:

model:
  context_length: 64000
compression:
  enabled: true
  threshold: 0.7

Fix

Add a safety check after the max() calculation — if the floor value pushes the threshold to 100% or beyond, fall back to the percentage-based value:

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Related Design Issues (not bugs, but worth noting)

Anti-thrashing has no auto-recovery: _ineffective_compression_count >= 2 causes should_compress() to permanently return False until /new resets the session. No decay or timeout mechanism exists.
Post-compression token estimate excludes tools schema: After compression, last_prompt_tokens is set to a rough estimate (len(str)//4) that does not include tools schema tokens (potentially 20-30K), causing the next compression cycle to trigger later than configured.

Environment

Hermes Agent version: latest main (ce089169)
Model: Qwen3.6-35B-A3B (local llama.cpp, 192K context / 3 parallel slots = 64K per slot)
OS: Linux (ROCm)

extent analysis

TL;DR

The issue can be fixed by adding a safety check to prevent the threshold from being set to 100% or beyond when the context length equals the minimum context length.

Guidance

Review the agent/context_compressor.py file and update the threshold_tokens calculation to include the proposed safety check.
Verify that the compression triggers correctly by testing with different context lengths and threshold percentages.
Consider addressing the related design issues, such as implementing an auto-recovery mechanism for anti-thrashing and improving the post-compression token estimate.

Example

self.threshold_tokens = max(
    int(self.context_length * threshold_percent),
    MINIMUM_CONTEXT_LENGTH,
)
if self.threshold_tokens >= self.context_length:
    self.threshold_tokens = int(self.context_length * threshold_percent)

Notes

The proposed fix assumes that the context_length and threshold_percent values are correctly configured. It's essential to test the updated code with various scenarios to ensure the compression triggers as expected.

Recommendation

Apply the proposed workaround by updating the threshold_tokens calculation with the safety check, as it preserves the original intent of the floor while ensuring compression can trigger when context_length is at or near the minimum.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #search optimization #API routing #API middleware #SSR setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix BUG: Context auto-compression never triggers when context_length == MINIMUM_CONTEXT_LENGTH (64000) [2 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix

PR fix notes

PR #14696: fix(compression): three bugs causing auto-compression to never trigger

Description (problem / solution / changelog)

Summary

Bug 1: MINIMUM_CONTEXT_LENGTH floor makes threshold=100% when context_length==64000

Bug 2: Anti-thrashing protection permanently disables compression with no recovery

Bug 3: Post-compression token estimate excludes tools schema

Testing

Files Changed

Changed files

PR #14878: fix: keep compression reachable at 64k context

Description (problem / solution / changelog)

Summary

Testing

Changed files

Code Example

Bug Description

Root Cause

Reproduction

Fix

Related Design Issues (not bugs, but worth noting)

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING