hermes - How to fix: context compression creates new session but gateway never sees it (infinite compression loop)

Bug: Context compression creates new session but gateway never sees it — infinite compression loop

Version: v2026.5.16-594-g0ba7339f7 (commit 0ba7339f7)
Repro: Long conversation with large <available_skills> block + compression triggered during preflight or main loop


Problem

When context compression fires, _compress_context() creates a new session ID and writes the compressed messages to that new session in SQLite. However, the gateway never learns about the new session ID, so on the next turn it loads the old (pre-compression) transcript, finds it over the threshold again, and re-triggers compression. The result is an infinite compression loop.

Observed behavior

22:15:17  Preflight: 202,723 tokens, 143 messages → compression starts
22:16:49  Compressed: 143→8 messages, ~41,592 tokens ✓
22:17:46  Preflight: 200,703 tokens, 144 messages ← back to 144! Infinite loop.

Compression works correctly (143→8), but the next turn loads the original 143 messages again because the gateway still references the old session ID.
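
The loop can be reproduced in miniature. The sketch below is only a toy model, not the hermes code: TranscriptStore, Agent, and simulate are hypothetical names, an in-memory dict stands in for SQLite, and the threshold counts messages rather than tokens to keep the arithmetic obvious.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptStore:
    """Hypothetical in-memory stand-in for the SQLite transcript store."""
    sessions: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Agent:
    store: TranscriptStore
    session_id: str
    threshold: int = 100  # assumption: compress once this many messages are loaded

    def run_turn(self, user_message: str) -> dict:
        history = self.store.sessions[self.session_id]
        history.append(user_message)
        loaded = len(history)
        if loaded >= self.threshold:
            # Compression writes a short summary into a NEW session ...
            new_id = self.session_id + "-c"
            self.store.sessions[new_id] = ["<summary>"] * 8
            self.session_id = new_id
        # ... but the result carries no "session_id", which is the bug modelled here.
        return {"final_response": "ok", "loaded_messages": loaded}


def simulate(turns: int) -> list[int]:
    """Return how many messages each turn loaded before any compression."""
    store = TranscriptStore({"s0": [f"m{i}" for i in range(142)]})
    gateway_session_id = "s0"  # the gateway's view of the session never changes
    loaded_counts = []
    for _ in range(turns):
        agent = Agent(store, gateway_session_id)
        result = agent.run_turn("hi")
        loaded_counts.append(result["loaded_messages"])
    return loaded_counts


counts = simulate(3)
```

Every turn starts from the stale "s0" transcript, so the loaded count keeps growing and compression fires each time, even though an 8-message compressed session already exists in the store.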

Root cause

Three related issues compound:

Issue A: session_id not returned in run_conversation() result

_compress_context() updates self.session_id to the new session, but run_conversation() never includes session_id in its return dict. Gateway code at gateway/run.py:8513 explicitly checks for this field:

if agent_result.get("session_id") and agent_result["session_id"] != session_entry.session_id:
    session_entry.session_id = agent_result["session_id"]

This guard has existed for a while but is dead code — the agent never sends session_id, so the condition is always false. The gateway keeps using the old session ID and loads the uncompressed history on every turn.

Fix: Add "session_id": self.session_id to the result dict in run_conversation().
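
Why the guard is dead code can be shown in isolation. In this sketch the condition is copied from the gateway snippet quoted above, while SessionEntry and sync_session_id are hypothetical wrappers added only to make it runnable.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    """Hypothetical stand-in for the gateway's session entry."""
    session_id: str


def sync_session_id(session_entry: SessionEntry, agent_result: dict) -> bool:
    """Apply the gateway guard; return True if the entry's ID was updated."""
    if agent_result.get("session_id") and agent_result["session_id"] != session_entry.session_id:
        session_entry.session_id = agent_result["session_id"]
        return True
    return False


entry = SessionEntry("old")

# Before the fix: the result has no "session_id" key, so .get() returns None
# and the guard never fires.
changed_before = sync_session_id(entry, {"final_response": "ok"})

# After the fix: the key is present and differs, so the entry follows the agent.
changed_after = sync_session_id(entry, {"session_id": "new", "final_response": "ok"})
```

The guard itself needs no change; it only needs the agent to supply the key it already checks for.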

Issue B: Preflight compression bypasses should_compress() anti-thrashing

Preflight compression (before the main loop) uses a raw token comparison instead of going through should_compress():

# run_agent.py:12458 — before fix
if _preflight_tokens >= self.context_compressor.threshold_tokens:
    # triggers compression directly, no anti-thrashing check

Meanwhile, the main loop correctly uses should_compress():

# run_agent.py:15409 — correct path
if self.compression_enabled and _compressor.should_compress(_real_tokens):

should_compress() has anti-thrashing logic that skips compression when the last two passes saved <10% each. Preflight bypasses this entirely, so even when compression is ineffective (e.g., system prompt overhead dominates), preflight will keep triggering it.

Fix: Preflight should also call should_compress(_preflight_tokens).
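
The difference between the two checks can be sketched as follows. This is a hypothetical model built only from the description above (two consecutive passes saving less than 10% each, tracked by an ineffective_compression_count). The class name, record_pass, and the constructor parameters are assumptions; the real should_compress() may differ.

```python
class CompressorSketch:
    """Hypothetical model of the anti-thrashing rule, not the real compressor."""

    def __init__(self, threshold_tokens: int, min_savings: float = 0.10,
                 max_ineffective: int = 2) -> None:
        self.threshold_tokens = threshold_tokens
        self.min_savings = min_savings
        self.max_ineffective = max_ineffective
        self.ineffective_compression_count = 0

    def record_pass(self, tokens_before: int, tokens_after: int) -> None:
        """Track whether a finished compression pass saved enough tokens."""
        if tokens_before <= 0:
            return
        saved = (tokens_before - tokens_after) / tokens_before
        if saved < self.min_savings:
            self.ineffective_compression_count += 1
        else:
            self.ineffective_compression_count = 0

    def should_compress(self, tokens: int) -> bool:
        """Threshold check plus the anti-thrashing cut-off."""
        if tokens < self.threshold_tokens:
            return False
        return self.ineffective_compression_count < self.max_ineffective


def raw_preflight_check(compressor: CompressorSketch, tokens: int) -> bool:
    """The pre-fix preflight condition: threshold only, no thrash protection."""
    return tokens >= compressor.threshold_tokens


compressor = CompressorSketch(threshold_tokens=150_000)
compressor.record_pass(200_000, 195_000)  # saved 2.5%
compressor.record_pass(195_000, 190_000)  # saved about 2.6%

raw_result = raw_preflight_check(compressor, 190_000)
guarded_result = compressor.should_compress(190_000)
```

After two ineffective passes the raw comparison still says "compress", while the guarded check backs off. That is the behaviour the preflight path is missing.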

Issue C: Gateway doesn't persist session ID change

Even after fixing Issue A, gateway/run.py:8514 updates session_entry.session_id in memory but doesn't call self.session_store._save(). If the gateway restarts or the session entry is reloaded from disk, the new session ID is lost.

Fix: Add self.session_store._save() after updating session_entry.session_id.
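
The effect of the missing save can be illustrated with a small file-backed store. SessionStoreSketch and restart_and_read are hypothetical, and JSON on disk is an assumption made for the example; the report does not describe how the real session_store persists its entries.

```python
import json
import tempfile
from pathlib import Path


class SessionStoreSketch:
    """Hypothetical JSON-backed store; the real session_store may differ."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.entries: dict[str, str] = {}
        if path.exists():
            self.entries = json.loads(path.read_text())

    def save(self) -> None:
        self.path.write_text(json.dumps(self.entries))


def restart_and_read(save_after_update: bool) -> str:
    """Update a session ID in memory, optionally save, then reload from disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sessions.json"
        store = SessionStoreSketch(path)
        store.entries["chat-1"] = "old"
        store.save()

        store.entries["chat-1"] = "new"  # in-memory update after compression
        if save_after_update:
            store.save()

        reloaded = SessionStoreSketch(path)  # simulates a gateway restart
        return reloaded.entries["chat-1"]


without_save = restart_and_read(save_after_update=False)
with_save = restart_and_read(save_after_update=True)
```

Without the save, the reloaded store falls back to the pre-compression session ID, which reintroduces the stale-transcript problem from Issue A after a restart.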

Impact

  • Users with large skill lists (long <available_skills> block) hit this reliably — the system prompt overhead means compressed messages are small but total request tokens stay above threshold
  • Compression runs repeatedly on every turn, burning tokens and adding latency
  • No user-visible escape hatch — /compress doesn't help because the root cause is the session ID mismatch

Proposed fix

Three changes across two files:

run_agent.py — Add session_id to result dict + preflight anti-thrashing:

@@ -12455,7 +12455,13 @@
                 tools=self.tools or None,
             )
 
-            if _preflight_tokens >= self.context_compressor.threshold_tokens:
+            # Use should_compress() to respect anti-thrashing protection.
+            # Without this, preflight bypasses the ineffective_compression_count
+            # check and can trigger infinite compression loops when system prompt
+            # overhead (skills + tools) keeps total tokens above threshold even
+            # after aggressive message compression.
+            if _preflight_tokens >= self.context_compressor.threshold_tokens \
+               and self.context_compressor.should_compress(_preflight_tokens):
                 logger.info(
                     "Preflight compression: ~%s tokens >= %s threshold (model %s, ctx %s)",
                     f"{_preflight_tokens:,}",

@@ -16012,6 +16018,7 @@
 
         # Build result with interrupt info if applicable
         result = {
+            "session_id": self.session_id,
             "final_response": final_response,
             "last_reasoning": last_reasoning,
             "messages": messages,

gateway/run.py — Persist session ID change:

@@ -8512,6 +8512,7 @@
             if agent_result.get("session_id") and agent_result["session_id"] != session_entry.session_id:
                 session_entry.session_id = agent_result["session_id"]
+                self.session_store._save()

Verification

After applying these patches, compression correctly transitions to the new session and subsequent turns load the compressed history:

Before: 143→8→144→8→144 (infinite loop)
After:  143→8→8→8 (stable)
