Bug: Context compression creates new session but gateway never sees it — infinite compression loop

Version: v2026.5.16-594-g0ba7339f7 (commit 0ba7339f7)

Repro: Long conversation with a large <available_skills> block, with compression triggered during preflight or in the main loop

Problem

When context compression fires, _compress_context() creates a new session ID and writes compressed messages to the new session in SQLite. However, the gateway never learns about the new session ID, so on the next turn it loads the old (pre-compression) transcript — causing an infinite compression loop.

Observed behavior

22:15:17  Preflight: 202,723 tokens, 143 messages → compression starts
22:16:49  Compressed: 143→8 messages, ~41,592 tokens ✓
22:17:46  Preflight: 200,703 tokens, 144 messages ← back to 144! Infinite loop.

Compression works correctly (143→8), but the next turn loads the original 143 messages again because the gateway still references the old session ID.

Root cause

Three separate issues compound:

Issue A: `session_id` not returned in `run_conversation()` result

_compress_context() updates self.session_id to the new session, but run_conversation() never includes session_id in its return dict. Gateway code at gateway/run.py:8513 explicitly checks for this field:

if agent_result.get("session_id") and agent_result["session_id"] != session_entry.session_id:
    session_entry.session_id = agent_result["session_id"]

This guard has existed for a while but is dead code — the agent never sends session_id, so the condition is always false. The gateway keeps using the old session ID and loads the uncompressed history on every turn.

Fix: Add "session_id": self.session_id to the result dict in run_conversation().
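
To illustrate why the guard never fires, here is a minimal sketch. `SessionEntry` and `apply_agent_result` are hypothetical stand-ins, not the real gateway code; only the `if` condition is copied from the snippet above.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    """Hypothetical stand-in for the gateway's session entry."""
    session_id: str


def apply_agent_result(entry: SessionEntry, agent_result: dict) -> bool:
    """Apply the gateway guard; return True if the session ID was updated."""
    if agent_result.get("session_id") and agent_result["session_id"] != entry.session_id:
        entry.session_id = agent_result["session_id"]
        return True
    return False


entry = SessionEntry(session_id="old")

# Before the fix: the result dict has no "session_id" key, so .get() returns
# None and the guard is skipped.
updated_before = apply_agent_result(entry, {"final_response": "ok"})
id_before = entry.session_id

# After the fix: the agent reports the post-compression session ID.
updated_after = apply_agent_result(entry, {"session_id": "new", "final_response": "ok"})
id_after = entry.session_id
```

With the key missing, `updated_before` is `False` and the entry keeps `"old"`; once the key is present, the entry switches to `"new"`.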

Issue B: Preflight compression bypasses `should_compress()` anti-thrashing

Preflight compression (before the main loop) uses a raw token comparison instead of going through should_compress():

# run_agent.py:12458 — before fix
if _preflight_tokens >= self.context_compressor.threshold_tokens:
    # triggers compression directly, no anti-thrashing check

Meanwhile, the main loop correctly uses should_compress():

# run_agent.py:15409 — correct path
if self.compression_enabled and _compressor.should_compress(_real_tokens):

should_compress() has anti-thrashing logic that skips compression when the last two passes saved <10% each. Preflight bypasses this entirely, so even when compression is ineffective (e.g., system prompt overhead dominates), preflight will keep triggering it.

Fix: Preflight should also call should_compress(_preflight_tokens).
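
The anti-thrashing behavior described above can be sketched roughly as follows. This is an assumption-laden illustration, not the actual `should_compress()` implementation: the class name, `record_pass()`, and the constructor parameters are hypothetical, and only `threshold_tokens`, `ineffective_compression_count`, and the "two passes under 10%" rule come from the issue text.

```python
class CompressorSketch:
    """Hypothetical model of a compressor with anti-thrashing protection."""

    def __init__(self, threshold_tokens: int, min_savings: float = 0.10,
                 max_ineffective: int = 2):
        self.threshold_tokens = threshold_tokens
        self.min_savings = min_savings          # assumed: 10% minimum saving per pass
        self.max_ineffective = max_ineffective  # assumed: give up after 2 weak passes
        self.ineffective_compression_count = 0

    def record_pass(self, tokens_before: int, tokens_after: int) -> None:
        """Track whether the last compression pass saved enough tokens."""
        saved = (tokens_before - tokens_after) / tokens_before if tokens_before else 0.0
        if saved < self.min_savings:
            self.ineffective_compression_count += 1
        else:
            self.ineffective_compression_count = 0

    def should_compress(self, tokens: int) -> bool:
        """Compress only when over threshold and recent passes were effective."""
        if tokens < self.threshold_tokens:
            return False
        return self.ineffective_compression_count < self.max_ineffective


compressor = CompressorSketch(threshold_tokens=100_000)  # hypothetical threshold
first = compressor.should_compress(120_000)
compressor.record_pass(120_000, 115_000)   # about 4% saved: ineffective
second = compressor.should_compress(120_000)
compressor.record_pass(120_000, 115_000)   # second ineffective pass in a row
third = compressor.should_compress(120_000)
```

A raw `tokens >= threshold_tokens` comparison would return `True` at all three checkpoints, whereas the sketch stops after two ineffective passes.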

Issue C: Gateway doesn't persist session ID change

Even after fixing Issue A, gateway/run.py:8514 updates session_entry.session_id in memory but doesn't call self.session_store._save(). If the gateway restarts or the session entry is reloaded from disk, the new session ID is lost.

Fix: Add self.session_store._save() after updating session_entry.session_id.
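
The following sketch shows why the in-memory update alone is not enough. `SessionStoreSketch` is a hypothetical JSON-backed store invented for illustration; the real session store's format and API are not shown in this issue, apart from the `_save()` call.

```python
import json
import os
import tempfile


class SessionStoreSketch:
    """Hypothetical store mapping a chat key to its current session ID."""

    def __init__(self, path: str):
        self.path = path
        self.entries = {}
        if os.path.exists(path):
            with open(path) as f:
                self.entries = json.load(f)

    def _save(self) -> None:
        with open(self.path, "w") as f:
            json.dump(self.entries, f)


def session_id_after_restart(save_after_update: bool) -> str:
    """Update the session ID, optionally persist it, then simulate a restart."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sessions.json")
        store = SessionStoreSketch(path)
        store.entries["chat-1"] = "old-session"
        store._save()

        # Compression happened: the gateway updates the entry in memory.
        store.entries["chat-1"] = "new-session"
        if save_after_update:
            store._save()

        # A restart reloads the entries from disk.
        reloaded = SessionStoreSketch(path)
        return reloaded.entries["chat-1"]
```

Without the extra save, `session_id_after_restart(False)` returns the stale `"old-session"`; with it, the reloaded store sees `"new-session"`.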

Impact

- Users with large skill lists (a long <available_skills> block) hit this reliably: the system prompt overhead means the compressed messages are small, but total request tokens stay above the threshold.
- Compression runs again on every turn, burning tokens and adding latency.
- There is no user-visible escape hatch: /compress doesn't help because the root cause is the session ID mismatch.

Proposed fix

Three changes across two files:

run_agent.py — Add session_id to result dict + preflight anti-thrashing:

@@ -12455,7 +12455,13 @@
                 tools=self.tools or None,
             )
 
-            if _preflight_tokens >= self.context_compressor.threshold_tokens:
+            # Use should_compress() to respect anti-thrashing protection.
+            # Without this, preflight bypasses the ineffective_compression_count
+            # check and can trigger infinite compression loops when system prompt
+            # overhead (skills + tools) keeps total tokens above threshold even
+            # after aggressive message compression.
+            if _preflight_tokens >= self.context_compressor.threshold_tokens \
+               and self.context_compressor.should_compress(_preflight_tokens):
                 logger.info(
                     "Preflight compression: ~%s tokens >= %s threshold (model %s, ctx %s)",
                     f"{_preflight_tokens:,}",

@@ -16012,6 +16012,7 @@
 
         # Build result with interrupt info if applicable
         result = {
+            "session_id": self.session_id,
             "final_response": final_response,
             "last_reasoning": last_reasoning,
             "messages": messages,

gateway/run.py — Persist session ID change:

@@ -8512,6 +8512,7 @@
             if agent_result.get("session_id") and agent_result["session_id"] != session_entry.session_id:
                 session_entry.session_id = agent_result["session_id"]
+                self.session_store._save()

Verification

After applying these patches, compression correctly transitions to the new session and subsequent turns load the compressed history:

Before: 143→8→144→8→144 (infinite loop)
After:  143→8→8→8 (stable)
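
The before/after pattern can be reproduced with a toy model. Everything here is a simplification for illustration: the threshold is a hypothetical message count rather than a token count, session IDs are made up, and only user messages are modeled.

```python
COMPRESS_ABOVE = 100  # hypothetical message-count threshold (the real check is token-based)
COMPRESSED_SIZE = 8   # matches the 143→8 reduction in the log above


def simulate(turns: int, gateway_tracks_new_session: bool):
    """Return (messages loaded per turn, number of compression passes)."""
    transcripts = {"s0": ["msg"] * 142}  # history before the first observed turn
    gateway_session = "s0"
    loaded_counts = []
    compressions = 0
    for turn in range(turns):
        # The gateway appends the incoming message to the session it references.
        transcripts[gateway_session].append("user")
        history = transcripts[gateway_session]
        loaded_counts.append(len(history))
        if len(history) > COMPRESS_ABOVE:
            # The agent writes the compressed history to a brand-new session.
            new_session = f"s{turn + 1}"
            transcripts[new_session] = ["summary"] * COMPRESSED_SIZE
            compressions += 1
            if gateway_tracks_new_session:
                gateway_session = new_session
    return loaded_counts, compressions


before_fix = simulate(3, gateway_tracks_new_session=False)
after_fix = simulate(3, gateway_tracks_new_session=True)
```

In this model, `before_fix` compresses on all three turns because the gateway keeps reloading the growing original transcript, while `after_fix` compresses once and then loads the short compressed history.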
