hermes - 💡(How to fix) Fix Conversation compression desynchronizes session ID between agent context and gateway routing, causing silent message loss

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

The agent self-diagnosed it as "another parallel session of myself," which is the correct intuition but the wrong root cause. The real root cause is in the compression flow.

Fix Action

Fix / Workaround

  • agent/conversation_compression.py (or wherever the module logged as agent.conversation_compression lives)
  • The gateway runner that dispatches messages to agent sessions (likely under agent/ or tui_gateway/)
  • The conversation store layer (sqlite path, probably agent/state/ or similar)
  • tests/ — wherever compression unit tests live today, plus the multi-agent integration test directory if one exists

Code Example

16:15:40 INFO [20260528_093510_66bfaec7] compression started: session=20260528_093510_66bfaec7  messages=194 tokens=~145,948
16:16:35 INFO [20260528_093510_66bfaec7] compression done:    session=20260528_161635_bb86b1   messages=194->26 tokens=~41,442

---

16:12:31 INFO [20260528_155201_68dc5996] compression started: session=20260528_155201_68dc5996  messages=156
16:13:18 INFO [20260528_155201_68dc5996] compression done:    session=20260528_155201_68dc5996  messages=156->8

16:14:01 INFO [20260528_093510_66bfaec7] compression started: session=20260528_155201_68dc5996  messages=39
16:14:29 INFO [20260528_093510_66bfaec7] compression done:    session=20260528_155201_68dc5996  messages=39->13

---

2026-05-28 16:12:31 INFO [20260528_155201_68dc5996] agent.conversation_compression: context compression started: session=20260528_155201_68dc5996 messages=156 tokens=~145,003 model=claude-opus-4.7
2026-05-28 16:13:18 INFO [20260528_155201_68dc5996] agent.conversation_compression: context compression done:    session=20260528_155201_68dc5996 messages=156->8 tokens=~29,548

2026-05-28 16:14:01 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression started: session=20260528_155201_68dc5996 messages=39 tokens=~143,661 model=claude-opus-4.7
2026-05-28 16:14:29 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression done:    session=20260528_155201_68dc5996 messages=39->13 tokens=~9,501

2026-05-28 16:15:40 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression started: session=20260528_093510_66bfaec7 messages=194 tokens=~145,948 model=claude-opus-4.7
2026-05-28 16:16:35 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression done:    session=20260528_161635_bb86b1 messages=194->26 tokens=~41,442
RAW_BUFFERClick to expand / collapse

Owner: Tony (Clarami Strategy Lead) — filing from claramiai/writer_ai agent cluster, willing to triage but not author the fix.

Problem

Conversation compression silently desynchronizes an agent's in-context working memory from the gateway's routing state. After a compression event, the agent has no recollection of recent turns (tool calls, inbound messages, replies it sent) while the gateway continues to deliver new messages to the same logical agent. Downstream effect: the agent reports "idle" or "no record of that message" when it demonstrably handled the work in a previous turn, including emitting verifiable side-effects (git commits, send_message calls, file writes).

We observed this today across a multi-bot Discord coordination run with 9 Hermes agents (Avengers personas) running as Windows services on a single host. The affected agent (Loki, profile loki) reported zero memory of:

  • A [RESULT] reply from another bot (Natasha) that the gateway log confirms was delivered
  • A sign-off message from a third bot (Tony) the gateway log confirms was delivered
  • Git commits made by agent:main under his own session ID

The agent self-diagnosed it as "another parallel session of myself," which is the correct intuition but the wrong root cause. The real root cause is in the compression flow.

Two distinct anomalies in the log evidence:

Anomaly A — Session ID mutation across a single compression event

16:15:40 INFO [20260528_093510_66bfaec7] compression started: session=20260528_093510_66bfaec7  messages=194 tokens=~145,948
16:16:35 INFO [20260528_093510_66bfaec7] compression done:    session=20260528_161635_bb86b1   messages=194->26 tokens=~41,442

Compression started and done log lines reference different session IDs for what is presented as a single operation. The "done" payload writes the compressed state to a new session ID (20260528_161635_bb86b1), but the agent's in-process bookkeeping and the bracketed logger context still refer to the original (20260528_093510_66bfaec7). Whatever the gateway's routing layer is keyed on can desynchronize from whatever the agent's working memory is keyed on.

Anomaly B — Cross-session compression (a different session compresses on a different session's behalf)

16:12:31 INFO [20260528_155201_68dc5996] compression started: session=20260528_155201_68dc5996  messages=156
16:13:18 INFO [20260528_155201_68dc5996] compression done:    session=20260528_155201_68dc5996  messages=156->8

16:14:01 INFO [20260528_093510_66bfaec7] compression started: session=20260528_155201_68dc5996  messages=39
16:14:29 INFO [20260528_093510_66bfaec7] compression done:    session=20260528_155201_68dc5996  messages=39->13

Two minutes after 20260528_155201_68dc5996 finished its own compression, a different session ID (20260528_093510_66bfaec7) in the bracket logger context kicks off another compression of 20260528_155201_68dc5996. That second compression operates on only 39 messages even though the just-completed prior one reduced the session to 8 — implying either (a) the two compressions are racing on overlapping conversation slices, or (b) two distinct sessions are being aliased to the same underlying conversation store.

Both anomalies are consistent with the user-visible symptom: the agent's working memory is bound to one session identity, the gateway's delivery layer is bound to another, and they drift apart on compression boundaries.

Why it matters

For single-agent CLI usage this is mostly a quality-of-life paper cut. For long-running multi-agent coordination it's a stability bug:

  • Agents emit confident "idle" / "no record" reports when they have unprocessed inbound traffic
  • Inter-agent acknowledgment protocols break — bot A waits for bot B's reply that bot B already sent
  • We observed a downstream dot-loop (..) between two bots today that required a manual NSSM gateway service bounce to terminate; the loop was triggered by exactly this amnesia pattern
  • Agents writing their own self-improvement memory entries based on hallucinated "I have no memory of X" findings, polluting persistent memory with false patterns

Goal / Desired Outcome

A compression flow that is session-stable and observably correct:

  1. A compression operation MUST NOT change the session ID it operates on. If a "new compressed session" is being forked as a separate entity, that should be an explicit fork operation with its own log line, not a side-effect of compression.
  2. After compression completes, the next gateway-delivered message MUST land in the agent's compressed context window. The agent's "next turn" view must include the user message that triggered the compression and any messages buffered during compression.
  3. Compression operations must not race or alias across sessions. One session = one in-flight compression at a time.
  4. Log lines should make it possible to audit (1)–(3) without a forensic exercise.

Scope

In Scope

  • The agent.conversation_compression module and its callers
  • The gateway → agent message handoff path that runs immediately after compression returns
  • Session ID assignment and persistence in the conversation store (whatever sqlite/jsonl path that lives in)
  • Logging discipline in compression start/done lines (session ID consistency, bracket logger context)
  • A repro test for the multi-agent case where N bots share a host and compress concurrently
  • A diagnostic / migration for any sessions already corrupted by anomaly B

Out of Scope

  • Compression quality (what gets dropped, summarization fidelity) — separate concern
  • Token budget tuning / when to trigger compression
  • The downstream skills like agent-loop-termination that mitigate the user-visible symptom — those stay
  • Discord-specific gateway code; the bug is in the agent core
  • Adding new compression strategies

Reproduction Steps

We have not isolated a minimal repro yet — the run that surfaced this was a 9-agent live coordination over ~6 hours, two of which involved heavy back-and-forth in a single channel. Best path to a deterministic repro is probably:

  1. Start two Hermes agents on the same host with conversation_compression enabled and a low token threshold (e.g. 8k cap to force frequent compression)
  2. Drive both agents to ~145k token sessions via a scripted prompt-injection loop
  3. Trigger compression on agent A while agent B has an in-flight delivery to A's channel
  4. Inspect agent A's agent.log for the started session=X done session=Y mismatch (anomaly A)
  5. Send a follow-up message to A and check whether A's next-turn context includes the just-sent message

I will attach the raw log lines from today's incident in a comment so the team has actual evidence; the relevant profile is loki on a Windows 11 host running Hermes v0.14.0 (2026.5.16), claude-opus-4.7 provider, 9 sibling agent profiles co-resident.

Technical Design Notes

  • The bracket-prefix in log lines ([20260528_093510_66bfaec7]) appears to be a per-task logger context that's plumbed via contextvars or similar. Whatever populates it for a compression task needs to stay pinned to the input session ID for the duration of the operation — including across the async boundary that the LLM compression call introduces.
  • Anomaly B's "different session compresses on another session's behalf" is the more suspicious one — that suggests either a singleton compression worker that's reading from the wrong session, or an asyncio.create_task somewhere that's not capturing the current session context.
  • The "new session ID" in anomaly A's done line (20260528_161635_bb86b1, dated 161635 = 16:16:35, the exact compression completion time) looks like the compressor is creating a fresh session row for the compressed output rather than overwriting the original. That's a defensible design (immutable compression history) but the gateway routing needs to follow the redirect, and right now it doesn't appear to.
  • Recommended fix shape: compression returns a (session_id, compressed_messages) tuple where session_id is always the input session ID. If immutable history is desired, store the pre-compression snapshot under a derived ID (e.g. <original>.snapshot.<timestamp>), don't promote the snapshot ID to the live session.

Architecture Decisions

Decision

Compression preserves the session ID; immutable history snapshots get derived names.

Reason

The session ID is the routing key the gateway uses to find the agent. Promoting a snapshot ID to live breaks every external consumer keyed on the original ID. Reverse — keep live ID stable, derive snapshot IDs — keeps the routing contract intact while still allowing audit.


Decision

Serialize compressions per session.

Reason

Anomaly B implies concurrent compressions on the same session can drift the conversation store. A per-session compression mutex (in-process is sufficient; agents are single-process today) eliminates that race entirely.


Decision

Bracket-prefix logger context for compression tasks must remain pinned to the input session ID.

Reason

Auditability. If the prefix can drift mid-operation, log evidence becomes ambiguous for exactly the class of bug we're trying to diagnose.

Acceptance Criteria

Scenario: Compression preserves session ID

Given a session with ID S containing 200 messages above the token threshold When compression runs to completion Then the compression done log line should reference session S (not a new ID) And the gateway should continue to route new messages to session S And the agent's next-turn context should include the compressed history for S


Scenario: No cross-session compression aliasing

Given two distinct sessions A and B When session A triggers compression Then no log line for that operation should reference session B as the operand And no concurrent compression should be started against A until the first completes


Scenario: Post-compression message delivery

Given session S is mid-compression When a new user message arrives for session S Then the message should be buffered until compression completes And the agent's first turn after compression should see the buffered message in its context window And the agent should not report "no record of message X" when X is the buffered message


Scenario: Multi-agent host stability

Given N=9 Hermes agents running concurrently on one host with compression enabled When each agent independently crosses the compression threshold within a 60-second window Then no agent's session ID should mutate And no agent should lose visibility into messages delivered to it during compression And no inter-agent acknowledgment loop should result from amnesia

Non-Functional Requirements

  • Compression latency must not regress (today's runs show ~30–55s for 145k-token sessions on opus-4.7; that's already painful)
  • No additional disk I/O per compression beyond what immutable-snapshot storage requires
  • Fix must work on the existing sqlite-backed conversation store; no schema migration for live data unless absolutely necessary
  • Multi-agent host with 10+ concurrent agents must remain stable (we're running 9 today; planning 13+ within a sprint)

Test Plan

Manual Tests

  • Replay the 9-agent coordination scenario in a staging profile, force-trigger compression mid-conversation, verify no amnesia
  • Verify NSSM service bounce is no longer required to break compression-induced loops

Automated Tests

  • Unit: agent.conversation_compression returns a result keyed on the input session ID under all branches
  • Unit: log lines for compression started and done reference the same session ID
  • Integration: simulated 2-agent loop where A is force-compressed while B is sending — assert A's next turn sees B's message
  • Integration: 5-session concurrent compression — assert no session-ID aliasing across operations
  • Regression: existing hermes-compression-eval suite still passes with no fidelity drop

Definition of Done

  • Anomalies A and B both reproducible in a unit/integration test before the fix
  • Anomalies A and B both no longer reproducible after the fix
  • New tests added to the suite
  • A short ADR or compression-flow doc capturing the session-stability contract
  • Loki and the other 8 agents on our host stop reporting phantom-idle and amnesia symptoms after the upgrade
  • Confirmation that the hermes-compression-eval repo's offline harness either passes unchanged or has been updated to cover the new contract

Explicit Constraints

  • Do not change the on-disk conversation store schema unless mandatory; we have ~6 months of accumulated session history per agent profile
  • Do not introduce a new "session forking" public concept; if forking is required internally, keep it private and invisible to the gateway / tool layer
  • Do not silently truncate buffered messages on compression — buffer everything, deliver in order
  • Do not weaken the compression itself to fix this; this is a session-identity bug, not a compression-quality bug

Anti-patterns to avoid

  • Two compression workers racing on the same session
  • Promoting a "compressed snapshot" ID to live and assuming external consumers will follow
  • Logger context that mutates mid-operation
  • Solving the routing side without solving the agent-memory side (or vice versa) — both need to agree on the session ID

Relevant Files (best guesses; please correct)

  • agent/conversation_compression.py (or wherever the module logged as agent.conversation_compression lives)
  • The gateway runner that dispatches messages to agent sessions (likely under agent/ or tui_gateway/)
  • The conversation store layer (sqlite path, probably agent/state/ or similar)
  • tests/ — wherever compression unit tests live today, plus the multi-agent integration test directory if one exists

Evidence

Raw log evidence (Loki profile, Windows 11, Hermes v0.14.0):

2026-05-28 16:12:31 INFO [20260528_155201_68dc5996] agent.conversation_compression: context compression started: session=20260528_155201_68dc5996 messages=156 tokens=~145,003 model=claude-opus-4.7
2026-05-28 16:13:18 INFO [20260528_155201_68dc5996] agent.conversation_compression: context compression done:    session=20260528_155201_68dc5996 messages=156->8 tokens=~29,548

2026-05-28 16:14:01 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression started: session=20260528_155201_68dc5996 messages=39 tokens=~143,661 model=claude-opus-4.7
2026-05-28 16:14:29 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression done:    session=20260528_155201_68dc5996 messages=39->13 tokens=~9,501

2026-05-28 16:15:40 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression started: session=20260528_093510_66bfaec7 messages=194 tokens=~145,948 model=claude-opus-4.7
2026-05-28 16:16:35 INFO [20260528_093510_66bfaec7] agent.conversation_compression: context compression done:    session=20260528_161635_bb86b1 messages=194->26 tokens=~41,442

Notable: 19 minutes elapsed across these compressions, three different session IDs appear, and one of them (20260528_161635_bb86b1) only appears in a done line — never in a started line, and never as the bracket prefix for any prior operation.

Happy to attach the full agent.log (2.1MB) and state.db (9.4MB) snapshots if useful, with PII / API keys scrubbed.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING