openclaw - ✅(Solved) Fix [Bug]: Session compaction blocks InteractionEventListener, causing silent message drops [1 pull request, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#73204 (fetched 2026-04-29 06:22:12)
Comments: 2
Participants: 2
Timeline: 4
Reactions: 0
Timeline (top): commented ×2, closed ×1, cross-referenced ×1

Error Message

When OpenClaw session compaction triggers on a long-running session, the InteractionEventListener on the critical lane times out with a 120-second listener-timeout error. Incoming Discord messages are silently dropped without any user-facing error or retry.

2026-04-28T02:26:37.733Z warn agent/embedded incomplete turn detected:
2026-04-28T02:26:37.736Z warn model-fallback/decision candidate_failed:
2026-04-28T02:27:53.762Z warn agent/embedded
2026-04-28T02:27:53.763Z warn agent/embedded
2026-04-28T02:27:57.990Z ERROR event-queue

Fix Action

Fixed

PR fix notes

PR #73313: fix #73204: release session write lock during LLM summarization to prevent message drops

Description (problem / solution / changelog)

Summary

Fixes #73204

Issue

Session compaction blocks InteractionEventListener, causing silent message drops.

Root Cause

The session write lock was held during the entire compaction process, including the LLM summarization call. This blocked InteractionEventListener and other session-level event processing, causing incoming messages to be silently dropped during the compaction window.

Solution

Release the session write lock before the LLM summarization call so incoming messages can be delivered during compaction. The lock is re-acquired after the call for post-compaction file I/O (manual boundary hardening, transcript rotation, checkpoint persistence).

Changes

  • src/agents/pi-embedded-runner/compact.ts: Release sessionLock before compactWithSafetyTimeout(), re-acquire after the call completes
  • Guard sessionLock.release() with a null check in the finally block
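
The change described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `SessionMutex`, `SessionLock`, `compactSession`, and the `summarize`/`persist` callbacks are hypothetical stand-ins, with `summarize` playing the role of compactWithSafetyTimeout(). Only the shape (release before the slow call, re-acquire afterwards, null-guarded release in `finally`) comes from the PR notes.

```typescript
// Hypothetical lock handle; the real sessionLock API is not shown in the issue.
interface SessionLock {
  release(): Promise<void>;
}

// Hypothetical FIFO mutex used only to make the sketch runnable.
class SessionMutex {
  private tail: Promise<void> = Promise.resolve();
  held = false;

  async acquire(): Promise<SessionLock> {
    let unlock!: () => void;
    const next = new Promise<void>((resolve) => (unlock = resolve));
    const prev = this.tail;
    this.tail = prev.then(() => next); // later callers wait for this holder
    await prev;
    this.held = true;
    return {
      release: async () => {
        this.held = false;
        unlock();
      },
    };
  }
}

async function compactSession(
  mutex: SessionMutex,
  summarize: () => Promise<string>, // stand-in for compactWithSafetyTimeout()
  persist: (summary: string) => Promise<void>, // stand-in for post-compaction file I/O
): Promise<void> {
  let sessionLock: SessionLock | null = await mutex.acquire();
  try {
    // Pre-compaction work that needs the lock would go here.

    // Release before the slow LLM call so other writers can append messages.
    await sessionLock.release();
    sessionLock = null;

    const summary = await summarize();

    // Re-acquire for the file I/O that must not interleave with appends.
    sessionLock = await mutex.acquire();
    await persist(summary);
  } finally {
    // Null check: the lock may already be released if summarize() threw.
    if (sessionLock) await sessionLock.release();
  }
}
```

With this shape, a message handler that calls `mutex.acquire()` while `summarize()` is running gets the lock immediately instead of waiting for the whole compaction to finish. The trade-off is that the session can change during summarization, so the post-compaction step has to tolerate messages appended in the meantime.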

🤖 AI-assisted fix (Claude Code-generated, reviewed by maintainer).

Changed files

  • src/agents/pi-embedded-runner/compact.ts (modified, +24/-5)

Bug Description

When OpenClaw session compaction triggers on a long-running session, the InteractionEventListener on the critical lane times out with a 120-second listener-timeout error. Incoming Discord messages are silently dropped without any user-facing error or retry.

This departs from the expected behavior, in which the gateway NACKs or retries messages that cannot be appended to a locked session instead of letting them queue until the listener timeout silently kills the lane.

Reproduction

  1. Start a session that accumulates enough context to trigger compaction (e.g., multiple model fallback attempts)
  2. Wait for compaction to begin (agent/embedded subsystem: compaction retry aggregate timeout)
  3. Send a Discord message during the compaction window
  4. Message is received by Discord gateway, queued, but never appended to the session
  5. InteractionEventListener hits 120s timeout; message is dropped with no user notification

Exact log evidence (2026-04-28 02:26–02:28 UTC)

2026-04-28T02:26:37.733Z warn agent/embedded incomplete turn detected: 
  runId=96c9f2aa-26a0-47b3-a105-b36170dc862c 
  sessionId=517c9845-d14c-4750-8699-5af0fe93493b 
  stopReason=stop payloads=0

2026-04-28T02:26:37.736Z warn model-fallback/decision candidate_failed:
  requested=minimax-portal/MiniMax-M2.7 
  reason=format code=incomplete_result 
  errorPreview="minimax-portal/MiniMax-M2.7 ended with an incomplete terminal response"

2026-04-28T02:27:53.762Z warn agent/embedded 
  compaction retry aggregate timeout (60000ms): 
  proceeding with pre-compaction state 
  runId=96c9f2aa-26a0-47b3-a105-b36170dc862c 
  sessionId=517c9845-d14c-4750-8699-5af0fe93493b

2026-04-28T02:27:53.763Z warn agent/embedded 
  using pre-compaction snapshot: timed out during compaction

2026-04-28T02:27:57.990Z ERROR event-queue 
  type=listener-timeout 
  listener=InteractionEventListener 
  eventType=INTERACTION_CREATE 
  lane=critical 
  timeoutMs=120000
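
The intervals implied by these timestamps can be worked out with a few lines of TypeScript. This is only arithmetic on the logged values. It assumes the 120000 ms timeout is measured from the moment the listener started handling the event, which the log does not state.

```typescript
// Timestamps copied from the log excerpt above.
const events = {
  incompleteTurn: "2026-04-28T02:26:37.733Z",
  compactionTimeout: "2026-04-28T02:27:53.762Z",
  listenerTimeout: "2026-04-28T02:27:57.990Z",
};

const toMs = (iso: string): number => Date.parse(iso);

// Time between the incomplete turn and the compaction timeout warning.
const compactionWindowMs =
  toMs(events.compactionTimeout) - toMs(events.incompleteTurn);

// Assumption: the listener started exactly timeoutMs before the error fired.
const listenerStart = new Date(
  toMs(events.listenerTimeout) - 120_000,
).toISOString();

console.log(compactionWindowMs); // 76029
console.log(listenerStart); // 2026-04-28T02:25:57.990Z
```

Under that assumption, the listener that timed out had been waiting since about 02:25:58 UTC, roughly 40 seconds before the first warning in the excerpt.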

Impact: User sent a message at 02:27 UTC. It was never processed — the session started fresh at 02:28 UTC with seq=1, meaning the message was entirely lost.

Expected Behavior

When the session file is locked for compaction, the gateway should:

  1. Fail fast with a "session busy" response to Discord (causing Discord to retry via exponential backoff), OR
  2. Buffer messages to a persistent queue that survives the compaction and is drained afterward, OR
  3. Use non-blocking compaction (copy-on-write or temp-file swap) so the session lock is never held for more than milliseconds
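
Option 1 could look something like the sketch below: wait for the session lock for a bounded time and report "busy" instead of blocking until the listener timeout. All names here (`appendOrFailFast`, `AppendResult`, the `acquire` and `append` callbacks) are hypothetical and not part of OpenClaw. How the gateway would turn a "busy" result into a Discord response is left out.

```typescript
type AppendResult =
  | { status: "appended" }
  | { status: "busy"; retryAfterMs: number };

async function appendOrFailFast(
  acquire: () => Promise<() => void>, // hypothetical: resolves with a release function
  append: () => Promise<void>, // hypothetical: writes the message to the session
  maxWaitMs: number,
): Promise<AppendResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<null>((resolve) => {
    timer = setTimeout(() => resolve(null), maxWaitMs);
  });

  const pending = acquire();
  const release = await Promise.race([pending, timedOut]);
  clearTimeout(timer);

  if (release === null) {
    // The lock request is still queued; hand the lock back as soon as it arrives
    // so a late grant does not leak.
    pending.then(
      (lateRelease) => lateRelease(),
      () => {},
    );
    return { status: "busy", retryAfterMs: maxWaitMs };
  }

  try {
    await append();
    return { status: "appended" };
  } finally {
    release();
  }
}
```

The caller gets an explicit result within `maxWaitMs`, so it can tell the user or schedule a retry rather than dropping the message silently.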

Actual Behavior

Messages queue in memory, the listener timeout fires at 120s, and the message is silently dropped. The user receives no feedback and must manually resend.

Environment

  • OpenClaw: 2026.4.25 (aa36ee6)
  • OS: Ubuntu 24.04 (Linux 6.8.0-110-generic x64)
  • Node: v24.14.1
  • Model: minimax-portal/MiniMax-M2.7 (primary) → MiniMax-M2.1 (fallback) → openrouter/minimax/minimax-m2.5:free (tertiary)
  • Channel: Discord
  • Session: native OpenClaw session compaction (not LCM)

Related Issues

  • #68401 — "Feature: Allow message processing during session compaction" (feature request for background compaction)
  • #66646 — "Session file lock errors cascade through model fallback chain" (lock errors treated as model failures)
  • #64362 — "Main agent response lost after tool execution" (dead-pid lock cleanup causes lost responses)
  • #57248 — "InteractionEventListener stalls up to 78s on Discord" (closed — same symptom, different root cause: session-memory hook blocking)
  • #59218 — "Session-memory hook blocks Discord InteractionEventListener" (closed — same symptom, root cause was synchronous disk writes from cron hook)

Severity

HIGH — Messages are silently lost with no user feedback. The user has no way to know their message was dropped unless they happen to resend and compare timestamps.

Suggested Fix

The most strategic fix is non-blocking compaction: instead of holding a WRITE lock on the session JSONL file while compacting, the compaction process should:

  1. Write the compacted state to a .tmp file
  2. Perform an atomic rename() to swap the old file with the compacted one
  3. Never hold the session lock for more than a few milliseconds

This eliminates the root cause entirely without requiring complex queue infrastructure.
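
A minimal sketch of the write-then-rename step, using Node's `fs/promises` API. The function name `swapInCompacted` and the `.tmp` suffix handling are illustrative assumptions, and the sketch does not cover fsync for durability or merging messages appended to the old file while the compacted copy was being built.

```typescript
import { mkdtemp, readFile, rename, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Write the compacted state next to the session file, then swap it in.
// rename() replaces the target atomically on POSIX when both paths are on
// the same filesystem, so readers see either the old or the new file.
async function swapInCompacted(
  sessionPath: string,
  compactedJsonl: string,
): Promise<void> {
  const tmpPath = `${sessionPath}.tmp`;
  await writeFile(tmpPath, compactedJsonl, "utf8");
  await rename(tmpPath, sessionPath);
}

// Usage example against a throwaway directory.
async function demo(): Promise<string> {
  const dir = await mkdtemp(join(tmpdir(), "compaction-demo-"));
  const sessionPath = join(dir, "session.jsonl");
  await writeFile(sessionPath, '{"seq":1}\n{"seq":2}\n', "utf8");
  await swapInCompacted(sessionPath, '{"seq":1,"summary":true}\n');
  return readFile(sessionPath, "utf8");
}
```

Only the rename needs to be serialized against writers. Any message appended to the old file after the compaction snapshot was taken would still have to be carried over, which this sketch leaves open.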

Hypothesis (to be verified)

The session lock is held during the entire compaction I/O + LLM summarization cycle (60+ seconds). The InteractionEventListener waits for the session to accept the new message, but the session file is locked. The 120s listener timeout then fires and drops the event without retrying.

Extended Analysis

TL;DR

Implement non-blocking compaction by writing the compacted state to a temporary file and then performing an atomic rename to swap the old file with the compacted one.

Guidance

  • Verify that the session lock is indeed held during the entire compaction cycle by analyzing the log evidence and the compaction process.
  • Consider implementing a persistent queue to buffer messages during compaction as an alternative solution.
  • Review the InteractionEventListener code to ensure it is properly handling timeouts and retries.
  • Test the non-blocking compaction approach to ensure it resolves the issue without introducing new problems.

Example

No code snippet is provided as the issue does not contain sufficient code details.

Notes

The suggested fix assumes that the non-blocking compaction approach is feasible and does not introduce significant performance overhead. Additional testing and verification are necessary to confirm the effectiveness of this solution.

Recommendation

Prefer non-blocking compaction: it removes the root cause and is simpler than adding a persistent queue or modifying the InteractionEventListener code. Note that the merged fix, PR #73313, takes a narrower approach and releases the session write lock only during the LLM summarization call.
