openclaw - 💡(How to fix) Fix Bug: Compaction causes Pi runtime deadlock — agent freezes across all channels after summary generation

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Compaction causes Pi runtime deadlock after summary generation, freezing ALL channels of the affected agent. Gateway remains healthy (no crash, no error log), but the agent stops responding to any channel until session is rebuilt.

OpenClaw 2026.5.18 (50a2481) · macOS 15.6 · Node 23.11.0 · Apple Silicon

Error Message

Compaction causes Pi runtime deadlock after summary generation, freezing ALL channels of the affected agent. Gateway remains healthy (no crash, no error log), but the agent stops responding to any channel until session is rebuilt.

Root Cause

Compaction causes Pi runtime deadlock after summary generation, freezing ALL channels of the affected agent. Gateway remains healthy (no crash, no error log), but the agent stops responding to any channel until session is rebuilt.

OpenClaw 2026.5.18 (50a2481) · macOS 15.6 · Node 23.11.0 · Apple Silicon

Fix Action

Fix / Workaround

Workaround Applied

Code Example

{
  "type": "compaction",
  "timestamp": "2026-05-21T03:00:56.816Z",
  "summary": "## Goal\n- Investigate...",
  "tokensBefore": 182767,
  "fromHook": false
}

---

{
  "agents": {
    "defaults": {
      "compaction": {
        "reserveTokensFloor": 80000,
        "midTurnPrecheck": { "enabled": true }
      }
    }
  }
}
RAW_BUFFERClick to expand / collapse

Summary

Compaction causes Pi runtime deadlock after summary generation, freezing ALL channels of the affected agent. Gateway remains healthy (no crash, no error log), but the agent stops responding to any channel until session is rebuilt.

OpenClaw 2026.5.18 (50a2481) · macOS 15.6 · Node 23.11.0 · Apple Silicon

Reproduction Pattern (3 days, 4 occurrences)

  1. Agent session accumulates tokens to ~182k/262k (~70%) — compaction threshold determined by reserveTokensFloor: 80000
  2. Auto-compaction triggers (or manual /compact)
  3. Compaction summary is generated successfully (verified in transcript)
  4. .reset backup created
  5. Post-compaction: no new messages written to transcript — session goes silent
  6. All channels of the same agent freeze (confirmed: Feishu + WeCom both unresponsive)
  7. Other agents on same gateway unaffected
  8. /compact returns "skipped: session was already compacted recently"
  9. Gateway restart does NOT recover — agent still unresponsive
  10. Only /new (fresh session) restores function

Timeline (Latest Incident)

All times UTC+8 (Beijing):

TimeEvent
10:54Gateway auto-restarted by launchd (kickstart)
10:58User message processed normally
11:00Auto-compaction triggered (182,767 tokens)
11:00Compaction summary generated (comprehensive, well-structured)
11:00.reset backup created (1.5MB, 570 lines)
11:00+No new messages in transcript — deadlock
11:05Session marked as reset
11:08User rebuilt session → new session works

Evidence

Compaction entry in transcript (last entry before deadlock)

{
  "type": "compaction",
  "timestamp": "2026-05-21T03:00:56.816Z",
  "summary": "## Goal\n- Investigate...",
  "tokensBefore": 182767,
  "fromHook": false
}

Summary was well-formed with Goal, Progress, Next Steps, read-files, modified-files — quality is fine.

File state after deadlock

Session directory contains:

  • xxx.jsonl.reset.<timestamp> — backup created at reset (1.5MB)
  • xxx.checkpoint.<uuid>.jsonl — pre-compaction checkpoint (611KB)
  • xxx.trajectory.jsonl — full trajectory (10MB)
  • xxx.trajectory-path.json — pointer

Missing: No compacted .json successor file was ever created.

Multi-channel confirmation

When Feishu froze, WeCom channel of the same agent also stopped responding within minutes. A different agent on the same gateway continued working normally, confirming the deadlock is agent-scoped, not gateway-scoped.

Gateway health

  • gateway.err.log: zero errors for the incident day
  • gateway.log: stopped writing on May 19 (2 days before incident) — log rotation or logging bug
  • Gateway process: healthy, no crash
  • Other agents: fully functional

Configuration Context

{
  "agents": {
    "defaults": {
      "compaction": {
        "reserveTokensFloor": 80000,
        "midTurnPrecheck": { "enabled": true }
      }
    }
  }
}

Key: reserveTokensFloor: 80000 on a 262k context window → compaction triggers at ~70% (182k tokens), much earlier than default (24k reserve → ~91% trigger).

truncateAfterCompaction was not set (default false) — in-place rewrite mode. notifyUser was not set (default false).

Hypothesis

Compaction summary generation succeeds, but the subsequent transcript write/rotation step fails silently. Since truncateAfterCompaction is false, OpenClaw uses in-place transcript rewrite. The failure leaves the Pi runtime's event loop in an inconsistent state — an async file operation doesn't resolve, blocking the entire agent's message processing queue. This would explain:

  1. Agent-level deadlock (Pi runtime blocked, not gateway)
  2. No gateway errors (the event loop is stuck, not crashed)
  3. Gateway restart doesn't help (the broken session state persists on disk)
  4. /new fixes it (creates fresh Pi runtime + fresh transcript)

The reserveTokensFloor: 80000 (causing frequent early compactions) and gateway restart shortly before compaction may be contributing factors — restart may leave session state slightly inconsistent when the next auto-compaction fires.

Workaround Applied

  • reserveTokensFloor: 80000 → 24000 (default)
  • truncateAfterCompaction: false → true
  • notifyUser: false → true

Related

  • Model: deepseek-v4-pro (262k context)
  • Previous occurrence: same pattern observed on May 19 and May 20
  • Session reset files preserved for debugging if needed

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING