openclaw - ✅(Solved) Fix [Bug]: Context overflow / compaction can orphan agent:main:main and silently rotate WebChat to a new session [2 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#70472 (fetched 2026-04-24 05:57:39)
Comments: 1
Participants: 2
Timeline: 3
Reactions: 0
Timeline (top): cross-referenced ×2, commented ×1

Long-running WebChat main sessions can silently lose the live agent:main:main mapping after context overflow / compaction-path failures, causing the next visible user message to start a new session id even though the prior transcript still exists on disk.

From the user's perspective, this looks like the session "disappeared" or got wiped:

  • the visible main WebChat chat appears to restart
  • prior checkpoints/history are no longer attached to the active session row
  • the agent behaves like it lost context
  • old transcripts are still present on disk, but the active session store now points agent:main:main at a new sessionId

This seems related to #70330, but the trigger here is different:

  • #70330 is about restart/reconnect / reset-like rotation
  • this report is about rotation after context overflow / compaction handling, with no gateway restart required

Root Cause

The key clue is the log line "skipping compaction checkpoint persist: session not found", which shows that the compaction/checkpoint path could not find the active session entry it expected.

Fix Action

Fixed

PR fix notes

PR #70473: fix(agents): derive overflow budgets from provider errors

Description (problem / solution / changelog)

Summary

  • broaden observed overflow token extraction across provider-shaped error formats used during overflow recovery
  • pass a minimally over-budget currentTokenCount into overflow compaction when overflow is confirmed but the provider message does not expose a parseable count
  • add regression coverage and compaction docs for the shared overflow-budget path
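The two behavior changes in this summary can be illustrated with a small TypeScript sketch. The message formats, regexes, and the helper names extractObservedOverflowTokens and overflowTokenCountForCompaction are hypothetical, chosen for illustration. They are not the PR's actual code in src/agents/pi-embedded-helpers/errors.ts.

```typescript
// Illustrative overflow messages, modeled loosely on provider-style errors.
// The real formats handled by PR #70473 (including MiniMax-style errors) may differ.
const OVERFLOW_TOKEN_PATTERNS: RegExp[] = [
  // e.g. "prompt is too long: 210000 tokens > 200000 maximum"
  /prompt is too long:\s*(\d+)\s*tokens/i,
  // e.g. "... However, your messages resulted in 131072 tokens."
  /resulted in\s*(\d+)\s*tokens/i,
];

// Hypothetical helper: returns the provider-reported token count, or undefined.
function extractObservedOverflowTokens(message: string): number | undefined {
  for (const pattern of OVERFLOW_TOKEN_PATTERNS) {
    const match = pattern.exec(message);
    if (match) {
      const value = Number(match[1]);
      if (Number.isFinite(value) && value > 0) return value;
    }
  }
  return undefined;
}

// Hypothetical helper: picks the currentTokenCount passed to overflow compaction.
// If overflow is confirmed but no count is parseable, report one token over the
// budget so compaction still treats the session as over budget.
function overflowTokenCountForCompaction(message: string, contextBudget: number): number {
  return extractObservedOverflowTokens(message) ?? contextBudget + 1;
}
```

Returning budget + 1 keeps the fallback "minimally" over budget, as the PR summary puts it: compaction is triggered without pretending to know the real prompt size.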

Testing

  • pnpm test src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts
  • pnpm test src/agents/pi-embedded-runner/run.overflow-compaction.test.ts
  • pnpm build

Notes

  • Related to #70472, but does not fully resolve the later session-rotation/orphaning behavior after failed compaction.
  • Live validation on opencode-go/minimax-m2.5 now logs observed overflow counts for MiniMax-style errors instead of unknown; the remaining compaction behavior still needs separate follow-up.

Changed files

  • docs/reference/session-management-compaction.md (modified, +4/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +15/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +21/-1)
  • src/agents/pi-embedded-runner/run.overflow-compaction.test.ts (modified, +34/-0)
  • src/agents/pi-embedded-runner/run.ts (modified, +12/-5)

PR #70479: fix(auto-reply): preserve sessions after compaction failures

Description (problem / solution / changelog)

Summary

  • preserve the active session mapping when overflow recovery or compaction failure cannot recover a turn
  • replace the silent session-rotation path with explicit user guidance to retry, /compact, or /new
  • add regression coverage for both embedded overflow payloads and thrown compaction-failure errors
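The intended behavior can be sketched as a turn wrapper that treats both an overflow payload and a thrown compaction error as "reply with guidance, keep the mapping". SessionStore, TurnOutcome, runTurnPreservingSession, and the guidance wording are hypothetical; only the three user options (retry, /compact, /new) come from the PR description.

```typescript
// Hypothetical types; the real ones under src/auto-reply/reply/ are not shown in the issue.
type SessionStore = Map<string, string>; // sessionKey -> sessionId

type TurnOutcome =
  | { kind: "ok"; reply: string }
  | { kind: "overflow-unrecovered" }
  | { kind: "compaction-failed"; error: Error };

// Illustrative wording; only the options themselves come from the PR.
const RECOVERY_GUIDANCE =
  "The context is too large to continue automatically. " +
  "Retry your message, run /compact, or start over with /new.";

// Runs one turn. On an unrecovered overflow or a compaction failure it leaves
// the sessionKey -> sessionId mapping untouched and returns guidance instead
// of silently rotating to a new session.
function runTurnPreservingSession(
  store: SessionStore,
  sessionKey: string,
  runEmbedded: (sessionId: string) => TurnOutcome,
): { text: string; sessionId: string } {
  const sessionId = store.get(sessionKey);
  if (sessionId === undefined) {
    // In this sketch a missing mapping is a caller error; session creation is out of scope.
    throw new Error(`no active session mapped for ${sessionKey}`);
  }

  let outcome: TurnOutcome;
  try {
    outcome = runEmbedded(sessionId);
  } catch (err) {
    // A thrown compaction failure is handled the same way as an overflow payload.
    const error = err instanceof Error ? err : new Error(String(err));
    outcome = { kind: "compaction-failed", error };
  }

  if (outcome.kind === "ok") {
    return { text: outcome.reply, sessionId };
  }
  // Failure path: no store.delete(sessionKey) and no fresh sessionId.
  return { text: RECOVERY_GUIDANCE, sessionId };
}
```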

Fixes #70472.

Testing

  • pnpm test src/auto-reply/reply/agent-runner-execution.test.ts
  • pnpm build
  • commit gate: pnpm check:changed --staged

Notes

  • This is intentionally separate from #70473. That PR improves overflow-budget extraction; this PR fixes the later auto-reply session remap/orphaning behavior after compaction recovery still fails.

Changed files

  • docs/reference/session-management-compaction.md (modified, +3/-0)
  • src/auto-reply/reply/agent-runner-execution.test.ts (modified, +108/-37)
  • src/auto-reply/reply/agent-runner-execution.ts (modified, +16/-18)
  • src/auto-reply/reply/agent-runner.ts (modified, +0/-7)


Environment

  • OpenClaw CLI: 2026.4.21 (f788c88)
  • Channel/surface: WebChat direct
  • Session key: agent:main:main
  • Model: openai-codex/gpt-5.4
  • Host OS: Darwin 25.4.0 arm64

What happened

On Wednesday, April 22, 2026 (America/New_York), the main WebChat session hit context-overflow conditions multiple times during heavy tool use.

Relevant observed timeline:

  • 10:19:21 PM EDT — log recorded a context overflow for agent:main:main tied to session file b8a5c854-4632-4954-ae09-fd507ada1e8a.jsonl
  • 10:21:14 PM EDT — log recorded skipping compaction checkpoint persist: session not found for agent:main:main
  • 10:30:51 PM EDT — another context overflow for agent:main:main, now tied to df065d9f-9445-4401-a94b-61042c7eff40.jsonl
  • 11:17:42 PM EDT — another context overflow for that same later session file
  • after that, later visible user messages landed in a fresh active session mapping with session id f36b8f4d-bdca-41a2-b13c-a6ec3016218a

Important detail: the earlier transcripts were still on disk. They were not actually deleted from the transcript folder. What changed was the active sessions.json entry for agent:main:main, which ended up pointing at a new session id.

Why this looks like a bug

The sequence suggests the session store entry for the active main session becomes missing/unavailable during overflow/compaction handling:

  1. main session grows very large
  2. provider/tool loop hits context overflow
  3. compaction checkpoint code tries to persist and logs session not found
  4. active agent:main:main mapping is no longer the old session
  5. next user message creates or uses a new session id
  6. user experiences this as spontaneous session loss
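The six steps above can be reproduced in a toy model, which makes it clear why the warning alone is not enough: once the entry is gone, the next lookup quietly creates a new id. ToySessionStore and its methods are hypothetical and deliberately reproduce the suspected bug; this is not OpenClaw's code.

```typescript
// Toy model of the suspected failure mode; all names are hypothetical.
class ToySessionStore {
  private readonly entries = new Map<string, string>(); // sessionKey -> sessionId
  private nextId = 1;
  readonly log: string[] = [];

  // Step 5: returns the mapped sessionId, creating a fresh one if the key is missing.
  resolve(sessionKey: string): string {
    let id = this.entries.get(sessionKey);
    if (id === undefined) {
      id = `session-${this.nextId++}`;
      this.entries.set(sessionKey, id);
    }
    return id;
  }

  // Stand-in for step 2: some overflow-recovery path drops the entry.
  dropDuringRecovery(sessionKey: string): void {
    this.entries.delete(sessionKey);
  }

  // Step 3: checkpoint persistence only logs when the entry is gone.
  persistCheckpoint(sessionKey: string): void {
    if (!this.entries.has(sessionKey)) {
      this.log.push(`skipping compaction checkpoint persist: session not found { sessionKey: ${sessionKey} }`);
      return;
    }
    this.log.push(`checkpoint persisted for ${sessionKey}`);
  }
}

const KEY = "agent:main:main";
const store = new ToySessionStore();
const before = store.resolve(KEY); // step 1: the long-running session
store.dropDuringRecovery(KEY);     // step 2: recovery mutates the store
store.persistCheckpoint(KEY);      // step 3: logs "session not found"
const after = store.resolve(KEY);  // steps 4-5: next message lands on a new id
console.log(before, after, store.log);
```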

Sanitized evidence

Observed log lines from /tmp/openclaw/openclaw-2026-04-22.log:

2026-04-22 22:19:21 EDT  [context-overflow-diag] sessionKey=agent:main:main ... sessionFile=/Users/.../b8a5c854-4632-4954-ae09-fd507ada1e8a.jsonl
2026-04-22 22:21:14 EDT  skipping compaction checkpoint persist: session not found  { sessionKey: agent:main:main }
2026-04-22 22:30:51 EDT  [context-overflow-diag] sessionKey=agent:main:main ... sessionFile=/Users/.../df065d9f-9445-4401-a94b-61042c7eff40.jsonl
2026-04-22 23:17:42 EDT  [context-overflow-diag] sessionKey=agent:main:main ... sessionFile=/Users/.../df065d9f-9445-4401-a94b-61042c7eff40.jsonl
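A quick way to spot this pattern in other logs is to track which transcript file each sessionKey reports in its overflow diagnostics. The sketch below assumes the line format of the sanitized excerpt; the regex, the Rotation type, and findTranscriptRotations are hypothetical and may need adjusting for unsanitized logs. Applied to the four excerpted lines, it reports one change, from b8a5c854-4632-4954-ae09-fd507ada1e8a to df065d9f-9445-4401-a94b-61042c7eff40 at 22:30:51 EDT. The later move to f36b8f4d-bdca-41a2-b13c-a6ec3016218a does not appear in these diagnostic lines and has to be confirmed from sessions.json.

```typescript
// One detected change of transcript file behind a sessionKey.
interface Rotation {
  sessionKey: string;
  from: string; // previous transcript id
  to: string;   // new transcript id
  at: string;   // timestamp of the line where the change was observed
}

// Fitted to the sanitized excerpt: "<date> <time> <tz>  [context-overflow-diag] sessionKey=<key> ... sessionFile=<path>/<id>.jsonl"
const DIAG_LINE =
  /^(\S+ \S+ \S+)\s+\[context-overflow-diag\] sessionKey=(\S+) .*sessionFile=\S*\/([^\/\s]+)\.jsonl/;

function findTranscriptRotations(lines: string[]): Rotation[] {
  const lastSeen = new Map<string, string>(); // sessionKey -> last transcript id
  const rotations: Rotation[] = [];
  for (const line of lines) {
    const m = DIAG_LINE.exec(line);
    if (!m) continue; // not an overflow diagnostic line
    const [, at, sessionKey, transcriptId] = m;
    const previous = lastSeen.get(sessionKey);
    if (previous !== undefined && previous !== transcriptId) {
      rotations.push({ sessionKey, from: previous, to: transcriptId, at });
    }
    lastSeen.set(sessionKey, transcriptId);
  }
  return rotations;
}
```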

Session files observed afterward:

old transcript still present:
  ~/.openclaw/agents/main/sessions/df065d9f-9445-4401-a94b-61042c7eff40.jsonl

new active transcript:
  ~/.openclaw/agents/main/sessions/f36b8f4d-bdca-41a2-b13c-a6ec3016218a.jsonl

Afterward, sessions.json pointed agent:main:main at the new session id instead of the previously active transcript.
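Because the transcripts survive and only the store mapping changes, a list of transcripts that no store entry references narrows down which sessions were orphaned. This sketch assumes sessions.json maps each session key to an object with a sessionId field; the real schema is not shown in the issue, and SessionsJson and findOrphanedTranscripts are hypothetical. An unreferenced transcript is a candidate, not proof: an explicit /new also leaves one behind.

```typescript
// Assumed sessions.json shape: { [sessionKey]: { sessionId: string, ... } }.
type SessionsJson = Record<string, { sessionId: string }>;

// Returns transcript ids (from *.jsonl file names) that no store entry references.
function findOrphanedTranscripts(store: SessionsJson, transcriptFiles: string[]): string[] {
  const referenced = new Set(Object.keys(store).map((key) => store[key].sessionId));
  return transcriptFiles
    .filter((name) => name.endsWith(".jsonl"))
    .map((name) => name.slice(0, -".jsonl".length))
    .filter((id) => !referenced.has(id));
}
```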

Expected behavior

When a main WebChat session overflows context or compaction/checkpoint persistence has trouble:

  • OpenClaw should not silently orphan the active agent:main:main mapping
  • the existing session should remain the active logical session unless the user explicitly resets it
  • checkpoint persistence failure should not cause a hidden session rotation
  • if recovery/new-session behavior does happen, it should be explicit and auditable in the UI and store

Actual behavior

  • overflow happened
  • compaction/checkpoint path logged session not found
  • active WebChat main mapping later pointed to a different session id
  • user-visible effect was "we lost the session again"
  • prior transcript remained on disk, but continuity in the active UI/session mapping was broken

Possible root-cause area

This log pair looks like the key clue:

  • context overflow in the embedded runner
  • compaction checkpoint persistence cannot find the current session entry

The checkpoint persistence code already logs this exact case:

skipping compaction checkpoint persist: session not found

So one plausible failure mode is:

  • overflow/compaction or related recovery mutates/removes the active session-store entry unexpectedly
  • checkpoint persistence races or arrives after the store no longer contains the expected canonical key
  • later inbound WebChat traffic reinitializes agent:main:main onto a new session id
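One general way to keep a late checkpoint write from turning into a silent skip is a compare-and-set check: the writer states which sessionId the checkpoint was computed for, and the store reports whether the entry vanished or was remapped. This is a swapped-in technique for illustration, not necessarily what the fix does; CasSessionStore, PersistResult, and their members are hypothetical.

```typescript
// Hypothetical store entry and result types.
type Entry = { sessionId: string; checkpoints: string[] };

type PersistResult =
  | { ok: true }
  | { ok: false; reason: "missing" | "remapped"; expected: string; actual?: string };

class CasSessionStore {
  private readonly entries = new Map<string, Entry>();

  // Maps sessionKey to sessionId, starting with an empty checkpoint list.
  set(sessionKey: string, sessionId: string): void {
    this.entries.set(sessionKey, { sessionId, checkpoints: [] });
  }

  get(sessionKey: string): Entry | undefined {
    return this.entries.get(sessionKey);
  }

  // Persists only if sessionKey still maps to expectedSessionId. Instead of
  // logging and moving on, it tells the caller exactly what went wrong.
  persistCheckpoint(sessionKey: string, expectedSessionId: string, checkpoint: string): PersistResult {
    const entry = this.entries.get(sessionKey);
    if (entry === undefined) {
      return { ok: false, reason: "missing", expected: expectedSessionId };
    }
    if (entry.sessionId !== expectedSessionId) {
      return { ok: false, reason: "remapped", expected: expectedSessionId, actual: entry.sessionId };
    }
    entry.checkpoints.push(checkpoint);
    return { ok: true };
  }
}
```

A "missing" or "remapped" result is the point where the caller can raise the invariant violation described in the suggested fixes below, rather than letting the next message recreate the key.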

Suggested fixes

  1. Treat agent:main:main disappearance during overflow/compaction as a high-severity invariant violation and log the old/new session ids plus store path.
  2. Prevent checkpoint persistence failure from leaving the active main session unmapped.
  3. Add an explicit recovery path that preserves the active session id unless the user intentionally resets.
  4. If a fallback/new session must be created, emit a visible/auditable session-rotation event in the transcript/store/UI.
  5. Add regression coverage for:
    • large WebChat direct session
    • context overflow during tool-heavy turn
    • post-overflow compaction/checkpoint handling
    • subsequent user message should continue same active session id
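Suggested fixes 1, 4, and 5 can be sketched together: a map that remembers the last sessionId seen for each key, so a drop-then-recreate sequence is still detected, and records an auditable event carrying the old/new ids and the store path. GuardedSessionMap, RotationEvent, and the store path used in testing are hypothetical.

```typescript
// Hypothetical audit types for session-key remaps.
type RotationReason = "user-reset" | "unexpected";

interface RotationEvent {
  sessionKey: string;
  oldSessionId: string;
  newSessionId: string;
  storePath: string;
  reason: RotationReason;
  severity: "info" | "error"; // "error" marks an invariant violation
}

class GuardedSessionMap {
  readonly events: RotationEvent[] = [];
  private readonly storePath: string;
  private readonly current = new Map<string, string>();   // live mapping
  private readonly lastKnown = new Map<string, string>(); // survives deletes

  constructor(storePath: string) {
    this.storePath = storePath;
  }

  get(sessionKey: string): string | undefined {
    return this.current.get(sessionKey);
  }

  // Dropping an entry is not itself a rotation, but lastKnown keeps the old id
  // so a later re-creation under the same key is still compared against it.
  remove(sessionKey: string): void {
    this.current.delete(sessionKey);
  }

  assign(sessionKey: string, newSessionId: string, userInitiated: boolean): void {
    const oldSessionId = this.lastKnown.get(sessionKey);
    if (oldSessionId !== undefined && oldSessionId !== newSessionId) {
      this.events.push({
        sessionKey,
        oldSessionId,
        newSessionId,
        storePath: this.storePath,
        reason: userInitiated ? "user-reset" : "unexpected",
        severity: userInitiated ? "info" : "error",
      });
    }
    this.current.set(sessionKey, newSessionId);
    this.lastKnown.set(sessionKey, newSessionId);
  }
}
```

In a regression test for the scenario in fix 5, the assertion would be that events stays empty across the overflow turn and the following user message.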

Severity / impact

This is risky for long-running operational sessions because the user can believe they are continuing the same stateful conversation when the agent has actually been remapped onto a fresh session.

That is especially dangerous for write-capable local-admin workflows because the agent may continue from incomplete or reconstructed context while the user thinks continuity was preserved.

Extent analysis

TL;DR

The most likely fix is to make the checkpoint persistence code handle "session not found" explicitly and to prevent the active main session from becoming unmapped during overflow/compaction handling.

Guidance

  • Investigate the checkpoint persistence code to identify why it logs "session not found" and determine if it's related to a race condition or unexpected session store mutation.
  • Consider adding logging to track the old and new session ids when the active agent:main:main mapping changes, to better understand the failure mode.
  • Review the recovery path for overflow/compaction handling to ensure it preserves the active session id unless the user intentionally resets.
  • Evaluate the need for an explicit session-rotation event in the transcript/store/UI when a fallback/new session is created.


Notes

The provided information suggests a potential issue with the checkpoint persistence code, but further investigation is required to determine the root cause. The suggested fixes provide a starting point for addressing the problem.

Recommendation

As a stopgap, treat agent:main:main disappearance during overflow/compaction as a high-severity invariant violation, log the old/new session ids plus the store path, and prevent checkpoint persistence failures from leaving the active main session unmapped. This helps identify and mitigate the issue until a permanent fix lands.