openclaw: Hard-recovery option needed (reason=restart reuses the resume token and is a no-op when the session is in a stuck state)



Root Cause

When OpenClaw's cli-backend issues "claude live session close: reason=restart", the subsequent "claude live session start" resumes the SAME claude-cli session ID (useResume=true session=present resumeSession=<same-token>). If the on-disk .jsonl transcript is in a corrupted or stuck state, the resumed subprocess immediately reproduces the same broken output, so the restart is effectively a no-op. Only a full session reset, which abandons the resume token (useResume=false), recovers the session.
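The current behavior can be modeled in a few lines. This is a minimal sketch, not OpenClaw's code: the SessionEntry shape and the closeLiveSession and planStart helpers are hypothetical, and only the useResume and resumeSession field names mirror the gateway.log output. It shows why a restart cannot escape a stuck transcript: the stored token survives the close, so the next start resumes it.

```typescript
// Hypothetical model of how a live session start decides whether to resume.
// Only useResume/resumeSession mirror gateway.log fields; the rest is illustrative.

type CloseReason = "restart" | "reset";

interface SessionEntry {
  id: string;
  resumeToken?: string; // claude-cli session ID; undefined after a full reset
}

interface StartPlan {
  useResume: boolean;
  resumeSession?: string;
}

// Closing the live session: a reset discards the token, a restart keeps it.
function closeLiveSession(entry: SessionEntry, reason: CloseReason): SessionEntry {
  if (reason === "reset") {
    return { ...entry, resumeToken: undefined };
  }
  return entry; // reason=restart: the token survives, so the stuck transcript is reused
}

// Starting the next live session: resume whenever a token is present.
function planStart(entry: SessionEntry): StartPlan {
  if (entry.resumeToken !== undefined) {
    return { useResume: true, resumeSession: entry.resumeToken };
  }
  return { useResume: false };
}

const stuck: SessionEntry = { id: "main", resumeToken: "tok-123" };
const afterRestart = planStart(closeLiveSession(stuck, "restart"));
const afterReset = planStart(closeLiveSession(stuck, "reset"));
console.log(afterRestart); // { useResume: true, resumeSession: 'tok-123' }
console.log(afterReset); // { useResume: false }
```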


Fix / Workaround

Proposed design

  • New websocket action sessions.hardReset alongside the existing sessions.reset. The existing reset behavior is unchanged; hard reset is opt-in.
  • Behavior: close the live session AND mark the OpenClaw session entry with a single-use discardResumeToken: true flag.
  • On the next cli exec, the dispatch layer sees the flag, forces useResume=false, and clears the flag after one use.
  • Chat history preservation: unchanged. chat.history is an on-disk index separate from claude-cli's resume token, so the UI continues to display prior turns.
  • UI surface: a new "Hard reset" button in openclaw-control-ui next to the existing "Reset". Both are deliberate user actions.
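The flag lifecycle described above can be sketched as follows. This is a minimal illustration under stated assumptions: the sessions.hardReset action name and the discardResumeToken flag come from the proposal, while the SessionEntry shape, the in-memory stores, and the handleHardReset, planExec, and recordRun functions are hypothetical.

```typescript
// Hypothetical sketch of the proposed sessions.hardReset flow.
// Action name and discardResumeToken come from the proposal; all else is illustrative.

interface SessionEntry {
  resumeToken?: string; // claude-cli session ID from the most recent run
  discardResumeToken?: boolean; // single-use flag armed by sessions.hardReset
}

interface ExecPlan {
  useResume: boolean;
  resumeSession?: string;
}

const sessions = new Map<string, SessionEntry>();
const liveSessions = new Set<string>(); // IDs with a running claude-cli subprocess

// Handler for sessions.hardReset: close the live session and arm the flag.
// chat.history is not touched; it is a separate on-disk index.
function handleHardReset(id: string): void {
  const entry = sessions.get(id);
  if (entry === undefined) throw new Error(`unknown session: ${id}`);
  liveSessions.delete(id);
  sessions.set(id, { ...entry, discardResumeToken: true });
}

// Dispatch layer: decide how to launch the next cli exec for a session.
function planExec(id: string): ExecPlan {
  const entry = sessions.get(id);
  if (entry === undefined) throw new Error(`unknown session: ${id}`);
  if (entry.discardResumeToken === true) {
    // Consume the flag: force a fresh claude-cli session exactly once.
    sessions.set(id, { ...entry, discardResumeToken: false });
    return { useResume: false };
  }
  if (entry.resumeToken !== undefined) {
    return { useResume: true, resumeSession: entry.resumeToken };
  }
  return { useResume: false };
}

// After a run, store the session ID claude-cli reported so later execs resume it.
function recordRun(id: string, sessionId: string): void {
  const entry: SessionEntry = sessions.get(id) ?? {};
  sessions.set(id, { ...entry, resumeToken: sessionId });
}

// Usage: a stuck session is hard-reset, runs fresh once, then resumes the new token.
sessions.set("main", { resumeToken: "stuck-token" });
liveSessions.add("main");
handleHardReset("main");
const first = planExec("main"); // fresh session; flag consumed
recordRun("main", "fresh-token");
const second = planExec("main"); // normal resume of the new session
console.log(first, second);
```

As in the proposal, the sketch clears only the flag, so the stale token stays in the entry until the fresh run reports a new session ID. If a fresh run could fail before recordRun is called, also clearing resumeToken when the flag is consumed would avoid falling back to the stuck token.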

Case study

2026-05-12 incident: the main session emitted a hardcoded synthetic literal ("The PDF file was not valid. Try converting it to text first (e.g., pdftotext).") for 11 consecutive turns over ~12 minutes with zero API calls. All 11 assistant messages had model: "<synthetic>", input_tokens: 0, output_tokens: 0, and byte-identical content (sha256 02df1d51bd7e9f1c3b44b1e0e431b8704bb80a8f8fc10e286894a61a51042429).
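The signature above (synthetic model, zero tokens, identical content hash) can be checked mechanically against a transcript. The sketch below is a hypothetical helper, not part of OpenClaw or the proposal; the AssistantMessage shape is a simplified assumption covering only the fields quoted in the case study, and the default run length of 3 is arbitrary.

```typescript
import { createHash } from "node:crypto";

// Assumed, simplified shape of an assistant message from the .jsonl transcript.
interface AssistantMessage {
  model: string;
  usage: { input_tokens: number; output_tokens: number };
  content: string;
}

function sha256(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// True when the last `minRun` messages are all synthetic, used zero tokens,
// and share one content hash, as in the 2026-05-12 incident.
function looksStuck(messages: AssistantMessage[], minRun = 3): boolean {
  if (minRun < 1 || messages.length < minRun) return false;
  const tail = messages.slice(-minRun);
  const head = tail[0];
  if (head === undefined) return false;
  const headHash = sha256(head.content);
  return tail.every(
    (m) =>
      m.model === "<synthetic>" &&
      m.usage.input_tokens === 0 &&
      m.usage.output_tokens === 0 &&
      sha256(m.content) === headHash,
  );
}

const canned =
  "The PDF file was not valid. Try converting it to text first (e.g., pdftotext).";
const synthetic: AssistantMessage = {
  model: "<synthetic>",
  usage: { input_tokens: 0, output_tokens: 0 },
  content: canned,
};
const transcript: AssistantMessage[] = Array.from({ length: 11 }, () => ({ ...synthetic }));
console.log(looksStuck(transcript)); // true
```

Comparing the strings directly would detect the same thing; hashing is used only to match how the incident evidence was recorded.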

Relevant gateway.log timeline:

  • 11:21:44 stuck-loop begins (synthetic emission, no API call)
  • 11:24:17 onwards: 9 user retries, all return identical canned response
  • 11:28:35 claude live session close: reason=restart + start activeSessions=1 (auto-recovery attempt)
  • 11:28:36 onwards: stuck state PERSISTS through the restarted subprocess (same resume token)
  • 11:29:37 / 11:30:59 webchat UI reconnects: no effect
  • 11:34:15 full sessions.reset from openclaw-control-ui: the only path that recovered


Full incident postmortem

Available on request. It contains the full four-bug-stack analysis, including the upstream failure modes in the Claude Code Read tool and in claude-cli.
