openclaw - 💡(How to fix) Fix prepare.runtime invalidates live claude-cli sessions as missing-transcript, causing mid-thread context loss [2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#78506Fetched 2026-05-07 03:36:06
View on GitHub
Comments
2
Participants
2
Timeline
3
Reactions
2
Author
Timeline (top)
commented ×2closed ×1

Root Cause

File: dist/prepare.runtime-DbLtKpjH.js:719 Helper: dist/attempt-execution.helpers-D6YqcgiA.js:16–69 Destructive branch: dist/attempt-execution-C4Upu-4a.js:247–256

The probe (claudeCliSessionTranscriptHasContent) scans ~/.claude/projects/<dir>/<sessionId>.jsonl for any record where obj.message.role === "assistant". It returns false if:

  • The file doesn't exist at all (binding desync — OpenClaw store points at a session ID that differs from the live claude-cli session).
  • The file exists but the session errored before its first assistant output (e.g., after a 400 response).
  • There's a brief window between flush completion and the probe firing.

In the desync case, the OpenClaw-store binding's sessionId does not match the session ID the live claude-cli process is actually using. The probe checks the wrong file, gets a miss, and hard-resets — even though the live session is perfectly healthy.

clearCliSessionInStore is then called before the new run starts, so there's no recovery path.

Code Example

06:32:00  cli exec: … trigger=user useResume=true session=present reuse=reusable
06:32:08  claude live session turn: durationMs=7873          ← turn 1 finishes OK
06:32:48  cli session reset: provider=claude-cli reason=missing-transcript   ← BUG
06:32:48  cli exec: … trigger=user useResume=false session=none reuse=invalidated:missing-transcript
06:32:48  claude live session close: reason=restart
06:32:48  claude live session start                           ← fresh session, no context
06:32:57  claude live session turn: durationMs=8867
RAW_BUFFERClick to expand / collapse

Bug

When OpenClaw starts a new turn, prepare.runtime probes for the bound CLI session's transcript file on disk. If the file is absent or contains no assistant-role records yet, the runtime calls clearCliSessionInStore and spawns a fresh claude-cli session — discarding all prior conversation context before the turn even starts.

This silently resets the AI's memory mid-thread, with no warning to the user.

Reproduction

The most reliable trigger:

  1. Cause any claude-cli turn to fail before it produces its first assistant message (e.g. hit a usage-limit 400, or kill the process mid-turn).
  2. Wait for the process to recover, then send a follow-up message.
  3. The runtime emits cli session reset: provider=claude-cli reason=missing-transcript and the next turn runs in a brand-new session with zero prior context.

It also fires spontaneously during normal use — the gateway log shows missing-transcript resets a dozen or more times per day, on both trigger=user and trigger=heartbeat turns. No failure required.

Log evidence (gateway.log excerpt)

06:32:00  cli exec: … trigger=user useResume=true session=present reuse=reusable
06:32:08  claude live session turn: durationMs=7873          ← turn 1 finishes OK
06:32:48  cli session reset: provider=claude-cli reason=missing-transcript   ← BUG
06:32:48  cli exec: … trigger=user useResume=false session=none reuse=invalidated:missing-transcript
06:32:48  claude live session close: reason=restart
06:32:48  claude live session start                           ← fresh session, no context
06:32:57  claude live session turn: durationMs=8867

Turn 2 arrives 40 seconds after turn 1 completes — well past any flush race.

Root cause

File: dist/prepare.runtime-DbLtKpjH.js:719 Helper: dist/attempt-execution.helpers-D6YqcgiA.js:16–69 Destructive branch: dist/attempt-execution-C4Upu-4a.js:247–256

The probe (claudeCliSessionTranscriptHasContent) scans ~/.claude/projects/<dir>/<sessionId>.jsonl for any record where obj.message.role === "assistant". It returns false if:

  • The file doesn't exist at all (binding desync — OpenClaw store points at a session ID that differs from the live claude-cli session).
  • The file exists but the session errored before its first assistant output (e.g., after a 400 response).
  • There's a brief window between flush completion and the probe firing.

In the desync case, the OpenClaw-store binding's sessionId does not match the session ID the live claude-cli process is actually using. The probe checks the wrong file, gets a miss, and hard-resets — even though the live session is perfectly healthy.

clearCliSessionInStore is then called before the new run starts, so there's no recovery path.

What makes this worse

buildClaudeCliFallbackContextPrelude already exists in attempt-execution.helpers for failover paths. It is not invoked on missing-transcript resets — so even when OpenClaw has a complete unified history it could replay, it doesn't.

The setCliSessionBinding call in attempt-execution-C4Upu-4a.js:306 only fires on the FailoverError path, not on normal session rotation — so the binding can drift silently with each new CLI session.

Proposed fix

Short-term (most defensive):

  1. Distinguish "file not found" from "file has no assistant records." If the transcript file simply doesn't exist, treat the binding as unknown rather than missing: keep it, let claude-cli attempt a resume, and let claude-cli reject if the session is genuinely dead.
  2. When a reset is unavoidable, invoke buildClaudeCliFallbackContextPrelude (already imported in prepare.runtime) to feed the prior context to the new session. This infrastructure exists; it's just not wired to this code path.
  3. Log the stored sessionId vs. the actual running session ID when they diverge — this is the diagnostic gap that makes the bug hard to catch.

Medium-term:

  • Reverse the order in the destructive branch: only call clearCliSessionInStore after a successful new turn confirms the old binding is dead, not before.
  • Fire setCliSessionBinding whenever runCliAgent returns a cliSessionBinding whose sessionId differs from the stored one, not only on FailoverError.

Long-term:

  • Stop using JSONL assistant-record presence as a liveness probe. Either query claude-cli directly, or maintain an OpenClaw-side liveness flag that is set when the runtime receives the first assistant chunk over the live channel (the claude live session turn events already carry this information).

What this is NOT

  • Not a claw-messenger (or other channel plugin) bug — the plugin's inbound routing is correct.
  • Not a claude-cli bug — given a fresh session ID, claude-cli behaves as expected.
  • Not a config issue — no gateway setting addresses this; the logic is in compiled runtime code.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING