openclaw - ✅(Solved) Fix [Bug]: Telegram DM amnesia — cliSessionBindings stores claude-cli sessionId with no backing transcript; --resume silently starts a fresh session every turn [2 pull requests, 3 comments, 4 participants]

GitHub stats
openclaw/openclaw#70177 · Fetched 2026-04-23 07:28:16
Comments: 3 · Participants: 4 · Timeline: 14 · Reactions: 0
Timeline (top): referenced ×4, commented ×3, cross-referenced ×2, mentioned ×2

In a Telegram DM bound to a main agent on the claude-cli backend, the stored cliSessionBindings["claude-cli"].sessionId points to a Claude CLI session that has no matching transcript file under ~/.claude/projects/<slug>/<sessionId>.jsonl. Every turn the gateway invokes claude --resume <sessionId> with that phantom UUID, Claude Code treats it as a fresh session (parentUuid: null), and the user experiences amnesia with no memory continuity across turns.

Unlike #69118 / #64386, the session-reuse gate (resolveCliSessionReuse) does not invalidate in this failure mode — all four keys (authProfileId, authEpoch, extraSystemPromptHash, mcpConfigHash) match the stored binding, so there is no cli session reset reason=… line in the gateway log. The bug is silent: OpenClaw thinks it is resuming; Claude Code has nothing to resume.

Root Cause

The gateway log is silent: no reset reason is logged (see Evidence, item 4).

Fix Action

Fixed

PR fix notes

PR #70298: fix(agents): use atomic store helper for CLI session clearing

Description (problem / solution / changelog)

What

  • Bug fix: clearCliSession and clearAllCliSessions used delete to remove CLI session properties. When mergeSessionEntry later spreads the update object over the persisted store entry, deleted keys are absent from the spread — so stale session IDs survive the merge and the expired session is never actually cleared.

    Changed to = undefined so the key is explicitly present in the spread.

  • Refactor: Extracted clearCliSessionInStore() in session-store.ts — an atomic helper that clears + persists + merges in one call, matching the existing updateSessionStoreAfterAgentRun pattern.

  • Dead code removal: Removed unreachable retry logic in cli-runner.ts that attempted session recovery without store access (duplicated by the caller in attempt-execution.ts).
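
The delete-versus-undefined distinction in the first bullet can be reproduced in isolation. The sketch below assumes the merge is a plain object spread of the update over the persisted entry; `SessionEntry`, `mergeEntry`, and the `cliSessionId` field are hypothetical stand-ins, not the project's actual types or the real `mergeSessionEntry`.

```typescript
// Hypothetical entry shape; the real store entry has more fields.
type SessionEntry = { sessionId: string; cliSessionId?: string | undefined };

// Assumed stand-in for the merge step: spread the update over the persisted entry.
function mergeEntry(persisted: SessionEntry, update: Partial<SessionEntry>): SessionEntry {
  return { ...persisted, ...update };
}

const persisted: SessionEntry = { sessionId: "outer", cliSessionId: "stale-cli-id" };

// Clearing with `delete`: the key is absent from the update, so the spread never touches it.
const clearedWithDelete: Partial<SessionEntry> = { ...persisted };
delete clearedWithDelete.cliSessionId;
const mergedAfterDelete = mergeEntry(persisted, clearedWithDelete);

// Clearing with `= undefined`: the key is present, so the spread overwrites the stale value.
const clearedWithUndefined: Partial<SessionEntry> = { ...persisted, cliSessionId: undefined };
const mergedAfterUndefined = mergeEntry(persisted, clearedWithUndefined);

console.log(mergedAfterDelete.cliSessionId);    // "stale-cli-id"
console.log(mergedAfterUndefined.cliSessionId); // undefined
```

The stale value survives the first merge because object spread copies only own enumerable keys, and a deleted key is no longer one of them.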

Why

Without this fix, CLI sessions that expire mid-run are not properly cleared from the store. The agent keeps retrying with the stale session ID instead of creating a fresh session, causing repeated session_expired failures.

Testing

  • All existing agent tests pass (vitest --project agents — 382 files, 4000 tests)
  • Specifically: attempt-execution.cli.test.ts covers the session-expired recovery path end-to-end
  • oxlint + tsc --noEmit clean

AI Disclosure

  • AI-assisted (Antigravity / Claude Opus 4.6)
  • Fully tested locally
  • Author understands the changes
  • No Codex access for local review

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/cli-session.ts (modified, +4/-4)
  • src/agents/command/attempt-execution.ts (modified, +7/-15)
  • src/agents/command/session-store.test.ts (modified, +81/-1)
  • src/agents/command/session-store.ts (modified, +26/-1)

PR #70317: fix: clear phantom Claude CLI resumes

Description (problem / solution / changelog)

Summary

  • verify stored Claude CLI session ids have a readable ~/.claude/projects/*/<sessionId>.jsonl transcript with assistant content before reusing them
  • clear missing-transcript bindings before the CLI runner can pass --resume
  • add regression coverage for missing transcripts, valid transcripts, and path-like session ids

Fixes #70177.
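
A minimal sketch of what such a transcript check could look like follows. It is not the PR's code: `hasUsableTranscript` and `lineIsAssistant` are hypothetical names, the UUID-only filter is one assumed way to reject path-like ids, and the assumption that assistant turns appear as JSONL lines with `"type": "assistant"` should be confirmed against real transcripts.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Assumed default location, taken from the paths quoted in the issue.
const DEFAULT_PROJECTS_ROOT = path.join(os.homedir(), ".claude", "projects");

// Accept only UUID-shaped ids so a value such as "../x" never reaches the filesystem.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// Assumption: an assistant turn is a JSON line whose "type" field is "assistant".
function lineIsAssistant(line: string): boolean {
  try {
    const parsed: unknown = JSON.parse(line);
    return typeof parsed === "object" && parsed !== null &&
      (parsed as { type?: unknown }).type === "assistant";
  } catch {
    return false; // tolerate blank, truncated, or malformed lines
  }
}

// True when some <projectsRoot>/<slug>/<sessionId>.jsonl holds at least one assistant line.
function hasUsableTranscript(sessionId: string, projectsRoot: string = DEFAULT_PROJECTS_ROOT): boolean {
  if (!UUID_RE.test(sessionId)) return false;
  let slugs: string[];
  try {
    slugs = fs.readdirSync(projectsRoot);
  } catch {
    return false; // projects directory missing or unreadable
  }
  for (const slug of slugs) {
    let text: string;
    try {
      text = fs.readFileSync(path.join(projectsRoot, slug, `${sessionId}.jsonl`), "utf8");
    } catch {
      continue; // no transcript for this id under this slug
    }
    if (text.split("\n").some((line) => lineIsAssistant(line))) return true;
  }
  return false;
}
```

A caller would clear the stored binding whenever this returns false, so that `--resume` is never passed for an id with nothing behind it.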

Testing

  • pnpm test src/agents/command/attempt-execution.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/command/session-store.test.ts
  • pnpm check:changed
  • pnpm check
  • OPENCLAW_VITEST_MAX_WORKERS=1 pnpm check:changed --staged

Full suite note

pnpm test was also attempted. The run hit unrelated failures outside this patch:

  • src/gateway/server-startup.test.ts timed out in skips static warmup for configured CLI backends
  • extensions/openai/provider-runtime.contract.test.ts failed during OpenAI Codex OAuth refresh with Failed to refresh OpenAI Codex token
  • the run later hung in vitest.contracts-channel-surface.config.ts after that shard had already printed a pass summary, so it was stopped with SIGTERM

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/command/attempt-execution.cli.test.ts (modified, +156/-0)
  • src/agents/command/attempt-execution.helpers.ts (modified, +54/-15)
  • src/agents/command/attempt-execution.test.ts (modified, +68/-0)
  • src/agents/command/attempt-execution.ts (modified, +90/-48)


Environment

  • openclaw 2026.4.21 (f788c88), upgraded last night from 2026.4.20
  • claude-cli backend, OAuth auth profile, Opus
  • Channel: Telegram DM (chat_type: direct), binding key agent:main:direct:michael
  • Linux (Oracle ARM), Node 22, systemd-managed user unit openclaw-gateway

Evidence

1. Binding points at a Claude-CLI sessionId whose transcript does not exist

~/.openclaw/agents/main/sessions/sessions.json:

"agent:main:direct:michael": {
  "sessionId": "94b88552-b02c-4d0a-bca8-d3873226537d",
  "cliSessionBindings": {
    "claude-cli": {
      "sessionId": "3171f8f7-efb4-433d-81be-071a5d0630ea",
      "authProfileId": "anthropic:claude-cli",
      "authEpoch": "e4807207b45487…",
      "extraSystemPromptHash": "2ce382856b9bc2…",
      "mcpConfigHash": "6cba25a87f1904…"
    }
  }
}

Neither UUID has a backing JSONL:

$ find ~/.openclaw ~/.claude -name "3171f8f7-*.jsonl"
(nothing)
$ find ~/.openclaw ~/.claude -name "94b88552-*.jsonl"
(nothing)
$ ls ~/.claude/session-env/3171f8f7-*
/home/ubuntu/.claude/session-env/3171f8f7-efb4-433d-81be-071a5d0630ea   # directory only, no transcript

Expected: ~/.claude/projects/<slug>/3171f8f7-efb4-433d-81be-071a5d0630ea.jsonl exists.
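
The manual `find` check above can be generalised to every binding in the store. The following is a rough diagnostic sketch, assuming `sessions.json` is a top-level object keyed by binding key as in the snippet shown; `findPhantomBindings` and `listTranscriptIds` are hypothetical helper names, not OpenClaw functions.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Default locations taken from the paths quoted in this report.
const DEFAULT_STORE = path.join(os.homedir(), ".openclaw", "agents", "main", "sessions", "sessions.json");
const DEFAULT_PROJECTS = path.join(os.homedir(), ".claude", "projects");

// Assumed minimal shape of a store entry; real entries carry more fields.
type StoreEntry = { cliSessionBindings?: Record<string, { sessionId?: string }> };

// Collect the ids of every <projectsRoot>/<slug>/<id>.jsonl file.
function listTranscriptIds(projectsRoot: string): Set<string> {
  const ids = new Set<string>();
  if (!fs.existsSync(projectsRoot)) return ids;
  for (const slug of fs.readdirSync(projectsRoot, { withFileTypes: true })) {
    if (!slug.isDirectory()) continue;
    for (const name of fs.readdirSync(path.join(projectsRoot, slug.name))) {
      // Renamed files such as "<id>.jsonl.reset.<timestamp>" do not end in ".jsonl" and are skipped.
      if (name.endsWith(".jsonl")) ids.add(name.slice(0, -".jsonl".length));
    }
  }
  return ids;
}

// Return the binding keys whose claude-cli sessionId has no transcript on disk.
function findPhantomBindings(storePath: string = DEFAULT_STORE, projectsRoot: string = DEFAULT_PROJECTS): string[] {
  const store = JSON.parse(fs.readFileSync(storePath, "utf8")) as Record<string, StoreEntry | null>;
  const onDisk = listTranscriptIds(projectsRoot);
  const phantoms: string[] = [];
  for (const [key, entry] of Object.entries(store)) {
    const id = entry?.cliSessionBindings?.["claude-cli"]?.sessionId;
    if (id !== undefined && !onDisk.has(id)) phantoms.push(key);
  }
  return phantoms;
}
```

Run against the store described here, a scan like this would be expected to list `agent:main:direct:michael`, since neither the binding's transcript nor a live `.jsonl` for the reset session exists.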

2. The prior working binding was hard-reset, not migrated

The preceding Michael-direct binding 011f5e08-70a5-42c9-b2e8-693917c5d557 was renamed:

011f5e08-70a5-42c9-b2e8-693917c5d557.jsonl.reset.2026-04-20T21-06-08.449Z   (8.1 MB)

That rename happened at 2026-04-20 21:06 UTC, before the 2026.4.21 upgrade and without a user /reset. The new binding (94b88552 / 3171f8f7) was written fresh on the next turn, but the code path that allocated it never produced a corresponding ~/.claude/projects/.../*.jsonl for the claude-cli sessionId it chose.

3. Aggressive pruning in 2026.4.20 amplified the surface area

sessions.json dropped from ~3.7 MB → ~1.7 MB after the 2026.4.20 upgrade (59 → 27 keys). The 2026.4.20 changelog:

enforce the built-in entry cap and age prune by default, and prune oversized stores at load time

Presumably intentional, but the pruner evicted still-live bindings for infrequently-used DMs (the TUI is the hot path; Telegram DMs went a day without traffic). When the user came back via Telegram, a brand new binding was allocated and the missing-transcript code path was taken.

4. Gateway log is silent — no reset reason is logged

Two hours of journalctl --user -u openclaw-gateway:

12:12:56 cli exec: provider=claude-cli model=opus promptChars=505
12:22:18 cli exec: provider=claude-cli model=opus promptChars=416
12:22:20 cli exec: provider=claude-cli model=opus promptChars=416
12:45:50 cli exec: provider=claude-cli model=opus promptChars=782
12:45:51 cli exec: provider=claude-cli model=opus promptChars=782
13:00:32 cli exec: provider=claude-cli model=opus promptChars=1203

promptChars is tiny per turn (inbound envelope only) — confirming no conversation history is being carried across turns. But there are zero cli session reset reason=… lines for agent:main:direct:michael in this window. The reuse gate happily returns "reuse" because the binding fields all match; Claude Code receives --resume 3171f8f7-… and silently starts fresh.

(For contrast, this morning's log does show reason=mcp and reason=auth-epoch resets on other bindings — those invalidations fire as designed; this one does not.)

Impact

  • Any channel that gets pruned from sessions.json and later re-binds is at risk of the same silent amnesia.
  • Users see degraded context without any log signal pointing at session plumbing.
  • Particularly bad for low-frequency DMs, which are exactly what the age-based pruner targets.

Suspected root cause (needs maintainer confirmation)

Something in the rebind path is writing a claude-cli sessionId before or without a turn that actually produces a ~/.claude/projects/<slug>/*.jsonl. Likely candidates:

  • The sessionId is generated optimistically from an allocator (or re-read from a stale field), the first claude -p invocation fails or is short-circuited before Claude Code writes its transcript, but the binding is persisted regardless.
  • Or the sessionId is being captured from a parent process whose transcript is written under a different project slug than the one --resume is later asked to load from.

Either way, the invariant worth enforcing is: never persist cliSessionBindings[provider].sessionId unless a transcript for that sessionId exists on disk at write time.

Suggested fixes

  1. Post-write verification: after setCliSessionBinding persists a claude-cli sessionId, stat the expected ~/.claude/projects/<slug>/<sessionId>.jsonl. If absent, don't persist; log a warning and let the next turn allocate fresh.

  2. Pre-resume verification: in resolveCliSessionReuse, add a sixth check — if the binding references claude-cli but the transcript file is missing, return invalidatedReason: "transcript-missing" and fall through to claude -p. This at least makes the bug visible in the log and stops handing phantom --resume UUIDs to Claude Code.

  3. Pruner guardrails: the 2026.4.20 age-prune should either:

    • not evict bindings whose underlying transcript is still present, or
    • when it does evict, also delete the transcript file and any session-env/<sessionId> directory, so downstream code cannot be fooled into thinking there is something to resume.
  4. Telemetry: emit a gateway log line whenever --resume <sessionId> is passed to claude-cli but the transcript cannot be stat-ed. Today this entire failure is invisible.
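
Fixes 2 and 4 can be sketched together as a reuse gate that treats a missing transcript as one more invalidation reason and logs it. This is illustrative only: the real `resolveCliSessionReuse` signature is not shown in this report, so `resolveReuseSketch`, its types, and the injected `transcriptExists` predicate are hypothetical. The reason strings `auth-epoch` and `mcp` appear in the logs quoted above and `transcript-missing` is the value proposed in fix 2; `auth-profile` and `system-prompt` are placeholders.

```typescript
type CliBinding = {
  sessionId: string;
  authProfileId: string;
  authEpoch: string;
  extraSystemPromptHash: string;
  mcpConfigHash: string;
};
type CurrentState = Omit<CliBinding, "sessionId">;
type ReuseDecision =
  | { reuse: true; sessionId: string }
  | { reuse: false; invalidatedReason: string };

// Hypothetical gate: compare the four binding keys, then require a transcript on disk.
function resolveReuseSketch(
  binding: CliBinding,
  current: CurrentState,
  transcriptExists: (sessionId: string) => boolean,
  log: (line: string) => void = console.warn,
): ReuseDecision {
  let reason: string | undefined;
  if (binding.authProfileId !== current.authProfileId) reason = "auth-profile";
  else if (binding.authEpoch !== current.authEpoch) reason = "auth-epoch";
  else if (binding.extraSystemPromptHash !== current.extraSystemPromptHash) reason = "system-prompt";
  else if (binding.mcpConfigHash !== current.mcpConfigHash) reason = "mcp";
  else if (!transcriptExists(binding.sessionId)) reason = "transcript-missing";

  if (reason === undefined) return { reuse: true, sessionId: binding.sessionId };
  // Make the invalidation visible instead of silently passing --resume.
  log(`cli session reset reason=${reason}`);
  return { reuse: false, invalidatedReason: reason };
}
```

Injecting the predicate keeps the gate free of filesystem access, which makes the missing-transcript branch straightforward to cover in a regression test.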

Related

  • #69118 — extraSystemPromptHash drift in group channels (different surface, overlapping plumbing).
  • #64386 — mcpConfigHash drift on restart.
  • #57141 / #62769 — Telegram DM binding failure modes.
  • #49888 — oversized entries poisoning shared sessions (different but adjacent to the 2026.4.20 pruner rationale).

Happy to provide a stripped sessions.json snippet and journalctl excerpts on request, or open a PR that adds the post-write / pre-resume stat check + regression test.

extent analysis

TL;DR

The most likely fix involves adding a verification step to ensure that a transcript file exists for a given cliSessionBindings[provider].sessionId before persisting it, to prevent silent amnesia in Telegram DMs.

Guidance

  1. Implement post-write verification: After setCliSessionBinding persists a claude-cli sessionId, verify the existence of the corresponding transcript file (~/.claude/projects/<slug>/<sessionId>.jsonl) and log a warning if it's absent.
  2. Add pre-resume verification: In resolveCliSessionReuse, check for the presence of the transcript file before reusing a claude-cli sessionId, and return an invalidatedReason if the file is missing.
  3. Review pruner logic: Ensure the age-prune functionality in the 2026.4.20 update doesn't evict bindings with existing transcripts, or properly cleans up associated files when evicting.
  4. Enhance telemetry: Emit a log line when --resume <sessionId> is passed to claude-cli but the transcript file cannot be found, to make the failure visible.

Example

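// Example of post-write verification in setCliSessionBinding.
// projectSlug is a hypothetical parameter standing in for the <slug> placeholder.
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

function setCliSessionBinding(sessionId, projectSlug) {
  // ... persist session binding logic ...
  // fs does not expand "~", so resolve the home directory explicitly.
  const transcriptPath = path.join(os.homedir(), ".claude", "projects", projectSlug, `${sessionId}.jsonl`);
  if (!fs.existsSync(transcriptPath)) {
    console.warn(`Transcript file not found for sessionId ${sessionId}`);
    // Consider not persisting the session binding or taking alternative action
  }
}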

Notes

  • The provided suggestions aim to address the silent amnesia issue by introducing checks for transcript file existence, but the root cause may require further investigation.
  • The age-prune functionality's impact on bindings and transcripts needs careful review to prevent unintended behavior.

Recommendation

Apply the suggested fixes, starting with the post-write verification and pre-resume verification steps, to address the silent amnesia issue and improve the overall robustness of the session management logic.
