openclaw - 💡(How to fix) Fix [Bug] doctor `--fix` archives historical session transcripts as "orphans," silently losing chat history; recommend CI hardening of every "Run openclaw doctor" recommendation [1 comments, 2 participants]

openclaw/openclaw#73471 · Fetched 2026-04-29 06:19:28
[Bug] doctor --fix archives historical session transcripts as "orphans," causing silent loss of conversational history; recommend CI hardening of every "Run openclaw doctor" recommendation

Summary

openclaw doctor --fix classifies all historical primary session transcripts (<uuid>.jsonl) as "orphans" and offers to archive them by renaming to *.deleted.<timestamp>. The phrasing of the prompt makes the operation sound innocuous ("This only renames them"), but until the fix in [fix/include-reset-transcripts-in-discovery] lands, the renamed files are no longer discovered by memory search — i.e., the user's prior conversational history disappears from memory_search results without any indication.

Closely related to but distinct from existing issues #70680 (trajectory .jsonl falsely flagged) and #50248 (fresh cron sessions falsely classified as missing). All three are different mistakes in the same orphan-classification code path. This third variant is the most damaging because it tombstones valid, complete primary transcripts — i.e., the actual user-visible chat history.

How I hit it (developer-induced, narrow trigger)

This is honest framing: my trigger was self-inflicted. I rebased a PR branch onto upstream main and rebuilt, ending up with a version skew where the running gateway didn't satisfy plugin engines.openclaw requirements declared in my config. That produced a startup crash loop, and the gateway's own error message said Run: openclaw doctor --fix. I ran it, hit the orphan-archive prompt, and clicked through.

This particular trigger is unlikely to affect normal users, so this issue isn't claiming widespread reproduction in the wild. The reason to file it anyway is that:

  1. The orphan classification, prompt language, and lack of recovery are wrong regardless of how the user got to doctor.
  2. There are ~66 distinct "Run openclaw doctor" recommendations in src/, many fired for non-dev reasons (config migrations, channel auth drift, legacy-form configs, etc.). Any of those can route a user into the same archive prompt.
  3. Once the user is at the prompt, the dangerous behavior is the same regardless of trigger.

In other words: the trigger that got me here is rare, but the trap at the end of the path is reachable from many paths.

Reproduction (minimal, no dev setup needed)

  1. Start with an OpenClaw install that has accumulated historical primary .jsonl transcripts under agents/<id>/sessions/. (Any non-trivial install will have this — every prior session leaves a transcript file behind.)
  2. Verify: sessions.json lists ~1–5 keys (e.g. agent:main:main plus active cron run keys), but the sessions directory holds many more <uuid>.jsonl files.
  3. Run openclaw doctor --fix (or openclaw doctor and answer Y at the prompt).
  4. Observe the prompt:
    Found 47 orphan transcript file(s) in ~/.openclaw/agents/main/sessions.
      These .jsonl files are no longer referenced by sessions.json, so they are
      not part of any active session history.
      Doctor can archive them safely by renaming each file to *.deleted.<timestamp>.
    Archive 47 orphan transcript file(s) in ~/.openclaw/agents/main/sessions?
    This only renames them to *.deleted.<timestamp>. [y/N]
  5. Confirm. Every historical session transcript is renamed <uuid>.jsonl.deleted.<timestamp>.
  6. On any release prior to fix/include-reset-transcripts-in-discovery, those transcripts are no longer discoverable by memory search.

The prompt itself is the bug surface. Whether the user got there via a dev mistake, a config migration, an upgrade hint, or a tip in a status message is incidental.

Why the classification is wrong

sessions.json is not a registry of all sessions — it is a "currently-active session per session-key" map. Each entry is keyed by something like agent:main:main and holds the most recent sessionId and sessionFile for that key. When a new session starts under the same key, the entry is overwritten; the old transcript file remains on disk.

The orphan detector in src/commands/doctor-state-integrity.ts builds referencedTranscriptPaths from entry.sessionFile for each entry in sessions.json. So the reference set always contains exactly N paths, where N is the number of active session keys (typically 1–5).

The detector then walks agents/<id>/sessions/ and flags every <uuid>.jsonl not in that reference set. This means every prior session transcript (i.e., the entire chat history for that agent) is flagged as "orphan."

A genuine orphan would be: a transcript that was never created by a known session-key (corrupted, leftover from a deleted agent, never registered, malformed first record, etc.). The current heuristic cannot distinguish "old but legitimate history" from "actual orphan" because sessions.json was never designed as a registry of "every session that ever existed."
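The effect of that heuristic can be reproduced in a few lines. This is a minimal sketch, not the code in doctor-state-integrity.ts: the `SessionEntry` shape carries only the two fields named above, and `findOrphansAsDescribed` and the short file names are hypothetical.

```typescript
// Hypothetical entry shape: only the fields this issue names.
interface SessionEntry {
  sessionId: string;
  sessionFile: string;
}

type SessionStore = Record<string, SessionEntry>;

// Mirrors the heuristic as described: any .jsonl on disk that is not the
// current sessionFile of some session key is reported as an orphan.
function findOrphansAsDescribed(store: SessionStore, filesOnDisk: string[]): string[] {
  const referenced = new Set(Object.values(store).map((entry) => entry.sessionFile));
  return filesOnDisk.filter((file) => file.endsWith(".jsonl") && !referenced.has(file));
}

// One active key; a1 and b2 are earlier sessions whose entries were overwritten.
const store: SessionStore = {
  "agent:main:main": { sessionId: "c3", sessionFile: "c3.jsonl" },
};
const filesOnDisk = ["a1.jsonl", "b2.jsonl", "c3.jsonl"];

// Both earlier transcripts are flagged even though they are valid history.
const flagged = findOrphansAsDescribed(store, filesOnDisk);
console.log(flagged);
```

With an empty store the same function flags every file, which is why a store that fails to load is enough to produce the full archive prompt.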

Why it's dangerous regardless of trigger

  1. The prompt language is misleading. "Archive" implies "moved to a recoverable location"; "this only renames them" implies "no real change." In fact, the rename removes the files from active discovery and (pre-fix) from memory search. The actual semantics are "tombstone."
  2. The user is usually under stress when they arrive at doctor. Most "Run openclaw doctor" recommendations fire after a startup failure or migration warning. Users will click through prompts to make the noise stop.
  3. No undo path. There is no openclaw doctor --restore-archives or --undo-last; recovery requires the user to manually rename each <uuid>.jsonl.deleted.<timestamp> file back to <uuid>.jsonl, if they realize what happened.
  4. Silent failure mode upstream amplifies the trap. loadSessionStore (src/config/sessions/store-load.ts) silently degrades to an empty store {} on any JSON parse error or schema mismatch. The catch block retries and otherwise swallows the error:
    } catch {
      if (attempt < maxReadAttempts - 1) { ... continue; }
    }
    No log, no warning, no "consider running doctor." A corrupted sessions.json therefore makes every transcript appear orphan to doctor on the next run — turning a small sessions.json corruption event into a "47 orphans, please archive?" prompt with no upstream signal that the store itself was the problem.
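One way to give callers an upstream signal is to keep "no store yet" distinct from "store unreadable" in the return type. The sketch below is an illustration under assumptions, not the real loadSessionStore: `StoreLoadResult`, `parseSessionStore`, and `mayRunOrphanScan` are hypothetical names, and retry handling is left out.

```typescript
// Hypothetical result type: a corrupt store is never reported as an empty one.
type StoreLoadResult =
  | { kind: "ok"; store: Record<string, unknown> }
  | { kind: "missing" }
  | { kind: "corrupt"; reason: string };

// raw is the content of sessions.json, or null when the file does not exist.
function parseSessionStore(raw: string | null): StoreLoadResult {
  if (raw === null) return { kind: "missing" };
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return { kind: "corrupt", reason: "top-level value is not an object" };
    }
    return { kind: "ok", store: parsed as Record<string, unknown> };
  } catch (err) {
    const reason = err instanceof Error ? err.message : String(err);
    return { kind: "corrupt", reason };
  }
}

// A caller such as the orphan scan can refuse to run against a corrupt store
// instead of treating every transcript as unreferenced.
function mayRunOrphanScan(result: StoreLoadResult): boolean {
  if (result.kind === "corrupt") {
    console.warn(`sessions.json is unreadable (${result.reason}); skipping orphan scan`);
    return false;
  }
  return true;
}
```

Whether a missing store should also block the scan is a separate design decision; this sketch only covers the parse-failure case.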

Existing related bugs (same code, different mistakes)

  • #70680 — .trajectory.jsonl files falsely flagged as orphans (matcher bug)
  • #50248 — fresh cron sessions falsely classified as missing-transcript (timing/path bug)
  • This issue — historical primary .jsonl files falsely classified as orphans (definition-of-orphan bug)

The recurrence pattern suggests the orphan/cleanup detection logic needs systematic test coverage that asserts semantics, not just mechanism.

Recommended fixes

Short-term (correctness)

  1. Tighten orphan classification. A primary .jsonl is only a real orphan if it cannot be parsed as a valid session transcript (no session-start record, truncated header, etc.) or if it explicitly belongs to a deleted agent. Don't equate "not currently in sessions.json" with "orphan."
  2. Reword the prompt to reflect actual semantics. Replace "archive" with "tombstone (hide from discovery)"; surface the count, total bytes, and oldest/newest timestamps; make clear that the files will not appear in chat history listings or (pre-fix) memory search.
  3. Add a recovery path. openclaw doctor --restore-archives [--since <timestamp>] to undo a previous archive run.
  4. Make loadSessionStore parse failures non-silent. On parse error, log a warning and either back up + recreate from disk-walk, or hard-fail with a clear message instead of silently returning {}.
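Fix 1 could start from a content check rather than a reference check. The following is a rough sketch only. It assumes a valid transcript begins with a JSON record whose "type" is "session" (the shape of the test fixture quoted later in this issue); the real header schema may differ, and `classifyTranscript` and `Verdict` are hypothetical names.

```typescript
type Verdict = "keep" | "orphan-candidate";

// Assumption: the first line of a valid transcript is a JSON object with
// type === "session". Anything else is only a candidate, never an automatic orphan.
function classifyTranscript(firstLine: string | undefined): Verdict {
  if (firstLine === undefined || firstLine.trim() === "") {
    return "orphan-candidate"; // empty file or missing header
  }
  try {
    const record: unknown = JSON.parse(firstLine);
    const isSessionStart =
      typeof record === "object" &&
      record !== null &&
      (record as { type?: unknown }).type === "session";
    return isSessionStart ? "keep" : "orphan-candidate";
  } catch {
    return "orphan-candidate"; // truncated or malformed header
  }
}
```

Under this rule, absence from sessions.json plays no part in the verdict; a transcript with an intact header is kept regardless of which key currently points at it.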

Long-term (CI hardening)

The "Run openclaw doctor" surface is a high-leverage user trust boundary. Every site that says "run doctor to fix this" is implicitly a contract: the user trusts that doctor will fix the named problem and not damage anything else. That contract should be tested systematically.

Proposal: doctor recommendation audit + CI test suite.

  1. Inventory. Programmatically extract every "Run openclaw doctor" recommendation in src/. Output: structured list of (trigger condition, recommended command, repair flow, mutation level, has test, test asserts semantics).

  2. Triage by mutation level.

    • Read-only/info: openclaw doctor (no --fix) — low risk
    • Config-write: openclaw doctor --fix for legacy config migrations — medium risk, recoverable from VCS/backup
    • State-rename: orphan archive, session pruning, oauth-dir migrations — high risk, the dangerous tier
    • State-delete: anything actually removing files — critical, audit individually
  3. Per-recommendation test triple for every state-mutating site:

    • Reproduce trigger: set up the exact broken state that causes the "run doctor" recommendation
    • Run repair: invoke the recommended doctor command non-interactively
    • Assert semantics:
      • Repair succeeded
      • Triggering condition is gone
      • No collateral mutation to user-content data (transcripts, memory files, MEMORY.md, config-tracked files): byte-for-byte preservation, or explicit listing of every file mutated
      • Memory search still finds the same chunks for the same queries before vs. after (modulo expected new chunks)
  4. CI gate. Block PR merges that add a new "run doctor" recommendation without a corresponding test in this suite. A small lint rule + tag system (/* @doctor-recommendation tested-by: <test-id> */) would catch new sites.
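The lint rule in step 4 could be as small as a line scan. This is a sketch under assumptions: the tag must sit on the same line as the recommendation or the line directly above it, the matching pattern is a guess at how the recommendations are worded, and `findUntaggedDoctorRecommendations` is a hypothetical name.

```typescript
interface UntaggedSite {
  line: number; // 1-based line number in the scanned source
  text: string;
}

// Reports every "Run openclaw doctor" recommendation that has no
// @doctor-recommendation tested-by tag on the same or the preceding line.
function findUntaggedDoctorRecommendations(source: string): UntaggedSite[] {
  const recommendation = /run:?\s+openclaw doctor/i;
  const tag = /@doctor-recommendation\s+tested-by:\s*\S+/;
  const lines = source.split("\n");
  const untagged: UntaggedSite[] = [];
  lines.forEach((text, index) => {
    if (!recommendation.test(text)) return;
    const previous = lines[index - 1] ?? "";
    if (!tag.test(text) && !tag.test(previous)) {
      untagged.push({ line: index + 1, text: text.trim() });
    }
  });
  return untagged;
}

const sample = [
  "/* @doctor-recommendation tested-by: orphan-archive-01 */",
  'throw new Error("Run: openclaw doctor --fix");',
  'log.warn("Config is legacy. Run openclaw doctor to migrate.");',
].join("\n");

// Only the third line is reported; the second is covered by the tag above it.
const untaggedSites = findUntaggedDoctorRecommendations(sample);
console.log(untaggedSites);
```

A CI job would run this over the files changed in a PR and fail when the returned list is non-empty.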

The third bullet under (3) is the test type that doesn't exist today. The current doctor-state-integrity.test.ts creates a fake orphan-session.jsonl containing only {"type":"session"} — it tests the mechanism of archiving, never the semantics of "is this thing actually an orphan, and is the archive operation safe for real user data."
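The "no collateral mutation" assertion can be expressed as a diff between two snapshots of the user-content directories. A minimal sketch, assuming the test harness has already hashed each file (for example with SHA-256) before and after the repair; `Snapshot`, `CollateralReport`, and `diffSnapshots` are hypothetical names.

```typescript
// Path -> content hash, captured before and after the doctor run.
type Snapshot = Map<string, string>;

interface CollateralReport {
  removed: string[]; // present before, gone after (includes renamed-away files)
  changed: string[]; // same path, different content
  added: string[];   // new paths (includes rename targets)
}

function diffSnapshots(before: Snapshot, after: Snapshot): CollateralReport {
  const report: CollateralReport = { removed: [], changed: [], added: [] };
  before.forEach((hash, path) => {
    const current = after.get(path);
    if (current === undefined) report.removed.push(path);
    else if (current !== hash) report.changed.push(path);
  });
  after.forEach((_hash, path) => {
    if (!before.has(path)) report.added.push(path);
  });
  return report;
}
```

A semantics test would then assert that `removed` and `changed` are empty for transcripts and memory files, or that every path in them is explicitly listed in the repair's expected mutations. An archive run like the one in this issue would show up as one `removed` and one `added` entry per transcript.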

Workaround (until the indexer fix lands and a recovery command exists)

  • Do not run openclaw doctor --fix blind. Run openclaw doctor (read-only) first to see the proposed changes.
  • If you've already been bitten: archived transcripts are at agents/<id>/sessions/<uuid>.jsonl.deleted.<timestamp>. They are intact and can be restored by:
    cd ~/.openclaw/agents/<agent>/sessions
    for f in *.jsonl.deleted.*; do
      [ -e "$f" ] || continue             # the glob matched nothing
      # Only restore if you're sure no live session uses that uuid; -n never overwrites.
      mv -n -- "$f" "${f%.deleted.*}"
    done
    However, this puts them back as plain .jsonl and the next doctor --fix will tombstone them again (definition bug). Better to leave them as .deleted.* and pick up the indexer fix from fix/include-reset-transcripts-in-discovery, which makes archived files searchable.
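If you do restore, planning the renames before touching disk makes collisions visible. The sketch below works on a plain list of file names and is not an existing OpenClaw command; `planRestore` and `RestorePlan` are hypothetical, and the only assumption about archived names is the `.jsonl.deleted.<timestamp>` suffix shown above.

```typescript
interface RestorePlan {
  renames: Array<{ from: string; to: string }>;
  skipped: string[]; // archived files whose original name is already taken
}

// Builds a dry-run plan: each <uuid>.jsonl.deleted.<timestamp> maps back to
// <uuid>.jsonl unless a file with that name already exists.
function planRestore(fileNames: string[]): RestorePlan {
  const taken = new Set(fileNames);
  const plan: RestorePlan = { renames: [], skipped: [] };
  for (const name of fileNames) {
    const marker = name.lastIndexOf(".jsonl.deleted.");
    if (marker === -1) continue; // not an archived transcript
    const original = name.slice(0, marker + ".jsonl".length);
    if (taken.has(original)) {
      plan.skipped.push(name);
      continue;
    }
    taken.add(original); // a second archive of the same uuid will be skipped
    plan.renames.push({ from: name, to: original });
  }
  return plan;
}
```

Reviewing `skipped` before renaming anything shows exactly which archived transcripts would have collided with a live session file.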

Environment

  • OpenClaw 2026.4.22 → 2026.4.25/2026.4.26 (the trap reproduces across versions)
  • Linux (Fedora 43)
  • Original trigger in my case was a dev-induced version skew (rebased PR + stale build), not a path normal users typically take
  • Affected agent: main, ~47 transcripts spanning Feb–Apr archived in a single doctor run

extent analysis

TL;DR

The most likely fix for the issue is to tighten the orphan classification in openclaw doctor --fix to prevent historical primary session transcripts from being incorrectly archived.

Guidance

  • Review the doctor-state-integrity.ts file to understand the current orphan detection logic and identify areas for improvement.
  • Update the prompt language in openclaw doctor --fix to reflect the actual semantics of archiving, including the count, total bytes, and oldest/newest timestamps of the files to be archived.
  • Consider adding a recovery path, such as openclaw doctor --restore-archives, to undo previous archive runs.
  • Make loadSessionStore parse failures non-silent by logging a warning and either backing up and recreating from disk-walk or hard-failing with a clear message.

Example

cd ~/.openclaw/agents/<agent>/sessions
for f in *.jsonl.deleted.*; do
  [ -e "$f" ] || continue             # the glob matched nothing
  # Only restore if you're sure no live session uses that uuid; -n never overwrites.
  mv -n -- "$f" "${f%.deleted.*}"
done

This command can be used to restore archived transcripts, but it's recommended to wait for the indexer fix from fix/include-reset-transcripts-in-discovery to make archived files searchable.

Notes

The issue is specific to the openclaw doctor --fix command and its interaction with historical primary session transcripts. The proposed fixes aim to improve the accuracy of orphan detection and provide a safer user experience.

Recommendation

Apply the workaround of not running openclaw doctor --fix blind and instead running openclaw doctor (read-only) first to see the proposed changes. Wait for the indexer fix from fix/include-reset-transcripts-in-discovery to make archived files searchable.
