openclaw - ✅ (Solved) Fix BUG: Persisted main session row can become stale and diverge from transcript, wedging new input [1 pull request, 1 comment, 1 participant]

GitHub stats

openclaw/openclaw#60542 (fetched 2026-04-08 02:49:50)

  • Comments: 1
  • Participants: 1
  • Timeline: 6
  • Reactions: 0
  • Timeline (top): cross-referenced ×2, labeled ×2, commented ×1, referenced ×1

The persisted main-session registry entry can become stale and inconsistent with the session transcript. When this happens, the gateway may remain up and partially responsive, but new prompts are not accepted from multiple surfaces until the stale main-session entry is removed and the gateway is restarted.

Root Cause

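The registry row for the main session was persisted in a terminal state (such as done) while the linked transcript went on to record later events, including a prompt-timeout. The stale terminal row was then reused instead of being rolled forward to a fresh session, which wedged new input. Why the two stores diverge is not confirmed; the report suggests that session-registry persistence and transcript/lifecycle persistence may be written out of order, or from different stale snapshots, during restart/recovery paths.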

Fix Action

Fixed

PR fix notes

PR #60642: fix: prevent stale main-session registry entry from wedging new input

Description (problem / solution / changelog)

Summary

Detect stale persisted main-session rows that are already terminal but older than their transcript, and roll forward to a fresh session instead of reusing the wedged row.

Changes

  • add a targeted stale-row check in initSessionState for main sessions only
  • compare the persisted terminal row timestamp against the resolved transcript file mtime
  • force a new session when the transcript is newer than the terminal registry row
  • add a regression test covering the stale terminal main-session case
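
The check described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the SessionRow shape, the status values other than done and running, and the helper names isStaleTerminalRow and shouldForceNewSession are all assumptions made for the example.

```typescript
import { statSync } from "node:fs";

// Hypothetical shape of a persisted registry row; the real fields may differ.
interface SessionRow {
  sessionKey: string;
  status: "running" | "done" | "failed" | "timeout"; // "failed"/"timeout" are assumed
  updatedAtMs: number; // when the row was last persisted
  transcriptPath: string;
}

// Assumed set of terminal statuses.
const TERMINAL = new Set<SessionRow["status"]>(["done", "failed", "timeout"]);

// A row is stale when it is terminal but its transcript was modified later.
function isStaleTerminalRow(row: SessionRow, transcriptMtimeMs: number): boolean {
  return TERMINAL.has(row.status) && transcriptMtimeMs > row.updatedAtMs;
}

// Only main sessions are checked, mirroring the targeted scope of the fix.
function shouldForceNewSession(row: SessionRow, isMainSession: boolean): boolean {
  if (!isMainSession) return false;
  try {
    return isStaleTerminalRow(row, statSync(row.transcriptPath).mtimeMs);
  } catch {
    // Transcript missing or unreadable: nothing to compare against.
    return false;
  }
}
```

Keeping the comparison in a pure function makes the stale case easy to cover in a regression test without touching the file system.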

Testing

  • pnpm test -- --run src/auto-reply/reply/session.test.ts (session test file reached 59/61 passing; 2 unrelated/pre-existing failures remained in that file under the project runner)
  • pnpm exec tsx manual reproduction script confirming a stale terminal agent:main:main row now rolls to a fresh session

Fixes openclaw/openclaw#60542

Changed files

  • openclaw-2026-03-07.log (added, +129/-0)
  • src/auto-reply/reply/session.test.ts (modified, +42/-0)
  • src/auto-reply/reply/session.ts (modified, +49/-3)
  • src/infra/json-files.ts (modified, +37/-13)
  • src/slack/monitor/provider.ts (modified, +59/-2)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Yes

Summary

The persisted main-session registry entry can become stale and inconsistent with the session transcript. When this happens, the gateway may remain up and partially responsive, but new prompts are not accepted from multiple surfaces until the stale main-session entry is removed and the gateway is restarted.

Steps to reproduce

  1. Start an interactive direct-chat session on OpenClaw 2026.4.2.
  2. Use the default main session on gpt-5.4.
  3. Allow normal interaction and at least one restart/recovery cycle after earlier session-state trouble.
  4. Inspect the persisted main-session registry entry and its referenced transcript.
  5. If the bug triggers, the registry entry can be in a terminal state such as done while the linked transcript contains later incompatible events not reflected in the registry entry.
  6. In that state, new prompts from more than one surface are not accepted even though the gateway can still answer some local RPC/dashboard requests.
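
The inspection in steps 4 and 5 amounts to asking whether the transcript holds events newer than the registry snapshot. A small sketch of that comparison, assuming each transcript event carries a type and a timestamp and the registry row carries a status and a last-updated time (these shapes and the function names are hypothetical, not OpenClaw's actual format):

```typescript
// Hypothetical, simplified shapes for illustration only.
interface TranscriptEvent {
  type: string;
  timestampMs: number;
}

interface RegistrySnapshot {
  status: string;
  updatedAtMs: number;
}

// Events recorded in the transcript after the registry row was last written.
function eventsAfterSnapshot(
  row: RegistrySnapshot,
  events: TranscriptEvent[],
): TranscriptEvent[] {
  return events.filter((e) => e.timestampMs > row.updatedAtMs);
}

// Human-readable summary of whether the row and the transcript agree.
function describeDivergence(row: RegistrySnapshot, events: TranscriptEvent[]): string {
  const later = eventsAfterSnapshot(row, events);
  if (later.length === 0) return "consistent";
  const types = later.map((e) => e.type).join(", ");
  return `${row.status} row is stale: ${later.length} later transcript event(s): ${types}`;
}
```

For a done row followed by a single later prompt-timeout event, this reports one later transcript event, which matches the state described in step 5.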

Expected behavior

The persisted main-session registry entry and the linked transcript should remain consistent.

If the session is terminal, the registry should accurately reflect the latest transcript state, and the conversation should continue accepting new prompts normally after restart or recovery.

Actual behavior

Observed behavior showed a persisted main-session registry entry in terminal state while the linked transcript contained later events that the registry entry did not account for, including a later prompt-timeout event.

In that state, the gateway remained up and partially responsive, but new prompts were not accepted from either Telegram or the control dashboard. Restarting alone was insufficient. Recovery required deleting only the stale main-session registry entry so OpenClaw could recreate a fresh main session on restart.
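
The manual recovery can be pictured as dropping a single key from the registry and leaving every other session untouched. The sketch below assumes the registry is a JSON object keyed by session key (the key agent:main:main comes from the PR's reproduction notes; the object layout and the function name removeSessionEntry are assumptions). It works on an in-memory object only; locating and rewriting the real registry file is left out because its path and format are not stated in the report.

```typescript
// Assumed layout: a plain object mapping session keys to persisted rows.
type Registry = Record<string, unknown>;

// Return a copy of the registry without the given session key.
// The input object is not mutated.
function removeSessionEntry(registry: Registry, sessionKey: string): Registry {
  const next: Registry = { ...registry };
  delete next[sessionKey];
  return next;
}

// Example: drop only the stale main-session row.
const before: Registry = {
  "agent:main:main": { status: "done" },
  "agent:other:session": { status: "running" }, // hypothetical second entry
};
const after = removeSessionEntry(before, "agent:main:main");
console.log(Object.keys(after));
```

According to the report, the gateway recreates a fresh main session on the next restart once the stale entry is gone.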

OpenClaw version

2026.4.2

Operating system

Ubuntu 24.04.4 LTS; Linux x86_64

Install method

npm global

Model

gpt-5.4

Provider / routing chain

openai-codex -> gpt-5.4

Additional provider/model setup details

Observed on the default main direct-chat session.

A separate Anthropic auth issue existed in the same environment, but it was ruled out as the immediate cause of this cross-surface wedge.

Logs, screenshots, and evidence

The persisted main-session registry entry referenced a transcript with newer events than the registry entry reflected.

The transcript included a later prompt-timeout event not represented in the terminal registry snapshot.

Both Telegram and the control dashboard stopped accepting new prompts, which ruled out a single-surface provider issue.

The gateway still appeared healthy enough to answer some local RPC/dashboard calls, showing that process health alone did not reflect interaction health.

Restarting the gateway without removing the stale main-session entry did not reliably restore interaction.

Removing only the stale main-session entry from the session registry, then restarting, caused OpenClaw to recreate a fresh main session and restored normal operation.

This appears related to, but distinct from, issue #60250, which covered the narrower case of a completed run remaining persisted as running.

Impact and severity

High severity.

When triggered, the primary conversation becomes unusable across multiple surfaces even though the gateway may still look online. Recovery requires manual intervention in persisted session state.

Additional information

This looks like the same general persistence-corruption family as issue #60250, but a broader manifestation. In #60250 the contradictory state was running plus terminal fields. Here the registry entry itself was already terminal, but stale relative to the linked transcript.

A maintainer may want to inspect whether session-registry persistence and transcript/lifecycle persistence can be written out of order or from different stale snapshots during restart/recovery paths.


extent analysis

TL;DR

Removing the stale main-session registry entry and restarting the gateway may resolve the issue of new prompts not being accepted from multiple surfaces.

Guidance

  • Verify the consistency of the persisted main-session registry entry and its linked transcript to identify any discrepancies.
  • Check the registry entry's state and the transcript's events to ensure they are up-to-date and reflect the correct session state.
  • If the registry entry is terminal but the transcript contains later events, removing the stale registry entry and restarting the gateway may restore normal operation.
  • Inspect the session-registry persistence and transcript/lifecycle persistence to determine if they can be written out of order or from different stale snapshots during restart/recovery paths.

Example

No code snippet is provided as the issue is related to the consistency of the session registry and transcript, and the solution involves removing the stale registry entry and restarting the gateway.

Notes

The issue appears to be related to issue #60250, but with a broader manifestation. The solution may involve ensuring that the session registry and transcript are updated consistently, especially during restart/recovery cycles.

Recommendation

Apply the workaround of removing the stale main-session registry entry and restarting the gateway, as this has been shown to restore normal operation in the affected environment.
