openclaw - ✅ (Solved) Fix: Session not restored after Gateway crash; new session created instead [1 pull request, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#80254 (fetched 2026-05-11 03:17:04)
Comments: 2 · Participants: 2 · Timeline events: 5 · Reactions: 4
Timeline (top): commented ×2, cross-referenced ×1, mentioned ×1, subscribed ×1

Root Cause

The session files on disk are intact after the crash. The issue appears to be that the Gateway does not reconnect to the existing session on restart and creates a new one instead. This is particularly problematic because the crash was caused by the Gateway itself (event-loop starvation), not by the user closing the tab.

Fix Action

Fixed

PR fix notes

PR #80359: [AI-assisted] fix(ui): preserve WebChat session on token reconnect

Description (problem / solution / changelog)

Summary

  • Problem: WebChat reconnect/auth URLs that carried a token without an explicit session could reset the Control UI session selection to main.
  • Why it matters: after a Gateway crash/restart, a user could land in a fresh main chat even though the previously selected session transcript still exists.
  • What changed: token-only URL handling now preserves the current sessionKey and lastActiveSessionKey only when the supplied token matches the already-loaded same-gateway session token, which is the reconnect case.
  • What did NOT change (scope boundary): a fresh token-only launch with a different token still resets stale stored session selection to main; explicit session= URLs still switch sessions. This PR does not alter Gateway restart recovery, transcript storage, or session-default normalization.
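
The decision described in the summary above can be sketched as a small pure function. This is a minimal illustration, not the code from ui/src/ui/app-settings.ts: the names resolveSelection, SessionSelection, UrlSettings, and MAIN_SESSION_KEY are hypothetical, and the sketch assumes the default session key is the literal string "main".

```typescript
// Hypothetical shapes; the real settings objects in the Control UI may differ.
interface SessionSelection {
  sessionKey: string;
  lastActiveSessionKey: string;
}

interface UrlSettings {
  token?: string;   // token carried by the URL, if any
  session?: string; // explicit session= parameter, if any
}

// Assumption: the default session is addressed by the key "main".
const MAIN_SESSION_KEY = "main";

function resolveSelection(
  current: SessionSelection,
  storedToken: string | undefined, // same-gateway token already loaded in the browser
  url: UrlSettings,
): SessionSelection {
  // An explicit session= parameter always switches sessions.
  if (url.session) {
    return { sessionKey: url.session, lastActiveSessionKey: url.session };
  }
  // No token in the URL: nothing in the URL affects the selection.
  if (url.token === undefined) {
    return current;
  }
  // Token-only URL whose token matches the loaded one: treat as a reconnect.
  if (storedToken !== undefined && url.token === storedToken) {
    return current;
  }
  // Token-only URL with a different token: fresh launch, reset stale selection.
  return { sessionKey: MAIN_SESSION_KEY, lastActiveSessionKey: MAIN_SESSION_KEY };
}
```

With a stored token of "proof-token" and a current selection of agent:test_old:main, a URL carrying the same token keeps the selection, while a URL carrying any other token falls back to main.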

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #80254
  • Related #47453
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: token-only WebChat reconnect URLs should not discard the persisted selected session when the browser already has the matching same-gateway token.
  • Real environment tested: local Windows checkout running the real Control UI dev server from this PR worktree, opened in headless Chrome with the browser's real localStorage / sessionStorage.
  • Exact steps or command run after this patch: started corepack pnpm --dir ui exec vite --host 127.0.0.1 --port 5174 --strictPort, loaded http://127.0.0.1:5174/chat#token=proof-token, seeded a saved WebChat session and matching session token for ws://127.0.0.1:18789, and inspected the live <openclaw-app> instance after URL hydration.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): terminal capture from the live browser check:
LIVE CONTROL UI PROOF: {"url":"http://127.0.0.1:5174/chat?session=agent%3Atest_old%3Amain","sessionKey":"agent:test_old:main","settingsSessionKey":"agent:test_old:main","lastActiveSessionKey":"agent:test_old:main","token":"proof-token","pendingGatewayToken":null,"search":"?session=agent%3Atest_old%3Amain","hash":""}
  • Observed result after fix: the browser-loaded Control UI preserved agent:test_old:main as both sessionKey and lastActiveSessionKey, applied proof-token, removed the token hash from the URL, and did not leave a pending gateway token.
  • What was not tested: full Gateway crash/restart replay on a Windows WebChat user environment.
  • Before evidence (optional but encouraged): current main resets token-only same-gateway URLs to main regardless of whether the token is a fresh launch token or the already-loaded reconnect token; the updated tests split those cases.
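
The "removed the token hash from the URL" step in the proof above can be illustrated with the standard URL and URLSearchParams APIs. This is a sketch under assumptions: extractHashToken is a hypothetical helper, not a function from the PR, and it assumes the token travels as token=... in the URL fragment, as in the test URL.

```typescript
// Hypothetical helper: pull a token out of the URL fragment and return the
// URL with that fragment parameter removed.
function extractHashToken(rawUrl: string): { token: string | null; cleanedUrl: string } {
  const url = new URL(rawUrl);
  // Parse the fragment as key=value pairs, without the leading "#".
  const hashParams = new URLSearchParams(url.hash.replace(/^#/, ""));
  const token = hashParams.get("token");
  hashParams.delete("token");
  // Keep any other fragment parameters; drop the fragment if none remain.
  const rest = hashParams.toString();
  url.hash = rest ? `#${rest}` : "";
  return { token, cleanedUrl: url.toString() };
}

const result = extractHashToken("http://127.0.0.1:5174/chat#token=proof-token");
// result.token is "proof-token"; result.cleanedUrl is "http://127.0.0.1:5174/chat"
```

The sketch only covers the fragment cleanup. The session= query parameter seen in the captured output is written by the Control UI separately and is not modelled here.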

Root Cause (if applicable)

  • Root cause: applySettingsFromUrl treated any same-gateway token URL without session as stale session evidence and forcibly reset session state to main.
  • Missing detection / guardrail: the existing test locked in the broad reset behavior instead of distinguishing matching-token reconnects from fresh token-only launches.
  • Contributing context (if known): session selection is already scoped per gateway, and the same-gateway session token is stored in session storage, so token equality gives a narrow reconnect signal without reviving stale storage for new token launches.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: ui/src/ui/app-settings.test.ts
  • Scenario the test should lock in: matching-token reconnect preserves persisted sessionKey/lastActiveSessionKey; fresh token-only launch resets stale selection to main; explicit session= still switches.
  • Why this is the smallest reliable guardrail: the regression was in URL settings hydration before Gateway reconnect logic runs.
  • Existing test that already covers this (if any): the prior broad token-only reset test is split into reconnect-preserve and fresh-token-reset cases.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

WebChat users keep their selected session when reconnecting with a matching token-only URL after Gateway restart/crash flows. Fresh token-only launches still open on main instead of reviving stale browser storage.

Diagram (if applicable)

Matching-token reconnect:
[token-only reconnect URL] -> [token confirmed and URL cleaned] -> [previous session remains selected]

Fresh token-only launch:
[new token-only URL] -> [token applied and URL cleaned] -> [stale session reset to main]

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (Yes)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation: token storage/cleanup behavior is unchanged; the PR only uses equality with the already-loaded session token to distinguish reconnect from fresh token launch.

Repro + Verification

Environment

  • OS: Windows local checkout
  • Runtime/container: Node 22 / pnpm via Corepack
  • Model/provider: N/A
  • Integration/channel (if any): WebChat / Control UI
  • Relevant config (redacted): N/A

Steps

  1. Seed Control UI host settings with sessionKey = lastActiveSessionKey = agent:test_old:main and a matching same-gateway token.
  2. Load/apply a token-only URL: https://control.example/chat#token=test-token.
  3. Compare session state and URL cleanup after applySettingsFromUrl.
  4. Repeat with a different fresh token and confirm stale session state resets to main.

Expected

  • Matching token reconnect: token is applied/confirmed, removed from the URL, and selected session remains agent:test_old:main.
  • Fresh token-only launch: token is applied, removed from the URL, and stale selected session resets to main.

Actual

  • Verified by the updated focused tests and by the live browser Control UI proof above.

Additional local validation:

  • corepack pnpm test ui/src/ui/app-settings.test.ts
  • corepack pnpm test ui/src/ui/app-gateway.node.test.ts ui/src/ui/app-settings.test.ts ui/src/ui/controllers/sessions.test.ts src/agents/main-session-restart-recovery.test.ts
  • corepack pnpm check:changed
  • codex review --base origin/main -c model_reasoning_effort='"medium"' reported no discrete regression in the affected Control UI URL-settings path after the P2 follow-up.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Console output
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: matching-token URL preserves selected session; fresh-token URL resets stale selection to main; explicit session URL still changes selection; different-gateway token deferral remains unchanged; real browser Control UI URL hydration preserves the stored session for matching-token reconnect.
  • Edge cases checked: Gateway defaults/restart-recovery acceptance-adjacent tests stayed green.
  • What you did not verify: live Gateway crash/restart on Windows WebChat.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: a matching session token could still be associated with a now-invalid selected session.
    • Mitigation: session selection remains gateway-scoped, explicit session= remains authoritative, and existing Gateway/session loading logic can reject or normalize unavailable sessions.
  • Risk: changing URL-token behavior could weaken the prior fresh-launch guardrail.
    • Mitigation: the fresh token-only case is explicitly retained and covered by regression tests.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • ui/src/ui/app-settings.test.ts (modified, +47/-2)
  • ui/src/ui/app-settings.ts (modified, +3/-1)


Bug Description

When Gateway crashes due to event-loop starvation (e.g., fetch-timeout to npm registry), the session is not restored after restart. Instead, a completely new session is created, losing all conversation history.

Steps to Reproduce

  1. Open a webchat session with active conversation history
  2. Gateway experiences event-loop blocking (e.g., npm registry timeout causing thread starvation)
  3. After the crash/restart, the session list shows only a brand new session
  4. Old session files exist on disk but are not reconnected

Expected Behavior

After Gateway restarts following a crash, the existing session should be restored and conversation history should remain accessible.

Actual Behavior

  • Session files remain on disk at agents/main/sessions/<id>.jsonl
  • Control UI shows a brand new session with no history
  • User must manually switch to the old session to recover history
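
To confirm that a transcript file survived the crash, its JSONL content can be scanned line by line. This is a minimal sketch: scanJsonl is a hypothetical helper, the record schema is not documented in this issue (records are kept as unknown), and the possibility of a truncated final line after an unclean shutdown is an assumption, not something the report shows.

```typescript
interface JsonlScan {
  records: unknown[]; // successfully parsed lines, schema intentionally left open
  skipped: number;    // lines that were not valid JSON (e.g. a partial final write)
}

// Hypothetical helper: parse JSONL text, tolerating blank and malformed lines.
function scanJsonl(text: string): JsonlScan {
  const records: unknown[] = [];
  let skipped = 0;
  for (const line of text.split(/\r?\n/)) {
    if (line.trim() === "") continue; // ignore blank lines and the trailing newline
    try {
      records.push(JSON.parse(line));
    } catch {
      skipped += 1; // count, but do not fail on, unparseable lines
    }
  }
  return { records, skipped };
}
```

A non-zero records count with zero or one skipped line suggests the history is still recoverable once the UI selects the old session again.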

Gateway Log Excerpt

20:13:43 [fetch-timeout] fetch timeout after 2500ms (elapsed 3639ms) operation=fetchWithTimeout url=https://registry.npmjs.org/openclaw/beta
20:13:47 [fetch-timeout] fetch timeout after 2500ms (elapsed 4458ms) timer delayed 1958ms, likely event-loop starvation operation=fetchWithTimeout url=https://registry.npmjs.org/openclaw/latest
20:14:21 [ws] closed before connect conn=... code=1005 reason=n/a
20:14:24 [ws] closed before connect conn=... code=1005 reason=n/a
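
The "timer delayed" figure in the excerpt is the gap between the configured timeout and the time that actually elapsed: 4458 ms - 2500 ms = 1958 ms. The sketch below reproduces that arithmetic. The function name assessTimerDelay and the 1500 ms threshold are assumptions chosen for illustration; the threshold OpenClaw uses to label a delay as "likely event-loop starvation" is not stated in the issue.

```typescript
interface TimeoutReport {
  delayMs: number;        // how late the timeout fired
  likelyStarved: boolean; // true when the delay reaches the chosen threshold
}

// Assumption: an illustrative threshold, not OpenClaw's actual value.
const ASSUMED_STARVATION_THRESHOLD_MS = 1500;

function assessTimerDelay(
  timeoutMs: number,
  elapsedMs: number,
  thresholdMs: number = ASSUMED_STARVATION_THRESHOLD_MS,
): TimeoutReport {
  // A timer cannot fire early in a meaningful way here, so clamp at zero.
  const delayMs = Math.max(0, elapsedMs - timeoutMs);
  return { delayMs, likelyStarved: delayMs >= thresholdMs };
}

const first = assessTimerDelay(2500, 3639);  // delayMs: 1139
const second = assessTimerDelay(2500, 4458); // delayMs: 1958
```

A timer that fires well after its deadline means the event loop was busy with other work when the callback became due, which is why a large delay is a useful starvation signal.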

Environment

  • OpenClaw 2026.5.7
  • Windows 10
  • WebChat channel

Additional Context

The session files on disk are intact after the crash. The issue appears to be that the Gateway does not reconnect to the existing session on restart and creates a new one instead. This is particularly problematic because the crash was caused by the Gateway itself (event-loop starvation), not by the user closing the tab.

Is there any session recovery mechanism that should kick in after an unclean shutdown?
