openclaw - [Bug]: sessions.reset can be overwritten by stale lifecycle events from the old run

openclaw · 2026-05-31 07:01:11

Error Message

sessions.reset can rotate a channel session to a fresh sessionId, but an old in-flight embedded run can still emit later lifecycle start / end / error events keyed only by the same sessionKey. Those stale lifecycle events are then persisted into the new session row, leaving the fresh session marked running or failed even though hasActiveRun=false.

The pre-reset run's own errors were the trigger, not the bug:

[responses] error provider=openai api=openai-responses model=gpt-5.5 message=Request was aborted
error="Context overflow: prompt too large for the model..."
error="Context overflow: estimated context size exceeds safe threshold during tool loop."

Root Cause

The local hotfix described under Implementation notes is only a mitigation, not a proposed final patch shape. A root fix could also abort or drain old runs more aggressively during reset, but the persistence layer should still guard on sessionId, because late terminal events are expected in async systems.
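A minimal sketch of the more aggressive reset path, in TypeScript since OpenClaw runs on Node. `ResetCoordinator`, `RunHandle`, `startRun`, and `sessionIdFor` are hypothetical names, not OpenClaw APIs, and the sketch assumes each embedded run can be cancelled through an `AbortController`. It aborts the old run before rotating the sessionId; a late terminal event can still arrive afterwards, so this complements the session-id guard rather than replacing it.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical handle for an in-flight embedded run (not an OpenClaw type).
interface RunHandle {
  runId: string;
  sessionId: string; // the session this run was started for
  controller: AbortController;
}

// Hypothetical owner of the sessionKey -> sessionId binding and active runs.
class ResetCoordinator {
  private readonly bindings = new Map<string, string>();
  private readonly activeRuns = new Map<string, RunHandle>();

  // Returns the current session id for a key, creating one on first use.
  sessionIdFor(sessionKey: string): string {
    let id = this.bindings.get(sessionKey);
    if (id === undefined) {
      id = randomUUID();
      this.bindings.set(sessionKey, id);
    }
    return id;
  }

  startRun(sessionKey: string): RunHandle {
    const run: RunHandle = {
      runId: randomUUID(),
      sessionId: this.sessionIdFor(sessionKey),
      controller: new AbortController(),
    };
    this.activeRuns.set(sessionKey, run);
    return run;
  }

  // Abort the old run before rotating, so it stops producing new work.
  // It may still emit a late end/error, which the persistence guard must drop.
  reset(sessionKey: string): string {
    const run = this.activeRuns.get(sessionKey);
    if (run !== undefined) {
      run.controller.abort(new Error("session reset"));
      this.activeRuns.delete(sessionKey);
    }
    const next = randomUUID();
    this.bindings.set(sessionKey, next);
    return next;
  }
}
```

Draining (awaiting the old run's completion, with a timeout) is left out to keep the sketch synchronous.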

Bug type

Behavior bug / session state corruption after reset

Summary

The bug is separate from the context-overflow error itself: a terminal lifecycle event from the pre-reset run can mutate the post-reset session.

Environment

OpenClaw CLI/Gateway: 2026.5.28 (e932160)
OS: macOS arm64
Node: v24.15.0
Gateway: macOS LaunchAgent, loopback gateway
Channel: Slack socket mode, channel-scoped session
Provider/API path: openai / openai-responses
Model: gpt-5.5
Thinking default: xhigh

Related issues

Related reset/recovery cleanup class: #87310
Related upstream trigger in this run: #88499
Older closed context-overflow/reset background: #51154

Observed sequence

1. A Slack channel session hit context overflow during an embedded run.
2. Running sessions.reset returned a new session id for the same channel sessionKey.
3. Shortly after the reset, the new session row showed status="running" while hasActiveRun=false.
4. The old pre-reset embedded run then emitted a terminal error lifecycle event from the old transcript/session file.
5. The new session row was overwritten to status="failed" even though the failing run belonged to the old session id.

Sanitized state snapshots:

{
  "before_final_old_event": {
    "sessionId": "098fc577-f61d-4ce6-962f-4a3c3e05f270",
    "status": "running",
    "hasActiveRun": false,
    "totalTokens": 19644
  },
  "after_old_event": {
    "sessionId": "098fc577-f61d-4ce6-962f-4a3c3e05f270",
    "status": "failed",
    "hasActiveRun": false,
    "startedAt": 1780209859315,
    "endedAt": 1780210156714,
    "runtimeMs": 297399
  }
}

Sanitized log shape:

[responses] error provider=openai api=openai-responses model=gpt-5.5 message=Request was aborted
sessions.reset succeeded

[tools] read failed: Path escapes sandbox root (...)

embedded_run_agent_end
  runId=ba63b0d4-99fb-4131-8fab-1997aff16c62
  isError=true
  error="Context overflow: prompt too large for the model..."

[context-overflow-diag]
  sessionKey=agent:main:slack:channel:<redacted>
  sessionFile=<old-session-id>.jsonl
  error="Context overflow: estimated context size exceeds safe threshold during tool loop."

The important detail is that the context-overflow diagnostic still referenced the old session file, while sessions.list had already moved the channel binding to the new session id.
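This mismatch can be detected mechanically if, as the sanitized log line sessionFile=<old-session-id>.jsonl suggests, transcript files are named after their session id. A small sketch under that assumption; `sessionIdFromFile` and `refersToStaleSession` are hypothetical helpers, not part of OpenClaw.

```typescript
import { basename } from "node:path";

// Assumption: transcripts are named "<sessionId>.jsonl".
function sessionIdFromFile(sessionFile: string): string | undefined {
  if (!sessionFile.endsWith(".jsonl")) return undefined;
  const id = basename(sessionFile, ".jsonl");
  return id.length > 0 ? id : undefined;
}

// True when an event names a transcript that no longer belongs to the
// session currently bound to the sessionKey.
function refersToStaleSession(sessionFile: string, boundSessionId: string): boolean {
  const fileSessionId = sessionIdFromFile(sessionFile);
  return fileSessionId !== undefined && fileSessionId !== boundSessionId;
}

console.log(refersToStaleSession("sessions/old-id.jsonl", "new-id")); // true
console.log(refersToStaleSession("sessions/new-id.jsonl", "new-id")); // false
```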

Gateway and Slack health were normal throughout:

Gateway connectivity probe: ok
Slack: enabled, configured, running, connected, health=healthy

Expected behavior

After sessions.reset rotates a session:

- stale lifecycle events from the old session/run must not mutate the new session row;
- a late end / error from the old run should either be ignored for the new row or applied only to the old archived session identity;
- the new session should remain clean/idle until a new turn starts;
- status="running" with hasActiveRun=false should not be persisted as the fresh reset state.
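One way to meet the second expectation is to route a late end / error by the session id it belongs to: apply it to the current row only when the ids match, to the archived row when the old identity is still known, and drop it otherwise. A sketch with hypothetical types (`SessionRow`, `TerminalEvent`, `routeTerminalEvent`) and an assumed "done" status for a successful end; the real session row has more fields.

```typescript
// Hypothetical, trimmed-down shapes; "done" is an assumed status name.
type Status = "running" | "done" | "failed" | null;

interface SessionRow {
  sessionId: string;
  status: Status;
  hasActiveRun: boolean;
  endedAt: number | null;
}

interface TerminalEvent {
  kind: "end" | "error";
  sessionId: string; // the session the emitting run belonged to
  at: number; // epoch milliseconds
}

type Routing = "current" | "archived" | "dropped";

// Apply a late end/error only to the row that owns it, never to a newer session.
function routeTerminalEvent(
  event: TerminalEvent,
  current: SessionRow,
  archived: Map<string, SessionRow>,
): Routing {
  const target =
    event.sessionId === current.sessionId ? current : archived.get(event.sessionId);
  if (target === undefined) return "dropped"; // old identity unknown: ignore it
  target.status = event.kind === "error" ? "failed" : "done";
  target.hasActiveRun = false;
  target.endedAt = event.at;
  return target === current ? "current" : "archived";
}
```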

Actual behavior

The session store is updated by sessionKey alone. After reset, that key points at the new session id, so a stale lifecycle event from the old run can overwrite the new row.

User-facing impact:

- Slack appears to stop replying even though socket/auth/channel health are fine.
- The session can show a dirty running or failed state with no active run.
- Additional resets can be required, and without a guard the same stale-event race can re-contaminate the fresh session.
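The key-only update described under Actual behavior can be reproduced in a few lines. This sketch uses a hypothetical in-memory Map in place of the real session store, placeholder ids old-id / new-id, and an assumed "done" status for a successful end; it shows the fresh row ending up failed after the old run's error.

```typescript
// Hypothetical store keyed only by sessionKey (the buggy shape).
interface Row {
  sessionId: string;
  status: "running" | "done" | "failed" | null;
}

interface KeyedLifecycleEvent {
  sessionKey: string;
  sessionId: string; // the session the emitting run belonged to
  phase: "start" | "end" | "error";
}

const sessionStore = new Map<string, Row>();

function applyByKeyOnly(ev: KeyedLifecycleEvent): void {
  const row = sessionStore.get(ev.sessionKey);
  if (row === undefined) return;
  // Bug being illustrated: ev.sessionId is never compared with row.sessionId.
  row.status = ev.phase === "start" ? "running" : ev.phase === "error" ? "failed" : "done";
}

const channelKey = "agent:main:slack:channel:<redacted>";
sessionStore.set(channelKey, { sessionId: "old-id", status: "running" });
// sessions.reset rotates the row bound to the same key.
sessionStore.set(channelKey, { sessionId: "new-id", status: null });
// The pre-reset run now emits its terminal error.
applyByKeyOnly({ sessionKey: channelKey, sessionId: "old-id", phase: "error" });
console.log(sessionStore.get(channelKey)); // { sessionId: 'new-id', status: 'failed' }
```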

Implementation notes from local hotfix

A local installed-dist hotfix was enough to stop the stale writeback:

- carry the active sessionId in the agent run context when registering embedded runs;
- in session lifecycle persistence / snapshot projection, compare the sessionId carried by the lifecycle event (or its session run context) with the current store row's sessionId;
- if both are known and differ, ignore that lifecycle event for the current session row.
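The hotfix notes above can be sketched as a registry that records each run's sessionId plus a guard consulted before persisting. The names (`registerEmbeddedRun`, `shouldApplyToRow`, `applyLifecycle`) and shapes are hypothetical; this is not the actual installed-dist patch. As in the notes, an event is ignored only when both session ids are known and differ, so events without an id keep the existing behavior.

```typescript
// Hypothetical run registry: records the active sessionId at registration.
const runSessionIds = new Map<string, string>();

function registerEmbeddedRun(runId: string, sessionId: string): void {
  runSessionIds.set(runId, sessionId);
}

// Assumed shapes; "done" is an assumed status name for a successful end.
interface StoreRow {
  sessionId?: string;
  status: "running" | "done" | "failed" | null;
}

interface RunLifecycleEvent {
  runId: string;
  sessionId?: string; // set when the event itself carries the session id
  phase: "start" | "end" | "error";
}

// Ignore the event only when both ids are known and differ.
function shouldApplyToRow(ev: RunLifecycleEvent, row: StoreRow): boolean {
  const eventSessionId = ev.sessionId ?? runSessionIds.get(ev.runId);
  if (eventSessionId === undefined || row.sessionId === undefined) return true;
  return eventSessionId === row.sessionId;
}

function applyLifecycle(ev: RunLifecycleEvent, row: StoreRow): void {
  if (!shouldApplyToRow(ev, row)) return; // stale: leave the current row untouched
  row.status = ev.phase === "start" ? "running" : ev.phase === "error" ? "failed" : "done";
}
```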

Verification after local mitigation

After applying the guard, restarting the gateway, and resetting the affected Slack channel session:

{
  "sessionId": "768ce38f-b576-4523-9226-e420e44bd358",
  "status": null,
  "startedAt": null,
  "endedAt": null,
  "runtimeMs": null,
  "systemSent": false,
  "inputTokens": 0,
  "outputTokens": 0,
  "totalTokens": 0,
  "hasActiveRun": false
}

No new stalled session, Context overflow, or stale lifecycle writeback appeared in the following health/log check window.
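The verification snapshot can double as a checkable definition of a clean reset state. A sketch, assuming the row shape shown above (`ResetRow`, `isCleanResetState`, and `hasPhantomRun` are hypothetical names); `hasPhantomRun` flags the status="running" with hasActiveRun=false combination that should never be persisted as the fresh reset state.

```typescript
// Assumed row shape, mirroring the fields in the verification snapshot.
interface ResetRow {
  sessionId: string;
  status: string | null;
  startedAt: number | null;
  endedAt: number | null;
  runtimeMs: number | null;
  systemSent: boolean;
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  hasActiveRun: boolean;
}

// A freshly reset row should stay idle until a new turn starts.
function isCleanResetState(row: ResetRow): boolean {
  return (
    row.status === null &&
    row.startedAt === null &&
    row.endedAt === null &&
    row.runtimeMs === null &&
    !row.systemSent &&
    row.inputTokens === 0 &&
    row.outputTokens === 0 &&
    row.totalTokens === 0 &&
    !row.hasActiveRun
  );
}

// The dirty state from the report: marked running while no run is active.
function hasPhantomRun(row: Pick<ResetRow, "status" | "hasActiveRun">): boolean {
  return row.status === "running" && !row.hasActiveRun;
}

const verifiedRow: ResetRow = {
  sessionId: "768ce38f-b576-4523-9226-e420e44bd358",
  status: null,
  startedAt: null,
  endedAt: null,
  runtimeMs: null,
  systemSent: false,
  inputTokens: 0,
  outputTokens: 0,
  totalTokens: 0,
  hasActiveRun: false,
};

console.log(isCleanResetState(verifiedRow)); // true
console.log(hasPhantomRun({ status: "running", hasActiveRun: false })); // true
```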
