openclaw [Bug]: sessions.reset can be overwritten by stale lifecycle events from the old run


Bug type

Behavior bug / session state corruption after reset

Summary

sessions.reset can rotate a channel session to a fresh sessionId, but an embedded run that was in flight before the reset can still emit lifecycle start / end / error events afterwards, keyed only by the shared sessionKey. Those stale lifecycle events are then persisted into the new session row, leaving the fresh session marked running or failed even though hasActiveRun=false.

This is separate from the context-overflow error itself. The bug is that a terminal lifecycle event from the pre-reset run can mutate the post-reset session.

Environment

  • OpenClaw CLI/Gateway: 2026.5.28 (e932160)
  • OS: macOS arm64
  • Node: v24.15.0
  • Gateway: macOS LaunchAgent, loopback gateway
  • Channel: Slack socket mode, channel-scoped session
  • Provider/API path: openai / openai-responses
  • Model: gpt-5.5
  • Thinking default: xhigh

Related issues

  • Related reset/recovery cleanup class: #87310
  • Related upstream trigger in this run: #88499
  • Older closed context-overflow/reset background: #51154

Observed sequence

  1. A Slack channel session hit context overflow during an embedded run.
  2. Running sessions.reset returned a new session id for the same channel sessionKey.
  3. Shortly after reset, the new session row showed status="running" while hasActiveRun=false.
  4. The old pre-reset embedded run then emitted a terminal error lifecycle event from the old transcript/session file.
  5. The new session row was overwritten to status="failed" even though the failing run belonged to the old session id.

Sanitized state snapshots:

{
  "before_final_old_event": {
    "sessionId": "098fc577-f61d-4ce6-962f-4a3c3e05f270",
    "status": "running",
    "hasActiveRun": false,
    "totalTokens": 19644
  },
  "after_old_event": {
    "sessionId": "098fc577-f61d-4ce6-962f-4a3c3e05f270",
    "status": "failed",
    "hasActiveRun": false,
    "startedAt": 1780209859315,
    "endedAt": 1780210156714,
    "runtimeMs": 297399
  }
}

Sanitized log shape:

[responses] error provider=openai api=openai-responses model=gpt-5.5 message=Request was aborted
sessions.reset succeeded

[tools] read failed: Path escapes sandbox root (...)

embedded_run_agent_end
  runId=ba63b0d4-99fb-4131-8fab-1997aff16c62
  isError=true
  error="Context overflow: prompt too large for the model..."

[context-overflow-diag]
  sessionKey=agent:main:slack:channel:<redacted>
  sessionFile=<old-session-id>.jsonl
  error="Context overflow: estimated context size exceeds safe threshold during tool loop."

The important detail is that the context-overflow diagnostic still referenced the old session file, while sessions.list had already moved the channel binding to the new session id.

Gateway and Slack health were normal throughout:

Gateway connectivity probe: ok
Slack: enabled, configured, running, connected, health=healthy

Expected behavior

After sessions.reset rotates a session:

  • stale lifecycle events from the old session/run must not mutate the new session row;
  • a late end / error from the old run should either be ignored for the new row or applied only to the old archived session identity;
  • the new session should remain clean/idle until a new turn starts;
  • status="running" with hasActiveRun=false should not be persisted as the fresh reset state.
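The last expectation can be written as a small consistency check. This is a minimal sketch, assuming a row shape based only on the fields visible in the sanitized snapshots; `SessionRowSnapshot` and `isDirtyIdleRow` are hypothetical names for illustration, not OpenClaw APIs.

```typescript
// Hypothetical row shape, modeled on the fields shown in the sanitized snapshots.
interface SessionRowSnapshot {
  sessionId: string;
  status: string | null;
  hasActiveRun: boolean;
}

// A row is "dirty" when it claims to be running although no run is active.
function isDirtyIdleRow(row: SessionRowSnapshot): boolean {
  return row.status === "running" && !row.hasActiveRun;
}

// The "before_final_old_event" snapshot from this report trips the check.
const beforeFinalOldEvent: SessionRowSnapshot = {
  sessionId: "098fc577-f61d-4ce6-962f-4a3c3e05f270",
  status: "running",
  hasActiveRun: false,
};

console.log(isDirtyIdleRow(beforeFinalOldEvent));
```

A check like this could run in a health probe or a test to flag a freshly reset row that was contaminated by an old run.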

Actual behavior

The session store is updated by sessionKey alone. After reset, that key points at the new session id, so a stale lifecycle event from the old run can overwrite the new row.
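The race can be modeled in a few lines. This is a simplified sketch, assuming an in-memory store keyed by sessionKey; the `SessionRow` and `LifecycleEvent` shapes, the function names, and the "done" status are placeholders for illustration and are not taken from OpenClaw's code.

```typescript
// Hypothetical in-memory model of a session store keyed by sessionKey.
interface SessionRow {
  sessionId: string;
  status: string | null;
}

interface LifecycleEvent {
  sessionKey: string;
  phase: "start" | "end" | "error";
}

const store = new Map<string, SessionRow>();

// Unguarded update: the row is looked up by sessionKey alone, so the event
// cannot express which sessionId it actually belongs to.
function applyLifecycleUnguarded(event: LifecycleEvent): void {
  const row = store.get(event.sessionKey);
  if (!row) return;
  if (event.phase === "start") row.status = "running";
  else if (event.phase === "error") row.status = "failed";
  else row.status = "done"; // placeholder; the real terminal status is not shown in this report
}

// Reset rebinds the same key to a fresh row with a new sessionId.
function resetSession(sessionKey: string, newSessionId: string): void {
  store.set(sessionKey, { sessionId: newSessionId, status: null });
}

const key = "agent:main:slack:channel:example";
store.set(key, { sessionId: "old-session", status: "running" });
resetSession(key, "new-session");

// A late terminal event from the pre-reset run arrives after the rotation.
applyLifecycleUnguarded({ sessionKey: key, phase: "error" });
console.log(store.get(key));
```

In this model the row for "new-session" ends up with status "failed", although the failing run belonged to "old-session".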

User-facing impact:

  • Slack appears to stop replying even though socket/auth/channel health are fine.
  • The session can show dirty running or failed state with no active run.
  • Additional resets can be required, and without a guard the stale event race can re-contaminate the fresh session.

Implementation notes from local hotfix

A local installed-dist hotfix was enough to stop the stale writeback:

  • carry the active sessionId in agent run context when registering embedded runs;
  • in session lifecycle persistence/snapshot projection, compare the sessionId carried by the lifecycle event (or its run context) with the sessionId of the current store row;
  • if both are known and differ, ignore that lifecycle event for the current session row.
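The three steps above can be sketched as a guard in front of the write. This is an illustrative sketch only, not the actual hotfix: the types, the `isStaleLifecycleEvent` and `applyLifecycleGuarded` names, and the "done" status are assumptions introduced here.

```typescript
// Hypothetical shapes; the event carries the sessionId from its run context.
interface GuardedSessionRow {
  sessionId?: string;
  status: string | null;
}

interface GuardedLifecycleEvent {
  sessionKey: string;
  sessionId?: string; // may be unknown for events without run context
  phase: "start" | "end" | "error";
}

// Stale only when both ids are known and differ, as described in the notes.
function isStaleLifecycleEvent(
  eventSessionId: string | undefined,
  rowSessionId: string | undefined,
): boolean {
  return (
    eventSessionId !== undefined &&
    rowSessionId !== undefined &&
    eventSessionId !== rowSessionId
  );
}

const guardedStore = new Map<string, GuardedSessionRow>();

// Returns true when the event was applied, false when it was ignored.
function applyLifecycleGuarded(event: GuardedLifecycleEvent): boolean {
  const row = guardedStore.get(event.sessionKey);
  if (!row || isStaleLifecycleEvent(event.sessionId, row.sessionId)) return false;
  if (event.phase === "start") row.status = "running";
  else if (event.phase === "error") row.status = "failed";
  else row.status = "done"; // placeholder terminal status
  return true;
}

const channelKey = "agent:main:slack:channel:example";
guardedStore.set(channelKey, { sessionId: "new-session", status: null });

// Late error from the pre-reset run: ignored, the fresh row stays clean.
const applied = applyLifecycleGuarded({
  sessionKey: channelKey,
  sessionId: "old-session",
  phase: "error",
});
console.log(applied, guardedStore.get(channelKey));
```

Events with an unknown sessionId are still applied in this sketch, which matches the "both are known" condition but leaves events without run context unguarded.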

This is only a local mitigation, not a proposal for the final shape of the patch. The root fix could also abort or drain old runs more aggressively during reset, but the persistence layer should still be guarded by session id, because late terminal events are expected in async systems.
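For the "abort old runs during reset" direction, one possible shape uses the standard `AbortController`. This is a sketch under assumptions: the run registry, `registerRun`, and `abortRunForReset` are hypothetical names and do not describe how OpenClaw tracks embedded runs.

```typescript
// Hypothetical registry of in-flight runs, keyed by sessionKey.
interface ActiveRun {
  sessionId: string;
  controller: AbortController;
}

const activeRuns = new Map<string, ActiveRun>();

// Register a run and hand back the signal the run should observe.
function registerRun(sessionKey: string, sessionId: string): AbortSignal {
  const controller = new AbortController();
  activeRuns.set(sessionKey, { sessionId, controller });
  return controller.signal;
}

// On reset, ask the old run to stop before the key is rebound.
// Returns true when a run was found and signalled.
function abortRunForReset(sessionKey: string): boolean {
  const run = activeRuns.get(sessionKey);
  if (!run) return false;
  run.controller.abort();
  activeRuns.delete(sessionKey);
  return true;
}

const runKey = "agent:main:slack:channel:example";
const oldRunSignal = registerRun(runKey, "old-session");
const wasAborted = abortRunForReset(runKey);
console.log(wasAborted, oldRunSignal.aborted);
```

Aborting only requests cancellation. The old run can still emit a terminal event while it unwinds, which is why the session-id guard in the persistence layer remains necessary.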

Verification after local mitigation

After applying the guard, restarting the gateway, and resetting the affected Slack channel session:

{
  "sessionId": "768ce38f-b576-4523-9226-e420e44bd358",
  "status": null,
  "startedAt": null,
  "endedAt": null,
  "runtimeMs": null,
  "systemSent": false,
  "inputTokens": 0,
  "outputTokens": 0,
  "totalTokens": 0,
  "hasActiveRun": false
}

No new stalled session, context overflow, or stale lifecycle writeback appeared in the subsequent health/log check window.
