openclaw: [Bug] Session lifecycle never finalizes; status: "running" persists after session close, breaking maintenance eviction and dashboard observability

openclaw/openclaw#71756 · Fetched 2026-04-26 05:08:48 · Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Bug type: Behavior bug

Summary

When a session ends normally, the Gateway never updates the status field in sessions.json. The field stays "running" regardless of how the session ended — whether it ran for 2 minutes via a cron job or 8 hours interactively. This has two concrete consequences:

Consequence 1 — Maintenance eviction is broken. sessions cleanup and session.maintenance.pruneAfter are designed to keep sessions.json bounded by evicting entries older than a configurable age threshold. But pruneAfter uses updatedAt as a proxy — and with no terminal status value, a week-old entry for a session that ended normally looks identical to a currently-live session. Both have status: "running" and a recent-ish updatedAt. On a system running ~15 cron jobs with 48h sessionRetention, sessions.json accumulates 200+ entries within weeks, all showing status: "running".

Consequence 2 — Dashboard shows every historical session as active. Nerve, Control UI, and any tool reading sessions.list report all sessions as active because every entry has status: "running". A 2-minute cron run from 3 days ago is indistinguishable from a live interactive session.

Steps to reproduce

  1. sessions_spawn a subagent for any task
  2. Let it complete normally (no crash, no abort)
  3. Inspect ~/.openclaw/agents/main/sessions/sessions.json
  4. The entry has status: "running" and no endedAt, even though the session is done

// sessions.json entry for a subagent that ran 3 days ago and completed normally:
{
  "status": "running",      // ← should be "completed" or "ended"
  "updatedAt": 1777134600292,  // 3 days ago
  "sessionId": "d5a486a6-385c-47fd-9279-4a40d7d53afd",
  "model": "MiniMax-M2.7",
  "startedAt": 1777134599292
  // no endedAt written
}
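
To check whether a local store is affected, a short script can list entries that still claim to be running but have not been touched for a while. This is a minimal sketch, not an openclaw tool: it assumes sessions.json is a JSON object mapping a session key to an entry with the status, updatedAt, and endedAt fields shown above. The top-level shape, the helper name find_suspect_entries, and the one-hour idle threshold are all assumptions.

```python
import json
import time
from pathlib import Path

# Assumption: entries idle longer than this are unlikely to be live sessions.
IDLE_THRESHOLD_MS = 60 * 60 * 1000


def find_suspect_entries(store, now_ms, idle_ms=IDLE_THRESHOLD_MS):
    """Return keys of entries with status "running", no endedAt, and an old updatedAt."""
    suspects = []
    for key, entry in store.items():
        if not isinstance(entry, dict):
            continue  # skip anything that is not a session entry
        is_running = entry.get("status") == "running"
        has_end = entry.get("endedAt") is not None
        idle_for = now_ms - entry.get("updatedAt", now_ms)
        if is_running and not has_end and idle_for > idle_ms:
            suspects.append(key)
    return suspects


if __name__ == "__main__":
    # Path taken from the reproduction steps above.
    path = Path.home() / ".openclaw/agents/main/sessions/sessions.json"
    if path.exists():
        store = json.loads(path.read_text(encoding="utf-8"))
        # Assumption: the file's top level is an object keyed by session key.
        if isinstance(store, dict):
            for key in find_suspect_entries(store, int(time.time() * 1000)):
                print(key)
```

An entry that a live session is still updating will have a recent updatedAt and is skipped, so the output is a list of likely leftovers rather than a definitive verdict.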

Expected behavior

When a session closes normally, the Gateway should set status to a terminal value ("completed" or "ended") and populate endedAt. All three close paths (natural close, timeout, and abort) should finalize the session entry.
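
The expected finalization step could look roughly like the sketch below. It is illustrative only: finalize_entry and the close-reason names ("natural", "timeout", "abort") are hypothetical, and the terminal values are the ones proposed in this issue, not values the Gateway writes today.

```python
import time

# Terminal values proposed in this issue; the close-reason keys are hypothetical.
TERMINAL_STATUS = {
    "natural": "completed",
    "timeout": "timed_out",
    "abort": "aborted",
}


def finalize_entry(entry, reason, now_ms=None):
    """Return a copy of entry with a terminal status and endedAt set.

    An entry that already has endedAt is copied as-is, so calling this
    twice for the same session does not overwrite the first result.
    """
    if reason not in TERMINAL_STATUS:
        raise ValueError(f"unknown close reason: {reason!r}")
    if entry.get("endedAt") is not None:
        return dict(entry)
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    finalized = dict(entry)
    finalized["status"] = TERMINAL_STATUS[reason]
    finalized["endedAt"] = now_ms
    return finalized
```

Returning a copy instead of mutating in place keeps the sketch easy to test; a real implementation would also need to persist the result to sessions.json.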

Actual behavior

  • status stays "running" on natural close (the most common path)
  • status: "done" or status: "failed" only reflect the last turn's outcome, not the session lifecycle
  • endedAt is only written by sessions cleanup --fix-missing as a side-effect repair, not as part of normal session close
  • session_end lifecycle hooks only fire when a new session starts and the previous session ID differs — not on graceful session close (#57790)

Root cause

The status field was designed to reflect the outcome of the last communication turn, not the session lifecycle (documented in #64103). The session-close path was never wired to write a terminal status value. The lifecycle hook that would finalize sessions.json entries (session_end) is only fired on new-session-handoff, not on session termination.
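
The gap can be illustrated with a toy model. The sketch below is not openclaw code: SessionTracker, start, and close are hypothetical names, and it only models the behavior described in this report, where session_end fires on handoff to a different session ID, next to the proposed extra firing on graceful close.

```python
class SessionTracker:
    """Toy model of when a session_end hook fires (hypothetical, not openclaw's API)."""

    def __init__(self, on_session_end, fire_on_close=False):
        self.on_session_end = on_session_end
        self.fire_on_close = fire_on_close  # False models the reported behavior
        self.current_id = None

    def start(self, session_id):
        # Reported behavior: the hook fires only when a different session takes over.
        if self.current_id is not None and self.current_id != session_id:
            self.on_session_end(self.current_id)
        self.current_id = session_id

    def close(self):
        # Proposed behavior: a graceful close also fires the hook.
        if self.current_id is not None and self.fire_on_close:
            self.on_session_end(self.current_id)
        self.current_id = None


ended = []
reported = SessionTracker(ended.append)
reported.start("s1")
reported.close()  # no hook call: "s1" is never finalized

proposed = SessionTracker(ended.append, fire_on_close=True)
proposed.start("s2")
proposed.close()  # hook fires for "s2"
print(ended)  # ['s2']
```

In the reported mode, a session that is the last one to run never gets a session_end call, which matches entries that stay "running" indefinitely.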

Examples

| Scenario | Current result | Expected result |
| --- | --- | --- |
| Cron job runs for 45s then completes | status: "running" persists indefinitely | status: "completed", endedAt written |
| Subagent completes long task after 2h | status: "running" persists indefinitely | status: "completed", endedAt written |
| Session times out after wall clock limit | status: "failed" on last turn, but entry stays open | status: "timed_out", entry finalized |
| Main agent session ends on gateway SIGTERM | session_end never fires (#57790) | All active entries receive status: "ended" |
| Interactive session runs 20 min then user goes idle | status: "running" persists indefinitely | status: "completed" after idle timeout |

Proposed fix

  1. Terminal status values — add "completed", "aborted", "timed_out" as terminal lifecycle states alongside the existing turn-state values. Existing "running" / "done" / "failed" remain for backward compatibility.
  2. Wire session close to terminal status — the Gateway writes the appropriate terminal value when a session ends naturally, due to timeout (#70347), or due to external abort.
  3. session_end on graceful close — expand the session_end hook firing to include graceful session termination, not only new-session-handoff (#57790).
  4. Make eviction status-aware: sessions cleanup treats any non-"running" status as an eviction candidate, making pruneAfter functional.
  5. Retroactive cleanup — provide a sessions cleanup --finalize-stale command that marks obviously-dead entries (status: "running" + no matching active .jsonl) as "ended".
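
Items 4 and 5 can be sketched together. This is a rough illustration under stated assumptions, not the actual cleanup implementation: eviction_candidates and finalize_stale are hypothetical helpers, the store is assumed to be a mapping of session key to entry, and active_session_ids stands in for "session IDs that still have an active .jsonl transcript" (how that set is computed is left out). Because the real end time of a stale session is unknown, the sketch uses updatedAt as the best available estimate for endedAt.

```python
def eviction_candidates(store, now_ms, prune_after_ms):
    """Keys of entries that are not "running" and older than prune_after_ms."""
    return [
        key
        for key, entry in store.items()
        if entry.get("status") != "running"
        and now_ms - entry.get("updatedAt", now_ms) > prune_after_ms
    ]


def finalize_stale(store, active_session_ids, now_ms):
    """Mark "running" entries with no active transcript as "ended".

    Returns a new store; the input is left untouched.
    """
    result = {}
    for key, entry in store.items():
        entry = dict(entry)
        is_running = entry.get("status") == "running"
        if is_running and entry.get("sessionId") not in active_session_ids:
            entry["status"] = "ended"
            # Assumption: last known activity is the best estimate of the end time.
            entry.setdefault("endedAt", entry.get("updatedAt", now_ms))
        result[key] = entry
    return result


HOUR_MS = 60 * 60 * 1000
now = 100 * HOUR_MS
store = {
    "cron:a": {"sessionId": "a", "status": "running", "updatedAt": now - 72 * HOUR_MS},
    "main": {"sessionId": "m", "status": "running", "updatedAt": now - 72 * HOUR_MS},
}
# Before the repair nothing is evictable, because both entries claim to be running.
print(eviction_candidates(store, now, 48 * HOUR_MS))  # []
repaired = finalize_stale(store, active_session_ids={"m"}, now_ms=now)
print(eviction_candidates(repaired, now, 48 * HOUR_MS))  # ['cron:a']
```

The sketch combines the status check with the age threshold, so a session that ended a minute ago is finalized but not evicted until it passes pruneAfter.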

Related issues

| # | Title |
| --- | --- |
| #64103 | Session status field values ("failed", "timeout", "done") mislead agents into spawning duplicate sessions |
| #67902 | Subagent sessions left as "running" in sessions.json after crash — no cleanup mechanism |
| #70347 | Cron outer timeout should emit lifecycle.error so sessions.json finalizes immediately |
| #57790 | Gateway shutdown/restart does not fire session_end for active sessions |

extent analysis

TL;DR

Update the Gateway to write a terminal status value when a session ends, and modify the sessions cleanup mechanism to consider non-"running" statuses for eviction.

Guidance

  • Implement terminal status values ("completed", "aborted", "timed_out") alongside existing turn-state values.
  • Wire session close to terminal status by updating the Gateway to write the appropriate terminal value when a session ends naturally, due to timeout, or due to external abort.
  • Expand the session_end hook firing to include graceful session termination, not only new-session-handoff.
  • Modify sessions cleanup to treat any non-"running" status as an eviction candidate.

Example

No code snippet is provided as the issue does not contain sufficient code context.

Notes

The proposed fix touches both the Gateway close path and the sessions cleanup mechanism. Test that consumers relying on the existing "running" / "done" / "failed" values keep working once terminal values are written.

Recommendation

Apply the proposed fix, which involves updating the Gateway to write terminal status values and modifying the sessions cleanup mechanism, as it addresses the root cause of the issue and provides a comprehensive solution.
