openclaw - 💡(How to fix) Fix Session lane stuck in 'running' after run dies — sessions.abort + gateway restart fail to clear stale state [1 participants]

openclaw/openclaw#59878 (fetched 2026-04-08 02:39:22)
Comments: 0 · Participants: 1 · Timeline events: 1 (cross-referenced ×1) · Reactions: 0


Bug Description

Session lanes get permanently stuck in status: "running" after the actual LLM run dies. The stale running state is not cleared by:

  1. sessions.abort RPC (returns {"status": "no-active-run"} confirming the run is dead)
  2. Gateway restart (stale state persists through restart)

This blocks the session lane indefinitely: every new message queues behind the stale lock and is never processed.

Scope

This is broader than #16331 (compaction-triggered lane jams). Any run that dies without proper lifecycle cleanup can cause it; we observed it across webchat, cron, home-node, and default sessions simultaneously.
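One plausible way a run can "die without proper lifecycle cleanup" is shown in the toy model below. It is hypothetical and is not OpenClaw's code. A lane is marked "running" when a run starts. If the marker is reset only on the success path, a run that raises leaves the lane marked "running" forever. A `finally` block closes that gap. It does not help if the whole process is killed, and that is one reason restart-time reconciliation also matters.

```python
import asyncio

class Lane:
    """Toy session lane that tracks only a status string (hypothetical model)."""

    def __init__(self, key: str) -> None:
        self.key = key
        self.status = "idle"

async def run_without_cleanup(lane: Lane, work) -> None:
    # Buggy pattern: status is reset only if `work` completes normally.
    lane.status = "running"
    await work()
    lane.status = "idle"

async def run_with_cleanup(lane: Lane, work) -> None:
    # Safer pattern: `finally` runs on success, on exceptions, and on task cancellation.
    lane.status = "running"
    try:
        await work()
    finally:
        lane.status = "idle"

async def dying_run() -> None:
    # Stand-in for an LLM run that dies mid-flight.
    raise RuntimeError("upstream connection dropped")

async def demo() -> tuple[str, str]:
    leaky, safe = Lane("agent:main:main"), Lane("agent:main:main")
    for runner, lane in ((run_without_cleanup, leaky), (run_with_cleanup, safe)):
        try:
            await runner(lane, dying_run)
        except RuntimeError:
            pass  # the run itself failed; we only inspect the lane state left behind
    return leaky.status, safe.status

if __name__ == "__main__":
    print(asyncio.run(demo()))  # ('running', 'idle')
```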

Evidence (2026-04-02, version 2026.3.24)

A single check found 11 zombie sessions, some stuck for over a week (8 are listed):

agent:main:home-tablet-kitchen:  running for 10,571 min (7.3 days)
agent:main:home-tablet-test2:    running for 9,983 min
agent:main:home-tablet-test3:    running for 9,980 min
agent:main:default:              running for 2,888 min (2 days)
agent:main:web_..._7ad797481643: running for 1,309 min
agent:main:cron:721fa3b5-...:    running for 949 min
agent:home:default:              running for 203 min
agent:home:debug:                running for 189 min

Every single one returned "no-active-run" on sessions.abort.
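The check above comes down to one predicate. A lane is a zombie when its recorded status is "running" but sessions.abort answers {"status": "no-active-run"}. This is a minimal sketch that assumes the abort reply arrives as the JSON text quoted in this report. The function name and how the caller obtains the lane status are hypothetical.

```python
import json

def is_zombie(lane_status: str, abort_reply: str) -> bool:
    """True when a lane claims to be running but sessions.abort finds no run.

    `lane_status` is the session's recorded status; `abort_reply` is the raw
    JSON text returned by the sessions.abort RPC (format taken from this report).
    """
    if lane_status != "running":
        return False
    try:
        reply = json.loads(abort_reply)
    except json.JSONDecodeError:
        # Unparseable reply: do not auto-classify; leave it for a human to inspect.
        return False
    return isinstance(reply, dict) and reply.get("status") == "no-active-run"

print(is_zombie("running", '{"status": "no-active-run"}'))  # True
print(is_zombie("idle", '{"status": "no-active-run"}'))     # False
```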

Gateway logs show lane wait warnings going back hours:

lane wait exceeded: lane=session:agent:main:main waitedMs=1309962 queueAhead=2
lane wait exceeded: lane=session:agent:main:main waitedMs=842773 queueAhead=1
lane wait exceeded: lane=session:agent:main:main waitedMs=539336 queueAhead=0
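The `waitedMs` values are easier to read as minutes. For example, the first warning means a message had already waited about 21.8 minutes with two items queued ahead of it. Below is a small parser for the warning format quoted above. The regex assumes exactly the `lane=… waitedMs=… queueAhead=…` layout shown here and ignores any other log lines.

```python
import re

# Matches the gateway warning format quoted in this report.
LANE_WAIT = re.compile(
    r"lane wait exceeded: lane=(?P<lane>\S+) waitedMs=(?P<waited>\d+) queueAhead=(?P<ahead>\d+)"
)

def parse_lane_waits(lines):
    """Yield (lane, waited_minutes, queue_ahead) for each lane-wait warning."""
    for line in lines:
        m = LANE_WAIT.search(line)
        if m:
            yield m["lane"], int(m["waited"]) / 60_000, int(m["ahead"])

log = [
    "lane wait exceeded: lane=session:agent:main:main waitedMs=1309962 queueAhead=2",
    "lane wait exceeded: lane=session:agent:main:main waitedMs=842773 queueAhead=1",
    "lane wait exceeded: lane=session:agent:main:main waitedMs=539336 queueAhead=0",
]
for lane, minutes, ahead in parse_lane_waits(log):
    print(f"{lane}: waited {minutes:.1f} min, {ahead} ahead")
# session:agent:main:main: waited 21.8 min, 2 ahead
# session:agent:main:main: waited 14.0 min, 1 ahead
# session:agent:main:main: waited 9.0 min, 0 ahead
```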

Impact

  • Multiple users unable to get responses on webchat (messages silently queue forever)
  • No error is surfaced to the user; messages simply go unanswered
  • Affects any session type (webchat, cron, home-node, default)

Workaround

Sending /new via sessions.send RPC clears the stale state:

openclaw gateway call sessions.send \
  --params '{"key":"agent:main:main","message":"/new"}' \
  --timeout 15000
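To apply the workaround to a lane other than agent:main:main, only the `key` inside `--params` needs to change. The minimal Python wrapper below assumes the `openclaw` CLI is on PATH and accepts exactly the arguments shown above. Building the JSON with `json.dumps` avoids shell-quoting mistakes. The helper names are hypothetical.

```python
import json
import subprocess

def new_session_argv(session_key: str, timeout_ms: int = 15000) -> list[str]:
    """Build argv for the /new workaround, using the same flags as the documented command."""
    params = json.dumps({"key": session_key, "message": "/new"})
    return [
        "openclaw", "gateway", "call", "sessions.send",
        "--params", params,
        "--timeout", str(timeout_ms),
    ]

def reset_session(session_key: str) -> subprocess.CompletedProcess:
    """Run the workaround for one session; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(new_session_argv(session_key), check=True,
                          capture_output=True, text=True)
```

For example, `reset_session("agent:home:debug")` would send /new to one of the stuck lanes listed earlier. Passing a list to `subprocess.run` (no `shell=True`) means the JSON string reaches the CLI as a single argument with no extra quoting.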

We have deployed a watchdog script that runs every 10 minutes, detects sessions stuck in "running" beyond 20 minutes with no active run, and auto-recovers via /new.
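The watchdog's selection rule can be written as a pure function, which keeps it easy to test. This sketch rests on a hypothetical assumption, since the report does not say how the script reads session state: each session is available as a record with its key, its status, and the time it entered "running". With a 20-minute threshold and a 10-minute poll, a lane whose run started at time t is picked up by the first poll after t + 20 minutes. That is between 20 and 30 minutes after t.

```python
from dataclasses import dataclass
from typing import Iterable

STALE_AFTER_S = 20 * 60  # the report's threshold: "running" for more than 20 minutes

@dataclass
class SessionSnapshot:
    """Hypothetical view of one session; the report does not say how state is read."""
    key: str
    status: str
    running_since: float  # epoch seconds when the lane entered "running"

def stale_candidates(sessions: Iterable[SessionSnapshot], now: float,
                     stale_after: float = STALE_AFTER_S) -> list[str]:
    """Return keys of sessions that have been "running" longer than `stale_after` seconds.

    These are candidates only: the watchdog still confirms there is no active
    run (sessions.abort -> "no-active-run") before sending /new.
    """
    return [s.key for s in sessions
            if s.status == "running" and now - s.running_since > stale_after]

now = 1_000_000.0
snapshots = [
    SessionSnapshot("agent:main:default", "running", now - 48 * 3600),
    SessionSnapshot("agent:home:debug", "running", now - 5 * 60),
    SessionSnapshot("agent:main:main", "idle", now - 3600),
]
print(stale_candidates(snapshots, now))  # ['agent:main:default']
```

The threshold check should run before any abort probe, because sessions.abort ends a run that is genuinely still active. Lanes younger than the threshold are never probed. The threshold should therefore exceed the longest legitimate run.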

Expected Behavior

  1. sessions.abort should clear stale running state when no active run exists
  2. Gateway restart should reconcile session status against actual run state
  3. Ideally: a session-lane timeout that auto-recovers after a configurable threshold
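Expected behaviors 1 and 2 amount to treating the set of live runs as the source of truth for lane status. The toy model below illustrates that idea. `LaneTable`, `reconcile`, and the `"aborted"` reply are hypothetical illustrations, not OpenClaw APIs. After a restart, the in-memory set of live runs starts empty, so reconciliation resets every persisted "running" marker.

```python
from dataclasses import dataclass, field

@dataclass
class LaneTable:
    """Toy gateway state (hypothetical; not OpenClaw's implementation)."""
    status: dict[str, str] = field(default_factory=dict)  # session key -> "running" | "idle"
    active_runs: set[str] = field(default_factory=set)    # keys that currently have a live run

    def abort(self, key: str) -> dict[str, str]:
        if key in self.active_runs:
            self.active_runs.discard(key)  # a real gateway would also cancel the run here
            self.status[key] = "idle"
            return {"status": "aborted"}   # hypothetical reply for a live run
        # Expected behavior 1: no live run, so any leftover "running" marker is stale; clear it.
        if self.status.get(key) == "running":
            self.status[key] = "idle"
        return {"status": "no-active-run"}

    def reconcile(self) -> list[str]:
        """Expected behavior 2: reset every "running" lane that has no live run (e.g. at startup)."""
        stale = [k for k, s in self.status.items()
                 if s == "running" and k not in self.active_runs]
        for k in stale:
            self.status[k] = "idle"
        return stale

# After a restart no runs are live, so the persisted "running" marker is stale.
table = LaneTable(status={"agent:main:main": "running", "agent:home:debug": "idle"})
print(table.reconcile())  # ['agent:main:main']
```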

Environment

  • OpenClaw 2026.3.24 (cff6dc9)
  • macOS ARM64 (Mac Studio)
  • Node v25.8.1
  • Model: anthropic/claude-opus-4-6 (1M context)

Analysis

TL;DR

Sending a /new message via sessions.send RPC can clear the stale "running" state of a session lane.

Guidance

  • Verify that the lane is actually stuck: the session status is "running" and sessions.abort returns {"status": "no-active-run"}.
  • Use the sessions.send RPC with a /new message to clear the stale state, as shown in the provided workaround command.
  • Consider implementing a watchdog script to automatically detect and recover stuck sessions, similar to the one deployed in the issue description.
  • Check the gateway logs for "lane wait exceeded" warnings; a large waitedMs on a session lane means messages are queued behind a stalled run.

Example

openclaw gateway call sessions.send \
  --params '{"key":"agent:main:main","message":"/new"}' \
  --timeout 15000

Notes

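The workaround treats the symptom, not the cause. The stale "running" state still survives sessions.abort and gateway restarts, so lanes can jam again. The watchdog limits the damage rather than preventing it: with a 20-minute threshold and a 10-minute poll, a user on an affected lane may wait up to about 30 minutes before the lane is reset.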

Recommendation

Apply the workaround by sending /new to the affected session key via the sessions.send RPC; the reporter found that this clears the stale state. Run it manually for an individual lane, or from a watchdog script to catch lanes that jam unattended.
