openclaw - 💡(How to fix) Fix Background tasks stay 'running' forever after gateway crash/restart, blocking channel reload + saturating CPU [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#75307 (fetched 2026-05-01 05:35:27)
Comments: 1 · Participants: 2 · Timeline: 2 · Reactions: 2
Timeline (top): closed ×1, commented ×1

When the gateway process crashes or is restarted while background task runs (cron, subagent, cli) are in-flight, the task records remain in 'running' status indefinitely. They block channel reload, accumulate across restarts, and eventually saturate the event loop.

Root Cause

Task-run records are not reconciled when the gateway starts. A run whose owning process or session died with the previous gateway instance is never moved out of 'running', so it keeps counting as an active task run: it defers channel reload, stale records accumulate across restarts, and the gateway eventually saturates the event loop.
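The issue does not show how openclaw decides that task runs are still "active", so the following TypeScript is only a sketch of the failure mode under assumed names (TaskRun, ownerBootId, canReloadNaive and canReload are all hypothetical). It contrasts a reload gate that waits for every record marked 'running' with one that only waits for runs owned by the current gateway instance, since those are the only runs that can still finish.

```typescript
// Hypothetical record shape: openclaw's real task store is not described in the issue.
type TaskStatus = "running" | "succeeded" | "failed" | "lost" | "interrupted";

interface TaskRun {
  id: string;
  kind: "cron" | "subagent" | "cli";
  status: TaskStatus;
  ownerBootId: string; // identifies the gateway instance that started the run (assumed)
}

// Naive gate: any record still marked 'running' defers the reload. A record orphaned
// by a crash never leaves 'running', so this stays false until the record is fixed.
function canReloadNaive(runs: TaskRun[]): boolean {
  return runs.every((r) => r.status !== "running");
}

// Scoped gate: only runs started by the current gateway instance can still finish,
// so only those defer the reload.
function canReload(runs: TaskRun[], currentBootId: string): boolean {
  return runs.every((r) => r.status !== "running" || r.ownerBootId !== currentBootId);
}

// Example: "a" was in flight when boot-1 crashed; the gateway is now running as boot-2.
const runs: TaskRun[] = [
  { id: "a", kind: "cron", status: "running", ownerBootId: "boot-1" },
  { id: "b", kind: "subagent", status: "succeeded", ownerBootId: "boot-2" },
];
// canReloadNaive(runs) is false (blocked by the orphan); canReload(runs, "boot-2") is true
```

Scoping the gate by instance only hides orphaned records from the reload check; they still need to be moved to a terminal status so they stop accumulating across restarts.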

Fix Action

Workaround

Manual openclaw gateway restart (full process replace, not SIGUSR1).

Repro

  1. Trigger an isolated agentTurn cron job with a long timeout (e.g. an Opus-backed nightly pass).
  2. While it is running, restart the gateway (openclaw gateway restart) or let it crash.
  3. After the restart, openclaw tasks still shows the run in 'running' state, and it never leaves that state.

Observed in production

  • Gateway PID 2087813, ~3h45min uptime
  • 88.7% CPU sustained on a 1-core VPS
  • 32 task runs stuck in 'running' (most of them for hours)
  • Channel reload deferred for 11,111,502ms (~3h), with the 'still deferred' diagnostic logged every ~30s
  • User-facing session stuck in 'processing' for 200-328s before the model started
  • A SIGUSR1 hot reload did not clear the stuck runs; only openclaw gateway restart (full process replace) recovered the gateway, and CPU dropped from 88.7% to 5%

Logs (excerpt)

[reload] channel reload still deferred after 11079287ms with 4 operation(s), 1 reply(ies), 2 embedded run(s), 32 task run(s) active
[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu eventLoopDelayP99Ms=5259.7 eventLoopUtilization=1 cpuCoreRatio=0.993
[diagnostic] stuck session: sessionId=... state=processing age=328s queueDepth=1

Expected

On gateway boot: any background task run whose owning process/session is no longer alive should auto-reconcile to 'lost' (or 'interrupted') status. Channel reload should not be blocked indefinitely by stale task slots.
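A minimal boot-time reconciliation pass for this expected behavior might look like the sketch below. It assumes, hypothetically, that each persisted run records the PID of the process that owns it and that the caller supplies an isOwnerAlive predicate; none of these names come from openclaw. The pass only touches runs still in 'running' and moves those without a live owner to 'lost', so running it on every boot is harmless.

```typescript
// Hypothetical record shape: openclaw's real task store is not described in the issue.
type TaskStatus = "running" | "succeeded" | "failed" | "lost";

interface TaskRun {
  id: string;
  status: TaskStatus;
  ownerPid: number;      // process that owns the run (assumed field)
  endedAt?: number;      // ms timestamp set when the run reaches a terminal status
  statusReason?: string;
}

// Run once during gateway boot, before channels are loaded. Only 'running' records
// are inspected, so finished runs and already-reconciled runs are left unchanged.
function reconcileOnBoot(
  runs: TaskRun[],
  isOwnerAlive: (run: TaskRun) => boolean,
  now: number,
): { runs: TaskRun[]; reconciled: string[] } {
  const reconciled: string[] = [];
  const updated = runs.map((run): TaskRun => {
    if (run.status !== "running" || isOwnerAlive(run)) return run;
    reconciled.push(run.id);
    return { ...run, status: "lost", endedAt: now, statusReason: "owner not alive at gateway boot" };
  });
  return { runs: updated, reconciled };
}

// Example: the cron run's owner died with the old gateway; the CLI child is still alive.
const livePids = new Set<number>([4242]);
const result = reconcileOnBoot(
  [
    { id: "cron-nightly", status: "running", ownerPid: 1001 },
    { id: "cli-1", status: "running", ownerPid: 4242 },
    { id: "old-run", status: "succeeded", ownerPid: 999 },
  ],
  (run) => livePids.has(run.ownerPid),
  1700000000000,
);
// result.reconciled is ["cron-nightly"]; "cli-1" and "old-run" are unchanged
```

In Node, a common liveness probe is process.kill(pid, 0), which sends no signal and throws an error with code ESRCH when no such process exists (EPERM means it exists but cannot be signalled). PIDs can be reused, so pairing the PID with a per-boot instance ID is more robust than the PID alone; runs that lived in-process in the previous gateway can be treated as dead outright, since that process is gone.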

Version

2026.4.27 (cbc2ba0)

extent analysis

TL;DR

Reconcile background task runs at gateway boot: any run left in 'running' by a crashed or restarted gateway should be moved to 'lost' or 'interrupted' status.

Guidance

  • Investigate the current task management system to identify why task records remain in 'running' status after the gateway process crashes or restarts.
  • Implement a check during gateway boot to identify and reconcile task runs whose owning process/session is no longer alive.
  • Consider adding a timeout or heartbeat mechanism to detect and handle stuck or inactive task runs.
  • Review the channel reload logic so that only genuinely live task runs defer a reload; runs in 'lost' or 'interrupted' status must never block it, and the deferral should not be able to wait indefinitely.
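The heartbeat idea from the guidance can be sketched as a lease sweep: each running task periodically refreshes a timestamp, and a sweeper marks runs whose last heartbeat is older than a TTL as 'lost'. LeasedRun, lastHeartbeatAt and sweepStaleRuns are hypothetical names, and the TTL value is illustrative. Unlike a boot-time pass, this also catches runs that hang while the gateway stays up, which a SIGUSR1 hot reload reportedly does not clear today.

```typescript
// Hypothetical record shape: openclaw's real task store is not described in the issue.
interface LeasedRun {
  id: string;
  status: "running" | "succeeded" | "failed" | "lost";
  lastHeartbeatAt: number; // ms timestamp, refreshed by the worker while the run is alive
}

// Marks 'running' runs whose heartbeat is older than ttlMs as 'lost' (mutates in place)
// and returns the ids that were changed, e.g. for logging.
function sweepStaleRuns(runs: LeasedRun[], now: number, ttlMs: number): string[] {
  const swept: string[] = [];
  for (const run of runs) {
    if (run.status === "running" && now - run.lastHeartbeatAt > ttlMs) {
      run.status = "lost";
      swept.push(run.id);
    }
  }
  return swept;
}

// Example: with a 60 s TTL, only the run silent for 120 s is swept.
const now = 1000000;
const runs: LeasedRun[] = [
  { id: "stale", status: "running", lastHeartbeatAt: now - 120000 },
  { id: "fresh", status: "running", lastHeartbeatAt: now - 5000 },
  { id: "done", status: "succeeded", lastHeartbeatAt: now - 600000 },
];
const swept = sweepStaleRuns(runs, now, 60000);
// swept is ["stale"]; "fresh" stays 'running' and "done" stays 'succeeded'
```

In the gateway this would run once at boot and then on a timer (in Node, an interval whose handle is unref()'d so it does not keep the process alive). The heartbeat interval should be well below ttlMs, otherwise a slow but healthy run could be swept.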

Example

The issue does not describe openclaw's task store or channel reload internals, so no exact patch can be derived from it; any implementation has to be adapted to the real data model.

Notes

The provided workaround of manual openclaw gateway restart suggests that a full process replace is required to recover from the issue, implying that the current implementation does not handle task run reconciliation properly.

Recommendation

Use the manual full restart only as a stopgap, and fix the root cause by adding a task-run reconciliation step during gateway boot, since the current implementation does not update task-run status after a crash or restart.
