openclaw - 💡(How to fix) Fix Background tasks stay 'running' forever after gateway crash/restart, blocking channel reload + saturating CPU [1 comments, 2 participants]

openclaw · 2026-04-30 23:39:06


GitHub stats

openclaw/openclaw#75307 • Fetched 2026-05-01 05:35:27


Author

xonaman

Participants

clawsweeper[bot]

xonaman

Timeline (top)

closed ×1, commented ×1

When the gateway process crashes or is restarted while background task runs (cron, subagent, cli) are in-flight, the task records remain in 'running' status indefinitely. They block channel reload, accumulate across restarts, and eventually saturate the event loop.




Repro

Trigger an isolated agentTurn cron with a long timeout (e.g. an opus-backed nightly pass).
While it is running, restart the gateway (openclaw gateway restart) or let it crash.
After the restart, openclaw tasks shows the run still in 'running' state, forever.

Observed in production

Gateway PID 2087813, ~3h45min uptime
88.7% CPU sustained on a 1-core VPS
32 task runs stuck in 'running' (most for hours)
Channel reload deferred 11,111,502ms (~3h), with a 'still deferred' diagnostic logged every ~30s
User-facing session 'stuck processing' for 200-328s before the model started
A SIGUSR1 hot reload didn't clear them; only openclaw gateway restart (full process replace) recovered the gateway, and CPU dropped from 88.7% to 5%.

Logs (excerpt)

[reload] channel reload still deferred after 11079287ms with 4 operation(s), 1 reply(ies), 2 embedded run(s), 32 task run(s) active
[diagnostic] liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu eventLoopDelayP99Ms=5259.7 eventLoopUtilization=1 cpuCoreRatio=0.993
[diagnostic] stuck session: sessionId=... state=processing age=328s queueDepth=1
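Because the reload diagnostic has a fixed shape, an operator could watch for it and apply the workaround before the event loop saturates. The TypeScript below is a minimal sketch that assumes the log line format in the excerpt above stays stable (the issue does not say it is a stable interface); `parseDeferredReload`, `shouldAlert`, and the threshold are hypothetical names, not openclaw APIs.

```typescript
// Hypothetical watchdog helper, not part of openclaw: pulls the counters out of
// the "[reload] channel reload still deferred ..." diagnostic line.
interface DeferredReload {
  deferredMs: number;
  operations: number;
  replies: number;
  embeddedRuns: number;
  taskRuns: number;
}

// Assumes the exact wording seen in the issue's log excerpt.
const DEFERRED_RE =
  /channel reload still deferred after (\d+)ms with (\d+) operation\(s\), (\d+) reply\(ies\), (\d+) embedded run\(s\), (\d+) task run\(s\) active/;

function parseDeferredReload(line: string): DeferredReload | null {
  const m = DEFERRED_RE.exec(line);
  if (m === null) return null; // not a deferred-reload line
  return {
    deferredMs: Number(m[1]),
    operations: Number(m[2]),
    replies: Number(m[3]),
    embeddedRuns: Number(m[4]),
    taskRuns: Number(m[5]),
  };
}

// Alert when task runs keep a reload deferred for longer than maxDeferMs.
function shouldAlert(r: DeferredReload, maxDeferMs: number): boolean {
  return r.taskRuns > 0 && r.deferredMs > maxDeferMs;
}

const sample =
  "[reload] channel reload still deferred after 11079287ms with 4 operation(s), 1 reply(ies), 2 embedded run(s), 32 task run(s) active";
const parsed = parseDeferredReload(sample);
if (parsed !== null && shouldAlert(parsed, 10 * 60 * 1000)) {
  console.log(`reload blocked for ${parsed.deferredMs}ms by ${parsed.taskRuns} task run(s)`);
}
```

The 10-minute threshold is an arbitrary example value.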

Expected

On gateway boot: any background task run whose owning process/session is no longer alive should auto-reconcile to 'lost' (or 'interrupted') status. Channel reload should not be blocked indefinitely by stale task slots.
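A minimal TypeScript sketch of that boot-time reconciliation follows. It assumes each task record stores an owner identifier and that the gateway can tell whether that owner is still alive; the `TaskRun` shape, `ownerId`, and `reconcileOnBoot` are hypothetical, since the issue does not describe openclaw's task schema.

```typescript
// Hypothetical status set; the issue names 'running' and suggests 'lost' or 'interrupted'.
type TaskStatus = "running" | "succeeded" | "failed" | "lost" | "interrupted";

// Hypothetical task record; openclaw's real schema is not shown in the issue.
interface TaskRun {
  id: string;
  kind: "cron" | "subagent" | "cli";
  status: TaskStatus;
  ownerId: string; // id of the process/session that started the run (assumed field)
  endedAt?: number; // epoch ms when the run reached a terminal state
}

// Call once during gateway boot, before channel reload starts waiting on tasks.
// Every run still marked "running" whose owner is gone is moved to "lost".
// Records are updated in place; the reconciled runs are returned for logging.
function reconcileOnBoot(
  runs: TaskRun[],
  isOwnerAlive: (ownerId: string) => boolean,
  now: number = Date.now(),
): TaskRun[] {
  const reconciled: TaskRun[] = [];
  for (const run of runs) {
    if (run.status === "running" && !isOwnerAlive(run.ownerId)) {
      run.status = "lost";
      run.endedAt = now;
      reconciled.push(run);
    }
  }
  return reconciled;
}
```

One simple liveness predicate, again an assumption, is to stamp each run with the id of the gateway boot that started it and treat any other boot id as dead. That holds only if runs execute inside the gateway process; if cli or subagent runs live in separate processes that can outlive the gateway, the predicate would need a real process or session check instead.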

Workaround

Manual openclaw gateway restart (full process replace, not SIGUSR1).

Version

2026.4.27 (cbc2ba0)

extent analysis

TL;DR

Implement a mechanism that automatically reconciles background task runs left in 'running' state to 'lost' or 'interrupted' when the gateway process restarts or crashes.

Guidance

Investigate the current task management system to identify why task records remain in 'running' status after the gateway process crashes or restarts.
Implement a check during gateway boot to identify and reconcile task runs whose owning process/session is no longer alive.
Consider adding a heartbeat or timeout so that stuck or inactive task runs are detected and reconciled even while the gateway stays up.
Review the channel reload logic so that task runs in 'lost' or 'interrupted' status no longer count as active and cannot block reload indefinitely.
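The heartbeat idea from the third point could look like the TypeScript sketch below. It assumes the worker executing a task periodically updates a `lastHeartbeatAt` timestamp; `HeartbeatTask`, `sweepStaleTasks`, and the use of 'interrupted' (rather than 'lost') for timed-out runs are assumptions for illustration.

```typescript
// Hypothetical record: the worker running the task is assumed to refresh
// lastHeartbeatAt regularly while it makes progress.
interface HeartbeatTask {
  id: string;
  status: "running" | "succeeded" | "failed" | "lost" | "interrupted";
  lastHeartbeatAt: number; // epoch ms of the most recent heartbeat
}

// Periodic sweep: a "running" task whose last heartbeat is older than
// staleAfterMs is marked "interrupted", so it stops counting as active.
// Returns the ids of the tasks it changed.
function sweepStaleTasks(
  tasks: HeartbeatTask[],
  staleAfterMs: number,
  now: number,
): string[] {
  const swept: string[] = [];
  for (const task of tasks) {
    if (task.status === "running" && now - task.lastHeartbeatAt > staleAfterMs) {
      task.status = "interrupted";
      swept.push(task.id);
    }
  }
  return swept;
}
```

In a gateway, the sweep would run on a timer (for example via `setInterval`). `staleAfterMs` has to exceed the longest legitimate gap between heartbeats, so that a long opus-backed cron that is merely slow is not cut off.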

Example

No specific code snippet can be provided without more information about the task management system and channel reload logic.
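Under explicit assumptions, though, the reload side can be sketched. The TypeScript below assumes the reload check can see each task run's status and the time the reload was first deferred; `decideReload` and `maxDeferMs` are hypothetical. It counts only 'running' tasks as blockers and caps the total deferral.

```typescript
type RunStatus = "running" | "succeeded" | "failed" | "lost" | "interrupted";

interface ReloadDecision {
  reload: boolean;
  reason: "idle" | "max-defer-exceeded" | "deferred";
  blockingTasks: number;
}

// Hypothetical gate, evaluated each time a deferred reload is re-checked.
// Only tasks still "running" block; terminal states such as "lost" or
// "interrupted" never do. Once maxDeferMs has elapsed, the reload proceeds.
function decideReload(
  statuses: RunStatus[],
  deferredSinceMs: number,
  nowMs: number,
  maxDeferMs: number,
): ReloadDecision {
  const blockingTasks = statuses.filter((s) => s === "running").length;
  if (blockingTasks === 0) {
    return { reload: true, reason: "idle", blockingTasks };
  }
  if (nowMs - deferredSinceMs >= maxDeferMs) {
    return { reload: true, reason: "max-defer-exceeded", blockingTasks };
  }
  return { reload: false, reason: "deferred", blockingTasks };
}
```

Whether forcing a reload after `maxDeferMs` is safe depends on how openclaw's channel reload treats in-flight work, which the issue does not describe; the cap could instead just raise an alert.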

Notes

That the only workaround is a manual openclaw gateway restart (a full process replace rather than a SIGUSR1 hot reload) suggests the current implementation does not reconcile task run state properly.

Recommendation

Implement task run reconciliation during gateway boot, since the current implementation does not update task run status correctly after a crash or restart. Until a fix lands, use a full openclaw gateway restart (not SIGUSR1) as the workaround.
