openclaw - 💡(How to fix) Fix High host load & slow responses caused by stale openclaw worker process accumulation [3 comments, 4 participants]

GitHub stats
openclaw/openclaw#76171 (fetched 2026-05-03 04:41:22)
Comments: 3 · Participants: 4 · Timeline events: 5 · Reactions: 4
Timeline (top): commented ×3, subscribed ×1, unsubscribed ×1


Description

On 2026.4.29 (and persisting into 2026.4.30-beta.1), response times degrade to 2-3+ minutes per turn due to stale openclaw worker processes accumulating on the host and driving load average to 25-31.

Environment

  • Version: 2026.4.29 / 2026.4.30-beta.1
  • Deployment: Railway (Docker, node:24-bookworm-slim)

Behaviour

Worker processes spawn per cron/agent turn but do not exit after completing. Over time they accumulate:

  • Normal state: 1-2 openclaw workers
  • Degraded state: 40-50+ openclaw workers
  • Load average climbs from ~2 to 25-31
  • Response times go from <5s to 2-3+ minutes per turn

Running a manual cleanup kills the stale workers and temporarily restores normal response times, but they accumulate again within minutes if cron jobs are firing.

# Example snapshot during degraded state
$ cat /proc/loadavg
31.35 26.12 22.83 27/46029 1142

# openclaw worker count
47 stale openclaw worker processes

After cleanup:

$ cat /proc/loadavg
22.18 21.38 25.42 20/45524 498

# openclaw worker count
1
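
The snapshots above can also be captured programmatically. The following is a minimal Python sketch, not part of openclaw, that parses the single line in /proc/loadavg into named fields. The names LoadAvg, parse_loadavg, read_loadavg and is_degraded, and the threshold of 20, are assumptions chosen for illustration only.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one_min: float
    five_min: float
    fifteen_min: float
    runnable: int   # currently runnable scheduling entities
    total: int      # scheduling entities that currently exist (processes + threads)
    last_pid: int   # PID of the most recently created process


def parse_loadavg(text: str) -> LoadAvg:
    """Parse the contents of /proc/loadavg (Linux)."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


def read_loadavg(path: str = "/proc/loadavg") -> LoadAvg:
    """Read and parse the live value; only works on Linux hosts."""
    with open(path) as fh:
        return parse_loadavg(fh.read())


def is_degraded(load: LoadAvg, threshold: float = 20.0) -> bool:
    """Hypothetical alert rule: 1-minute load above an assumed threshold."""
    return load.one_min > threshold
```

Load averages are exponentially damped, so all three figures lag behind a sudden drop in process count. That may be why the post-cleanup snapshot still shows a load above 20 with only one worker left.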

Reproduction

  1. Deploy on Railway with multiple cron jobs firing at a combined rate of >10/hr
  2. Observe the load average climbing over 30-60 minutes
  3. Check the worker process count; it will be 40-50+

Additional context

  • Workers are stale/idle (not actively processing) but don't self-terminate
  • Manually killing workers aged >15min resolves the immediate symptom
  • The issue is worse when isolated agentTurn cron workers spawn frequently
  • A cleanup script (cleanup-zombies.py) running every 10 min helps but doesn't fully prevent accumulation at high cron rates
  • This was first observed and documented on 2026-05-01 and continues on beta.1
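
The contents of cleanup-zombies.py are not shown in the issue. As a rough illustration of the "kill workers aged >15min" workaround described above, a Linux-only sketch might look like the following. Everything here is an assumption: the Proc type, the function names, and above all the "openclaw" command-line marker, which would have to be narrowed so that it matches only workers and never the long-lived main process.

```python
import os
import signal
from dataclasses import dataclass
from typing import Iterable, List

MAX_AGE_SECONDS = 15 * 60   # age threshold reported in the issue
WORKER_MARKER = "openclaw"  # ASSUMPTION: substring that identifies a worker


@dataclass
class Proc:
    pid: int
    cmdline: str
    age_seconds: float


def list_procs() -> List[Proc]:
    """Enumerate processes and their ages by reading /proc (Linux only)."""
    ticks = os.sysconf("SC_CLK_TCK")
    with open("/proc/uptime") as fh:
        uptime = float(fh.read().split()[0])
    procs = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/cmdline", "rb") as fh:
                raw = fh.read().replace(b"\0", b" ")
            with open(f"/proc/{entry}/stat") as fh:
                # Split after the ')' that closes comm; the list then starts
                # at field 3 (state), so field 22 (starttime) is index 19.
                fields = fh.read().rsplit(")", 1)[1].split()
            start_ticks = int(fields[19])
        except (OSError, IndexError, ValueError):
            continue  # process exited or is not readable
        cmdline = raw.decode(errors="replace").strip()
        procs.append(Proc(int(entry), cmdline, uptime - start_ticks / ticks))
    return procs


def select_stale(procs: Iterable[Proc],
                 max_age: float = MAX_AGE_SECONDS,
                 marker: str = WORKER_MARKER,
                 exclude_pids: Iterable[int] = ()) -> List[Proc]:
    """Pure filter: workers matching the marker that are older than max_age."""
    excluded = set(exclude_pids)
    return [p for p in procs
            if marker in p.cmdline
            and p.age_seconds > max_age
            and p.pid not in excluded]


def kill_stale(dry_run: bool = True, exclude_pids: Iterable[int] = ()) -> List[int]:
    """Send SIGTERM to stale workers; with dry_run=True only report them."""
    keep = set(exclude_pids) | {os.getpid()}
    stale = select_stale(list_procs(), exclude_pids=keep)
    for p in stale:
        if not dry_run:
            try:
                os.kill(p.pid, signal.SIGTERM)
            except ProcessLookupError:
                pass  # already gone
    return [p.pid for p in stale]
```

Calling kill_stale() with the default dry_run=True lists candidate PIDs without signalling them, which is a safer first step on a production host. Age alone does not prove a worker is idle, so a long-running but legitimate task could also be selected.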

Expected behaviour

Worker processes should exit cleanly after completing their task.

Extent analysis

TL;DR

Implement a reliable mechanism for openclaw worker processes to self-terminate after completing their tasks to prevent accumulation and load average increase.

Guidance

  • Investigate why openclaw worker processes are not exiting after completing their tasks, focusing on potential issues with task completion signals or process termination logic.
  • Consider implementing a timeout or idle detection mechanism to automatically terminate worker processes after a certain period of inactivity (e.g., 15 minutes).
  • Review the cleanup-zombies.py script to ensure it is effectively removing stale worker processes and consider increasing its frequency or integrating it into the worker process lifecycle.
  • Analyze cron job frequencies and their impact on worker process accumulation, potentially adjusting job schedules to reduce the combined rate.
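
The idle-detection idea in the guidance above can be sketched as a small watchdog inside the worker loop. This is a generic Python illustration of the pattern, not openclaw's worker code (which is not shown in the issue); IdleWatchdog, run_worker and their parameters are hypothetical names.

```python
import time
from typing import Any, Callable, Optional


class IdleWatchdog:
    """Tracks the time of the last activity and reports when the idle limit is hit."""

    def __init__(self, idle_limit_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._limit = idle_limit_s
        self._clock = clock
        self._last_activity = clock()

    def touch(self) -> None:
        """Record that the worker just did useful work."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        """True once the worker has been idle for at least the limit."""
        return self._clock() - self._last_activity >= self._limit


def run_worker(next_task: Callable[[], Optional[Any]],
               handle: Callable[[Any], None],
               watchdog: IdleWatchdog,
               poll_s: float = 1.0,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Handle tasks until the watchdog expires; return the number handled."""
    handled = 0
    while not watchdog.expired():
        task = next_task()
        if task is None:
            sleep(poll_s)  # nothing to do; wait and re-check the idle limit
            continue
        handle(task)
        handled += 1
        watchdog.touch()
    return handled  # the caller would exit the process here
```

The clock and sleep functions are injectable so the loop can be tested without real delays. For a worker that serves exactly one cron/agent turn, the simpler equivalent is to exit as soon as that single task completes.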

Example

The issue does not include implementation details, so no openclaw-specific fix can be shown. A generic approach is to enforce a timeout on each worker, or to run workers under a process manager that terminates idle processes automatically.
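
One generic form of the timeout approach is for whatever spawns a worker to enforce a hard deadline on it. The Python sketch below uses the standard library's subprocess.run with its timeout parameter; run_with_deadline is a hypothetical helper, and how openclaw actually spawns its workers is not described in the issue.

```python
import subprocess
import sys
from typing import List, Optional


def run_with_deadline(cmd: List[str], deadline_s: float) -> Optional[int]:
    """Run cmd and return its exit code, or None if it hit the deadline."""
    try:
        return subprocess.run(cmd, timeout=deadline_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before re-raising,
        # so no stale process is left behind by this call.
        return None


if __name__ == "__main__":
    # Stand-in for a worker that never exits on its own.
    hung = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_deadline(hung, deadline_s=1.0))
```

Only the direct child is killed this way. If a worker starts its own subprocesses, those would need separate handling, for example by running the worker in its own process group.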

Notes

The exact cause of the worker processes not self-terminating is unclear, and resolving this issue may require deeper investigation into the application's logic and potentially its dependencies or the environment in which it's running.

Recommendation

No fixed version is mentioned in the issue, so apply a workaround for now: run cleanup-zombies.py more frequently, or integrate an automatic termination mechanism into the worker process lifecycle.
