openclaw - 💡(How to fix) Fix High host load & slow responses caused by stale openclaw worker process accumulation [3 comments, 4 participants]

openclaw · 2026-05-02 17:12:04

openclaw/openclaw#76171 (fetched 2026-05-03 04:41:22)

On 2026.4.29, and persisting into 2026.4.30-beta.1, response times degrade to 2-3+ minutes per turn. The cause is stale openclaw worker processes that accumulate on the host and drive the load average to 25-31.

Description

Environment

Version: 2026.4.29 / 2026.4.30-beta.1
Deployment: Railway (Docker, node:24-bookworm-slim)

Behaviour

A worker process is spawned for each cron/agent turn, but it does not exit after the turn completes. Over time the workers accumulate:

Normal state: 1-2 openclaw workers
Degraded state: 40-50+ openclaw workers
Load average climbs from ~2 to 25-31
Response times go from <5s to 2-3+ minutes per turn

Running a manual cleanup kills the stale workers and temporarily restores normal response times, but they accumulate again within minutes if cron jobs are firing.

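```shell
# Example snapshot during degraded state
$ cat /proc/loadavg
31.35 26.12 22.83 27/46029 1142

# openclaw worker count
47 stale openclaw worker processes
```

After cleanup (the load averages are moving averages over 1, 5 and 15 minutes, so they fall gradually, while the worker count drops at once):

```shell
$ cat /proc/loadavg
22.18 21.38 25.42 20/45524 498

# openclaw worker count
1
```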
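The issue does not say how the worker count was obtained. Below is a minimal sketch for taking the same two measurements on a Linux host. It assumes that worker processes can be recognised by the substring `openclaw` in their command line; the real process name is not stated in the issue. The helper names are hypothetical. The sketch counts every matching process, so compare the result against the normal baseline of 1-2.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_loadavg(text: str) -> tuple[float, float, float]:
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg content."""
    one, five, fifteen = (float(field) for field in text.split()[:3])
    return one, five, fifteen


def process_cmdlines(proc: Path = Path("/proc")) -> dict[int, str]:
    """Map each PID to its command line, with NUL separators turned into spaces."""
    cmdlines = {}
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            raw = (entry / "cmdline").read_bytes()
        except OSError:  # the process exited during the scan, or is not readable
            continue
        if raw:  # kernel threads (and zombies) have an empty cmdline
            cmdlines[int(entry.name)] = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    return cmdlines


def count_workers(cmdlines: dict[int, str], needle: str = "openclaw",
                  exclude_pid: int | None = None) -> int:
    """Count processes whose command line contains `needle` (assumed worker marker)."""
    return sum(1 for pid, cmd in cmdlines.items() if needle in cmd and pid != exclude_pid)


if __name__ == "__main__" and Path("/proc/loadavg").exists():
    print("load average:", parse_loadavg(Path("/proc/loadavg").read_text()))
    print("openclaw worker count:", count_workers(process_cmdlines(), exclude_pid=os.getpid()))
```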

Reproduction

1. Deploy on Railway with multiple cron jobs firing at a combined rate of more than 10 per hour.
2. Observe the load average climbing over 30-60 minutes.
3. Check the worker process count; it reaches 40-50+.

Additional context

Stale workers are idle (not actively processing) but do not self-terminate.
Manually killing workers older than 15 minutes resolves the immediate symptom.
The issue is worse when isolated agentTurn cron workers spawn frequently.
A cleanup script (cleanup-zombies.py) run every 10 minutes helps, but it does not fully prevent accumulation at high cron rates.
The issue was first observed and documented on 2026-05-01 and persists on 2026.4.30-beta.1.
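Below is a minimal sketch of the age-based cleanup described above. It is not the author's cleanup-zombies.py, whose contents are not shown. The sketch makes several assumptions: a Linux `/proc`, worker processes that carry `openclaw` in their command line, and the 15-minute threshold from the issue. All function names are hypothetical, and the sketch defaults to a dry run.

Age is only a proxy for staleness, so a legitimate turn that runs longer than 15 minutes would also be selected. Despite the script's name, idle workers are not zombies in the Unix sense. A true zombie (state `Z`) ignores signals and disappears only when its parent reaps it, so the sketch skips those.

```python
from __future__ import annotations

import os
import signal
from pathlib import Path

MAX_AGE_SECONDS = 15 * 60  # from the issue: killing workers aged >15 min helps
NEEDLE = "openclaw"        # assumption: how worker processes are recognised


def parse_stat(stat_text: str) -> tuple[str, int]:
    """Return (state, starttime in clock ticks) from /proc/<pid>/stat content.

    Field 2 (comm) is parenthesised and may contain spaces or ')', so split
    after the last ')'. The remaining fields start at field 3 (state);
    starttime is field 22, i.e. index 19 of that remainder.
    """
    rest = stat_text.rsplit(")", 1)[1].split()
    return rest[0], int(rest[19])


def age_seconds(start_ticks: int, uptime_s: float, clk_tck: int) -> float:
    """Seconds elapsed since a process started, given the system uptime."""
    return uptime_s - start_ticks / clk_tck


def find_stale(proc: Path = Path("/proc"), max_age: float = MAX_AGE_SECONDS) -> list[int]:
    """Return PIDs of live processes matching NEEDLE that are older than max_age."""
    uptime = float((proc / "uptime").read_text().split()[0])
    clk_tck = os.sysconf("SC_CLK_TCK")
    stale = []
    for entry in proc.iterdir():
        if not entry.name.isdigit() or int(entry.name) == os.getpid():
            continue
        try:
            cmdline = (entry / "cmdline").read_bytes().replace(b"\0", b" ").decode(errors="replace")
            state, start = parse_stat((entry / "stat").read_text())
        except (OSError, ValueError, IndexError):
            continue  # process exited mid-scan or its stat could not be parsed
        if NEEDLE in cmdline and state != "Z" and age_seconds(start, uptime, clk_tck) > max_age:
            stale.append(int(entry.name))
    return stale


def terminate(pids: list[int], dry_run: bool = True) -> None:
    """Send SIGTERM to each PID, or only report it when dry_run is True."""
    for pid in pids:
        if dry_run:
            print(f"would send SIGTERM to {pid}")
            continue
        try:
            os.kill(pid, signal.SIGTERM)
        except ProcessLookupError:
            pass  # the process exited between the scan and the kill


if __name__ == "__main__" and Path("/proc/uptime").exists():
    terminate(find_stale(), dry_run=True)
```

SIGTERM is used rather than SIGKILL so that a worker with a signal handler gets a chance to shut down cleanly.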

Expected behaviour

Worker processes should exit cleanly after completing their task.

extent analysis

TL;DR

Make openclaw worker processes reliably self-terminate after completing their tasks, so they cannot accumulate and drive up the load average.

Guidance

Investigate why openclaw worker processes are not exiting after completing their tasks, focusing on potential issues with task completion signals or process termination logic.
Consider implementing a timeout or idle detection mechanism to automatically terminate worker processes after a certain period of inactivity (e.g., 15 minutes).
Review the cleanup-zombies.py script to ensure it is effectively removing stale worker processes and consider increasing its frequency or integrating it into the worker process lifecycle.
Analyze cron job frequencies and their impact on worker process accumulation, potentially adjusting job schedules to reduce the combined rate.
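The idle-detection idea above can be sketched as a worker loop that returns once no task has arrived for a fixed period. The sketch is in Python for illustration only. The deployment image is `node:24-bookworm-slim`, so the real workers are presumably Node.js processes. The queue-based task delivery, the `None` shutdown sentinel and the function name are all assumptions, not openclaw's actual design.

```python
import queue
from typing import Any, Callable

IDLE_TIMEOUT_SECONDS = 15 * 60  # example value from the guidance above


def run_worker(tasks: "queue.Queue[Any]", handle: Callable[[Any], None],
               idle_timeout: float = IDLE_TIMEOUT_SECONDS) -> int:
    """Handle tasks until the queue stays empty for idle_timeout seconds.

    Returns the number of tasks handled. A None task is treated as an explicit
    shutdown request. Returning (instead of blocking forever) lets the process
    reach the end of its entry point and exit.
    """
    handled = 0
    while True:
        try:
            task = tasks.get(timeout=idle_timeout)
        except queue.Empty:
            return handled  # idle for too long: stop instead of lingering
        if task is None:
            return handled
        handle(task)
        handled += 1


if __name__ == "__main__":
    q: "queue.Queue[Any]" = queue.Queue()
    for turn in ("turn-1", "turn-2"):
        q.put(turn)
    n = run_worker(q, lambda t: print("handled", t), idle_timeout=0.5)
    print(f"no work for 0.5 s after {n} tasks; worker exits")
```

The key design choice is that the blocking wait for work always has a deadline, so an idle worker cannot wait indefinitely.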

Example

The issue gives no implementation details, so only a generic approach can be suggested: enforce a timeout on each worker, or run workers under a process manager that terminates idle processes automatically.
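A timeout can also be enforced on the parent side. In that approach, whatever launches a worker for each turn imposes a hard time limit and always waits on the child. The sketch below assumes each turn runs as a separate child process started by a supervisor you control. The issue does not describe how openclaw actually spawns workers, so treat this as an illustration; `run_turn` is a hypothetical name.

```python
from __future__ import annotations

import os
import signal
import subprocess
import sys


def run_turn(cmd: list[str], timeout_s: float) -> int | None:
    """Run one worker turn as a child process with a hard time limit.

    Returns the child's exit code, or None if it was killed for running past
    timeout_s. The child is started in its own session, and so its own process
    group, so on timeout any grandchildren that remain in that group are killed
    with it. Waiting on the child also reaps it, so no zombie entry is left behind.
    """
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # the group ID equals the child's PID
        except ProcessLookupError:
            pass  # the group is already gone
        proc.wait()  # reap the killed child
        return None


if __name__ == "__main__":
    print(run_turn([sys.executable, "-c", "print('turn done')"], timeout_s=5))
```

A hard limit complements the idle timeout rather than replacing it: it catches workers that hang mid-turn, whereas the idle timeout catches workers that finish but never exit.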

Notes

The exact cause of the worker processes not self-terminating is unclear, and resolving this issue may require deeper investigation into the application's logic and potentially its dependencies or the environment in which it's running.

Recommendation

As a workaround, run cleanup-zombies.py more frequently, or build an automatic termination mechanism into the worker process lifecycle. The issue does not mention a fixed release to upgrade to.