openclaw - 💡(How to fix) Fix [Bug]: Gateway becomes CPU-saturated and agents stop progressing mid-work [1 comments, 2 participants]

openclaw2026-04-29 14:38:34

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#74404•Fetched 2026-04-30 06:24:17

View on GitHub

Comments

Participants

Timeline

Reactions

Author

najef1979-code

Participants

clawsweeper[bot]

najef1979-code

Timeline (top)

mentioned ×2subscribed ×2commented ×1labeled ×1

Issues started when upgrading to 2026.4.23 and up, with last version 2026.4.25. Every version got worse, higher cpu and slower startup and agents slow response:

OpenClaw Gateway becomes CPU-saturated and agents stop progressing mid-work until I send another message. During the bad state, simple HTTP health endpoints can respond, but Gateway WS/RPC commands time out or close during connect.

Error Message

"error": "cron: job execution timed out"

Root Cause

Notes

I have not confirmed the root cause. The strongest evidence is high CPU across many libuv-worker threads plus repeated stuck-session diagnostics and WS/RPC failures.

Code Example

## Evidence

Process list while CPU was high:
text
openclaw-gateway 504% CPU
Per-thread CPU showed many hot `libuv-worker` threads:
text
libuv-worker 78.4%
libuv-worker 63.9%
libuv-worker 46.5%

ibuv-worker 45.1%
...
openclaw-gateway main thread ~21%
`strace -f -p <gateway-pid> -c` showed activity including:
text
epoll_pwait
statx
futex
write/read
Path-level strace showed repeated package/plugin/cron-state paths:
text

/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/package.json = ENOENT
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/package.json
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/extensions
/home/najef/.openclaw/cron/jobs-state.json
Logs repeatedly showed a stuck heartbeat session:
text
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6733s queueDepth=0
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6763s queueDepth=0
...
A cron task also timed out while this was happening:
json

{
  "runtime": "cron",
  "label": "Orion Heartbeat",
  "agentId": "neon",
  "childSessionKey": "agent:neon:telegram:direct:2101884310",
  "status": "timed_out",
  "error": "cron: job execution timed out"
}
## Expected behavior

Gateway should recover or fail stuck heartbeat/session work, not remain in `processing` for hours, and WS/RPC should not become unusable while HTTP health endpoints still respond.

## Notes
I have not confirmed the root cause. The strongest evidence is high CPU across many `libuv-worker` threads plus repeated stuck-session diagnostics and WS/RPC failures.

RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Issues started when upgrading to 2026.4.23 and up, with last version 2026.4.25. Every version got worse, higher cpu and slower startup and agents slow response:

Environment

OpenClaw: 2026.4.26 (be8c246)
OS: Ubuntu 24.04.4 LTS
Node: v24.14.1
Install: npm global via nvm
Gateway bind: loopback, port 18789
systemd: not installed

Symptoms

Agents stop mid-work and only continue after I send another message.
openclaw-gateway reaches very high CPU:
- observed 322% CPU
- later observed 504% CPU
Gateway RPC becomes unreliable:
- openclaw gateway stability --json --limit 100 failed with gateway timeout after 10000ms
- openclaw cron show ... failed with gateway closed (1000 normal closure): no close reason
- openclaw logs --plain ... failed with gateway closed (1000)
HTTP liveness still worked:
- curl http://127.0.0.1:18789/healthz → {"ok":true,"status":"live"}
- curl http://[::1]:18789/healthz → {"ok":true,"status":"live"}
- /readyz returned {"ready":true,"failing":[]} at least once

Steps to reproduce

use 2026.4.23/24/25/26 or check online people complaining.

Expected behavior

in 2026.4.22 agents still respond quickly. Gateway uses max 10% at peak requests.

Actual behavior

CPU goes to 100% after a few minutes after gateway has started. Agents respond but slower and slower until they also stop during a session.

OpenClaw version

2026.4.23/24/25/26

Operating system

Ubuntu 24.04.4 LTS

Install method

npm global via nvm

Model

Minimax/Minimax m2.7

Provider / routing chain

Minimax/Minimax m2.7

Additional provider/model setup details

No response

Logs, screenshots, and evidence

## Evidence

Process list while CPU was high:
text
openclaw-gateway 504% CPU
Per-thread CPU showed many hot `libuv-worker` threads:
text
libuv-worker 78.4%
libuv-worker 63.9%
libuv-worker 46.5%

ibuv-worker 45.1%
...
openclaw-gateway main thread ~21%
`strace -f -p <gateway-pid> -c` showed activity including:
text
epoll_pwait
statx
futex
write/read
Path-level strace showed repeated package/plugin/cron-state paths:
text

/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/package.json = ENOENT
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/package.json
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/extensions
/home/najef/.openclaw/cron/jobs-state.json
Logs repeatedly showed a stuck heartbeat session:
text
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6733s queueDepth=0
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6763s queueDepth=0
...
A cron task also timed out while this was happening:
json

{
  "runtime": "cron",
  "label": "Orion Heartbeat",
  "agentId": "neon",
  "childSessionKey": "agent:neon:telegram:direct:2101884310",
  "status": "timed_out",
  "error": "cron: job execution timed out"
}
## Expected behavior

Gateway should recover or fail stuck heartbeat/session work, not remain in `processing` for hours, and WS/RPC should not become unusable while HTTP health endpoints still respond.

## Notes
I have not confirmed the root cause. The strongest evidence is high CPU across many `libuv-worker` threads plus repeated stuck-session diagnostics and WS/RPC failures.

Impact and severity

Newer versions of OpenClaw are useless for me, they hang, and agents become non responsive.

Additional information

No response

extent analysis

TL;DR

Downgrade to OpenClaw version 2026.4.22 to potentially resolve the CPU saturation and agent responsiveness issues.

Guidance

Investigate the libuv-worker threads to understand what is causing the high CPU usage, as they seem to be the primary contributors to the issue.
Review the cron task timeouts and stuck session diagnostics to see if there's a correlation between these events and the CPU saturation.
Consider monitoring the system resources and OpenClaw performance after downgrading to version 2026.4.22 to verify if the issue is resolved.
Look into the changes made in versions 2026.4.23 and later to identify potential causes of the regression.

Example

No specific code snippet can be provided without further information on the changes made in the newer versions of OpenClaw.

Notes

The root cause of the issue is still unknown, and downgrading to version 2026.4.22 is only a potential workaround. Further investigation is needed to identify the underlying cause and develop a permanent fix.

Recommendation

Apply the workaround of downgrading to OpenClaw version 2026.4.22, as it is the last known stable version that does not exhibit the CPU saturation and agent responsiveness issues.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

Gateway should recover or fail stuck heartbeat/session work, not remain in processing for hours, and WS/RPC should not become unusable while HTTP health endpoints still respond.

#ssr #cache issue #memory leak #API versioning #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - 💡(How to fix) Fix [Bug]: Gateway becomes CPU-saturated and agents stop progressing mid-work [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Notes

Code Example

Bug type

Beta release blocker

Summary

Environment

Symptoms

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING