openclaw - 💡(How to fix) Fix [Bug]: Gateway becomes CPU-saturated and agents stop progressing mid-work [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#74404Fetched 2026-04-30 06:24:17
View on GitHub
Comments
1
Participants
2
Timeline
6
Reactions
4
Timeline (top)
mentioned ×2subscribed ×2commented ×1labeled ×1

Issues started when upgrading to 2026.4.23 and up, with last version 2026.4.25. Every version got worse, higher cpu and slower startup and agents slow response:

OpenClaw Gateway becomes CPU-saturated and agents stop progressing mid-work until I send another message. During the bad state, simple HTTP health endpoints can respond, but Gateway WS/RPC commands time out or close during connect.

Error Message

"error": "cron: job execution timed out"

Root Cause

Notes

I have not confirmed the root cause. The strongest evidence is high CPU across many libuv-worker threads plus repeated stuck-session diagnostics and WS/RPC failures.

Code Example

## Evidence

Process list while CPU was high:
text
openclaw-gateway 504% CPU
Per-thread CPU showed many hot `libuv-worker` threads:
text
libuv-worker 78.4%
libuv-worker 63.9%
libuv-worker 46.5%

ibuv-worker 45.1%
...
openclaw-gateway main thread ~21%
`strace -f -p <gateway-pid> -c` showed activity including:
text
epoll_pwait
statx
futex
write/read
Path-level strace showed repeated package/plugin/cron-state paths:
text

/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/package.json = ENOENT
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/package.json
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/extensions
/home/najef/.openclaw/cron/jobs-state.json
Logs repeatedly showed a stuck heartbeat session:
text
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6733s queueDepth=0
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6763s queueDepth=0
...
A cron task also timed out while this was happening:
json

{
  "runtime": "cron",
  "label": "Orion Heartbeat",
  "agentId": "neon",
  "childSessionKey": "agent:neon:telegram:direct:2101884310",
  "status": "timed_out",
  "error": "cron: job execution timed out"
}
## Expected behavior

Gateway should recover or fail stuck heartbeat/session work, not remain in `processing` for hours, and WS/RPC should not become unusable while HTTP health endpoints still respond.

## Notes
I have not confirmed the root cause. The strongest evidence is high CPU across many `libuv-worker` threads plus repeated stuck-session diagnostics and WS/RPC failures.
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Issues started when upgrading to 2026.4.23 and up, with last version 2026.4.25. Every version got worse, higher cpu and slower startup and agents slow response:

OpenClaw Gateway becomes CPU-saturated and agents stop progressing mid-work until I send another message. During the bad state, simple HTTP health endpoints can respond, but Gateway WS/RPC commands time out or close during connect.

Environment

  • OpenClaw: 2026.4.26 (be8c246)
  • OS: Ubuntu 24.04.4 LTS
  • Node: v24.14.1
  • Install: npm global via nvm
  • Gateway bind: loopback, port 18789
  • systemd: not installed

Symptoms

  • Agents stop mid-work and only continue after I send another message.
  • openclaw-gateway reaches very high CPU:
    • observed 322% CPU
    • later observed 504% CPU
  • Gateway RPC becomes unreliable:
    • openclaw gateway stability --json --limit 100 failed with gateway timeout after 10000ms
    • openclaw cron show ... failed with gateway closed (1000 normal closure): no close reason
    • openclaw logs --plain ... failed with gateway closed (1000)
  • HTTP liveness still worked:
    • curl http://127.0.0.1:18789/healthz{"ok":true,"status":"live"}
    • curl http://[::1]:18789/healthz{"ok":true,"status":"live"}
    • /readyz returned {"ready":true,"failing":[]} at least once

Steps to reproduce

use 2026.4.23/24/25/26 or check online people complaining.

Expected behavior

in 2026.4.22 agents still respond quickly. Gateway uses max 10% at peak requests.

Actual behavior

CPU goes to 100% after a few minutes after gateway has started. Agents respond but slower and slower until they also stop during a session.

OpenClaw version

2026.4.23/24/25/26

Operating system

Ubuntu 24.04.4 LTS

Install method

npm global via nvm

Model

Minimax/Minimax m2.7

Provider / routing chain

Minimax/Minimax m2.7

Additional provider/model setup details

No response

Logs, screenshots, and evidence

## Evidence

Process list while CPU was high:
text
openclaw-gateway 504% CPU
Per-thread CPU showed many hot `libuv-worker` threads:
text
libuv-worker 78.4%
libuv-worker 63.9%
libuv-worker 46.5%

ibuv-worker 45.1%
...
openclaw-gateway main thread ~21%
`strace -f -p <gateway-pid> -c` showed activity including:
text
epoll_pwait
statx
futex
write/read
Path-level strace showed repeated package/plugin/cron-state paths:
text

/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/package.json = ENOENT
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/package.json
/home/najef/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/dist/extensions
/home/najef/.openclaw/cron/jobs-state.json
Logs repeatedly showed a stuck heartbeat session:
text
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6733s queueDepth=0
stuck session: sessionId=arnold sessionKey=agent:arnold:main:heartbeat state=processing age=6763s queueDepth=0
...
A cron task also timed out while this was happening:
json

{
  "runtime": "cron",
  "label": "Orion Heartbeat",
  "agentId": "neon",
  "childSessionKey": "agent:neon:telegram:direct:2101884310",
  "status": "timed_out",
  "error": "cron: job execution timed out"
}
## Expected behavior

Gateway should recover or fail stuck heartbeat/session work, not remain in `processing` for hours, and WS/RPC should not become unusable while HTTP health endpoints still respond.

## Notes
I have not confirmed the root cause. The strongest evidence is high CPU across many `libuv-worker` threads plus repeated stuck-session diagnostics and WS/RPC failures.

Impact and severity

Newer versions of OpenClaw are useless for me, they hang, and agents become non responsive.

Additional information

No response

extent analysis

TL;DR

Downgrade to OpenClaw version 2026.4.22 to potentially resolve the CPU saturation and agent responsiveness issues.

Guidance

  • Investigate the libuv-worker threads to understand what is causing the high CPU usage, as they seem to be the primary contributors to the issue.
  • Review the cron task timeouts and stuck session diagnostics to see if there's a correlation between these events and the CPU saturation.
  • Consider monitoring the system resources and OpenClaw performance after downgrading to version 2026.4.22 to verify if the issue is resolved.
  • Look into the changes made in versions 2026.4.23 and later to identify potential causes of the regression.

Example

No specific code snippet can be provided without further information on the changes made in the newer versions of OpenClaw.

Notes

The root cause of the issue is still unknown, and downgrading to version 2026.4.22 is only a potential workaround. Further investigation is needed to identify the underlying cause and develop a permanent fix.

Recommendation

Apply the workaround of downgrading to OpenClaw version 2026.4.22, as it is the last known stable version that does not exhibit the CPU saturation and agent responsiveness issues.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Gateway should recover or fail stuck heartbeat/session work, not remain in processing for hours, and WS/RPC should not become unusable while HTTP health endpoints still respond.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Gateway becomes CPU-saturated and agents stop progressing mid-work [1 comments, 2 participants]