openclaw - 💡(How to fix) Fix [Bug]: Gateway CPU 60-80% on ARM64 Linux, event loop blocked 37s (v2026.4.29) [5 comments, 4 participants]

GitHub stats
openclaw/openclaw#76057 · Fetched 2026-05-03 04:42:49
Comments: 5 · Participants: 4 · Timeline: 12 · Reactions: 3
Timeline (top): commented ×5, mentioned ×4, closed ×1, subscribed ×1

Gateway process consumes 60-80% CPU on ARM64 Linux (aarch64) with v2026.4.29, even with minimal activity. Event loop is blocked up to 37-40 seconds per agent run, causing Web UI disconnections and 2-3 minute unavailability after each conversation.

Related to #75707 and #75688 — same version, similar symptoms, but on ARM64 architecture with different model provider (xiaomi-coding/mimo-v2.5-pro).


[Bug]: Gateway CPU 60-80% on ARM64 Linux, event loop blocked 37s (v2026.4.29)

Bug type

Behavior bug (incorrect output/state without crash)

Summary

Gateway process consumes 60-80% CPU on ARM64 Linux (aarch64) with v2026.4.29, even with minimal activity. Event loop is blocked up to 37-40 seconds per agent run, causing Web UI disconnections and 2-3 minute unavailability after each conversation.

Related to #75707 and #75688 — same version, similar symptoms, but on ARM64 architecture with different model provider (xiaomi-coding/mimo-v2.5-pro).

Environment

  • OS: Ubuntu Linux 6.6.89-Gold_bug (aarch64 / ARM64)
  • Node: v22.22.2
  • OpenClaw: v2026.4.29 (a448042)
  • Hardware: 8-core ARM64, 15GB RAM
  • Agent model: xiaomi-coding/mimo-v2.5-pro
  • Channel: openclaw-weixin
  • Cron jobs: 6 total (1 running every 3 minutes)

Symptom

  1. Gateway process (node) consistently at 60-80% CPU, load average 12-14 on 8 cores
  2. Each agent run's prep phase takes 58-80 seconds, blocking the event loop
  3. Web UI disconnects during agent runs and cannot reconnect for 2-3 minutes
  4. Memory grows from ~650MB to ~1GB over time
  5. Restarting (SIGUSR1 or kill) does not resolve the issue — CPU returns to high levels immediately

Diagnostic Data — Agent Run Prep Breakdown

Every embedded-run shows these consistently slow stages:

core-plugin-tools:  9-13 seconds
bundle-tools:       9-12 seconds
system-prompt:      20-29 seconds  ← largest bottleneck
stream-setup:       19-27 seconds
────────────────────────────────
total prep:         58-80 seconds (blocking event loop)

Diagnostic Data — Event Loop

eventLoopDelayP99Ms: 37,000-40,000ms
eventLoopUtilization: 1.0 (saturated)
cpuCoreRatio: 1.017-1.029
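
Figures like these can be collected independently of the gateway's own diagnostics. The following is a minimal sketch, assuming a plain Node.js process and using only the built-in `node:perf_hooks` module; the names `LoopStats`, `summarize`, and `startLoopMonitor` are illustrative and not part of OpenClaw.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Field names mirror the log fields above; the interface itself is hypothetical.
interface LoopStats {
  eventLoopDelayP99Ms: number;
  eventLoopDelayMaxMs: number;
  eventLoopUtilization: number;
}

// Minimal view of the delay histogram so the conversion can be exercised with a stub.
interface DelayHistogram {
  percentile(p: number): number; // nanoseconds
  max: number; // nanoseconds
}

const NS_PER_MS = 1e6;

// Converts the histogram's nanosecond values to milliseconds.
function summarize(h: DelayHistogram, utilization: number): LoopStats {
  return {
    eventLoopDelayP99Ms: h.percentile(99) / NS_PER_MS,
    eventLoopDelayMaxMs: h.max / NS_PER_MS,
    eventLoopUtilization: utilization,
  };
}

// Starts sampling; the returned function stops sampling and reports the stats.
function startLoopMonitor(resolutionMs = 20): () => LoopStats {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();
  const eluStart = performance.eventLoopUtilization();
  return () => {
    histogram.disable();
    // Passing the earlier reading yields the utilization over the interval.
    const elu = performance.eventLoopUtilization(eluStart);
    return summarize(histogram, elu.utilization);
  };
}
```

A caller would invoke `startLoopMonitor()` before an agent run and call the returned function afterwards. A utilization near 1 together with a multi-second P99 delay would match the saturation reported here.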

Diagnostic Data — I/O

rchar: 54GB (in ~2 hours of uptime)
syscr: 37 million read syscalls
plugin-runtime-deps: 98,619 files, 12,003 directories, 2.4GB
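
The `rchar` and `syscr` counters come from the Linux `/proc/<pid>/io` file, which lists one `name: value` pair per line. As a rough sketch, the helpers below (hypothetical names) parse that text and derive the average bytes per read syscall, which helps distinguish many small reads from a few large ones.

```typescript
// Parses the text of /proc/<pid>/io into a name -> value map.
function parseProcIo(text: string): Map<string, number> {
  const counters = new Map<string, number>();
  for (const line of text.split("\n")) {
    const colon = line.indexOf(":");
    if (colon < 0) continue;
    const name = line.slice(0, colon).trim();
    const rawValue = line.slice(colon + 1).trim();
    if (name.length === 0 || rawValue.length === 0) continue;
    const value = Number(rawValue);
    if (Number.isFinite(value)) counters.set(name, value);
  }
  return counters;
}

// Average bytes returned per read syscall; 0 when no reads were recorded.
function avgBytesPerRead(counters: Map<string, number>): number {
  const rchar = counters.get("rchar") ?? 0;
  const syscr = counters.get("syscr") ?? 0;
  return syscr === 0 ? 0 : rchar / syscr;
}
```

With the figures above, and assuming "54GB" means 54 × 10^9 bytes, 37 million reads works out to roughly 1.5 KB per read. That would be consistent with many small file reads rather than a few large ones, although the report does not break the reads down by path.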

Root Causes Identified

  1. Plugin runtime staging on every agent run — 98K files in ~/.openclaw/plugin-runtime-deps/ are scanned on every embedded-run. core-plugin-tools stage takes 9-13 seconds each time.

  2. system-prompt construction is CPU-bound — Takes 20-29 seconds per run. Likely tokenizing/building tool definitions from 11 skills + 15 plugins on the main thread.

  3. stream-setup blocks event loop — 19-27 seconds per run. Likely re-tokenizing context for the model API call on the main thread.

  4. Cron job amplification — A cron job running every 3 minutes (stock alert monitor) triggered continuous agent runs, each blocking the event loop for 60-80 seconds. After disabling it, CPU dropped from 80% to 65%.

  5. Compaction safeguard: contextTokens was unset (defaulting to 32000), potentially triggering compaction loops. It is now set to 128000.
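
For the first root cause, one way to avoid rescanning on every run is to key the scan result on a hash of the plugin specs. This is only a sketch of the idea: `PluginSpec`, `specHash`, and `ScanCache` are hypothetical names, and how OpenClaw actually represents plugin specs is not shown in this report.

```typescript
import { createHash } from "node:crypto";

// Hypothetical description of one plugin whose dependencies get staged.
interface PluginSpec {
  name: string;
  version: string;
}

// Stable key: sorting makes the hash independent of the order of the specs.
function specHash(specs: PluginSpec[]): string {
  const canonical = specs
    .map((s) => `${s.name}@${s.version}`)
    .sort()
    .join("\n");
  return createHash("sha256").update(canonical).digest("hex");
}

// Memoizes an expensive scan by spec hash, so it reruns only when specs change.
class ScanCache<T> {
  private readonly entries = new Map<string, T>();
  private readonly scan: (specs: PluginSpec[]) => T;

  constructor(scan: (specs: PluginSpec[]) => T) {
    this.scan = scan;
  }

  get(specs: PluginSpec[]): T {
    const key = specHash(specs);
    if (this.entries.has(key)) return this.entries.get(key) as T;
    const fresh = this.scan(specs);
    this.entries.set(key, fresh);
    return fresh;
  }
}
```

The cache here lives in memory only. Persisting the key next to the staged directory would be needed for it to survive a gateway restart, which matters given that restarts did not help in this report.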

Workarounds Applied

Workaround                                                 Effect
Disabled 3-minute cron job                                 CPU 80% → 65%, event loop delay 37s → 24s
Cleaned zombie session files (297 → 72)                    Minor improvement
Set contextTokens: 128000                                  Prevents compaction safeguard loop
Set NODE_OPTIONS=--max-old-space-size=1536 (in systemd)    Not yet picked up by gateway
Old plugin-runtime-deps removed (690MB)                    Minor

After all workarounds: still 62-65% CPU, event loop still blocked during agent runs.
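
Whether the running gateway actually received `NODE_OPTIONS` can be checked from outside the process on Linux, because `/proc/<pid>/environ` holds the environment the process was started with as NUL-separated `KEY=VALUE` entries. The helpers below are a sketch with hypothetical names, and they recognise only the `--max-old-space-size=<n>` spelling.

```typescript
// Parses the NUL-separated KEY=VALUE content of /proc/<pid>/environ.
function parseEnviron(raw: string): Map<string, string> {
  const env = new Map<string, string>();
  for (const entry of raw.split("\0")) {
    const eq = entry.indexOf("=");
    if (eq > 0) env.set(entry.slice(0, eq), entry.slice(eq + 1));
  }
  return env;
}

// Returns the --max-old-space-size value in MB, or undefined when it is absent.
function maxOldSpaceMb(env: Map<string, string>): number | undefined {
  const options = env.get("NODE_OPTIONS") ?? "";
  const match = /--max-old-space-size=(\d+)/.exec(options);
  return match ? Number(match[1]) : undefined;
}
```

The raw text would come from reading `/proc/<gateway-pid>/environ` as the same user. An `undefined` result would support the observation that a gateway restarted by the OpenClaw process manager does not inherit the systemd environment. From inside the process, `v8.getHeapStatistics().heap_size_limit` offers a complementary check.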

Key Difference from #75707

The reporter in #75707 achieved ~15% idle CPU after workarounds. Our case still shows 62-65% CPU. Possible reasons:

  • ARM64 architecture may have slower file I/O for the 98K plugin-runtime-deps
  • The system-prompt and stream-setup stages may have ARM64-specific performance issues
  • NODE_OPTIONS was not picked up (gateway restarted by OpenClaw process manager, not systemd)

Additional Observations

  • plugins.allow whitelist is configured with 15 plugins — unused providers' dependencies (253MB @anthropic-ai, 193MB @zed-industries, 172MB @openai, 118MB @lancedb, etc.) are still installed and scanned
  • plugins.entries.X.enabled: false does not prevent dependency loading (confirmed per #75707)
  • Web UI disconnection is directly correlated with event loop blocking during agent prep
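
To put a number on the first observation, the sketch below splits a dependency inventory by an allowlist and totals what could be skipped. The `DepEntry` shape and the plugin ids are hypothetical; only the scope names and sizes are taken from the list above.

```typescript
// Hypothetical inventory entry: which plugin pulled in which dependency scope.
interface DepEntry {
  plugin: string; // hypothetical plugin id
  scope: string;
  sizeMb: number;
}

// Splits dependencies into those needed by allowed plugins and the rest.
function partitionByAllowlist(deps: DepEntry[], allow: ReadonlySet<string>) {
  const needed: DepEntry[] = [];
  const skippable: DepEntry[] = [];
  for (const dep of deps) {
    (allow.has(dep.plugin) ? needed : skippable).push(dep);
  }
  const skippableMb = skippable.reduce((sum, d) => sum + d.sizeMb, 0);
  return { needed, skippable, skippableMb };
}

// Sizes as reported above; the plugin ids on the left are made up for illustration.
const reportedDeps: DepEntry[] = [
  { plugin: "provider-anthropic", scope: "@anthropic-ai", sizeMb: 253 },
  { plugin: "provider-zed", scope: "@zed-industries", sizeMb: 193 },
  { plugin: "provider-openai", scope: "@openai", sizeMb: 172 },
  { plugin: "store-lancedb", scope: "@lancedb", sizeMb: 118 },
];
```

If none of these four belonged to an allowed plugin, the skippable total would be 736 MB out of the 2.4 GB directory. The "etc." in the report suggests the real figure is higher.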

Suggestions

  1. Cache plugin-runtime-deps — Don't re-scan 98K files on every agent run; cache by spec hash
  2. Move heavy prep off main thread — system-prompt and stream-setup should not block the event loop
  3. ARM64 optimization — File I/O for large node_modules may need ARM64-specific optimization
  4. Respect plugins.allow for dependency loading — Only load dependencies for allowed plugins
  5. NODE_OPTIONS should be configurable in gateway config, not just via systemd env
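
Suggestion 2 could be approached with Node's built-in `node:worker_threads` module. The sketch below is not the gateway's code: the "prompt building" is a trivial stand-in, and `buildPromptOffThread` is a hypothetical name. It only shows the shape of handing CPU-bound work to a worker so the main event loop stays free.

```typescript
import { Worker } from "node:worker_threads";

// Worker body as plain JavaScript; with `eval: true` it runs as a CommonJS script.
// The work is a stand-in: it joins tool descriptions into a single string.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require("node:worker_threads");
  const prompt = workerData.tools.map((t) => "- " + t).join("\\n");
  parentPort.postMessage(prompt);
`;

// Runs the stand-in prompt construction on a worker thread.
function buildPromptOffThread(tools: string[]): Promise<string> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, {
      eval: true,
      workerData: { tools },
    });
    worker.once("message", (prompt: string) => resolve(prompt));
    worker.once("error", reject);
    worker.once("exit", (code) => {
      // A non-zero exit before any message means the worker failed.
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}
```

Starting a worker has its own cost, so a long-lived worker or a small pool would probably suit repeated agent runs better than one worker per call. Data passed through `workerData` is copied, which matters for large contexts.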

Logs

2026-05-02T11:07:12.781Z warn diagnostic liveness warning: reasons=event_loop_delay,event_loop_utilization,cpu interval=43s eventLoopDelayP99Ms=40835.7 eventLoopDelayMaxMs=40835.7 eventLoopUtilization=1 cpuCoreRatio=1.029

2026-05-02T11:00:20.687Z warn agent/embedded [trace:embedded-run] prep stages: phase=stream-ready totalMs=58745 stages=core-plugin-tools:9179ms, bundle-tools:8897ms, system-prompt:20228ms, stream-setup:19680ms
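
The `prep stages` line has a regular shape, so it can be turned into numbers for comparison across runs. The parser below is a sketch based only on the single line shown above; other gateway versions may format the line differently, and the function names are illustrative.

```typescript
interface PrepStages {
  totalMs: number;
  stages: Map<string, number>; // stage name -> duration in ms
}

// Extracts totalMs and the per-stage durations from a "prep stages" log line.
function parsePrepStages(line: string): PrepStages | undefined {
  const totalKey = "totalMs=";
  const stagesKey = "stages=";
  const totalAt = line.indexOf(totalKey);
  const stagesAt = line.indexOf(stagesKey);
  if (totalAt < 0 || stagesAt < 0) return undefined;

  // parseInt stops at the first non-digit, so trailing text is ignored.
  const totalMs = Number.parseInt(line.slice(totalAt + totalKey.length), 10);
  if (!Number.isFinite(totalMs)) return undefined;

  const stages = new Map<string, number>();
  for (const part of line.slice(stagesAt + stagesKey.length).split(",")) {
    const colon = part.lastIndexOf(":");
    if (colon < 0) continue;
    const name = part.slice(0, colon).trim();
    const ms = Number.parseInt(part.slice(colon + 1), 10); // "9179ms" -> 9179
    if (name.length > 0 && Number.isFinite(ms)) stages.set(name, ms);
  }
  return { totalMs, stages };
}

// Returns the name of the slowest stage, or undefined when there are none.
function slowestStage(stages: Map<string, number>): string | undefined {
  let worst: string | undefined;
  let worstMs = -1;
  stages.forEach((ms, name) => {
    if (ms > worstMs) {
      worstMs = ms;
      worst = name;
    }
  });
  return worst;
}
```

For the line above, the four stages sum to 57,984 ms of the 58,745 ms total, so the listed stages account for nearly all of the prep time, and `system-prompt` is the slowest.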

extent analysis

TL;DR

Cache plugin-runtime-deps and move heavy prep stages off the main thread to reduce CPU usage and event loop blocking.

Guidance

  • Identify and implement caching for plugin-runtime-deps to avoid re-scanning 98K files on every agent run, potentially using a spec hash-based cache.
  • Offload CPU-bound stages like system-prompt and stream-setup to a separate thread or process to prevent event loop blocking.
  • Investigate ARM64-specific optimizations for file I/O operations, as the current architecture may be contributing to performance issues.
  • Review and adjust the plugins.allow configuration to ensure only necessary dependencies are loaded, reducing unnecessary overhead.
  • Verify that NODE_OPTIONS are correctly applied and consider making them configurable within the gateway config.

Example

No specific code example is provided, as the issue requires a more architectural and configuration-based solution.

Notes

The provided workarounds have shown some improvement, but further optimization is needed to address the remaining CPU usage and event loop blocking issues. The ARM64 architecture and specific plugin configurations may require tailored solutions.

Recommendation

Apply workaround: Implement caching for plugin-runtime-deps and offload heavy prep stages to reduce CPU usage and event loop blocking, as these changes are likely to have the most significant impact on resolving the issue.
