openclaw - 💡(How to fix) Fix [Bug]: Gateway CPU pinned at 100%: root causes & workarounds (complements #75688) [6 comments, 6 participants]

openclaw/openclaw#75707, fetched 2026-05-02 05:31:20

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Gateway CPU 100-130% idle — root causes identified & workarounds (v2026.4.29)

Related to #75688 — same version, same symptoms (100% CPU from startup, ~724MB RAM, node.list 20s+ latency). This issue provides identified root causes and working mitigations.

Environment

  • OS: Ubuntu Linux (systemd user service)
  • Node: v22.22.1
  • OpenClaw: v2026.4.29 (gateway mode)
  • Agent: groq/qwen3-32b (free tier), fallbacks: deepseek-v4-flash, gemini-2.5-flash
  • Channels: WhatsApp (Baileys)
  • Hardware: dedicated Linux server (24GB RAM, 8 cores)

Symptom

Gateway process sits at 100-130% CPU permanently, even with zero inbound messages. The gateway becomes unresponsive or responds with 60s+ delays. Killing and restarting reproduces the issue within minutes.

Root Causes Identified

After extensive debugging, we found multiple independent issues compounding into permanent CPU saturation:

1. Zombie sessions re-launching on every boot (main culprit)

Persisted session files in ~/.openclaw/agents/*/sessions/*.jsonl re-launch "embedded runs" on every gateway start. Even after the user runs /new on WhatsApp, the old session file remains on disk and triggers a new agent run at boot.
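A minimal sketch of a less destructive alternative to deleting every session file: move only the stale ones aside so they cannot be picked up at boot. The glob pattern comes from the path above; the function names, the age cutoff, and the archive directory are hypothetical and not part of OpenClaw.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def find_stale_sessions(agents_dir: Path, max_age_seconds: float,
                        now: Optional[float] = None) -> List[Path]:
    """Return session files under <agents_dir>/*/sessions/ older than the cutoff."""
    now = time.time() if now is None else now
    stale = []
    for path in sorted(agents_dir.glob("*/sessions/*.jsonl")):
        # Age is judged by last modification time (an assumption; the gateway
        # may track session activity differently).
        if now - path.stat().st_mtime > max_age_seconds:
            stale.append(path)
    return stale


def archive_sessions(paths: List[Path], archive_dir: Path) -> List[Path]:
    """Move files into archive_dir, prefixed with the agent name to avoid collisions."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in paths:
        agent_name = path.parent.parent.name  # <agent>/sessions/<file>.jsonl
        target = archive_dir / f"{agent_name}__{path.name}"
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

It would be called with something like find_stale_sessions(Path.home() / ".openclaw" / "agents", 24 * 3600) while the gateway is stopped, followed by archive_sessions(...) on the result. Archiving instead of deleting keeps the files available if a session turns out to be needed.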

2. Compaction safeguard re-trigger loop

When a session has an empty or already-compacted context, the safeguard fires repeatedly:

[compaction-safeguard] Compaction safeguard: no real conversation messages to summarize; writing compaction boundary to suppress re-trigger loop.

Despite the log saying "suppress re-trigger loop", it does not actually stop — it triggers another embedded run toward the LLM on the next cycle.
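The behavior we would expect can be sketched as a guard that returns before any model call when there is nothing to summarize. This is an illustration only: Session, has_real_messages, and maybe_compact are hypothetical names, and the message shape is an assumption, not OpenClaw's internal data model.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Session:
    # Hypothetical shape: each message is a dict with a "role" key.
    messages: List[dict] = field(default_factory=list)
    compaction_boundary_written: bool = False


def has_real_messages(session: Session) -> bool:
    """True if at least one user or assistant message is present."""
    return any(m.get("role") in ("user", "assistant") for m in session.messages)


def maybe_compact(session: Session, run_compaction: Callable[[Session], None]) -> str:
    """Compact only when there is something to summarize; otherwise no-op."""
    if not has_real_messages(session):
        # Record the boundary and return without starting an embedded run.
        session.compaction_boundary_written = True
        return "skipped"
    run_compaction(session)
    return "compacted"
```

The point of the sketch is that the "skipped" path never invokes run_compaction, so repeated cycles on an empty session cost nothing.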

3. Groq free tier 6000 TPM → fallback cascade with full re-tokenization

Accumulated context (~50k tokens) exceeds Groq's 6000 TPM limit → 413 rejection → fallback to DeepSeek → timeout → fallback to Gemini. Each fallback re-tokenizes the entire context on the Node.js main thread (CPU-bound).
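One way to avoid the cascade is to skip providers that cannot accept the request before sending anything. The sketch below assumes each provider in the chain carries a known request-size limit; only the 6000 figure for Groq comes from this report, and the function name and the use of None for "no known limit" are hypothetical.

```python
from typing import List, Optional, Tuple

# (provider name, largest request in tokens it will accept; None = no known limit)
Provider = Tuple[str, Optional[int]]


def pick_first_fitting_provider(context_tokens: int,
                                providers: List[Provider]) -> Optional[str]:
    """Return the first provider in fallback order that can accept the context."""
    for name, limit in providers:
        if limit is None or context_tokens <= limit:
            return name
    return None


# Limits for the two fallbacks are not stated in this report, so they are left unknown.
CHAIN: List[Provider] = [
    ("groq/qwen3-32b", 6000),
    ("deepseek-v4-flash", None),
    ("gemini-2.5-flash", None),
]
```

With the ~50k-token context described above, pick_first_fitting_provider(50_000, CHAIN) skips Groq without a round trip, so the 413 rejection and the first re-tokenization never happen.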

4. Discord slash command deploy retry loop (even when disabled)

With channels.discord.enabled: false, the plugin still attempts to deploy slash commands at boot → gets rate-limited by Discord (429) → retries indefinitely in a tight loop.
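For contrast, here is a sketch of how a bounded retry could look: nothing runs when the channel is disabled, and 429 responses are retried with capped exponential backoff that honors the server's Retry-After value. The deploy callable, its (status, retry_after) return shape, and all default values are assumptions for illustration, not the plugin's real interface.

```python
import time
from typing import Callable, Optional, Tuple


def next_delay(attempt: int, retry_after: Optional[float] = None,
               base: float = 1.0, cap: float = 300.0) -> float:
    """Exponential backoff capped at `cap`, never shorter than Retry-After."""
    delay = min(cap, base * (2 ** attempt))
    if retry_after is not None:
        delay = max(delay, retry_after)
    return delay


def deploy_with_backoff(deploy: Callable[[], Tuple[int, Optional[float]]],
                        enabled: bool,
                        max_attempts: int = 5,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try `deploy` up to max_attempts times; make no calls when disabled."""
    if not enabled:
        return False
    for attempt in range(max_attempts):
        status, retry_after = deploy()
        if status != 429:
            return 200 <= status < 300
        if attempt < max_attempts - 1:
            # Wait before the next attempt instead of retrying in a tight loop.
            sleep(next_delay(attempt, retry_after))
    return False
```

The sleep parameter is injectable so the schedule can be checked without waiting. After max_attempts the function gives up rather than retrying indefinitely.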

5. plugins.entries.X.enabled: false does not prevent loading

Setting lossless-claw to enabled: false in plugins.entries does not prevent it from loading. The only workaround is to use plugins.allow as a whitelist to explicitly block it.

6. V8 GC thrashing — unbounded heap (ref: #13758)

Without --max-old-space-size, the heap is limited only by V8's default ceiling, so it keeps growing with large conversation contexts and causes constant GC thrashing. Related to #13758 / #6413.

7. Plugin runtime staging on every inbound message

31 npm dependencies are re-resolved on every inbound message, even when they are already installed. Each pass takes 1-16 seconds and burns CPU.
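Caching by spec hash could look roughly like the sketch below: the dependency specs are hashed in a canonical form, and the expensive resolve step runs only when that hash has not been seen. StagingCache, spec_hash, and the resolve callable are hypothetical names; how OpenClaw represents plugin specs internally is not known from this report.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def spec_hash(specs: Dict[str, str]) -> str:
    """Stable SHA-256 over the dependency specs, independent of key order."""
    canonical = json.dumps(specs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class StagingCache:
    """Runs the expensive resolve step once per distinct set of specs."""

    def __init__(self, resolve: Callable[[Dict[str, str]], Any]):
        self._resolve = resolve
        self._staged: Dict[str, Any] = {}

    def stage(self, specs: Dict[str, str]) -> Any:
        key = spec_hash(specs)
        if key not in self._staged:
            # Cache miss: resolve once and remember the result.
            self._staged[key] = self._resolve(specs)
        return self._staged[key]
```

An in-memory dictionary is enough to remove the per-message cost within one process; persisting the hash next to the staged directory would also cover restarts.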

Workarounds Applied

Workaround | CPU Impact
Delete zombie sessions (rm ~/.openclaw/agents/*/sessions/*.jsonl) | 100%+ → 30%
NODE_OPTIONS=--max-old-space-size=1536 in systemd env | Reduces GC thrashing
Disable Discord channel entirely | Eliminates 429 retry loop
Use plugins.allow whitelist to block unwanted plugins | Prevents parasitic loading
Disable hooks.internal.entries.session-memory | Reduces unnecessary disk writes
Set contextTokens: 128000 (was 32000) | Stops compaction safeguard loop
Purge entire ~/.openclaw/agents/ directory | Clean session reset

After all workarounds: ~15% idle CPU (acceptable), with temporary spikes during message processing (tokenization + model resolution + streaming).

Suggestions

  1. Compaction safeguard should not trigger an embedded run when there's nothing to compact — it should just no-op
  2. Sessions should have a TTL or auto-clean when the user starts a new session
  3. Plugin runtime staging should cache by spec hash instead of re-resolving on every message
  4. plugins.entries.X.enabled: false should be sufficient to prevent loading without needing a plugins.allow whitelist
  5. Disabled channels (enabled: false) should not load any connection logic or attempt external API calls at boot
  6. Model fallback should not re-tokenize the full context from scratch — the token count from the first attempt should be reusable

Diagnostic breadcrumbs

[diagnostic] liveness warning: reasons=event_loop_delay interval=36s eventLoopDelayP99Ms=21.3 eventLoopDelayMaxMs=10351.5 eventLoopUtilization=0.662

Event loop blocked for 10+ seconds during idle — confirms main-thread CPU spin, not I/O wait.
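To pull these numbers out of a log file, a small parser is enough. This is a sketch that assumes the line keeps the key=value layout shown above; the function names are made up for illustration.

```python
import re
from typing import Dict

SAMPLE = ("[diagnostic] liveness warning: reasons=event_loop_delay interval=36s "
          "eventLoopDelayP99Ms=21.3 eventLoopDelayMaxMs=10351.5 "
          "eventLoopUtilization=0.662")


def parse_liveness_warning(line: str) -> Dict[str, str]:
    """Extract every key=value pair from a liveness warning line as strings."""
    return dict(re.findall(r"(\w+)=(\S+)", line))


def max_stall_seconds(line: str) -> float:
    """Longest event loop delay reported in the line, converted to seconds."""
    return float(parse_liveness_warning(line)["eventLoopDelayMaxMs"]) / 1000.0
```

For the sample line, the maximum delay is about 10.35 seconds while the p99 is only 21.3 ms. One possible reading is a few long stalls within the interval rather than uniformly slow ticks.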

Correlation with #75688

The reporter in #75688 observes the same pattern on macOS ARM64:

  • 100% CPU from startup, never drops
  • node.list latency 21-35s (we see 9-11s)
  • ~724MB RSS (we see 745MB before fixes)
  • Plugin bundled runtime deps (30-31 specs) staging overhead
  • Web UI polling exacerbates but is not causative

Their CPU profile shows all samples in uv_run → uv__io_poll → uv__stream_io, which is consistent with our finding that the event loop is saturated: synchronous tokenization and plugin resolution run on the main thread and block the libuv event loop.

The difference: we isolated the causes by disabling components one by one and identified that zombie sessions + compaction safeguard loop are the primary drivers, with plugin staging and disabled-but-still-active channels as amplifiers.

Steps to reproduce

  1. Configure gateway mode with groq/qwen3-32b (free tier) + fallbacks
  2. Enable WhatsApp (Baileys) channel, disable Discord (enabled: false)
  3. Let a few sessions accumulate in ~/.openclaw/agents/*/sessions/
  4. Restart the gateway
  5. Observe CPU immediately climbing to 100%+ with no inbound messages

Expected behavior

Gateway should be near-idle (~1-5% CPU) when no messages are being processed. Fallback cascades should not trigger CPU-bound re-tokenization. Disabled channels/plugins should not run any logic.

Actual behavior

Gateway sits at 100-130% CPU permanently with zero inbound messages. Responses take 60s+, and node.list latency is 20s+. The issue reproduces within minutes of a restart.

OpenClaw version

v2026.4.29

Operating system

Ubuntu Linux

Install method

No response

Model

groq/qwen3-32b

Provider / routing chain

groq/qwen3-32b → deepseek-v4-flash → gemini-2.5-flash

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

Delete zombie sessions and apply workarounds to mitigate CPU saturation, such as setting NODE_OPTIONS=--max-old-space-size=1536 and disabling unnecessary plugins and channels.

Guidance

  • Identify and remove zombie sessions by deleting files in ~/.openclaw/agents/*/sessions/*.jsonl to prevent re-launching on every boot.
  • Apply the suggested workarounds, such as setting NODE_OPTIONS=--max-old-space-size=1536 to reduce GC thrashing and disabling Discord channel entirely to eliminate the 429 retry loop.
  • Use plugins.allow whitelist to block unwanted plugins and prevent parasitic loading.
  • Consider disabling hooks.internal.entries.session-memory to reduce unnecessary disk writes.
  • Set contextTokens to a higher value (e.g., 128000) to stop the compaction safeguard loop.

Notes

The provided workarounds have been tested and shown to reduce CPU usage from 100-130% to ~15% idle CPU. However, the root causes are complex and multifaceted, requiring a combination of fixes to fully mitigate the issue.

Recommendation

Apply the workarounds, as they have been shown to be effective in reducing CPU usage and mitigating the issue. Specifically, delete zombie sessions, set NODE_OPTIONS=--max-old-space-size=1536, and disable unnecessary plugins and channels.
