openclaw - 💡(How to fix) Fix [Bug]: Gateway CPU pinned at 100%: root causes & workarounds (complements #75688) [6 comments, 6 participants]

openclaw · 2026-05-01 15:17:54

openclaw/openclaw#75707 • Fetched 2026-05-02 05:31:20

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

cross-referenced ×10commented ×6subscribed ×6labeled ×2


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Gateway CPU 100-130% idle — root causes identified & workarounds (v2026.4.29)

Related to #75688 — same version, same symptoms (100% CPU from startup, ~724MB RAM, node.list 20s+ latency). This issue provides identified root causes and working mitigations.

Environment

OS: Ubuntu Linux (systemd user service)
Node: v22.22.1
OpenClaw: v2026.4.29 (gateway mode)
Agent: groq/qwen3-32b (free tier), fallbacks: deepseek-v4-flash, gemini-2.5-flash
Channels: WhatsApp (Baileys)
Hardware: dedicated Linux server (24GB RAM, 8 cores)

Symptom

Gateway process sits at 100-130% CPU permanently, even with zero inbound messages. The gateway becomes unresponsive or responds with 60s+ delays. Killing and restarting reproduces the issue within minutes.

Root Causes Identified

After extensive debugging, we found multiple independent issues compounding into permanent CPU saturation:

1. Zombie sessions re-launching on every boot (main culprit)

Persisted session files in ~/.openclaw/agents/*/sessions/*.jsonl re-launch "embedded runs" on every gateway start. Even after the user runs /new on WhatsApp, the old session file remains on disk and triggers a new agent run at boot.
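A minimal TypeScript sketch of the session TTL proposed under Suggestions: given a listing of the persisted `*.jsonl` session files and their modification times, select the ones too old to resume at boot. The `SessionFile` shape, the `selectStaleSessions` helper, and the one-day TTL are hypothetical, not OpenClaw APIs. The directory listing itself (e.g. with Node's `fs`) is left out so the selection logic stays self-contained.

```typescript
// Hypothetical shape: one entry per ~/.openclaw/agents/*/sessions/*.jsonl file,
// e.g. produced by listing that directory and reading each file's mtime.
interface SessionFile {
  path: string;
  mtimeMs: number; // last modification time, epoch milliseconds
}

// Returns the paths of sessions idle for longer than ttlMs, oldest first.
// A gateway applying this rule would archive or delete these files instead of
// re-launching an embedded run for each of them at boot (assumed behavior).
function selectStaleSessions(
  files: SessionFile[],
  nowMs: number,
  ttlMs: number,
): string[] {
  return files
    .filter((f) => nowMs - f.mtimeMs > ttlMs)
    .sort((a, b) => a.mtimeMs - b.mtimeMs)
    .map((f) => f.path);
}

// Usage: with a 1-day TTL, only the 3-day-old session is selected.
const DAY_MS = 24 * 60 * 60 * 1000;
const nowMs = Date.parse("2026-05-01T12:00:00Z");
const staleSessions = selectStaleSessions(
  [
    { path: "agents/main/sessions/old.jsonl", mtimeMs: nowMs - 3 * DAY_MS },
    { path: "agents/main/sessions/current.jsonl", mtimeMs: nowMs - 60000 },
  ],
  nowMs,
  DAY_MS,
);
console.log(staleSessions.join(", ")); // agents/main/sessions/old.jsonl
```

Unlike a blanket `rm`, a TTL keeps the session the user is actively using while still retiring the ones left behind after `/new`.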

2. Compaction safeguard re-trigger loop

When a session has an empty or already-compacted context, the safeguard fires repeatedly:

[compaction-safeguard] Compaction safeguard: no real conversation messages to summarize; writing compaction boundary to suppress re-trigger loop.

Despite the log saying "suppress re-trigger loop", it does not actually stop — it triggers another embedded run toward the LLM on the next cycle.
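The no-op behavior proposed under Suggestions can be sketched as a pure decision function. It counts the real user/assistant messages since the last compaction boundary and, if there are none, returns a no-op so that no embedded run is scheduled. The message shape, `decideCompaction`, and the boundary index are assumptions for illustration, not OpenClaw's internal types.

```typescript
// Hypothetical message shape; OpenClaw's real transcript format may differ.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
}

type CompactionDecision =
  | { kind: "noop"; reason: string }
  | { kind: "summarize"; fromIndex: number; toIndex: number };

// boundaryIndex: index of the first message after the last compaction boundary.
function decideCompaction(
  messages: ChatMessage[],
  boundaryIndex: number,
): CompactionDecision {
  const realMessages = messages
    .slice(boundaryIndex)
    .filter(
      (m) =>
        (m.role === "user" || m.role === "assistant") && m.content.trim() !== "",
    );
  if (realMessages.length === 0) {
    // Nothing to summarize: the caller records the boundary and stops here,
    // without triggering another LLM run on the next cycle.
    return { kind: "noop", reason: "no real conversation messages since boundary" };
  }
  return { kind: "summarize", fromIndex: boundaryIndex, toIndex: messages.length };
}

// Usage: a transcript holding only a system prompt is a no-op.
console.log(decideCompaction([{ role: "system", content: "You are helpful." }], 0).kind); // noop
```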

3. Groq free tier 6000 TPM → fallback cascade with full re-tokenization

Accumulated context (~50k tokens) exceeds Groq's 6000 TPM limit → 413 rejection → fallback to DeepSeek → timeout → fallback to Gemini. Each fallback re-tokenizes the entire context on the Node.js main thread (CPU-bound).
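A sketch of the "count once, reuse across fallbacks" suggestion. The context is measured a single time, and that number is checked against each provider's limit before any request is sent. A provider that would reject the request (as Groq's free tier does here with a 413) is skipped without re-tokenizing. `roughTokenCount` is a crude characters-per-token heuristic standing in for a real tokenizer. The limits for DeepSeek and Gemini are placeholders, not published figures. The sketch does not address the DeepSeek timeout.

```typescript
interface ProviderLimit {
  id: string;
  maxTokensPerRequest: number; // 6000 models the Groq free-tier TPM cap
}

// Stand-in for a real tokenizer: roughly 4 characters per token for English.
function roughTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

// Measures the context once, then filters the whole chain with that number.
function planFallbackChain(
  context: string,
  chain: ProviderLimit[],
): { tokens: number; usable: string[]; skipped: string[] } {
  const tokens = roughTokenCount(context); // computed exactly once
  const usable: string[] = [];
  const skipped: string[] = [];
  for (const p of chain) {
    if (tokens <= p.maxTokensPerRequest) {
      usable.push(p.id);
    } else {
      skipped.push(p.id); // would be rejected; no request, no re-tokenization
    }
  }
  return { tokens, usable, skipped };
}

// Usage: ~50k tokens of accumulated context, as in the report.
const plan = planFallbackChain("x".repeat(200000), [
  { id: "groq/qwen3-32b", maxTokensPerRequest: 6000 },
  { id: "deepseek-v4-flash", maxTokensPerRequest: 128000 }, // placeholder limit
  { id: "gemini-2.5-flash", maxTokensPerRequest: 1000000 }, // placeholder limit
]);
console.log(plan.tokens, plan.skipped.join(", ")); // 50000 groq/qwen3-32b
```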

4. Discord slash command deploy retry loop (even when disabled)

With channels.discord.enabled: false, the plugin still attempts to deploy slash commands at boot → gets rate-limited by Discord (429) → retries indefinitely in a tight loop.
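A minimal sketch of a bounded retry for the slash-command deploy. A disabled channel never schedules an attempt. Failures back off exponentially up to a cap, a server-supplied `retry_after` hint is honored on 429, and retries stop after a fixed budget. Discord's 429 responses do carry a `retry_after` value in seconds. The `RetryPolicy` shape, the function name, and the numbers are assumptions for illustration.

```typescript
interface RetryPolicy {
  maxAttempts: number; // stop after this many failed attempts
  baseDelayMs: number;
  maxDelayMs: number;
}

// Delay before the next deploy attempt, or null to make no attempt at all.
// failures = failed attempts so far (0 means the first attempt is pending).
// retryAfterSec = the retry_after hint from a 429 response, if any.
function nextDeployDelayMs(
  channelEnabled: boolean,
  failures: number,
  policy: RetryPolicy,
  retryAfterSec?: number,
): number | null {
  if (!channelEnabled) return null; // disabled channel: no external API calls
  if (failures >= policy.maxAttempts) return null; // give up instead of looping
  if (failures === 0) return 0; // first attempt goes out immediately
  const backoff = Math.min(policy.maxDelayMs, policy.baseDelayMs * 2 ** (failures - 1));
  const hinted = retryAfterSec === undefined ? 0 : Math.ceil(retryAfterSec * 1000);
  return Math.max(backoff, hinted); // never retry sooner than the server asked
}

// Usage
const policy: RetryPolicy = { maxAttempts: 5, baseDelayMs: 1000, maxDelayMs: 60000 };
console.log(nextDeployDelayMs(false, 0, policy)); // null: disabled channel never deploys
console.log(nextDeployDelayMs(true, 3, policy)); // 4000
console.log(nextDeployDelayMs(true, 1, policy, 12.5)); // 12500: server hint wins
console.log(nextDeployDelayMs(true, 5, policy)); // null: retry budget exhausted
```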

5. `plugins.entries.X.enabled: false` does not prevent loading

Setting `enabled: false` for lossless-claw under `plugins.entries` does not prevent it from loading. The only workaround is to use `plugins.allow` as a whitelist, which blocks lossless-claw by leaving it off the list.
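The precedence the reporter expects can be sketched as a single gate. An explicit `enabled: false` under `plugins.entries` always prevents loading, and `plugins.allow`, when present, additionally restricts loading to the listed IDs. The `PluginsConfig` shape is inferred from the two keys named in this issue. It is an assumption, not OpenClaw's documented schema, and so is the default of loading a plugin when neither key mentions it.

```typescript
// Assumed shape, inferred from the plugins.allow / plugins.entries keys above.
interface PluginsConfig {
  allow?: string[];
  entries?: Record<string, { enabled?: boolean }>;
}

function shouldLoadPlugin(pluginId: string, cfg: PluginsConfig): boolean {
  // An explicit opt-out wins over everything else, including the allow list.
  if (cfg.entries?.[pluginId]?.enabled === false) return false;
  // If an allow list exists, only listed plugins may load.
  if (cfg.allow !== undefined) return cfg.allow.includes(pluginId);
  // No opt-out and no allow list: load by default (assumption).
  return true;
}

// Usage: enabled: false alone is enough to keep lossless-claw out.
const pluginsCfg: PluginsConfig = { entries: { "lossless-claw": { enabled: false } } };
console.log(shouldLoadPlugin("lossless-claw", pluginsCfg)); // false
```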

6. V8 GC thrashing — unbounded heap (ref: #13758)

Without --max-old-space-size, the heap grows unbounded with large conversation contexts, causing constant GC thrashing. Related to #13758 / #6413.

7. Plugin runtime staging on every inbound message

31 npm dependencies are re-resolved on every inbound message, even when they are already installed. Each pass takes 1-16 seconds and adds CPU load.
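A minimal sketch of the "cache by spec hash" suggestion. It normalizes the dependency spec list into an order-independent key and runs the expensive staging step only on a cache miss. For simplicity, the key here is the sorted, newline-joined spec list itself rather than a hash (a real implementation would likely hash it, e.g. with SHA-256). `StagingCache`, the `stage` callback, and the example specs are hypothetical, not OpenClaw's API.

```typescript
// Order-independent key: the same set of specs always maps to the same entry.
function specKey(specs: string[]): string {
  return specs.slice().sort().join("\n");
}

class StagingCache<T> {
  private readonly results = new Map<string, T>();
  private readonly stage: (specs: string[]) => T;

  constructor(stage: (specs: string[]) => T) {
    this.stage = stage;
  }

  // Runs the expensive staging step only on a cache miss.
  get(specs: string[]): T {
    const key = specKey(specs);
    if (this.results.has(key)) return this.results.get(key) as T;
    const result = this.stage(specs);
    this.results.set(key, result);
    return result;
  }
}

// Usage: the counter stands in for resolving/installing the dependency specs.
let stagingRuns = 0;
const stagingCache = new StagingCache((specs: string[]) => {
  stagingRuns += 1;
  return specs.length;
});
stagingCache.get(["dep-a@1.0.0", "dep-b@2.1.0"]);
stagingCache.get(["dep-b@2.1.0", "dep-a@1.0.0"]); // same specs, new order: cache hit
console.log(stagingRuns); // 1
```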

Workarounds Applied

| Workaround | CPU impact |
| --- | --- |
| Delete zombie sessions (`rm ~/.openclaw/agents/*/sessions/*.jsonl`) | 100%+ → 30% |
| `NODE_OPTIONS=--max-old-space-size=1536` in systemd env | Reduces GC thrashing |
| Disable Discord channel entirely | Eliminates 429 retry loop |
| Use `plugins.allow` whitelist to block unwanted plugins | Prevents parasitic loading |
| Disable `hooks.internal.entries.session-memory` | Reduces unnecessary disk writes |
| Set `contextTokens: 128000` (was 32000) | Stops compaction safeguard loop |
| Purge entire `~/.openclaw/agents/` directory | Clean session reset |

After all workarounds: ~15% idle CPU (acceptable), with temporary spikes during message processing (tokenization + model resolution + streaming).

Suggestions

Compaction safeguard should not trigger an embedded run when there's nothing to compact — it should just no-op
Sessions should have a TTL or auto-clean when the user starts a new session
Plugin runtime staging should cache by spec hash instead of re-resolving on every message
plugins.entries.X.enabled: false should be sufficient to prevent loading without needing a plugins.allow whitelist
Disabled channels (enabled: false) should not load any connection logic or attempt external API calls at boot
Model fallback should not re-tokenize the full context from scratch — the token count from the first attempt should be reusable

Diagnostic breadcrumbs

[diagnostic] liveness warning: reasons=event_loop_delay interval=36s eventLoopDelayP99Ms=21.3 eventLoopDelayMaxMs=10351.5 eventLoopUtilization=0.662

Event loop blocked for 10+ seconds during idle — confirms main-thread CPU spin, not I/O wait.
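To check whether the workarounds actually remove the long stalls, the liveness warnings can be parsed mechanically. This small sketch pulls the `key=value` fields out of a line like the one above and flags it when `eventLoopDelayMaxMs` crosses a threshold. The field names are taken verbatim from the logged line. The 1000 ms threshold and the function names are assumptions.

```typescript
// Extracts key=value pairs from a [diagnostic] liveness warning line.
function parseLivenessFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pair = /(\w+)=(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pair.exec(line)) !== null) {
    fields[match[1]] = match[2];
  }
  return fields;
}

// True when the worst observed event-loop delay reaches thresholdMs, which
// indicates synchronous work held the loop rather than idle I/O waiting.
function isLongStall(fields: Record<string, string>, thresholdMs = 1000): boolean {
  const maxDelayMs = Number(fields["eventLoopDelayMaxMs"]);
  return !Number.isNaN(maxDelayMs) && maxDelayMs >= thresholdMs;
}

// Usage with the line captured in this report.
const sample =
  "[diagnostic] liveness warning: reasons=event_loop_delay interval=36s " +
  "eventLoopDelayP99Ms=21.3 eventLoopDelayMaxMs=10351.5 eventLoopUtilization=0.662";
const parsed = parseLivenessFields(sample);
console.log(parsed["eventLoopDelayMaxMs"], isLongStall(parsed)); // 10351.5 true
```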

Correlation with #75688

The reporter in #75688 observes the same pattern on macOS ARM64:

100% CPU from startup, never drops
node.list latency 21-35s (we also see 9-11s)
~724MB RSS (we see 745MB before fixes)
Plugin bundled runtime deps (30-31 specs) staging overhead
Web UI polling exacerbates but is not causative

Their CPU profile shows all samples in uv_run → uv__io_poll → uv__stream_io. This is consistent with our finding that the event loop is saturated by synchronous tokenization and plugin resolution work blocking the main thread, which is where the libuv event loop runs.

The difference: we isolated the causes by disabling components one by one and identified that zombie sessions + compaction safeguard loop are the primary drivers, with plugin staging and disabled-but-still-active channels as amplifiers.

Steps to reproduce

Configure gateway mode with groq/qwen3-32b (free tier) + fallbacks
Enable WhatsApp (Baileys) channel, disable Discord (enabled: false)
Let a few sessions accumulate in ~/.openclaw/agents/*/sessions/
Restart the gateway
Observe CPU immediately climbing to 100%+ with no inbound messages

Expected behavior

Gateway should be near-idle (~1-5% CPU) when no messages are being processed. Fallback cascades should not trigger CPU-bound re-tokenization. Disabled channels/plugins should not run any logic.

Actual behavior

Gateway sits at 100-130% CPU permanently with zero inbound messages. Responses take 60s+, and node.list latency exceeds 20s. The issue reproduces within minutes of a restart.

OpenClaw version

v2026.4.29

Operating system

Ubuntu Linux

Install method

No response

Model

groq/qwen3-32b

Provider / routing chain

groq/qwen3-32b → deepseek-v4-flash → gemini-2.5-flash

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

Delete zombie sessions and apply the workarounds that mitigate CPU saturation, such as setting NODE_OPTIONS=--max-old-space-size=1536 and disabling unnecessary plugins and channels.

Guidance

Identify and remove zombie sessions by deleting files in ~/.openclaw/agents/*/sessions/*.jsonl to prevent re-launching on every boot.
Apply the suggested workarounds, such as setting NODE_OPTIONS=--max-old-space-size=1536 to reduce GC thrashing and disabling Discord channel entirely to eliminate the 429 retry loop.
Use plugins.allow whitelist to block unwanted plugins and prevent parasitic loading.
Consider disabling hooks.internal.entries.session-memory to reduce unnecessary disk writes.
Set contextTokens to a higher value (e.g., 128000) to stop the compaction safeguard loop.

Example

No code snippet is provided, as the issue relates mainly to configuration and environment setup.

Notes

The provided workarounds have been tested and shown to reduce CPU usage from 100-130% to ~15% idle CPU. However, the root causes are complex and multifaceted, requiring a combination of fixes to fully mitigate the issue.

Recommendation

Apply the workarounds, as they have been shown to be effective in reducing CPU usage and mitigating the issue. Specifically, delete zombie sessions, set NODE_OPTIONS=--max-old-space-size=1536, and disable unnecessary plugins and channels.
