openclaw - Event loop saturation during startup: synchronous model-prewarm and session-locks block event loop for 28-64 seconds



Summary

OpenClaw 2026.5.19 suffers from severe event loop saturation during startup. Two synchronous startup sidecars, model-prewarm and session-locks, each block the Node.js event loop for seconds at a time. Combined with agent work that starts immediately, this produces max event loop delays of up to 64.5 seconds (28–64 seconds in the worst three of the five logged startups), event loop utilization of 93–96%, and heap usage above 1 GB. The saturation cascades into several user-visible failures:

  • Discord WS READY never fires: heartbeats can't be sent in time, so the connection is closed (code 1000). The bot appears online but can't receive guild messages (see #79794).
  • Typing indicator delays: even with typingMode: "instant", the typing indicator doesn't fire until the event loop has capacity.
  • Memory pressure: heap reaches 1.1–1.3 GB (threshold 1 GB) during the startup burst.
  • Gateway restart cascade: systemd RestartSec=5 causes rapid restart attempts, each of which re-triggers the same saturation.

Environment

  • OpenClaw: 2026.5.19 (a185ca2)
  • Node.js: v24.15.0
  • OS: Ubuntu 24.04.4 LTS, x86_64, systemd user service
  • Plugins: 23
  • Agents: 7
  • Session stores: 168 sessions totaling ~3.7MB JSON, parsed synchronously on every startup

Evidence

Liveness warnings across 17 startups in 45 minutes

Every successful startup produced a liveness warning within 30–90 seconds:

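| Startup | Time  | p99 delay | Max delay | EL util | Prewarm | Session locks | Heap    |
|---------|-------|-----------|-----------|---------|---------|---------------|---------|
| #3      | 18:22 | 1,430ms   | 44,426ms  | 95.9%   | 2,662ms | 1,206ms       |         |
| #5      | 18:27 | 1,985ms   | 3,496ms   | 95.3%   | 3,135ms | 1,241ms       | 1,303MB |
| #7      | 18:39 | 2,321ms   | 28,739ms  | 94.1%   | 1,876ms | 830ms         | 1,110MB |
| #8      | 18:43 | 1,983ms   | 24,562ms  | 92.9%   | 1,820ms | 795ms         |         |
| #17     | 19:03 | 1,389ms   | 64,525ms  | 92.7%   | 4,199ms | 1,624ms       |         |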
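For readers who want to collect comparable numbers on their own gateway, here is a minimal sketch using Node's built-in `perf_hooks` module. It is not OpenClaw's diagnostic code: the `LoopSample` shape, the `isSaturated` thresholds, and the function names are assumptions made for illustration. Only `monitorEventLoopDelay` and `performance.eventLoopUtilization` are real Node APIs.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";

// Illustrative sample shape; field names are not taken from OpenClaw.
interface LoopSample {
  p99Ms: number;
  maxMs: number;
  utilization: number;
}

// Hypothetical thresholds: flag a sample when the worst delay or the
// utilization over the sampling window is too high.
function isSaturated(s: LoopSample, maxDelayMs = 1000, maxUtilization = 0.9): boolean {
  return s.maxMs > maxDelayMs || s.utilization > maxUtilization;
}

// Samples event loop delay and utilization every `periodMs` milliseconds.
// Returns a function that stops the monitor.
function startLoopMonitor(periodMs: number, onSample: (s: LoopSample) => void): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  let lastElu = performance.eventLoopUtilization();

  const timer = setInterval(() => {
    // Passing the previous reading yields the delta for this window only.
    const delta = performance.eventLoopUtilization(lastElu);
    lastElu = performance.eventLoopUtilization();
    onSample({
      p99Ms: histogram.percentile(99) / 1e6, // histogram values are nanoseconds
      maxMs: histogram.max / 1e6,
      utilization: delta.utilization,
    });
    histogram.reset();
  }, periodMs);

  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

A caller might run `startLoopMonitor(30000, (s) => { if (isSaturated(s)) console.warn(s); })` early in startup. Because the sampling timer itself runs on the event loop, a long stall delays the report until the stall ends, which may explain why a warning like the one at 19:04:48 appears only after the block.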

Full startup timeline (representative: startup #17)

19:02:56  systemd starts openclaw-gateway.service
19:03:07  http server listening (23 plugins; 9.4s)
19:03:09  [discord] starting provider
19:03:10  gateway ready
19:03:13  [discord] bot probe resolved (REST — works fine)
19:03:14  [discord] channels resolved (6 channels, REST — works fine)
19:03:19  [discord] client initialized; awaiting gateway readiness
          ↕ startup sidecars running: model-prewarm (4.2s) + session-locks (1.6s)
          ↕ agent work queued
          ↕ event loop blocked — max delay 64.5 SECONDS
19:04:48  [diagnostic] liveness warning (p99=1389ms, max=64525ms, util=0.927)
19:04:48  [discord] Gateway websocket closed: 1000
          ↕ ~2 minutes of silence — no log output
~19:05    Discord WS silently auto-reconnects. Bot starts responding.

Breakdown of saturation sources

1. model-prewarm (1.8–4.2s per startup): Synchronously loads model weights immediately after gateway ready, blocking the event loop entirely. It runs on every restart, so 17 restarts in 45 minutes means 17 prewarm cycles.

2. session-locks (0.8–1.6s per startup): Parses the JSON session stores for all agents on every startup. With 168 sessions totaling ~3.7MB of JSON, this is a significant synchronous parse. The cost scales with session count, so it will get worse over time.

3. Agent work starts immediately: Queued agent work begins processing before the startup sidecars finish, competing for the already-saturated event loop.

4. Restart cascade amplifies the problem: systemd RestartSec=5 + StartLimitBurst=5 means 5 rapid restarts before systemd gives up. Each restart re-runs prewarm + session-locks.
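To illustrate how the session-locks cost in item 2 could be spread out rather than paid in one block, the sketch below parses stores one at a time and yields to the event loop whenever a time budget is used up. It is a sketch under assumptions, not OpenClaw's implementation: `parseStoresCooperatively`, `yieldToEventLoop`, and the 10 ms budget are hypothetical, and the stores are assumed to be already read into strings.

```typescript
// Yield so pending timers and I/O callbacks (e.g. WebSocket frames) can run.
// In Node, setImmediate would also work; setTimeout keeps the sketch portable.
function yieldToEventLoop(): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Parse each raw JSON store in turn, yielding whenever `budgetMs` of
// synchronous work has accumulated since the last yield.
async function parseStoresCooperatively(
  rawStores: string[],
  budgetMs = 10,
): Promise<unknown[]> {
  const parsed: unknown[] = [];
  let sliceStart = Date.now();

  for (const raw of rawStores) {
    // Each JSON.parse call is still synchronous, but the longest block is
    // now bounded by the largest single store instead of all stores combined.
    parsed.push(JSON.parse(raw));

    if (Date.now() - sliceStart >= budgetMs) {
      await yieldToEventLoop();
      sliceStart = Date.now();
    }
  }
  return parsed;
}
```

This only helps when the total is dominated by many small stores, as the 168 sessions at ~3.7MB suggest. A single very large store would still block for its full parse time and would need a worker thread or a streaming parser instead.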

Discord READY failure mechanism

The Discord gateway handshake (HELLO → IDENTIFY → READY) requires multiple event loop ticks for WS frame parsing, heartbeat sends, and the IDENTIFY payload send. Discord expects the client to send a heartbeat every heartbeat_interval (typically 41.25s) and acknowledges each one with a heartbeat ACK. With max event loop delays of 28–64s, heartbeats go out late or not at all, and the connection is closed with code 1000.

The bot self-heals once the event loop calms down (~2 min post-startup) — Discord.js auto-reconnects silently. But this reconnect is not logged, making it invisible to monitoring.
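The timing argument can be made concrete with a little arithmetic. A heartbeat timer that comes due while the event loop is stalled cannot fire until the stall ends. The sketch below computes how late the most-delayed heartbeat would be for a single stall. The function name is hypothetical, the model ignores the jitter Discord applies to the first heartbeat, and the 10-second stall start in the usage note is an assumed value, since the report does not say exactly when the stall began.

```typescript
// Given one event loop stall, return how many milliseconds late the
// most-delayed heartbeat would be sent. All times are measured from HELLO.
function worstHeartbeatLateness(
  intervalMs: number,
  stallStartMs: number,
  stallEndMs: number,
): number {
  let worst = 0;
  // Simplified model: heartbeats are due at 1x, 2x, 3x ... the interval.
  for (let due = intervalMs; due < stallEndMs; due += intervalMs) {
    if (due >= stallStartMs) {
      // The timer callback can only run once the stall is over.
      worst = Math.max(worst, stallEndMs - due);
    }
  }
  return worst;
}
```

With the reported interval of 41,250 ms and a 64,525 ms stall assumed to start 10,000 ms after HELLO, `worstHeartbeatLateness(41250, 10000, 74525)` gives 33,275 ms, so the heartbeat due at 41.25 s would go out more than half a minute late. A stall shorter than the time remaining until the next heartbeat produces no lateness at all, which is consistent with the failure not appearing on every startup.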

Memory pressure

Heap exceeded the 1GB threshold on 2 of 5 liveness-warned startups:

  • Startup #5: heapUsedBytes=1,303MB (RSS=1,437MB)
  • Startup #7: heapUsedBytes=1,110MB

Suggested fixes

  1. Defer model-prewarm — prewarm is only useful when the first user message arrives. Deferring to after provider READY events (or lazy-loading on first model call) would eliminate the largest single-source block (1.8–4.2s).

  2. Async session-locks parsing — use streaming JSON parse, JSON.parse in a worker thread, or lazy-load sessions on first access instead of parsing all session data synchronously at startup.

  3. Defer agent work until provider READY — don't start processing queued agent messages until provider websocket handshakes complete.

  4. Log WS reconnect events — the silent auto-reconnect after ~2 minutes produces zero log output. Add logging for WS reconnect so operators can distinguish permanent failure from self-healing.

  5. Session store hygiene — an expiry or compaction mechanism would reduce the parse cost over time as sessions accumulate.
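Suggestions 1 and 3 can be sketched with two small helpers: a lazy wrapper that runs an expensive initializer only on first use, and a gate that holds queued work until a provider reports READY. This is a minimal illustration, not OpenClaw code. `lazy`, `ReadyGate`, and the stand-in model loader are hypothetical names, and the real prewarm and agent-queue interfaces are not shown in the report.

```typescript
// Wrap an expensive async initializer so it runs at most once, on first use.
function lazy<T>(init: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) cached = init();
    return cached;
  };
}

// Holds tasks until open() is called, then runs them in arrival order.
// Tasks submitted after open() run immediately.
class ReadyGate {
  private isOpen = false;
  private readonly pending: Array<() => void> = [];

  run(task: () => void): void {
    if (this.isOpen) task();
    else this.pending.push(task);
  }

  open(): void {
    this.isOpen = true;
    for (const task of this.pending.splice(0)) task();
  }
}

// Hypothetical wiring: the loader stands in for whatever prewarm does today.
let loads = 0;
const getModel = lazy(async () => {
  loads += 1;
  return "model-handle";
});

const gate = new ReadyGate();
const log: string[] = [];
gate.run(() => log.push("agent task 1"));
gate.run(() => log.push("agent task 2"));
// Nothing has run yet. A provider's READY handler would call gate.open().
```

Lazy loading moves the prewarm cost to the first model call rather than removing it, so if the load itself stays synchronous the first user message would still see a multi-second stall. The gate only reorders work: it keeps agent tasks from competing with the handshake, but does not reduce their total cost.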

Workaround

The bot self-heals after 1–3 minutes once the startup sidecars complete. Increasing systemd RestartSec (e.g., to 30s) reduces the restart cascade. There is currently no way to disable model-prewarm or defer session-locks parsing from user config.
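The RestartSec change can be applied with a systemd drop-in instead of editing the unit file directly. The fragment below is a sketch that assumes the unit is the user service named openclaw-gateway.service, as shown in the startup timeline. The drop-in file name is arbitrary.

```ini
# ~/.config/systemd/user/openclaw-gateway.service.d/override.conf
# Created with: systemctl --user edit openclaw-gateway.service
[Service]
RestartSec=30
```

After saving, `systemctl --user daemon-reload` followed by a restart of the service picks up the new value. This only spaces out the restarts: each start still runs model-prewarm and session-locks.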

Related issues

  • #79794 — Discord gateway READY never fires (multiple reporters, confirmed regression in 2026.5.x)
  • #78910 — Discord WS 1006 rapid disconnect loop
  • #81172 — memory-core blocks event loop
