openclaw - ✅(Solved) Fix [Bug]: channels.whatsapp.start-account blocks event loop ~40s, triggering reconnect storm [1 pull requests, 1 comments, 2 participants]

openclaw/openclaw#78165 · Fetched 2026-05-06 06:16:23

The channels.whatsapp.start-account startup phase blocks the Node event loop for ~30–40s on a single tick, causing an event_loop_delay liveness warning and tripping WhatsApp's keepalive timeout, which forces an immediate reconnect that re-runs the same expensive bootstrap (reconnect storm).

Root Cause

  • Related upstream patch already shipped in patches/@[email protected] addresses media-stream races but not the auth-state bootstrap cost.
  • Workaround: full gateway restart (gateway restart) or openclaw channels logout/login --channel whatsapp --account default resets the session so the next start-account is cheaper, but the underlying expensive synchronous step remains.
  • This is the same code path that has historically produced silent listener drops (see local notes: whatsapp-relogin cron auto-revive failures). Suspect the two are related — when the bootstrap blocks long enough, the watchdog cron itself can't run because the loop is stuck.

PR fix notes

PR #78178: fix: yield large WhatsApp signal key reads

Description (problem / solution / changelog)

Fixes #78165.

Summary

  • Wrap WhatsApp/Baileys signal key reads so large keys.get(type, ids) requests are split into 64-key chunks.
  • Yield to the event loop between chunks with setImmediate, preventing large multifile auth stores from monopolizing gateway startup.
  • Preserve small-read behavior and existing Baileys auth/socket contracts; no new config knobs.
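
The chunk-and-yield approach summarized above can be sketched roughly as follows. This is not the code from PR #78178: the `KeyStore` interface, `withChunkedReads`, and `yieldToEventLoop` are hypothetical names for illustration, and the real Baileys key-store types are richer. Only the 64-key chunk size and the use of `setImmediate` come from the PR summary.

```typescript
// Hypothetical minimal shape of a signal key store (assumption, not the Baileys type).
interface KeyStore<V> {
  get(type: string, ids: string[]): Promise<Record<string, V>>;
}

const CHUNK_SIZE = 64; // chunk size named in the PR summary

// Resolve via setImmediate so the event loop can service pending timers and I/O
// before the next slice is read.
const yieldToEventLoop = (): Promise<void> =>
  new Promise<void>((resolve) => setImmediate(resolve));

// Wrap a store so large reads are served in slices with a yield between slices.
function withChunkedReads<V>(store: KeyStore<V>, chunkSize = CHUNK_SIZE): KeyStore<V> {
  return {
    async get(type: string, ids: string[]): Promise<Record<string, V>> {
      // Small reads pass straight through, preserving existing behavior.
      if (ids.length <= chunkSize) return store.get(type, ids);

      const merged: Record<string, V> = {};
      for (let start = 0; start < ids.length; start += chunkSize) {
        if (start > 0) await yieldToEventLoop();
        const slice = ids.slice(start, start + chunkSize);
        Object.assign(merged, await store.get(type, slice));
      }
      return merged;
    },
  };
}
```

With a wrapper of this shape, a read of 150 ids would reach the underlying store as three calls (64, 64, and 22 ids), while a read of 10 ids would reach it as a single unchanged call. Wrapping the store rather than changing its callers is one way to keep the existing auth/socket contracts intact, as the summary requires.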

Tests

  • PATH="/tmp/openclaw-pnpm-shim:$PATH" pnpm test extensions/whatsapp/src/session.test.ts
  • PATH="/tmp/openclaw-pnpm-shim:$PATH" pnpm test extensions/whatsapp/src/connection-controller.test.ts extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts
  • PATH="/tmp/openclaw-pnpm-shim:$PATH" pnpm exec oxfmt --check extensions/whatsapp/src/session.ts extensions/whatsapp/src/session.test.ts
  • git diff --check
  • PATH="/tmp/openclaw-pnpm-shim:$PATH" node scripts/check-changed.mjs (typechecks passed; failed on unrelated existing Slack lint extensions/slack/src/monitor/provider-support.ts:334 preserve-caught-error)

Changed files

  • extensions/whatsapp/src/session.test.ts (modified, +29/-0)
  • extensions/whatsapp/src/session.ts (modified, +37/-1)

Code Example

20:38:10 [diagnostic] liveness warning: reasons=event_loop_delay interval=57s eventLoopDelayP99Ms=33.8 eventLoopDelayMaxMs=40064 eventLoopUtilization=0.744 cpuCoreRatio=0.793 active=1 waiting=0 queued=1 phase=channels.whatsapp.start-account recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:2ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:30ms,post-ready.maintenance:732ms work=[active=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s) queued=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s)]

---

// dist/server-channels-CP_Np2uZ.js  (~line 401)
const trackedPromise = Promise.resolve().then(() =>
  measureStartup(`channels.${channelId}.start-account`,
    () => startAccount({ cfg, accountId: id, account, runtime, abortSignal: abort.signal, log, ... })
  )
)

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

The channels.whatsapp.start-account startup phase blocks the Node event loop for ~30–40s on a single tick, causing an event_loop_delay liveness warning and tripping WhatsApp's keepalive timeout, which forces an immediate reconnect that re-runs the same expensive bootstrap (reconnect storm).

Steps to reproduce

  1. Run OpenClaw 2026.5.4 on Windows 11 with a long-lived WhatsApp account paired (existing Baileys session, large signal store).
  2. Trigger or wait for a WhatsApp socket reconnect (e.g. transient network blip, gateway restart, or openclaw channels login --channel whatsapp --account default).
  3. Observe the gateway diagnostic emitted during startup of the WhatsApp account.

Reproduced live at 20:38 local with the diagnostic below; the same start-account phase reappears every time the WA listener is brought back up after being idle for hours.

Expected behavior

channels.<id>.start-account should never block the event loop long enough to trip the liveness warning (eventLoopDelayMaxMs < 1000ms) or the upstream WhatsApp keepalive (~30s), so a single reconnect completes in one bootstrap cycle without cascading further reconnects. Other channels (Telegram, Signal) bootstrap in <1s on the same gateway and do not exhibit this pattern.

Actual behavior

Single tick of work inside start-account blocks the loop for ~40s. The diagnostic line:

20:38:10 [diagnostic] liveness warning: reasons=event_loop_delay interval=57s eventLoopDelayP99Ms=33.8 eventLoopDelayMaxMs=40064 eventLoopUtilization=0.744 cpuCoreRatio=0.793 active=1 waiting=0 queued=1 phase=channels.whatsapp.start-account recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:2ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:30ms,post-ready.maintenance:732ms work=[active=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s) queued=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s)]

While the loop is blocked, the WhatsApp WebSocket keepalive misses its window and the underlying provider drops the socket, which then auto-reconnects and re-enters start-account — looping until the bootstrap finally completes (typically one or two extra cycles, ~60–80s of unresponsiveness).

The relevant frame in the bundled output (dist/server-channels-CP_Np2uZ.js:401) is just a Promise.resolve().then() around plugin.gateway.startAccount(...). The long synchronous chunk lives inside the WhatsApp plugin's startAccount (Baileys auth-state load + replay), so the chunking or yielding has to happen inside that plugin, not at the call site. Options include await new Promise(setImmediate) between heavy steps, or moving the auth-state replay to a worker_thread.
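
To illustrate why the Promise wrapper at the call site cannot help, here is a small sketch under stated assumptions: `blockFor` is a hypothetical busy-wait standing in for the synchronous auth-state work, and `timerLateness` is a hypothetical helper that reports how late a 10 ms timer fires. A callback passed to `.then()` still runs to completion in one piece, so timers (and, by analogy, keepalives) wait behind it.

```typescript
// Busy-wait standing in for a long synchronous bootstrap step (hypothetical).
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin
  }
}

// Reports how many milliseconds late a 10 ms timer fires while `work` runs.
async function timerLateness(work: () => Promise<void>): Promise<number> {
  const scheduled = Date.now();
  const fired = new Promise<number>((resolve) =>
    setTimeout(() => resolve(Date.now() - scheduled - 10), 10),
  );
  await work();
  return fired;
}

// Same shape as the call site: the callback runs as a microtask, then blocks in one piece.
const wrappedOnly = (): Promise<void> => Promise.resolve().then(() => blockFor(200));

// Same total work, split into slices with a yield to the event loop between them.
const chunked = async (): Promise<void> => {
  for (let i = 0; i < 10; i++) {
    blockFor(20);
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
};
```

Calling `timerLateness(wrappedOnly)` should report a lateness close to the full 200 ms block, while `timerLateness(chunked)` should report roughly one 20 ms slice or less. Exact figures depend on the machine and on timer granularity.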

OpenClaw version

2026.5.4

Operating system

Windows 11 (10.0.26200, x64)

Install method

npm global (npm install -g openclaw, Node v24.15.0)

Model

github-copilot/claude-opus-4.7

Provider / routing chain

openclaw -> github-copilot

Additional provider/model setup details

Provider/model is unrelated to the symptom — start-account runs inside the gateway/channel-plugin host before the agent runtime is involved. Same gateway hosts Telegram + webchat with no event-loop warnings.

Logs, screenshots, and evidence

20:38:10 [diagnostic] liveness warning: reasons=event_loop_delay interval=57s eventLoopDelayP99Ms=33.8 eventLoopDelayMaxMs=40064 eventLoopUtilization=0.744 cpuCoreRatio=0.793 active=1 waiting=0 queued=1 phase=channels.whatsapp.start-account recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:2ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:30ms,post-ready.maintenance:732ms work=[active=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s) queued=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s)]

Bundled call site (for reference, not a fix location):

// dist/server-channels-CP_Np2uZ.js  (~line 401)
const trackedPromise = Promise.resolve().then(() =>
  measureStartup(`channels.${channelId}.start-account`,
    () => startAccount({ cfg, accountId: id, account, runtime, abortSignal: abort.signal, log, ... })
  )
)

The fix needs to land inside the bundled WhatsApp plugin's startAccount implementation (Baileys-side auth-state load / signal-store rehydrate).

Impact and severity

  • Affected: any user with a non-trivial WhatsApp session on 2026.5.4 — fires every time the WA listener bootstraps.
  • Severity: High. During the 40s+ event-loop stall the entire gateway is unresponsive: no message routing, no tool calls, no diagnostics. Other unrelated sessions (webchat, Telegram) freeze in lockstep.
  • Frequency: Every observed reconnect (3/3 today). Once stable the listener stays up, so users who never reconnect won't notice; users with flaky links or who restart the gateway hit it every time.
  • Consequence: Missed inbound WhatsApp messages during the stall window, cron job runs delayed (waiting=0 queued=1 in the diagnostic shows queued work), and stuck processing,q=1,age=41s work items on the affected session.

Additional information

  • Related upstream patch already shipped in patches/@[email protected] addresses media-stream races but not the auth-state bootstrap cost.
  • Workaround: full gateway restart (gateway restart) or openclaw channels logout/login --channel whatsapp --account default resets the session so the next start-account is cheaper, but the underlying expensive synchronous step remains.
  • This is the same code path that has historically produced silent listener drops (see local notes: whatsapp-relogin cron auto-revive failures). Suspect the two are related — when the bootstrap blocks long enough, the watchdog cron itself can't run because the loop is stuck.

extent analysis

TL;DR

The WhatsApp plugin's startAccount implementation needs to be modified to yield control back to the event loop to prevent blocking and subsequent reconnect storms.

Guidance

  • Identify the expensive synchronous steps within the WhatsApp plugin's startAccount implementation, specifically the Baileys auth-state load and signal-store rehydrate.
  • Introduce asynchronous processing or chunking to yield control back to the event loop, preventing the 40s+ blockage.
  • Consider using await new Promise(setImmediate) or moving the auth-state replay to a worker_thread to offload the heavy processing.
  • Verify the fix by monitoring the event loop delay and WhatsApp keepalive timeouts after implementing the changes.
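
For the verification step, one possible approach is to measure the worst event-loop delay around the bootstrap with Node's `perf_hooks.monitorEventLoopDelay`. The helper name `maxLoopDelayMs` is hypothetical, and whether the gateway's own eventLoopDelayMaxMs diagnostic is computed this way is an assumption; the sketch only shows how such a number could be obtained independently.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Runs `task` and returns the worst event-loop delay observed while it ran, in milliseconds.
async function maxLoopDelayMs(task: () => Promise<void>): Promise<number> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  try {
    await task();
    // Give the sampling timer a chance to fire so a trailing stall is recorded.
    await new Promise<void>((resolve) => setTimeout(resolve, 50));
  } finally {
    histogram.disable();
  }
  return histogram.max / 1e6; // histogram values are reported in nanoseconds
}
```

In a test, the returned value could be compared against the 1000 ms bound from the expected behavior, for example by passing a function that runs the WhatsApp account bootstrap against a large fixture auth store.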

Example

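// Example of yielding control back to the event loop.
// expensiveStep1 and expensiveStep2 are placeholders for the plugin's heavy bootstrap steps.
async function startAccount(options) {
  // Expensive step 1
  await expensiveStep1(options);
  // Yield so timers, I/O callbacks, and keepalives can run before the next step
  await new Promise(setImmediate);
  // Expensive step 2
  await expensiveStep2(options);
}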

Notes

The provided workaround of restarting the gateway or logging out and back in only temporarily resets the session, but does not address the underlying issue. The related upstream patch does not fix the auth-state bootstrap cost.

Recommendation

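Fix the WhatsApp plugin's startAccount path so that it yields control back to the event loop during the auth-state load. PR #78178 takes this approach by splitting large signal key reads into 64-key chunks with a setImmediate yield between chunks, which targets both the event-loop blockage and the resulting reconnect storm.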
