openclaw - ✅(Solved) Fix [Bug]: channels.whatsapp.start-account blocks event loop ~40s, triggering reconnect storm [1 pull requests, 1 comments, 2 participants]

Q: Expected behavior

`channels.<id>.start-account` should never block the event loop long enough to trip the liveness warning (`eventLoopDelayMaxMs` < 1000ms) or the upstream WhatsApp keepalive (~30s), so a single reconnect completes in one bootstrap cycle without cascading further reconnects. Other channels (Telegram, Signal) bootstrap in <1s on the same gateway and do not exhibit this pattern.

openclaw 2026-05-06 00:58:12

openclaw/openclaw#78165•Fetched 2026-05-06 06:16:23

Author

Carcamo-ben

Participants

Carcamo-ben

clawsweeper[bot]

Timeline (top)

commented ×1, cross-referenced ×1

The channels.whatsapp.start-account startup phase blocks the Node event loop for ~30–40s on a single tick, causing an event_loop_delay liveness warning and tripping WhatsApp's keepalive timeout, which forces an immediate reconnect that re-runs the same expensive bootstrap (reconnect storm).

Root Cause

Related upstream patch already shipped in patches/@[email protected] addresses media-stream races but not the auth-state bootstrap cost.
Workaround: full gateway restart (gateway restart) or openclaw channels logout/login --channel whatsapp --account default resets the session so the next start-account is cheaper, but the underlying expensive synchronous step remains.
This is the same code path that has historically produced silent listener drops (see local notes: whatsapp-relogin cron auto-revive failures). Suspect the two are related — when the bootstrap blocks long enough, the watchdog cron itself can't run because the loop is stuck.

PR fix notes

PR #78178: fix: yield large WhatsApp signal key reads

Repository: openclaw/openclaw
Author: bryce-d-greybeard
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/78178

Description (problem / solution / changelog)

Fixes #78165.

Summary

Wrap WhatsApp/Baileys signal key reads so large keys.get(type, ids) requests are split into 64-key chunks.
Yield to the event loop between chunks with setImmediate, preventing large multifile auth stores from monopolizing gateway startup.
Preserve small-read behavior and existing Baileys auth/socket contracts; no new config knobs.
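
The chunk-and-yield approach in the summary above can be sketched as follows. This is an illustration, not the code from PR #78178: `wrapKeyStoreWithChunkedGet`, `CHUNK_SIZE`, and `yieldToEventLoop` are hypothetical names, and the only assumption about the underlying store is that it is a plain object whose `get(type, ids)` resolves to an object keyed by id.

```javascript
// Hypothetical sketch; not the code from PR #78178.
const CHUNK_SIZE = 64; // chunk size named in the PR summary

// Resolves on a later turn of the event loop, letting timers and I/O run.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Wraps a key store so large get(type, ids) calls are served in chunks.
// Assumes `keys` is a plain object and keys.get(type, ids) resolves to an
// object keyed by id.
function wrapKeyStoreWithChunkedGet(keys, chunkSize = CHUNK_SIZE) {
  return {
    ...keys,
    async get(type, ids) {
      // Small reads keep their original single-call behavior.
      if (ids.length <= chunkSize) return keys.get(type, ids);

      const merged = {};
      for (let start = 0; start < ids.length; start += chunkSize) {
        const chunk = ids.slice(start, start + chunkSize);
        Object.assign(merged, await keys.get(type, chunk));
        // Yield between chunks, but not after the last one.
        if (start + chunkSize < ids.length) await yieldToEventLoop();
      }
      return merged;
    },
  };
}
```

With this wrapper, a request for 150 ids would reach the underlying store as three calls of 64, 64, and 22 ids, with a yield between them. Yielding only bounds the stall to the cost of one chunk; if reading or parsing a single chunk is itself slow and synchronous, the loop still blocks for that long.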

Tests

PATH="/tmp/openclaw-pnpm-shim:$PATH" pnpm test extensions/whatsapp/src/session.test.ts
PATH="/tmp/openclaw-pnpm-shim:$PATH" pnpm test extensions/whatsapp/src/connection-controller.test.ts extensions/whatsapp/src/auto-reply.web-auto-reply.connection-and-logging.e2e.test.ts
PATH="/tmp/openclaw-pnpm-shim:$PATH" pnpm exec oxfmt --check extensions/whatsapp/src/session.ts extensions/whatsapp/src/session.test.ts
git diff --check
PATH="/tmp/openclaw-pnpm-shim:$PATH" node scripts/check-changed.mjs (typechecks passed; failed on unrelated existing Slack lint extensions/slack/src/monitor/provider-support.ts:334 preserve-caught-error)

Changed files

extensions/whatsapp/src/session.test.ts (modified, +29/-0)
extensions/whatsapp/src/session.ts (modified, +37/-1)

Code Example

20:38:10 [diagnostic] liveness warning: reasons=event_loop_delay interval=57s eventLoopDelayP99Ms=33.8 eventLoopDelayMaxMs=40064 eventLoopUtilization=0.744 cpuCoreRatio=0.793 active=1 waiting=0 queued=1 phase=channels.whatsapp.start-account recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:2ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:30ms,post-ready.maintenance:732ms work=[active=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s) queued=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s)]

---

// dist/server-channels-CP_Np2uZ.js  (~line 401)
const trackedPromise = Promise.resolve().then(() =>
  measureStartup(`channels.${channelId}.start-account`,
    () => startAccount({ cfg, accountId: id, account, runtime, abortSignal: abort.signal, log, ... })
  )
)

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

Run OpenClaw 2026.5.4 on Windows 11 with a long-lived WhatsApp account paired (existing Baileys session, large signal store).
Trigger or wait for a WhatsApp socket reconnect (e.g. transient network blip, gateway restart, or openclaw channels login --channel whatsapp --account default).
Observe the gateway diagnostic emitted during startup of the WhatsApp account.

Reproduced live at 20:38 local with the diagnostic below; the same start-account phase reappears every time the WA listener is brought back up after being idle for hours.

Expected behavior

channels.<id>.start-account should never block the event loop long enough to trip the liveness warning (eventLoopDelayMaxMs < 1000ms) or the upstream WhatsApp keepalive (~30s), so a single reconnect completes in one bootstrap cycle without cascading further reconnects. Other channels (Telegram, Signal) bootstrap in <1s on the same gateway and do not exhibit this pattern.

Actual behavior

Single tick of work inside start-account blocks the loop for ~40s. The diagnostic line:

20:38:10 [diagnostic] liveness warning: reasons=event_loop_delay interval=57s eventLoopDelayP99Ms=33.8 eventLoopDelayMaxMs=40064 eventLoopUtilization=0.744 cpuCoreRatio=0.793 active=1 waiting=0 queued=1 phase=channels.whatsapp.start-account recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:2ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:30ms,post-ready.maintenance:732ms work=[active=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s) queued=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s)]

While the loop is blocked, the WhatsApp WebSocket keepalive misses its window and the underlying provider drops the socket, which then auto-reconnects and re-enters start-account — looping until the bootstrap finally completes (typically one or two extra cycles, ~60–80s of unresponsiveness).

The relevant frame in the bundled output (dist/server-channels-CP_Np2uZ.js:401) is just a Promise.resolve().then() around plugin.gateway.startAccount(...). The long synchronous chunk lives inside the WhatsApp plugin's startAccount (Baileys auth-state load + replay), so chunking or yielding needs to happen inside that plugin, not at the call site. Two options: await new Promise(setImmediate) between heavy steps, or move the auth-state replay to a worker_thread.
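
For the worker-thread option, a minimal sketch using Node's `node:worker_threads` module might look like the following. The function name `runHeavyStepInWorker` is hypothetical, and the heavy step is a stand-in arithmetic loop, not the real Baileys auth-state replay.

```javascript
// Hypothetical sketch: offload a CPU-heavy step to a worker thread so the
// main event loop stays free. The loop below is a stand-in for real work.
const { Worker } = require("node:worker_threads");

// Worker source, inlined so the example is self-contained.
const workerSource = `
  const { parentPort, workerData } = require("node:worker_threads");
  let total = 0;
  for (let i = 0; i < workerData.iterations; i++) total += i % 7;
  parentPort.postMessage(total);
`;

// Runs the stand-in heavy step off the main thread and resolves with its result.
function runHeavyStepInWorker(iterations) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { iterations },
    });
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`worker exited with code ${code}`));
    });
  });
}

runHeavyStepInWorker(5_000_000).then((total) => {
  console.log("worker result:", total);
});
```

While the worker runs, timers and socket callbacks on the main thread keep firing. The trade-off is that inputs and results cross the thread boundary by structured clone, so a very large auth state would add copying cost of its own.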

OpenClaw version

2026.5.4

Operating system

Windows 11 (10.0.26200, x64)

Install method

npm global (npm install -g openclaw, Node v24.15.0)

Model

github-copilot/claude-opus-4.7

Provider / routing chain

openclaw -> github-copilot

Additional provider/model setup details

Provider/model is unrelated to the symptom — start-account runs inside the gateway/channel-plugin host before the agent runtime is involved. Same gateway hosts Telegram + webchat with no event-loop warnings.

Logs, screenshots, and evidence

20:38:10 [diagnostic] liveness warning: reasons=event_loop_delay interval=57s eventLoopDelayP99Ms=33.8 eventLoopDelayMaxMs=40064 eventLoopUtilization=0.744 cpuCoreRatio=0.793 active=1 waiting=0 queued=1 phase=channels.whatsapp.start-account recentPhases=sidecars.restart-sentinel:0ms,sidecars.subagent-recovery:2ms,sidecars.main-session-recovery:5ms,post-attach.update-sentinel:0ms,sidecars.session-locks:30ms,post-ready.maintenance:732ms work=[active=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s) queued=agent:main:whatsapp:direct:+14374222496(processing,q=1,age=41s)]
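
To pull the key numbers out of a line like the one above, a small parser can help. This is a hypothetical helper: the field names come from the log line itself, but the parsing rules (whitespace-separated `key=value` tokens, with the trailing `work=[...]` section split off first) are an assumption about the format. The sample string is a shortened copy of the line above.

```javascript
// Hypothetical helper for reading the liveness warning line.
function parseLivenessWarning(line) {
  const marker = "liveness warning: ";
  const start = line.indexOf(marker);
  if (start === -1) return null;

  // The trailing work=[...] section contains spaces, so split it off first.
  let body = line.slice(start + marker.length);
  const workIndex = body.indexOf(" work=[");
  if (workIndex !== -1) body = body.slice(0, workIndex);

  // Each remaining token is key=value; split on the first "=" only.
  const fields = {};
  for (const token of body.split(/\s+/)) {
    const eq = token.indexOf("=");
    if (eq > 0) fields[token.slice(0, eq)] = token.slice(eq + 1);
  }
  return fields;
}

// Shortened copy of the diagnostic line from the report.
const sample =
  "20:38:10 [diagnostic] liveness warning: reasons=event_loop_delay interval=57s " +
  "eventLoopDelayP99Ms=33.8 eventLoopDelayMaxMs=40064 eventLoopUtilization=0.744 " +
  "phase=channels.whatsapp.start-account";

const fields = parseLivenessWarning(sample);
const maxDelayMs = Number(fields.eventLoopDelayMaxMs);
console.log(fields.phase, maxDelayMs, maxDelayMs > 1000 ? "stall" : "ok");
// prints: channels.whatsapp.start-account 40064 stall
```

A P99 of 33.8 ms next to a maximum of 40064 ms is consistent with one long stall in an otherwise healthy interval, rather than sustained slowness.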

Bundled call site (for reference, not a fix location):

// dist/server-channels-CP_Np2uZ.js  (~line 401)
const trackedPromise = Promise.resolve().then(() =>
  measureStartup(`channels.${channelId}.start-account`,
    () => startAccount({ cfg, accountId: id, account, runtime, abortSignal: abort.signal, log, ... })
  )
)

The fix needs to land inside the bundled WhatsApp plugin's startAccount implementation (Baileys-side auth-state load / signal-store rehydrate).
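
The reason the call site is not a fix location can be shown with a small standalone demo. Wrapping work in Promise.resolve().then() only defers it to a microtask; any synchronous work inside still runs to completion before a timer can fire. The names `busyWait` and `events` are illustrative, and the 50 ms spin stands in for the much longer real stall.

```javascript
// Demonstrates that Promise.resolve().then(fn) only defers fn to a microtask.
// Synchronous work inside fn still finishes before any timer callback runs.
const events = [];

function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // spin to simulate synchronous CPU work
  }
}

// Stand-in for a keepalive or watchdog timer.
setTimeout(() => events.push("timer fired"), 0);

// Same wrapping shape as the call site; the callback body is synchronous.
const tracked = Promise.resolve().then(() => {
  busyWait(50);
  events.push("sync work done");
});

// Wait a little past the blocked period, then report the observed order.
const finished = tracked.then(
  () => new Promise((resolve) => setTimeout(() => resolve(events), 10))
);
finished.then((order) => console.log(order));
// prints: [ 'sync work done', 'timer fired' ]
```

The timer was scheduled first and was due immediately, yet it fires only after the synchronous work ends. Scaled up to a ~40s stall, the same ordering would hold back a keepalive or watchdog callback.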

Impact and severity

Affected: any user with a non-trivial WhatsApp session on 2026.5.4 — fires every time the WA listener bootstraps.
Severity: High. During the 40s+ event-loop stall the entire gateway is unresponsive: no message routing, no tool calls, no diagnostics. Other unrelated sessions (webchat, Telegram) freeze in lockstep.
Frequency: Every observed reconnect (3/3 today). Once stable the listener stays up, so users who never reconnect won't notice; users with flaky links or who restart the gateway hit it every time.
Consequence: Missed inbound WhatsApp messages during the stall window, cron job runs delayed (waiting=0 queued=1 in the diagnostic shows queued work), and stuck processing,q=1,age=41s work items on the affected session.

Additional information

Related upstream patch already shipped in patches/@[email protected] addresses media-stream races but not the auth-state bootstrap cost.
Workaround: full gateway restart (gateway restart) or openclaw channels logout/login --channel whatsapp --account default resets the session so the next start-account is cheaper, but the underlying expensive synchronous step remains.
This is the same code path that has historically produced silent listener drops (see local notes: whatsapp-relogin cron auto-revive failures). Suspect the two are related — when the bootstrap blocks long enough, the watchdog cron itself can't run because the loop is stuck.

extent analysis

TL;DR

The WhatsApp plugin's startAccount implementation needs to be modified to yield control back to the event loop to prevent blocking and subsequent reconnect storms.

Guidance

Identify the expensive synchronous steps within the WhatsApp plugin's startAccount implementation, specifically the Baileys auth-state load and signal-store rehydrate.
Introduce asynchronous processing or chunking to yield control back to the event loop, preventing the 40s+ blockage.
Consider using await new Promise(setImmediate) or moving the auth-state replay to a worker_thread to offload the heavy processing.
Verify the fix by monitoring the event loop delay and WhatsApp keepalive timeouts after implementing the changes.

Example

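// Example of yielding control back to the event loop.
// expensiveStep1 and expensiveStep2 are placeholders for the plugin's heavy bootstrap steps.
async function startAccount(options) {
  // Expensive step 1
  await expensiveStep1(options);
  // Yield so timers, I/O, and keepalive callbacks can run before the next step.
  await new Promise((resolve) => setImmediate(resolve));
  // Expensive step 2
  await expensiveStep2(options);
  // Note: this only helps between steps. A step that is itself one long
  // synchronous run still blocks the loop for its full duration.
}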

Notes

The provided workaround of restarting the gateway or logging out and back in only temporarily resets the session, but does not address the underlying issue. The related upstream patch does not fix the auth-state bootstrap cost.

Recommendation

Modify the WhatsApp plugin's startAccount implementation to yield control back to the event loop between heavy steps. This targets the event-loop stall itself, which should in turn stop the keepalive timeouts that drive the reconnect storm.
