openclaw - 💡(How to fix) Fix WhatsApp 408 disconnects in 2026.4.27 are caused by event-loop blocking up to 100s, not Baileys [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#75122Fetched 2026-05-01 05:37:57
View on GitHub
Comments
1
Participants
2
Timeline
1
Reactions
2
Author
Timeline (top)
commented ×1

The chronic WhatsApp status 408 — Connection Terminated disconnects we see in 2026.4.27 (also 2026.4.24) are not a Baileys bug — the new liveness diagnostic subsystem in 2026.4.27 shows the OpenClaw gateway's Node event loop is being blocked for tens of seconds at a time. While the loop is blocked, Baileys' keepalive ping cannot fire, so WhatsApp Web closes the socket at ~60s with HTTP 408. Any in-flight outbound reply is dropped (no replay).

Root Cause

The chronic WhatsApp status 408 — Connection Terminated disconnects we see in 2026.4.27 (also 2026.4.24) are not a Baileys bug — the new liveness diagnostic subsystem in 2026.4.27 shows the OpenClaw gateway's Node event loop is being blocked for tens of seconds at a time. While the loop is blocked, Baileys' keepalive ping cannot fire, so WhatsApp Web closes the socket at ~60s with HTTP 408. Any in-flight outbound reply is dropped (no replay).

Fix Action

Fix / Workaround

Workaround in use

Hourly cron resets the direct-DM session transcript when it crosses size or token-pct thresholds, keeping LLM call latency short enough that replies sometimes land before the next 408. This is operationally noisy and shouldn't be needed.

Code Example

16:38:22  eventLoopDelayMaxMs=5226     util=0.188  active=0 queued=0  (startup)
16:40:26  eventLoopDelayMaxMs=100059   util=1.000  active=0 queued=0  ★ 100s block
16:43:31  eventLoopDelayMaxMs=7222     util=0.306  active=1 queued=1
16:48:41  eventLoopDelayMaxMs=97039    util=0.992  active=0 queued=0  ★ 97s block
16:53:00  eventLoopDelayMaxMs=32799    util=0.905  active=1 queued=1  ★ 33s block
RAW_BUFFERClick to expand / collapse

Summary

The chronic WhatsApp status 408 — Connection Terminated disconnects we see in 2026.4.27 (also 2026.4.24) are not a Baileys bug — the new liveness diagnostic subsystem in 2026.4.27 shows the OpenClaw gateway's Node event loop is being blocked for tens of seconds at a time. While the loop is blocked, Baileys' keepalive ping cannot fire, so WhatsApp Web closes the socket at ~60s with HTTP 408. Any in-flight outbound reply is dropped (no replay).

Environment

  • OpenClaw 2026.4.27 (cbc2ba0) (also reproduced on 2026.4.24)
  • Bundled @whiskeysockets/[email protected] (this is the npm latest)
  • Node 22.22.2, Linux 6.8.0-106-generic, Contabo VPS — 4 cores / 7.8 GB / load 1.09
  • Single WhatsApp account, gateway in local mode, loopback bind
  • Model: openai-codex/gpt-5.5

Capacity is not the issue. The event loop is being blocked synchronously.

Evidence — liveness warnings from a single 15-min window today

16:38:22  eventLoopDelayMaxMs=5226     util=0.188  active=0 queued=0  (startup)
16:40:26  eventLoopDelayMaxMs=100059   util=1.000  active=0 queued=0  ★ 100s block
16:43:31  eventLoopDelayMaxMs=7222     util=0.306  active=1 queued=1
16:48:41  eventLoopDelayMaxMs=97039    util=0.992  active=0 queued=0  ★ 97s block
16:53:00  eventLoopDelayMaxMs=32799    util=0.905  active=1 queued=1  ★ 33s block

Notice the active=0 queued=0 rows: the loop is blocked even when the agent runtime says nothing is in flight, so the cause isn't simply LLM latency.

Symptom timing (deterministic)

inbound408 closeΔ
16:43:2116:44:2463s
16:52:1616:53:0044s
10:08:5410:10:371m43s
10:14:0310:15:461m43s
10:39:3910:40:3152s

Bimodal at ~52-63s and ~103s — consistent with Baileys' default keepAliveIntervalMs (30s) + defaultQueryTimeoutMs (60s) firing during event-loop blocks.

Hypotheses to investigate (the diagnostic subsystem already collects the

data — please surface its async-resource trace alongside the warning)

  1. Memory subsystem fallback: chunks_vec not updated — sqlite-vec unavailable. Vector recall degraded. — does the FTS fallback do a synchronous scan on each agent turn?
  2. Session transcript loading: a 6.6 MB / 386-line .jsonl was correlating with the worst blocks. Is loadSessionTranscript synchronous JSON parsing on the main thread?
  3. Plugin runtime-deps re-extraction at startup (we hit the 100s block at 16:40, two minutes after process start) — extracting tarballs sync?
  4. There is also a stale-tree GC bug: plugin-runtime-deps/ accumulates a directory per OpenClaw version forever. After two upgrades we had 3.5 GB of stale trees alongside the active one. Manual rm -rf was needed.

Asks

  1. Identify and fix what blocks the event loop. The diagnostic subsystem already detects it — please log the async-resource at the time of the block.
  2. On WhatsApp socket close after a queued outbound, replay the reply on reconnect rather than dropping it.
  3. Expose Baileys timeout/keepalive options in channels.whatsapp so operators can mitigate without waiting on a fix.
  4. Garbage-collect stale plugin-runtime-deps/openclaw-<version>-* trees on upgrade.

Workaround in use

Hourly cron resets the direct-DM session transcript when it crosses size or token-pct thresholds, keeping LLM call latency short enough that replies sometimes land before the next 408. This is operationally noisy and shouldn't be needed.

extent analysis

TL;DR

The event loop blockage in OpenClaw's Node.js application is likely caused by synchronous operations, such as JSON parsing or database queries, which need to be identified and fixed to prevent WhatsApp connection terminations.

Guidance

  • Investigate the hypotheses provided, such as memory subsystem fallback, session transcript loading, and plugin runtime-deps re-extraction, to identify the synchronous operation causing the event loop blockage.
  • Use the diagnostic subsystem to log the async-resource trace alongside the warning to gain more insights into the issue.
  • Consider exposing Baileys timeout/keepalive options in channels.whatsapp to allow operators to mitigate the issue temporarily.
  • Implement garbage collection for stale plugin-runtime-deps/openclaw-<version>-* trees on upgrade to prevent accumulation of unnecessary data.

Example

No code snippet is provided as the issue requires identification of the specific synchronous operation causing the event loop blockage.

Notes

The provided data suggests that the issue is not related to capacity, but rather to synchronous operations blocking the event loop. The workaround in use, which involves hourly cron resets of the direct-DM session transcript, is operationally noisy and should not be necessary once the root cause is fixed.

Recommendation

Apply a workaround by exposing Baileys timeout/keepalive options in channels.whatsapp to allow operators to mitigate the issue temporarily, while investigating and fixing the root cause of the event loop blockage.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix WhatsApp 408 disconnects in 2026.4.27 are caused by event-loop blocking up to 100s, not Baileys [1 comments, 2 participants]