openclaw - ✅(Solved) Fix [Bug]: WhatsApp: outbound messages silently lost during WebSocket reconnect — no retry after recovery [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#59275 · Fetched 2026-04-08 02:26:34
Comments: 1 · Participants: 2 · Timeline: 4 · Reactions: 0
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1

When the WhatsApp WebSocket connection drops (status 408 timeout), outbound agent replies that are in-flight are permanently lost. The gateway reconnects successfully within 2–4 seconds, but failed messages are never re-queued or retried. The user receives no response, and the agent session has no indication delivery failed.

Root Cause

The WhatsApp outbound send path had no retry. A reply sent while the WebSocket connection was down (status 408 timeout) failed with "Connection Closed", was logged, and was then dropped. Nothing re-queued it after the gateway reconnected 2–4 seconds later, and the agent session was never told that delivery failed.

Fix Action

Fixed

PR fix notes

PR #54183: WhatsApp: add configurable send retry for transient network errors

Description (problem / solution / changelog)

Closes #54103

Summary

Adds configurable outbound send retry with exponential backoff for the WhatsApp channel, applied to all three send paths (text, media, poll). The retry is active by default with no config change needed.

What changed:

  • src/config/types.whatsapp.ts — added retry?: OutboundRetryConfig to WhatsAppSharedConfig, inherited by both WhatsAppConfig and WhatsAppAccountConfig
  • src/config/zod-schema.providers-whatsapp.ts — added retry: RetryConfigSchema to WhatsAppSharedSchema
  • extensions/whatsapp/src/outbound-retry.ts (new) — shared retry helpers: withWhatsAppSendRetry, resolveWhatsAppRetryConfig, and defaults; isolated here so both the production plugin and the test adapter import from one place
  • extensions/whatsapp/src/channel.ts — wraps the injected sendMessageWhatsApp and sendPollWhatsApp calls with retry before passing them to createWhatsAppOutboundBase; this is the actual production outbound path used by the plugin
  • extensions/whatsapp/src/outbound-adapter.ts — also wraps its three send sites with retry (used in tests); imports from outbound-retry.ts instead of duplicating helpers
  • src/config/bundled-channel-config-metadata.generated.ts — regenerated to include the new retry property in the WhatsApp channel schema
  • docs/.generated/config-baseline.json / .jsonl — regenerated to document the 10 new channels.whatsapp[.accounts.*].retry.* config paths

Key design decisions:

  • Only clearly transient errors are retried (/timeout|connect|reset|closed|unavailable|temporarily/i) to avoid duplicate message delivery on non-idempotent sends
  • Account-level config takes precedence over channel-level; default account ID is resolved via resolveDefaultWhatsAppAccountId so channels.whatsapp.accounts.<default-id>.retry is honored even without an explicit accountId
  • Account key lookup uses resolveAccountEntry (case-insensitive) to match the pattern used throughout WhatsApp account resolution
  • Reuses existing RetryConfig / retryAsync / resolveRetryConfig infrastructure already used by Telegram
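The design decisions above can be illustrated with a minimal sketch. It is not the PR's code: `RetryOptions`, `isTransientError`, `backoffDelayMs`, and `sendWithRetry` are hypothetical names, and the real helpers (`withWhatsAppSendRetry`, `retryAsync`) may differ in signature and behavior. Only the regular expression is taken from the PR description.

```typescript
// Hypothetical sketch; names are illustrative, not the PR's actual API.
interface RetryOptions {
  attempts: number;   // total tries, including the first one
  minDelayMs: number; // delay before the first retry
  maxDelayMs: number; // upper bound for any single delay
  jitter: number;     // fraction of the delay to randomize, e.g. 0.1
}

// Pattern quoted in the PR description for "clearly transient" errors.
const TRANSIENT_RE = /timeout|connect|reset|closed|unavailable|temporarily/i;

function isTransientError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return TRANSIENT_RE.test(message);
}

// Delay before retry number `retryIndex` (0-based): doubles each time, capped.
function backoffDelayMs(
  retryIndex: number,
  opts: RetryOptions,
  rand: () => number = Math.random,
): number {
  const base = Math.min(opts.maxDelayMs, opts.minDelayMs * 2 ** retryIndex);
  const spread = base * opts.jitter * (rand() * 2 - 1); // plus or minus the jitter fraction
  return Math.min(opts.maxDelayMs, Math.max(0, Math.round(base + spread)));
}

async function sendWithRetry<T>(
  send: () => Promise<T>,
  opts: RetryOptions,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.attempts; attempt++) {
    try {
      return await send();
    } catch (err) {
      lastError = err;
      const isLast = attempt === opts.attempts - 1;
      // Non-transient errors are rethrown immediately so a send that may
      // already have been delivered is not repeated.
      if (isLast || !isTransientError(err)) throw err;
      await sleep(backoffDelayMs(attempt, opts));
    }
  }
  throw lastError; // only reached when opts.attempts < 1
}
```

Under this sketch the "Connection Closed" error seen in the issue logs counts as transient, because the pattern matches both "connect" and "closed" case-insensitively.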

Config example

Channel-level (applies to all accounts):

{
  "channels": {
    "whatsapp": {
      "retry": {
        "attempts": 5,
        "minDelayMs": 3000,
        "maxDelayMs": 60000,
        "jitter": 0.1
      }
    }
  }
}

Per-account override (account-level takes precedence):

{
  "channels": {
    "whatsapp": {
      "retry": {
        "attempts": 3,
        "minDelayMs": 1000,
        "maxDelayMs": 30000
      },
      "accounts": {
        "my-account": {
          "retry": {
            "attempts": 2,
            "minDelayMs": 500,
            "maxDelayMs": 10000
          }
        }
      }
    }
  }
}

To disable retries for a specific account:

{
  "channels": {
    "whatsapp": {
      "accounts": {
        "my-account": {
          "retry": { "attempts": 1 }
        }
      }
    }
  }
}

Defaults (no config needed): 3 attempts, 1s–30s exponential backoff, 10% jitter.
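The precedence rules and defaults can be sketched as follows. This is an illustration under assumptions: `resolveRetry` and `findAccount` are hypothetical stand-ins for `resolveWhatsAppRetryConfig` and `resolveAccountEntry`, and the field-by-field merge is an assumption, since the PR description only says that account-level config takes precedence. Resolution of the default account ID is left out.

```typescript
// Hypothetical sketch; the merge strategy is an assumption, not confirmed by the PR.
interface RetryConfig {
  attempts?: number;
  minDelayMs?: number;
  maxDelayMs?: number;
  jitter?: number;
}

interface WhatsAppChannelConfig {
  retry?: RetryConfig;
  accounts?: Record<string, { retry?: RetryConfig }>;
}

// Defaults as stated in the PR description: 3 attempts, 1s-30s, 10% jitter.
const DEFAULT_RETRY: Required<RetryConfig> = {
  attempts: 3,
  minDelayMs: 1000,
  maxDelayMs: 30000,
  jitter: 0.1,
};

// Case-insensitive account key lookup, as the PR describes for resolveAccountEntry.
function findAccount(cfg: WhatsAppChannelConfig, accountId: string) {
  const wanted = accountId.toLowerCase();
  const accounts = cfg.accounts ?? {};
  const key = Object.keys(accounts).find((k) => k.toLowerCase() === wanted);
  return key === undefined ? undefined : accounts[key];
}

// Later spreads win: defaults, then channel-level, then account-level.
function resolveRetry(
  cfg: WhatsAppChannelConfig,
  accountId?: string,
): Required<RetryConfig> {
  const account = accountId ? findAccount(cfg, accountId) : undefined;
  return { ...DEFAULT_RETRY, ...cfg.retry, ...account?.retry };
}
```

With the per-account example above, this sketch would resolve `my-account` to 2 attempts with a 500 ms to 10 s delay range, while any other account would get the channel-level 3 attempts with 1 s to 30 s.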

Test plan

  • pnpm test:extension whatsapp — 85 tests, all pass
  • pnpm test:contracts:channels — all pass
  • pnpm build — passes, no [INEFFECTIVE_DYNAMIC_IMPORT] warnings
  • pnpm check — 0 warnings, 0 errors
  • pnpm config:channels:check + pnpm config:docs:check — both pass
  • New tests: retry on transient error succeeds on second attempt; non-transient errors not retried; attempts config limit respected
  • Fully tested locally (see test plan above)
  • Follows existing Telegram retry pattern (OutboundRetryConfig, retryAsync, resolveRetryConfig)
  • Understands what the code does

Changed files

  • docs/.generated/config-baseline.json (modified, +160/-74)
  • docs/.generated/config-baseline.jsonl (modified, +31/-18)
  • docs/plugins/architecture.md (modified, +3/-3)
  • extensions/whatsapp/src/channel.test.ts (modified, +35/-0)
  • extensions/whatsapp/src/channel.ts (modified, +13/-0)
  • extensions/whatsapp/src/outbound-adapter.poll.test.ts (modified, +25/-0)
  • extensions/whatsapp/src/outbound-adapter.sendpayload.test.ts (modified, +149/-0)
  • extensions/whatsapp/src/outbound-adapter.ts (modified, +36/-20)
  • extensions/whatsapp/src/outbound-retry.ts (added, +92/-0)
  • extensions/whatsapp/src/send.test.ts (modified, +114/-0)
  • extensions/whatsapp/src/send.ts (modified, +45/-7)
  • src/channels/plugins/whatsapp-shared.ts (modified, +66/-22)
  • src/config/bundled-channel-config-metadata.generated.ts (modified, +77/-128)
  • src/config/types.whatsapp.ts (modified, +3/-0)
  • src/config/zod-schema.providers-whatsapp.ts (modified, +2/-0)
  • src/infra/retry-policy.test.ts (modified, +117/-1)
  • src/infra/retry-policy.ts (modified, +64/-40)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Steps to reproduce

  1. Agent generates a reply to a WhatsApp user message
  2. WebSocket drops (408 Request Time-out) during the send window
  3. Gateway logs "Failed sending web auto-reply" 1–3 times against the closed connection
  4. Connection recovers ("Listening for personal WhatsApp inbound messages")
  5. Failed message is never retried — silently dropped

Expected behavior

Failed outbound messages should be re-queued and retried (2–3 attempts with backoff) once the WebSocket connection is re-established. If all retries fail, the agent session should receive a delivery failure notification so it can inform the user. This would be especially helpful for those of us who run on Starlink and sometimes get short connection drops.
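One way to picture the requested behavior is an outbox that holds failed sends until the connection comes back and reports a delivery failure once the attempts run out. This is a hypothetical sketch of the request, not how OpenClaw or PR #54183 implements it: `ReconnectOutbox`, `SendFn`, and `FailureFn` are invented names, and backoff between attempts is omitted for brevity.

```typescript
// Hypothetical sketch of "re-queue on failure, retry after reconnect, notify on give-up".
interface PendingMessage {
  to: string;
  body: string;
  attemptsLeft: number;
}

type SendFn = (to: string, body: string) => Promise<void>;
type FailureFn = (msg: PendingMessage, error: unknown) => void;

class ReconnectOutbox {
  private pending: PendingMessage[] = [];
  private readonly send: SendFn;
  private readonly onDeliveryFailed: FailureFn;
  private readonly maxAttempts: number;

  constructor(send: SendFn, onDeliveryFailed: FailureFn, maxAttempts = 3) {
    this.send = send;
    this.onDeliveryFailed = onDeliveryFailed;
    this.maxAttempts = maxAttempts;
  }

  // Number of messages waiting for the next reconnect.
  get size(): number {
    return this.pending.length;
  }

  // First delivery attempt; a failure parks the message instead of dropping it.
  async submit(to: string, body: string): Promise<void> {
    await this.trySend({ to, body, attemptsLeft: this.maxAttempts });
  }

  // Call this when the connection has been re-established.
  async flush(): Promise<void> {
    const batch = this.pending;
    this.pending = [];
    for (const msg of batch) {
      await this.trySend(msg);
    }
  }

  private async trySend(msg: PendingMessage): Promise<void> {
    try {
      await this.send(msg.to, msg.body);
    } catch (err) {
      msg.attemptsLeft -= 1;
      if (msg.attemptsLeft > 0) {
        this.pending.push(msg); // keep for the next flush
      } else {
        this.onDeliveryFailed(msg, err); // lets the agent session inform the user
      }
    }
  }
}
```

In a gateway, `flush()` would be wired to whatever event marks a successful reconnect, and `onDeliveryFailed` would forward the failure to the agent session.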

Actual behavior

Incident 1 — 2026-04-01 14:59:50 PDT (OpenClaw 2026.4.1)

[whatsapp] Web connection closed (status 408). Retry 1/12 in 2.4s…
[whatsapp] Failed sending web auto-reply to +1503XXXXXXX: Connection Closed   (+2s)
[whatsapp] Failed sending web auto-reply to +1503XXXXXXX: Connection Closed   (+3s)
[whatsapp] Listening for personal WhatsApp inbound messages.                  (+4s)

Connection recovered at 14:59:54. Message never retried. User had to ask "did you respond?"

Incident 2 — 2026-04-01 15:02:19 PDT

[whatsapp] Web connection closed (status 408). Retry 1/12 in 2.13s…
[whatsapp] Listening for personal WhatsApp inbound messages.                  (+4s)
[whatsapp] Failed sending web auto-reply to +1503XXXXXXX: Connection Closed   (+7s)

Note: send attempted after reconnect succeeded — suggests a race condition where the send uses a stale connection reference.
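The stale-reference hypothesis in the note can be illustrated with a small model. The issue only suggests this cause, so the sketch shows the general pattern rather than OpenClaw's actual code; `Socket`, `ConnectionHolder`, `makeStaleSender`, and `makeFreshSender` are hypothetical names.

```typescript
// Hypothetical model of a send that captures a socket versus one that looks it up per call.
interface Socket {
  id: number;
  open: boolean;
  send(text: string): void;
}

function makeSocket(id: number, sent: string[]): Socket {
  const sock: Socket = {
    id,
    open: true,
    send(text: string): void {
      if (!sock.open) throw new Error("Connection Closed");
      sent.push(`${id}:${text}`);
    },
  };
  return sock;
}

class ConnectionHolder {
  private current: Socket;

  constructor(initial: Socket) {
    this.current = initial;
  }

  // Called on reconnect: the old socket is closed and replaced.
  replace(next: Socket): void {
    this.current.open = false;
    this.current = next;
  }

  get(): Socket {
    return this.current;
  }
}

// Stale pattern: the socket is captured once, when the sender is created.
function makeStaleSender(holder: ConnectionHolder): (text: string) => void {
  const sock = holder.get();
  return (text) => sock.send(text);
}

// Fresh pattern: the socket is resolved at send time, so a reconnect is picked up.
function makeFreshSender(holder: ConnectionHolder): (text: string) => void {
  return (text) => holder.get().send(text);
}
```

If a sender built like `makeStaleSender` is created before the drop and used after the reconnect, it fails with "Connection Closed" even though a healthy socket exists, which would be consistent with the ordering seen in Incident 2.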

OpenClaw version

2026.4.1

Operating system

Ubuntu 24.04

Install method

npm global

Model

anthropic/opus-4-6

Provider / routing chain

openclaw -> whatsapp

Additional provider/model setup details

Environment

  • OpenClaw: 2026.4.1 (da64a97)
  • OS: Ubuntu, Linux 6.8.0-106-generic (x64)
  • Node: v25.8.2
  • Connection: Starlink (residential satellite — inherently variable latency)
  • Frequency: 9 failed sends / 3 disconnects in the past 7 days

Related Issues

  • #38058 — Same problem, fewer details
  • #54103 — Feature request for configurable retry
  • #36659 — Similar retry feature request
  • PR #54183 — Open PR with a proposed fix (not yet merged)

Impact and severity

Affected: WhatsApp
Impact: Moderate
Consequence: I never see the reply from my claw and have to request the information a second or third time. It causes confusion with my claw, and I sometimes think the gateway crashed and waste time checking it, when in reality the gateway was fine and the reply simply never reached WhatsApp on my phone.

Additional information

No response

Extent analysis

TL;DR

Implement a retry mechanism for outbound messages that failed due to a dropped WebSocket connection, ensuring messages are re-queued and retried upon reconnection.

Guidance

  • Review the proposed fix in PR #54183, which may address the issue by implementing a configurable retry feature for failed messages.
  • Investigate the race condition suggested by Incident 2, where a send attempt uses a stale connection reference after reconnection, and consider implementing a mechanism to ensure that only the latest connection reference is used for sends.
  • Consider adding a delivery failure notification to the agent session if all retries fail, to inform the user of the issue.
  • Evaluate the current retry logic for WebSocket connections, ensuring it is sufficient for handling temporary disconnections like those experienced with Starlink satellite internet.

Example

The issue itself contains no implementation details. PR #54183 is the concrete starting point for review.

Notes

The issue seems to be related to the handling of temporary WebSocket disconnections and the lack of a robust retry mechanism for failed outbound messages. The proposed fix in PR #54183 and addressing the potential race condition could mitigate the problem.

Recommendation

Add retry for failed outbound sends, for example by merging PR #54183, which directly addresses messages lost to connection drops.
