openclaw - ✅(Solved) Fix BlueBubbles: replay missed webhook messages after gateway restart (cursor + fetchBlueBubblesHistory + processMessage) [3 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#66721 (fetched 2026-04-15 06:24:42)
Comments: 0 · Participants: 1 · Timeline events: 18 · Reactions: 0
Timeline (top): referenced ×10, cross-referenced ×4, assigned ×1, closed ×1


PR fix notes

PR #66760: feat(bluebubbles): replay missed webhook messages after gateway restart (#66721)

Description (problem / solution / changelog)

Summary

Fixes #66721. Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages delivered while the gateway was unreachable and replays them through the existing processMessage pipeline.

The hole this closes: BB Server's WebhookService is fire-and-forget on POST failure (no retries) and BB's MessagePoller only re-fires webhooks on BB-side reconnection events (Messages.app / APNs), not on webhook-receiver recovery. So inbound messages delivered while the OpenClaw gateway was down, restarting, or wedged were permanently lost.

This PR is stacked on top of #66810 (persistent inbound dedupe). The dedupe makes catchup safe to be aggressive: if a webhook delivery already succeeded for a GUID, the catchup replay of that same GUID is dropped at the dedupe boundary. Recommend landing #66810 first; this PR will rebase cleanly onto main once it does.

Design

  • extensions/bluebubbles/src/catchup.ts (new, ~290 LoC):
    • fetchBlueBubblesMessagesSince(sinceMs, limit, opts) calls /api/v1/message/query with {after, sort:"ASC", with:["chat","chat.participants","attachment"]} so replays carry the same shape normalizeWebhookMessage already handles on live dispatch.
    • loadBlueBubblesCatchupCursor / saveBlueBubblesCatchupCursor persist a single {lastSeenMs, updatedAt} per account at <stateDir>/bluebubbles/catchup/<accountId>__<hash>.json, using the plugin-sdk's atomic JSON helpers. File layout mirrors the inbound-dedupe store from #66810.
    • runBlueBubblesCatchup(target) orchestrates: clamp config, skip recent runs (<30s), clamp window to maxAgeMinutes, fetch, filter isFromMe and pre-cursor records, dispatch to processMessage, advance cursor to nowMs on success.
  • monitor.ts: after the webhook target registers, fire catchup as a background task; errors are logged but never block the channel-ready signal.
  • config-schema.ts: new optional catchup block (enabled, maxAgeMinutes, perRunLimit, firstRunLookbackMinutes); defaults are on with 2h lookback / 50 msg cap / 30-min first-run lookback.
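The catchup block's defaults, together with the hard ceilings listed under Safety below (12 h lookback, 500 messages per run), imply a small normalization step before each run. The sketch below shows one way to do it. `resolveCatchupConfig`, `CatchupConfig` and `clampOr` are hypothetical names, not the PR's identifiers. The lower bound of 1 for the numeric limits is also an assumption of this sketch.

```typescript
// Hypothetical shape of the optional `catchup` block (every field optional).
interface CatchupConfig {
  enabled?: boolean;
  maxAgeMinutes?: number;
  perRunLimit?: number;
  firstRunLookbackMinutes?: number;
}

// Settings after defaults and ceilings have been applied.
interface ResolvedCatchupConfig {
  enabled: boolean;
  maxAgeMinutes: number;
  perRunLimit: number;
  firstRunLookbackMinutes: number;
}

// Defaults and ceilings as stated in the PR description.
const DEFAULTS = { maxAgeMinutes: 120, perRunLimit: 50, firstRunLookbackMinutes: 30 };
const MAX_AGE_CAP_MINUTES = 720; // 12 h hard ceiling
const PER_RUN_CAP = 500; // hard ceiling on messages per run

// Use `value` if it is a finite number, else `fallback`, then clamp into [min, max].
function clampOr(value: number | undefined, fallback: number, min: number, max: number): number {
  const v = typeof value === "number" && Number.isFinite(value) ? value : fallback;
  return Math.min(Math.max(v, min), max);
}

function resolveCatchupConfig(raw: CatchupConfig | undefined): ResolvedCatchupConfig {
  return {
    enabled: raw?.enabled ?? true, // on by default
    maxAgeMinutes: clampOr(raw?.maxAgeMinutes, DEFAULTS.maxAgeMinutes, 1, MAX_AGE_CAP_MINUTES),
    perRunLimit: Math.floor(clampOr(raw?.perRunLimit, DEFAULTS.perRunLimit, 1, PER_RUN_CAP)),
    firstRunLookbackMinutes: clampOr(
      raw?.firstRunLookbackMinutes,
      DEFAULTS.firstRunLookbackMinutes,
      0,
      MAX_AGE_CAP_MINUTES,
    ),
  };
}
```

Clamping at load time lets the run logic assume every value is already in range.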

Why this approach

The fix mirrors a workspace-level shell script that's been running on a real OpenClaw install for ~4 weeks (~100 LoC of bash + python doing the same query/filter/POST flow). Porting it into the BB channel itself means every install gets recovery for free, calls processMessage directly (no re-POST hop), and benefits from #66810's persistent dedupe automatically.

Safety

  • Goes through the same processMessage path webhooks use, so auth, allowlist, pairing, and downstream agent dispatch all apply unchanged.
  • Dedupes against #66810's persistent inbound GUID cache: a webhook delivery that already succeeded cannot be reprocessed by catchup.
  • Never dispatches isFromMe records (double-checked before and after normalization) so the agent's own sends cannot enter the inbound path.
  • Cursor is only advanced on a successful query; a failed fetch leaves the prior cursor in place so the next run retries the same window.
  • 30s minimum interval between runs prevents churn on rolling restarts.
  • Hard ceilings: 12h max lookback, 500 messages per run.
  • Cursor uses nowMs rather than the latest observed message timestamp, so messages with dateCreated > nowMs (BB host vs gateway host clock skew) are not silently skipped — they replay on the next sweep, where dedupe handles any duplicate.

Validation

Automated

  • New scoped tests in extensions/bluebubbles/src/catchup.test.ts (14 cases): cursor round-trip, per-account scoping, FS-unsafe account IDs, firstRunLookbackMinutes, maxAgeMinutes clamp, enabled: false, rapid-restart skip, isFromMe filter (pre- and post-normalization), query-failure-preserves-cursor, per-message failure isolation, pre-cursor defense-in-depth.
  • Full BlueBubbles suite passes: 403/403.
  • pnpm check green (madge, tsgo, oxlint, webhook-auth-body-order, no-pairing-store-group, pairing-account-scope).

Live end-to-end (macOS, BB Server 1.9.x, 2026-04-14)

Repeating the original repro from #66721's issue body, this time against the new in-process catchup with the workspace shim disabled:

  1. Stopped gateway cleanly. Verified port refused, no process.
  2. Sent 3 distinct iMessages from a second device. BB-server log shows all 3 dispatches failed with connect ECONNREFUSED 127.0.0.1:18789 and never retried (confirms the hole this PR fixes is real and that BB does not replay on receiver-side reachability).
  3. Started gateway. Webhook target registered; catchup fired in the background.
  4. Gateway log:
    [bluebubbles] [default] BlueBubbles catchup:
      replayed=3 skipped_fromMe=0 skipped_preCursor=0 failed=0 fetched=3 window_ms=517184
  5. All 3 messages produced agent replies that were delivered back via BB outbound. Persistent cursor file appeared at ~/.openclaw/bluebubbles/catchup/<accountId>__<hash>.json. A subsequent gateway restart with no new inbound activity logged replayed=0 fetched=0 (no-op, as expected).

Test plan

  • pnpm test extensions/bluebubbles/src/catchup.test.ts — 14/14
  • pnpm test extensions/bluebubbles/ — 403/403
  • pnpm check — green
  • Live macOS end-to-end repro: stop gateway, send N messages (verified N×ECONNREFUSED in BB log), start gateway, assert N messages replayed via the catchup path with cursor advanced and dedupe state populated, and no-op on subsequent restart
  • Maintainer review

Order of operations

#66810 (persistent inbound dedupe) is a prerequisite. Once it lands, this branch will rebase onto main and the PR will retarget automatically.

Changed files

  • CHANGELOG.md (modified, +2/-0)
  • extensions/bluebubbles/src/accounts.ts (modified, +1/-1)
  • extensions/bluebubbles/src/catchup.test.ts (added, +568/-0)
  • extensions/bluebubbles/src/catchup.ts (added, +400/-0)
  • extensions/bluebubbles/src/config-schema.ts (modified, +15/-0)
  • extensions/bluebubbles/src/inbound-dedupe.test.ts (added, +58/-0)
  • extensions/bluebubbles/src/inbound-dedupe.ts (added, +172/-0)
  • extensions/bluebubbles/src/monitor-processing.ts (modified, +97/-0)
  • extensions/bluebubbles/src/monitor.ts (modified, +15/-2)
  • extensions/bluebubbles/src/test-support/monitor-test-support.ts (modified, +2/-0)

PR #66853: feat(bluebubbles): replay missed webhook messages after gateway restart (#66721)

Description (problem / solution / changelog)

Summary

Fixes #66721. Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages delivered while the gateway was unreachable and replays them through the existing processMessage pipeline.

The hole this closes: BB Server's WebhookService is fire-and-forget on POST failure (no retries) and BB's MessagePoller only re-fires webhooks on BB-side reconnection events (Messages.app / APNs), not on webhook-receiver recovery. So inbound messages delivered while the OpenClaw gateway was down, restarting, or wedged were permanently lost — verified with a controlled experiment on macOS.

This PR was previously #66760, stacked on the now-merged dedupe work (#66816). After that landed, this was rebased onto main and reopened as a standalone PR against main.

Design

  • New extensions/bluebubbles/src/catchup.ts:
    • fetchBlueBubblesMessagesSince(sinceMs, limit, opts) calls /api/v1/message/query with {after, sort:"ASC", with:["chat","chat.participants","attachment"]} so replays carry the same shape normalizeWebhookMessage already handles on live dispatch.
    • loadBlueBubblesCatchupCursor / saveBlueBubblesCatchupCursor persist a single {lastSeenMs, updatedAt} per account under <stateDir>/bluebubbles/catchup/<accountId>__<hash>.json, using the plugin-sdk's atomic JSON helpers. File layout mirrors the inbound-dedupe store from #66816.
    • runBlueBubblesCatchup(target) orchestrates: clamp config, fetch, filter isFromMe and pre-cursor records, dispatch to processMessage, advance cursor.
  • Modified monitor.ts: after the webhook target registers, fire catchup as a background task; errors are logged but never block the channel-ready signal.
  • Modified config-schema.ts: new optional catchup block (enabled, maxAgeMinutes, perRunLimit, firstRunLookbackMinutes); defaults are on with 2h lookback / 50 msg cap / 30-min first-run lookback.
  • Modified accounts.ts: adds catchup to the account-merge nestedObjectKeys list so per-account overrides deep-merge on top of channel-level defaults (mirroring the existing network precedent).
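Adding catchup to nestedObjectKeys means a per-account catchup block overrides only the fields it sets. Every other field falls back to the channel-level block. The sketch below illustrates that one-level deep merge. `mergeAccountConfig` and `NESTED_OBJECT_KEYS` are illustrative names, and the real accounts.ts merge logic may differ.

```typescript
type Obj = Record<string, unknown>;

// Keys whose object values are merged field-by-field instead of being replaced wholesale.
// Here the list holds the two keys the PR mentions: the existing `network` and the new `catchup`.
const NESTED_OBJECT_KEYS = ["network", "catchup"] as const;

function isPlainObject(v: unknown): v is Obj {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Account values win over channel values; nested keys are merged one level deeper,
// so unspecified nested fields keep the channel-level default.
function mergeAccountConfig(channel: Obj, account: Obj): Obj {
  const merged: Obj = { ...channel, ...account };
  for (const key of NESTED_OBJECT_KEYS) {
    const base = channel[key];
    const override = account[key];
    if (isPlainObject(base) && isPlainObject(override)) {
      merged[key] = { ...base, ...override };
    }
  }
  return merged;
}
```

Without the nested merge, an account that sets only `catchup.maxAgeMinutes` would silently lose the channel's `catchup.enabled` and the other catchup fields.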

Why this approach

The fix mirrors a workspace-level shell script that's been running on a real OpenClaw install for ~4 weeks (~100 LoC of bash + python doing the same query/filter/POST flow). Porting it into the BB channel itself means every install gets recovery for free, calls processMessage directly (no re-POST hop), and benefits from #66816's persistent dedupe automatically.

Safety

  • Goes through the same processMessage path webhooks use, so auth, allowlist, pairing, and downstream agent dispatch all apply unchanged.
  • Dedupes against #66816's persistent inbound GUID cache: a webhook delivery that already succeeded cannot be reprocessed by catchup.
  • Never dispatches isFromMe records (double-checked before and after normalization) so the agent's own sends cannot enter the inbound path.
  • Catchup runs once per gateway startup and does NOT skip on rapid restarts — skipping would permanently lose any messages that arrived during the brief downtime between the two startups.
  • Cursor only advances to nowMs on fully-successful runs. On processMessage failure, the cursor is held just before the earliest failure so the next run retries from there. On truncation (fetchedCount === perRunLimit), the cursor advances only to the last-fetched timestamp so the next gateway startup picks up the unfetched tail.
  • A future-dated cursor (NTP rollback, manual clock adjust) is treated as unusable and falls through to the firstRunLookback path; the cursor is repaired at the end of the run.
  • Cursor uses nowMs (clamped to the latest observed timestamp on truncation) rather than the latest observed message timestamp unconditionally, to avoid stuck rescans from clock skew between BB-host and gateway-host.
  • Hard ceilings: 12h max lookback, 500 messages per run.
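The cursor rules above come down to picking the most conservative candidate at the end of a run. The sketch covers only runs whose query succeeded, since a failed query saves nothing and the prior cursor stays in place. `nextCursorMs` and `RunOutcome` are hypothetical names.

```typescript
interface RunOutcome {
  nowMs: number;
  fetchedCount: number;
  perRunLimit: number;
  lastFetchedMs: number | null; // dateCreated of the last row in the ASC-sorted page
  earliestFailureMs: number | null; // dateCreated of the earliest row processMessage failed on
}

function nextCursorMs(o: RunOutcome): number {
  // Fully successful, untruncated run: advance to "now".
  let next = o.nowMs;
  // Truncated page: stop at the page boundary so the next startup fetches the unfetched tail.
  if (o.fetchedCount >= o.perRunLimit && o.lastFetchedMs !== null) {
    next = Math.min(next, o.lastFetchedMs);
  }
  // Failure: hold just before the earliest failing message so it is retried next time.
  if (o.earliestFailureMs !== null) {
    next = Math.min(next, o.earliestFailureMs - 1);
  }
  return next;
}
```

Taking the minimum means a run that is both truncated and partially failed still rewinds to the earliest retry point.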

Validation

Automated

  • New scoped tests in extensions/bluebubbles/src/catchup.test.ts (21 cases): cursor round-trip, per-account scoping, filesystem-unsafe account IDs, first-run lookback + maxAge clamp, enabled: false, rapid-restart-still-runs, isFromMe filter (pre- and post-normalization), query-failure-preserves-cursor, per-message failure isolation, held-cursor-on-retryable-failure, clamp-to-prior-cursor, future-cursor recovery, pre-cursor defense-in-depth, perRunLimit warn / no-warn, truncation-cursor advances only to page boundary, and first-run maxAge clamp.
  • Full BlueBubbles suite passes: 410/410.
  • pnpm check green (madge, tsgo, oxlint, webhook-auth-body-order, no-pairing-store-group, pairing-account-scope).

Live end-to-end (macOS, BB Server 1.9.x, 2026-04-14)

Repeating the original repro from #66721's issue body against the new in-process catchup with the workspace shim disabled:

  1. Stopped gateway cleanly. Verified port refused, no process.
  2. Sent 3 distinct iMessages from a second device. BB-server log shows all 3 dispatches failed with connect ECONNREFUSED 127.0.0.1:18789 and never retried.
  3. Started gateway. Webhook target registered; catchup fired in the background.
  4. Gateway log:
    [bluebubbles] [default] BlueBubbles catchup:
      replayed=3 skipped_fromMe=0 skipped_preCursor=0 failed=0 fetched=3 window_ms=517184
  5. All 3 messages produced agent replies delivered back via BB outbound. Persistent cursor file appeared at ~/.openclaw/bluebubbles/catchup/<accountId>__<hash>.json. Subsequent gateway restart with no new inbound activity logged replayed=0 fetched=0 (no-op, as expected).

Test plan

  • pnpm test extensions/bluebubbles/src/catchup.test.ts — 21/21
  • pnpm test extensions/bluebubbles/ — 410/410
  • pnpm check — green
  • Live macOS end-to-end repro: stop gateway, send N messages (N×ECONNREFUSED in BB log), start gateway, assert N replayed via catchup, cursor advanced, dedupe state populated, no-op on subsequent restart
  • Maintainer review

Review-feedback history

This branch carries review feedback that was gathered on the prior PR #66760 (closed automatically when its base lobster/bb-inbound-dedupe was deleted after #66816 merged). All findings were addressed in-branch; the 5 follow-up commits on this branch are:

  • 1b4402219c Greptile P2: align catchup state-dir with canonical SDK resolver; warn on perRunLimit truncation.
  • 8d5129ff79 Codex P1: hold catchup cursor on retryable failures; Codex P2: clock-skew gate precondition.
  • d154af9518 Codex P2: clamp first-run catchup window to maxAge; deep-merge catchup overrides.
  • 34e0d52f6a Codex P2: treat future-dated cursor as unusable; recover via firstRunLookback.
  • 5ac997f733 Codex P1 ×2: always run catchup on startup (remove min-interval gate); advance cursor to page boundary on truncation.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/bluebubbles/src/accounts.ts (modified, +1/-1)
  • extensions/bluebubbles/src/catchup.test.ts (added, +568/-0)
  • extensions/bluebubbles/src/catchup.ts (added, +400/-0)
  • extensions/bluebubbles/src/config-schema.ts (modified, +15/-0)
  • extensions/bluebubbles/src/monitor.ts (modified, +15/-2)

PR #66857: feat(bluebubbles): replay missed webhook messages after gateway restart (#66721)

Description (problem / solution / changelog)

Summary

Fixes #66721. Adds an in-process startup catchup pass to the BlueBubbles channel that queries BB Server for messages delivered while the gateway was unreachable and replays them through the existing processMessage pipeline.

The hole this closes: BB Server's WebhookService is fire-and-forget on POST failure (no retries) and BB's MessagePoller only re-fires webhooks on BB-side reconnection events (Messages.app / APNs), not on webhook-receiver recovery. Messages delivered while the gateway was down, restarting, or wedged were permanently lost — verified with a controlled experiment on macOS.

This PR supersedes #66853 (which was the stacked follow-up to #66760 / dedupe PR #66816). Same diff, collapsed to a single commit for cleaner review. History of review feedback is preserved in the superseded PR trail; all P1 and P2 findings from Greptile / Codex / Aisle were addressed in-branch before this squash.

Design

  • New extensions/bluebubbles/src/catchup.ts:
    • fetchBlueBubblesMessagesSince(sinceMs, limit, opts) calls /api/v1/message/query with {after, sort:"ASC", with:["chat","chat.participants","attachment"]} so replays carry the same shape normalizeWebhookMessage already handles on live dispatch.
    • loadBlueBubblesCatchupCursor / saveBlueBubblesCatchupCursor persist a single {lastSeenMs, updatedAt} per account under <stateDir>/bluebubbles/catchup/<accountId>__<hash>.json, using the plugin-sdk's atomic JSON helpers. File layout mirrors the inbound-dedupe store from #66816, and the resolver is the canonical openclaw/plugin-sdk/state-paths.resolveStateDir (same helper dedupe uses) so the two stores share a single root.
    • runBlueBubblesCatchup(target) orchestrates: clamp config, fetch, filter isFromMe and pre-cursor records, dispatch to processMessage, advance cursor.
  • Modified monitor.ts: fire catchup as a background task after webhook target registers; errors are logged but never block the channel-ready signal.
  • Modified config-schema.ts: new optional catchup block (enabled, maxAgeMinutes, perRunLimit, firstRunLookbackMinutes); defaults on with 2h lookback / 50 msg cap / 30-min first-run lookback.
  • Modified accounts.ts: adds catchup to the account-merge nestedObjectKeys list so per-account overrides deep-merge on top of channel-level defaults, mirroring the existing network precedent.
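The request that fetchBlueBubblesMessagesSince sends can be sketched as follows. The body fields come from the design bullets above. The following are assumptions of this sketch:

- the names `buildCatchupQuery`, `fetchMessagesSince` and `FetchLike`;
- the `{ data: [...] }` response envelope;
- passing the password as a query parameter, copied from bb-catchup.sh.

The Aisle review flagged password-in-URL as a cross-cutting concern, so treat that part as illustrative only.

```typescript
// Request body for POST /api/v1/message/query, as described in the PR.
interface MessageQueryBody {
  limit: number;
  sort: "ASC" | "DESC";
  after: number; // epoch ms cursor
  with: string[];
}

// Minimal fetch signature, so the helper can be exercised without a live BB Server.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

function buildCatchupQuery(sinceMs: number, limit: number): MessageQueryBody {
  return { limit, sort: "ASC", after: sinceMs, with: ["chat", "chat.participants", "attachment"] };
}

async function fetchMessagesSince(
  fetchFn: FetchLike,
  baseUrl: string,
  password: string,
  sinceMs: number,
  limit: number,
): Promise<unknown[]> {
  const url = new URL("/api/v1/message/query", baseUrl);
  url.searchParams.set("password", password); // mirrors bb-catchup.sh; see caveat above
  const res = await fetchFn(url.toString(), {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildCatchupQuery(sinceMs, limit)),
  });
  if (!res.ok) throw new Error(`message/query failed: HTTP ${res.status}`);
  const json = (await res.json()) as { data?: unknown };
  // Assumed response envelope: rows under `data`.
  return Array.isArray(json.data) ? json.data : [];
}
```

On Node 18+ the global `fetch` satisfies `FetchLike`, so a call could look like `fetchMessagesSince(fetch, serverUrl, password, sinceMs, 50)`. Injecting the fetch function keeps the helper testable without a network.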

Why this approach

The fix mirrors a workspace-level shell script that's been running on a real OpenClaw install for ~4 weeks (~100 LoC of bash + python doing the same query/filter/POST flow). Porting it into the BB channel itself means every install gets recovery for free, calls processMessage directly (no re-POST hop), and benefits from #66816's persistent dedupe automatically.

Safety

  • Goes through the same processMessage path webhooks use, so auth, allowlist, pairing, and downstream agent dispatch all apply unchanged.
  • Dedupes against #66816's persistent inbound GUID cache: a webhook delivery that already succeeded cannot be reprocessed by catchup.
  • Never dispatches isFromMe records (double-checked before and after normalization) so the agent's own sends cannot enter the inbound path.
  • Catchup runs once per gateway startup and does NOT skip on rapid restarts — skipping would permanently lose any messages that arrived during the brief downtime between the two startups.
  • Cursor only advances to nowMs on fully-successful runs. On processMessage failure, the cursor is held just before the earliest failure timestamp so the next run retries from there. On truncation (fetchedCount === perRunLimit), the cursor advances only to the last-fetched timestamp so the next gateway startup picks up the unfetched tail.
  • A future-dated cursor (NTP rollback, manual clock adjust) is treated as unusable and falls through to the firstRunLookback path; the cursor is repaired at the end of the run.
  • First-run lookback clamped to the maxAge ceiling so maxAgeMinutes: 5, firstRunLookbackMinutes: 30 cannot exceed the operator's stated cap.
  • Hard ceilings: 12h max lookback, 500 messages per run.
  • Loud WARNING emitted when fetchedCount hits perRunLimit so operators know a single startup didn't drain the full backlog.
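The window-start rules above can be expressed as one pure function:

- a future-dated cursor is treated as unusable;
- the first-run lookback is clamped to maxAge;
- maxAge is capped at 12 h.

The name `catchupWindowStartMs` is hypothetical, and the PR's code may structure this differently.

```typescript
const MINUTE_MS = 60_000;
const HARD_MAX_AGE_MINUTES = 12 * 60; // 12 h ceiling

function catchupWindowStartMs(
  nowMs: number,
  cursorMs: number | null,
  maxAgeMinutes: number,
  firstRunLookbackMinutes: number,
): number {
  const maxAgeMs = Math.min(maxAgeMinutes, HARD_MAX_AGE_MINUTES) * MINUTE_MS;
  // No cursor yet, or a future-dated one (NTP rollback, manual clock change):
  // fall back to the first-run lookback, never exceeding maxAge.
  if (cursorMs === null || cursorMs > nowMs) {
    return nowMs - Math.min(firstRunLookbackMinutes * MINUTE_MS, maxAgeMs);
  }
  // Normal case: resume from the cursor, but never reach back further than maxAge.
  return Math.max(cursorMs, nowMs - maxAgeMs);
}
```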

Validation

Automated

  • New scoped tests in extensions/bluebubbles/src/catchup.test.ts (21 cases): cursor round-trip, per-account scoping, filesystem-unsafe account IDs, firstRunLookback default and maxAge clamp, enabled: false, rapid-restart-still-runs, isFromMe filter (pre- and post-normalization), query-failure-preserves-cursor, per-message failure isolation, held-cursor-on-retryable-failure, clamp-to-prior-cursor, future-cursor recovery, pre-cursor defense-in-depth, perRunLimit warn / no-warn, and truncation-cursor advances only to page boundary.
  • Full BlueBubbles suite passes: 410/410.
  • pnpm check green (madge, tsgo, oxlint, webhook-auth-body-order, no-pairing-store-group, pairing-account-scope).

Live end-to-end (macOS, BB Server 1.9.x, 2026-04-14)

Repeating the original repro from #66721's issue body with the new in-process catchup:

  1. Stopped gateway cleanly. Verified port refused, no process.
  2. Sent 3 distinct iMessages from a second device. BB-server log shows all 3 dispatches failed with connect ECONNREFUSED 127.0.0.1:18789 and never retried.
  3. Started gateway. Webhook target registered; catchup fired in the background.
  4. Gateway log:
    [bluebubbles] [default] BlueBubbles catchup:
      replayed=3 skipped_fromMe=0 skipped_preCursor=0 failed=0 fetched=3 window_ms=517184
  5. All 3 messages produced agent replies delivered back via BB outbound. Persistent cursor file appeared at ~/.openclaw/bluebubbles/catchup/<accountId>__<hash>.json. Subsequent gateway restart with no new inbound activity logged replayed=0 fetched=0 (no-op).

Test plan

  • pnpm test extensions/bluebubbles/src/catchup.test.ts — 21/21
  • pnpm test extensions/bluebubbles/ — 410/410
  • pnpm check — green
  • Live macOS end-to-end repro
  • Maintainer review

History (for reviewer context)

This PR carries ~11 hours of iterative bot review that happened on the prior PRs (#66760 → #66853). Squashing here for clean review; the findings addressed were:

  • Greptile P2 — align state-dir with canonical SDK resolver; warn on perRunLimit truncation
  • Codex P1 — hold cursor on retryable processMessage failures
  • Codex P1 — always run catchup on startup (no min-interval skip)
  • Codex P1 — keep cursor behind unfetched pages when perRunLimit is hit
  • Codex P2 — clamp first-run window to maxAge
  • Codex P2 — deep-merge catchup overrides at account level
  • Codex P2 — treat future-dated cursor as unusable
  • Codex P2 — clock-skew gate precondition (later obviated by removing the gate)
  • Aisle — 2 of 5 findings apply (password-in-URL and OPENCLAW_STATE_DIR symlink); both are cross-cutting BB-plugin patterns best addressed in separate PRs against the SDK/plugin conventions. The other 3 Aisle findings were in files this PR doesn't touch (stale-SHA scan from pre-rebase).

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/bluebubbles/src/accounts.ts (modified, +1/-1)
  • extensions/bluebubbles/src/catchup.test.ts (added, +621/-0)
  • extensions/bluebubbles/src/catchup.ts (added, +430/-0)
  • extensions/bluebubbles/src/config-schema.ts (modified, +15/-0)
  • extensions/bluebubbles/src/monitor.ts (modified, +15/-2)


Problem

When the OpenClaw gateway is down, wedged, or restarting, inbound BlueBubbles messages delivered during the outage window are permanently lost. The underlying iMessages are intact (they remain in Messages.app and in BB Server's DB), but the agent never sees them, never replies, and no recovery happens when the gateway comes back up. This is the BlueBubbles analog of #50093 (WhatsApp) and is partially related to #38307 (stale-socket restarts).

Validated by a controlled experiment (2026-04-14)

I stopped the gateway cleanly, sent three distinct test iMessages to a monitored handle, waited, then started the gateway, while watching both ~/Library/Logs/bluebubbles-server/main.log and ~/.openclaw/logs/gateway.log.

Timeline:

Time (local) | Event
11:05:14 | Gateway stopped (openclaw gateway stop, pgrep-clean, healthz refused)
11:08:35 | BB dispatches msg 1 → connect ECONNREFUSED 127.0.0.1:18789, no retry logged
11:08:53 | BB dispatches msg 2 → ECONNREFUSED, no retry
11:09:15 | BB dispatches msg 3 → ECONNREFUSED, no retry
11:09:56 | Gateway start issued
11:10:29 | Plugin bootstrap complete

BB-server log (edited, redacted):

[2026-04-14 11:08:35] [WebhookService] Failed to dispatch "new-message" event → connect ECONNREFUSED 127.0.0.1:18789
[2026-04-14 11:08:53] [WebhookService] Failed to dispatch "new-message" event → connect ECONNREFUSED 127.0.0.1:18789
[2026-04-14 11:09:15] [WebhookService] Failed to dispatch "new-message" event → connect ECONNREFUSED 127.0.0.1:18789
# ... nothing more about msgs 1/2/3 — BB never re-dispatches

Findings:

  1. BB Server's WebhookService is fire-and-forget on failure. Every Dispatching event with a failed POST is logged once and never retried, regardless of whether the failure is ECONNRESET (gateway wedged) or ECONNREFUSED (gateway stopped).
  2. BB Server's MessagePoller does NOT replay missed webhooks on webhook-receiver reconnection. After the gateway came back up and registered its webhook target, there were zero fresh Dispatching lines for msgs 1/2/3 — the only new dispatches were for the replies the agent eventually sent. The ~1-week MessagePoller lookback that #66230's design relies on is driven by BB's own reconnection events (to Messages.app / APNs), not by webhook-target HTTP reachability.
  3. Without external recovery, all three messages would have been permanently lost from the agent's perspective.

Proof the proposed fix works: bb-catchup.sh (Lobster workspace)

My Lobster install has been running a workspace script (openclaw-agents/lobster/scripts/bb-catchup.sh) that implements exactly this proposal. It's been in production for ~4 weeks and recovered all three messages in the experiment above. Its design:

  • Cursor: ~/.openclaw/bb-last-seen-ms (epoch ms), updated after every successful replay pass.
  • Query: POST /api/v1/message/query?password=... with body {"limit":50,"sort":"ASC","after":<cursor>,"with":["chat","chat.participants","attachment"]}.
  • Filter: drop isFromMe, drop own-handle senders, drop pre-cursor messages (defense in depth).
  • Replay: wrap each message in {"type":"new-message","data":<message>} and POST to the gateway's BB webhook endpoint — same path BB itself uses, so processMessage() handles it identically.
  • Bounds: 2-hour max lookback, 50-message cap, 0.5s between POSTs.
  • Trigger: invoked from BOOT.md as boot task #1 on gateway startup.
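The script's filter and replay-envelope steps can be ported to TypeScript for illustration; the original is bash + python. `BbMessage` lists only the fields the filter reads. The `handle.address` field name and the `selectReplays` helper are assumptions, not the script's code.

```typescript
// Minimal view of a BB message row: only the fields the filter needs.
interface BbMessage {
  guid: string;
  isFromMe: boolean;
  dateCreated: number; // epoch ms
  handle?: { address?: string } | null;
}

interface WebhookEnvelope {
  type: "new-message";
  data: BbMessage;
}

// Apply the three filters, then wrap survivors in the same envelope BB's webhook uses.
function selectReplays(rows: BbMessage[], cursorMs: number, ownHandles: Set<string>): WebhookEnvelope[] {
  return rows
    .filter((m) => !m.isFromMe) // never replay the agent's own sends
    .filter((m) => !(m.handle?.address && ownHandles.has(m.handle.address))) // drop own-handle senders
    .filter((m) => m.dateCreated >= cursorMs) // defense in depth: drop rows strictly older than the cursor
    .map((m) => ({ type: "new-message" as const, data: m }));
}
```

Each envelope would then be POSTed to the gateway's BB webhook endpoint, 0.5 s apart, as the bounds above describe.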

Experimental result from that script in the run above:

bb-catchup: found 3 missed message(s)
  replayed: [<chat>] from=<handle> text=dive test 1
  replayed: [<chat>] from=<handle> text=dive test 2
  replayed: [<chat>] from=<handle> text=dive test 3
bb-catchup: replayed=3 failed=0

The agent then produced inbound-session entries for all three, matching what a clean webhook delivery produces, which shows the pattern is sound.

What I want to land upstream

Port the bb-catchup pattern into the BlueBubbles channel itself so every OpenClaw install gets message recovery for free, and the workspace script can be retired.

The BB extension already has all the primitives:

  • fetchBlueBubblesHistory(chatGuid, limit, opts) in extensions/bluebubbles/src/history.ts already speaks /api/v1/chat/{guid}/messages. For catchup we want the flat /api/v1/message/query?after=<ts> endpoint that bb-catchup.sh uses (cross-chat in a single call, server-side cursor filter) — this needs a small new helper fetchBlueBubblesMessagesSince(sinceMs, limit, opts) next to it.
  • processMessage in monitor-processing.ts is already the canonical inbound handler. The catchup path can call it directly with the normalized payload — no need for the HTTP re-POST hop bb-catchup.sh does (the re-POST only exists because the workspace script can't reach into the gateway process).
  • monitor-reply-cache.ts + the persistent inbound dedupe from #66230 already protect against double-processing if a BB webhook and a catchup replay of the same GUID both arrive.

Implementation plan

New files

  • extensions/bluebubbles/src/catchup.ts (~150 LoC)
    • fetchBlueBubblesMessagesSince(sinceMs, limit, opts) — POST /api/v1/message/query with after: sinceMs, sort: "ASC", with: ["chat","chat.participants","attachment"], bounded by limit; resilient to the same URL-variant fallbacks as fetchBlueBubblesHistory.
    • loadCursor(accountId) / saveCursor(accountId, ms) — file-backed state at ~/.openclaw/bluebubbles/catchup-cursor/<accountId>.json (matches the layout #66230 introduces for persistent dedupe). Atomic write via tmp+rename.
    • runBlueBubblesCatchup(account, deps) — orchestrator: loads cursor (fall back to now - 30min on first run), clamps lookback to MAX_AGE_MS (default 2h), calls the query helper, filters isFromMe and self-handles, normalizes each row through the same path webhook POSTs use (normalizeWebhookMessage etc.), and invokes processMessage(...) for each. Updates cursor on success.
  • extensions/bluebubbles/src/catchup.test.ts (~200 LoC)
    • Cursor persistence round-trip, first-run default, atomic-write survival across simulated crash mid-write.
    • Filter correctness: isFromMe, pre-cursor timestamp, self-handle address match.
    • Clamp math: MAX_AGE_MS boundary, identical timestamps, monotonic-clock skew.
    • End-to-end: stub the BB API, stub processMessage, assert call count and argument shape.
    • Interaction with #66230's inbound dedupe: replayed GUID already in dedupe file → processMessage called but early-exits.
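The cursor persistence planned for catchup.ts could look roughly like the sketch below. The directory layout comes from the plan. Two details are assumptions: the { lastSeenMs } file shape, and the optional root parameter, which lets tests point at a temp directory. Each write goes to a temp file first and is then renamed over the target, following the tmp+rename pattern the plan calls for.

```typescript
import { promises as fs } from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical sketch. Layout follows the plan:
// ~/.openclaw/bluebubbles/catchup-cursor/<accountId>.json
// The { lastSeenMs } JSON shape is an assumption.

export function cursorPath(accountId: string, root = path.join(os.homedir(), ".openclaw")): string {
  return path.join(root, "bluebubbles", "catchup-cursor", `${accountId}.json`);
}

export async function loadCursor(accountId: string, root?: string): Promise<number | null> {
  try {
    const raw = await fs.readFile(cursorPath(accountId, root), "utf8");
    const parsed = JSON.parse(raw) as { lastSeenMs?: unknown };
    return typeof parsed.lastSeenMs === "number" && Number.isFinite(parsed.lastSeenMs)
      ? parsed.lastSeenMs
      : null;
  } catch {
    return null; // missing or corrupt file: treat as first run
  }
}

export async function saveCursor(accountId: string, ms: number, root?: string): Promise<void> {
  const target = cursorPath(accountId, root);
  await fs.mkdir(path.dirname(target), { recursive: true });
  // Write a sibling temp file, then rename it over the target. Within one
  // directory on POSIX filesystems, rename replaces the file atomically, so a
  // crash mid-write leaves either the old cursor or the new one, never a torn file.
  const tmp = `${target}.${process.pid}.${Date.now()}.tmp`;
  await fs.writeFile(tmp, JSON.stringify({ lastSeenMs: ms }), "utf8");
  await fs.rename(tmp, target);
}
```

An unreadable or corrupt file is treated as "no cursor". That sends the account through the first-run lookback instead of failing the whole catchup pass.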

Modified files

  • extensions/bluebubbles/src/monitor.ts — in registerBlueBubblesWebhookTarget, after successful route registration, fire-and-forget runBlueBubblesCatchup(account, deps) on a microtask. Log one INFO summary line per account: bluebubbles catchup: account=<id> replayed=N skipped=M window_ms=.... Errors are caught, logged at WARN, and never block the target registration.
  • extensions/bluebubbles/src/monitor-processing.ts — thread a new optional origin: "webhook" | "catchup" through processMessage so telemetry can distinguish replays. Default "webhook" preserves existing callers.
  • extensions/bluebubbles/src/config-schema.ts — add optional catchup block under the BB channel entry:
    catchup?: {
      enabled?: boolean;         // default true
      maxAgeMinutes?: number;    // default 120, hard cap 720
      perRunLimit?: number;      // default 50, hard cap 500
      firstRunLookbackMinutes?: number; // default 30
    }
  • CHANGELOG.md: under ## Unreleased > ### Fixes, add the bullet "BlueBubbles: replay missed webhook messages after gateway restart via a persistent cursor and /api/v1/message/query?after=<ts> pass (fixes #66721)."
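The config and monitor changes can be sketched together, using the defaults and hard caps listed for the catchup block. Several pieces are illustrative assumptions: the lower bounds (1 minute, 1 message), the logger interface, the scheduleCatchup name, and the WARN message text. Only the INFO line format comes from the plan.

```typescript
// Hypothetical sketch: resolve the optional catchup config block and schedule a
// run without blocking webhook target registration. Defaults and hard caps come
// from the plan; lower bounds, names, and the WARN text are assumptions.

export interface CatchupConfigInput {
  enabled?: boolean;
  maxAgeMinutes?: number;
  perRunLimit?: number;
  firstRunLookbackMinutes?: number;
}

export interface ResolvedCatchupConfig {
  enabled: boolean;
  maxAgeMs: number;
  perRunLimit: number;
  firstRunLookbackMs: number;
}

export interface CatchupSummary {
  replayed: number;
  skipped: number;
  windowMs: number;
}

export interface CatchupLogger {
  info(msg: string): void;
  warn(msg: string): void;
}

const MINUTE_MS = 60_000;

// Fallback for missing or non-finite values; otherwise floor and clamp to [min, max].
function clampInt(value: number | undefined, fallback: number, min: number, max: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.floor(value)));
}

export function resolveCatchupConfig(input: CatchupConfigInput = {}): ResolvedCatchupConfig {
  return {
    enabled: input.enabled ?? true,
    maxAgeMs: clampInt(input.maxAgeMinutes, 120, 1, 720) * MINUTE_MS,
    perRunLimit: clampInt(input.perRunLimit, 50, 1, 500),
    // No hard cap is listed; the orchestrator's max-age clamp bounds it anyway.
    firstRunLookbackMs: clampInt(input.firstRunLookbackMinutes, 30, 0, Number.POSITIVE_INFINITY) * MINUTE_MS,
  };
}

// Fire-and-forget: returns immediately. The run starts on a microtask, and any
// failure (sync throw or rejection) is logged at WARN instead of propagating.
export function scheduleCatchup(
  accountId: string,
  run: () => Promise<CatchupSummary>,
  log: CatchupLogger,
): void {
  void Promise.resolve()
    .then(run)
    .then(({ replayed, skipped, windowMs }) => {
      log.info(
        `bluebubbles catchup: account=${accountId} replayed=${replayed} skipped=${skipped} window_ms=${windowMs}`,
      );
    })
    .catch((err: unknown) => {
      log.warn(`bluebubbles catchup failed: account=${accountId} error=${String(err)}`);
    });
}
```

The run is started through Promise.resolve().then(run) rather than called directly. This keeps a synchronous throw inside run from escaping into registerBlueBubblesWebhookTarget.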

Safety / invariants

  • Default on, bounded. enabled: true out of the box because the downside of no-recovery is loud and user-visible; maxAgeMinutes and perRunLimit clamp the blast radius.
  • Never processes isFromMe — agent's own sends cannot be mistaken for inbound.
  • Cursor is persisted only on success. A failed run leaves the cursor at its previous value, so the next run retries the same window; the MAX_AGE_MS clamp keeps that window from growing without bound across repeated failures.
  • Idempotent with #66230. If a webhook delivery and a catchup pass both surface the same GUID, the persistent dedupe drops the second. Catchup can therefore be aggressive without risk of double-reply.
  • No new network surface. Only existing BB REST endpoints (same as fetchBlueBubblesHistory and bb-catchup.sh).
  • No new inbound code path. Catchup goes through processMessage — the exact same handler webhooks already use.
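The invariants above map fairly directly onto the orchestrator. The hypothetical sketch below takes its dependencies as injected functions; those names and the MessageRow shape are assumptions. It also skips the normalization step that the real code would share with the webhook path. What it does show is the first-run lookback, the MAX_AGE clamp, isFromMe and self-handle filtering, origin: "catchup", and a cursor that is saved only after every replay succeeds.

```typescript
// Hypothetical orchestrator sketch. MessageRow, CatchupDeps, and CatchupAccount
// are assumed shapes; the real code would also normalize each row through the
// webhook path (normalizeWebhookMessage etc.) before calling processMessage.

export interface MessageRow {
  guid: string;
  dateCreated: number; // epoch ms
  isFromMe: boolean;
  handle?: { address: string } | null;
}

export interface CatchupDeps {
  now(): number;
  loadCursor(accountId: string): Promise<number | null>;
  saveCursor(accountId: string, ms: number): Promise<void>;
  fetchSince(sinceMs: number, limit: number): Promise<MessageRow[]>; // oldest first
  processMessage(row: MessageRow, meta: { origin: "webhook" | "catchup" }): Promise<void>;
}

export interface CatchupAccount {
  accountId: string;
  selfHandles: string[];
  maxAgeMs: number;
  perRunLimit: number;
  firstRunLookbackMs: number;
}

// First run starts firstRunLookbackMs ago; no run ever reaches back past maxAgeMs.
export function computeSinceMs(cursor: number | null, nowMs: number, maxAgeMs: number, firstRunLookbackMs: number): number {
  const requested = cursor ?? nowMs - firstRunLookbackMs;
  return Math.max(requested, nowMs - maxAgeMs);
}

export async function runBlueBubblesCatchup(account: CatchupAccount, deps: CatchupDeps) {
  const nowMs = deps.now();
  const cursor = await deps.loadCursor(account.accountId);
  const sinceMs = computeSinceMs(cursor, nowMs, account.maxAgeMs, account.firstRunLookbackMs);
  const rows = await deps.fetchSince(sinceMs, account.perRunLimit);
  const selfHandles = new Set(account.selfHandles.map((h) => h.toLowerCase()));

  let replayed = 0;
  let skipped = 0;
  let nextCursor = sinceMs; // conservative: with no rows, persist the window start
  for (const row of rows) {
    nextCursor = Math.max(nextCursor, row.dateCreated);
    const fromSelf =
      row.isFromMe || (row.handle != null && selfHandles.has(row.handle.address.toLowerCase()));
    if (fromSelf || row.dateCreated < sinceMs) {
      skipped++;
      continue;
    }
    // A throw here aborts the run before saveCursor, so the next run retries.
    await deps.processMessage(row, { origin: "catchup" });
    replayed++;
  }

  await deps.saveCursor(account.accountId, nextCursor);
  return { replayed, skipped, windowMs: nowMs - sinceMs };
}
```

Rows stamped exactly at the cursor timestamp are kept rather than dropped, so two messages that share a millisecond are not lost. Re-delivery of an already-handled GUID is left to the persistent dedupe.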

Test plan

  • Unit tests in catchup.test.ts as listed above (pass pnpm test extensions/bluebubbles/src/catchup.test.ts).
  • Full BB suite passes (pnpm test extensions/bluebubbles/).
  • pnpm check green.
  • Live repro on macOS using the same protocol as the 2026-04-14 experiment: stop the gateway, send N messages, start the gateway, then assert: (a) processMessage called N times with origin: "catchup", (b) cursor file updated, (c) inbound dedupe file contains N new GUIDs, (d) re-running catchup is a no-op.
  • Regression: with #66230's dedupe active, send a message while gateway is up (webhook delivers normally), restart gateway, assert catchup sees it in the query window but processMessage early-exits on the dedupe hit — no double reply.
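The regression scenario can be modelled with an in-memory stand-in for #66230's dedupe. This is a hypothetical illustration only: the real dedupe persists GUIDs to disk, and that persistence is what lets it survive a restart. Here a restart is simulated by reusing a single processor instance. The createDedupedProcessor and runRegressionScenario names are invented for the example.

```typescript
// Hypothetical in-memory stand-in for #66230's persistent inbound dedupe.
// The real dedupe persists GUIDs to disk; reusing one processor instance below
// plays the role of that persistence across a simulated restart.

export type Origin = "webhook" | "catchup";

export function createDedupedProcessor(reply: (guid: string, origin: Origin) => void) {
  const seen = new Set<string>();
  // Returns true when the message was handled, false on a dedupe early exit.
  return async function processMessage(guid: string, origin: Origin = "webhook"): Promise<boolean> {
    if (seen.has(guid)) return false;
    seen.add(guid);
    reply(guid, origin);
    return true;
  };
}

// Mirrors the regression check: the webhook delivers g1 while the gateway is up;
// after a restart, catchup sees g1 again plus g2, which arrived during downtime.
export async function runRegressionScenario(): Promise<string[]> {
  const replies: string[] = [];
  const processMessage = createDedupedProcessor((guid, origin) => {
    replies.push(`${guid}:${origin}`);
  });
  await processMessage("g1"); // webhook delivery
  await processMessage("g1", "catchup"); // dedupe hit: early exit, no second reply
  await processMessage("g2", "catchup");
  return replies;
}
```

The expected outcome is one reply per GUID: g1 from the webhook and g2 from catchup.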

Order of operations with #66230

#66230 (persistent inbound dedupe) is a prerequisite for catchup to be safe to turn on by default. Recommend landing #66230 first, then this issue's fix.

Retirement of workspace script

Once this ships in a released OpenClaw, openclaw-agents/lobster/scripts/bb-catchup.sh and its BOOT.md invocation should be removed. If both stay, messages get processed twice during the one-turn window at gateway startup, when both the built-in catchup and the script run. #66230's dedupe handles the duplicates correctly, but the second pass is unnecessary overhead.

Related

  • #66230 — persistent inbound webhook dedupe (prerequisite; makes aggressive replay safe).
  • #50093 — WhatsApp analog; same class of problem on a different channel.
  • #38307 — BlueBubbles stale-socket restart gaps; same symptom class, different trigger.
  • #51814 — Native agent wake-up after gateway restart (complementary; that's about resuming in-flight agent work, this is about recovering inbound events).

extent analysis

TL;DR

Implement a catchup mechanism in the BlueBubbles channel to recover missed messages after an OpenClaw gateway restart by utilizing a persistent cursor and querying the /api/v1/message/query?after=<ts> endpoint.

Guidance

  • To address the issue of lost messages during gateway downtime, introduce a new fetchBlueBubblesMessagesSince function in extensions/bluebubbles/src/history.ts to query messages since a given timestamp.
  • Create a runBlueBubblesCatchup function in extensions/bluebubbles/src/catchup.ts to orchestrate the catchup process, including loading the cursor, querying messages, filtering, and replaying them through processMessage.
  • Modify extensions/bluebubbles/src/monitor.ts to fire-and-forget the catchup process after successful webhook target registration.
  • Ensure the catchup mechanism is idempotent with the persistent inbound dedupe feature (#66230) to prevent double-processing of messages.

Example

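// extensions/bluebubbles/src/catchup.ts (outline only; QueryOpts and the account/deps shapes are placeholders)
type QueryOpts = { baseUrl: string; password: string };
type CatchupResult = { replayed: number; skipped: number; windowMs: number };

// POST /api/v1/message/query with { after: sinceMs, sort: "ASC", limit } and return rows oldest-first.
export async function fetchBlueBubblesMessagesSince(sinceMs: number, limit: number, opts: QueryOpts): Promise<unknown[]> {
  throw new Error("not implemented");
}

// Load the cursor, query since it, drop isFromMe/self-handle rows, replay via processMessage, save the cursor on success.
export async function runBlueBubblesCatchup(account: { accountId: string }, deps: unknown): Promise<CatchupResult> {
  throw new Error("not implemented");
}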

Notes

The proposed fix relies on the fetchBlueBubblesMessagesSince function and the runBlueBubblesCatchup orchestrator. It's essential to ensure the catchup mechanism is properly integrated with the existing webhook handling and dedupe features.

Recommendation

Apply the proposed catchup mechanism to recover messages missed during an OpenClaw gateway restart. Land it after #66230, and validate it against the test plan above: unit tests, the live restart repro, and the dedupe regression check.
