openclaw - 💡(How to fix) Fix WhatsApp: health-monitor race condition causes multi-hour outages [1 participant]

GitHub stats
openclaw/openclaw#70463 · Fetched 2026-04-24 05:57:43
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


OpenClaw WhatsApp Issue: Health monitor race condition with Baileys 30-minute disconnections

Problem Summary

WhatsApp Web gateway experiences a race condition that causes ~7-hour outages:

  1. Normal operation: WhatsApp server disconnects every ~30 minutes (status 499)
  2. Baileys retry mechanism: Automatically attempts reconnection (Retry 1/12 in ~2s)
  3. Health monitor: Also detects disconnection and restarts channel
  4. Race condition: Both systems try to control the same channel simultaneously
  5. Result: Channel enters shutdown state, Baileys retry stops, health monitor restart fails
  6. Outcome: ~7-hour complete outage until manual gateway restart

Expected Behavior

  • Baileys retry should continue until successful reconnection
  • Health monitor should work independently without interfering with retries
  • No manual intervention should be required for routine disconnections

Root Cause Analysis

Timing

  • messageTimeoutMs: 1,800,000ms (30 minutes) - hardcoded default
  • watchdogCheckMs: 60,000ms (1 minute) - hardcoded default
  • WhatsApp server disconnect: ~30 minutes (variable timing)
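
To make the timing concrete, the following TypeScript sketch models a watchdog that polls at a fixed interval and restarts the channel once the silence is strictly longer than the message timeout. It is a hypothetical model, not OpenClaw's actual code: the function name and the assumption that checks are aligned to the last received message are illustrative, and only the two default values come from the issue.

```typescript
// Hypothetical model of the watchdog timing described above; not OpenClaw's real code.
const MESSAGE_TIMEOUT_MS = 1_800_000; // 30 minutes (default quoted in the issue)
const WATCHDOG_CHECK_MS = 60_000;     // 1 minute (default quoted in the issue)

/**
 * Returns the silence duration (ms since the last message) at which a polling
 * watchdog would first trigger a restart, assuming checks run at fixed
 * multiples of checkMs and fire when silence is strictly greater than timeoutMs.
 */
function firstWatchdogFireMs(timeoutMs: number, checkMs: number): number {
  if (timeoutMs < 0 || checkMs <= 0) {
    throw new RangeError("timeoutMs must be >= 0 and checkMs must be > 0");
  }
  // Smallest multiple of checkMs that is strictly greater than timeoutMs.
  return (Math.floor(timeoutMs / checkMs) + 1) * checkMs;
}

const fireMs = firstWatchdogFireMs(MESSAGE_TIMEOUT_MS, WATCHDOG_CHECK_MS);
console.log(`watchdog fires after ${fireMs / 60_000} minutes of silence`);
```

Under these assumptions the defaults yield 31 minutes, which is within about a minute of the ~30-minute server disconnect and therefore inside the window in which the Baileys retry is also running. With a 35-minute timeout and a 120-second check interval the same function returns 36 minutes.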

Race Condition Details

  1. minutesSinceLastMessage > 30 triggers watchdog restart
  2. Baileys retry mechanism is also active simultaneously
  3. Channel gets stuck in "channel stop exceeded 5000ms after abort" state
  4. No further retries occur from either system

Observed log sequence:
03:02:13 ⚠️ No messages received in 30m - restarting connection
03:02:54 [health-monitor] restarting (reason: disconnected)
03:02:59 channel stop exceeded 5000ms after abort; continuing shutdown
03:03:16 channel exited (408)
# Silence for 7 hours until manual restart
09:56:17 [gateway] ready
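
The "~7 hours" figure can be checked against the log timestamps above. The helper below is a small illustrative sketch (the function names are not from OpenClaw) that computes the gap between the channel exit and the next gateway start, assuming both timestamps fall on the same day.

```typescript
// Parses "HH:MM:SS" into seconds since midnight.
function toSeconds(hms: string): number {
  const match = /^(\d{2}):(\d{2}):(\d{2})$/.exec(hms);
  if (!match) throw new Error(`invalid timestamp: ${hms}`);
  return Number(match[1]) * 3600 + Number(match[2]) * 60 + Number(match[3]);
}

// Gap between two same-day timestamps, formatted as "XhYmZs".
function gapBetween(start: string, end: string): string {
  const diff = toSeconds(end) - toSeconds(start);
  if (diff < 0) throw new Error("end is before start (log may span midnight)");
  const h = Math.floor(diff / 3600);
  const m = Math.floor((diff % 3600) / 60);
  const s = diff % 60;
  return `${h}h${m}m${s}s`;
}

// "channel exited (408)" at 03:03:16, "[gateway] ready" at 09:56:17.
console.log(gapBetween("03:03:16", "09:56:17")); // 6h53m1s
```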

Proposed Solution

Phase 1: Configurable timeouts (Priority)

Add config keys to prevent hardcoding and allow adjustments:

channels:
  whatsapp:
    web:
      messageTimeoutMinutes: 35  # Higher than server disconnect timing
      watchdogCheckSeconds: 120  # Less frequent to avoid race condition
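
One way the proposed keys could be mapped onto the existing millisecond values is sketched below in TypeScript. Only the key names and the two defaults come from the issue; the interfaces, the `resolveTimings` function, and the validation rule are hypothetical and are not OpenClaw's actual config loader.

```typescript
// Hypothetical shape for the proposed keys (key names taken from the issue).
interface WhatsAppWebConfig {
  messageTimeoutMinutes?: number;
  watchdogCheckSeconds?: number;
}

interface WatchdogTimings {
  messageTimeoutMs: number;
  watchdogCheckMs: number;
}

// Defaults the issue reports as currently hardcoded.
const DEFAULT_TIMINGS: WatchdogTimings = {
  messageTimeoutMs: 1_800_000, // 30 minutes
  watchdogCheckMs: 60_000,     // 1 minute
};

// Converts the user-facing units to milliseconds, falling back to the defaults
// when a key is omitted and rejecting non-positive or non-finite values.
function resolveTimings(cfg: WhatsAppWebConfig = {}): WatchdogTimings {
  const pick = (value: number | undefined, unitMs: number, fallbackMs: number): number => {
    if (value === undefined) return fallbackMs;
    if (!Number.isFinite(value) || value <= 0) {
      throw new RangeError(`expected a positive number, got ${value}`);
    }
    return value * unitMs;
  };
  return {
    messageTimeoutMs: pick(cfg.messageTimeoutMinutes, 60_000, DEFAULT_TIMINGS.messageTimeoutMs),
    watchdogCheckMs: pick(cfg.watchdogCheckSeconds, 1_000, DEFAULT_TIMINGS.watchdogCheckMs),
  };
}

console.log(resolveTimings({ messageTimeoutMinutes: 35, watchdogCheckSeconds: 120 }));
```

Falling back to the current defaults when a key is missing keeps existing installations unchanged, which is why the sketch treats both keys as optional.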

Phase 2: Coordination mechanism (Future enhancement)

Implement mutual exclusion or coordination between Baileys retry and health monitor.
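
As a rough illustration of what such coordination might look like, the TypeScript sketch below uses a single-flight guard: whichever component asks first owns the reconnect, and any later request waits for that attempt instead of starting a competing one. The `ReconnectCoordinator` class and the requester names are hypothetical; nothing here is part of OpenClaw or Baileys, and the reconnect itself is simulated with a timer.

```typescript
// Illustrative single-flight guard; not part of OpenClaw or Baileys.
class ReconnectCoordinator {
  private inFlight: Promise<void> | null = null;
  private owner: string | null = null;

  /** Name of the component currently reconnecting, or null when idle. */
  get currentOwner(): string | null {
    return this.owner;
  }

  /**
   * Runs `reconnect` unless another caller is already reconnecting, in which
   * case this call waits for the attempt already in progress.
   * Resolves to true if this call started the attempt, false if it joined one.
   */
  async request(requester: string, reconnect: () => Promise<void>): Promise<boolean> {
    if (this.inFlight) {
      await this.inFlight.catch(() => undefined); // the outcome belongs to the owner
      return false;
    }
    this.owner = requester;
    // The async wrapper turns a synchronous throw into a rejection so the
    // cleanup in finally() always runs.
    this.inFlight = (async () => reconnect())().finally(() => {
      this.inFlight = null;
      this.owner = null;
    });
    await this.inFlight;
    return true;
  }
}

// Simulates the retry logic and the health monitor asking at the same moment.
async function demo(): Promise<{ retryStarted: boolean; monitorStarted: boolean; attempts: number }> {
  const coordinator = new ReconnectCoordinator();
  let attempts = 0;
  const reconnect = async (): Promise<void> => {
    attempts += 1;
    await new Promise<void>((resolve) => setTimeout(resolve, 20)); // stand-in for a real reconnect
  };
  const [retryStarted, monitorStarted] = await Promise.all([
    coordinator.request("baileys-retry", reconnect),
    coordinator.request("health-monitor", reconnect),
  ]);
  return { retryStarted, monitorStarted, attempts };
}

demo().then((result) => console.log(result));
```

In the demo only one reconnect attempt runs even though two components request it. A real implementation would also need to decide what the joining component does if the owner's attempt fails, which this sketch leaves open.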

Current Workaround

Manual gateway restart every ~7 hours (not sustainable).

Impact

  • Critical for WhatsApp reliability
  • Affects cron jobs (Ryan pep talk, portfolio briefs, etc.)
  • Only affects isolated sessions (main sessions work)

Related Issues

  • Issue #123: [Potential duplicate if system coordination bugs exist]
  • Baileys: [link to related Baileys issues if any]

Steps to Reproduce

  1. Run OpenClaw WhatsApp gateway
  2. Wait ~3-4 hours for race condition to occur
  3. Observe complete disconnection lasting ~7 hours
  4. Manual restart required

Technical Details

  • OpenClaw version: [current version]
  • Baileys library: [version from package.json]
  • Node.js version: [current version]
  • WhatsApp Web client: Platform-dependent

Extent analysis

TL;DR

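The timeout and watchdog interval are currently hardcoded. The issue proposes adding two configuration keys, messageTimeoutMinutes and watchdogCheckSeconds, so the message timeout can be raised above the ~30-minute server disconnect interval, which may mitigate the race condition behind the ~7-hour outages.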

Guidance

  • Implement Phase 1 of the proposed solution first: expose the hardcoded messageTimeoutMs and watchdogCheckMs values as configuration keys.
  • Once the keys exist, set messageTimeoutMinutes higher than the ~30-minute server disconnect interval so the watchdog does not restart the channel while a routine reconnect is in progress.
  • Test the updated configuration over several disconnect cycles to confirm that the Baileys retry mechanism and the health monitor no longer interfere with each other.
  • Consider a mutual exclusion or coordination mechanism between the Baileys retry and the health monitor as a future enhancement (Phase 2).

Example (keys proposed in Phase 1)

channels:
  whatsapp:
    web:
      messageTimeoutMinutes: 35
      watchdogCheckSeconds: 120

Notes

The provided solution focuses on adjusting timeouts, which may not completely resolve the issue but can help mitigate the race condition. A more comprehensive solution would involve implementing a coordination mechanism between the Baileys retry and health monitor systems.

Recommendation

Prioritize Phase 1 (configurable messageTimeoutMinutes and watchdogCheckSeconds): it is a smaller change than a coordination mechanism and can ship sooner. Until those keys are available, the only workaround reported in the issue is a manual gateway restart.
