openclaw - ✅(Solved) Fix [Bug]: Gateway crashes with uncaught exception when Discord health monitor triggers stale-socket restart [2 pull requests, 3 comments, 3 participants]

openclaw/openclaw#56854 (fetched 2026-04-08 01:46:52)

The Discord provider crashes the entire gateway with an uncaught exception when the health monitor detects a stale socket and attempts a restart. The abort handler sets gateway.options.reconnect.maxAttempts to 0, then the WebSocket close handler tries to reconnect and throws.

Error Message

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)

Root Cause

Root cause: Race condition between the health monitor restart and the abort handler. The onAbort callback (line 6952) sets maxAttempts=0 to prevent reconnection during intentional shutdown, but the health-monitor-initiated restart triggers the same abort flow, and then the WebSocket close handler attempts reconnection against the now-disabled reconnect config.

PR fix notes

PR #213: Add upstream intelligence report from Scout

Description (problem / solution / changelog)

Added Scout report and journal files identifying inherited defects from the parent repository (openclaw/openclaw), including #56854, #56832, and #56892.


PR created automatically by Jules for task 7521594946744656005 started by @MillionthOdin16


Summary by CodeRabbit

  • Documentation
    • Added three new files documenting known issues, defect reports, and upstream problem tracking with detailed descriptions and impact assessments.

Changed files

  • .jules/scout.md (added, +14/-0)
  • openclaw_issues.txt (added, +100/-0)
  • scout-report.txt (added, +19/-0)
  • test_output.log (added, +964/-0)

PR #58988: fix(discord): use safe disconnect in onAbort to prevent gateway crash

Description (problem / solution / changelog)

Summary

Fixes #56854

The Discord provider crashes the entire gateway with an uncaught exception when the health monitor triggers an abort. The onAbort handler sets gateway.options.reconnect.maxAttempts to 0 then calls gateway.disconnect(). The socket close handler sees maxAttempts=0 and throws "Max reconnect attempts (0) reached", killing the process.

Root cause

// onAbort handler (line 404)
params.gateway.options.reconnect = { maxAttempts: 0 };
params.gateway.disconnect(); // triggers close handler -> throw

The codebase already has disconnectGatewaySocketWithoutAutoReconnect() (line 135) which strips close/error listeners before disconnecting, specifically to prevent this race.

Fix

Replace the raw disconnect() call in onAbort with disconnectGatewaySocketWithoutAutoReconnect(). This reuses the existing safe disconnect pattern.
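
The idea behind the safe disconnect can be sketched as follows. This is not the code of disconnectGatewaySocketWithoutAutoReconnect(); it only illustrates the pattern the PR describes (strip the close/error listeners, then disconnect). MockSocket, createSocketWithReconnectHandler, and disconnectWithoutAutoReconnectSketch are hypothetical names invented for this example.

```typescript
type CloseListener = (code: number) => void;

// Hypothetical minimal socket; the real provider wraps a WebSocket.
class MockSocket {
  private listeners: Record<string, CloseListener[]> = {};

  on(event: string, listener: CloseListener): void {
    this.listeners[event] = (this.listeners[event] || []).concat(listener);
  }

  removeAllListeners(event: string): void {
    delete this.listeners[event];
  }

  // Closing the socket notifies every registered "close" listener.
  close(code: number): void {
    for (const listener of this.listeners["close"] || []) {
      listener(code);
    }
  }
}

// Registers a close listener that throws when reconnects are disabled,
// imitating the failure described in the issue.
function createSocketWithReconnectHandler(maxAttempts: number): MockSocket {
  const socket = new MockSocket();
  socket.on("close", (code) => {
    if (maxAttempts <= 0) {
      throw new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${code}`);
    }
  });
  return socket;
}

// Sketch of the safe pattern: detach the listeners first, then close,
// so no reconnect logic runs during an intentional disconnect.
function disconnectWithoutAutoReconnectSketch(socket: MockSocket): void {
  socket.removeAllListeners("close");
  socket.removeAllListeners("error");
  socket.close(1000);
}
```

In this sketch, calling close() directly on a socket built with maxAttempts 0 throws, while disconnectWithoutAutoReconnectSketch() returns normally because the throwing listener is already gone.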

Test plan

  • Gateway survives Discord health monitor restart cycles without crashing
  • Intentional abort (e.g., config reload) cleanly disconnects Discord

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) [email protected]

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.reconnect.ts (modified, +19/-2)

Code Example

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)

---

02:04:04+05:00 [health-monitor] started (interval: 300s, startup-grace: 60s, channel-connect-grace: 120s)
02:04:28+05:00 [discord] [default] starting provider (@Openclaw)
02:39:04+05:00 [health-monitor] [whatsapp:default] health-monitor: restarting (reason: stale-socket)
02:44:04+05:00 [health-monitor] [discord:default] health-monitor: restarting (reason: stale-socket)
02:44:04+05:00 [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)
    at WebSocket.<anonymous> (provider-CAlWEl41.js:3307:9)

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

The Discord provider crashes the entire gateway with an uncaught exception when the health monitor detects a stale socket and attempts a restart. The abort handler sets gateway.options.reconnect.maxAttempts to 0, then the WebSocket close handler tries to reconnect and throws.

Steps to reproduce

  1. Start OpenClaw gateway 2026.3.24 with Discord channel enabled on Windows 11.
  2. Wait ~35 minutes for the health monitor to detect a stale Discord WebSocket.
  3. Health monitor triggers a provider restart, firing the abort signal.
  4. The onAbort handler (line 6952 in provider-CAlWEl41.js) sets gateway.options.reconnect = { maxAttempts: 0 }.
  5. gateway.disconnect() closes the WebSocket, triggering handleClose -> handleReconnectionAttempt.
  6. Since maxAttempts is 0 and reconnectAttempts >= 0, it throws: Error: Max reconnect attempts (0) reached after code 1005.
  7. Uncaught exception kills the entire gateway process.
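
Steps 4 to 6 can be modeled with a small stand-in class. This is a simplified sketch, not the real SafeGatewayPlugin: MockGateway and reproduce are hypothetical, and the close handler runs synchronously here so the error can be caught and inspected. In the real stack trace the throw originates in a WebSocket event listener, where nothing catches it.

```typescript
type ReconnectOptions = { maxAttempts: number };

// Hypothetical stand-in for the Discord gateway client.
class MockGateway {
  // Default taken from the issue: reconnect: { maxAttempts: 50 }.
  options: { reconnect: ReconnectOptions } = { reconnect: { maxAttempts: 50 } };
  private reconnectAttempts = 0;

  // Step 5: disconnecting closes the socket, which runs the close handler.
  disconnect(): void {
    this.handleClose(1005);
  }

  private handleClose(code: number): void {
    this.handleReconnectionAttempt(code);
  }

  // Step 6: with maxAttempts = 0 the check is already true on the first close.
  private handleReconnectionAttempt(code: number): void {
    const maxAttempts = this.options.reconnect.maxAttempts;
    if (this.reconnectAttempts >= maxAttempts) {
      throw new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${code}`);
    }
    this.reconnectAttempts += 1;
  }
}

// Step 4: the abort handler disables reconnects, then disconnects.
function onAbort(gateway: MockGateway): void {
  gateway.options.reconnect = { maxAttempts: 0 };
  gateway.disconnect();
}

// Runs the abort flow and returns the error message, if any.
function reproduce(): string {
  try {
    onAbort(new MockGateway());
    return "no error";
  } catch (err) {
    return (err as Error).message;
  }
}
```

Calling reproduce() returns the same message that appears in the crash log, which shows that the abort handler's own disconnect is what trips the max-attempts check.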

Expected behavior

The health monitor restart should cleanly disconnect and reconnect the Discord provider without crashing the gateway. The abort handler should not interfere with the reconnection logic during a health-monitor-initiated restart.

Actual behavior

Gateway crashes with uncaught exception:

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)

This happens every time the health monitor triggers a Discord restart (observed 3+ times, always ~35 min after gateway start).

OpenClaw version

2026.3.24

Operating system

Windows 11 (10.0.26200)

Install method

pnpm global

Model

kilocode/kilo-auto/free

Provider / routing chain

openclaw -> kilocode (kilo-auto/free auto-router)

Additional provider/model setup details

No response

Logs, screenshots, and evidence

02:04:04+05:00 [health-monitor] started (interval: 300s, startup-grace: 60s, channel-connect-grace: 120s)
02:04:28+05:00 [discord] [default] starting provider (@Openclaw)
02:39:04+05:00 [health-monitor] [whatsapp:default] health-monitor: restarting (reason: stale-socket)
02:44:04+05:00 [health-monitor] [discord:default] health-monitor: restarting (reason: stale-socket)
02:44:04+05:00 [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)
    at WebSocket.<anonymous> (provider-CAlWEl41.js:3307:9)

Impact and severity

Affected: Anyone running the Discord channel on OpenClaw (Windows confirmed, likely cross-platform)
Severity: Critical. The gateway process dies completely and all channels go offline.
Frequency: Every ~35 minutes (whenever the health monitor fires a stale-socket check on Discord)
Consequence: Missed messages on all channels (WhatsApp, Discord, webchat) until manual restart. Requires an external process wrapper to maintain uptime.

Additional information

Root cause: Race condition between the health monitor restart and the abort handler. The onAbort callback (line 6952) sets maxAttempts=0 to prevent reconnection during intentional shutdown, but the health-monitor-initiated restart triggers the same abort flow, and then the WebSocket close handler attempts reconnection against the now-disabled reconnect config.

Suggested fix: Either (a) don't set maxAttempts=0 in the abort handler and instead use a separate flag to skip reconnection during intentional shutdown, or (b) have the health monitor restart bypass the abort handler's reconnect suppression, or (c) catch the error in handleReconnectionAttempt when maxAttempts=0 instead of throwing uncaught.

Default reconnect config: line 6447 sets reconnect: { maxAttempts: 50 }, but the abort handler overrides to 0.
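
A variant of suggestion (c) can be sketched as a pure decision function that treats maxAttempts = 0 as "reconnects deliberately disabled" rather than "budget exhausted". The names ReconnectDecision, decideReconnect, and handleReconnectionAttemptSketch are hypothetical and do not exist in the codebase; this only illustrates one way to separate the two cases.

```typescript
type ReconnectDecision = "retry" | "stop" | "fail";

// Decides what the close handler should do next.
function decideReconnect(attempts: number, maxAttempts: number): ReconnectDecision {
  if (maxAttempts === 0) {
    return "stop"; // reconnects were disabled on purpose: stop quietly
  }
  if (attempts >= maxAttempts) {
    return "fail"; // a real reconnect budget was used up
  }
  return "retry";
}

// Example caller: only a genuinely exhausted budget raises an error.
function handleReconnectionAttemptSketch(
  attempts: number,
  maxAttempts: number,
  code: number
): ReconnectDecision {
  const decision = decideReconnect(attempts, maxAttempts);
  if (decision === "fail") {
    throw new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${code}`);
  }
  return decision;
}
```

With this split, the abort flow (maxAttempts set to 0) ends in "stop" without throwing, while the default config of 50 attempts still fails loudly once the budget is spent.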

extent analysis

Fix Plan

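This plan follows suggestion (a) from the issue: add a separate flag that skips reconnection during an intentional shutdown instead of setting maxAttempts to 0, so the close handler never checks its attempt count against a disabled reconnect config. Note that PR #58988 takes a different route and replaces the raw disconnect() call with disconnectGatewaySocketWithoutAutoReconnect().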

Here are the steps to fix the issue:

  • Introduce a new flag isIntentionalShutdown to track whether the shutdown is intentional or not.
  • In the onAbort handler, set isIntentionalShutdown to true instead of setting maxAttempts to 0.
  • In the handleReconnectionAttempt function, check the isIntentionalShutdown flag before attempting to reconnect. If it's true, skip reconnection.

Example code:

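// Flag that records whether the current disconnect was requested on purpose.
let isIntentionalShutdown = false;

// In the onAbort handler: mark the shutdown as intentional instead of
// overwriting gateway.options.reconnect with { maxAttempts: 0 }.
const onAbort = () => {
  isIntentionalShutdown = true;
  // ... disconnect the gateway here ...
};

// In handleReconnectionAttempt: return before the max-attempts check runs.
const handleReconnectionAttempt = () => {
  if (isIntentionalShutdown) {
    // Intentional shutdown: skip reconnection and do not throw.
    return;
  }
  // ... existing reconnect logic ...
};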

Verification

To verify that the fix worked, follow these steps:

  • Start the OpenClaw gateway with the Discord channel enabled.
  • Wait for the health monitor to detect a stale Discord WebSocket and trigger a restart.
  • Check the logs to ensure that the gateway does not crash with an uncaught exception.
  • Verify that the Discord provider reconnects successfully after the restart.
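
The log check can be partly automated with a small scanner. The sketch below assumes the log line formats shown in the issue's evidence section; other OpenClaw versions may log differently. summarizeGatewayLog and sampleLog are hypothetical names used only for this example.

```typescript
interface LogSummary {
  discordRestarts: number;
  uncaughtExceptions: number;
}

// Counts Discord health-monitor restarts and uncaught exceptions in a log.
function summarizeGatewayLog(log: string): LogSummary {
  const summary: LogSummary = { discordRestarts: 0, uncaughtExceptions: 0 };
  for (const line of log.split("\n")) {
    if (line.indexOf("[discord:default] health-monitor: restarting") !== -1) {
      summary.discordRestarts += 1;
    }
    if (line.indexOf("[openclaw] Uncaught exception:") !== -1) {
      summary.uncaughtExceptions += 1;
    }
  }
  return summary;
}

// Sample taken from the log excerpt in the issue (pre-fix behavior).
const sampleLog = [
  "02:04:28+05:00 [discord] [default] starting provider (@Openclaw)",
  "02:39:04+05:00 [health-monitor] [whatsapp:default] health-monitor: restarting (reason: stale-socket)",
  "02:44:04+05:00 [health-monitor] [discord:default] health-monitor: restarting (reason: stale-socket)",
  "02:44:04+05:00 [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005",
].join("\n");

const sampleSummary = summarizeGatewayLog(sampleLog);
```

For the pre-fix sample, the summary reports one Discord restart and one uncaught exception. After a fix, a log covering at least one Discord restart should report zero uncaught exceptions.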

Extra Tips

  • Make sure to reset the isIntentionalShutdown flag to false after a successful restart to allow for reconnection attempts in case of future failures.
  • Consider adding additional logging to track the state of the isIntentionalShutdown flag and the reconnection attempts to aid in debugging and monitoring.
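
Both tips can be combined in one small state holder. This is a sketch under the assumption that the provider has a clear "started" hook where the flag can be reset; ReconnectGate and its methods are hypothetical, and the events array stands in for whatever logger the gateway uses.

```typescript
// Tracks whether reconnects are currently allowed and records each transition.
class ReconnectGate {
  private intentionalShutdown = false;
  readonly events: string[] = []; // stand-in for real log output

  // Call from the abort handler before disconnecting.
  markIntentionalShutdown(reason: string): void {
    this.intentionalShutdown = true;
    this.events.push(`shutdown requested: ${reason}`);
  }

  // Call once the provider has restarted successfully.
  markStarted(): void {
    this.intentionalShutdown = false;
    this.events.push("provider started: reconnects re-enabled");
  }

  // Call from the close handler before attempting to reconnect.
  shouldReconnect(): boolean {
    const allowed = !this.intentionalShutdown;
    this.events.push(allowed ? "reconnect allowed" : "reconnect skipped");
    return allowed;
  }
}
```

Resetting the flag in markStarted() keeps an earlier intentional shutdown from suppressing reconnects after a later, unexpected socket close.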
