openclaw - ✅ (Solved) Fix Discord health monitor triggers uncaught exception → gateway crash (2026.3.24) [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#55026 · Fetched 2026-04-08 01:33:34
Comments: 1 · Participants: 2 · Timeline: 7 · Reactions: 0
Timeline (top): cross-referenced ×3, closed ×1, commented ×1, locked ×1

Error Message

Error: Max reconnect attempts (0) reached after code 1005 at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)

Root Cause

Root cause analysis:

  • staleEventThresholdMs defaults to 30 minutes (1,800,000 ms)
  • When the threshold is exceeded, the health monitor triggers a restart via an abort signal
  • onAbort sets reconnect.maxAttempts = 0 (intentional, for a clean shutdown)
  • gateway.disconnect() then throws an uncaught exception because maxAttempts is 0
  • The gateway process dies and restarts from scratch
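
The chain above can be reproduced in isolation. The following is a minimal sketch, assuming a simplified gateway whose disconnect() runs the reconnect logic synchronously; FakeGateway, isSocketStale, and onAbort are hypothetical stand-ins, not the OpenClaw or @buape/carbon API. It shows why zeroing maxAttempts before disconnecting turns an intentional shutdown into a thrown error.

```typescript
// Hypothetical stand-ins (not OpenClaw or @buape/carbon APIs) that reproduce
// the reported crash chain in isolation.

const STALE_EVENT_THRESHOLD_MS = 30 * 60 * 1000; // 1,800,000 ms, the reported default

interface ReconnectOptions {
  maxAttempts: number;
}

class FakeGateway {
  options: { reconnect: ReconnectOptions } = { reconnect: { maxAttempts: 5 } };
  private attempts = 0;

  // Assumption: closing the socket (code 1005) runs the reconnect logic synchronously.
  disconnect(): void {
    this.handleReconnectionAttempt(1005);
  }

  // Throws once the attempt budget is used up; with maxAttempts 0 that is immediately.
  private handleReconnectionAttempt(code: number): void {
    const { maxAttempts } = this.options.reconnect;
    if (this.attempts >= maxAttempts) {
      throw new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${code}`);
    }
    this.attempts += 1; // a real client would schedule a reconnect here
  }
}

// Health-monitor check: stale when no event has arrived for longer than the threshold.
function isSocketStale(
  lastEventAt: number,
  now: number,
  thresholdMs: number = STALE_EVENT_THRESHOLD_MS,
): boolean {
  return now - lastEventAt > thresholdMs;
}

// Abort handler as described in the issue: zero the budget, then disconnect.
function onAbort(gateway: FakeGateway): void {
  gateway.options.reconnect = { maxAttempts: 0 };
  gateway.disconnect(); // throws, and nothing here catches it
}
```

Calling onAbort(new FakeGateway()) throws "Max reconnect attempts (0) reached after code 1005". Because nothing above the abort handler catches it, the error escapes as the uncaught exception seen in the logs.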

Fix Action

Fix / Workaround

Workaround: disable the Discord health monitor by setting channels.discord.healthMonitor.enabled: false

PR fix notes

PR #55042: fix(discord): suppress intentional reconnect exception during health monitor restart

Description (problem / solution / changelog)

Summary

When the Discord health monitor triggers a restart (e.g., due to stale-socket detection), the onAbort handler sets maxAttempts to 0 before calling disconnect(). The gateway library interprets this as a reconnection failure and throws "Max reconnect attempts (0) reached" — causing an uncaught exception that crashes the entire gateway process.

Root Cause

The gateway library's disconnect flow treats maxAttempts: 0 as a signal that reconnection exhausted its attempts, throwing an error. However, in the health monitor's abort flow, maxAttempts: 0 is intentionally set to prevent reconnection during a clean shutdown — this exception should not bubble up.

Fix

Wrap the gateway.disconnect() call in a try-catch that:

  1. Suppresses the expected "Max reconnect attempts" error (this is an intentional abort, not a connection failure)
  2. Rethrows any other unexpected errors (preserves visibility into real issues)
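
A minimal sketch of that try-catch, assuming a gateway object shaped like the one in the issue (options.reconnect and disconnect()). DisconnectableGateway, isIntentionalAbortError, and disconnectForRestart are hypothetical names; the actual change is in provider.lifecycle.ts and may match the error differently.

```typescript
// Hypothetical sketch of the PR #55042 approach; names are illustrative only.

interface DisconnectableGateway {
  options: { reconnect: { maxAttempts: number } };
  disconnect(): void;
}

// The only error expected when maxAttempts has been zeroed on purpose.
function isIntentionalAbortError(err: unknown): boolean {
  return err instanceof Error && err.message.startsWith("Max reconnect attempts");
}

function disconnectForRestart(gateway: DisconnectableGateway): void {
  gateway.options.reconnect = { maxAttempts: 0 }; // block reconnection during shutdown
  try {
    gateway.disconnect();
  } catch (err) {
    if (isIntentionalAbortError(err)) {
      return; // 1. expected side effect of the intentional abort: swallow it
    }
    throw err; // 2. anything else is a real failure: keep it visible
  }
}
```

Matching on the message prefix keeps the suppression narrow, which is exactly what the "rethrows non-reconnect exceptions" test case guards.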

Testing

Added two test cases:

  • suppresses 'Max reconnect attempts' exception during intentional abort — verifies the fix
  • rethrows non-reconnect exceptions during abort — ensures we don't suppress legitimate errors

All 17 tests in provider.lifecycle.test.ts pass.

Impact

  • Before: Gateway crashes every ~30 minutes when health monitor triggers restart
  • After: Health monitor gracefully restarts Discord connection without crashing

Fixes #55026

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +52/-0)
  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +12/-1)

PR #55000: fix(discord): prevent gateway crash on abort by fixing supervisor teardown ordering

Description (problem / solution / changelog)

Summary

A transient Discord WebSocket disconnection (e.g. close code 1005) or an intentional health-monitor restart can trigger an uncaught Max reconnect attempts (0) exception that crashes the entire gateway process. This kills all channels (Feishu, WhatsApp, Telegram, etc.), not just Discord.

This PR fixes the root cause of these gateway crashes by correcting the supervisor phase transition ordering during an abort.

Fixes #54931 Fixes #54894

Root Cause

The gateway crash happens due to an error routing mismatch during teardown:

  1. onAbort() sets maxAttempts to 0 and calls gateway.disconnect().
  2. disconnect() synchronously triggers @buape/carbon's handleClose → handleReconnectionAttempt, which emits a Max reconnect attempts (0) error on the gateway emitter.
  3. The gateway-supervisor is still in the active phase at this point, so it routes the error to the lifecycle handler instead of suppressing it.
  4. The error surfaces as an uncaught exception, causing the entire gateway process to exit.

The supervisor already has the correct teardown suppression logic (it logs and swallows late errors during the teardown phase). The bug is simply that onAbort() never transitions the supervisor to the teardown phase before disconnecting.

Changes

1. Fix teardown ordering in onAbort()

Call params.gatewaySupervisor.detachLifecycle() before gateway.disconnect(). This ensures the supervisor enters the teardown phase and correctly suppresses the synchronous error emitted during disconnect.
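
The routing mismatch and the reordering can be modelled with a toy supervisor. This sketch assumes just two phases and a disconnect() that synchronously reports its error to the supervisor; ToySupervisor and its methods are hypothetical and only borrow detachLifecycle and the phase names from the PR description.

```typescript
// Toy model of PR #55000's explanation. Only `detachLifecycle` and the phase
// names come from the PR; everything else here is a hypothetical stand-in.

type Phase = "active" | "teardown";

class ToySupervisor {
  phase: Phase = "active";
  readonly suppressed: string[] = [];
  private readonly onLifecycleError: (err: Error) => void;

  constructor(onLifecycleError: (err: Error) => void) {
    this.onLifecycleError = onLifecycleError;
  }

  // Entering teardown marks any later gateway errors as expected.
  detachLifecycle(): void {
    this.phase = "teardown";
  }

  // Receives every error the gateway emitter produces.
  handleGatewayError(err: Error): void {
    if (this.phase === "teardown") {
      this.suppressed.push(err.message); // log-and-swallow path
      return;
    }
    this.onLifecycleError(err); // active phase: route to the lifecycle handler
  }
}

// Stand-in for gateway.disconnect(): synchronously emits the reconnect error.
function disconnect(supervisor: ToySupervisor): void {
  supervisor.handleGatewayError(
    new Error("Max reconnect attempts (0) reached after code 1005"),
  );
}

// Order before the fix: disconnect while the supervisor is still active.
function onAbortBuggy(supervisor: ToySupervisor): void {
  disconnect(supervisor);
  supervisor.detachLifecycle();
}

// Order after the fix: enter teardown first, then disconnect.
function onAbortFixed(supervisor: ToySupervisor): void {
  supervisor.detachLifecycle();
  disconnect(supervisor);
}
```

With onAbortBuggy the error reaches the lifecycle handler (the crash path); with onAbortFixed the same error lands in the suppressed list.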

2. Add lifecycleStopping safety net

In the catch block, when lifecycleStopping is already true, we no longer re-throw errors. This acts as a defense-in-depth guard for any edge case where an error might still escape during an intentional shutdown.
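
The safety net can be sketched the same way. Only the lifecycleStopping flag name comes from the PR; makeLifecycle and runLifecycleStep are hypothetical, and the real catch block in provider.lifecycle.ts will differ in detail.

```typescript
// Sketch of the `lifecycleStopping` guard. Only the flag name comes from the
// PR; makeLifecycle and runLifecycleStep are hypothetical.

function makeLifecycle() {
  let lifecycleStopping = false;
  const loggedDuringStop: string[] = [];

  // Runs one lifecycle step and decides what to do with any error it throws.
  function runLifecycleStep(step: () => void): void {
    try {
      step();
    } catch (err) {
      if (lifecycleStopping) {
        // Intentional shutdown in progress: record the error instead of re-throwing.
        loggedDuringStop.push(err instanceof Error ? err.message : String(err));
        return;
      }
      throw err; // normal operation: propagate as before
    }
  }

  return {
    stop(): void {
      lifecycleStopping = true;
    },
    runLifecycleStep,
    loggedDuringStop,
  };
}
```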

Test Results

  • The extension-fast (extension-fast-discord, discord) CI check passes.
  • Verified locally that triggering an abort correctly suppresses the disconnect error and allows the gateway to restart gracefully without crashing the main process.

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +100/-4)
  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +25/-0)

Code Example

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)

---

{"0":"{\"subsystem\":\"gateway/health-monitor\"}","1":"[discord:default] health-monitor: restarting (reason: stale-socket)","time":"2026-03-26T00:55:41.855-07:00"}
{"0":"[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005","time":"2026-03-26T00:55:42.019-07:00"}

Bug: Discord health monitor triggers uncaught exception → gateway crash (2026.3.24)

Version: 2026.3.24 (just updated from 2026.3.23-2)

Description: The Discord channel health monitor detects a "stale-socket" after 30 minutes of inactivity and triggers a restart. During this restart, the onAbort handler sets gateway.options.reconnect = { maxAttempts: 0 }, then calls gateway.disconnect(). This throws an uncaught exception:

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)

The exception crashes the entire gateway process instead of cleanly restarting the Discord plugin.

Expected behavior: Health monitor restarts should not crash the gateway. The disconnect should be graceful even with maxAttempts: 0 since this is an intentional restart, not a connection failure.

Impact: The gateway becomes unreachable every ~30 minutes until the health monitor is disabled.

Environment:

  • OS: Ubuntu 6.17
  • Node: v22.22.1
  • OpenClaw: 2026.3.24
  • Channel: Discord (default)

extent analysis

Fix Plan

To fix the issue, we need to modify the onAbort handler to handle the disconnect without throwing an uncaught exception when maxAttempts is 0.

Step-by-Step Solution

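  • Modify the onAbort handler so it suppresses only the expected exception and rethrows anything else. The handler runs when the health monitor fires its abort signal, so abortSignal below is an assumed name for that signal:
// `abortSignal` and `gateway` stand for the objects used in provider.lifecycle.ts (assumed names)
abortSignal.addEventListener('abort', () => {
  gateway.options.reconnect = { maxAttempts: 0 };
  try {
    gateway.disconnect();
  } catch (error) {
    // Expected side effect of the intentional restart: log it and continue
    if (error instanceof Error && error.message.startsWith('Max reconnect attempts')) {
      console.warn('Ignoring expected disconnect error during restart:', error.message);
      return;
    }
    throw error; // any other error is a real failure and should stay visible
  }
}, { once: true });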
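  • Alternatively, patch SafeGatewayPlugin.handleReconnectionAttempt so it returns instead of throwing when maxAttempts is 0. provider-CAlWEl41.js appears to be bundled build output, so a manual edit there is likely to be overwritten by the next update:
// In provider-CAlWEl41.js; assumes handleReconnectionAttempt is a prototype method
const originalAttempt = SafeGatewayPlugin.prototype.handleReconnectionAttempt;
SafeGatewayPlugin.prototype.handleReconnectionAttempt = function (...args) {
  if (this.options.reconnect.maxAttempts === 0) {
    // Reconnection was disabled on purpose: skip it instead of throwing
    console.log('Reconnect disabled (maxAttempts: 0), skipping reconnection attempt');
    return;
  }
  return originalAttempt.apply(this, args);
};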

Verification

To verify that the fix worked, you can:

  • Enable the Discord health monitor again by setting channels.discord.healthMonitor.enabled: true
  • Wait for the health monitor to trigger a restart (after 30 minutes of inactivity)
  • Check the logs to see if the gateway restarts cleanly without crashing
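
Waiting 30 real minutes makes this check slow to repeat. If the monitor logic can be driven by an injected clock, the same check fits in a unit test. The sketch below assumes such a design; HealthMonitor, Clock, tick, and restart are hypothetical and not OpenClaw's API.

```typescript
// Hypothetical health monitor driven by an injected clock, so a test can jump
// past the 30-minute threshold instantly. Not OpenClaw's implementation.

type Clock = { now: number };

class HealthMonitor {
  restarts = 0;
  private lastEventAt: number;
  private readonly clock: Clock;
  private readonly thresholdMs: number;
  private readonly restart: () => void;

  constructor(clock: Clock, thresholdMs: number, restart: () => void) {
    this.clock = clock;
    this.thresholdMs = thresholdMs;
    this.restart = restart;
    this.lastEventAt = clock.now;
  }

  // Call whenever a Discord event arrives.
  recordEvent(): void {
    this.lastEventAt = this.clock.now;
  }

  // One health-check pass: restart if no event arrived for longer than the threshold.
  tick(): void {
    if (this.clock.now - this.lastEventAt > this.thresholdMs) {
      this.restarts += 1;
      this.lastEventAt = this.clock.now; // start a fresh window after the restart
      this.restart(); // if this throws, the exception escapes tick(), mirroring the crash
    }
  }
}
```

In a real test, restart would invoke the patched abort path, and tick() returning normally corresponds to the gateway restarting without crashing.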

Extra Tips

  • Make sure to test the fix in a staging environment before deploying it to production
  • Consider logging any suppressed disconnect errors so that intentional restarts remain visible in the logs
  • Review the onAbort handler and SafeGatewayPlugin.handleReconnectionAttempt function to ensure they are handling errors and exceptions correctly.
