openclaw - ✅(Solved) Fix Discord health monitor triggers uncaught exception → gateway crash (2026.3.24) [2 pull requests, 1 comment, 2 participants]

openclaw, 2026-03-26 08:01:27

GitHub stats

openclaw/openclaw#55026 • Fetched 2026-04-08 01:33:34

Author

dchihood-ship-it

Participants

dchihood-ship-it

steipete

Timeline (top)

cross-referenced ×3, closed ×1, commented ×1, locked ×1

Error Message

Error: Max reconnect attempts (0) reached after code 1005 at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)

Root Cause

Root cause analysis:

1. staleEventThresholdMs defaults to 30 minutes (1,800,000 ms)
2. When it is exceeded, the health monitor triggers a restart via an abort signal
3. onAbort sets reconnect.maxAttempts = 0 (intentional, for a clean shutdown)
4. gateway.disconnect() throws an uncaught exception because maxAttempts is 0
5. The gateway process dies and restarts from scratch
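
The first two steps of this chain can be sketched as follows. This is only an illustration under stated assumptions, not OpenClaw's code: createStaleMonitor, recordEvent, and check are hypothetical names. Only staleEventThresholdMs, its 30-minute default, and the "stale-socket" reason come from the issue.

```javascript
// Hypothetical sketch: a stale-socket check that requests a restart via an abort signal.
const STALE_EVENT_THRESHOLD_MS = 30 * 60 * 1000; // 1,800,000 ms, the default quoted in the issue

function createStaleMonitor({ staleEventThresholdMs = STALE_EVENT_THRESHOLD_MS, now = Date.now } = {}) {
  const controller = new AbortController();
  let lastEventAt = now();

  return {
    signal: controller.signal,
    // Call whenever the socket delivers an event.
    recordEvent() {
      lastEventAt = now();
    },
    // Call periodically; aborts once the socket has been silent for longer than the threshold.
    check() {
      const idleMs = now() - lastEventAt;
      if (idleMs > staleEventThresholdMs && !controller.signal.aborted) {
        controller.abort(new Error('stale-socket'));
      }
      return controller.signal.aborted;
    },
  };
}
```

Anything listening on monitor.signal (for example an onAbort handler) runs synchronously when check() aborts, which is where the disconnect in steps 3 and 4 would happen.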

Fix Action

Fix / Workaround

Workaround: Disabled Discord health monitor: channels.discord.healthMonitor.enabled: false
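
The issue gives the setting only as a dotted path and does not state the config file's format or location. Assuming the path maps onto nested keys, the resulting shape would look like the output below; setByPath is a hypothetical helper used purely to expand the path.

```javascript
// Hypothetical helper: expands a dotted path into nested keys on a plain object.
function setByPath(target, path, value) {
  const keys = path.split('.');
  let node = target;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
  return target;
}

// Assumed nested shape of the workaround setting.
const config = setByPath({}, 'channels.discord.healthMonitor.enabled', false);
console.log(JSON.stringify(config));
// {"channels":{"discord":{"healthMonitor":{"enabled":false}}}}
```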

PR fix notes

PR #55042: fix(discord): suppress intentional reconnect exception during health monitor restart

Repository: openclaw/openclaw
Author: Bartok9
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/55042

Description (problem / solution / changelog)

Summary

When the Discord health monitor triggers a restart (e.g., due to stale-socket detection), the onAbort handler sets maxAttempts to 0 before calling disconnect(). The gateway library interprets this as a reconnection failure and throws "Max reconnect attempts (0) reached" — causing an uncaught exception that crashes the entire gateway process.

Root Cause

The gateway library's disconnect flow treats maxAttempts: 0 as a signal that reconnection exhausted its attempts, throwing an error. However, in the health monitor's abort flow, maxAttempts: 0 is intentionally set to prevent reconnection during a clean shutdown — this exception should not bubble up.

Fix

Wrap the gateway.disconnect() call in a try-catch that:

Suppresses the expected "Max reconnect attempts" error (this is an intentional abort, not a connection failure)
Rethrows any other unexpected errors (preserves visibility into real issues)
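
A minimal sketch of that try-catch, assuming the expected error can be recognized by the message text quoted in the issue. The names disconnectOnAbort and isExpectedAbortError are hypothetical; the actual change in provider.lifecycle.ts is not shown on this page and may differ.

```javascript
// Assumption: the expected error is identified by its message prefix.
function isExpectedAbortError(error) {
  return error instanceof Error && error.message.includes('Max reconnect attempts');
}

function disconnectOnAbort(gateway) {
  // Disable reconnection so the shutdown stays clean.
  gateway.options.reconnect = { maxAttempts: 0 };
  try {
    gateway.disconnect();
  } catch (error) {
    if (isExpectedAbortError(error)) return; // intentional abort, not a connection failure
    throw error; // anything else stays visible
  }
}
```

Matching on the message is brittle if the library ever rewords the error, so a dedicated error class or code would be preferable if the library exposes one.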

Testing

Added two test cases:

suppresses 'Max reconnect attempts' exception during intentional abort — verifies the fix
rethrows non-reconnect exceptions during abort — ensures we don't suppress legitimate errors

All 17 tests in provider.lifecycle.test.ts pass.

Impact

Before: Gateway crashes every ~30 minutes when health monitor triggers restart
After: Health monitor gracefully restarts Discord connection without crashing

Fixes #55026

Changed files

extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +52/-0)
extensions/discord/src/monitor/provider.lifecycle.ts (modified, +12/-1)

PR #55000: fix(discord): prevent gateway crash on abort by fixing supervisor teardown ordering

Repository: openclaw/openclaw
Author: openperf
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/55000

Description (problem / solution / changelog)

Summary

A transient Discord WebSocket disconnection (e.g. close code 1005) or an intentional health-monitor restart can trigger an uncaught Max reconnect attempts (0) exception that crashes the entire gateway process. This kills all channels (Feishu, WhatsApp, Telegram, etc.), not just Discord.

This PR fixes the root cause of these gateway crashes by correcting the supervisor phase transition ordering during an abort.

Fixes #54931. Fixes #54894.

Root Cause

The gateway crash happens due to an error routing mismatch during teardown:

1. onAbort() sets maxAttempts to 0 and calls gateway.disconnect().
2. disconnect() synchronously triggers @buape/carbon's handleClose → handleReconnectionAttempt, which emits a Max reconnect attempts (0) error on the gateway emitter.
3. The gateway-supervisor is still in the active phase at this point, so it routes the error to the lifecycle handler instead of suppressing it.
4. The error surfaces as an uncaught exception, causing the entire gateway process to exit.

The supervisor already has the correct teardown suppression logic (it logs and swallows late errors during the teardown phase). The bug is simply that onAbort() never transitions the supervisor to the teardown phase before disconnecting.

Changes

1. Fix teardown ordering in onAbort()

Call params.gatewaySupervisor.detachLifecycle() before gateway.disconnect(). This ensures the supervisor enters the teardown phase and correctly suppresses the synchronous error emitted during disconnect.
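
The effect of the ordering can be illustrated with a toy supervisor. Everything here is a hypothetical stand-in: only detachLifecycle and the active/teardown phases are named in the PR, and the mock gateway simply reports the error synchronously during disconnect(), as the root cause section describes.

```javascript
// Hypothetical supervisor: routes gateway errors according to the current phase.
function createGatewaySupervisor(onLifecycleError) {
  let phase = 'active';
  const suppressed = [];
  return {
    suppressed,
    // Enter teardown so late errors are recorded and swallowed.
    detachLifecycle() {
      phase = 'teardown';
    },
    handleGatewayError(error) {
      if (phase === 'teardown') {
        suppressed.push(error.message);
        return;
      }
      onLifecycleError(error);
    },
  };
}

// Mock gateway: disconnect() synchronously reports the reconnect error.
function createMockGateway(supervisor) {
  return {
    options: {},
    disconnect() {
      supervisor.handleGatewayError(new Error('Max reconnect attempts (0) reached after code 1005'));
    },
  };
}

function runAbort({ detachFirst }) {
  const escaped = [];
  const supervisor = createGatewaySupervisor((error) => escaped.push(error.message));
  const gateway = createMockGateway(supervisor);
  gateway.options.reconnect = { maxAttempts: 0 };
  if (detachFirst) supervisor.detachLifecycle();
  gateway.disconnect();
  return { escaped: escaped.length, suppressed: supervisor.suppressed.length };
}

console.log(runAbort({ detachFirst: false })); // { escaped: 1, suppressed: 0 }
console.log(runAbort({ detachFirst: true })); // { escaped: 0, suppressed: 1 }
```

With detachFirst set to false the error reaches the lifecycle handler, which corresponds to the crash path; with it set to true the same error is swallowed by the teardown logic.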

2. Add lifecycleStopping safety net

In the catch block, when lifecycleStopping is already true, we no longer re-throw errors. This acts as a defense-in-depth guard for any edge case where an error might still escape during an intentional shutdown.
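
A sketch of such a guard, assuming a simple state object. Only the lifecycleStopping flag is named in the PR; runLifecycleStep and the swallowed list are hypothetical.

```javascript
// Hypothetical safety net: errors are not re-thrown once an intentional shutdown has begun.
function runLifecycleStep(step, state) {
  try {
    step();
  } catch (error) {
    if (state.lifecycleStopping) {
      // Shutdown already in progress: record the error instead of re-throwing it.
      state.swallowed.push(error);
      return;
    }
    throw error;
  }
}
```

The trade-off is that a genuine failure during shutdown is also swallowed, so recording or logging it, as the sketch does, keeps it diagnosable.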

Test Results

The extension-fast (extension-fast-discord, discord) CI check passes.
Verified locally that triggering an abort correctly suppresses the disconnect error and allows the gateway to restart gracefully without crashing the main process.

Changed files

extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +100/-4)
extensions/discord/src/monitor/provider.lifecycle.ts (modified, +25/-0)

Code Example

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)

---

{"0":"{\"subsystem\":\"gateway/health-monitor\"}","1":"[discord:default] health-monitor: restarting (reason: stale-socket)","time":"2026-03-26T00:55:41.855-07:00"}
{"0":"[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005","time":"2026-03-26T00:55:42.019-07:00"}
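
The timestamps show how closely the crash follows the restart. The snippet below parses the two quoted lines and computes the gap; it assumes nothing beyond each line being a JSON object with a "time" field.

```javascript
// The two log lines quoted above, kept raw so the embedded \" escapes survive.
const logLines = [
  String.raw`{"0":"{\"subsystem\":\"gateway/health-monitor\"}","1":"[discord:default] health-monitor: restarting (reason: stale-socket)","time":"2026-03-26T00:55:41.855-07:00"}`,
  String.raw`{"0":"[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005","time":"2026-03-26T00:55:42.019-07:00"}`,
];

const [restart, crash] = logLines.map((line) => JSON.parse(line));
const gapMs = Date.parse(crash.time) - Date.parse(restart.time);
console.log(gapMs); // 164
```

A gap of 164 ms is consistent with the exception being raised during the restart itself rather than by a later, unrelated disconnect.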


Bug: Discord health monitor triggers uncaught exception → gateway crash (2026.3.24)

Version: 2026.3.24 (just updated from 2026.3.23-2)

Description: The Discord channel health monitor detects a "stale-socket" after 30 minutes of inactivity and triggers a restart. During this restart, the onAbort handler sets gateway.options.reconnect = { maxAttempts: 0 }, then calls gateway.disconnect(). This throws an uncaught exception:

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)

The exception crashes the entire gateway process instead of cleanly restarting the Discord plugin.

Logs:

{"0":"{\"subsystem\":\"gateway/health-monitor\"}","1":"[discord:default] health-monitor: restarting (reason: stale-socket)","time":"2026-03-26T00:55:41.855-07:00"}
{"0":"[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005","time":"2026-03-26T00:55:42.019-07:00"}

Root cause analysis:

1. staleEventThresholdMs defaults to 30 minutes (1,800,000 ms)
2. When it is exceeded, the health monitor triggers a restart via an abort signal
3. onAbort sets reconnect.maxAttempts = 0 (intentional, for a clean shutdown)
4. gateway.disconnect() throws an uncaught exception because maxAttempts is 0
5. The gateway process dies and restarts from scratch

Workaround: Disabled Discord health monitor: channels.discord.healthMonitor.enabled: false

Expected behavior: Health monitor restarts should not crash the gateway. The disconnect should be graceful even with maxAttempts: 0 since this is an intentional restart, not a connection failure.

Impact: Gateway becomes unreachable every ~30 minutes until health monitor is disabled.

Environment:

OS: Ubuntu 6.17
Node: v22.22.1
OpenClaw: 2026.3.24
Channel: Discord (default)

Extended analysis

Fix Plan

To fix the issue, the onAbort handler has to complete disconnect() without letting the expected "Max reconnect attempts (0)" exception escape when maxAttempts is 0, while still surfacing any other error.

Step-by-Step Solution

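Modify the onAbort handler so that it suppresses only the expected exception and rethrows anything else. The snippet is a sketch; how onAbort is registered on the abort signal in provider.lifecycle.ts is not shown in the issue:

const onAbort = () => {
  // Disable reconnection so the shutdown stays clean
  gateway.options.reconnect = { maxAttempts: 0 };
  try {
    gateway.disconnect();
  } catch (error) {
    // Expected during an intentional abort: not a real connection failure
    if (error instanceof Error && error.message.includes('Max reconnect attempts')) return;
    throw error; // keep unexpected errors visible
  }
};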

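Alternatively, the reconnection handler itself could return instead of throwing when maxAttempts is 0. Note that provider-CAlWEl41.js appears to be a bundled build artifact (the stack trace points into it), so a patch made there is likely to be overwritten by the next update; the change belongs in the source or upstream in the gateway library. A rough sketch, assuming handleReconnectionAttempt is an instance method:

// Illustration only; the real method body is not shown in the issue
SafeGatewayPlugin.prototype.handleReconnectionAttempt = function () {
  // ...
  if (this.options.reconnect?.maxAttempts === 0) {
    // Reconnection was disabled on purpose: log and return instead of throwing
    console.log('Reconnect disabled (maxAttempts: 0); skipping reconnection.');
    return;
  }
  // ...
};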

Verification

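To verify that the fix worked:

1. Re-enable the Discord health monitor by setting channels.discord.healthMonitor.enabled: true
2. Wait for the health monitor to trigger a restart (by default after 30 minutes without events)
3. Check the logs: the "health-monitor: restarting (reason: stale-socket)" line should no longer be followed by an "Uncaught exception: Error: Max reconnect attempts (0) reached" line, and the gateway process should keep running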
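
The log check can be scripted. The sketch below assumes one JSON object per log line with a "time" field, as in the lines quoted in the issue; findCrashesAfterRestart and the 5-second window are hypothetical choices, not part of OpenClaw.

```javascript
// Hypothetical checker: reports uncaught exceptions logged within windowMs of a
// health-monitor restart. An empty result means no crash followed a restart.
function findCrashesAfterRestart(lines, windowMs = 5000) {
  const entries = lines
    .filter((line) => line.trim() !== '')
    .map((line) => {
      const parsed = JSON.parse(line);
      return { time: Date.parse(parsed.time), text: Object.values(parsed).join(' ') };
    });

  const crashes = [];
  let lastRestart = null;
  for (const entry of entries) {
    if (entry.text.includes('health-monitor: restarting')) {
      lastRestart = entry.time;
    } else if (
      lastRestart !== null &&
      entry.text.includes('Uncaught exception') &&
      entry.time - lastRestart <= windowMs
    ) {
      crashes.push(entry.text);
    }
  }
  return crashes;
}
```

To use it, read the gateway log, split it on newlines, and pass the resulting array to the function.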

Extra Tips

Make sure to test the fix in a staging environment before deploying it to production
Consider adding additional logging or monitoring to detect and handle similar issues in the future
Review the onAbort handler and SafeGatewayPlugin.handleReconnectionAttempt function to ensure they are handling errors and exceptions correctly.
