openclaw - ✅(Solved) Fix Discord WebSocket crash: 'Max reconnect attempts (0) reached after code 1005' since 2026.3.24 [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#55421 · Fetched 2026-04-08 01:39:42
Comments: 1 · Participants: 2 · Timeline: 10 · Reactions: 0
Timeline (top): cross-referenced ×6, closed ×1, commented ×1, locked ×1

Since upgrading to 2026.3.24, the Discord provider crashes every ~35 minutes with an uncaught exception when the Discord gateway drops the WebSocket connection with code 1005 (No Status Received). Prior to this version (e.g. 2026.3.13), the same stale-socket event was handled gracefully without a crash.

Error Message

[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (file:///opt/homebrew/lib/node_modules/openclaw/dist/provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (file:///opt/homebrew/lib/node_modules/openclaw/dist/provider-CAlWEl41.js:3364:8)
    at WebSocket.<anonymous> (file:///opt/homebrew/lib/node_modules/openclaw/dist/provider-CAlWEl41.js:3307:9)
    at WebSocket.emit (node:events:519:28)
    at WebSocket.emitClose (/opt/homebrew/lib/node_modules/openclaw/node_modules/ws/lib/websocket.js:273:10)
    at TLSSocket.socketOnClose (/opt/homebrew/lib/node_modules/openclaw/node_modules/ws/lib/websocket.js:1346:15)

Root Cause

A race during channel shutdown. When the health monitor detects a stale socket, onAbort() sets maxAttempts to 0 and calls gateway.disconnect(). The resulting WebSocket close makes handleReconnectionAttempt emit an 'error' event, and when no listener is left to absorb it, Node.js turns the emit into an uncaught exception that crashes the process. The PR fix notes below give the details.

Fix Action

Fixed

PR fix notes

PR #55177: fix(discord): suppress shutdown errors with persistent listener to prevent crash (#55116)

Description (problem / solution / changelog)

Summary

Fixes #55116 Fixes #55421

When the gateway health-monitor detects a stale socket, it calls stopChannel() then startChannel(). The abort signal fires onAbort(), which sets maxAttempts: 0 and calls gateway.disconnect(). The WebSocket close event fires asynchronously, and Carbon's handleReconnectionAttempt emits an 'error' event when it sees reconnectAttempts (0) >= maxAttempts (0).
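
A minimal sketch of why a limit of zero is exhausted on the very first close event. The class below is a hypothetical stand-in written for illustration, not Carbon's actual implementation; only the comparison reconnectAttempts >= maxAttempts and the error message are taken from the description above.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for the gateway's reconnect bookkeeping.
class ReconnectSketch extends EventEmitter {
  reconnectAttempts = 0;
  maxAttempts: number;

  constructor(maxAttempts: number) {
    super();
    this.maxAttempts = maxAttempts;
  }

  // Called when the socket closes; returns true if a reconnect may proceed.
  handleClose(code: number): boolean {
    if (this.reconnectAttempts >= this.maxAttempts) {
      this.emit(
        "error",
        new Error(`Max reconnect attempts (${this.maxAttempts}) reached after code ${code}`),
      );
      return false;
    }
    this.reconnectAttempts += 1;
    return true;
  }
}

const sketchGateway = new ReconnectSketch(0); // onAbort() sets the limit to 0
const seenMessages: string[] = [];
sketchGateway.on("error", (err: Error) => seenMessages.push(err.message));

const willReconnect = sketchGateway.handleClose(1005);
console.log(willReconnect, seenMessages);
```

With the limit at 0 the check 0 >= 0 is true before any attempt is made, so the close event goes straight to the error path.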

Root Cause

The previous code used gatewayEmitter.once('error', noop) to absorb this error. Problem: if another error fires concurrently before the reconnect error, the once listener is consumed, leaving the reconnect error unhandled. Node.js converts an unhandled 'error' emit into an uncaught exception, crashing the entire gateway process.

This explains why the bug manifests as a crash-loop: the health monitor restarts the channel, which triggers the same code path again.
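
The failure mode can be reproduced with a plain Node.js EventEmitter. This is a sketch only: onceEmitter is a stand-in for the real gatewayEmitter, and both errors are emitted synchronously here, whereas in the provider the second one arrives from the WebSocket close callback, where nothing catches it.

```typescript
import { EventEmitter } from "node:events";

const onceEmitter = new EventEmitter(); // stand-in for gatewayEmitter
const noop = (): void => {};

// The old approach: absorb exactly one 'error' event.
onceEmitter.once("error", noop);

// An unrelated error arrives first and consumes the one-shot listener.
onceEmitter.emit("error", new Error("unrelated concurrent error"));
const listenersLeft = onceEmitter.listenerCount("error");

let crashError: Error | undefined;
try {
  // With no 'error' listener attached, emit() throws the error itself.
  onceEmitter.emit("error", new Error("Max reconnect attempts (0) reached after code 1005"));
} catch (err) {
  crashError = err as Error;
}

console.log(listenersLeft, crashError?.message);
```

The try/catch exists only to make the throw observable in the sketch. In the stack trace above the emit happens inside WebSocket.emitClose, so the thrown error surfaces as an uncaught exception.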

Fix

Replace the once listener with a tracked persistent on listener (suppressShutdownError) that is:

  • Armed in onAbort() before calling gateway.disconnect()
  • Removed in the finally block (always, regardless of how the lifecycle exits)

This guarantees all errors emitted during shutdown are suppressed in the correct order, without listener leaks.
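
The arm-then-remove pattern might look roughly like the sketch below. Only the names suppressShutdownError, onAbort and gateway.disconnect() come from the PR notes; runShutdownSketch, its parameters and the synchronous control flow are assumptions made to keep the example self-contained.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical lifecycle wrapper; `disconnect` stands in for gateway.disconnect().
function runShutdownSketch(gatewayEmitter: EventEmitter, disconnect: () => void): void {
  // Persistent listener: unlike once(), it stays attached for every error.
  const suppressShutdownError = (_err: Error): void => {};

  const onAbort = (): void => {
    gatewayEmitter.on("error", suppressShutdownError); // arm first
    disconnect(); // then disconnect
  };

  try {
    onAbort(); // assumption: the real code is driven by an abort signal
  } finally {
    // Always removed, however the lifecycle exits, so the listener cannot leak.
    gatewayEmitter.off("error", suppressShutdownError);
  }
}

const shutdownEmitter = new EventEmitter();
let shutdownThrew = false;
try {
  runShutdownSketch(shutdownEmitter, () => {
    // Both errors fire during shutdown; neither escapes.
    shutdownEmitter.emit("error", new Error("unrelated concurrent error"));
    shutdownEmitter.emit("error", new Error("Max reconnect attempts (0) reached after code 1005"));
  });
} catch {
  shutdownThrew = true;
}
const leftoverListeners = shutdownEmitter.listenerCount("error");
console.log(shutdownThrew, leftoverListeners);
```

Calling off() for a listener that was never attached is a no-op, so the finally block is safe even if onAbort() never ran.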

Testing

Added 2 regression tests in provider.lifecycle.test.ts:

  1. Verifies the suppression listener is properly cleaned up after lifecycle exits
  2. Verifies both a concurrent unrelated error AND the reconnect error are suppressed (the scenario that broke the old once approach)

All 13 tests pass (pnpm vitest run src/discord/monitor/provider.lifecycle.test.ts).

Changed files

  • src/discord/monitor/provider.lifecycle.test.ts (modified, +83/-0)
  • src/discord/monitor/provider.lifecycle.ts (modified, +19/-1)

PR #55443: fix(discord): treat reconnect-exhausted as graceful stop, not crash

Description (problem / solution / changelog)

Summary

  • Fixes #55421 — Discord WebSocket crash: "Max reconnect attempts (0) reached after code 1005"
  • reconnect-exhausted events in drainPendingGatewayErrors() now always return "stop" instead of conditionally throwing based on lifecycleStopping
  • Root cause: race condition where the health monitor's onAbort() sets maxAttempts=0 and disconnects, but lifecycleStopping isn't set until the finally block — so the queued reconnect-exhausted event would throw an uncaught exception

Changes

In extensions/discord/src/monitor/provider.lifecycle.ts:

  • Removed lifecycleStopping && guard from the reconnect-exhausted check in drainPendingGatewayErrors(), so reconnect-exhausted always gracefully stops the lifecycle instead of crashing
  • The health monitor already handles reconnection — throwing here was never the right behavior
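
A rough before/after sketch of the guard change. The event shape, the return type and the function names drainBefore and drainAfter are hypothetical, since the PR notes do not show the real signature of drainPendingGatewayErrors(); other event types are skipped to keep the sketch short.

```typescript
// Hypothetical shapes; the real types in provider.lifecycle.ts are not shown above.
type PendingGatewayEvent = { type: string; message: string };
type DrainDecision = "stop" | "continue";

// Before: reconnect-exhausted stops gracefully only if lifecycleStopping is already set.
function drainBefore(pending: PendingGatewayEvent[], lifecycleStopping: boolean): DrainDecision {
  for (const event of pending.splice(0)) {
    if (event.type !== "reconnect-exhausted") continue; // other types omitted from this sketch
    if (lifecycleStopping) return "stop";
    throw new Error(event.message);
  }
  return "continue";
}

// After: reconnect-exhausted always stops gracefully.
function drainAfter(pending: PendingGatewayEvent[]): DrainDecision {
  for (const event of pending.splice(0)) {
    if (event.type === "reconnect-exhausted") return "stop";
  }
  return "continue";
}

const exhausted: PendingGatewayEvent = {
  type: "reconnect-exhausted",
  message: "Max reconnect attempts (0) reached after code 1005",
};

// The race: the event is queued while lifecycleStopping is still false.
let beforeThrew = false;
try {
  drainBefore([exhausted], false);
} catch {
  beforeThrew = true;
}
const afterDecision = drainAfter([exhausted]);
console.log(beforeThrew, afterDecision);
```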

In extensions/discord/src/monitor/provider.lifecycle.test.ts:

  • Updated the "reconnect-exhausted queued before shutdown" test to expect graceful resolution instead of a thrown error

Test plan

  • pnpm test -- extensions/discord/src/monitor/gateway-supervisor.test.ts extensions/discord/src/monitor/provider.lifecycle.test.ts — 25/25 pass
  • All pre-commit checks pass (format, lint, typecheck, boundary checks)

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +17/-15)
  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +7/-7)


Summary

Since upgrading to 2026.3.24, the Discord provider crashes every ~35 minutes with an uncaught exception when the Discord gateway drops the WebSocket connection with code 1005 (No Status Received). Prior to this version (e.g. 2026.3.13), the same stale-socket event was handled gracefully without a crash.

Environment

  • OpenClaw version: 2026.3.24
  • Node version: v22.22.1 / v25.8.1
  • OS: macOS 26.3.1 (arm64)
  • Channel: Discord

Frequency

Reproducible every ~35 minutes. The health-monitor detects the stale-socket and triggers a process restart, but instead of restarting gracefully, the process crashes first due to the uncaught exception, then LaunchAgent restarts it.

Log Timeline (2026-03-26 ~ 2026-03-27)

2026-03-26T18:41:36 [health-monitor] restarting (reason: stale-socket)
2026-03-26T18:41:36 [openclaw] Uncaught exception: Max reconnect attempts (0) reached after code 1005
2026-03-26T19:16:38 [health-monitor] restarting (reason: stale-socket)
2026-03-26T19:16:38 [openclaw] Uncaught exception: Max reconnect attempts (0) reached after code 1005
... (continues every ~35-90 minutes overnight)
2026-03-27T08:27:18 [health-monitor] restarting (reason: stale-socket)
2026-03-27T08:27:19 [openclaw] Uncaught exception: Max reconnect attempts (0) reached after code 1005
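
The "~35 minutes" figure can be checked against the first two restarts in the log. A small sketch; the helper secondsBetween is made up for this example, and the timestamps are treated as UTC purely so the subtraction does not depend on the local time zone.

```typescript
// Timestamps of the first two health-monitor restarts, copied from the log.
const firstRestart = "2026-03-26T18:41:36";
const secondRestart = "2026-03-26T19:16:38";

// Hypothetical helper: the log has no zone suffix, so "Z" is appended
// to parse both values as UTC.
function secondsBetween(earlier: string, later: string): number {
  return (Date.parse(`${later}Z`) - Date.parse(`${earlier}Z`)) / 1000;
}

const gapSeconds = secondsBetween(firstRestart, secondRestart);
const gapLabel = `${Math.floor(gapSeconds / 60)} min ${gapSeconds % 60} s`;
console.log(gapLabel); // 35 min 2 s
```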

Before vs After

  • 2026.3.13 and earlier: stale-socket events triggered a graceful internal restart, no crash, no Uncaught exception
  • 2026.3.24: Same stale-socket event now causes an unhandled exception crash (code 1005)

The dist bundle name also changed between versions (subsystem-BDbeCphF.js → env-D1ktUnAV.js), suggesting significant refactoring in the Discord provider reconnection logic.

Expected Behavior

Discord WebSocket disconnections (code 1005) should be handled gracefully with reconnection or clean restart — not as an uncaught exception that crashes the process.

extent analysis

Fix Plan

To fix the issue, the reconnection logic in the Discord provider must handle stale-socket restarts and WebSocket disconnections (code 1005) without letting an error escape. The steps:

  • Allow a reasonable number of reconnect attempts (e.g., 5) for ordinary disconnects, so that a single dropped connection does not exhaust the limit immediately. Note that during a health-monitor restart the limit is set to 0 deliberately (see the PR notes above), so the exhaustion error must be handled on that path as well.
  • Implement a retry mechanism with exponential backoff to handle temporary connection issues.
  • Handle the "Max reconnect attempts (N) reached" error so that it triggers a clean restart instead of crashing the process.

Example code:

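// Assumption: `connect` is a hypothetical async function that resolves once
// the gateway connection is re-established and rejects on failure.
const maxReconnectAttempts = 5;
const retryDelay = 500; // initial delay in ms
const maxRetryDelay = 30000; // max delay in ms

// Matches the actual message, e.g. "Max reconnect attempts (0) reached after code 1005"
const reconnectExhausted = /Max reconnect attempts \(\d+\) reached/;

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry with exponential backoff: 500 ms, 1 s, 2 s, ... capped at maxRetryDelay
async function retryReconnect(connect) {
  for (let attempt = 0; attempt < maxReconnectAttempts; attempt++) {
    try {
      await connect();
      return true; // reconnected
    } catch (error) {
      console.warn(`Reconnect attempt ${attempt + 1} failed: ${error.message}`);
      if (attempt < maxReconnectAttempts - 1) {
        await sleep(Math.min(retryDelay * 2 ** attempt, maxRetryDelay));
      }
    }
  }
  return false; // attempts exhausted: the caller should trigger a clean restart
}

// The exhaustion error arrives as an 'error' event, which a try/catch around
// the reconnection logic cannot intercept, so handle it in a listener.
function handleGatewayError(error) {
  if (reconnectExhausted.test(error.message)) {
    console.log('Max reconnect attempts reached. Restarting...');
    // trigger a clean restart
  } else {
    throw error; // rethrow anything unrelated
  }
}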

Verification

To verify that the fix worked, monitor the process for at least 24 hours to ensure that it no longer crashes due to the stale-socket event and WebSocket disconnections (code 1005). Check the logs for any errors or exceptions related to the reconnection logic.
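
One way to check the logs is to scan them for the crash line. The sketch below is an assumption about how that could be done: findCrashLines is a made-up helper, and the sample text reuses lines from the log timeline in this issue rather than reading a real log file.

```typescript
// Matches the crash line with or without the "Error: " prefix seen in the stack trace.
const crashPattern = /Uncaught exception: (?:Error: )?Max reconnect attempts \(\d+\) reached/;

// Hypothetical helper: returns every log line that records the crash.
function findCrashLines(logText: string): string[] {
  return logText.split("\n").filter((line) => crashPattern.test(line));
}

const sampleLog = [
  "2026-03-26T18:41:36 [health-monitor] restarting (reason: stale-socket)",
  "2026-03-26T18:41:36 [openclaw] Uncaught exception: Max reconnect attempts (0) reached after code 1005",
  "2026-03-26T19:16:38 [health-monitor] restarting (reason: stale-socket)",
  "2026-03-26T19:16:38 [openclaw] Uncaught exception: Max reconnect attempts (0) reached after code 1005",
].join("\n");

const crashLines = findCrashLines(sampleLog);
console.log(crashLines.length); // 2
```

After the fix, stale-socket restarts may still appear in the log, but the scan should return no crash lines.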

Extra Tips

  • Consider implementing a circuit breaker pattern to detect and prevent cascading failures in the reconnection logic.
  • Review the Discord provider documentation to ensure that the reconnection logic is aligned with the recommended best practices.
  • Test the fix thoroughly in a staging environment before deploying it to production.
