openclaw - ✅(Solved) Fix Discord WebSocket reconnect failure crashes entire gateway (affects all channels) [3 pull requests, 3 comments, 3 participants]

GitHub stats
openclaw/openclaw#54894 · Fetched 2026-04-08 01:34:45
Comments: 3 · Participants: 3 · Timeline events: 30 · Reactions: 0
Timeline (top): cross-referenced ×14, referenced ×6, commented ×3, subscribed ×3

Error Message

[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (file:///…/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:400:5)
    at SafeGatewayPlugin.handleClose (…/GatewayPlugin.ts:484:8)
    at WebSocket.<anonymous> (…/GatewayPlugin.ts:376:9)
    at WebSocket.emit (node:events:519:28)
ELIFECYCLE  Command failed with exit code 1.

Root Cause

Root cause chain

Fix Action

Fixed

PR fix notes

PR #54945: fix(discord): catch gateway emitter errors to prevent whole-gateway crash on WebSocket failure

Description (problem / solution / changelog)

Summary

A transient Discord WebSocket disconnection (e.g. close code 1005) can crash the entire gateway process, taking down all channels (Feishu, WhatsApp, Telegram, etc.) — not just Discord.

Root Cause

In Node.js, if an EventEmitter emits "error" with no registered listener, it throws an unhandled error. This hits the global uncaughtException handler in src/index.ts which calls process.exit(1), killing the whole gateway.

The @buape/carbon GatewayPlugin can emit an "error" event (e.g. when maxAttempts is exhausted), and provider.lifecycle.ts had no listener on gatewayEmitter for that event — only a "debug" listener was registered.
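
The Node.js behavior described here can be reproduced in isolation. The following is a minimal sketch using only `node:events`; the emitter is a stand-in for illustration, not the actual `gatewayEmitter`, and the variable names are hypothetical.

```typescript
import { EventEmitter } from "node:events";

// Stand-in emitter; not the real @buape/carbon gateway emitter.
const emitter = new EventEmitter();

// With no "error" listener, emit("error", ...) throws the error synchronously.
let thrown: unknown;
try {
  emitter.emit("error", new Error("Max reconnect attempts (0) reached after code 1005"));
} catch (err) {
  thrown = err;
}
const threwWithoutListener = thrown instanceof Error;

// With a listener registered, the same kind of emit is delivered to the listener.
const seen: string[] = [];
emitter.on("error", (err: unknown) => {
  seen.push(String(err));
});
const handled = emitter.emit("error", new Error("second failure"));
```

In a real process the first `emit` has no surrounding `try`/`catch`, so the thrown error reaches the `uncaughtException` path.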

Fix

Add an onGatewayEmitterError listener in extensions/discord/src/monitor/provider.lifecycle.ts that catches error events on the gateway emitter, logs them as warnings, and lets the existing lifecycle handler manage structured recovery (reconnect-exhausted → channel restart, fatal intents error → stop).

The listener is cleaned up in the finally block alongside the existing "debug" listener so there's no leak.

const onGatewayEmitterError = (error: unknown) => {
  runtime.warn?.(
    `discord: gateway emitter error (caught to prevent process crash): ${String(error)}`,
  );
};
gatewayEmitter?.on("error", onGatewayEmitterError);
// ... cleaned up in finally
gatewayEmitter?.off("error", onGatewayEmitterError);
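
The register-then-clean-up pattern in the snippet above can be sketched as a small helper. `withErrorListener` and its parameters are hypothetical names, not part of OpenClaw, and the helper is synchronous for brevity; an async lifecycle would `await` its work inside the `try` so the listener stays attached until the work settles.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: attach an "error" listener, run `body`, always detach.
function withErrorListener<T>(
  emitter: EventEmitter | undefined,
  warn: (message: string) => void,
  body: () => T,
): T {
  const onError = (error: unknown) => {
    warn(`gateway emitter error (caught to prevent process crash): ${String(error)}`);
  };
  emitter?.on("error", onError);
  try {
    return body();
  } finally {
    // off() needs the same function reference that was passed to on().
    emitter?.off("error", onError);
  }
}

// Usage: the error is logged as a warning and no listener is left behind.
const demoEmitter = new EventEmitter();
const warnings: string[] = [];
withErrorListener(
  demoEmitter,
  (message) => {
    warnings.push(message);
  },
  () => {
    demoEmitter.emit("error", new Error("boom"));
  },
);
const remainingListeners = demoEmitter.listenerCount("error");
```

Keeping the handler in a named constant is what makes the `off` call effective; an inline arrow function passed to `on` could not be removed later.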

Testing

pnpm test:extension discord: 51 test files, 476 tests — all passed


AI Assistance

  • AI-assisted (Claude Sonnet 4.6 via OpenClaw)
  • Lightly tested — pnpm check + pnpm test:extension discord passed
  • Full pnpm build && pnpm test not run (CI will validate)
  • Code reviewed and understood by human author

Fixes #54894

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +11/-0)

PR #54974: fix(discord): prevent gateway crash during health-monitor restart

Description (problem / solution / changelog)

Summary

The Discord health monitor triggers an uncaught-exception crash loop that brings down the entire gateway roughly every 35 minutes after upgrading from v2026.3.11 to v2026.3.24.

Discord WebSocket reconnect failure crashes entire gateway

Root Cause

An uncaught exception occurred due to a timing issue during the channel teardown sequence:

  1. In the onAbort handler, setting gateway.options.reconnect = { maxAttempts: 0 } and calling gateway.disconnect() synchronously triggered a Max reconnect attempts (0) reached error from @buape/carbon.
  2. Because gatewaySupervisor.detachLifecycle() was previously only called in the finally block, the supervisor was still in the active phase when this synchronous error was emitted.
  3. The error was routed to the lifecycle handler, treated as a fatal reconnect-exhausted event, and eventually rejected the wait promise.
  4. The catch block in runDiscordGatewayLifecycle only swallowed disallowed-intents errors, so this reconnect error was re-thrown, causing an uncaught exception that crashed the entire gateway process.

Changes

extensions/discord/src/monitor/provider.lifecycle.ts

  • Transitioned the supervisor to the teardown phase by calling params.gatewaySupervisor.detachLifecycle() before disconnecting the gateway in the onAbort handler. This ensures the synchronous reconnect error is safely suppressed by the supervisor's existing logLateTeardownEvent mechanism.
  • Added a safety net in the catch block to swallow all errors when lifecycleStopping is true, preventing any residual reconnect errors from propagating as uncaught exceptions during an intentional shutdown.

Test Results

All existing Discord lifecycle and supervisor tests pass. The idempotency of detachLifecycle() ensures the repeated call in the finally block remains safe.

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +15/-1)

PR #55000: fix(discord): prevent gateway crash on abort by fixing supervisor teardown ordering

Description (problem / solution / changelog)

Summary

A transient Discord WebSocket disconnection (e.g. close code 1005) or an intentional health-monitor restart can trigger an uncaught Max reconnect attempts (0) exception that crashes the entire gateway process. This kills all channels (Feishu, WhatsApp, Telegram, etc.), not just Discord.

This PR fixes the root cause of these gateway crashes by correcting the supervisor phase transition ordering during an abort.

Fixes #54931 Fixes #54894

Root Cause

The gateway crash happens due to an error routing mismatch during teardown:

  1. onAbort() sets maxAttempts to 0 and calls gateway.disconnect().
  2. disconnect() synchronously triggers @buape/carbon's handleClose → handleReconnectionAttempt, which emits a Max reconnect attempts (0) error on the gateway emitter.
  3. The gateway-supervisor is still in the active phase at this point, so it routes the error to the lifecycle handler instead of suppressing it.
  4. The error surfaces as an uncaught exception, causing the entire gateway process to exit.

The supervisor already has the correct teardown suppression logic (it logs and swallows late errors during the teardown phase). The bug is simply that onAbort() never transitions the supervisor to the teardown phase before disconnecting.

Changes

1. Fix teardown ordering in onAbort()

Call params.gatewaySupervisor.detachLifecycle() before gateway.disconnect(). This ensures the supervisor enters the teardown phase and correctly suppresses the synchronous error emitted during disconnect.
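
The effect of the ordering can be modeled with two small stand-ins. This is a sketch under assumptions: `SupervisorSketch`, `GatewaySketch`, and `abort` are hypothetical and only imitate the routing described here (errors in the active phase go to the lifecycle handler, errors in teardown are logged and swallowed).

```typescript
import { EventEmitter } from "node:events";

type Phase = "active" | "teardown";

// Hypothetical stand-in for the gateway supervisor.
class SupervisorSketch {
  phase: Phase = "active";
  readonly fatal: string[] = []; // routed to the lifecycle handler
  readonly late: string[] = []; // logged and swallowed during teardown

  constructor(emitter: EventEmitter) {
    emitter.on("error", (err: unknown) => {
      (this.phase === "active" ? this.fatal : this.late).push(String(err));
    });
  }

  // Idempotent: a repeated call (e.g. in a finally block) changes nothing.
  detachLifecycle(): void {
    this.phase = "teardown";
  }
}

// Hypothetical gateway whose disconnect() emits synchronously when maxAttempts is 0.
class GatewaySketch {
  readonly emitter = new EventEmitter();
  maxAttempts = 50;

  disconnect(): void {
    if (this.maxAttempts === 0) {
      this.emitter.emit("error", new Error("Max reconnect attempts (0) reached"));
    }
  }
}

function abort(detachFirst: boolean): SupervisorSketch {
  const gateway = new GatewaySketch();
  const supervisor = new SupervisorSketch(gateway.emitter);
  gateway.maxAttempts = 0;
  if (detachFirst) supervisor.detachLifecycle(); // fixed ordering
  gateway.disconnect();
  supervisor.detachLifecycle(); // previous location: too late for the sync error
  return supervisor;
}

const buggyOrder = abort(false); // error lands in `fatal`
const fixedOrder = abort(true); // error lands in `late`
```

Because the error is emitted synchronously inside `disconnect()`, only a phase change made before that call can affect how it is routed.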

2. Add lifecycleStopping safety net

In the catch block, when lifecycleStopping is already true, we no longer re-throw errors. This acts as a defense-in-depth guard for any edge case where an error might still escape during an intentional shutdown.
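
A minimal sketch of that guard follows. `runLifecycleSketch` and its parameters are hypothetical; only the shape of the `catch` block follows the description, and the function is synchronous for brevity.

```typescript
// Hypothetical lifecycle runner illustrating the catch-block guard.
function runLifecycleSketch(
  waitForGateway: () => void,
  state: { lifecycleStopping: boolean },
  log: (message: string) => void,
): void {
  try {
    waitForGateway();
  } catch (err) {
    if (state.lifecycleStopping) {
      // Intentional shutdown: log and swallow instead of re-throwing.
      log(`discord: ignoring error during shutdown: ${String(err)}`);
      return;
    }
    throw err; // Outside a shutdown, the error still propagates.
  }
}

const shutdownLogs: string[] = [];
const failingWait = () => {
  throw new Error("Max reconnect attempts (0) reached");
};

// During shutdown the error is swallowed.
runLifecycleSketch(failingWait, { lifecycleStopping: true }, (m) => {
  shutdownLogs.push(m);
});

// Outside shutdown the same error is re-thrown.
let rethrown = false;
try {
  runLifecycleSketch(failingWait, { lifecycleStopping: false }, () => {});
} catch {
  rethrown = true;
}
```

The guard is deliberately narrow: it only changes behavior once a stop has been requested, so genuine failures during normal operation are still surfaced.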

Test Results

  • The extension-fast (extension-fast-discord, discord) CI check passes.
  • Verified locally that triggering an abort correctly suppresses the disconnect error and allows the gateway to restart gracefully without crashing the main process.

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +14/-1)


Bug

A transient Discord WebSocket disconnection (close code 1005) triggers an uncaught exception that crashes the entire gateway process. This kills all channels (Feishu, WhatsApp, etc.), not just Discord.

This happened twice in one morning during normal Feishu usage. Both times, the gateway had to be manually restarted.

Error

[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (file:///…/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:400:5)
    at SafeGatewayPlugin.handleClose (…/GatewayPlugin.ts:484:8)
    at WebSocket.<anonymous> (…/GatewayPlugin.ts:376:9)
    at WebSocket.emit (node:events:519:28)
ELIFECYCLE  Command failed with exit code 1.

Analysis

Root cause chain

  1. Discord WebSocket closes with code 1005 (No Status Received — normal network fluctuation)
  2. @buape/carbon GatewayPlugin attempts reconnect, but maxAttempts resolves to 0 despite being configured as 50 in createDiscordGatewayPlugin() (extensions/discord/src/monitor/gateway-plugin.ts)
  3. handleReconnectionAttempt immediately gives up → emits an Error via this.emitter.emit("error", ...)
  4. The error propagates as an uncaught exception (no global process.on('uncaughtException') handler in the gateway)
  5. Entire gateway process exits → all channels (Feishu, Discord, WhatsApp, etc.) go down

Two issues

  1. maxAttempts mismatch: The code passes { reconnect: { maxAttempts: 50 } } to the GatewayPlugin constructor, but at runtime the value is 0. This may be a @buape/carbon bug or a state mutation during the session.

  2. No fault isolation: A single channel's WebSocket failure should not crash the entire gateway. Currently there is no uncaughtException/unhandledRejection handler, and no error boundary around individual channel plugins.
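
The "state mutation" possibility in the first item can be illustrated with a small sketch. `PluginSketch` is hypothetical and is not the @buape/carbon implementation; it only shows how a value configured as 50 at construction can read as 0 later if something reassigns the options object.

```typescript
interface ReconnectOptions {
  maxAttempts: number;
}

// Hypothetical plugin that reads maxAttempts from a mutable options object.
class PluginSketch {
  options: { reconnect: ReconnectOptions };

  constructor(options: { reconnect: ReconnectOptions }) {
    this.options = options;
  }

  canReconnect(attempt: number): boolean {
    return attempt < this.options.reconnect.maxAttempts;
  }
}

const plugin = new PluginSketch({ reconnect: { maxAttempts: 50 } });
const allowedBefore = plugin.canReconnect(0);

// Any later reassignment overrides the constructor value.
plugin.options.reconnect = { maxAttempts: 0 };
const allowedAfter = plugin.canReconnect(0);
```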

Expected behavior

  • Discord WebSocket disconnection should retry with the configured 50 max attempts
  • If Discord permanently fails, it should be marked as degraded — other channels should continue working
  • The gateway process should not exit due to a single channel's connection failure
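
The isolation described in this list can be sketched as a per-channel error boundary. The function name, the status values, and the channel names are illustrative assumptions, not OpenClaw APIs.

```typescript
type ChannelStatus = "running" | "degraded";

// Hypothetical boundary: each channel starts inside its own try/catch,
// so one failure marks only that channel as degraded.
function startChannels(
  starters: Record<string, () => void>,
): Record<string, ChannelStatus> {
  const status: Record<string, ChannelStatus> = {};
  for (const [name, start] of Object.entries(starters)) {
    try {
      start();
      status[name] = "running";
    } catch {
      status[name] = "degraded";
    }
  }
  return status;
}

const channelStatus = startChannels({
  feishu: () => {},
  discord: () => {
    throw new Error("Max reconnect attempts (0) reached");
  },
});
```

This sketch only covers synchronous start-up failures; errors raised later from event emitters or promises would need their own listeners or rejection handlers per channel.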

Environment

  • OpenClaw: 2026.3.24
  • Node.js: v22.22.1
  • @buape/carbon: 0.0.0-beta-20260216184201
  • OS: Linux 6.8.0-101-generic (x64)
  • Channels: Feishu (primary) + Discord

Suggested fixes

  1. Immediate: Add a try/catch or error listener around SafeGatewayPlugin's error emission so it doesn't become an uncaught exception
  2. Short-term: Add global process.on('uncaughtException') / process.on('unhandledRejection') handlers in the gateway entry point that log the error and attempt graceful recovery
  3. Long-term: Isolate channel plugins so one channel's failure doesn't affect others (e.g., per-channel error boundaries, automatic restart of failed channels)

extent analysis

Fix Plan

To address the issue, we'll implement the following steps:

  • Add a try/catch block around the SafeGatewayPlugin error emission
  • Implement global process.on('uncaughtException') and process.on('unhandledRejection') handlers
  • Isolate channel plugins to prevent a single channel's failure from affecting others

Code Changes

Try/Catch Block

try {
  // emit() throws synchronously when no "error" listener is registered
  this.emitter.emit("error", new Error("Max reconnect attempts reached"));
} catch (error) {
  // log the error instead of letting it become an uncaught exception
  console.error("Error occurred in SafeGatewayPlugin:", error);
  // add recovery logic here, e.g., retry or restart the plugin
}

Global Error Handlers

process.on('uncaughtException', (error) => {
  console.error("Uncaught exception:", error);
  // attempt graceful recovery, e.g., restart the gateway
});

process.on('unhandledRejection', (reason, promise) => {
  console.error("Unhandled rejection:", reason);
  // attempt graceful recovery, e.g., restart the gateway
});

Isolate Channel Plugins

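// create a separate process for each channel plugin
// this will prevent a single channel's failure from affecting others

// example using the Node.js cluster module (channel names are illustrative)
const cluster = require('node:cluster');

const channels = ['discord', 'feishu', 'whatsapp'];

if (cluster.isPrimary) {
  // remember which channel each worker runs
  const channelByWorkerId = new Map();
  const startChannel = (channel) => {
    const worker = cluster.fork({ CHANNEL: channel });
    channelByWorkerId.set(worker.id, channel);
  };

  // fork one worker process per channel plugin
  channels.forEach(startChannel);

  cluster.on('exit', (worker, code, signal) => {
    // restart only the channel whose worker exited
    const channel = channelByWorkerId.get(worker.id);
    channelByWorkerId.delete(worker.id);
    if (channel) startChannel(channel);
  });
} else {
  // run the channel plugin named by process.env.CHANNEL in the worker process
  // ...
}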

Verification

To verify the fix, simulate a Discord WebSocket disconnection and check that:

  • The gateway process does not exit
  • Other channels (e.g., Feishu, WhatsApp) continue to work
  • The Discord channel is marked as degraded and retry attempts are made

Extra Tips

  • Monitor the gateway logs for errors and adjust the recovery logic accordingly
  • Consider implementing a circuit breaker pattern to prevent cascading failures
  • Use a process manager like PM2 to automatically restart the gateway if it exits unexpectedly
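
The circuit breaker mentioned in the tips can be sketched as follows. This is a generic illustration of the pattern, not OpenClaw code; the class name, threshold, cooldown, and injectable clock are all assumptions made for the example.

```typescript
// Minimal circuit breaker: after `threshold` consecutive failures, calls are
// rejected until `cooldownMs` has elapsed, then one trial call is allowed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(threshold: number, cooldownMs: number, now: () => number = Date.now) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  call<T>(fn: () => T): T {
    if (this.openedAt !== null) {
      if (this.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open");
      }
      this.openedAt = null; // cooldown elapsed: allow a trial call
    }
    try {
      const result = fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = this.now();
      throw err;
    }
  }
}

// Usage with a fake clock so the behavior is deterministic.
let fakeTime = 0;
const breaker = new CircuitBreaker(2, 1000, () => fakeTime);
const failingConnect = (): string => {
  throw new Error("connect failed");
};
for (let i = 0; i < 2; i++) {
  try {
    breaker.call(failingConnect);
  } catch {
    // expected: the reconnect attempt fails
  }
}
let rejection = "";
try {
  breaker.call(() => "connected");
} catch (err) {
  rejection = String(err);
}
fakeTime = 1000;
const afterCooldown = breaker.call(() => "connected");
```

A breaker like this could wrap a channel's reconnect attempts so that a persistently failing channel stops retrying for a while instead of looping.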
