
openclaw - ✅(Solved) Fix Discord WebSocket reconnect failure crashes entire gateway (affects all channels) [3 pull requests, 3 comments, 3 participants]

openclaw · 2026-03-26 04:36:14


GitHub stats

openclaw/openclaw#54894•Fetched 2026-04-08 01:34:45

Timeline (top): cross-referenced ×14, referenced ×6, commented ×3, subscribed ×3

Error Message

[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (file:///…/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:400:5)
    at SafeGatewayPlugin.handleClose (…/GatewayPlugin.ts:484:8)
    at WebSocket.<anonymous> (…/GatewayPlugin.ts:376:9)
    at WebSocket.emit (node:events:519:28)
    …
ELIFECYCLE  Command failed with exit code 1.

Fix Action

Fixed

Fixed by PR: fix(discord): catch gateway emitter errors to prevent whole-gateway crash on WebSocket failure (https://github.com/openclaw/openclaw/pull/54945)
Fixed by PR: fix(discord): prevent gateway crash during health-monitor restart (https://github.com/openclaw/openclaw/pull/54974)
Fixed by PR: fix(discord): prevent gateway crash on abort by fixing supervisor teardown ordering (https://github.com/openclaw/openclaw/pull/55000)

PR fix notes

PR #54945: fix(discord): catch gateway emitter errors to prevent whole-gateway crash on WebSocket failure

Repository: openclaw/openclaw
Author: lyfuci
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/54945

Description (problem / solution / changelog)

Summary

A transient Discord WebSocket disconnection (e.g. close code 1005) can crash the entire gateway process, taking down all channels (Feishu, WhatsApp, Telegram, etc.) — not just Discord.

Root Cause

In Node.js, if an EventEmitter emits "error" with no registered listener, it throws an unhandled error. This hits the global uncaughtException handler in src/index.ts which calls process.exit(1), killing the whole gateway.

The @buape/carbon GatewayPlugin can emit an "error" event (e.g. when maxAttempts is exhausted), and provider.lifecycle.ts had no listener on gatewayEmitter for that event — only a "debug" listener was registered.
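
The Node.js behavior described above can be reproduced in isolation. The following is a minimal sketch, assuming a plain EventEmitter as a stand-in for the gateway emitter (it is not the real @buape/carbon object); the error text is copied from the log above.

```typescript
import { EventEmitter } from "node:events";

// Stand-in for the gateway emitter; not the real @buape/carbon object.
const gatewayEmitter = new EventEmitter();
const reconnectError = new Error("Max reconnect attempts (0) reached after code 1005");

// Case 1: no "error" listener is registered, so emit() throws the Error synchronously.
let thrown: unknown;
try {
  gatewayEmitter.emit("error", reconnectError);
} catch (err) {
  thrown = err;
}

// Case 2: with a listener registered, emit() returns true and nothing is thrown.
const seen: unknown[] = [];
gatewayEmitter.on("error", (err: unknown) => seen.push(err));
const handled = gatewayEmitter.emit("error", reconnectError);

console.log(thrown === reconnectError, handled, seen.length);
```

In the reported crash the emit happens inside a WebSocket close callback, so there is no surrounding try/catch and the throw reaches the process-level uncaughtException handler instead.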

Fix

Add an onGatewayEmitterError listener in extensions/discord/src/monitor/provider.lifecycle.ts that catches error events on the gateway emitter, logs them as warnings, and lets the existing lifecycle handler manage structured recovery (reconnect-exhausted → channel restart, fatal intents error → stop).

The listener is cleaned up in the finally block alongside the existing "debug" listener so there's no leak.

const onGatewayEmitterError = (error: unknown) => {
  runtime.warn?.(
    `discord: gateway emitter error (caught to prevent process crash): ${String(error)}`,
  );
};
gatewayEmitter?.on("error", onGatewayEmitterError);
// ... cleaned up in finally
gatewayEmitter?.off("error", onGatewayEmitterError);
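
To show where the two calls above sit relative to each other, here is a simplified, synchronous sketch of the register-then-clean-up pattern. The helper name `withGatewayErrorListener` and the `warnings` array (standing in for `runtime.warn`) are hypothetical; the real lifecycle function is asynchronous and does more.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for runtime.warn: collected so the result can be inspected.
const warnings: string[] = [];

// Hypothetical helper, simplified from the shape described in the PR.
function withGatewayErrorListener(
  gatewayEmitter: EventEmitter | undefined,
  body: () => void,
): void {
  const onGatewayEmitterError = (error: unknown) => {
    warnings.push(`discord: gateway emitter error: ${String(error)}`);
  };
  gatewayEmitter?.on("error", onGatewayEmitterError);
  try {
    body();
  } finally {
    // Remove the listener on every exit path so repeated runs do not leak.
    gatewayEmitter?.off("error", onGatewayEmitterError);
  }
}

const emitter = new EventEmitter();
withGatewayErrorListener(emitter, () => {
  // Caught by the listener above, so emit() does not throw here.
  emitter.emit("error", new Error("code 1005"));
});
const listenersAfter = emitter.listenerCount("error");
```

Passing the same function reference to both `on` and `off` is what makes the cleanup work; an inline arrow function in each call would leave the listener attached.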

Testing

pnpm test:extension discord: 51 test files, 476 tests — all passed ✅

AI Assistance

AI-assisted (Claude Sonnet 4.6 via OpenClaw)
Lightly tested — pnpm check + pnpm test:extension discord passed
Full pnpm build && pnpm test not run (CI will validate)
Code reviewed and understood by human author

Fixes #54894

Changed files

extensions/discord/src/monitor/provider.lifecycle.ts (modified, +11/-0)

PR #54974: fix(discord): prevent gateway crash during health-monitor restart

Repository: openclaw/openclaw
Author: openperf
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/54974

Description (problem / solution / changelog)

Summary

Discord health-monitor triggers an uncaught exception crash loop, bringing down the entire gateway every ~35 minutes after upgrading from v2026.3.11 to v2026.3.24.

Discord WebSocket reconnect failure crashes entire gateway

Root Cause

An uncaught exception occurred due to a timing issue during the channel teardown sequence:

In the onAbort handler, setting gateway.options.reconnect = { maxAttempts: 0 } and calling gateway.disconnect() synchronously triggered a Max reconnect attempts (0) reached error from @buape/carbon.
Because gatewaySupervisor.detachLifecycle() was previously only called in the finally block, the supervisor was still in the active phase when this synchronous error was emitted.
The error was routed to the lifecycle handler, treated as a fatal reconnect-exhausted event, and eventually rejected the wait promise.
The catch block in runDiscordGatewayLifecycle only swallowed disallowed-intents errors, so this reconnect error was re-thrown, causing an uncaught exception that crashed the entire gateway process.

Changes

extensions/discord/src/monitor/provider.lifecycle.ts

Transitioned the supervisor to the teardown phase by calling params.gatewaySupervisor.detachLifecycle() before disconnecting the gateway in the onAbort handler. This ensures the synchronous reconnect error is safely suppressed by the supervisor's existing logLateTeardownEvent mechanism.
Added a safety net in the catch block to swallow all errors when lifecycleStopping is true, preventing any residual reconnect errors from propagating as uncaught exceptions during an intentional shutdown.
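
The effect of the ordering change can be illustrated with a small model. This is a sketch under stated assumptions: `SupervisorSketch` and `makeGateway` are hypothetical stand-ins that model only the active/teardown phase switch and a `disconnect()` that emits synchronously, as the PR describes; they do not reproduce the real supervisor or @buape/carbon.

```typescript
import { EventEmitter } from "node:events";

type Phase = "active" | "teardown";

// Hypothetical, simplified supervisor: only the phase switch is modeled.
class SupervisorSketch {
  phase: Phase = "active";
  readonly routedToLifecycle: Error[] = [];
  readonly suppressedLate: Error[] = [];

  constructor(emitter: EventEmitter) {
    emitter.on("error", (err: Error) => {
      if (this.phase === "active") {
        this.routedToLifecycle.push(err); // treated as a fatal lifecycle event
      } else {
        this.suppressedLate.push(err); // recorded and swallowed during teardown
      }
    });
  }

  // Idempotent: a second call (for example in a finally block) changes nothing.
  detachLifecycle(): void {
    this.phase = "teardown";
  }
}

// Stand-in gateway whose disconnect() emits the error synchronously.
function makeGateway(emitter: EventEmitter) {
  return {
    disconnect(): void {
      emitter.emit("error", new Error("Max reconnect attempts (0) reached"));
    },
  };
}

// Old ordering: disconnect first, detach afterwards.
const emitterA = new EventEmitter();
const supervisorA = new SupervisorSketch(emitterA);
makeGateway(emitterA).disconnect();
supervisorA.detachLifecycle();

// Fixed ordering: detach first, then disconnect.
const emitterB = new EventEmitter();
const supervisorB = new SupervisorSketch(emitterB);
supervisorB.detachLifecycle();
makeGateway(emitterB).disconnect();

console.log(supervisorA.routedToLifecycle.length, supervisorB.suppressedLate.length);
```

With the old ordering the error is routed to the lifecycle handler; with the fixed ordering the same error lands in the suppressed list.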

Test Results

All existing Discord lifecycle and supervisor tests pass. The idempotency of detachLifecycle() ensures the repeated call in the finally block remains safe.

Changed files

extensions/discord/src/monitor/provider.lifecycle.ts (modified, +15/-1)

PR #55000: fix(discord): prevent gateway crash on abort by fixing supervisor teardown ordering

Repository: openclaw/openclaw
Author: openperf
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/55000

Description (problem / solution / changelog)

Summary

A transient Discord WebSocket disconnection (e.g. close code 1005) or an intentional health-monitor restart can trigger an uncaught Max reconnect attempts (0) exception that crashes the entire gateway process. This kills all channels (Feishu, WhatsApp, Telegram, etc.), not just Discord.

This PR fixes the root cause of these gateway crashes by correcting the supervisor phase transition ordering during an abort.

Fixes #54931 Fixes #54894

Root Cause

The gateway crash happens due to an error routing mismatch during teardown:

onAbort() sets maxAttempts to 0 and calls gateway.disconnect().
disconnect() synchronously triggers @buape/carbon's handleClose → handleReconnectionAttempt, which emits a Max reconnect attempts (0) error on the gateway emitter.
The gateway-supervisor is still in the active phase at this point, so it routes the error to the lifecycle handler instead of suppressing it.
The error surfaces as an uncaught exception, causing the entire gateway process to exit.

The supervisor already has the correct teardown suppression logic (it logs and swallows late errors during the teardown phase). The bug is simply that onAbort() never transitions the supervisor to the teardown phase before disconnecting.

Changes

1. Fix teardown ordering in onAbort()

Call params.gatewaySupervisor.detachLifecycle() before gateway.disconnect(). This ensures the supervisor enters the teardown phase and correctly suppresses the synchronous error emitted during disconnect.

2. Add lifecycleStopping safety net

In the catch block, when lifecycleStopping is already true, we no longer re-throw errors. This acts as a defense-in-depth guard for any edge case where an error might still escape during an intentional shutdown.
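
The safety net in item 2 might look roughly like the sketch below. `runLifecycleSketch`, `waitForGateway`, and `isStopping` are hypothetical names chosen for illustration; the real catch block also handles disallowed-intents errors, which are omitted here.

```typescript
// Hypothetical, simplified lifecycle runner; the real function tracks more state.
async function runLifecycleSketch(
  waitForGateway: () => Promise<void>,
  isStopping: () => boolean,
): Promise<"ok" | "suppressed"> {
  try {
    await waitForGateway();
    return "ok";
  } catch (err) {
    // During an intentional shutdown, late errors are swallowed instead of re-thrown.
    if (isStopping()) return "suppressed";
    throw err;
  }
}

const failing = () => Promise.reject(new Error("Max reconnect attempts (0) reached"));

// Same failure, two outcomes depending on whether a shutdown is in progress.
const duringShutdown = runLifecycleSketch(failing, () => true);
const whileActive = runLifecycleSketch(failing, () => false).catch(
  (err: Error) => `rethrown: ${err.message}`,
);
```

Errors outside a shutdown still propagate, so genuine failures are not hidden by the guard.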

Test Results

The extension-fast (extension-fast-discord, discord) CI check passes.
Verified locally that triggering an abort correctly suppresses the disconnect error and allows the gateway to restart gracefully without crashing the main process.

Changed files

extensions/discord/src/monitor/provider.lifecycle.ts (modified, +14/-1)


Bug

A transient Discord WebSocket disconnection (close code 1005) triggers an uncaught exception that crashes the entire gateway process. This kills all channels (Feishu, WhatsApp, etc.), not just Discord.

This happened twice in one morning during normal Feishu usage. Both times, the gateway had to be manually restarted.

Error

[openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (file:///…/node_modules/@buape/carbon/src/plugins/gateway/GatewayPlugin.ts:400:5)
    at SafeGatewayPlugin.handleClose (…/GatewayPlugin.ts:484:8)
    at WebSocket.<anonymous> (…/GatewayPlugin.ts:376:9)
    at WebSocket.emit (node:events:519:28)
    …
ELIFECYCLE  Command failed with exit code 1.

Analysis

Root cause chain

Discord WebSocket closes with code 1005 (No Status Received — normal network fluctuation)
@buape/carbon GatewayPlugin attempts reconnect, but maxAttempts resolves to 0 despite being configured as 50 in createDiscordGatewayPlugin() (extensions/discord/src/monitor/gateway-plugin.ts)
handleReconnectionAttempt immediately gives up → emits an Error via this.emitter.emit("error", ...)
The error propagates as an uncaught exception (no global process.on('uncaughtException') handler in the gateway)
Entire gateway process exits → all channels (Feishu, Discord, WhatsApp, etc.) go down

Two issues

maxAttempts mismatch: The code passes { reconnect: { maxAttempts: 50 } } to the GatewayPlugin constructor, but at runtime the value is 0. This may be a bug in @buape/carbon (version listed under Environment) or a state mutation during the session.
No fault isolation: A single channel's WebSocket failure should not crash the entire gateway. Currently there is no uncaughtException/unhandledRejection handler, and no error boundary around individual channel plugins.
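
The "state mutation" possibility in the first item is consistent with the PR notes above, which describe onAbort setting gateway.options.reconnect to { maxAttempts: 0 }. The sketch below shows how a limit configured as 50 could read as 0 later. `PluginSketch` is hypothetical, and it assumes the limit is read from `options` on every attempt; whether @buape/carbon does so is not confirmed here. The real plugin emits the error rather than throwing it directly.

```typescript
interface ReconnectOptions {
  maxAttempts: number;
}

// Hypothetical plugin shape, not the @buape/carbon implementation.
// Assumption: the limit is read from `options` on every attempt.
class PluginSketch {
  private attempts = 0;

  constructor(public options: { reconnect: ReconnectOptions }) {}

  handleReconnectionAttempt(code: number): void {
    const max = this.options.reconnect.maxAttempts;
    if (this.attempts >= max) {
      throw new Error(`Max reconnect attempts (${max}) reached after code ${code}`);
    }
    this.attempts += 1;
  }
}

const plugin = new PluginSketch({ reconnect: { maxAttempts: 50 } });
plugin.handleReconnectionAttempt(1005); // allowed: 0 attempts so far, limit 50

// A later mutation replaces the configured limit.
plugin.options.reconnect = { maxAttempts: 0 };

let message = "";
try {
  plugin.handleReconnectionAttempt(1005);
} catch (err) {
  message = (err as Error).message;
}
```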

Expected behavior

Discord WebSocket disconnection should retry with the configured 50 max attempts
If Discord permanently fails, it should be marked as degraded — other channels should continue working
The gateway process should not exit due to a single channel's connection failure

Environment

OpenClaw: 2026.3.24
Node.js: v22.22.1
@buape/carbon: 0.0.0-beta-20260216184201
OS: Linux 6.8.0-101-generic (x64)
Channels: Feishu (primary) + Discord

Suggested fixes

Immediate: Add a try/catch or error listener around SafeGatewayPlugin's error emission so it doesn't become an uncaught exception
Short-term: Add global process.on('uncaughtException') / process.on('unhandledRejection') handlers in the gateway entry point that log the error and attempt graceful recovery
Long-term: Isolate channel plugins so one channel's failure doesn't affect others (e.g., per-channel error boundaries, automatic restart of failed channels)
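
One way to picture the long-term suggestion is a per-channel error boundary that records a failed channel as degraded while the others keep running. This is only a sketch: `ChannelSketch`, `startAllIsolated`, and the status names are hypothetical and do not reflect OpenClaw's actual channel API.

```typescript
type ChannelStatus = "running" | "degraded";

// Hypothetical channel shape; the names are illustrative only.
interface ChannelSketch {
  name: string;
  start: () => Promise<void>;
}

async function startAllIsolated(
  channels: ChannelSketch[],
): Promise<Map<string, ChannelStatus>> {
  const entries = await Promise.all(
    channels.map(async (channel): Promise<[string, ChannelStatus]> => {
      try {
        await channel.start();
        return [channel.name, "running"];
      } catch {
        // The failure stays inside this channel's boundary.
        return [channel.name, "degraded"];
      }
    }),
  );
  return new Map(entries);
}

const statusPromise = startAllIsolated([
  { name: "feishu", start: async () => {} },
  {
    name: "discord",
    start: async () => {
      throw new Error("Max reconnect attempts (0) reached");
    },
  },
]);
```

A boundary like this only covers failures that surface through the returned promise. Errors thrown later from event callbacks, such as the WebSocket close handler in this report, still need an "error" listener on the emitter.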

Extended analysis

Fix Plan

To address the issue, we'll implement the following steps:

Add a try/catch block around the SafeGatewayPlugin error emission
Implement global process.on('uncaughtException') and process.on('unhandledRejection') handlers
Isolate channel plugins to prevent a single channel's failure from affecting others

Code Changes

Try/Catch Block

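const { EventEmitter } = require('node:events');

// stand-in for the plugin's emitter, with no "error" listener registered
const emitter = new EventEmitter();

try {
  // emit() is synchronous: without an "error" listener it throws the Error here
  emitter.emit("error", new Error("Max reconnect attempts reached"));
} catch (error) {
  // log the error and attempt recovery
  console.error("Error occurred in SafeGatewayPlugin:", error);
  // add recovery logic here, e.g., retry or restart the plugin
}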

Global Error Handlers

process.on('uncaughtException', (error) => {
  console.error("Uncaught exception:", error);
  // process state may be inconsistent at this point: prefer cleaning up and
  // letting a process manager restart the gateway over resuming in place
});

process.on('unhandledRejection', (reason, promise) => {
  console.error("Unhandled rejection:", reason);
  // attempt graceful recovery, e.g., restart the affected channel
});

Isolate Channel Plugins

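// create a separate process for each channel plugin
// so that a single channel's failure does not affect the others

// example using the Node.js cluster module
const cluster = require('node:cluster');

// hypothetical channel list; replace with the configured channels
const channels = ['discord', 'feishu', 'whatsapp'];

if (cluster.isPrimary) { // isMaster is deprecated since Node.js 16
  const channelByWorker = new Map();
  const spawn = (channel) => {
    // pass the channel name to the worker through its environment
    const worker = cluster.fork({ CHANNEL: channel });
    channelByWorker.set(worker.id, channel);
  };

  // fork one worker per channel plugin
  channels.forEach(spawn);

  cluster.on('exit', (worker, code, signal) => {
    // restart only the channel whose worker exited
    const channel = channelByWorker.get(worker.id);
    channelByWorker.delete(worker.id);
    if (channel) spawn(channel);
  });
} else {
  // run the channel plugin named by process.env.CHANNEL in the worker process
  // ...
}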

Verification

To verify the fix, simulate a Discord WebSocket disconnection and check that:

The gateway process does not exit
Other channels (e.g., Feishu, WhatsApp) continue to work
The Discord channel is marked as degraded and retry attempts are made

Extra Tips

Monitor the gateway logs for errors and adjust the recovery logic accordingly
Consider implementing a circuit breaker pattern to prevent cascading failures
Use a process manager like PM2 to automatically restart the gateway if it exits unexpectedly
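
For the circuit breaker tip, a minimal sketch follows. The class, its thresholds, and the injectable clock are all hypothetical choices for illustration; a production breaker would usually add a stricter half-open state and per-channel instances.

```typescript
// Hypothetical breaker; thresholds and names are illustrative.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures: number,
    private readonly cooldownMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {}

  canAttempt(): boolean {
    if (this.openedAt === null) return true;
    // Once the cooldown has elapsed, attempts are allowed again.
    return this.now() - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // Open (or re-open) the breaker once the failure threshold is reached.
    if (this.failures >= this.maxFailures) this.openedAt = this.now();
  }
}

// A fake clock keeps the example deterministic.
let clock = 0;
const breaker = new CircuitBreaker(3, 1000, () => clock);

breaker.recordFailure();
breaker.recordFailure();
const beforeTrip = breaker.canAttempt();

breaker.recordFailure(); // third failure opens the breaker
const afterTrip = breaker.canAttempt();

clock = 1000; // cooldown elapsed
const afterCooldown = breaker.canAttempt();
```

A reconnect loop would check `canAttempt()` before each attempt and call `recordSuccess()` or `recordFailure()` afterwards, so a persistently failing channel backs off instead of retrying in a tight loop.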
