openclaw - ✅(Solved) Fix [Bug]: Gateway crashes with uncaught exception when Discord health monitor triggers stale-socket restart [2 pull requests, 3 comments, 3 participants]

openclaw 2026-03-29 07:25:23


GitHub stats

openclaw/openclaw#56854 • Fetched 2026-04-08 01:46:52


Timeline (top): commented ×3, cross-referenced ×2, labeled ×2, referenced ×1

The Discord provider crashes the entire gateway with an uncaught exception when the health monitor detects a stale socket and attempts a restart. The abort handler sets the reconnect option maxAttempts to 0. The WebSocket close handler then tries to reconnect and throws.

Error Message

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)

Root Cause

A race condition exists between the health-monitor restart and the abort handler. The onAbort callback (line 6952 of the bundled provider-CAlWEl41.js) sets maxAttempts=0 to prevent reconnection during an intentional shutdown. However, a health-monitor-initiated restart goes through the same abort flow. The WebSocket close handler then attempts reconnection against the now-disabled reconnect config.
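The minimal TypeScript model below makes the race concrete. ToyGateway and onAbort are hypothetical stand-ins, not the real SafeGatewayPlugin or provider code. They reproduce only the check described in the reproduction steps: the handler throws once reconnectAttempts >= maxAttempts. In the real provider, the throw happens inside a WebSocket close listener, where no caller can catch it, so it surfaces as an uncaught exception. This model runs the close handler synchronously so that the error can be observed.

```typescript
// Hypothetical model of the crash path; NOT the real SafeGatewayPlugin.
class ToyGateway {
  // Default budget, mirroring the reported `reconnect: { maxAttempts: 50 }`.
  options = { reconnect: { maxAttempts: 50 } };
  reconnectAttempts = 0;

  // Mirrors handleReconnectionAttempt: refuse once the attempt budget is spent.
  handleReconnectionAttempt(code: number): void {
    const max = this.options.reconnect.maxAttempts;
    if (this.reconnectAttempts >= max) {
      throw new Error(`Max reconnect attempts (${max}) reached after code ${code}`);
    }
    this.reconnectAttempts++;
    // ...a real client would schedule a reconnect here...
  }

  // Mirrors handleClose: every socket close funnels into a reconnect attempt.
  handleClose(code: number): void {
    this.handleReconnectionAttempt(code);
  }

  // In this model, closing the socket fires the close handler synchronously.
  disconnect(): void {
    this.handleClose(1005);
  }
}

// The problematic onAbort sequence: zero the budget, then disconnect.
function onAbort(gateway: ToyGateway): void {
  gateway.options.reconnect = { maxAttempts: 0 };
  gateway.disconnect(); // close handler sees maxAttempts = 0 and throws
}

let crashMessage = "";
try {
  onAbort(new ToyGateway());
} catch (err) {
  crashMessage = (err as Error).message;
}
console.log(crashMessage);
// prints "Max reconnect attempts (0) reached after code 1005"
```

Because 0 >= 0 holds on the very first close, zeroing maxAttempts before disconnecting guarantees the throw.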

PR fix notes

PR #213: Add upstream intelligence report from Scout

Repository: MillionthOdin16/openclaw
Author: MillionthOdin16
State: open | merged: False
Link: https://github.com/MillionthOdin16/openclaw/pull/213

Description (problem / solution / changelog)

Added Scout report and journal files identifying inherited defects from the parent repository (openclaw/openclaw), including #56854, #56832, and #56892.

PR created automatically by Jules for task 7521594946744656005 started by @MillionthOdin16

Summary by CodeRabbit

Documentation
- Added three new files documenting known issues, defect reports, and upstream problem tracking with detailed descriptions and impact assessments.

Changed files

.jules/scout.md (added, +14/-0)
openclaw_issues.txt (added, +100/-0)
scout-report.txt (added, +19/-0)
test_output.log (added, +964/-0)

PR #58988: fix(discord): use safe disconnect in onAbort to prevent gateway crash

Repository: openclaw/openclaw
Author: Starhappysh
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/58988

Description (problem / solution / changelog)

Summary

Fixes #56854

The Discord provider crashes the entire gateway with an uncaught exception when the health monitor triggers an abort. The onAbort handler sets gateway.options.reconnect.maxAttempts to 0 and then calls gateway.disconnect(). The socket close handler sees maxAttempts=0 and throws "Max reconnect attempts (0) reached", which kills the process.

Root cause

// onAbort handler (line 404)
params.gateway.options.reconnect = { maxAttempts: 0 };
params.gateway.disconnect(); // triggers close handler -> throw

The codebase already has disconnectGatewaySocketWithoutAutoReconnect() (line 135) which strips close/error listeners before disconnecting, specifically to prevent this race.

Fix

Replace the raw disconnect() call in onAbort with disconnectGatewaySocketWithoutAutoReconnect(). This reuses the existing safe disconnect pattern.
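The safe-disconnect idea can be sketched with Node's EventEmitter standing in for the WebSocket. This is an illustrative re-implementation under assumptions, not the body of the real disconnectGatewaySocketWithoutAutoReconnect(). ToySocket and disconnectWithoutAutoReconnect are hypothetical names, and the no-op error listener is an extra safety choice that the PR does not mention. What matters is the ordering: the listeners are removed before the socket closes, so the close event never reaches the reconnect logic that would throw.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical stand-in for the gateway WebSocket: emits "close" when closed.
class ToySocket extends EventEmitter {
  close(code = 1005): void {
    this.emit("close", code);
  }
}

// Sketch: detach the reconnect-triggering listeners first, then close.
// The real helper in the codebase may differ in detail.
function disconnectWithoutAutoReconnect(socket: ToySocket): void {
  socket.removeAllListeners("close");
  socket.removeAllListeners("error");
  // Assumption: keep a no-op error handler so a late "error" event cannot
  // be raised as an unhandled EventEmitter error.
  socket.on("error", () => {});
  socket.close();
}

const socket = new ToySocket();
let reconnectCalls = 0;
socket.on("close", (code: number) => {
  // Stand-in for handleClose -> handleReconnectionAttempt with maxAttempts = 0.
  reconnectCalls++;
  throw new Error(`Max reconnect attempts (0) reached after code ${code}`);
});

disconnectWithoutAutoReconnect(socket); // no throw: the close listener is gone
console.log(reconnectCalls); // 0
```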

Test plan

Gateway survives Discord health monitor restart cycles without crashing
Intentional abort (e.g., config reload) cleanly disconnects Discord

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context)

Changed files

extensions/discord/src/monitor/provider.lifecycle.reconnect.ts (modified, +19/-2)


Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Steps to reproduce

Start OpenClaw gateway 2026.3.24 with Discord channel enabled on Windows 11.
Wait ~35 minutes for the health monitor to detect a stale Discord WebSocket.
Health monitor triggers a provider restart, firing the abort signal.
The onAbort handler (line 6952 in provider-CAlWEl41.js) sets gateway.options.reconnect = { maxAttempts: 0 }.
gateway.disconnect() closes the WebSocket, triggering handleClose -> handleReconnectionAttempt.
Since maxAttempts is 0 and reconnectAttempts >= 0, it throws: Error: Max reconnect attempts (0) reached after code 1005.
Uncaught exception kills the entire gateway process.

Expected behavior

The health monitor restart should cleanly disconnect and reconnect the Discord provider without crashing the gateway. The abort handler should not interfere with the reconnection logic during a health-monitor-initiated restart.

Actual behavior

Gateway crashes with uncaught exception:

Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)

This happens every time the health monitor triggers a Discord restart (observed 3+ times, always ~35 min after gateway start).

OpenClaw version

2026.3.24

Operating system

Windows 11 (10.0.26200)

Install method

pnpm global

Model

kilocode/kilo-auto/free

Provider / routing chain

openclaw -> kilocode (kilo-auto/free auto-router)

Additional provider/model setup details

No response

Logs, screenshots, and evidence

02:04:04+05:00 [health-monitor] started (interval: 300s, startup-grace: 60s, channel-connect-grace: 120s)
02:04:28+05:00 [discord] [default] starting provider (@Openclaw)
02:39:04+05:00 [health-monitor] [whatsapp:default] health-monitor: restarting (reason: stale-socket)
02:44:04+05:00 [health-monitor] [discord:default] health-monitor: restarting (reason: stale-socket)
02:44:04+05:00 [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
    at SafeGatewayPlugin.handleReconnectionAttempt (provider-CAlWEl41.js:3318:47)
    at SafeGatewayPlugin.handleClose (provider-CAlWEl41.js:3364:8)
    at WebSocket.<anonymous> (provider-CAlWEl41.js:3307:9)

Impact and severity

Affected: anyone running the Discord channel on OpenClaw (confirmed on Windows, likely cross-platform)
Severity: Critical. The gateway process dies completely and all channels go offline.
Frequency: every ~35 minutes (whenever the health monitor's stale-socket check fires on Discord)
Consequence: missed messages on all channels (WhatsApp, Discord, webchat) until a manual restart. An external process wrapper is required to maintain uptime.

Additional information

Suggested fix, one of:
(a) don't set maxAttempts=0 in the abort handler; instead, use a separate flag to skip reconnection during an intentional shutdown;
(b) have the health-monitor restart bypass the abort handler's reconnect suppression;
(c) catch the error in handleReconnectionAttempt when maxAttempts=0 instead of letting it escape as an uncaught exception.
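Option (c) amounts to containing the error at the listener boundary. The sketch below assumes the close path can be wrapped. guardCloseHandler and the log callback are hypothetical names, and rawHandler only imitates the failing check. This approach hides the symptom, but reconnection is still disabled after an abort. For that reason, it is probably best treated as a safety net alongside option (a) or the safe-disconnect approach rather than as the whole fix.

```typescript
// Hypothetical sketch of option (c): log errors thrown on the reconnect path
// instead of letting them escape a WebSocket "close" listener.
type CloseHandler = (code: number) => void;

function guardCloseHandler(
  handler: CloseHandler,
  log: (msg: string) => void,
): CloseHandler {
  return (code) => {
    try {
      handler(code);
    } catch (err) {
      // The provider is shutting down anyway; record the error and move on.
      log(`reconnect suppressed after close ${code}: ${(err as Error).message}`);
    }
  };
}

// Imitation of the failing handler when the abort handler zeroed the budget.
const logs: string[] = [];
const maxAttempts = 0;
const reconnectAttempts = 0;
const rawHandler: CloseHandler = (code) => {
  if (reconnectAttempts >= maxAttempts) {
    throw new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${code}`);
  }
};

const safeHandler = guardCloseHandler(rawHandler, (msg) => logs.push(msg));
safeHandler(1005); // does not throw; the error is logged instead
console.log(logs[0]);
```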

Default reconnect config: line 6447 sets reconnect: { maxAttempts: 50 }, but the abort handler overrides it with 0.

extent analysis

Fix Plan

To resolve the issue, we will implement a separate flag to skip reconnection during intentional shutdown, instead of setting maxAttempts to 0. This will prevent the race condition between the health monitor restart and the abort handler.

Here are the steps to fix the issue:

Introduce a new flag isIntentionalShutdown to track whether the shutdown is intentional or not.
In the onAbort handler, set isIntentionalShutdown to true instead of setting maxAttempts to 0.
In the handleReconnectionAttempt function, check the isIntentionalShutdown flag before attempting to reconnect. If it's true, skip reconnection.

Example code:

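// Introduce a new flag to track intentional shutdown
// (sketch: names follow the plan above, not the actual provider source)
let isIntentionalShutdown = false;

// In the onAbort handler: mark the shutdown as intentional
// instead of setting reconnect.maxAttempts to 0
function onAbort(): void {
  isIntentionalShutdown = true;
  // ... disconnect the socket ...
}

// In the handleReconnectionAttempt function
function handleReconnectionAttempt(): void {
  if (isIntentionalShutdown) {
    // Skip reconnection if it's an intentional shutdown
    return;
  }
  // ... normal reconnect/backoff logic ...
}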

Verification

To verify that the fix worked, follow these steps:

Start the OpenClaw gateway with the Discord channel enabled.
Wait for the health monitor to detect a stale Discord WebSocket and trigger a restart.
Check the logs to ensure that the gateway does not crash with an uncaught exception.
Verify that the Discord provider reconnects successfully after the restart.

Extra Tips

Make sure to reset the isIntentionalShutdown flag to false after a successful restart to allow for reconnection attempts in case of future failures.
Consider adding additional logging to track the state of the isIntentionalShutdown flag and the reconnection attempts to aid in debugging and monitoring.
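The two tips above can be combined in a small sketch. It assumes the flag lives on a long-lived object that is reused across restarts. If the provider is instead recreated on each restart, a fresh instance already starts with the flag cleared. ShutdownAwareReconnector, onConnected, and the log strings are hypothetical and are not part of OpenClaw.

```typescript
// Hypothetical sketch: log every flag change, and clear the flag once a new
// connection is up so later unplanned drops are still retried.
class ShutdownAwareReconnector {
  private isIntentionalShutdown = false;
  readonly log: string[] = [];
  reconnectsScheduled = 0;

  private setShutdown(value: boolean, reason: string): void {
    this.isIntentionalShutdown = value;
    this.log.push(`isIntentionalShutdown=${value} (${reason})`);
  }

  onAbort(): void {
    this.setShutdown(true, "abort signal");
  }

  // Reset after a successful (re)start so future failures trigger reconnects.
  onConnected(): void {
    if (this.isIntentionalShutdown) this.setShutdown(false, "connected");
  }

  handleReconnectionAttempt(code: number): void {
    if (this.isIntentionalShutdown) {
      this.log.push(`skip reconnect after close ${code}`);
      return;
    }
    this.reconnectsScheduled++;
    this.log.push(`schedule reconnect #${this.reconnectsScheduled} after close ${code}`);
  }
}

const r = new ShutdownAwareReconnector();
r.onAbort();
r.handleReconnectionAttempt(1005); // skipped: intentional shutdown
r.onConnected();                   // flag cleared after the restart
r.handleReconnectionAttempt(1006); // 1006 = abnormal closure: reconnect scheduled
console.log(r.log);
```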
