openclaw - ✅ (Solved) Fix Gateway crashes on config reload: unhandled reconnect-exhausted error during abort [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#56137 · Fetched 2026-04-08 01:44:30
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0

When a config change triggers a gateway reload, the Discord gateway's onAbort handler sets maxAttempts: 0 and disconnects. The resulting "Max reconnect attempts (0) reached after code 1005" error is not caught and propagates to the global uncaughtException handler, which calls process.exit(1). As a result, the gateway crashes and is restarted by systemd on every config reload.

Error Message

Mar 26 23:34:18 [reload] config change requires gateway restart (auth.profiles.modelstudio:default)
Mar 26 23:34:18 [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
Mar 26 23:34:18 openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
Mar 26 23:34:23 openclaw-gateway.service: Scheduled restart job, restart counter is at 1.
Mar 26 23:34:23 Started openclaw-gateway.service
Mar 26 23:35:00 [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
Mar 26 23:35:00 openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE

Root Cause

In provider-CAlWEl41.js, the onAbort handler (around line 6952) intentionally sets gateway.options.reconnect = { maxAttempts: 0 } before calling gateway.disconnect(). This is correct behavior for a clean shutdown.

However, the error emitted by handleReconnectionAttempt when maxAttempts is reached:

this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached...`));

...is classified by classifyDiscordGatewayEvent as "reconnect-exhausted" with shouldStopLifecycle: true. This error is not caught by the abort flow and bubbles up as an uncaught exception, hitting:

process.on("uncaughtException", (error) => {
    console.error("[openclaw] Uncaught exception:", formatUncaughtError(error));
    process.exit(1);
});
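
The crash path rests on standard Node.js EventEmitter behavior: emitting 'error' with no listener attached makes emit() throw the error, and if nothing catches it, it reaches the uncaughtException handler. A minimal sketch of that behavior follows. It uses a bare EventEmitter as a stand-in for the gateway's emitter (an assumption; the real Carbon internals are not shown in this issue), and emitReconnectError is a hypothetical helper.

```javascript
const { EventEmitter } = require("node:events");

// Stand-in for the gateway's internal emitter (hypothetical, not Carbon's real object).
const emitter = new EventEmitter();

// Emits the reconnect error and reports whether emit() threw.
function emitReconnectError(target) {
  try {
    target.emit("error", new Error("Max reconnect attempts (0) reached after code 1005"));
    return "handled";
  } catch (err) {
    // With no 'error' listener attached, emit() throws the Error itself.
    return `thrown: ${err.message}`;
  }
}

const withoutListener = emitReconnectError(emitter);

emitter.on("error", () => {}); // any attached listener prevents the throw
const withListener = emitReconnectError(emitter);

console.log(withoutListener);
console.log(withListener);
```

In the gateway the emit happens inside an asynchronous WebSocket close callback, so there is no surrounding try/catch and the throw surfaces as an uncaught exception.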

Fix Action

Fixed

PR fix notes

PR #56164: fix(discord): suppress shutdown errors with persistent listener to prevent crash

Description (problem / solution / changelog)

Summary

Fixes #55116, fixes #55421, fixes #56137

This bug has now been reported three times independently, confirming its impact.

When the gateway health-monitor detects a stale socket, it calls stopChannel() then startChannel(). The abort signal fires onAbort(), which sets maxAttempts: 0 and calls gateway.disconnect(). The WebSocket close event fires asynchronously, and Carbon's handleReconnectionAttempt emits an 'error' event when it sees reconnectAttempts (0) >= maxAttempts (0).

Root Cause

The previous code used gatewayEmitter.once('error', noop) to absorb this error. Problem: if another error fires concurrently before the reconnect error, the once listener is consumed, leaving the reconnect error unhandled. Node.js converts an unhandled 'error' emit into an uncaught exception, crashing the entire gateway process.

This explains the crash-loop: the health monitor restarts the channel, which triggers the same code path again.
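
The race described above can be reproduced in isolation. This is a sketch under the assumption that a plain EventEmitter behaves like the gateway emitter for this purpose; the "unrelated socket error" message is made up for illustration.

```javascript
const { EventEmitter } = require("node:events");

const gatewayEmitter = new EventEmitter(); // stand-in for the real gateway emitter
const noop = () => {};

// Previous approach: a one-shot listener intended to absorb the reconnect error.
gatewayEmitter.once("error", noop);

// An unrelated error arrives first and consumes the one-shot listener.
gatewayEmitter.emit("error", new Error("unrelated socket error"));
const listenersLeft = gatewayEmitter.listenerCount("error");

// The reconnect error now has no listener, so emit() throws.
let reconnectErrorThrown = false;
try {
  gatewayEmitter.emit("error", new Error("Max reconnect attempts (0) reached after code 1005"));
} catch {
  reconnectErrorThrown = true;
}

console.log({ listenersLeft, reconnectErrorThrown });
```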

Fix

Replace the once listener with a tracked persistent on listener (suppressShutdownError) that is:

  • Armed in onAbort() before calling gateway.disconnect()
  • Removed in the finally block (always, regardless of how the lifecycle exits)

This guarantees all errors emitted during shutdown are suppressed regardless of ordering, and the listener is always cleaned up.
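
The arm-in-onAbort, remove-in-finally pattern might look roughly like the sketch below. It is not the code from the PR: createFakeGateway, runLifecycle, the "closed" event, and the initial maxAttempts value are all hypothetical scaffolding, and only suppressShutdownError, onAbort, maxAttempts: 0, and gateway.disconnect() are names taken from the description.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical stand-in for the Discord gateway: disconnect() asynchronously emits
// a concurrent unrelated error, then the reconnect error, then a "closed" event.
function createFakeGateway() {
  const emitter = new EventEmitter();
  return {
    emitter,
    options: { reconnect: { maxAttempts: 50 } }, // initial value is arbitrary
    disconnect() {
      setImmediate(() => {
        emitter.emit("error", new Error("unrelated socket error"));
        emitter.emit("error", new Error("Max reconnect attempts (0) reached after code 1005"));
        emitter.emit("closed");
      });
    },
  };
}

async function runLifecycle(gateway, abortSignal) {
  const suppressed = [];
  // Persistent listener: stays attached for every error emitted during shutdown.
  const suppressShutdownError = (err) => suppressed.push(err.message);

  const onAbort = () => {
    gateway.emitter.on("error", suppressShutdownError); // arm before disconnecting
    gateway.options.reconnect = { maxAttempts: 0 };
    gateway.disconnect();
  };

  try {
    abortSignal.addEventListener("abort", onAbort, { once: true });
    await new Promise((resolve) => gateway.emitter.once("closed", resolve));
  } finally {
    // Always clean up, however the lifecycle exits.
    abortSignal.removeEventListener("abort", onAbort);
    gateway.emitter.removeListener("error", suppressShutdownError);
  }
  return suppressed;
}

const gateway = createFakeGateway();
const controller = new AbortController();
const result = runLifecycle(gateway, controller.signal);
controller.abort();
result.then((suppressed) => {
  console.log(suppressed.length, gateway.emitter.listenerCount("error"));
});
```

Because the listener is registered with on rather than once, the order in which errors arrive no longer matters, and removing it in finally keeps it from masking errors in a later lifecycle.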

Testing

Added 2 regression tests in provider.lifecycle.test.ts:

  1. Verifies the suppression listener is properly cleaned up after lifecycle exits
  2. Verifies both a concurrent unrelated error AND the reconnect error are suppressed

All 13 tests pass.

Changed files

  • src/discord/monitor/provider.lifecycle.test.ts (modified, +83/-0)
  • src/discord/monitor/provider.lifecycle.ts (modified, +19/-1)


Description

When a config change triggers a gateway reload, the Discord gateway's onAbort handler sets maxAttempts: 0 and disconnects. The resulting "Max reconnect attempts (0) reached after code 1005" error is not caught and propagates to the global uncaughtException handler, which calls process.exit(1). As a result, the gateway crashes and is restarted by systemd on every config reload.

Reproduction

  1. Run the gateway as a systemd service with Discord integration enabled
  2. Trigger any config change that requires a gateway restart (e.g., modifying auth.profiles, plugins, gateway.tools, etc.)
  3. The gateway crashes with:
    [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005
  4. systemd restarts the service, and any subsequent config change repeats the cycle

Expected Behavior

When the abort signal fires and the gateway disconnects intentionally, the reconnect-exhausted error should be suppressed or caught, since the disconnect was deliberate. The process should not crash.

Suggested Fix

Either:

  1. In the onAbort handler, rely on a stop flag (lifecycleStopping = true is already set there) and check it before emitting the reconnect-exhausted error
  2. Catch the error emitted during abort-triggered disconnect so it doesn't propagate to the global handler
  3. Skip the handleReconnectionAttempt path entirely when maxAttempts is 0 (treat it as an intentional no-reconnect)
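
Option 3 could be sketched as follows. SketchGateway is a hypothetical miniature written for this illustration; Carbon's real class, its fields, and the outcome property are assumptions and are not shown in this issue.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical miniature of a gateway client, not Carbon's actual implementation.
class SketchGateway {
  constructor(reconnect) {
    this.emitter = new EventEmitter();
    this.options = { reconnect };
    this.reconnectAttempts = 0;
    this.outcome = null; // records what the handler decided, for illustration only
  }

  handleReconnectionAttempt(closeCode) {
    const { maxAttempts } = this.options.reconnect;
    if (maxAttempts === 0) {
      // Option 3: maxAttempts of 0 means "do not reconnect", not "attempts exhausted".
      this.outcome = "stopped";
      return;
    }
    if (this.reconnectAttempts >= maxAttempts) {
      this.outcome = "exhausted";
      this.emitter.emit(
        "error",
        new Error(`Max reconnect attempts (${maxAttempts}) reached after code ${closeCode}`),
      );
      return;
    }
    this.reconnectAttempts += 1;
    this.outcome = "reconnecting";
  }
}

const stopping = new SketchGateway({ maxAttempts: 0 });
stopping.handleReconnectionAttempt(1005); // no 'error' is emitted, so nothing throws

const live = new SketchGateway({ maxAttempts: 3 });
live.handleReconnectionAttempt(1005);

console.log(stopping.outcome, live.outcome);
```

This variant changes the library-side handler, whereas options 1 and 2 can be applied in the code that owns the abort flow.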

Environment

  • openclaw v2026.3.24
  • Linux (systemd user service)
  • Discord gateway integration enabled
  • Multiple agents configured

Impact

In the last 24 hours, this caused 83 restarts of the gateway service. Each restart drops all active Discord/Telegram sessions, interrupts embedded agent runs, and triggers re-discovery of all provider models.

extent analysis

Fix Plan

To prevent the gateway from crashing and restarting on every config reload, we need to catch the reconnect-exhausted error emitted during abort-triggered disconnect. Here are the steps:

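  • Set a flag in the onAbort handler to indicate that the disconnect is intentional:
// Mark the shutdown as deliberate before disabling reconnects and disconnecting.
const onAbort = () => {
  lifecycleStopping = true;
  gateway.options.reconnect = { maxAttempts: 0 };
  gateway.disconnect();
};
  • Check this flag before emitting the reconnect-exhausted error (handleReconnectionAttempt belongs to the Carbon gateway client, so the flag would have to be visible there):
// Skip the reconnect path, and its error emit, during a deliberate shutdown.
function handleReconnectionAttempt() {
  if (lifecycleStopping) return;
  // existing code to emit reconnect-exhausted error
}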

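Alternatively, you can catch the error emitted during the abort-triggered disconnect. The merged PR #56164 takes this route, using a persistent listener that suppresses every error during shutdown rather than matching on the message:

gateway.on("error", (error) => {
  // Ignore only the reconnect error produced by the deliberate disconnect.
  if (lifecycleStopping && error.message.includes("Max reconnect attempts (0) reached")) return;
  // existing code to handle other errors
});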

Verification

To verify that the fix worked, trigger a config change that requires a gateway restart and check the logs for the Uncaught exception error. If the error is no longer present, the fix was successful.
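
The log check can be scripted. The sketch below counts crash lines in a journal excerpt; countReconnectCrashes is a hypothetical helper, the "before" lines are copied from the logs in this issue, and the "after" excerpt is an assumed example of a clean reload, not real output.

```javascript
// Counts crash lines in a journal excerpt whose format follows the logs in this issue.
function countReconnectCrashes(logText) {
  const marker = "Uncaught exception: Error: Max reconnect attempts";
  return logText.split("\n").filter((line) => line.includes(marker)).length;
}

// Excerpt taken from the issue's logs (crash present).
const before = [
  "Mar 26 23:34:18 [reload] config change requires gateway restart (auth.profiles.modelstudio:default)",
  "Mar 26 23:34:18 [openclaw] Uncaught exception: Error: Max reconnect attempts (0) reached after code 1005",
  "Mar 26 23:34:18 openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE",
].join("\n");

// Assumed shape of a post-fix excerpt: the reload line appears without a crash line.
const after =
  "Mar 26 23:34:18 [reload] config change requires gateway restart (auth.profiles.modelstudio:default)";

console.log(countReconnectCrashes(before), countReconnectCrashes(after));
```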

Extra Tips

  • Make sure to test the fix in a non-production environment before deploying it to production.
  • Consider adding additional logging to help diagnose any future issues related to the reconnect-exhausted error.
  • Review the code to ensure that the lifecycleStopping flag is properly reset after the disconnect is complete.
