openclaw - ✅(Solved) Fix Bug: SafeGatewayPlugin crashes on code 1005 when abortSignal fires (maxAttempts=0 sentinel collision) [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#56833 · Fetched 2026-04-08 01:47:15
Comments: 1 · Participants: 2 · Timeline: 4 · Reactions: 0
Timeline (top): cross-referenced ×2, commented ×1, referenced ×1

When abortSignal fires, onAbort() sets gateway.options.reconnect = { maxAttempts: 0 } as a signal to stop reconnecting, then calls gateway.disconnect(). This triggers handleClose(1005) → handleReconnectionAttempt, where the guard this.reconnectAttempts >= maxAttempts immediately evaluates to 0 >= 0 = true. An error is emitted and this.monitor.destroy() is called, crashing the process instead of exiting cleanly.

Error Message

Error: Max reconnect attempts (0) reached after code 1005

The error is emitted from handleReconnectionAttempt via this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached...`)), after which this.monitor.destroy() is called. Because maxAttempts: 0 doubles as the intentional-abort sentinel, the >= check fires the error path at reconnectAttempts = 0 instead of letting the gateway disconnect silently.

Root Cause

In provider-CAlWEl41.js (dist), handleReconnectionAttempt reads:

handleReconnectionAttempt(options) {
  const { maxAttempts = 5, baseDelay = 1e3, maxDelay = 3e4 } = this.options.reconnect ?? {};
  if (this.reconnectAttempts >= maxAttempts) {   // 0 >= 0 → true immediately
    this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached...`));
    this.monitor.destroy();
    return;
  }
  // ...
}

And onAbort in the monitor lifecycle:

const onAbort = () => {
  lifecycleStopping = true;
  gateway.options.reconnect = { maxAttempts: 0 };  // sentinel: "stop reconnecting"
  gateway.disconnect();                             // triggers handleClose(1005)
};

The problem: maxAttempts: 0 is used as an intentional-abort sentinel, but the >= check in handleReconnectionAttempt cannot distinguish this from "retry count exhausted" — it fires the error path immediately at reconnectAttempts = 0.
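The collision can be reproduced in isolation. The following is a minimal sketch, not the library's code: FakeGateway, runOnce, and the returned outcome strings are hypothetical names introduced for illustration, and only the guard itself mirrors the dist excerpt above.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical stand-in for the gateway; only the guard mirrors the dist excerpt.
class FakeGateway {
  constructor(reconnect) {
    this.options = { reconnect };
    this.reconnectAttempts = 0;
    this.emitter = new EventEmitter();
    this.destroyed = false;
  }

  handleReconnectionAttempt() {
    const { maxAttempts = 5 } = this.options.reconnect ?? {};
    if (this.reconnectAttempts >= maxAttempts) {
      this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached`));
      this.destroyed = true; // stands in for this.monitor.destroy()
      return "gave-up";
    }
    this.reconnectAttempts += 1;
    return "retrying";
  }
}

// Runs a single reconnection attempt and records any emitted error messages.
function runOnce(reconnect) {
  const gateway = new FakeGateway(reconnect);
  const errors = [];
  gateway.emitter.on("error", (err) => errors.push(err.message));
  const outcome = gateway.handleReconnectionAttempt();
  return { outcome, errors };
}

const normal = runOnce(undefined);           // default maxAttempts = 5
const aborted = runOnce({ maxAttempts: 0 }); // the onAbort sentinel
console.log(normal, aborted);
```

With the default options the first attempt proceeds, while the sentinel value takes the error path on the very first call because the attempt counter also starts at 0.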

Fix Action

Workaround

Patched the compiled dist file directly with a maxAttempts === 0 guard in handleReconnectionAttempt until an upstream fix is available.

PR fix notes

PR #56840: fix(discord): use listener-stripping disconnect in abort handler to prevent crash

Description (problem / solution / changelog)

Summary

Fix SafeGatewayPlugin crash when abortSignal fires — the process crashes with Error: Max reconnect attempts (0) reached after code 1005 instead of exiting cleanly.

Problem

When abortSignal fires, onAbort() in provider.lifecycle.reconnect.ts sets maxAttempts=0 as a sentinel to disable reconnection, then calls gateway.disconnect(). Carbon's internal reconnection handler evaluates this.reconnectAttempts >= maxAttempts → 0 >= 0 = true, emitting a crash-level error.

This error races the lifecycle's finally block which sets lifecycleStopping = true. Since lifecycleStopping is still false when the error is drained, handleGatewayEvent does not suppress it, and drainPendingGatewayErrors throws — crashing the monitor.

Fix

Replace the maxAttempts=0 + disconnect() pattern with the existing disconnectGatewaySocketWithoutAutoReconnect() helper. This function strips Carbon's close/error listeners from the WebSocket before closing it, preventing the reconnection handler from firing at all. The gateway shuts down cleanly without the 0 >= 0 sentinel collision.
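The listener-stripping idea can be sketched as follows. This is an illustration only and does not reproduce disconnectGatewaySocketWithoutAutoReconnect(): FakeSocket and closeWithoutReconnect are hypothetical names, and the no-op error listener added after stripping is an assumption made so that a late socket error cannot surface as an uncaught exception.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical socket: emits "close" when closed, as a WebSocket wrapper might.
class FakeSocket extends EventEmitter {
  close(code) {
    this.emit("close", code);
  }
}

// Sketch of the technique: detach the reconnect-driving listeners, then close.
function closeWithoutReconnect(socket) {
  socket.removeAllListeners("close");
  socket.removeAllListeners("error");
  socket.on("error", () => {}); // assumption: absorb late errors during shutdown
  socket.close(1000);
}

let reconnectCalls = 0;
const onClose = () => {
  reconnectCalls += 1; // stands in for the library's reconnection handler
};

// Plain close: the handler runs, which is where the 0 >= 0 check would fire.
const plain = new FakeSocket();
plain.on("close", onClose);
plain.close(1005);

// Stripped close: the handler never runs, so no reconnection logic is reached.
const stripped = new FakeSocket();
stripped.on("close", onClose);
closeWithoutReconnect(stripped);

console.log(reconnectCalls);
```

Because the reconnection handler is never invoked, no sentinel value is needed at all, which is the design choice this PR makes.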

Changes

  • extensions/discord/src/monitor/provider.lifecycle.reconnect.ts: replace maxAttempts=0 sentinel in onAbort() with disconnectGatewaySocketWithoutAutoReconnect()

Fixes #56833

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.reconnect.ts (modified, +7/-2)

PR #56164: fix(discord): suppress shutdown errors with persistent listener to prevent crash

Description (problem / solution / changelog)

Summary

Fixes #55116 Fixes #55421 Fixes #56137 Fixes #56644 Fixes #56833

This bug has now been reported four times independently (#55116, #55421, #56137, #56644), confirming its widespread impact across different environments and OpenClaw versions.

When the gateway health-monitor detects a stale socket, it calls stopChannel() then startChannel(). The abort signal fires onAbort(), which sets maxAttempts: 0 and calls gateway.disconnect(). The WebSocket close event fires asynchronously, and Carbon's handleReconnectionAttempt emits an 'error' event when it sees reconnectAttempts (0) >= maxAttempts (0).

Root Cause

The previous code used gatewayEmitter.once('error', noop) to absorb this error. Problem: if another error fires concurrently before the reconnect error, the once listener is consumed, leaving the reconnect error unhandled. Node.js converts an unhandled 'error' emit into an uncaught exception, crashing the entire gateway process.

This explains the crash-loop: the health monitor restarts the channel, which triggers the same code path again.

Fix

Replace the once listener with a tracked persistent on listener (suppressShutdownError) that is:

  • Armed in onAbort() before calling gateway.disconnect()
  • Removed in the finally block (always, regardless of how the lifecycle exits)

This guarantees all errors emitted during shutdown are suppressed regardless of ordering, and the listener is always cleaned up.
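The difference between the two listener styles can be demonstrated with Node's EventEmitter alone. This is a minimal sketch under stated assumptions: shutdownWith and the two error messages are hypothetical, and the try/finally only imitates the lifecycle's cleanup rather than reproducing provider.lifecycle.ts.

```javascript
const { EventEmitter } = require("node:events");

// Registers a suppression listener, emits two shutdown errors, then cleans up.
function shutdownWith(register) {
  const emitter = new EventEmitter();
  const suppressShutdownError = () => {};
  register(emitter, suppressShutdownError);
  try {
    emitter.emit("error", new Error("unrelated socket error"));
    emitter.emit("error", new Error("Max reconnect attempts (0) reached"));
    return "clean";
  } catch (err) {
    // An "error" emit with no listener throws the emitted Error.
    return `crashed: ${err.message}`;
  } finally {
    emitter.removeListener("error", suppressShutdownError);
  }
}

const withOnce = shutdownWith((emitter, fn) => emitter.once("error", fn));
const withOn = shutdownWith((emitter, fn) => emitter.on("error", fn));
console.log({ withOnce, withOn });
```

The once variant is consumed by the first error, so the second emit finds no listener and throws. The persistent variant absorbs both and is still removed in the finally block.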

Testing

Added 2 regression tests in provider.lifecycle.test.ts:

  1. Verifies the suppression listener is properly cleaned up after lifecycle exits
  2. Verifies both a concurrent unrelated error AND the reconnect error are suppressed

All 13 tests pass.

Changed files

  • src/discord/monitor/provider.lifecycle.test.ts (modified, +83/-0)
  • src/discord/monitor/provider.lifecycle.ts (modified, +19/-1)



Reproduction

  1. Provide an abortSignal to the gateway/monitor task
  2. Trigger the abort (e.g., task timeout, external abort)
  3. Observe crash: Error: Max reconnect attempts (0) reached after code 1005


Expected Behaviour

When abortSignal fires, the gateway should disconnect silently/cleanly with no error emitted.

Actual Behaviour

An error event is emitted with the message Max reconnect attempts (0) reached after code 1005, and this.monitor.destroy() is called, crashing the process or causing unhandled error listeners to fire.

Suggested Fix

Guard maxAttempts === 0 as a clean intentional stop before the error path:

handleReconnectionAttempt(options) {
  const { maxAttempts = 5, baseDelay = 1e3, maxDelay = 3e4 } = this.options.reconnect ?? {};
  // maxAttempts=0 is set by onAbort() as an intentional stop signal — exit silently
  if (maxAttempts === 0) {
    this.emitter.emit("debug", "Reconnection suppressed: intentional abort (maxAttempts=0)");
    return;
  }
  if (this.reconnectAttempts >= maxAttempts) {
    this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached...`));
    this.monitor.destroy();
    return;
  }
  // ...
}

Alternatively, use a dedicated sentinel (e.g., null or a symbol) in onAbort instead of { maxAttempts: 0 } to remove the ambiguity entirely.
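The dedicated-sentinel alternative might look like the sketch below. Everything here is an assumption made for illustration: INTENTIONAL_STOP, FakeGateway, attempt, and the returned outcome strings do not exist in the library. The point is only that an identity check on a symbol cannot be confused with a numeric retry limit.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical sentinel; a symbol cannot collide with any numeric maxAttempts.
const INTENTIONAL_STOP = Symbol("intentional-stop");

class FakeGateway {
  constructor(reconnect) {
    this.options = { reconnect };
    this.reconnectAttempts = 0;
    this.emitter = new EventEmitter();
  }

  handleReconnectionAttempt() {
    // Check the sentinel by identity before reading any numeric limits.
    if (this.options.reconnect === INTENTIONAL_STOP) {
      return "stopped";
    }
    const { maxAttempts = 5 } = this.options.reconnect ?? {};
    if (this.reconnectAttempts >= maxAttempts) {
      this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached`));
      return "gave-up";
    }
    this.reconnectAttempts += 1;
    return "retrying";
  }
}

// Runs one attempt and records emitted error messages.
function attempt(reconnect) {
  const gateway = new FakeGateway(reconnect);
  const errors = [];
  gateway.emitter.on("error", (err) => errors.push(err.message));
  return { outcome: gateway.handleReconnectionAttempt(), errors };
}

const abortPath = attempt(INTENTIONAL_STOP);
const zeroRetries = attempt({ maxAttempts: 0 });
console.log(abortPath, zeroRetries);
```

One consequence of this design is that { maxAttempts: 0 } keeps its literal meaning of "no retries allowed" and still reports an error, while an intentional abort exits silently.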

Environment

  • openclaw version: 2026.3.24
  • Node.js: checked via global npm install
  • Platform: Windows


extent analysis

Fix Plan

To fix the issue at the library level, update handleReconnectionAttempt so that maxAttempts === 0 is treated as a clean intentional stop: check for it before the reconnectAttempts >= maxAttempts comparison, emit a debug event instead of an error, and return without calling this.monitor.destroy(). The guard is the one listed under Suggested Fix.

Alternatively, consider using a dedicated sentinel (e.g., null or a symbol) in onAbort instead of { maxAttempts: 0 } to remove the ambiguity entirely.

Verification

To verify the fix, trigger the abort signal and check that the gateway disconnects silently without emitting an error event. You can do this by:

  • Providing an abortSignal to the gateway/monitor task
  • Triggering the abort (e.g., task timeout, external abort)
  • Observing that no error event is emitted and the process does not crash
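The verification steps above can be sketched as a small script. This is a model under stated assumptions rather than a test against the real gateway: PatchedGateway is a hypothetical stand-in whose disconnect() calls the handler directly in place of handleClose(1005), and the abort is wired through a standard AbortController.

```javascript
const { EventEmitter } = require("node:events");

// Hypothetical gateway carrying the maxAttempts === 0 guard from Suggested Fix.
class PatchedGateway {
  constructor() {
    this.options = { reconnect: {} };
    this.reconnectAttempts = 0;
    this.emitter = new EventEmitter();
  }

  // Stands in for disconnect() -> handleClose(1005) -> handleReconnectionAttempt.
  disconnect() {
    this.handleReconnectionAttempt();
  }

  handleReconnectionAttempt() {
    const { maxAttempts = 5 } = this.options.reconnect ?? {};
    if (maxAttempts === 0) {
      this.emitter.emit("debug", "Reconnection suppressed: intentional abort (maxAttempts=0)");
      return;
    }
    if (this.reconnectAttempts >= maxAttempts) {
      this.emitter.emit("error", new Error(`Max reconnect attempts (${maxAttempts}) reached`));
      return;
    }
    this.reconnectAttempts += 1;
  }
}

const gateway = new PatchedGateway();
const errors = [];
const debugMessages = [];
gateway.emitter.on("error", (err) => errors.push(err.message));
gateway.emitter.on("debug", (msg) => debugMessages.push(msg));

// Steps 1 and 2: provide an abort signal, then trigger the abort.
const controller = new AbortController();
controller.signal.addEventListener(
  "abort",
  () => {
    gateway.options.reconnect = { maxAttempts: 0 };
    gateway.disconnect();
  },
  { once: true },
);
controller.abort();

// Step 3: no error event should have been recorded.
console.log({ errors, debugMessages });
```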

Extra Tips

  • Consider updating the onAbort function to use a dedicated sentinel instead of { maxAttempts: 0 } to avoid similar issues in the future.
  • Test both the abort path and a genuine retry-exhaustion path, so the guard does not also hide the error when a non-zero maxAttempts is actually reached.
