openclaw - 💡(How to fix) Fix Channel-plugin WebSocket errors crash the entire gateway process (Slack socket-mode example)


A Slack socket-mode WebSocket error caused the entire openclaw-gateway process to exit with code 1. A non-critical channel plugin took down the LLM proxy and every other channel.

Error Message

[ERROR] socket-mode:SlackWebSocket:98 WebSocket error occurred:
[ERROR] socket-mode:SocketModeClient:85 WebSocket error! Error
[slack] socket disconnected (error). retry 1/12 in 2s
openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE


Root Cause

From the logs, the WebSocket error raised inside the Slack socket-mode client is not contained by the Slack channel plugin. The gateway process exits with status 1 right after the plugin logs "retry 1/12 in 2s", so the plugin's own reconnect never runs, and the LLM proxy and every other channel go down with it.
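A minimal TypeScript sketch of one plausible way a single socket error can end the whole process. It assumes the error surfaces as an 'error' event on a Node EventEmitter with no listener attached; we have not traced where it actually escapes in openclaw or in Slack's socket-mode client. Node's EventEmitter throws when 'error' is emitted without a listener, and an uncaught exception makes Node exit with code 1 by default, which is consistent with the status=1/FAILURE in the systemd log. All names here are illustrative.

```typescript
import { EventEmitter } from "node:events";

// Emits 'error' and reports whether Node threw synchronously.
// With no 'error' listener registered, EventEmitter#emit throws the error object itself.
export function emitErrorThrows(emitter: EventEmitter, err: Error): boolean {
  try {
    emitter.emit("error", err);
    return false;
  } catch {
    return true;
  }
}

// Illustrative stand-ins for a channel's WebSocket; not openclaw or Slack classes.
const bareSocket = new EventEmitter();
const guardedSocket = new EventEmitter();

// A listener keeps the failure inside the channel so its own reconnect logic can run.
const seen: string[] = [];
guardedSocket.on("error", (err: Error) => {
  seen.push(err.message);
});

const unguarded = emitErrorThrows(bareSocket, new Error("socket disconnected"));
const guarded = emitErrorThrows(guardedSocket, new Error("socket disconnected"));
console.log({ unguarded, guarded, seen });
```

In a real socket callback there is no surrounding try/catch, so the unguarded case becomes an uncaught exception for the whole gateway rather than a return value.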



Version

[email protected] on Ubuntu 24.04 LTS (Node 22.22.1).

Reproduction

Not deterministic, but the trigger is any sustained Slack SocketModeClient connection failure. In our case, five pong-timeout warnings over 10 minutes (15:50 → 16:00 UTC) culminated in:

[ERROR] socket-mode:SlackWebSocket:98 WebSocket error occurred:
[ERROR] socket-mode:SocketModeClient:85 WebSocket error! Error
[slack] socket disconnected (error). retry 1/12 in 2s
openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE

The Slack plugin's retry logic logs "retry 1/12 in 2s", but the process exits before that retry can happen.

Impact

  • ~45-second outage of the LLM proxy and all other channel plugins (Discord, Telegram, Bluesky, Signal, etc.) for a fault in one channel.
  • Systemd auto-restart works, but the ~45s warmup window means users see failures during that period.
  • In our incident, the 45s gateway outage was amplified to a 10+ minute user lockout because retries during gateway warmup tripped our downstream circuit breaker (we have a local workaround for that part).

Expected behavior

Channel plugins should be isolated from the main gateway process. An unhandled error or socket disconnect from any single channel should NOT trigger process.exit(1) on the gateway. The Slack plugin's own "retry 1/12 in 2s" message shows it expects to recover; it never gets the chance.
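A hedged TypeScript sketch of what "recover inside the channel" could mean: a per-channel reconnect loop that retries with backoff and, when it runs out of retries, disables only that channel instead of exiting. The function and parameter names are hypothetical, not openclaw's API. The 12 retries and 2s first delay echo the Slack log line; the doubling and the 60s cap are assumptions.

```typescript
// Injected so the loop can run without real sockets or timers.
export type Connect = () => Promise<void>;
export type Sleep = (ms: number) => Promise<void>;

// Hypothetical per-channel reconnect loop; never calls process.exit().
export async function reconnectChannel(
  channel: string,
  connect: Connect,
  sleep: Sleep,
  maxRetries = 12,     // echoes the "retry 1/12" in the Slack log
  baseDelayMs = 2_000, // echoes "in 2s"; doubling after that is an assumption
  maxDelayMs = 60_000, // hypothetical cap so late retries stay bounded
): Promise<boolean> {
  for (let retry = 0; ; retry++) {
    try {
      await connect();
      return true; // this channel is back; nothing else in the process was touched
    } catch (err) {
      if (retry >= maxRetries) {
        // Out of retries: disable only this channel.
        console.error(`[${channel}] giving up after ${maxRetries} retries; channel disabled`);
        return false;
      }
      const delayMs = Math.min(baseDelayMs * 2 ** retry, maxDelayMs);
      const reason = err instanceof Error ? err.message : String(err);
      console.warn(`[${channel}] socket disconnected (${reason}). retry ${retry + 1}/${maxRetries} in ${delayMs / 1000}s`);
      await sleep(delayMs);
    }
  }
}
```

With the defaults the waits run 2s, 4s, 8s, 16s, 32s and then stay at 60s, so a flapping Slack socket costs at most that one channel's availability.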

Proposed fix direction

  1. Wrap channel-plugin event handlers in process-level error boundaries (process.on('uncaughtException') / process.on('unhandledRejection'), filtered by channel scope).
  2. Treat channel plugins as sidecars: if a channel's socket dies, retry that channel plugin in isolation; leave the main process untouched.
  3. Audit other channel plugins (Discord, Telegram, BlueBubbles, Signal, WhatsApp, etc.) for similar exit-on-error paths.
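The first item could be sketched roughly as follows, assuming errors from channel callbacks are tagged with their channel before they escape. ChannelError, guard, classify and installBoundary are hypothetical names, not existing openclaw code. Note that Node's documentation warns it is not safe to resume normal operation after 'uncaughtException', so this is a stopgap; running each channel in isolation (the second item) is the sturdier fix. The guard only covers synchronous callbacks; async handlers would need their rejections tagged the same way.

```typescript
// Hypothetical error type that records which channel plugin a failure came from.
export class ChannelError extends Error {
  readonly channel: string;
  constructor(channel: string, cause: unknown) {
    super(`[${channel}] ${cause instanceof Error ? cause.message : String(cause)}`);
    this.name = "ChannelError";
    this.channel = channel;
  }
}

// Wraps a synchronous channel callback so anything it throws is tagged with the channel.
export function guard<A extends unknown[]>(
  channel: string,
  fn: (...args: A) => void,
): (...args: A) => void {
  return (...args: A) => {
    try {
      fn(...args);
    } catch (err) {
      throw err instanceof ChannelError ? err : new ChannelError(channel, err);
    }
  };
}

export type Decision = { action: "restart-channel"; channel: string } | { action: "exit" };

// Channel-scoped errors restart only that channel; anything else keeps fail-fast behavior.
export function classify(err: unknown): Decision {
  return err instanceof ChannelError
    ? { action: "restart-channel", channel: err.channel }
    : { action: "exit" };
}

// Process-level boundary with channel-scope filtering.
export function installBoundary(restartChannel: (channel: string) => void): void {
  const onError = (err: unknown): void => {
    const decision = classify(err);
    if (decision.action === "restart-channel") {
      console.error(`channel ${decision.channel} failed; restarting it alone`, err);
      restartChannel(decision.channel);
    } else {
      console.error("gateway error outside any channel", err);
      process.exit(1);
    }
  };
  process.on("uncaughtException", onError);
  process.on("unhandledRejection", onError);
}
```

Errors that cannot be attributed to a channel still exit with code 1, so systemd restarts remain the fallback for genuine gateway faults.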

Workaround we are running locally

  • Tightened systemd RestartSec=1 and StartLimitBurst=10 to shrink recovery time.
  • Added flap-detection cron emitting PostHog telemetry.
  • Added downstream filtering so transient ECONNREFUSED against the warming gateway does not trip our application-level circuit breaker.

None of these address the underlying crash. Filing this so isolation can be considered upstream.
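The downstream filtering in the last workaround might look roughly like this hypothetical TypeScript circuit breaker; the reporter's actual implementation is not shown, so GatewayBreaker, isConnRefused and all thresholds are assumptions. Connection-refused errors are ignored only within a grace window (60s here, chosen to exceed the ~45s warmup reported above), so a gateway that stays down still opens the breaker eventually.

```typescript
// True for connection-refused errors, e.g. while the gateway is restarting and not yet listening.
export function isConnRefused(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "ECONNREFUSED";
}

// Hypothetical downstream circuit breaker. Times are passed in (ms) to keep it testable.
export class GatewayBreaker {
  private failures = 0;
  private openedAt: number | undefined;
  private firstRefusalAt: number | undefined;
  private readonly threshold: number;
  private readonly cooldownMs: number;
  private readonly warmupGraceMs: number;

  constructor(threshold = 5, cooldownMs = 30_000, warmupGraceMs = 60_000) {
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.warmupGraceMs = warmupGraceMs; // longer than the ~45s warmup seen in the incident
  }

  isOpen(now: number): boolean {
    if (this.openedAt === undefined) return false;
    if (now - this.openedAt < this.cooldownMs) return true;
    // Cooldown over: close again and let requests through (a simplification of half-open).
    this.openedAt = undefined;
    this.failures = 0;
    return false;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.firstRefusalAt = undefined;
  }

  recordFailure(err: unknown, now: number): void {
    if (isConnRefused(err)) {
      this.firstRefusalAt ??= now;
      // Refusals inside the grace window look like a restart in progress: don't count them.
      if (now - this.firstRefusalAt < this.warmupGraceMs) return;
    }
    this.failures++;
    if (this.failures >= this.threshold && this.openedAt === undefined) this.openedAt = now;
  }
}
```

If the client is Node's built-in fetch, the refused connection usually arrives as a "fetch failed" TypeError with the ECONNREFUSED code on err.cause, so isConnRefused would need to look there as well.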

Thanks!
