openclaw - 💡(How to fix) Fix Slack Socket Mode: event loop starvation causes pong timeouts and silent message loss [1 participant]

GitHub stats for openclaw/openclaw#58519 (fetched 2026-04-08 02:01:43): 0 comments, 1 participant, 1 timeline event (cross-referenced ×1), 0 reactions.


Problem

Slack Socket Mode WebSocket pong responses are missed when the Node.js event loop is busy processing agent turns. We empirically observe 4+ consecutive pong timeouts (5000ms deadline) during heavy turns, which causes Slack to drop the WebSocket connection. Messages sent by the gateway during or just after a dead socket window are silently lost — the gateway logs a successful delivery, but Slack never receives it.

What we observe

  • Environment: OpenClaw 2026.3.28, single gateway process
  • Pattern: Heavy agent turns (tool calls, sub-agent orchestration, large context assembly) peg the event loop for 2-4+ minutes
  • Symptom: Consecutive [WARN] socket-mode:SlackWebSocket A pong wasn't received from the server before the timeout of 5000ms! — we see runs of 4+ sequential timeouts during heavy processing
  • Impact: The gateway logs "delivered reply" for messages it sent to a stale or dying socket, but the user never receives them. From the user's perspective, the bot goes silent.

Gateway diagnostic logs show the stuck session pattern alongside the pong timeouts:

1:38:45 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:00 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:09 PM [WARN] diagnostic  stuck session: state=processing age=123s queueDepth=1
1:39:18 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:41 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...

Root cause

All processing — agent turns, tool dispatch, sub-agent orchestration, JSON parsing, context assembly, and WebSocket keepalive — runs on the same single Node.js event loop thread. When heavy turns monopolize the event loop, the Slack SDK's ping/pong handler can't fire within its 5000ms deadline.

This isn't a Slack server-side issue (as in #14248) — it's the gateway's event loop being too busy to service the pings at all.
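The mechanism can be reproduced without Slack. The minimal sketch below schedules a short timer and then blocks the event loop with synchronous work, a stand-in for a heavy agent turn. The timer cannot run until the synchronous work returns, just as the SDK's ping/pong handling cannot. The helper names and durations are illustrative assumptions, not OpenClaw code.

```javascript
// Sketch: a synchronous "heavy turn" delays every timer on the same event loop.
// blockFor() and measureTimerLateness() are hypothetical helpers for illustration.

function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait: nothing else on this thread can run until the loop exits.
  }
}

// Resolves with how many ms later than requested the timer callback ran.
function measureTimerLateness(timerMs, blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => {
      resolve(Date.now() - scheduledAt - timerMs);
    }, timerMs);
    blockFor(blockMs); // stands in for context assembly, JSON parsing, etc.
  });
}

measureTimerLateness(50, 500).then((lateMs) => {
  console.log(`timer fired ~${lateMs} ms late`);
});
```

The same applies to a 5000ms pong deadline: any single synchronous stretch longer than the deadline is enough to trip it, and the multi-minute turns described above are far longer.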

Suggested fix

Move the Slack Socket Mode WebSocket keepalive (ping/pong) to a worker thread via worker_threads. The keepalive handler doesn't need access to session state or agent context; it just needs to respond "pong" to Slack's "ping." This is a small, isolated piece of work that would prevent connection drops regardless of how busy the main thread is.
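This fix relies on each worker thread running its own event loop. The self-contained sketch below (no Slack involved) starts a worker that records a tick every 50 ms, a stand-in for ping/pong handling, while the main thread busy-waits. The worker body, tick interval, and helper names are illustrative assumptions.

```javascript
const { Worker } = require('node:worker_threads');

// Worker body: an independent event loop that records a timestamp every 50 ms
// (a stand-in for keepalive handling) and reports back after 10 ticks.
const workerSource = `
  const { parentPort } = require('node:worker_threads');
  const ticks = [];
  const timer = setInterval(() => {
    ticks.push(Date.now());
    if (ticks.length === 10) {
      clearInterval(timer);
      parentPort.postMessage(ticks);
    }
  }, 50);
`;

function busyWait(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Simulates a heavy synchronous agent turn on the main thread.
  }
}

// Resolves with the largest gap (ms) between consecutive worker ticks
// while the main thread is blocked for mainBlockMs.
function largestWorkerGap(mainBlockMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('error', reject);
    worker.once('online', () => busyWait(mainBlockMs));
    worker.once('message', (ticks) => {
      const gaps = ticks.slice(1).map((t, i) => t - ticks[i]);
      resolve(Math.max(...gaps));
    });
  });
}
```

Calling `largestWorkerGap(400)` should resolve to a value close to the 50 ms tick interval, whereas a timer on the blocked main thread would be held back for the full 400 ms.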

Alternatively (or additionally):

  • Expose clientPingTimeout per #14248 — a higher timeout would reduce false positives, though it doesn't prevent genuine event loop starvation from causing drops
  • Add periodic setImmediate() yields in long-running synchronous processing paths (context assembly, transcript serialization) to give the event loop breathing room
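The setImmediate() suggestion can be sketched as a chunked loop that yields between batches so timers and socket callbacks get a turn. processInChunks, yieldToEventLoop, and the default chunk size are hypothetical names and values, not OpenClaw APIs.

```javascript
// Yield to the event loop via setImmediate. Awaiting an already-resolved
// promise would NOT yield: microtasks run before timers and I/O callbacks.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// Hypothetical helper: apply `work` to every item, yielding every `chunkSize` items
// so that timers and I/O callbacks (e.g. ping/pong handling) can run between batches.
async function processInChunks(items, work, chunkSize = 500) {
  const results = [];
  for (let i = 0; i < items.length; i++) {
    results.push(work(items[i]));
    if ((i + 1) % chunkSize === 0) {
      await yieldToEventLoop();
    }
  }
  return results;
}
```

The chunk size trades throughput against responsiveness: each batch should finish well inside the ping deadline.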

Related

  • #14248 — Expose clientPingTimeout for Slack Socket Mode configuration
  • #56508 — Orphaned ping intervals leak file descriptors
  • #45852 — WebSocket 408 timeout causes unhandled promise rejection

extent analysis

TL;DR

Move the Slack Socket Mode WebSocket connection, and with it the keepalive, into a separate thread using worker_threads so that event loop starvation on the main thread can no longer cause pong timeouts and connection drops.

Guidance

  • Identify the heavy processing tasks (e.g., agent turns, tool dispatch, sub-agent orchestration) that are causing the event loop to be busy and consider optimizing or parallelizing them.
  • Implement the suggested fix of moving the Slack Socket Mode WebSocket keepalive to a worker_threads thread to ensure timely ping/pong responses.
  • Consider exposing clientPingTimeout to increase the timeout and reduce false positives, but note that this does not prevent genuine event loop starvation.
  • Add periodic setImmediate() yields in long-running synchronous processing paths to give the event loop breathing room and prevent starvation.
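For the first guidance point, Node's built-in perf_hooks.monitorEventLoopDelay() histogram is one way to confirm starvation and find the offending turns, for example by wrapping each agent turn. The wrapper name measureLoopDelay and the sampling resolution are illustrative assumptions; only monitorEventLoopDelay itself is a Node API.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Hypothetical wrapper: run an async task and return the worst event-loop
// delay observed while it ran, in milliseconds.
async function measureLoopDelay(task) {
  const histogram = monitorEventLoopDelay({ resolution: 10 }); // sample every 10 ms
  histogram.enable();
  try {
    await task();
    // Give the sampler another tick so a block at the very end is recorded.
    await new Promise((resolve) => setTimeout(resolve, 20));
  } finally {
    histogram.disable();
  }
  return histogram.max / 1e6; // histogram values are in nanoseconds
}
```

Logging turns whose worst delay approaches the 5000ms ping deadline would point at the code paths that need yields or offloading.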

Example

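// Sketch only: assumes @slack/socket-mode is installed and SLACK_APP_TOKEN is set.
// The worker owns the whole Socket Mode connection (a socket cannot be shared
// across threads); handleSlackEvent is a hypothetical gateway hook.
const { Worker, isMainThread, parentPort } = require('node:worker_threads');

if (isMainThread) {
  // Main thread: agent turns run here; Slack events arrive as worker messages.
  const slackWorker = new Worker(__filename);
  slackWorker.on('message', (body) => handleSlackEvent(body));
} else {
  // Worker thread: its own event loop keeps ping/pong handling on time.
  const { SocketModeClient } = require('@slack/socket-mode');
  const client = new SocketModeClient({ appToken: process.env.SLACK_APP_TOKEN });
  client.on('slack_event', async ({ ack, body }) => {
    await ack(); // acknowledge the envelope inside the worker
    parentPort.postMessage(body); // forward the payload to the main thread
  });
  client.start().catch((err) => { console.error(err); process.exit(1); });
}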

Notes

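This solution assumes that the keepalive handler does not require access to session state or agent context; if it does, an alternative approach may be needed. In practice the worker must own the entire Socket Mode connection (receiving and acknowledging events, then forwarding payloads to the main thread with postMessage), not just the ping/pong handler, because a WebSocket cannot be shared between threads. Increasing clientPingTimeout may reduce false positives but does not address the underlying event loop starvation.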
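For the timeout alternative, the SocketModeClient constructor in recent versions of @slack/socket-mode (which appears to be the SDK emitting the socket-mode:SlackWebSocket warnings) accepts a clientPingTimeout option in milliseconds, matching the 5000ms default seen in the logs. How OpenClaw would expose it is what #14248 requests, so the helper below is a hypothetical sketch, not an existing OpenClaw setting; the 15000 ms value is an arbitrary example.

```javascript
// Hypothetical helper: build a Socket Mode client with a longer client-side
// ping/pong deadline. Assumes @slack/socket-mode is installed.
// A longer deadline only hides stalls shorter than the new timeout.
function createSocketModeClient(appToken, clientPingTimeoutMs = 15000) {
  const { SocketModeClient } = require('@slack/socket-mode');
  return new SocketModeClient({
    appToken,
    clientPingTimeout: clientPingTimeoutMs, // default is 5000 ms
  });
}
```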

Recommendation

Apply the workaround of moving the Slack Socket Mode connection, including its ping/pong keepalive, into a separate thread using worker_threads to prevent connection drops and ensure timely ping/pong responses. This isolates the socket from main-thread starvation rather than eliminating the starvation itself, so the setImmediate() yields remain worthwhile wherever long synchronous paths persist.
