openclaw - 💡(How to fix) Fix Slack Socket Mode: event loop starvation causes pong timeouts and silent message loss [1 participants]

openclaw • 2026-03-31 18:12:52

openclaw/openclaw#58519•Fetched 2026-04-08 02:01:43

Author

jrex-jooni

Participants

jrex-jooni

Timeline (top)

cross-referenced ×1

Error Message

Symptom: Consecutive [WARN] socket-mode:SlackWebSocket A pong wasn't received from the server before the timeout of 5000ms! warnings; we see runs of 4+ sequential timeouts during heavy processing:

1:38:45 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:00 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:09 PM [WARN] diagnostic stuck session: state=processing age=123s queueDepth=1
1:39:18 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:41 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...

Root Cause

All processing — agent turns, tool dispatch, sub-agent orchestration, JSON parsing, context assembly, and WebSocket keepalive — runs on the same single Node.js event loop thread. When heavy turns monopolize the event loop, the Slack SDK's ping/pong handler can't fire within its 5000ms deadline.

This isn't a Slack server-side issue (as in #14248) — it's the gateway's event loop being too busy to service the pings at all.

Problem

Slack Socket Mode WebSocket pong responses are missed when the Node.js event loop is busy processing agent turns. We empirically observe 4+ consecutive pong timeouts (5000ms deadline) during heavy turns, which causes Slack to drop the WebSocket connection. Messages sent by the gateway during or just after a dead socket window are silently lost — the gateway logs a successful delivery, but Slack never receives it.

What we observe

Environment: OpenClaw 2026.3.28, single gateway process
Pattern: Heavy agent turns (tool calls, sub-agent orchestration, large context assembly) peg the event loop for 2-4+ minutes
Symptom: Consecutive [WARN] socket-mode:SlackWebSocket A pong wasn't received from the server before the timeout of 5000ms! — we see runs of 4+ sequential timeouts during heavy processing
Impact: The gateway logs "delivered reply" for messages it sent to a stale or dying socket. The user never receives them. From the user's perspective, the bot goes silent.

Gateway diagnostic logs show the stuck session pattern alongside the pong timeouts:

1:38:45 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:00 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:09 PM [WARN] diagnostic  stuck session: state=processing age=123s queueDepth=1
1:39:18 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...
1:39:41 PM [WARN] socket-mode:SlackWebSocket A pong wasn't received...

Root cause

This isn't a Slack server-side issue (as in #14248) — it's the gateway's event loop being too busy to service the pings at all.
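The starvation described here can be reproduced in isolation. The following is a minimal sketch, not gateway code: `blockFor` and `measureTimerLag` are hypothetical helper names, and the busy-wait only stands in for a heavy synchronous turn. It shows that a timer scheduled before the blocking work cannot fire until that work finishes, which is the same mechanism that would delay a ping/pong handler.

```javascript
// Sketch: measure how late a timer fires when synchronous work blocks the loop.
// blockFor and measureTimerLag are hypothetical names used for illustration.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait: stands in for a heavy synchronous agent turn.
  }
}

function measureTimerLag(expectedMs, blockMs) {
  return new Promise((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => {
      // Lag is how much later than requested the callback actually ran.
      resolve(Date.now() - scheduledAt - expectedMs);
    }, expectedMs);
    // Monopolize the event loop right after scheduling the timer.
    blockFor(blockMs);
  });
}

// Usage: a 10 ms timer behind 200 ms of blocking work fires roughly 190 ms late.
// measureTimerLag(10, 200).then((lag) => console.log(`timer lag: ${lag} ms`));
```

For ongoing measurement in a running process, Node's built-in `perf_hooks.monitorEventLoopDelay()` reports the same kind of delay as a histogram.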

Suggested fix

Move Slack Socket Mode WebSocket keepalive (ping/pong) to a worker_threads thread. The keepalive handler doesn't need access to session state or agent context — it just needs to respond "pong" to Slack's "ping." This is a small, isolated piece of work that would prevent connection drops regardless of how busy the main thread is.
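The premise behind this suggestion is that a worker thread has its own event loop, so its timers keep running while the main thread is blocked. The sketch below checks that premise with a stand-in timer rather than a real Slack connection; `workerSource` and `countWorkerTicksWhileBlocked` are hypothetical names, and the worker code is inlined with `eval: true` only to keep the example in one file.

```javascript
const { Worker } = require('worker_threads');

// Worker source is inlined (eval: true) to keep the sketch in a single file.
const workerSource = `
  const { parentPort } = require('worker_threads');
  // Stand-in for a keepalive: record a timestamp every 20 ms.
  const ticks = [];
  const timer = setInterval(() => ticks.push(Date.now()), 20);
  parentPort.on('message', () => {
    clearInterval(timer);
    parentPort.postMessage(ticks);
  });
`;

// Hypothetical helper: blocks the main thread for blockMs and resolves with
// the number of worker ticks that landed inside the blocked window.
function countWorkerTicksWhileBlocked(blockMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true });
    worker.once('error', reject);
    worker.once('online', () => {
      const start = Date.now();
      while (Date.now() - start < blockMs) {
        // Busy-wait on the main thread; the worker keeps running.
      }
      const end = Date.now();
      worker.once('message', (ticks) => {
        worker.terminate();
        resolve(ticks.filter((t) => t >= start && t <= end).length);
      });
      worker.postMessage('stop');
    });
  });
}

// Usage:
// countWorkerTicksWhileBlocked(500).then((n) => console.log(`${n} ticks while blocked`));
```

A non-zero tick count indicates the worker's loop was serviced during the block. Whether the Slack SDK's own ping/pong can be moved this way depends on where the WebSocket itself lives, since the handler runs on whichever thread owns the socket.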

Alternatively (or additionally):

Expose clientPingTimeout per #14248 — a higher timeout would reduce false positives, though it doesn't prevent genuine event loop starvation from causing drops
Add periodic setImmediate() yields in long-running synchronous processing paths (context assembly, transcript serialization) to give the event loop breathing room
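The `setImmediate()` option could look roughly like the sketch below. `yieldToEventLoop` and `mapWithYields` are hypothetical helpers, not existing gateway functions, and the slice size of 100 is an arbitrary assumption that would need tuning against real workloads.

```javascript
// Hypothetical helper: resolves on the next check phase of the event loop,
// giving pending I/O callbacks (such as WebSocket frames) a chance to run.
function yieldToEventLoop() {
  return new Promise((resolve) => setImmediate(resolve));
}

// Hypothetical helper: applies fn to every item, yielding after each slice.
async function mapWithYields(items, fn, sliceSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += 1) {
    results.push(fn(items[i], i));
    // After each full slice, hand control back to the event loop.
    if ((i + 1) % sliceSize === 0) {
      await yieldToEventLoop();
    }
  }
  return results;
}

// Usage (serializeEntry is a placeholder for the real per-item work):
// const lines = await mapWithYields(transcriptEntries, serializeEntry, 50);
```

This only helps when the long path can be split into many small steps; a single large synchronous call, such as one `JSON.parse` over a very large string, still blocks for its full duration.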

Related

#14248: Expose clientPingTimeout for Slack Socket Mode configuration
#56508: Orphaned ping intervals leak file descriptors
#45852: WebSocket 408 timeout causes unhandled promise rejection

extent analysis

TL;DR

Move the Slack Socket Mode WebSocket keepalive to a separate thread using worker_threads to prevent event loop starvation and connection drops.

Guidance

Identify the heavy processing tasks (e.g., agent turns, tool dispatch, sub-agent orchestration) that are causing the event loop to be busy and consider optimizing or parallelizing them.
Implement the suggested fix of moving the Slack Socket Mode WebSocket keepalive to a worker_threads thread to ensure timely ping/pong responses.
Consider exposing clientPingTimeout to increase the timeout and reduce false positives, but note that this does not prevent genuine event loop starvation.
Add periodic setImmediate() yields in long-running synchronous processing paths to give the event loop breathing room and prevent starvation.

Example

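const { Worker } = require('worker_threads');

// Run the Socket Mode connection in its own thread so ping/pong is serviced
// on the worker's event loop even while the main thread is busy.
const keepaliveWorker = new Worker('./keepalive.js');
keepaliveWorker.on('message', (body) => {
  // Hand the forwarded Slack payload to the gateway's existing processing path.
});

// In keepalive.js (sketch: assumes @slack/socket-mode and a SLACK_APP_TOKEN variable)
const { parentPort } = require('worker_threads');
const { SocketModeClient } = require('@slack/socket-mode');

const client = new SocketModeClient({ appToken: process.env.SLACK_APP_TOKEN });
client.on('slack_event', async ({ ack, body }) => {
  await ack(); // acknowledge the envelope from the worker thread
  parentPort.postMessage(body); // forward the payload to the main thread
});
client.start();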

Notes

This solution assumes that the keepalive handler does not require access to session state or agent context. If it does, an alternative approach may be needed. Additionally, increasing the clientPingTimeout may reduce false positives but does not address the underlying issue of event loop starvation.
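Raising the timeout might be wired up as in the sketch below. It assumes the installed @slack/socket-mode version accepts a `clientPingTimeout` constructor option, as #14248 implies; `buildSocketModeOptions` and the `SLACK_CLIENT_PING_TIMEOUT_MS` variable are hypothetical names introduced for illustration.

```javascript
// Sketch only: builds the options object that would be passed to
// new SocketModeClient(options) from @slack/socket-mode.
// SLACK_APP_TOKEN and SLACK_CLIENT_PING_TIMEOUT_MS are hypothetical variable names.
function buildSocketModeOptions(env = process.env) {
  const parsed = Number.parseInt(env.SLACK_CLIENT_PING_TIMEOUT_MS ?? '', 10);
  return {
    appToken: env.SLACK_APP_TOKEN,
    // Fall back to 5000 ms, the timeout reported in the warning.
    clientPingTimeout: Number.isFinite(parsed) && parsed > 0 ? parsed : 5000,
  };
}

// Usage:
// const { SocketModeClient } = require('@slack/socket-mode');
// const client = new SocketModeClient(buildSocketModeOptions());
```

A larger value only widens the window before the warning fires; a turn that blocks the loop for minutes would still exceed it.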

Recommendation

Move the Slack Socket Mode WebSocket keepalive to a separate thread using worker_threads so that ping/pong responses stay timely regardless of main-thread load. This isolates the connection from event loop starvation rather than eliminating the starvation itself, so adding setImmediate() yields to long synchronous paths remains worthwhile alongside it.
