openclaw - ✅(Solved) Fix Slack adapter goes silently dead after channel-stop timeout (post-#56646) [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#75480 (fetched 2026-05-02 05:34:05)
Comments: 1
Participants: 2
Timeline: 3
Reactions: 2
Timeline (top): closed ×1, commented ×1, cross-referenced ×1

After the log line "channel stop exceeded 5000ms after abort; continuing shutdown" appears, the Slack provider stops logging for hours and stops dispatching inbound events. The gateway process stays alive and /health returns OK, but Slack messages to the bot are silently dropped. The only known recovery is a manual gateway restart.

This is a distinct failure mode from #56508 — PR #56646 closed the clean-stop race by setting shuttingDown=true before app.stop(), but the forced-shutdown branch (when the abort itself times out at 5s) still leaves the adapter in a half-dead state where the dying retry timer fires once more and the new adapter never starts.

PR fix notes

PR #72912: Recover channel restarts when old lifecycles wedge

Description (problem / solution / changelog)

The gateway health monitor already notices unhealthy channel lifecycles and tries to restart them, but the restart path had a blind spot: if the old channel task ignored abort and failed to settle within the stop grace window, stopChannel logged the timeout and left the stale task registered. The immediately-following startChannel then saw the existing task and no-oped, so the monitor could say it was restarting without actually creating a fresh Discord or Telegram lifecycle.

This adds an explicit forced-retirement path for health-monitor recovery. Normal manual stops keep the existing conservative behavior. Health-monitor restarts can now retire the wedged lifecycle from manager bookkeeping, clear stale busy/connected state, and start a new channel account. Runtime status updates, catch handlers, cleanup handlers, and auto-restart logic are now lifecycle-gated so a late callback from the retired task cannot mark the fresh lifecycle stopped or restart behind its back.
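
The blind spot and the lifecycle gating described above can be illustrated with a small sketch. It is not the code from PR #72912: `ChannelManager`, `forceRetire`, and `markStopped` are hypothetical names chosen for illustration, and the real manager tracks far more state. The sketch only shows why a stale registration turns a restart into a no-op, and how tagging callbacks with a lifecycle id lets a late callback from the retired task be ignored.

```typescript
// Hypothetical names throughout; this is not the code from PR #72912.
type Lifecycle = { id: number; status: "running" | "stopped" };

class ChannelManager {
  private current: Lifecycle | null = null;
  private nextId = 1;

  /** Starts a lifecycle unless one is already registered (the no-op blind spot). */
  start(): Lifecycle | null {
    if (this.current !== null) return null; // stale task still registered
    this.current = { id: this.nextId++, status: "running" };
    return this.current;
  }

  /** Health-monitor path: drop a wedged lifecycle from bookkeeping. */
  forceRetire(): void {
    this.current = null;
  }

  /** Callbacks carry their lifecycle id; ones from a retired lifecycle are ignored. */
  markStopped(lifecycleId: number): boolean {
    if (this.current === null || this.current.id !== lifecycleId) return false;
    this.current.status = "stopped";
    return true;
  }

  active(): Lifecycle | null {
    return this.current;
  }
}
```

In this sketch, calling `start()` while the wedged lifecycle is still registered returns `null`, which corresponds to the monitor reporting a restart that never happens. After `forceRetire()`, `start()` yields a fresh lifecycle, and a late `markStopped()` carrying the old id returns `false` without touching the new one.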

Changed files

  • src/gateway/channel-health-monitor.test.ts (modified, +20/-4)
  • src/gateway/channel-health-monitor.ts (modified, +3/-1)
  • src/gateway/server-channels.approval-bootstrap.test.ts (modified, +45/-0)
  • src/gateway/server-channels.test.ts (modified, +79/-0)
  • src/gateway/server-channels.ts (modified, +94/-11)
  • src/infra/channel-runtime-context.test.ts (modified, +23/-0)
  • src/infra/channel-runtime-context.ts (modified, +11/-3)
  • src/infra/exec-approval-channel-runtime.test.ts (modified, +29/-0)
  • src/infra/exec-approval-channel-runtime.ts (modified, +35/-7)


Environment

  • OpenClaw 2026.4.26 (be8c246)
  • macOS 14 (arm64), Node v25.9.0
  • Bundled @slack/[email protected]

Reproduction (naturally occurring)

  1. Pong-timeout cycle starts (common during the ~30-min stale-socket cadence on macOS / Node 25, see #61072)
  2. Adapter starts its internal retry loop: retry 1/12 in 2s, 2/12, 3/12, 4/12 …
  3. Health-monitor decides the socket is disconnected and calls stopChannel → gracefulStop
  4. Graceful stop times out: [slack] [default] channel stop exceeded 5000ms after abort; continuing shutdown
  5. The dying adapter's retry timer still fires one more time (e.g. retry 5/12 in 21s (undefined))
  6. No further [slack] log entries for hours. New adapter is never observed. /health remains green.

Verbatim log trace

2026-04-30T19:14:09 [WARN] socket-mode:SlackWebSocket A pong wasn't received from the server before the timeout of 5000ms!
2026-04-30T19:14:13 [slack] socket disconnected (disconnect). retry 1/12 in 2s
2026-04-30T19:14:26 [slack] socket mode failed to start. retry 2/12 in 4s (undefined)
2026-04-30T19:14:41 [slack] socket mode failed to start. retry 3/12 in 7s (undefined)
2026-04-30T19:14:59 [slack] socket mode failed to start. retry 4/12 in 13s (undefined)
2026-04-30T19:15:15 [health-monitor] [slack:default] health-monitor: restarting (reason: disconnected)
2026-04-30T19:15:20 [slack] [default] channel stop exceeded 5000ms after abort; continuing shutdown
2026-04-30T19:15:22 [slack] socket mode failed to start. retry 5/12 in 21s (undefined)
[next "[slack] socket mode connected" entry: 2026-05-01T08:49:28 — 13 hours later, only after manual `launchctl kickstart -k`]

Diagnosis

After the abort-timeout forces shutdown to "continue":

  • The dying adapter's retry timer is not cleared — it fires once more (retry 5/12)
  • After that single fire, shuttingDown=true is presumably set, so the close handler does not schedule another retry
  • The new adapter is not instantiated: there is no subsequent "socket mode connected", no "starting provider", no error of any kind

Net result: state is "old adapter half-stopped, new adapter never created". The gateway looks healthy because the gateway is healthy — only the slack subsystem is silently dead.
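
The first two points of the diagnosis can be sketched as a toy retry loop. This is an illustration of the reported behaviour under stated assumptions, not the adapter's actual code: `RetryLoop`, `Timers`, `stopWithoutCancel`, and `stop` are hypothetical names, and the timer functions are injectable only so the sequence can be exercised deterministically.

```typescript
// Hypothetical sketch; RetryLoop and Timers are illustrative names only.
interface Timers {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

const realTimers: Timers = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

class RetryLoop {
  private handle: unknown = null;
  private shuttingDown = false;
  private connect: () => void;
  private timers: Timers;

  constructor(connect: () => void, timers: Timers = realTimers) {
    this.connect = connect;
    this.timers = timers;
  }

  /** Schedules one reconnect attempt unless shutdown has begun. */
  scheduleRetry(delayMs: number): void {
    if (this.shuttingDown) return;
    this.handle = this.timers.set(() => {
      this.handle = null;
      this.connect(); // still runs if only the flag was set
    }, delayMs);
  }

  /** Reported behaviour: the flag is set, the pending timer is left alone. */
  stopWithoutCancel(): void {
    this.shuttingDown = true;
  }

  /** Contrast: the flag is set and the pending timer is cleared. */
  stop(): void {
    this.shuttingDown = true;
    if (this.handle !== null) {
      this.timers.clear(this.handle);
      this.handle = null;
    }
  }
}
```

With `stopWithoutCancel()`, a timer that was already pending still fires once, and any retry it tries to schedule afterwards is refused by the `shuttingDown` check. That matches the single trailing retry followed by silence in the trace. With `stop()`, the pending attempt never runs.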

Suggested fixes

  • When the abort-timeout fires and shutdown is forced to continue, explicitly cancel the dying adapter's retry timer so it cannot fire after shuttingDown=true — the retry 5/12 line is the last thing before silence
  • Guarantee that a new adapter is instantiated even when stop() errored or timed out, not only when it returned cleanly
  • Add an internal watchdog: if stopChannel succeeds but no "socket mode connected" arrives within N seconds, force a full restart from the gateway side rather than silently leaving the channel offline
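
The watchdog in the last suggestion could look roughly like the sketch below. It is a hypothetical helper, not an existing OpenClaw API: `ConnectWatchdog`, `arm`, `connected`, and `check` are invented names, and the grace period (the "N seconds" above) is left as a parameter. Time is passed in explicitly so the logic does not depend on a real clock.

```typescript
// Hypothetical helper; none of these names come from the OpenClaw codebase.
class ConnectWatchdog {
  private deadlineMs: number | null = null;
  private graceMs: number;
  private onExpire: () => void;

  constructor(graceMs: number, onExpire: () => void) {
    this.graceMs = graceMs;
    this.onExpire = onExpire;
  }

  /** Arm when stopChannel returns, whether it stopped cleanly or timed out. */
  arm(nowMs: number): void {
    this.deadlineMs = nowMs + this.graceMs;
  }

  /** Disarm when the replacement adapter reports "socket mode connected". */
  connected(): void {
    this.deadlineMs = null;
  }

  /** Poll from an existing periodic check; fires onExpire at most once per arm(). */
  check(nowMs: number): boolean {
    if (this.deadlineMs === null || nowMs < this.deadlineMs) return false;
    this.deadlineMs = null;
    this.onExpire();
    return true;
  }
}
```

A caller would pass a full-restart callback as `onExpire`, call `arm(Date.now())` after every stop attempt, and call `check(Date.now())` from whatever periodic loop already exists. Polling rather than a dedicated timer is a design choice in this sketch: it avoids adding another timer that could itself be left uncancelled.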

Workarounds in place locally

  • gateway.channelStaleEventThresholdMinutes 30 → 120 (per #61072) to reduce how often health-monitor triggers stopChannel, which is when the bug fires
  • gateway.channelMaxRestartsPerHour 10 → 3 (defensive)
  • Local hourly diagnostic alert: "no [slack] socket mode connected log for >2 hours" — would have caught the 13-hour silence in our case within ~1 hour, instead of users noticing later
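
The diagnostic alert in the last workaround can be approximated by scanning the gateway log for the most recent connect entry. The sketch below is an assumption-laden illustration, not the reporter's script: it assumes each line starts with a 19-character timestamp in the form shown in the trace, treats those timestamps as UTC (the trace does not state a time zone), and uses hypothetical function names.

```typescript
// Hypothetical sketch; the marker text is taken from the trace above,
// while the function names and the UTC interpretation are assumptions.
const CONNECTED_MARKER = "[slack] socket mode connected";

/** Returns the epoch-ms timestamp of the latest connect entry, or null if none. */
function lastConnectedAt(lines: string[]): number | null {
  let latest: number | null = null;
  for (const line of lines) {
    if (line.indexOf(CONNECTED_MARKER) === -1) continue;
    const ms = Date.parse(line.slice(0, 19) + "Z"); // assume UTC
    if (!isNaN(ms) && (latest === null || ms > latest)) latest = ms;
  }
  return latest;
}

/** True when no connect entry has been seen within the threshold. */
function isSlackSilent(lines: string[], nowMs: number, thresholdMinutes = 120): boolean {
  const last = lastConnectedAt(lines);
  if (last === null) return true; // never connected counts as silent
  return nowMs - last > thresholdMinutes * 60 * 1000;
}
```

Run hourly against the tail of the log, a check like `isSlackSilent(lines, Date.now())` would flag the channel once the last connect entry is more than two hours old.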

Related

  • #56508 (closed) — sibling failure mode in the clean-stop path
  • PR #56646 (merged) — partial fix; covers gracefulStop() clean path only
  • #61072 (open) — periodic stale-socket restarts on macOS arm64 / Node 25, the trigger
  • #58519 (open) — event-loop starvation, root cause of the pong-timeouts that start the cycle

extent analysis

TL;DR

Explicitly cancel the dying adapter's retry timer when shutdown is forced to continue to prevent it from firing after shuttingDown=true, and guarantee a new adapter is instantiated even when stop() errors or times out.

Guidance

  • When the abort-timeout fires and shutdown is forced to continue, cancel the dying adapter's retry timer to prevent further retries.
  • Ensure a new adapter is instantiated after a failed or timed-out stop() call to maintain connectivity.
  • Consider adding an internal watchdog to detect and recover from silent failures, such as when stopChannel succeeds but no socket mode connected event arrives within a specified time frame.
  • Review and adjust configuration settings, like gateway.channelStaleEventThresholdMinutes and gateway.channelMaxRestartsPerHour, to mitigate the frequency of the bug trigger.

Example

No code snippet is provided as the issue does not contain explicit code references that can be safely used for an example.

Notes

The provided suggestions are based on the diagnosis that the dying adapter's retry timer is not cleared and the new adapter is not instantiated after a forced shutdown. Implementing these fixes should help resolve the silent failure mode described.

Recommendation

Apply the suggested fixes to explicitly cancel the retry timer and guarantee a new adapter instantiation to address the root cause of the issue. This approach should help prevent the silent failure mode and ensure continuous Slack message processing.
