openclaw - ✅(Solved) Fix Slack adapter goes silently dead after channel-stop timeout (post-#56646) [1 pull request, 1 comment, 2 participants]

openclaw · 2026-05-01 06:39:24

GitHub stats

openclaw/openclaw#75480 • Fetched 2026-05-02 05:34:05

Author

jnikolaidis

Participants

clawsweeper[bot]

jnikolaidis

Timeline (top)

closed ×1, commented ×1, cross-referenced ×1

After the log line "channel stop exceeded 5000ms after abort; continuing shutdown" appears, the Slack provider stops logging anything for hours and stops dispatching inbound events. The gateway process stays alive and /health returns OK, but Slack messages to the bot are silently dropped. The only known recovery is a manual gateway restart.

This is a distinct failure mode from #56508 — PR #56646 closed the clean-stop race by setting shuttingDown=true before app.stop(), but the forced-shutdown branch (when the abort itself times out at 5s) still leaves the adapter in a half-dead state where the dying retry timer fires once more and the new adapter never starts.

Error Message

2026-04-30T19:14:09 [WARN] socket-mode:SlackWebSocket A pong wasn't received from the server before the timeout of 5000ms!

The new adapter is not instantiated: no subsequent "socket mode connected" line, no "starting provider" line, and no error of any kind follows.

Root Cause

Net result: state is "old adapter half-stopped, new adapter never created". The gateway looks healthy because the gateway is healthy — only the slack subsystem is silently dead.

PR fix notes

PR #72912: Recover channel restarts when old lifecycles wedge

Repository: openclaw/openclaw
Author: pashpashpash
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/72912

Description (problem / solution / changelog)

The gateway health monitor already notices unhealthy channel lifecycles and tries to restart them, but the restart path had a blind spot: if the old channel task ignored abort and failed to settle within the stop grace window, stopChannel logged the timeout and left the stale task registered. The immediately-following startChannel then saw the existing task and no-oped, so the monitor could say it was restarting without actually creating a fresh Discord or Telegram lifecycle.

This adds an explicit forced-retirement path for health-monitor recovery. Normal manual stops keep the existing conservative behavior. Health-monitor restarts can now retire the wedged lifecycle from manager bookkeeping, clear stale busy/connected state, and start a new channel account. Runtime status updates, catch handlers, cleanup handlers, and auto-restart logic are now lifecycle-gated so a late callback from the retired task cannot mark the fresh lifecycle stopped or restart behind its back.
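The blind spot and the forced-retirement path described above can be sketched roughly as follows. This is a minimal illustration, not the code in PR #72912: ChannelManager, start, stop, the forceRetire flag, and the status map are hypothetical names chosen for the example, and the real stopChannel / startChannel in src/gateway/server-channels.ts may be structured quite differently.

```typescript
// Hypothetical sketch; none of these names come from the OpenClaw source.
type Lifecycle = { id: number; abort: AbortController; settled: Promise<void> };
type StopOutcome = "stopped" | "timed-out" | "retired";

class ChannelManager {
  private tasks = new Map<string, Lifecycle>();
  private nextId = 1;
  readonly status = new Map<string, string>();

  // Returns false (no-op) when a task is already registered for this key.
  start(key: string, run: (signal: AbortSignal) => Promise<void>): boolean {
    if (this.tasks.has(key)) return false;
    const id = this.nextId++;
    const abort = new AbortController();
    const lifecycle: Lifecycle = { id, abort, settled: Promise.resolve() };
    this.tasks.set(key, lifecycle);
    this.status.set(key, "running");
    lifecycle.settled = run(abort.signal)
      .catch(() => {})
      .then(() => {
        // Lifecycle gate: a late callback from a retired task must not
        // touch the bookkeeping of the lifecycle that replaced it.
        if (this.tasks.get(key)?.id !== id) return;
        this.tasks.delete(key);
        this.status.set(key, "stopped");
      });
    return true;
  }

  async stop(key: string, graceMs: number, forceRetire: boolean): Promise<StopOutcome> {
    const lifecycle = this.tasks.get(key);
    if (!lifecycle) return "stopped";
    lifecycle.abort.abort();
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), graceMs);
    });
    const outcome = await Promise.race([
      lifecycle.settled.then(() => "settled" as const),
      timeout,
    ]);
    clearTimeout(timer);
    if (outcome === "settled") return "stopped";
    // Conservative path: the stale task stays registered, so the next
    // start() for this key is a no-op. This is the blind spot.
    if (!forceRetire) return "timed-out";
    // Forced retirement: drop the wedged lifecycle from bookkeeping.
    this.tasks.delete(key);
    this.status.set(key, "stopped");
    return "retired";
  }
}
```

In this sketch a manual stop would pass forceRetire = false and a health-monitor restart would pass true, so only the monitor path is allowed to discard a task that ignored abort.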

Changed files

src/gateway/channel-health-monitor.test.ts (modified, +20/-4)
src/gateway/channel-health-monitor.ts (modified, +3/-1)
src/gateway/server-channels.approval-bootstrap.test.ts (modified, +45/-0)
src/gateway/server-channels.test.ts (modified, +79/-0)
src/gateway/server-channels.ts (modified, +94/-11)
src/infra/channel-runtime-context.test.ts (modified, +23/-0)
src/infra/channel-runtime-context.ts (modified, +11/-3)
src/infra/exec-approval-channel-runtime.test.ts (modified, +29/-0)
src/infra/exec-approval-channel-runtime.ts (modified, +35/-7)

Environment

OpenClaw 2026.4.26 (be8c246)
macOS 14 (arm64), Node v25.9.0
Bundled @slack/[email protected]

Reproduction (naturally occurring)

1. A pong-timeout cycle starts (common during the ~30-min stale-socket cadence on macOS / Node 25, see #61072).
2. The adapter starts its internal retry loop: "retry 1/12 in 2s", then 2/12, 3/12, 4/12, and so on.
3. Health-monitor decides the socket is disconnected and calls stopChannel → gracefulStop.
4. Graceful stop times out: "[slack] [default] channel stop exceeded 5000ms after abort; continuing shutdown".
5. The dying adapter's retry timer still fires one more time (e.g. "retry 5/12 in 21s (undefined)").
6. No further [slack] log entries appear for hours. A new adapter is never observed. /health remains green.

Verbatim log trace

2026-04-30T19:14:09 [WARN] socket-mode:SlackWebSocket A pong wasn't received from the server before the timeout of 5000ms!
2026-04-30T19:14:13 [slack] socket disconnected (disconnect). retry 1/12 in 2s
2026-04-30T19:14:26 [slack] socket mode failed to start. retry 2/12 in 4s (undefined)
2026-04-30T19:14:41 [slack] socket mode failed to start. retry 3/12 in 7s (undefined)
2026-04-30T19:14:59 [slack] socket mode failed to start. retry 4/12 in 13s (undefined)
2026-04-30T19:15:15 [health-monitor] [slack:default] health-monitor: restarting (reason: disconnected)
2026-04-30T19:15:20 [slack] [default] channel stop exceeded 5000ms after abort; continuing shutdown
2026-04-30T19:15:22 [slack] socket mode failed to start. retry 5/12 in 21s (undefined)
[next "[slack] socket mode connected" entry: 2026-05-01T08:49:28 — 13 hours later, only after manual `launchctl kickstart -k`]

Diagnosis

After the abort-timeout forces shutdown to "continue":

The dying adapter's retry timer is not cleared — it fires once more (retry 5/12)
After that single fire, shuttingDown=true is presumably set, so the close handler does not schedule another retry
The new adapter is not instantiated: no subsequent "socket mode connected" line, no "starting provider" line, and no error of any kind follows

Net result: state is "old adapter half-stopped, new adapter never created". The gateway looks healthy because the gateway is healthy — only the slack subsystem is silently dead.
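To make the first two diagnosis points concrete, here is a minimal sketch of a retry loop whose pending timer is cleared on every shutdown path. It assumes the adapter keeps its next retry in a single setTimeout handle; RetryLoop, scheduleRetry, and shutdown are hypothetical names, and the actual Slack provider may track retries differently.

```typescript
// Hypothetical sketch; not the OpenClaw Slack provider.
class RetryLoop {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private shuttingDown = false;
  private readonly connect: () => void;
  private readonly maxAttempts: number;
  attempts = 0;

  constructor(connect: () => void, maxAttempts = 12) {
    this.connect = connect;
    this.maxAttempts = maxAttempts;
  }

  // Would be called from the socket close handler.
  scheduleRetry(delayMs: number): boolean {
    if (this.shuttingDown || this.attempts >= this.maxAttempts) return false;
    this.cancelTimer();
    this.timer = setTimeout(() => {
      this.timer = undefined;
      // Second guard in case shutdown raced with the timer firing.
      if (this.shuttingDown) return;
      this.attempts += 1;
      this.connect();
    }, delayMs);
    return true;
  }

  // Intended for both the clean stop and the forced (timed-out) stop,
  // so an already-scheduled retry cannot fire afterwards.
  shutdown(): void {
    this.shuttingDown = true;
    this.cancelTimer();
  }

  private cancelTimer(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }
}
```

Setting the flag alone only prevents new retries from being scheduled; clearing the handle is what stops the one that is already pending, which matches the single "retry 5/12" line seen after the forced shutdown.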

Suggested fixes

When the abort-timeout fires and shutdown is forced to continue, explicitly cancel the dying adapter's retry timer so it cannot fire after shuttingDown=true; the "retry 5/12" line is the last thing logged before the silence
Guarantee that a new adapter is instantiated even when stop() errored or timed out, not only when it returned cleanly
Add an internal watchdog: if stopChannel succeeds but no "socket mode connected" line arrives within N seconds, force a full restart from the gateway side rather than silently leaving the channel offline
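The second and third suggestions could be combined along these lines. This is a sketch under stated assumptions, not a proposed patch: restartChannel, settleWithin, and the four hooks are hypothetical, and "connected" stands in for whatever signal the gateway would use to observe a successful Socket Mode connection.

```typescript
// Hypothetical sketch; hook names are illustrative only.
type RestartHooks = {
  stop: () => Promise<void>;
  start: () => Promise<void>;
  connected: () => Promise<void>; // resolves once the new socket is up
  escalate: (reason: string) => void; // e.g. request a full gateway-side restart
};

// Resolves true if `work` fulfils within `ms`; false on timeout or rejection.
async function settleWithin(work: Promise<void>, ms: number): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), ms);
  });
  try {
    return await Promise.race([work.then(() => true as const), timedOut]);
  } catch {
    return false;
  } finally {
    clearTimeout(timer);
  }
}

async function restartChannel(
  hooks: RestartHooks,
  stopGraceMs: number,
  connectDeadlineMs: number,
): Promise<"connected" | "escalated"> {
  const cleanStop = await settleWithin(hooks.stop(), stopGraceMs);
  // Start unconditionally: a stop that errored or timed out must not
  // prevent the replacement adapter from being created.
  await hooks.start();
  // Watchdog: a restart only counts once a connection is actually observed.
  const up = await settleWithin(hooks.connected(), connectDeadlineMs);
  if (up) return "connected";
  hooks.escalate(`no connection within ${connectDeadlineMs}ms (clean stop: ${cleanStop})`);
  return "escalated";
}
```

The point of the design is that silence is treated as a failure: either a connection is seen within the deadline or the escalate hook runs, so the channel cannot stay offline without anything being logged.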

Workarounds in place locally

gateway.channelStaleEventThresholdMinutes 30 → 120 (per #61072) to reduce how often health-monitor triggers stopChannel, which is when the bug fires
gateway.channelMaxRestartsPerHour 10 → 3 (defensive)
Local hourly diagnostic alert: "no [slack] socket mode connected log for >2 hours" — would have caught the 13-hour silence in our case within ~1 hour, instead of users noticing later
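The hourly diagnostic alert could look something like the sketch below. It assumes a "socket mode connected" log line carries the same leading timestamp as the lines in the trace above and that those timestamps are in the machine's local time; isSlackSilent and lastConnectedAt are hypothetical helper names, and reading the log file and sending the alert are left out.

```typescript
// Hypothetical sketch of a log-silence check; line format is assumed.
const CONNECTED_MARKER = "[slack] socket mode connected";

// Returns the newest timestamp (ms since epoch) of a connected line, if any.
function lastConnectedAt(lines: readonly string[]): number | undefined {
  let latest: number | undefined;
  for (const line of lines) {
    if (!line.includes(CONNECTED_MARKER)) continue;
    const stamp = line.split(" ", 1)[0] ?? "";
    const ms = Date.parse(stamp); // no UTC offset, so parsed as local time
    if (Number.isNaN(ms)) continue; // skip lines without a leading timestamp
    if (latest === undefined || ms > latest) latest = ms;
  }
  return latest;
}

// True when no connected line exists or the newest one is older than thresholdMs.
function isSlackSilent(lines: readonly string[], now: Date, thresholdMs: number): boolean {
  const latest = lastConnectedAt(lines);
  return latest === undefined || now.getTime() - latest > thresholdMs;
}
```

A scheduled job would call isSlackSilent(lines, new Date(), 2 * 60 * 60 * 1000) once an hour. This only makes sense as a liveness signal because reconnects are frequent in this environment (the roughly 30-minute stale-socket cadence); on a setup with long-lived sockets it would produce false alarms.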

Related

#56508 (closed): sibling failure mode in the clean-stop path
PR #56646 (merged): partial fix; covers gracefulStop() clean path only
#61072 (open): periodic stale-socket restarts on macOS arm64 / Node 25, the trigger
#58519 (open): event-loop starvation, root cause of the pong-timeouts that start the cycle

extent analysis

TL;DR

Explicitly cancel the dying adapter's retry timer when shutdown is forced to continue to prevent it from firing after shuttingDown=true, and guarantee a new adapter is instantiated even when stop() errors or times out.

Guidance

When the abort-timeout fires and shutdown is forced to continue, cancel the dying adapter's retry timer to prevent further retries.
Ensure a new adapter is instantiated after a failed or timed-out stop() call to maintain connectivity.
Consider adding an internal watchdog to detect and recover from silent failures, such as when stopChannel succeeds but no "socket mode connected" line arrives within a specified time frame.
Review and adjust configuration settings, like gateway.channelStaleEventThresholdMinutes and gateway.channelMaxRestartsPerHour, to mitigate the frequency of the bug trigger.

Example

No code snippet is provided as the issue does not contain explicit code references that can be safely used for an example.

Notes

The provided suggestions are based on the diagnosis that the dying adapter's retry timer is not cleared and the new adapter is not instantiated after a forced shutdown. Implementing these fixes should help resolve the silent failure mode described.

Recommendation

Apply the suggested fixes to explicitly cancel the retry timer and guarantee a new adapter instantiation to address the root cause of the issue. This approach should help prevent the silent failure mode and ensure continuous Slack message processing.
