openclaw - ✅(Solved) Fix Bug: channel stop timeout leaves channel permanently dead — running: true with stale store entries [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#70024 · Fetched 2026-04-23 07:30:18
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0

When stopChannel times out waiting for a channel task to settle, it sets running: true in the runtime snapshot without cleaning up store.aborts or store.tasks. Three code paths then combine to make the channel silently dead with no automatic recovery.


Fix Action

Fixed

PR fix notes

PR #70056: fix(gateway): clean up store and set running=false on stop timeout

Description (problem / solution / changelog)

Summary

  • stopChannel timeout path set running: true and skipped store.aborts/store.tasks cleanup, leaving a dead promise that blocked all future starts and fooled the health monitor
  • Fix: set running: false, clean up store entries, and add lastStopAt in the timeout branch so subsequent startChannel calls and health recovery work correctly
  • Update existing test to verify cleanup + restart succeeds after timeout

Test plan

  • server-channels.test.ts: updated test verifies running: false after timeout and successful restart (18/18 pass)
  • channel-health-monitor.test.ts + channel-health-policy.test.ts: no regression (46/46 pass)
  • Full changed-lane suite: 2560/2560 pass, lint + typecheck clean

Closes #70024

[AI-assisted]

Changed files

  • src/gateway/server-channels.test.ts (modified, +14/-8)
  • src/gateway/server-channels.ts (modified, +4/-1)


Description

When stopChannel times out waiting for a channel task to settle, it sets running: true in the runtime snapshot without cleaning up store.aborts or store.tasks. Three code paths then combine to make the channel silently dead with no automatic recovery.

Root Cause — Three-Link Chain

Link 1: stop timeout lies about state and skips cleanup

src/gateway/server-channels.ts:574-584

if (!stoppedCleanly) {
  setRuntime(channelId, id, {
    running: true,          // ← lie: channel is actually dead
    restartPending: false,
    lastError: `channel stop timed out ...`,
  });
  return;                   // ← skips cleanup
}
// normal path (586-593):
store.aborts.delete(id);
store.tasks.delete(id);
setRuntime(channelId, id, { running: false, ... });

The timeout path sets running: true and returns early, leaving the dead promise in store.tasks and the stale AbortController in store.aborts.
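For context, a stop timeout like this is commonly detected by racing the task against a timer. The sketch below is a minimal illustration of that pattern, not the gateway's actual code: `settlesWithin` is a hypothetical helper, and how `stoppedCleanly` is really computed in server-channels.ts is not shown in the issue. The point it makes is that a timeout only stops the waiting; the task promise itself stays pending, which is why it lingers in store.tasks if nothing deletes it.

```typescript
// Hypothetical helper (an assumption, not from the OpenClaw source):
// resolves to true if `task` settles within `timeoutMs`, false otherwise.
async function settlesWithin(
  task: Promise<unknown>,
  timeoutMs: number,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;

  // Resolves to false once the deadline passes.
  const timedOut = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });

  // Both fulfilment and rejection count as "settled".
  const settled = task.then(
    () => true as const,
    () => true as const,
  );

  try {
    return await Promise.race([settled, timedOut]);
  } finally {
    // Avoid leaving a live timer behind when the task wins the race.
    clearTimeout(timer);
  }
}
```

A caller could then write something like `const stoppedCleanly = await settlesWithin(task, CHANNEL_STOP_ABORT_TIMEOUT_MS);` and branch on the result, which is where the cleanup decision in Link 1 is made.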

Link 2: startChannel blocked by stale store entry

src/gateway/server-channels.ts:306

if (store.tasks.has(id)) {
  return;  // ← dead promise still in map → silently skipped
}

Map.has() only checks key existence, not promise state. The dead promise from Link 1 blocks all future starts permanently.
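The blocking behaviour can be reproduced in isolation. The following is a minimal sketch, assuming a plain `Map` as a stand-in for store.tasks; `tryStart` and `neverSettles` are hypothetical names introduced for illustration and do not appear in the gateway code.

```typescript
// Hypothetical stand-in for store.tasks: account id -> in-flight channel task.
const tasks = new Map<string, Promise<void>>();

// Mirrors the shape of the guard quoted above.
// Returns whether a start was actually attempted.
function tryStart(id: string, run: () => Promise<void>): boolean {
  if (tasks.has(id)) {
    return false; // key exists, so the start is skipped whatever the promise state
  }
  tasks.set(id, run());
  return true;
}

// A task that never settles, like one that ignored its abort signal.
const neverSettles = () => new Promise<void>(() => {});

const first = tryStart("acct-1", neverSettles); // registers the task
const blocked = tryStart("acct-1", neverSettles); // stale entry blocks the restart
tasks.delete("acct-1"); // the cleanup the timeout path skipped
const afterCleanup = tryStart("acct-1", neverSettles); // start goes through again
```

Because a promise exposes no synchronous way to inspect its state, a guard like this cannot tell a live task from a dead one. Deleting the entry is what makes the next start possible.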

Link 3: health monitor fooled by running: true

src/gateway/channel-health-policy.ts:80:

if (!snapshot.running) {
  return { healthy: false, reason: "not-running" };
}
// running: true → skips this, assumes healthy

src/gateway/channel-health-monitor.ts:148:

if (now - record.lastRestartAt <= cooldownMs) {
  continue;  // cooldown window blocks retry
}

Since running is true, the health monitor never flags the channel as unhealthy. Even if it did, the cooldown check would skip restart attempts for as long as now - record.lastRestartAt <= cooldownMs.
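To make the effect of the flag concrete, here is a simplified model of the two checks quoted above. It is a sketch under stated assumptions: `Snapshot`, `Health`, `evaluate`, and `cooldownActive` are hypothetical names, and the real policy in channel-health-policy.ts very likely inspects more than `running`.

```typescript
// Hypothetical, reduced snapshot shape; the real one has more fields.
interface Snapshot {
  running: boolean;
  lastError?: string;
}

type Health = { healthy: true } | { healthy: false; reason: string };

// Simplified policy: only the `running` check from the issue is modelled.
function evaluate(snapshot: Snapshot): Health {
  if (!snapshot.running) {
    return { healthy: false, reason: "not-running" };
  }
  return { healthy: true };
}

// Same comparison as the cooldown guard quoted from the monitor.
function cooldownActive(
  now: number,
  lastRestartAt: number,
  cooldownMs: number,
): boolean {
  return now - lastRestartAt <= cooldownMs;
}

// Snapshot as written by the buggy timeout branch: reported healthy.
const afterBuggyTimeout = evaluate({
  running: true,
  lastError: "channel stop timed out",
});

// Snapshot as written by the fixed timeout branch: flagged "not-running".
const afterFixedTimeout = evaluate({
  running: false,
  lastError: "channel stop timed out",
});
```

In this model lastError has no influence on the result, which matches the behaviour described in the issue: the error string is recorded, but only the running flag decides whether the channel is considered for recovery.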

Combined Effect

Channel dies → system thinks it's alive → no automatic recovery. Silent permanent death.

Steps to Reproduce

  1. Start a channel (e.g. Telegram, Discord)
  2. Trigger a stopChannel where the underlying task does not settle within 5 seconds (CHANNEL_STOP_ABORT_TIMEOUT_MS)
  3. Observe runtime snapshot: running: true, restartPending: false
  4. Observe store.tasks still contains the dead promise
  5. Attempt startChannel → silently skipped due to store.tasks.has(id)
  6. Health monitor never detects the channel as unhealthy

Expected Behavior

  • Stop timeout should set running: false
  • Stop timeout should still clean up store.aborts and store.tasks
  • Subsequent startChannel calls should succeed
  • Health monitor should detect and recover the channel

Suggested Fix

In the timeout branch (server-channels.ts:574-584):

  1. Set running: false instead of true
  2. Call store.aborts.delete(id) and store.tasks.delete(id) before returning
  3. Set lastStopAt: Date.now(), as included in the diff below

 if (!stoppedCleanly) {
   log.warn?.(
     `[${id}] channel stop exceeded ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms after abort; continuing shutdown`,
   );
+  store.aborts.delete(id);
+  store.tasks.delete(id);
   setRuntime(channelId, id, {
     accountId: id,
-    running: true,
+    running: false,
     restartPending: false,
+    lastStopAt: Date.now(),
     lastError: `channel stop timed out after ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms`,
   });
   return;
 }
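The effect of the patched branch can be checked with a small in-memory model. This is only a sketch: `ChannelModel` and `Runtime` are hypothetical, the field names mirror the issue, and `stoppedCleanly` is passed in as a parameter so the example stays synchronous. The fields set on the clean-stop path are also an assumption, since the issue elides them.

```typescript
type Runtime = { running: boolean; lastStopAt?: number; lastError?: string };

// Hypothetical model of the store and runtime snapshot; not the gateway code.
class ChannelModel {
  readonly tasks = new Map<string, Promise<void>>();
  readonly aborts = new Map<string, AbortController>();
  readonly runtime = new Map<string, Runtime>();

  // Returns false when a stale or live entry blocks the start.
  start(id: string, run: (signal: AbortSignal) => Promise<void>): boolean {
    if (this.tasks.has(id)) return false;
    const abort = new AbortController();
    this.aborts.set(id, abort);
    this.tasks.set(id, run(abort.signal));
    this.runtime.set(id, { running: true });
    return true;
  }

  // Fixed behaviour: both outcomes clean up the store and report running: false.
  stop(id: string, stoppedCleanly: boolean, timeoutMs = 5000): void {
    this.aborts.get(id)?.abort();
    this.aborts.delete(id);
    this.tasks.delete(id);
    const base: Runtime = { running: false, lastStopAt: Date.now() };
    this.runtime.set(
      id,
      stoppedCleanly
        ? base
        : { ...base, lastError: `channel stop timed out after ${timeoutMs}ms` },
    );
  }
}

const model = new ChannelModel();
const ignoresAbort = () => new Promise<void>(() => {}); // never settles

model.start("acct-1", ignoresAbort);
model.stop("acct-1", false); // simulate the stop timing out
const afterStop = model.runtime.get("acct-1");
const restarted = model.start("acct-1", ignoresAbort);
```

Deleting the map entries does not terminate a task that ignores its abort signal. The model only shows that the bookkeeping no longer blocks a restart and that the snapshot stops claiming the channel is running.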

Environment

  • OpenClaw: latest main (fd2c883673)
  • Channels affected: all (Telegram, Discord, Slack, etc.)
  • Node: 22+

[AI-assisted]

Extent analysis

TL;DR

The fix, implemented in PR #70056, updates the stopChannel timeout branch to set running: false and to clean up store.aborts and store.tasks, which prevents silent channel death.

Guidance

  • Review the server-channels.ts file and update the timeout branch to set running: false instead of true to accurately reflect the channel's state.
  • Add cleanup code to remove the dead promise from store.tasks and the stale AbortController from store.aborts before returning from the timeout branch.
  • Verify that the health monitor correctly detects and recovers the channel after applying the fix by checking the channel's runtime snapshot and health monitor logs.
  • Test the fix by following the Steps to Reproduce and confirming that the channel can be started successfully after a stop timeout.

Example

The diff under Suggested Fix above shows the necessary changes: delete the store.aborts and store.tasks entries, set running: false, and record lastStopAt before returning from the timeout branch.

Notes

This fix assumes that the store.aborts and store.tasks cleanup is sufficient to prevent silent channel death. Additional logging or monitoring may be necessary to ensure the fix is effective in all scenarios.

Recommendation

Apply the suggested fix to update the stopChannel timeout branch and clean up store.aborts and store.tasks to prevent silent channel death, as it directly addresses the root cause of the issue.
