openclaw - ✅(Solved) Fix Bug: channel stop timeout leaves channel permanently dead — running: true with stale store entries [1 pull request, 1 participant]

openclaw • 2026-04-22 06:29:54

GitHub stats

openclaw/openclaw#70024•Fetched 2026-04-23 07:30:18

Author

garnetlyx

When stopChannel times out waiting for a channel task to settle, it sets running: true in the runtime snapshot without cleaning up store.aborts or store.tasks. Three code paths then combine to make the channel silently dead with no automatic recovery.

Error Message

if (!stoppedCleanly) {
  setRuntime(channelId, id, {
    running: true,          // ← lie: channel is actually dead
    restartPending: false,
    lastError: `channel stop timed out ...`,
  });
  return;                   // ← skips cleanup
}
// normal path (586-593):
store.aborts.delete(id);
store.tasks.delete(id);
setRuntime(channelId, id, { running: false, ... });

Fix Action

Fix proposed in PR (open and not yet merged at fetch time): fix(gateway): clean up store and set running=false on stop timeout (https://github.com/openclaw/openclaw/pull/70056)

PR fix notes

PR #70056: fix(gateway): clean up store and set running=false on stop timeout

Repository: openclaw/openclaw
Author: garnetlyx
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/70056

Description (problem / solution / changelog)

Summary

stopChannel timeout path set running: true and skipped store.aborts/store.tasks cleanup, leaving a dead promise that blocked all future starts and fooled the health monitor
Fix: set running: false, clean up store entries, and add lastStopAt in the timeout branch so subsequent startChannel calls and health recovery work correctly
Update existing test to verify cleanup + restart succeeds after timeout

Test plan

server-channels.test.ts: updated test verifies running: false after timeout and successful restart (18/18 pass)
channel-health-monitor.test.ts + channel-health-policy.test.ts: no regression (46/46 pass)
Full changed-lane suite: 2560/2560 pass, lint + typecheck clean

Closes #70024

[AI-assisted]

Changed files

src/gateway/server-channels.test.ts (modified, +14/-8)
src/gateway/server-channels.ts (modified, +4/-1)

Description

Root Cause — Three-Link Chain

Link 1: stop timeout lies about state and skips cleanup

src/gateway/server-channels.ts:574-584

if (!stoppedCleanly) {
  setRuntime(channelId, id, {
    running: true,          // ← lie: channel is actually dead
    restartPending: false,
    lastError: `channel stop timed out ...`,
  });
  return;                   // ← skips cleanup
}
// normal path (586-593):
store.aborts.delete(id);
store.tasks.delete(id);
setRuntime(channelId, id, { running: false, ... });

The timeout path sets running: true and returns early, leaving the dead promise in store.tasks and the stale AbortController in store.aborts.
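
How a stop routine might end up in the timeout branch can be sketched as a race between the channel task and a timer. This is a minimal illustration under stated assumptions, not the code in server-channels.ts: `settlesWithin` is a hypothetical helper name, and the report only confirms that a `stoppedCleanly` flag and a 5-second limit exist.

```typescript
// Hypothetical helper (not from the OpenClaw source): resolves to true if
// `task` settles (fulfils or rejects) within `timeoutMs`, otherwise false.
async function settlesWithin(
  task: Promise<unknown>,
  timeoutMs: number,
): Promise<boolean> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<false>((resolve) => {
    timer = setTimeout(() => resolve(false), timeoutMs);
  });
  try {
    return await Promise.race([
      // A rejection also counts as "settled"; the task is over either way.
      task.then(
        () => true as const,
        () => true as const,
      ),
      timeout,
    ]);
  } finally {
    // Clear the timer so it does not keep the process alive after a clean stop.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

With a task that never settles, `settlesWithin(task, 5000)` resolves to `false`, which would correspond to the `!stoppedCleanly` branch shown above.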

Link 2: startChannel blocked by stale store entry

src/gateway/server-channels.ts:306

if (store.tasks.has(id)) {
  return;  // ← dead promise still in map → silently skipped
}

Map.has() only checks key existence, not promise state. The dead promise from Link 1 blocks all future starts permanently.
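
The effect of guarding on key existence alone can be shown with a small stand-alone sketch. `tryStart` is a hypothetical stand-in for the guard in startChannel, and the never-settling promise stands in for the dead task left behind by Link 1.

```typescript
// Hypothetical stand-in for the startChannel guard: refuses to start when an
// entry for `id` already exists, regardless of whether that promise is alive.
function tryStart(tasks: Map<string, Promise<unknown>>, id: string): boolean {
  if (tasks.has(id)) {
    return false; // key exists, so the start is skipped
  }
  // Placeholder task; a real channel task would do work here.
  tasks.set(id, new Promise<void>(() => {}));
  return true;
}

const demoTasks = new Map<string, Promise<unknown>>();
const firstStart = tryStart(demoTasks, "telegram");  // entry created
const blockedStart = tryStart(demoTasks, "telegram"); // stale entry blocks it
demoTasks.delete("telegram");                          // the cleanup the timeout path skipped
const restart = tryStart(demoTasks, "telegram");     // start works again
```

Promises expose no synchronous state getter, so a guard of this shape cannot tell a live task from a dead one. Removing the entry is what unblocks the next start.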

Link 3: health monitor fooled by `running: true`

src/gateway/channel-health-policy.ts:80:

if (!snapshot.running) {
  return { healthy: false, reason: "not-running" };
}
// running: true → skips this, assumes healthy

src/gateway/channel-health-monitor.ts:148:

if (now - record.lastRestartAt <= cooldownMs) {
  continue;  // cooldown window blocks retry
}

Since running is true, the health monitor never flags the channel as unhealthy. Even if it did, cooldown would suppress restart attempts.
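
The two checks can be sketched together to show why a stale `running: true` hides the failure. This is a simplified model with hypothetical names (`evaluateHealth`, `inCooldown`). The real policy presumably performs further checks after the `running` test, and those are omitted here.

```typescript
interface SnapshotSketch {
  running: boolean;
}

type HealthResult =
  | { healthy: true }
  | { healthy: false; reason: string };

// Simplified policy: only the `running` check quoted in the report is modeled.
function evaluateHealth(snapshot: SnapshotSketch): HealthResult {
  if (!snapshot.running) {
    return { healthy: false, reason: "not-running" };
  }
  return { healthy: true }; // assumption: all later checks pass
}

// Mirrors the quoted cooldown comparison: a restart is skipped while the
// time since the last restart is within the cooldown window.
function inCooldown(now: number, lastRestartAt: number, cooldownMs: number): boolean {
  return now - lastRestartAt <= cooldownMs;
}

const staleSnapshot = evaluateHealth({ running: true });   // dead channel reported as healthy
const honestSnapshot = evaluateHealth({ running: false }); // flagged as "not-running"
```

In this model a dead channel whose snapshot still says `running: true` is never handed to the restart logic at all, so the cooldown check only matters once the snapshot is honest.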

Combined Effect

Channel dies → system thinks it's alive → no automatic recovery. Silent permanent death.

Steps to Reproduce

1. Start a channel (e.g. Telegram, Discord)
2. Trigger a stopChannel where the underlying task does not settle within 5 seconds (CHANNEL_STOP_ABORT_TIMEOUT_MS)
3. Observe runtime snapshot: running: true, restartPending: false
4. Observe store.tasks still contains the dead promise
5. Attempt startChannel → silently skipped due to store.tasks.has(id)
6. Health monitor never detects the channel as unhealthy
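
For the second step, one way to get a task that does not settle after abort is a task that ignores its AbortSignal. The sketch below is a hypothetical test fixture, not OpenClaw code: `runChannelTask` and `trackSettled` are illustrative names.

```typescript
// Hypothetical channel task: settles on abort only when `honorsAbort` is true.
function runChannelTask(signal: AbortSignal, honorsAbort: boolean): Promise<void> {
  return new Promise<void>((resolve) => {
    if (!honorsAbort) {
      return; // abort is ignored, so this promise never settles
    }
    if (signal.aborted) {
      resolve();
      return;
    }
    signal.addEventListener("abort", () => resolve(), { once: true });
  });
}

// Promises have no state getter, so record settlement through callbacks.
function trackSettled(task: Promise<unknown>): { settled: boolean } {
  const state = { settled: false };
  const mark = (): void => {
    state.settled = true;
  };
  task.then(mark, mark);
  return state;
}
```

Aborting the controller of a task created with `honorsAbort: false` leaves it pending indefinitely, which is the condition the reproduction steps rely on.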

Expected Behavior

Stop timeout should set running: false
Stop timeout should still clean up store.aborts and store.tasks
Subsequent startChannel calls should succeed
Health monitor should detect and recover the channel

Suggested Fix

In the timeout branch (server-channels.ts:574-584):

Set running: false instead of true
Clean up store.aborts.delete(id) and store.tasks.delete(id) before returning

 if (!stoppedCleanly) {
   log.warn?.(
     `[${id}] channel stop exceeded ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms after abort; continuing shutdown`,
   );
+  store.aborts.delete(id);
+  store.tasks.delete(id);
   setRuntime(channelId, id, {
     accountId: id,
-    running: true,
+    running: false,
     restartPending: false,
+    lastStopAt: Date.now(),
     lastError: `channel stop timed out after ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms`,
   });
   return;
 }
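
The intended post-fix behavior can be modeled in isolation. The sketch below is a simplification with hypothetical names (`ChannelStoreSketch`, `finishStop`, `STOP_TIMEOUT_MS`). It folds the clean and timed-out paths into one function for brevity, whereas the diff above keeps them as separate branches.

```typescript
interface RuntimeSketch {
  running: boolean;
  restartPending: boolean;
  lastStopAt?: number;
  lastError?: string;
}

interface ChannelStoreSketch {
  aborts: Map<string, AbortController>;
  tasks: Map<string, Promise<unknown>>;
  runtime: Map<string, RuntimeSketch>;
}

const STOP_TIMEOUT_MS = 5000; // mirrors the 5-second figure in the report

// Models the fixed behavior: both outcomes clean the store and report
// running: false; only the timed-out outcome records an error.
function finishStop(
  store: ChannelStoreSketch,
  id: string,
  stoppedCleanly: boolean,
  now: number,
): void {
  store.aborts.delete(id);
  store.tasks.delete(id);
  store.runtime.set(id, {
    running: false,
    restartPending: false,
    lastStopAt: now,
    ...(stoppedCleanly
      ? {}
      : { lastError: `channel stop timed out after ${STOP_TIMEOUT_MS}ms` }),
  });
}
```

After `finishStop(store, id, false, Date.now())`, `store.tasks.has(id)` is false and the snapshot reports `running: false`, so both the start guard and the health check from the earlier links see an honest state.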

Environment

OpenClaw: latest main (fd2c883673)
Channels affected: all (Telegram, Discord, Slack, etc.)
Node: 22+

[AI-assisted]

extent analysis

TL;DR

The most likely fix is to update the stopChannel timeout branch to set running: false and clean up store.aborts and store.tasks to prevent silent channel death.

Guidance

Review the server-channels.ts file and update the timeout branch to set running: false instead of true to accurately reflect the channel's state.
Add cleanup code to remove the dead promise from store.tasks and the stale AbortController from store.aborts before returning from the timeout branch.
Verify that the health monitor correctly detects and recovers the channel after applying the fix by checking the channel's runtime snapshot and health monitor logs.
Test the fix by following the reproduction steps and confirming that the channel can be started successfully after a stop timeout.

Notes

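This fix assumes that cleaning up store.aborts and store.tasks is sufficient to prevent silent channel death. Deleting the map entries does not by itself terminate the timed-out task, so a task that never settles may still be pending in the background after a restart. Additional logging or monitoring may be needed to confirm the fix holds in all scenarios.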

Recommendation

Apply the suggested fix to update the stopChannel timeout branch and clean up store.aborts and store.tasks to prevent silent channel death, as it directly addresses the root cause of the issue.
