openclaw: telegram ingress worker exit code 1 on stop wedges channel during config hot reload



Summary

Any channels.telegram.* config change triggers gateway hot reload → telegram channel reload → ingress worker exits with code 1 (instead of 0) on graceful stop → channel supervisor leaves the account in running: true state and the subsequent startChannel is skipped. The channel is wedged silently (no inbound, no outbound, healthState: "stale-socket" later) and only recovers after a full launchctl bootout+bootstrap of the gateway.

Reproduced on v2026.5.16-beta.4 (release cf10f1ec64) with extensions/telegram isolated polling ingress enabled. Likely present since telegram-ingress-worker.runtime.ts was introduced.

Reproduction

  1. Run the gateway with channels.telegram.accounts.*.botToken set and isolated polling ingress active (default).
  2. Have at least one in-flight tool call / embedded run on the telegram account so the reload waits to drain.
  3. Trigger any channels.telegram.* config change. The easiest is the message action's config.patch, e.g. flipping channels.telegram.replyToMode between quote and none from an agent — but any path that lands channels.telegram.* does it.
  4. Observe in gateway.log:
    [reload] config change detected; evaluating reload (channels.telegram.replyToMode, …)
    [reload] channel reload still deferred after 30448ms with 2 operation(s), …
    [reload] active operations and replies completed; reloading channels now
    [gateway/channels] restarting telegram channel
    [telegram] [default] released stopped Telegram polling lease
    [telegram] [ivy]     released stopped Telegram polling lease
    [telegram] [default] channel stop exceeded 5000ms after abort; continuing shutdown
    [telegram] [ivy]     channel stop exceeded 5000ms after abort; continuing shutdown
    [reload] config hot reload applied (channels.telegram.replyToMode, …)
    [telegram] [ivy]     channel exited: Telegram ingress worker exited with code 1
    [telegram] [default] channel exited: Telegram ingress worker exited with code 1
  5. From this point on, the telegram channel is dead: no inbound, no outbound. pnpm openclaw gateway call health --json returns running: false, healthState: "not-running" for the affected accounts. There is no auto-restart and no further log output from the telegram subsystem.
  6. Only fix: restart the whole gateway process.
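
To make the signature in step 4 easier to spot, here is a minimal sketch of a log scan. It assumes log lines look exactly like the excerpt above ([telegram] [account] message); findWedgedAccounts is a hypothetical helper, not something that exists in the repo.

```typescript
// Hypothetical helper: flags accounts whose stop timed out and whose most
// recent telegram log line is the worker-exit message from step 4.
function findWedgedAccounts(lines: string[]): string[] {
  const timedOut = new Set<string>();
  const wedged = new Set<string>();
  for (const line of lines) {
    const match = /\[telegram\] \[([^\]]+)\]\s+(.*)$/.exec(line);
    if (!match) continue;
    const account = match[1];
    const message = match[2];
    if (message.startsWith("channel stop exceeded")) {
      timedOut.add(account);
    } else if (
      message.startsWith("channel exited: Telegram ingress worker exited with code") &&
      timedOut.has(account)
    ) {
      wedged.add(account);
    } else {
      // Any other telegram activity for the account suggests it is alive.
      wedged.delete(account);
    }
  }
  return Array.from(wedged).sort();
}

const sampleLog = [
  "[gateway/channels] restarting telegram channel",
  "[telegram] [default] released stopped Telegram polling lease",
  "[telegram] [ivy]     released stopped Telegram polling lease",
  "[telegram] [default] channel stop exceeded 5000ms after abort; continuing shutdown",
  "[telegram] [ivy]     channel stop exceeded 5000ms after abort; continuing shutdown",
  "[telegram] [ivy]     channel exited: Telegram ingress worker exited with code 1",
  "[telegram] [default] channel exited: Telegram ingress worker exited with code 1",
];
const wedgedAccounts = findWedgedAccounts(sampleLog);
```

This is only a heuristic for reading gateway.log after the fact; it does not replace a supervisor-side check.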

Root cause

Three pieces interact:

1. Ingress worker exits with code 1 on graceful stop

extensions/telegram/src/telegram-ingress-worker.runtime.ts:

main()
  .then(() => undefined)
  .catch((err) => {
    post({ type: "poll-error", message: formatErrorMessage(err), finishedAt: Date.now() });
    process.exitCode = 1;
  });

When parentPort receives { type: "stop" }, the handler sets stopped = true and calls activeController.abort(...). The in-flight fetch rejects with the abort error, and the if (stopped) break; in the inner catch (err) block exits the loop cleanly. So far so good.

But the cleanup finally { await transport.close(); } can throw (most often when undici dispatcher pools are torn down mid-request — common under abort). That throw escapes through the finally, main() rejects, the top-level catch fires, and process.exitCode = 1. This is treated as a crash by the worker host:

extensions/telegram/src/telegram-ingress-worker.ts:55-61:

worker.once("exit", (code) => {
  if (code === 0) {
    resolve();
    return;
  }
  reject(new Error(`Telegram ingress worker exited with code ${code}`));
});
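
The finally behaviour described above can be reproduced in isolation. This is a minimal sketch, not the real runtime: Transport, the simulated abort, and the error text are all stand-ins I made up. It only shows that a throw inside finally replaces an otherwise clean return, which is why main() rejects even after the loop honoured the stop flag.

```typescript
// Hypothetical stand-in for the worker's transport; only close() matters here.
type Transport = { close(): Promise<void> };

// Simplified shape of the worker's main(): poll loop plus cleanup in finally.
async function main(transport: Transport, isStopped: () => boolean): Promise<string> {
  try {
    while (true) {
      try {
        // Simulates the in-flight fetch rejecting after abort.
        throw new Error("aborted");
      } catch (err) {
        if (isStopped()) break; // requested stop: leave the loop cleanly
        throw err;
      }
    }
    return "loop exited cleanly";
  } finally {
    // A throw here replaces the clean return above, so main() rejects.
    await transport.close();
  }
}

const throwingTransport: Transport = {
  close: async () => {
    throw new Error("dispatcher destroyed");
  },
};

// Settles to "rejected: dispatcher destroyed" even though the loop saw the stop flag.
const mainOutcome = main(throwingTransport, () => true).then(
  () => "resolved",
  (err: Error) => `rejected: ${err.message}`,
);
```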

2. runIsolatedIngressCycle does not catch the rejected worker.task()

extensions/telegram/src/polling-session.ts #runIsolatedIngressCycle:

try {
  await worker.task();   // ← rejects with "Telegram ingress worker exited with code 1"
  if (this.opts.abortSignal?.aborted) {
    return "exit";
  }
  return shouldRestart ? "continue" : "exit";
} finally {
  clearInterval(drainTimer);
}

The try has no catch, so the rejection bubbles up through runUntilAbort's while loop (also no catch) and out of the polling-session task promise. The rejection value is Error: Telegram ingress worker exited with code 1.

3. Supervisor stop helper times out, then startChannel no-ops

src/gateway/server-channels.ts:720-738 in the stopAllAccounts / per-account stop helper:

const stoppedCleanly = await waitForChannelStopGracefully(task, CHANNEL_STOP_ABORT_TIMEOUT_MS); // 5_000
if (!stoppedCleanly) {
  log.warn?.(`[${id}] channel stop exceeded ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms after abort; continuing shutdown`);
  setRuntime(channelId, id, {
    accountId: id,
    running: manual,       // ← stays true on manual stop
    restartPending: !manual,
    lastError: `channel stop timed out after ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms`,
  });
  return;  // store.tasks / store.aborts NOT cleared
}
// only clean stop reaches here:
store.aborts.delete(id);
store.tasks.delete(id);
setRuntime(channelId, id, { running: false, … });

The hot-reload supervisor calls stopChannel with manual = true. Because worker.stop() gives the worker up to 15 s grace (telegram-ingress-worker.ts setTimeout(() => worker.terminate(), 15_000)), the worker frequently doesn't finish in 5 s — and even when it does, the eventual non-zero exit shows up as a rejected task promise after the supervisor has already returned with running: true and stale store.tasks / store.aborts.

Then in src/gateway/server-reload-handlers.ts:419-423:

params.logChannels.info(`restarting ${name} channel`);
if (!channelsStoppedBeforePluginReload.has(name)) {
  await params.stopChannel(name);
}
await params.startChannel(name);

startChannel runs against state that says running: true and finds an existing task in store.tasks, so it is effectively a no-op. The channel is wedged.

The auto-restart path (server-channels.ts:567+ .then(restart) → MAX_RESTART_ATTEMPTS) cannot save us either, because hot reload sets manuallyStopped.has(rKey) === true and the auto-restart block returns early on that condition.
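
To show how the stale state makes the restart a no-op, here is a deliberately simplified model of the per-account bookkeeping. SupervisorModel and its skip condition are my assumptions based on the behaviour described above, not the actual code in server-channels.ts; stoppedCleanly stands in for the result of waitForChannelStopGracefully.

```typescript
type Runtime = { running: boolean; restartPending: boolean; lastError?: string };

// Hypothetical model: one task map and one runtime map, keyed by account id.
class SupervisorModel {
  readonly tasks = new Map<string, Promise<void>>();
  readonly runtime = new Map<string, Runtime>();

  start(id: string, run: () => Promise<void>): "started" | "skipped" {
    // Assumed guard: skip when the account looks alive or a task is still tracked.
    if (this.runtime.get(id)?.running || this.tasks.has(id)) return "skipped";
    this.tasks.set(id, run());
    this.runtime.set(id, { running: true, restartPending: false });
    return "started";
  }

  stop(id: string, manual: boolean, stoppedCleanly: boolean): void {
    if (!stoppedCleanly) {
      // Timeout branch: running stays true on a manual stop, task entry is kept.
      this.runtime.set(id, {
        running: manual,
        restartPending: !manual,
        lastError: "channel stop timed out",
      });
      return;
    }
    this.tasks.delete(id);
    this.runtime.set(id, { running: false, restartPending: false });
  }
}

const neverSettles = () => new Promise<void>(() => {});
const model = new SupervisorModel();
const firstStart = model.start("default", neverSettles); // "started"
model.stop("default", true, false); // manual stop that times out
const startAfterTimeout = model.start("default", neverSettles); // "skipped": wedged
model.stop("default", true, true); // manual stop that settles in time
const startAfterCleanStop = model.start("default", neverSettles); // "started"
```

In this model, the only difference between the wedged and healthy restart is whether the stop helper saw the task settle within its deadline.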

Impact

  • Severity: high in practice. Any user (or agent calling config.patch for the user) toggling a channels.telegram.* setting silently breaks their own telegram channel until they notice and restart the gateway. We hit this when an agent flipped replyToMode from quote to none per user request.
  • Detection: weak. Gateway health shows healthState: "stale-socket" after a while, but the supervisor has no actor that bounces a stale-socket channel — the existing polling stall detector (polling-liveness.ts) only fires when the polling loop is still ticking. Here the loop is gone.
  • Recovery: gateway-wide bounce only. No per-channel CLI restart is currently exposed.

Proposed fix

Smallest, most targeted change — fix the worker exit code so cleanup drift after stop doesn't masquerade as a crash:

--- a/extensions/telegram/src/telegram-ingress-worker.runtime.ts
+++ b/extensions/telegram/src/telegram-ingress-worker.runtime.ts
@@ main()
   .then(() => undefined)
   .catch((err) => {
     post({ type: "poll-error", message: formatErrorMessage(err), finishedAt: Date.now() });
-    process.exitCode = 1;
+    // If a stop was requested, surface the error to the parent but exit
+    // cleanly: shutdown drift (e.g. transport.close throwing in finally)
+    // is not a crash, and exit code 1 makes the channel supervisor treat
+    // the worker as crashed, which on hot reload leaves the supervisor in
+    // a "still running" state and skips startChannel — wedging telegram
+    // until the gateway is bounced.
+    process.exitCode = stopped ? 0 : 1;
   });

With this fix, worker.task() resolves cleanly under requested-stop, runIsolatedIngressCycle reaches the abortSignal.aborted → return "exit" branch, the supervisor's stopHelper sees a settled task in time, clears store.tasks/store.aborts and sets running: false, then startChannel starts a fresh ingress worker.

Optional follow-ups (defence in depth, not required for this regression)

  1. Wrap await worker.task() in #runIsolatedIngressCycle with a try/catch that treats abortSignal.aborted as clean and otherwise logs + #waitBeforeRestart (so a real worker crash also triggers the restart cycle instead of throwing out of runUntilAbort).
  2. In src/gateway/server-channels.ts, when the per-account stop times out with manual: true, still clear store.tasks/store.aborts after a longer hard deadline so a subsequent startChannel is guaranteed to start from a clean slate.
  3. Add a supervisor that reacts to a sustained healthState: "stale-socket" (e.g. > 60 s without transport activity) by bouncing the affected account, similar to the polling stall detector but covering the cycle-not-running case.
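
A rough sketch of what follow-up 1 could look like. runCycleGuarded and all of its parameters are hypothetical names for illustration; in the real class they would map to worker.task(), this.opts.abortSignal?.aborted, and #waitBeforeRestart.

```typescript
type CycleResult = "continue" | "exit";

// Hypothetical standalone version of the guarded cycle; only the control flow matters.
async function runCycleGuarded(
  task: () => Promise<void>,
  isAborted: () => boolean,
  waitBeforeRestart: () => Promise<void>,
  log: (message: string) => void,
  shouldRestart = true,
): Promise<CycleResult> {
  try {
    await task();
  } catch (err) {
    // Stop was requested: treat the rejection as a clean exit.
    if (isAborted()) return "exit";
    // Real crash: log it, back off, and let the caller's loop start a new cycle.
    log(`ingress worker failed: ${err instanceof Error ? err.message : String(err)}`);
    await waitBeforeRestart();
    return "continue";
  }
  if (isAborted()) return "exit";
  return shouldRestart ? "continue" : "exit";
}

const crashingTask = () =>
  Promise.reject(new Error("Telegram ingress worker exited with code 1"));
const loggedMessages: string[] = [];
const noWait = async () => {};

// During a requested stop the rejection is swallowed and the cycle exits.
const resultDuringStop = runCycleGuarded(crashingTask, () => true, noWait, (m) => loggedMessages.push(m));
// Without a stop request the same rejection is logged and the cycle continues.
const resultOnCrash = runCycleGuarded(crashingTask, () => false, noWait, (m) => loggedMessages.push(m));
```

Whether a real crash should restart immediately or escalate after repeated failures is a separate decision this sketch does not make.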

Tests

The worker runtime currently has no direct unit tests under extensions/telegram/src/. A small test that asserts process.exitCode === 0 when main() rejects after stopped === true would catch any regression. I'm happy to add it in a PR if the fix above looks right.
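
One possible shape for that test, assuming the exit-code decision is first pulled into a small pure helper so it can be exercised without spawning a worker. exitCodeForFailure and runAndResolveExitCode are hypothetical names; the real runtime sets process.exitCode directly.

```typescript
// Hypothetical refactor: the decision from the proposed diff as a pure function.
function exitCodeForFailure(stopRequested: boolean): 0 | 1 {
  return stopRequested ? 0 : 1;
}

// Runs a main-like function and reports the exit code the worker would use.
async function runAndResolveExitCode(
  runWorkerMain: () => Promise<void>,
  isStopped: () => boolean,
): Promise<number> {
  try {
    await runWorkerMain();
    return 0;
  } catch {
    return exitCodeForFailure(isStopped());
  }
}

const rejectingMain = () => Promise.reject(new Error("transport.close failed"));

// Cleanup failure after a requested stop should not look like a crash.
const codeAfterStop = runAndResolveExitCode(rejectingMain, () => true);
// The same failure without a stop request is still a crash.
const codeWithoutStop = runAndResolveExitCode(rejectingMain, () => false);
```

In a real test file these two cases would become assertions in whatever runner the repo already uses.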

Environment

  • Gateway version: v2026.5.16-beta.4
  • Platform: macOS 26.3 (Darwin 25.4.0), Node 22, pnpm 11.1, single-host launchd-managed gateway.
  • Channels: telegram (two accounts default+ivy), feishu, imessage.
  • Repro confirmed with isolated polling ingress (default).
