openclaw - 💡(How to fix) Fix telegram: ingress worker exit code 1 on stop wedges channel during config hot reload

openclaw · 2026-05-17 08:26:59


Any change to a channels.telegram.* config key triggers a gateway hot reload → telegram channel reload → the ingress worker exits with code 1 (instead of 0) on graceful stop → the channel supervisor leaves the account in the running: true state, and the subsequent startChannel effectively does nothing. The channel is silently wedged (no inbound, no outbound; healthState eventually reports "stale-socket") and only recovers after a full launchctl bootout + bootstrap of the gateway.

Reproduced on v2026.5.16-beta.4 (release cf10f1ec64) with the extensions/telegram isolated polling ingress enabled. Likely present ever since telegram-ingress-worker.runtime.ts was introduced.


Root Cause

Three pieces interact:

1. Ingress worker exits with code 1 on graceful stop

extensions/telegram/src/telegram-ingress-worker.runtime.ts:

main()
  .then(() => undefined)
  .catch((err) => {
    post({ type: "poll-error", message: formatErrorMessage(err), finishedAt: Date.now() });
    process.exitCode = 1;
  });

When parentPort receives { type: "stop" }, the handler sets stopped = true and calls activeController.abort(...). The in-flight fetch rejects with the abort error, and the if (stopped) break; in the inner catch (err) block exits the loop cleanly. So far, so good.

But the cleanup in finally { await transport.close(); } can throw (most often when undici dispatcher pools are torn down mid-request, which is common under abort). That throw escapes through the finally, main() rejects, the top-level catch fires, and process.exitCode is set to 1. The worker host treats this as a crash:

extensions/telegram/src/telegram-ingress-worker.ts:55-61:

worker.once("exit", (code) => {
  if (code === 0) {
    resolve();
    return;
  }
  reject(new Error(`Telegram ingress worker exited with code ${code}`));
});
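The sequence above can be reproduced without Telegram, undici or worker threads. The following is a minimal, dependency-free sketch; poll, requestStop, main and runWorker are hypothetical stand-ins for the runtime's polling fetch, its stop handler, its main() and its top-level catch, not the actual openclaw code. It shows that the loop exits cleanly on stop, yet a throw from the cleanup in finally still makes main() reject, which the pre-fix handler reports as exit code 1.

```typescript
// Hypothetical model of the ingress worker runtime's shutdown path (not openclaw code).
type WorkerState = { stopped: boolean; rejectPoll?: (err: Error) => void };

function poll(state: WorkerState): Promise<never> {
  // Stands in for the long-poll fetch: it only settles when it is aborted.
  return new Promise<never>((_resolve, reject) => {
    state.rejectPoll = reject;
  });
}

function requestStop(state: WorkerState): void {
  // Mirrors the { type: "stop" } handler: set the flag, then abort the in-flight request.
  state.stopped = true;
  state.rejectPoll?.(new Error("This operation was aborted"));
}

async function main(state: WorkerState, closeTransport: () => Promise<void>): Promise<void> {
  try {
    while (true) {
      try {
        await poll(state);
      } catch (err) {
        if (state.stopped) break; // the loop itself exits cleanly on stop
        throw err;
      }
    }
  } finally {
    // A throw here replaces the clean completion, so main() rejects.
    await closeTransport();
  }
}

// Same shape as the runtime's top-level handler before the fix.
async function runWorker(state: WorkerState, closeTransport: () => Promise<void>): Promise<number> {
  let exitCode = 0;
  await main(state, closeTransport)
    .then(() => undefined)
    .catch(() => {
      exitCode = 1; // pre-fix behaviour: any rejection is reported as a crash
    });
  return exitCode;
}
```

Starting runWorker with a closeTransport that throws, then calling requestStop, yields exit code 1 even though the stop was requested; with a closeTransport that resolves, it yields 0.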

2. runIsolatedIngressCycle does not catch the rejected worker.task()

extensions/telegram/src/polling-session.ts #runIsolatedIngressCycle:

try {
  await worker.task();   // ← rejects with "Telegram ingress worker exited with code 1"
  if (this.opts.abortSignal?.aborted) {
    return "exit";
  }
  …
  return shouldRestart ? "continue" : "exit";
} finally {
  clearInterval(drainTimer);
  …
}

The try has no catch, so the rejection bubbles up through runUntilAbort's while loop (which has no catch either) and out of the polling-session task promise. The rejection value is Error: Telegram ingress worker exited with code 1.
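A small model makes the propagation concrete. This is a sketch under assumptions: runCycle and runUntilAbort only mimic the try/finally-without-catch shape quoted above, and the elided parts of the real methods (the "…" lines, shouldRestart) are replaced by a plain "continue". Note that even when the abort signal is already set, the aborted check is never reached, because the awaited task rejects first.

```typescript
// Hypothetical model of #runIsolatedIngressCycle / runUntilAbort (not openclaw code).
type CycleResult = "continue" | "exit";

async function runCycle(
  task: () => Promise<void>,
  aborted: () => boolean,
  onCleanup: () => void,
): Promise<CycleResult> {
  const drainTimer = setInterval(() => undefined, 1_000); // placeholder for the drain timer
  try {
    await task(); // a rejection here skips the aborted check entirely
    if (aborted()) {
      return "exit";
    }
    return "continue"; // stands in for the elided shouldRestart logic
  } finally {
    clearInterval(drainTimer);
    onCleanup(); // cleanup still runs, but the rejection keeps propagating
  }
}

async function runUntilAbort(
  task: () => Promise<void>,
  aborted: () => boolean,
  onCleanup: () => void,
): Promise<void> {
  while (true) {
    // No catch here either, so a rejected cycle ends the whole polling session.
    const result = await runCycle(task, aborted, onCleanup);
    if (result === "exit") {
      return;
    }
  }
}
```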

3. Supervisor stop helper times out, then startChannel no-ops

src/gateway/server-channels.ts:720-738 in the stopAllAccounts / per-account stop helper:

const stoppedCleanly = await waitForChannelStopGracefully(task, CHANNEL_STOP_ABORT_TIMEOUT_MS); // 5_000
if (!stoppedCleanly) {
  log.warn?.(`[${id}] channel stop exceeded ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms after abort; continuing shutdown`);
  setRuntime(channelId, id, {
    accountId: id,
    running: manual,       // ← stays true on manual stop
    restartPending: !manual,
    lastError: `channel stop timed out after ${CHANNEL_STOP_ABORT_TIMEOUT_MS}ms`,
  });
  …
  return;  // store.tasks / store.aborts NOT cleared
}
// only clean stop reaches here:
store.aborts.delete(id);
store.tasks.delete(id);
setRuntime(channelId, id, { running: false, … });

The hot-reload supervisor calls stopChannel with manual = true. Because worker.stop() gives the worker up to 15 s of grace (setTimeout(() => worker.terminate(), 15_000) in telegram-ingress-worker.ts), the worker frequently does not finish within the 5 s stop timeout. Even when it eventually does finish, the non-zero exit surfaces as a rejected task promise after the supervisor has already returned with running: true and stale store.tasks / store.aborts entries.
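The timing mismatch can be illustrated with a scaled-down race. settlesWithin is a hypothetical stand-in for waitForChannelStopGracefully (the real helper's implementation is not shown in this issue), and workerExit is a hypothetical worker that settles after a delay with a given exit code. Here 150 ms stands in for the 15 s grace period and 50 ms for the 5 s stop timeout.

```typescript
// Hypothetical stand-in for waitForChannelStopGracefully: true if the task settles
// (resolves or rejects) within timeoutMs, false otherwise.
function settlesWithin(task: Promise<unknown>, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    const settled = () => {
      clearTimeout(timer);
      resolve(true); // no-op if the timeout already resolved with false
    };
    task.then(settled, settled);
  });
}

// Hypothetical worker that exits after afterMs, mapping the code like the worker host does.
function workerExit(afterMs: number, code: number): Promise<void> {
  return new Promise<void>((resolve, reject) => {
    setTimeout(() => {
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`Telegram ingress worker exited with code ${code}`));
      }
    }, afterMs);
  });
}
```

With workerExit(150, 1), settlesWithin(..., 50) reports false (the "channel stop exceeded" branch), and the code-1 rejection only arrives about 100 ms later, after the supervisor has moved on.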

Then in src/gateway/server-reload-handlers.ts:419-423:

params.logChannels.info(`restarting ${name} channel`);
if (!channelsStoppedBeforePluginReload.has(name)) {
  await params.stopChannel(name);
}
await params.startChannel(name);

startChannel then runs against state that still says running: true and finds an existing task in store.tasks, so it is effectively a no-op and the channel stays wedged.

The auto-restart path (server-channels.ts:567+, .then(restart) → MAX_RESTART_ATTEMPTS) cannot recover the channel either: during hot reload manuallyStopped.has(rKey) is true, and the auto-restart block returns early on that condition.
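The resulting state can be sketched as a simplified, hypothetical model of the per-account store. ChannelStoreModel, its methods and the starts counter are illustrative names, not the gateway's API; the model only encodes the three behaviours described above: the timeout branch leaves store.tasks / store.aborts populated with running: manual, a start is skipped when a task already exists, and manual stops suppress auto-restart.

```typescript
// Hypothetical, simplified model of the supervisor's per-account state (not openclaw code).
type Runtime = { running: boolean; restartPending: boolean; lastError?: string };

class ChannelStoreModel {
  readonly tasks = new Map<string, Promise<void>>();
  readonly aborts = new Map<string, () => void>();
  readonly runtime = new Map<string, Runtime>();
  readonly manuallyStopped = new Set<string>();
  starts = 0;

  start(id: string, task: Promise<void>, abort: () => void): void {
    // An existing task is taken to mean "already running", so nothing is started.
    if (this.tasks.has(id)) return;
    this.starts += 1;
    this.tasks.set(id, task);
    this.aborts.set(id, abort);
    this.runtime.set(id, { running: true, restartPending: false });
  }

  stopAfterTimeout(id: string, manual: boolean): void {
    // The !stoppedCleanly branch: abort is requested, runtime is updated,
    // but store.tasks / store.aborts are NOT cleared.
    this.aborts.get(id)?.();
    if (manual) this.manuallyStopped.add(id);
    this.runtime.set(id, {
      running: manual, // stays true on a manual (hot-reload) stop
      restartPending: !manual,
      lastError: "channel stop timed out",
    });
  }

  shouldAutoRestart(id: string): boolean {
    // The auto-restart block returns early for manually stopped accounts.
    return !this.manuallyStopped.has(id);
  }
}
```

Calling start, then stopAfterTimeout(id, true), then start again leaves starts at 1, running at true and shouldAutoRestart(id) at false: the wedge described above.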

Reproduction

1. Run the gateway with channels.telegram.accounts.*.botToken set and the isolated polling ingress active (the default).
2. Have at least one in-flight tool call or embedded run on the telegram account, so that the reload has to wait for it to drain.
3. Trigger any channels.telegram.* config change. The easiest way is the message action's config.patch, e.g. having an agent flip channels.telegram.replyToMode between quote and none, but any path that writes a channels.telegram.* key reproduces it.

Observe in gateway.log:

[reload] config change detected; evaluating reload (channels.telegram.replyToMode, …)
[reload] channel reload still deferred after 30448ms with 2 operation(s), …
[reload] active operations and replies completed; reloading channels now
[gateway/channels] restarting telegram channel
[telegram] [default] released stopped Telegram polling lease
[telegram] [ivy]     released stopped Telegram polling lease
[telegram] [default] channel stop exceeded 5000ms after abort; continuing shutdown
[telegram] [ivy]     channel stop exceeded 5000ms after abort; continuing shutdown
[reload] config hot reload applied (channels.telegram.replyToMode, …)
[telegram] [ivy]     channel exited: Telegram ingress worker exited with code 1
[telegram] [default] channel exited: Telegram ingress worker exited with code 1

From this point on, the telegram channel is dead: no inbound, no outbound. pnpm openclaw gateway call health --json returns running: false, healthState: "not-running" for the affected accounts. No auto-restart occurs, and the telegram subsystem logs nothing further.
The only fix is to restart the whole gateway process.
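To spot this signature in a gateway.log without reading it by hand, a small scanner along these lines may help. findWedgedAccounts is a hypothetical helper, and its regular expressions are based only on the log lines quoted above; real log lines may carry extra prefixes (such as timestamps), which is why the patterns are not anchored to the start of the line. An account is flagged when a "channel stop exceeded … after abort" line is later followed by a non-zero "Telegram ingress worker exited with code" line for the same account.

```typescript
// Hypothetical log scanner based on the log format shown in this issue.
const STOP_TIMEOUT = /\[telegram\] \[([^\]]+)\]\s+channel stop exceeded \d+ms after abort/;
const WORKER_EXIT = /\[telegram\] \[([^\]]+)\]\s+channel exited: Telegram ingress worker exited with code (\d+)/;

function findWedgedAccounts(logText: string): string[] {
  const timedOut = new Set<string>();
  const wedged = new Set<string>();
  for (const line of logText.split("\n")) {
    const stop = STOP_TIMEOUT.exec(line);
    if (stop) {
      timedOut.add(stop[1]); // stop helper gave up on this account
      continue;
    }
    const exit = WORKER_EXIT.exec(line);
    if (exit && exit[2] !== "0" && timedOut.has(exit[1])) {
      wedged.add(exit[1]); // late non-zero exit after a timed-out stop
    }
  }
  return Array.from(wedged).sort();
}
```

Fed the excerpt above, it returns ["default", "ivy"]; an isolated "exited with code 1" line without a preceding stop timeout is not flagged.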

Impact

Severity: high in practice. Any user (or an agent calling config.patch on the user's behalf) who toggles a channels.telegram.* setting silently breaks their own telegram channel until they notice and restart the gateway. We hit this when an agent flipped replyToMode from quote to none at a user's request.
Detection: weak. Gateway health eventually shows healthState: "stale-socket", but nothing in the supervisor bounces a stale-socket channel, and the existing polling stall detector (polling-liveness.ts) only fires while the polling loop is still ticking. Here the loop is gone.
Recovery: gateway-wide bounce only. No per-channel CLI restart is currently exposed.
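The missing actor mentioned under Detection could start as simple decision logic. This is a hypothetical sketch: StaleSocketWatchdog is not an existing openclaw component, the snapshot shape (account id → healthState string) is an assumption rather than the gateway's health schema, and the 60 s threshold comes from the follow-up suggestion below. It only decides which accounts to bounce; how the bounce is performed is left open.

```typescript
// Hypothetical stale-socket watchdog decision logic (not an openclaw API).
class StaleSocketWatchdog {
  private readonly staleSince = new Map<string, number>();
  private readonly thresholdMs: number;

  constructor(thresholdMs = 60_000) {
    this.thresholdMs = thresholdMs;
  }

  // Feed one health snapshot; returns accounts stale for at least thresholdMs.
  observe(snapshot: Record<string, string>, nowMs: number): string[] {
    const due: string[] = [];
    for (const [accountId, healthState] of Object.entries(snapshot)) {
      if (healthState !== "stale-socket") {
        this.staleSince.delete(accountId); // recovered or not stale: reset the clock
        continue;
      }
      const since = this.staleSince.get(accountId) ?? nowMs;
      this.staleSince.set(accountId, since);
      if (nowMs - since >= this.thresholdMs) {
        due.push(accountId);
        this.staleSince.delete(accountId); // restart the clock after requesting a bounce
      }
    }
    return due;
  }
}
```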


Proposed fix

The smallest, most targeted change is to fix the worker exit code, so that cleanup drift after a requested stop no longer masquerades as a crash:

--- a/extensions/telegram/src/telegram-ingress-worker.runtime.ts
+++ b/extensions/telegram/src/telegram-ingress-worker.runtime.ts
@@ main()
   .then(() => undefined)
   .catch((err) => {
     post({ type: "poll-error", message: formatErrorMessage(err), finishedAt: Date.now() });
-    process.exitCode = 1;
+    // If a stop was requested, surface the error to the parent but exit
+    // cleanly: shutdown drift (e.g. transport.close throwing in finally)
+    // is not a crash, and exit code 1 makes the channel supervisor treat
+    // the worker as crashed, which on hot reload leaves the supervisor in
+    // a "still running" state and skips startChannel — wedging telegram
+    // until the gateway is bounced.
+    process.exitCode = stopped ? 0 : 1;
   });

With this fix, worker.task() resolves cleanly after a requested stop, runIsolatedIngressCycle reaches the abortSignal.aborted → return "exit" branch, the supervisor's stop helper sees a settled task in time, clears store.tasks / store.aborts and sets running: false, and startChannel then starts a fresh ingress worker.
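The chain can be traced end to end in a hypothetical model. exitCodeAfterMainRejects encodes the proposed stopped ? 0 : 1 rule, workerTask reproduces the worker host's exit-code mapping, and isolatedCycle keeps only the aborted check of #runIsolatedIngressCycle; none of these are openclaw functions. A rejection after a requested stop now ends the cycle with "exit", while an unrequested failure still rejects as before.

```typescript
// Hypothetical end-to-end model of the fixed path (illustrative names only).
function exitCodeAfterMainRejects(stopped: boolean): number {
  // The proposed change: a rejection after a requested stop is not a crash.
  return stopped ? 0 : 1;
}

function workerTask(exitCode: number): Promise<void> {
  // Same mapping as the worker host's "exit" listener.
  return exitCode === 0
    ? Promise.resolve()
    : Promise.reject(new Error(`Telegram ingress worker exited with code ${exitCode}`));
}

async function isolatedCycle(task: Promise<void>, aborted: boolean): Promise<"continue" | "exit"> {
  await task; // resolves under a requested stop once the exit code is 0
  return aborted ? "exit" : "continue";
}
```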

Optional follow-ups (defence in depth, not required for this regression)

Wrap await worker.task() in #runIsolatedIngressCycle in a try/catch that treats abortSignal.aborted as a clean exit and otherwise logs the error and calls #waitBeforeRestart, so that a real worker crash also goes through the restart cycle instead of throwing out of runUntilAbort.
In src/gateway/server-channels.ts, when the per-account stop times out with manual: true, still clear store.tasks / store.aborts after a longer hard deadline, so that a subsequent startChannel is guaranteed to start from a clean slate.
Add a supervisor that reacts to a sustained healthState: "stale-socket" (e.g. more than 60 s without transport activity) by bouncing the affected account. It would work like the polling stall detector but cover the case where the polling cycle is no longer running.
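The first follow-up could look roughly like this. It is a sketch under assumptions: runCycleDefensively is a hypothetical free function rather than the private method, waitBeforeRestart and log stand in for #waitBeforeRestart and the session logger, the signal only needs an aborted flag (so a real AbortSignal also fits), and the elided shouldRestart logic of the real method is reduced to the aborted check.

```typescript
// Hypothetical sketch of follow-up 1 (not the actual polling-session code).
type CycleOutcome = "continue" | "exit";

async function runCycleDefensively(
  task: () => Promise<void>,
  signal: { aborted: boolean },
  waitBeforeRestart: () => Promise<void>,
  log: (msg: string) => void,
): Promise<CycleOutcome> {
  try {
    await task();
  } catch (err) {
    if (signal.aborted) {
      return "exit"; // requested stop: treat the worker's exit error as a clean shutdown
    }
    log(`ingress worker failed: ${err instanceof Error ? err.message : String(err)}`);
    await waitBeforeRestart();
    return "continue"; // real crash: restart instead of throwing out of runUntilAbort
  }
  return signal.aborted ? "exit" : "continue"; // shouldRestart logic omitted
}
```

The design choice is that the abort signal, not the exit code, decides whether a failure counts as shutdown noise, so the loop stays alive after a genuine crash even if the worker's exit code is wrong.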

Tests

The worker runtime currently has no direct unit tests under extensions/telegram/src/. A small test asserting that process.exitCode === 0 when main() rejects after stopped === true would catch a regression of this bug. I'm happy to add it in a PR if the fix above looks right.

Environment

Gateway version: v2026.5.16-beta.4
Platform: macOS 26.3 (Darwin 25.4.0), Node 22, pnpm 11.1, single-host launchd-managed gateway.
Channels: telegram (two accounts default+ivy), feishu, imessage.
Repro confirmed with isolated polling ingress (default).
