openclaw: Gateway hangs on shutdown when Telegram API is unreachable (how to fix) [2 comments, 3 participants]

openclaw/openclaw#60180, fetched 2026-04-08 02:35:22

Problem

When the Telegram API is unreachable (e.g., network issues, UND_ERR_CONNECT_TIMEOUT), the gateway cannot shut down cleanly. It logs "shutdown timed out; exiting without full cleanup" and exits with code 1 without releasing its listening port cleanly.

This causes a restart loop: the watchdog restarts the gateway, but the new process can't bind the port (still held by the dying process), so it fails immediately — repeating until the network recovers.

Observed behavior

From gateway.err.log:

[telegram] fetch fallback: enabling sticky IPv4-only dispatcher (codes=UND_ERR_CONNECT_TIMEOUT)
[telegram] fetch fallback: enabling sticky IPv4-only dispatcher (codes=UND_ERR_CONNECT_TIMEOUT)
... (dozens of these)
[gateway] shutdown timed out; exiting without full cleanup

From watchdog.log — 10+ consecutive "STILL DOWN" restarts during a Telegram outage:

[2026-04-03 13:14:44] Restarting gateway — health endpoint failed or timed out
[2026-04-03 13:15:05] Gateway STILL DOWN after restart
[2026-04-03 13:17:12] Restarting gateway — port 18789 not listening
[2026-04-03 13:18:17] Gateway STILL DOWN after restart
... (repeats for 25+ minutes)

Root cause

Two issues in the shutdown path:

1. stopAccount() has no timeout

In the channel manager, stopChannel() calls await plugin.gateway.stopAccount(...) with no timeout. When the Telegram API is unreachable, this hangs indefinitely — blocking the entire shutdown sequence.

// channel-manager — no timeout, hangs when API is down
if (plugin?.gateway?.stopAccount) {
  await plugin.gateway.stopAccount({ ... });
}

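2. Channels are stopped sequentially during server shutdown

The server close handler awaits each channel's stop in turn, so a single channel that hangs delays every channel queued after it.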

// server close handler — sequential, one slow channel blocks all
for (const plugin of listChannelPlugins())
  await params.stopChannel(plugin.id);

The 25-second force-exit timer (SHUTDOWN_TIMEOUT_MS) eventually fires, but by then the process is in a bad state and the port isn't released cleanly.
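To make the second issue concrete, here is a minimal sketch (not openclaw code) that records the order of events when channels are stopped one at a time versus all at once. `Stopper`, `makeChannel`, `stopSequentially`, and `stopConcurrently` are hypothetical names introduced only for this illustration; the delays stand in for a slow or unreachable API.

```typescript
// Hypothetical stand-in for a channel plugin: only the stop() call matters here.
type Stopper = { id: string; stop: () => Promise<void> };

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Builds a fake channel whose stop() takes delayMs and logs start/end events.
function makeChannel(id: string, delayMs: number, events: string[]): Stopper {
  return {
    id,
    stop: async () => {
      events.push(`start ${id}`);
      await sleep(delayMs);
      events.push(`end ${id}`);
    },
  };
}

// Mirrors the current close handler: each stop waits for the previous one.
async function stopSequentially(channels: Stopper[]): Promise<void> {
  for (const channel of channels) {
    await channel.stop();
  }
}

// Mirrors the proposed approach: every stop starts immediately.
async function stopConcurrently(channels: Stopper[]): Promise<void> {
  await Promise.allSettled(channels.map((channel) => channel.stop()));
}
```

With a 30 ms "slow" channel listed before a 5 ms "fast" one, the sequential version records start slow, end slow, start fast, end fast, while the concurrent version records start slow, start fast, end fast, end slow. If the slow channel never finished, the fast one would never even start in the sequential case.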

Suggested fix

Fix 1: Wrap stopAccount() with a timeout

const STOP_ACCOUNT_TIMEOUT_MS = 10_000;

if (plugin?.gateway?.stopAccount) {
  // Keep the timer handle so it can be cleared once the race settles;
  // a pending timer would otherwise keep the event loop alive for up to 10 s.
  let timer: ReturnType<typeof setTimeout> | undefined;
  await Promise.race([
    plugin.gateway.stopAccount({
      cfg, accountId: id, account,
      runtime: channelRuntimeEnvs[channelId],
      abortSignal: abort?.signal ?? new AbortController().signal,
      log: channelLogs[channelId],
      getStatus: () => getRuntime(channelId, id),
      setStatus: (next) => setRuntime(channelId, id, next),
    }),
    new Promise<void>((_, reject) => {
      timer = setTimeout(() => reject(new Error(
        `stopAccount timed out for ${channelId}/${id} after ${STOP_ACCOUNT_TIMEOUT_MS}ms`
      )), STOP_ACCOUNT_TIMEOUT_MS);
    }),
  ])
    .catch((err) => {
      // Log and move on: a failed or slow stop must not block shutdown.
      channelLogs[channelId]?.warn?.(
        `[${channelId}] stopAccount failed: ${err.message}; continuing shutdown`
      );
    })
    .finally(() => clearTimeout(timer));
}
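Racing against a timer stops the caller from waiting, but it does not cancel the work that is still in flight. The sketch below is one way to package Fix 1 as a reusable helper that also aborts a signal when the deadline passes. `runWithDeadline` and `StopTimeoutError` are hypothetical names, and the sketch assumes the stop routine actually honors the signal it is given, which the issue does not confirm for `stopAccount()`.

```typescript
// Hypothetical error type so callers can tell a timeout from other failures.
class StopTimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} timed out after ${ms}ms`);
    this.name = "StopTimeoutError";
  }
}

// Runs work(signal) but gives up after ms milliseconds.
// On timeout the signal is aborted so cooperative work can cancel itself.
async function runWithDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
  label: string,
): Promise<T> {
  const controller = new AbortController();
  let timer: ReturnType<typeof setTimeout> | undefined;

  const deadline = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      controller.abort();
      reject(new StopTimeoutError(label, ms));
    }, ms);
  });

  try {
    return await Promise.race([work(controller.signal), deadline]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    clearTimeout(timer);
  }
}
```

A caller would wrap the stop call, for example `runWithDeadline((signal) => stopSomething(signal), 10_000, "telegram/default")`, and catch the rejection to log it and continue shutdown; `stopSomething` is a placeholder, not an openclaw function.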

Fix 2: Stop channels in parallel with an overall timeout

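const CHANNEL_SHUTDOWN_TIMEOUT_MS = 15_000;

// Keep the timer handle and clear it after the race; otherwise the warning
// is logged (and the event loop held open) even after a clean shutdown.
let shutdownTimer: ReturnType<typeof setTimeout> | undefined;
await Promise.race([
  Promise.allSettled(
    listChannelPlugins().map((plugin) => params.stopChannel(plugin.id))
  ),
  new Promise<void>((resolve) => {
    shutdownTimer = setTimeout(() => {
      gatewayLog.warn('channel shutdown timed out; continuing');
      resolve();
    }, CHANNEL_SHUTDOWN_TIMEOUT_MS);
  }),
]);
clearTimeout(shutdownTimer);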
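For experimenting with Fix 2 outside the gateway, the following self-contained sketch stops a set of fake channels in parallel under one overall deadline and reports which ones failed or were still pending. `ChannelStopper`, `ShutdownReport`, and `stopChannelsInParallel` are hypothetical names; none of them are openclaw APIs.

```typescript
// Hypothetical shape: an id plus an async stop routine.
type ChannelStopper = { id: string; stop: () => Promise<void> };

interface ShutdownReport {
  timedOut: boolean;      // true if some channel was still stopping at the deadline
  failed: string[];       // ids whose stop() rejected before the deadline
  stillPending: string[]; // ids that had not settled when we stopped waiting
}

async function stopChannelsInParallel(
  channels: ChannelStopper[],
  overallTimeoutMs: number,
): Promise<ShutdownReport> {
  const failed: string[] = [];
  const pending = new Set(channels.map((c) => c.id));
  let timer: ReturnType<typeof setTimeout> | undefined;

  // The async wrapper turns synchronous throws into rejections as well.
  const attempts = channels.map(async (c) => {
    try {
      await c.stop();
    } catch {
      failed.push(c.id);
    } finally {
      pending.delete(c.id);
    }
  });

  const deadline = new Promise<void>((resolve) => {
    timer = setTimeout(resolve, overallTimeoutMs);
  });

  // Wait for every channel or the deadline, whichever comes first.
  await Promise.race([Promise.allSettled(attempts), deadline]);
  clearTimeout(timer);

  return {
    timedOut: pending.size > 0,
    failed,
    stillPending: Array.from(pending),
  };
}
```

Tracking pending ids separately is a design choice of this sketch: it lets the shutdown log name the channel that hung instead of only saying that a timeout happened.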

Bonus: make timeouts configurable

Ideally SHUTDOWN_TIMEOUT_MS and STOP_ACCOUNT_TIMEOUT_MS could be set in openclaw.json:

{
  "gateway": {
    "shutdown": {
      "timeoutMs": 25000,
      "channelTimeoutMs": 10000
    }
  }
}
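If such a setting were added, the gateway would need to read it defensively. The sketch below shows one possible way to resolve the two values with fallbacks. The key names follow the JSON above, which is itself only a proposal in this issue, so openclaw may not read these keys; `resolveShutdownTimeouts` and `DEFAULT_SHUTDOWN_TIMEOUTS` are hypothetical, and the defaults simply reuse the 25-second and 10-second values mentioned earlier.

```typescript
interface ShutdownTimeouts {
  timeoutMs: number;        // overall force-exit budget
  channelTimeoutMs: number; // per-channel stop budget
}

// Defaults taken from the values quoted in this issue (assumption, not openclaw source).
const DEFAULT_SHUTDOWN_TIMEOUTS: ShutdownTimeouts = {
  timeoutMs: 25_000,
  channelTimeoutMs: 10_000,
};

// Accepts an already-parsed config object of unknown shape.
function resolveShutdownTimeouts(rawConfig: unknown): ShutdownTimeouts {
  const shutdown: Record<string, unknown> =
    (rawConfig as { gateway?: { shutdown?: Record<string, unknown> } } | null | undefined)
      ?.gateway?.shutdown ?? {};

  // Use the configured value only if it is a positive, finite number.
  const pick = (key: keyof ShutdownTimeouts): number => {
    const value = shutdown[key];
    return typeof value === "number" && Number.isFinite(value) && value > 0
      ? value
      : DEFAULT_SHUTDOWN_TIMEOUTS[key];
  };

  const timeoutMs = pick("timeoutMs");
  // Design choice of this sketch: never let the channel budget exceed the
  // overall budget, or the force-exit timer would fire before channels give up.
  const channelTimeoutMs = Math.min(pick("channelTimeoutMs"), timeoutMs);

  return { timeoutMs, channelTimeoutMs };
}
```

For example, `resolveShutdownTimeouts({})` falls back to both defaults, and a config with `timeoutMs: 5000` and `channelTimeoutMs: 10000` yields a channel budget clamped to 5000.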

Environment

  • openclaw version: not recorded (the report contains the unevaluated command $(node -e "console.log(require('/opt/homebrew/lib/node_modules/openclaw/package.json').version)" 2>/dev/null || echo "unknown"))
  • macOS (darwin arm64)
  • Channel: Telegram (long-polling mode)
  • Triggered by: intermittent network issues causing UND_ERR_CONNECT_TIMEOUT to Telegram API

extent analysis

TL;DR

Implement timeouts for stopAccount() and parallelize channel shutdown to prevent the gateway from hanging indefinitely during Telegram API outages.

Guidance

  • Wrap stopAccount() with a timeout (e.g., 10 seconds) to prevent it from blocking the shutdown sequence indefinitely.
  • Stop channels in parallel with an overall timeout (e.g., 15 seconds) to prevent a single slow channel from blocking all others.
  • Consider making timeouts configurable via openclaw.json for easier tuning.
  • Review the provided code snippets for stopAccount() timeout and parallel channel shutdown to ensure they fit your specific use case.

Example

The suggested fix provides example code snippets for implementing timeouts:

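const STOP_ACCOUNT_TIMEOUT_MS = 10_000;
let timer: ReturnType<typeof setTimeout> | undefined;
// Rejects on timeout; catch and log the error so shutdown can continue.
await Promise.race([
  plugin.gateway.stopAccount({ /*... */ }),
  new Promise<void>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`stopAccount timed out`)), STOP_ACCOUNT_TIMEOUT_MS);
  }),
]).finally(() => clearTimeout(timer)); // clear the timer so it cannot keep the process alive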

And for parallel channel shutdown:

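const CHANNEL_SHUTDOWN_TIMEOUT_MS = 15_000;
let shutdownTimer: ReturnType<typeof setTimeout> | undefined;
await Promise.race([
  Promise.allSettled(listChannelPlugins().map((plugin) => params.stopChannel(plugin.id))),
  new Promise<void>((resolve) => {
    shutdownTimer = setTimeout(() => {
      gatewayLog.warn('channel shutdown timed out; continuing');
      resolve();
    }, CHANNEL_SHUTDOWN_TIMEOUT_MS);
  }),
]);
clearTimeout(shutdownTimer); // otherwise the warning fires even after a clean stop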

Notes

The snippets are TypeScript and may need adjusting for other languages or frameworks. The timeout values (10 seconds per account, 15 seconds for all channels) are starting points to tune for your deployment. Whatever values you choose, keep them below the 25-second force-exit timer (SHUTDOWN_TIMEOUT_MS) so the graceful path can finish before the forced exit.

Recommendation

Apply the suggested fixes to implement timeouts for stopAccount() and parallelize channel shutdown. This should help prevent the gateway from hanging indefinitely during Telegram API outages and reduce the likelihood of restart loops.
