openclaw - ✅(Solved) Fix Telegram bots freeze periodically due to NAT timeout silently dropping idle getUpdates TCP connections [1 pull requests, 2 comments, 3 participants]

GitHub stats: openclaw/openclaw#49461 (fetched 2026-04-08 00:55:03) · Comments: 2 · Participants: 3 · Timeline: 4 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×1, subscribed ×1

Telegram bots in OpenClaw periodically stop responding for ~15–90 seconds because the long-polling (getUpdates) TCP connection is silently dropped by the local NAT device. The stall repeats every ~1000 s (exact interval matches the router's NAT idle timeout). A fix is available in PR #49460.


Root Cause

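The getUpdates request uses a ~900 s long-poll timeout. During this time the TCP connection is idle: no data flows in either direction. Most NAT devices (home routers, firewalls) expire their TCP session table entries after 60–1800 s of inactivity, and many home routers use 300–1200 s. Stock Linux conntrack is a notable exception: its default for established TCP connections (nf_conntrack_tcp_timeout_established) is 432,000 s (five days), unless router firmware lowers it. For the reporter's router, the observed ~1009 s stall interval suggests an expiry roughly 100 s beyond the ~900 s poll window.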

When the NAT entry is dropped the server's reply has nowhere to go. The socket does not immediately error — it just hangs. The grammY polling stall watchdog (POLL_STALL_THRESHOLD_MS = 90_000) eventually fires, stops the runner, and forces a restart.
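The root cause above can be illustrated with a toy model of a single NAT mapping. This is a simplified sketch, not router firmware or OpenClaw code. The names NatEntry, isAlive and replyDelivered are hypothetical. The model assumes the router resets the mapping's idle timer on every packet in either direction and deletes the mapping silently once the timer runs out.

```typescript
// Toy model of one NAT mapping for a single TCP connection.
// Assumption: every packet crossing the NAT resets the idle timer, and
// the mapping is silently deleted once idleTimeoutS elapses with no packets.
interface NatEntry {
  idleTimeoutS: number;
  lastPacketAtS: number;
}

// True if the mapping still exists at time nowS.
function isAlive(entry: NatEntry, nowS: number): boolean {
  return nowS - entry.lastPacketAtS < entry.idleTimeoutS;
}

// Simulates one long poll: the request leaves at t = 0, optional
// keepalive packets cross the NAT at keepaliveTimesS, and the server
// replies at replyAtS. The reply arrives only if the mapping survives.
function replyDelivered(
  idleTimeoutS: number,
  replyAtS: number,
  keepaliveTimesS: number[] = [],
): boolean {
  const entry: NatEntry = { idleTimeoutS, lastPacketAtS: 0 };
  const times = keepaliveTimesS.slice().sort((a, b) => a - b);
  for (const t of times) {
    if (t >= replyAtS) break;
    // Once the mapping is gone, a later outbound packet cannot restore
    // the old mapping the server's in-flight reply depends on.
    if (!isAlive(entry, t)) return false;
    entry.lastPacketAtS = t; // any packet refreshes the idle timer
  }
  return isAlive(entry, replyAtS);
}
```

For example, replyDelivered(300, 900) returns false because a 900 s silent poll outlives a 300 s mapping. Passing probe times from 30 s to 855 s in 75 s steps makes it return true.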

Secondary cause: if the system has IPv6 set to Automatic but has no actual IPv6 connectivity, autoSelectFamily connection attempts fail immediately with EHOSTUNREACH, producing stalls at ~200 s intervals. Disabling IPv6 on the interface (networksetup -setv6off Wi-Fi on macOS) lengthens the stall interval from ~200 s to ~1000 s. This confirms that the NAT timeout is the primary cause once the IPv6 misconfiguration is removed.

Fix

See PR #49460, which adds keepAlive: true and keepAliveInitialDelay: 30_000 to the options returned by buildTelegramConnectOptions. The 30 s initial delay avoids unnecessary probes on short API calls while still refreshing NAT entries well before typical expiry.

PR fix notes

PR #49460: fix(telegram): enable TCP keepalive on getUpdates connections to prevent NAT timeout stalls

Description (problem / solution / changelog)

Problem

Telegram long-polling connections (getUpdates with a ~900 s timeout) sit idle for most of their lifetime. Home and office NAT devices typically expire idle TCP entries after 60–1800 s (commonly ~1000 s). When the NAT entry is silently dropped, the socket hangs instead of returning an error. The grammY runner therefore does not notice the stall until the 90 s stall watchdog fires, logs "Polling stall detected (no getUpdates for Xs)", and forces a restart cycle.

This causes the gateway to appear frozen to end-users: all configured Telegram bots stop responding until the watchdog-triggered restart completes.

Observed pattern:

[telegram] Polling stall detected (no getUpdates for 1009s); forcing restart.
[telegram] Polling runner stop timed out after 15s; forcing restart cycle.
Telegram polling runner stopped (polling stall detected); restarting in 2.0s.

The ~1009 s stall interval (roughly the ~900 s getUpdates timeout plus a NAT expiry margin) is the expected fingerprint of a silently dropped idle TCP connection.

Root Cause

buildTelegramConnectOptions in extensions/telegram/src/fetch.ts builds the undici Agent connect options but does not set keepAlive or keepAliveInitialDelay. Without TCP keepalive probes, idle long-poll connections give the NAT table no reason to keep its entry alive, and there is no mechanism to detect the dead connection before the grammY stall watchdog fires.

Fix

Unconditionally add keepAlive: true and keepAliveInitialDelay: 30_000 (30 s) to the connect options object. OS-level TCP keepalive probes (sent every ~75 s by default on macOS/Linux) will:

  1. Refresh NAT table entries before they expire (probes are real TCP packets with ACK, same effect as data traffic).
  2. Surface dead connections promptly with ETIMEDOUT instead of hanging silently, allowing the existing retry/fallback logic to recover immediately.

A 30 s initial delay avoids sending unnecessary probes on short-lived API calls while still refreshing NAT entries well before the typical ~1000 s expiry.
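The claim that a 30 s initial delay keeps the NAT entry fresh can be checked with a little arithmetic. The sketch below assumes an idle socket whose probes are all answered. The first probe follows keepAliveInitialDelay and later ones follow every intervalS. The report cites ~75 s, a common kernel default, but the effective value depends on OS and runtime. The function names are hypothetical; longestSilenceS is the figure that must stay below the NAT idle timeout.

```typescript
// Times (in seconds) at which keepalive probes cross an otherwise idle
// connection during a window of windowS seconds, assuming the first probe
// follows initialDelayS of idleness and later ones follow every intervalS.
function probeTimes(initialDelayS: number, intervalS: number, windowS: number): number[] {
  if (intervalS <= 0) throw new RangeError("intervalS must be positive");
  const times: number[] = [];
  for (let t = initialDelayS; t < windowS; t += intervalS) {
    times.push(t);
  }
  return times;
}

// Longest stretch with no packets at all between the request (t = 0)
// and the end of the window. This is what the NAT idle timer sees.
function longestSilenceS(initialDelayS: number, intervalS: number, windowS: number): number {
  let previous = 0; // the request itself is the last packet at t = 0
  let longest = 0;
  for (const t of probeTimes(initialDelayS, intervalS, windowS)) {
    longest = Math.max(longest, t - previous);
    previous = t;
  }
  return Math.max(longest, windowS - previous);
}
```

With the PR's values, probeTimes(30, 75, 900) yields 12 probes, and longestSilenceS(30, 75, 900) is 75 s. That is far below the 300–1200 s home-router timeouts mentioned in the root cause. Without keepalive, the silence is the full 900 s poll.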

The return Object.keys(connect).length > 0 ? connect : null guard is removed since connect is now always non-empty.
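A before/after sketch of the guard change follows, for illustration only. The real function in extensions/telegram/src/fetch.ts sets options that the PR description does not list. The ConnectOptions interface, the forceIpv4 flag and the family field it controls, and both function names are therefore hypothetical stand-ins.

```typescript
// Hypothetical subset of socket connect options used by this sketch.
interface ConnectOptions {
  family?: 4 | 6; // hypothetical pre-existing optional field
  keepAlive?: boolean;
  keepAliveInitialDelay?: number;
}

// Before: only optional fields were set, so the object could be empty,
// and the builder returned null to mean "use the defaults".
function buildConnectOptionsBefore(forceIpv4: boolean): ConnectOptions | null {
  const connect: ConnectOptions = {};
  if (forceIpv4) connect.family = 4;
  return Object.keys(connect).length > 0 ? connect : null;
}

// After: keepalive is always set, so the object is never empty and the
// null guard has nothing left to do.
function buildConnectOptionsAfter(forceIpv4: boolean): ConnectOptions {
  const connect: ConnectOptions = { keepAlive: true, keepAliveInitialDelay: 30_000 };
  if (forceIpv4) connect.family = 4;
  return connect;
}
```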

Testing

Verified locally by disabling IPv6 on the network interface (which had been masking the issue with ~200 s EHOSTUNREACH stalls) and observing that polling no longer stalls after the change. Before the fix, stalls occurred predictably at intervals matching the local router's NAT timeout (~1000 s). After the fix, no stalls observed over a multi-hour run.


Changed files

  • extensions/telegram/src/fetch.ts (modified, +20/-15)
  • scripts/stage-bundled-plugin-runtime.mjs (modified, +9/-0)



Observed Behavior

All configured Telegram bots stop responding at regular intervals. Gateway logs show:

[telegram] Polling stall detected (no getUpdates for 1009s); forcing restart.
[telegram] Polling runner stop timed out after 15s; forcing restart cycle.
Telegram polling runner stopped (polling stall detected); restarting in 2.0s.

The 1009 s interval is consistent across restarts and roughly equals the getUpdates long-poll timeout (~900 s) plus the router's NAT entry expiry margin.


Why TCP Keepalive Fixes It

TCP keepalive probes are small ACK packets sent at the OS level after a period of inactivity. They:

  1. Refresh the NAT table entry — the probe counts as traffic, resetting the NAT idle timer.
  2. Detect dead connections promptly — if the peer doesn't respond to probes, the OS closes the socket with ETIMEDOUT instead of leaving it hung forever.

undici's Agent connect options support keepAlive and keepAliveInitialDelay. Neither is currently set in buildTelegramConnectOptions (extensions/telegram/src/fetch.ts).

Affected Users

Any OpenClaw user behind a NAT device is affected, which covers essentially all home and office users. The stall interval varies by router model, but the behavior is the same everywhere.


Environment

  • macOS (darwin 25.3.0), but the NAT timeout issue affects all platforms
  • Home router with NAT idle TCP timeout ~1000 s
  • getUpdates long-poll timeout ~900 s
  • undici Agent (via grammY)

extent analysis

Fix Plan

To stop OpenClaw's Telegram bots from freezing periodically when the local NAT device silently drops the idle long-polling TCP connection, follow these steps:

  • Update the buildTelegramConnectOptions function in extensions/telegram/src/fetch.ts to include TCP keepalive options.
  • Set keepAlive to true to enable TCP keepalive probes.
  • Set keepAliveInitialDelay to 30_000 (30 seconds) to avoid unnecessary probes on short API calls while refreshing NAT entries before typical expiry.

Example code:

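import { Agent } from 'undici';

// ...

// Sketch: the real buildTelegramConnectOptions in extensions/telegram/src/fetch.ts
// sets other connect options too; they are elided here.
const buildTelegramConnectOptions = () => {
  const connect = {
    // ...
    // Enable OS-level TCP keepalive probes on each socket.
    keepAlive: true,
    // Let the socket sit idle for 30 s before the first probe.
    keepAliveInitialDelay: 30_000,
  };
  // connect is never empty now, so the old
  // Object.keys(connect).length > 0 ? connect : null guard is gone.
  return connect;
};

// TCP-level socket options go under the Agent's connect key,
// not at the top level of the Agent options.
const agent = new Agent({ connect: buildTelegramConnectOptions() });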

Verification

To verify that the fix worked:

  • Monitor the gateway logs for polling stall detections and restarts.
  • Check that stalls no longer recur at an interval matching the NAT idle timeout.
  • Test the Telegram bots' responsiveness over an extended period to ensure they no longer stop responding at regular intervals.
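The first two checks can be automated by scanning the gateway log for stall lines and extracting the reported idle time. This is a minimal sketch that assumes the log wording quoted earlier in this report. The function names and the 10% tolerance are arbitrary choices, not part of OpenClaw.

```typescript
// Matches lines such as:
// [telegram] Polling stall detected (no getUpdates for 1009s); forcing restart.
const STALL_LINE = /Polling stall detected \(no getUpdates for (\d+)s\)/;

// Returns the idle duration (seconds) reported by each stall line.
function stallDurationsS(log: string): number[] {
  const durations: number[] = [];
  for (const line of log.split(/\r?\n/)) {
    const match = STALL_LINE.exec(line);
    if (match) durations.push(Number(match[1]));
  }
  return durations;
}

// True if any stall lies within `tolerance` (a fraction) of the suspected
// NAT-driven interval, i.e. the pre-fix fingerprint is still present.
function matchesNatFingerprint(durationsS: number[], suspectedS: number, tolerance = 0.1): boolean {
  return durationsS.some((d) => Math.abs(d - suspectedS) <= suspectedS * tolerance);
}
```

Run against the log lines quoted earlier, stallDurationsS returns [1009], which matchesNatFingerprint accepts for a ~1000 s timeout. After the fix, a multi-hour log should return an empty array.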

Extra Tips

  • Be aware that the NAT idle timeout can vary by router model, so the stall interval may differ across environments.
  • Consider adjusting the keepAliveInitialDelay value based on specific use cases or network conditions.
  • Refer to PR #49460 for the complete fix implementation and additional context.
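For the tip about tuning keepAliveInitialDelay, one conservative rule is to keep the longest silent gap well below the shortest NAT idle timeout you expect. The helper below is hypothetical and rests on three assumptions. It assumes probes start after the initial delay and then repeat every kernelIntervalMs, so the longest silence is the larger of the two. It treats the kernel interval as fixed. It uses a safety factor of 2, which is an arbitrary margin, not a value from PR #49460.

```typescript
// Hypothetical helper: picks a keepAliveInitialDelay (ms) whose worst-case
// silent gap stays within natIdleTimeoutMs / safetyFactor. Assumes the
// longest silence is max(initialDelay, kernelInterval).
function pickKeepAliveInitialDelayMs(
  natIdleTimeoutMs: number,
  kernelIntervalMs = 75_000,
  preferredDelayMs = 30_000,
  safetyFactor = 2,
): number | null {
  const budgetMs = natIdleTimeoutMs / safetyFactor;
  // The kernel probe interval is not controlled by this option; if it alone
  // exceeds the budget, no initial delay can fix that.
  if (kernelIntervalMs > budgetMs) return null;
  return Math.min(preferredDelayMs, budgetMs);
}
```

pickKeepAliveInitialDelayMs(300_000) keeps the PR's 30 s. For a router that drops idle entries after 120 s, it returns null, because the ~75 s probe interval alone exceeds the 60 s budget, so the kernel interval would need tuning too.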
