openclaw - ✅(Solved) Fix Telegram polling client gets permanently stuck after transient network failure [1 pull request, 5 comments, 4 participants]

GitHub stats
openclaw/openclaw#52116 · Fetched 2026-04-08 01:15:26
Comments: 5 · Participants: 4 · Timeline: 8 · Reactions: 0
Timeline (top): commented ×5, cross-referenced ×3

The gateway's Telegram client enters an unrecoverable state after a transient network hiccup. Once it fails, it retries sendChatAction in a tight loop (~every 3 seconds) indefinitely, even across gateway restarts.

Error Message

[telegram] sendChatAction failed: Network request for 'sendChatAction' failed!
[telegram] sendMessage failed: Network request for 'sendMessage' failed!
[telegram] final reply failed: HttpError: Network request for 'sendMessage' failed!
[telegram] Polling stall detected (no getUpdates for XXXs); forcing restart.
[telegram] Polling runner stop timed out after 15s; forcing restart cycle.
[diagnostic] lane wait exceeded: lane=main waitedMs=XXXXX queueAhead=0

Root Cause

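The issue does not establish a root cause. The observed symptom is that the gateway's Telegram client enters an unrecoverable state after a transient network hiccup and then retries sendChatAction in a tight loop (~every 3 seconds) indefinitely, even across gateway restarts. Because curl can reach the Telegram Bot API from the same machine while the gateway's Node.js HTTP client keeps failing, the report points at stale HTTP agent / socket pool state rather than a persistent network fault.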

PR fix notes

PR #58451: feat(telegram): add heartbeat supervisor for silent network outage detection

Description (problem / solution / changelog)

Background

In production deployments, the existing polling watchdog detects stalls when getUpdates hangs, but it cannot detect silent TCP drops — when the network connection dies without any error signal. In these cases, the bot can sit idle for 20+ minutes before any recovery attempt.

Related issues

  • #54708 — Message Loss on Telegram Network Failure
  • #52116 — Telegram polling client gets permanently stuck after transient network failure
  • #54513 — Telegram polling has no stall detection (unlike Slack health-monitor)
  • #55406 — Telegram polling can restart into not-started state after connectivity loss
  • #47458 — Polling stall loop — getUpdates hangs, restart never recovers
  • #41704 — Telegram polling stalls indefinitely when proxy TCP connection drops silently
  • #42782 — [Feature Request] Add health-monitor auto-reconnect for Telegram polling
  • #44396 — Telegram polling stall (~95s) causes significant message delivery delay

What changed

New: HeartbeatSupervisor (extensions/telegram/src/heartbeat.ts)

A threshold-based heartbeat supervisor that runs periodic getMe probes using the existing probeTelegram() function from probe.ts:

  • Runs on a configurable interval (default: 30s)
  • Counts consecutive probe failures
  • Fires onOutageDetected after reaching the failure threshold (default: 3)
  • Fires onRecovered once when connectivity returns
  • Security: Reuses probeTelegram() which already handles transport safely. Error messages are logged without the bot token or full URL — only the method name, error description, and failure counter.
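
The threshold behavior described above can be sketched as follows. This is a minimal illustration, not the code from PR #58451: the class shape, option names, and the `probe` callback (standing in for something like probeTelegram(), assumed here to resolve to `{ ok: boolean }`) are all assumptions. It also assumes onOutageDetected fires once per outage rather than on every failed probe past the threshold.

```javascript
// Sketch of a threshold-based heartbeat supervisor. All names are illustrative.
class HeartbeatSupervisor {
  constructor({ probe, intervalMs = 30_000, failureThreshold = 3, onOutageDetected, onRecovered }) {
    this.probe = probe; // hypothetical: async () => ({ ok: boolean })
    this.intervalMs = intervalMs;
    this.failureThreshold = failureThreshold;
    this.onOutageDetected = onOutageDetected;
    this.onRecovered = onRecovered;
    this.failures = 0; // consecutive failed probes
    this.inOutage = false;
    this.running = false; // overlap guard: skip a tick while a probe is in flight
    this.timer = null;
  }

  start() {
    if (this.timer) return;
    this.timer = setInterval(() => void this.tick(), this.intervalMs);
  }

  stop() {
    clearInterval(this.timer);
    this.timer = null;
  }

  async tick() {
    if (this.running) return;
    this.running = true;
    try {
      let ok = false;
      try {
        ok = (await this.probe()).ok === true;
      } catch {
        ok = false; // a thrown probe counts as a failure
      }
      if (ok) {
        this.failures = 0;
        if (this.inOutage) {
          this.inOutage = false;
          this.onRecovered?.(); // fires once per outage
        }
      } else {
        this.failures += 1;
        if (!this.inOutage && this.failures >= this.failureThreshold) {
          this.inOutage = true;
          this.onOutageDetected?.(this.failures);
        }
      }
    } finally {
      this.running = false;
    }
  }
}
```

Keeping `tick()` separate from the timer makes the threshold logic testable without waiting on real intervals: a test can call `tick()` directly with a scripted `probe`.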

Modified: TelegramPollingSession (extensions/telegram/src/polling-session.ts)

  • Integrates HeartbeatSupervisor in runUntilAbort() when apiBase is provided
  • onOutageDetected aborts the current polling cycle only via a cycle-scoped AbortController (not the global abort signal), so the outer loop can restart cleanly
  • onRecovered logs recovery; the polling loop restarts naturally
  • Supervisor starts before the polling loop, stops in finally
  • The existing watchdog is untouched — both mechanisms work independently
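
The cycle-scoped abort idea can be illustrated with a small sketch. It is not the actual polling-session.ts code: `runCycle` is a hypothetical function that polls until its signal aborts, and the second argument it receives stands in for what the supervisor's onOutageDetected callback would call.

```javascript
// Sketch: one AbortController per polling cycle, linked to the global abort signal.
// Aborting a cycle ends only that cycle; aborting the global signal ends the loop.
async function runUntilAbort(globalSignal, runCycle) {
  let cycles = 0;
  while (!globalSignal.aborted) {
    const cycle = new AbortController();
    // A global abort must also end the cycle that is currently running.
    const onGlobalAbort = () => cycle.abort();
    globalSignal.addEventListener('abort', onGlobalAbort, { once: true });
    try {
      cycles += 1;
      // runCycle (hypothetical) polls until cycle.signal aborts. The second
      // argument is what an outage handler would invoke to restart polling.
      await runCycle(cycle.signal, () => cycle.abort());
    } finally {
      globalSignal.removeEventListener('abort', onGlobalAbort);
    }
  }
  return cycles;
}
```

Because the outage handler only touches the per-cycle controller, the outer `while` loop sees an unaborted global signal and starts a fresh cycle, which matches the "restart cleanly" behavior the PR describes.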

Modified: monitor.ts

  • Passes apiBase (from resolveTelegramApiBase()) to TelegramPollingSession

Tests

  • 14 new tests in heartbeat.test.ts: threshold behavior, recovery, abort signal, overlap prevention, token-never-in-logs assertion
  • 1 new test in polling-session.test.ts: verifies HeartbeatSupervisor starts when apiBase is provided
  • All 9 existing polling-session.test.ts tests pass unchanged
Test Files  2 passed (2)
     Tests  24 passed (24)

How to test

pnpm vitest run extensions/telegram/src/heartbeat.test.ts extensions/telegram/src/polling-session.test.ts

For manual QA: deploy with a Telegram bot, kill network connectivity for 2+ minutes, observe logs for [telegram][heartbeat] probe failed → outage detection → recovery on reconnect.

Changed files

  • extensions/telegram/src/heartbeat.test.ts (added, +417/-0)
  • extensions/telegram/src/heartbeat.ts (added, +110/-0)
  • extensions/telegram/src/monitor.test.ts (modified, +2/-0)
  • extensions/telegram/src/monitor.ts (modified, +3/-1)
  • extensions/telegram/src/polling-session.test.ts (modified, +147/-0)
  • extensions/telegram/src/polling-session.ts (modified, +66/-19)

Original issue

Description

The gateway's Telegram client enters an unrecoverable state after a transient network hiccup. Once it fails, it retries sendChatAction in a tight loop (~every 3 seconds) indefinitely, even across gateway restarts.

Evidence

  • curl can successfully call the Telegram Bot API from the same machine (confirmed with getMe and sendMessage)
  • The gateway's internal Node.js HTTP client keeps failing with: Network request for 'sendChatAction' failed!
  • Polling stall detection fires repeatedly, triggering restart cycles, but the new polling runner immediately hits the same failures
  • The error log shows hundreds of consecutive failures spanning 30+ minutes with no recovery

Reproduction

  1. Run gateway with Telegram channel enabled
  2. Trigger a brief network interruption (e.g., wifi drop, DNS timeout)
  3. Observe that the Telegram client never recovers, even though network connectivity is restored
  4. Gateway restarts (via SIGTERM or openclaw gateway restart) do not fix the issue — the new process inherits or immediately re-enters the broken state

Relevant log patterns

[telegram] sendChatAction failed: Network request for 'sendChatAction' failed!
[telegram] sendMessage failed: Network request for 'sendMessage' failed!
[telegram] final reply failed: HttpError: Network request for 'sendMessage' failed!
[telegram] Polling stall detected (no getUpdates for XXXs); forcing restart.
[telegram] Polling runner stop timed out after 15s; forcing restart cycle.
[diagnostic] lane wait exceeded: lane=main waitedMs=XXXXX queueAhead=0

Expected behavior

  • The Telegram client should reset its HTTP agent / socket pool after repeated failures
  • Successful recovery after network connectivity is restored
  • Gateway restart should guarantee a clean Telegram client state (no carryover from previous process)

Environment

  • macOS (Apple Silicon)
  • Node.js v25.6.1
  • OpenClaw gateway (LaunchAgent)

extent analysis

Fix Plan

To resolve the issue, we need to implement a retry mechanism with exponential backoff and reset the HTTP agent/socket pool after repeated failures. We'll also ensure a clean state after gateway restarts.

Step-by-Step Solution

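1. **Implement Exponential Backoff**:
   - Use a library like `async-retry` to handle retries with exponential backoff.
   - Wrap the single attempt in the retry helper. Retrying recursively from inside the `catch` block would nest retries within retries instead of bounding them.
   - Example:
     ```javascript
const asyncRetry = require('async-retry');

// Placeholder for the gateway's real single-attempt call to the Bot API.
async function sendChatActionOnce() {
  // Original sendChatAction code here
}

// Attempts are bounded at 1 + retries, with delays growing from 1s up to 30s.
async function sendChatAction() {
  return asyncRetry(() => sendChatActionOnce(), {
    retries: 5,
    factor: 2,
    minTimeout: 1000,
    maxTimeout: 30000,
  });
}
     ```

2. **Reset HTTP Agent/Socket Pool**:
   - After a specified number of retries, reset the HTTP agent/socket pool.
   - Example using `axios` (axios has no `Agent` class of its own; it accepts a Node.js `https.Agent` through the `httpsAgent` option):
     ```javascript
const https = require('node:https');
const axios = require('axios');

let httpsAgent = new https.Agent({ keepAlive: true });
let client = axios.create({ httpsAgent });

// After retries, destroy pooled sockets and build a fresh agent and client.
function resetAgent() {
  httpsAgent.destroy();
  httpsAgent = new https.Agent({ keepAlive: true });
  client = axios.create({ httpsAgent });
}
     ```

3. **Ensure Clean State after Restart**:
   - Use a process manager like `pm2` to manage the gateway process.
   - Configure `pm2` to restart the process with a clean environment.

4. **Code Changes**:
   - Update the Telegram client to use the new retry mechanism and HTTP agent reset.
   - Example (the threshold is illustrative; `resetAgent` is the function from step 2):
     ```javascript
const MAX_FAILURES = 3; // illustrative threshold
let failures = 0;

const telegramClient = {
  sendChatAction: async () => {
    try {
      // Send chat action code here
      failures = 0; // success clears the failure streak
    } catch (error) {
      // Reset the agent after repeated failures, then let the retry wrapper decide.
      if (++failures >= MAX_FAILURES) {
        resetAgent();
        failures = 0;
      }
      throw error;
    }
  },
};
     ```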

### Verification
To verify the fix, follow these steps:

1. Trigger a network interruption.
2. Observe the gateway logs to ensure the Telegram client recovers after the network connectivity is restored.
3. Restart the gateway and verify that the Telegram client starts with a clean state.

### Extra Tips
- Monitor the gateway logs to detect any further issues.
- Consider implementing a circuit breaker pattern to prevent cascading failures.
- Use a reliable process manager like `pm2` to manage the gateway process.
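
The circuit breaker tip can be sketched as below. This is a minimal illustration under stated assumptions, not part of the linked PR: the class name, option names, and default thresholds are hypothetical, and the `now` option exists only so the clock can be replaced in tests.

```javascript
// Minimal circuit breaker sketch; thresholds and names are illustrative.
class CircuitBreaker {
  constructor({ failureThreshold = 3, cooldownMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now; // injectable clock
    this.failures = 0; // consecutive failures
    this.openedAt = null; // timestamp when the circuit opened, or null if closed
  }

  async call(fn) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.cooldownMs) {
      // Open: fail fast instead of hitting the network in a tight loop.
      throw new Error('circuit open: skipping call');
    }
    // Closed, or cooldown elapsed (half-open): let the call through.
    try {
      const result = await fn();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures += 1;
      // At or past the threshold, (re)open and restart the cooldown.
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw error;
    }
  }
}
```

Wrapping a call such as sendChatAction in `breaker.call(...)` would turn the roughly 3-second retry loop from the report into fast local failures during the cooldown, with a real attempt only after the cooldown has passed.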
