openclaw - ✅(Solved) Fix 5.2 telegram polling never starts on high-RTT hosts (regression vs 4.29) [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#76388 · Fetched 2026-05-04 05:07:39
Comments: 1 · Participants: 2 · Timeline events: 14 · Reactions: 2
Timeline (top): cross-referenced ×4, mentioned ×4, subscribed ×4, closed ×1

On openclaw 2026.5.2, the Telegram polling loop never starts on geographically distant VPS hosts (high RTT to the Telegram Bot API). The provider logs "starting provider (@bot_name)" and the menu setup line, then goes silent for the rest of the gateway lifetime: no getUpdates is ever issued, no inbound messages are processed, and the bot is effectively dead. 2026.4.29 does not show the problem with an identical setup and an identical network.

Downgrading to 4.29 restores normal behavior, which confirms this is a 5.2-specific regression.

PR fix notes

PR #76735: fix(telegram): start polling after webhook cleanup timeout

Description (problem / solution / changelog)

Summary

  • let Telegram polling continue after recoverable deleteWebhook startup failures instead of adding another pre-poll Bot API confirmation call
  • reuse the successful startup getMe probe as grammY botInfo, avoiding grammY runner's implicit second getMe before the first getUpdates
  • keep the shared bot identity shape in a leaf contract to preserve Telegram import topology
  • update Telegram troubleshooting docs and regression coverage for both startup paths

Refs #76388

Research

  • Telegram Bot API treats webhooks and getUpdates as mutually exclusive; if a webhook is still active, getUpdates is the authoritative conflict signal.
  • grammY runner calls bot.init() before polling, and grammY Bot#init() calls getMe() unless botInfo is supplied.
  • Existing OpenClaw polling already rebuilds the transport and retries webhook cleanup on Telegram getUpdates conflict, so the safer startup path is to remove redundant control-plane calls before polling.
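
The second research point can be illustrated with a small model. This is a sketch only: `MiniBot` and `createCountingApi` are hypothetical stand-ins written for this example, not grammY's or OpenClaw's code. It mirrors the behaviour described above, where `init()` issues `getMe` only when no `botInfo` was supplied, and counts how many `getMe` calls each startup path makes.

```javascript
// Simplified model of a bot client; NOT grammY's actual implementation.
class MiniBot {
  constructor(api, config = {}) {
    this.api = api;
    this.botInfo = config.botInfo; // optional, pre-fetched identity
  }

  // Only call getMe when the identity is still unknown.
  async init() {
    if (this.botInfo === undefined) {
      this.botInfo = await this.api.getMe();
    }
    return this.botInfo;
  }
}

// Hypothetical API stub that counts control-plane calls.
function createCountingApi() {
  const api = {
    getMeCalls: 0,
    async getMe() {
      api.getMeCalls += 1;
      return { id: 1, is_bot: true, username: "example_bot" };
    },
  };
  return api;
}

// Startup probe result is passed on as botInfo: one getMe in total.
async function startWithProbeReuse() {
  const api = createCountingApi();
  const probed = await api.getMe();
  const bot = new MiniBot(api, { botInfo: probed });
  await bot.init(); // no second getMe
  return api.getMeCalls;
}

// Startup probe result is discarded: init() issues a second getMe.
async function startWithoutReuse() {
  const api = createCountingApi();
  await api.getMe();
  const bot = new MiniBot(api);
  await bot.init(); // implicit second getMe
  return api.getMeCalls;
}
```

On a high-RTT host, each avoided control-plane call is one fewer request that can hit the fetch timeout before the first getUpdates.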

Tests

  • pnpm check:architecture
  • pnpm tsgo:extensions && pnpm tsgo:extensions:test
  • OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test extensions/telegram/src/polling-session.test.ts extensions/telegram/src/monitor.test.ts extensions/telegram/src/channel.gateway.test.ts extensions/telegram/src/bot.create-telegram-bot.test.ts extensions/telegram/src/probe.test.ts extensions/telegram/src/request-timeouts.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md docs/channels/telegram.md extensions/telegram/src/probe.ts extensions/telegram/src/probe.test.ts extensions/telegram/src/monitor.types.ts extensions/telegram/src/bot.types.ts extensions/telegram/src/channel.ts extensions/telegram/src/polling-session.ts extensions/telegram/src/monitor.ts extensions/telegram/src/bot-core.ts extensions/telegram/src/bot.create-telegram-bot.test-harness.ts extensions/telegram/src/bot.create-telegram-bot.test.ts extensions/telegram/src/channel.gateway.test.ts extensions/telegram/src/polling-session.test.ts extensions/telegram/src/monitor.test.ts

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/channels/telegram.md (modified, +2/-1)
  • extensions/telegram/src/bot-core.ts (modified, +5/-1)
  • extensions/telegram/src/bot-info.ts (added, +16/-0)
  • extensions/telegram/src/bot.create-telegram-bot.test-harness.ts (modified, +5/-3)
  • extensions/telegram/src/bot.create-telegram-bot.test.ts (modified, +26/-0)
  • extensions/telegram/src/bot.types.ts (modified, +3/-0)
  • extensions/telegram/src/channel.gateway.test.ts (modified, +39/-0)
  • extensions/telegram/src/channel.ts (modified, +4/-0)
  • extensions/telegram/src/monitor.test.ts (modified, +8/-23)
  • extensions/telegram/src/monitor.ts (modified, +1/-0)
  • extensions/telegram/src/monitor.types.ts (modified, +2/-0)
  • extensions/telegram/src/polling-session.test.ts (modified, +24/-0)
  • extensions/telegram/src/polling-session.ts (modified, +4/-26)
  • extensions/telegram/src/probe.test.ts (modified, +22/-1)
  • extensions/telegram/src/probe.ts (modified, +63/-19)


Symptom (verbatim from production logs)

Within seconds of gateway ready on 2026.5.2:

09:57:05.748  [telegram] [default] starting provider (@<bot_name>)
09:57:06.267  [telegram] menu text exceeded the conservative 5700-character payload budget; shortening descriptions to keep 98 commands visible.
09:57:06.291  [fetch-timeout] fetch timeout reached; aborting operation
              operation=fetchWithTimeout url=https://api.telegram.org/bot…/getMe
              timeoutMs=10000 elapsedMs=10236
(then: silence — no more telegram-related log lines for the rest of the gateway lifetime)

Three full systemctl --user restart openclaw-gateway.service cycles produce identical results. getUpdates is never called and the offset file ~/.openclaw/telegram/update-offset-default.json does not advance.

By contrast, on 2026.4.29 with the exact same config, network, and bot token, the same point in the log is followed within 1–2 seconds by getUpdates polling activity and the bot replies normally. We have been running 2026.4.29 (and earlier 4.x releases) on this same VPS for months without this failure mode.

Setup

  • openclaw 2026.5.2 (a448042 / npm latest)
  • Linode VPS in Singapore region; ~156ms RTT to api.telegram.org (Amsterdam DC)
  • curl https://api.telegram.org/bot…/getMe from the same host returns HTTP 200 in ~500ms consistently — the network is fine, just not zero-RTT
  • 14 plugins enabled: active-memory, anthropic, brave, browser, deepseek, discord, google, lossless-claw, memory-core, openai, openrouter, telegram, voice-call, zai
  • Externalized 5.2 plugins installed and status=loaded (brave, discord, voice-call)
  • channels.telegram.enabled: true, single account, ~98 commands registered (per the menu-shortening log)
  • Telegram Bot API token confirmed valid (the starting provider (@jack_lee_bot) line proves it authenticated)

Source code analysis

Reviewing dist/monitor-polling.runtime-CDmJk4Z7.js (TelegramPollingSession.runUntilAbort, lines 244–258) and the supporting code:

async runUntilAbort() {
    while (!this.opts.abortSignal?.aborted) {
        const bot = await this.#createPollingBot();        // calls createTelegramBot
        if (!bot) continue;
        const cleanupState = await this.#ensureWebhookCleanup(bot);  // bot.api.deleteWebhook, 10s timeout
        if (cleanupState === "retry") continue;
        if (cleanupState === "exit") return;
        if (await this.#runPollingCycle(bot) === "exit") return;
    }
}

The startup sequence on 5.2 is:

  1. createTelegramBot(...) — internally fires syncTelegramMenuCommands (the "menu shortening" log line) plus implicit getMe for bot identity validation.
  2. #ensureWebhookCleanup(bot) → bot.api.deleteWebhook(...) with a 10s fetch-timeout.
  3. #runPollingCycle(bot) → run(bot, runnerOptions) from @grammyjs/runner, which begins long-polling getUpdates.

If steps 1 or 2 throw a recoverable network error (per isRecoverableTelegramNetworkError in send-7IxqIgtx.js), #waitBeforeRetryOnRecoverableSetupError is invoked, which:

  • increments #restartAttempts,
  • computes a backoff delay via TELEGRAM_POLL_RESTART_POLICY = { initialMs: 2000, maxMs: 30000, factor: 1.8 },
  • logs once with the delay,
  • await sleepWithAbort(delayMs),
  • returns true/false to retry or exit.
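
To make the expected retry timing concrete, here is a minimal sketch of the delay computation. The policy values are the ones quoted above; the formula itself (exponential growth from `initialMs`, capped at `maxMs`, no jitter) is an assumption, and `computeBackoffMs` and `backoffSchedule` are hypothetical helper names, not OpenClaw functions.

```javascript
// Policy values quoted in the issue.
const TELEGRAM_POLL_RESTART_POLICY = { initialMs: 2000, maxMs: 30000, factor: 1.8 };

// Assumed formula: initialMs * factor^(attempt - 1), capped at maxMs, no jitter.
function computeBackoffMs(policy, attempt) {
  const exponent = Math.max(0, attempt - 1);
  const raw = policy.initialMs * Math.pow(policy.factor, exponent);
  return Math.min(policy.maxMs, Math.round(raw));
}

// Delays for attempts 1..attempts, in milliseconds.
function backoffSchedule(policy, attempts) {
  return Array.from({ length: attempts }, (_, i) => computeBackoffMs(policy, i + 1));
}

console.log(backoffSchedule(TELEGRAM_POLL_RESTART_POLICY, 6));
```

Under that assumption the first six delays are 2000, 3600, 6480, 11664, 20995 and 30000 ms, so a working retry path should produce its first log line within a few seconds of the timeout.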

The fetch-timeout we see at 09:57:06 is exactly this path firing. But the retry log line we'd expect (Telegram setup network error: …; retrying in …) does not appear in our logs, which suggests one of:

(a) The error is being raised inside syncTelegramMenuCommands (which is fire-and-forget, not awaited — line 649 of bot-CW5ZEQ0V.js), so the rejection is uncaught and silently swallowed by the event loop, leaving createTelegramBot to return a bot whose internal API client has already been put into a degraded state.

(b) The retry log line uses a log level that runtime.log is dropping in this user's setup (we see other runtime.log?.(…) calls successfully).

(c) Backoff/retry is not actually firing, and the polling session is wedged inside createTelegramBot waiting on a never-resolving promise from the failed inline menu sync.
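
Hypothesis (a) depends on how a non-awaited promise interacts with try/catch. The sketch below uses hypothetical stand-ins (`failingMenuSync`, `createBotFireAndForget`, `createBotAwaited`), not the real OpenClaw functions, to show that a surrounding try/catch never sees a rejection from a call that is not awaited, so the bot is returned before the failure is observed.

```javascript
// Hypothetical stand-in for a menu sync that fails with a network timeout.
async function failingMenuSync() {
  throw new Error("fetch timeout");
}

// Variant 1: not awaited. The try/catch cannot observe the rejection; only the
// attached .catch handler does, and it runs after the bot has been returned.
// Without that handler the rejection would be unhandled.
function createBotFireAndForget(sync, events) {
  try {
    sync().catch((err) => events.push(`late: ${err.message}`));
  } catch (err) {
    events.push(`caught: ${err.message}`); // never reached for async failures
  }
  events.push("bot returned");
  return { ready: true };
}

// Variant 2: awaited. The failure surfaces before the bot is returned,
// so the caller's retry logic has a chance to run.
async function createBotAwaited(sync, events) {
  try {
    await sync();
  } catch (err) {
    events.push(`caught: ${err.message}`);
  }
  events.push("bot returned");
  return { ready: true };
}
```

In the first variant the event order is "bot returned" followed by the late failure; in the second the failure is caught first. Whether the real code path matches variant 1 is exactly what hypothesis (a) asks.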

Whichever of these holds, the new 2026.5.2 startup sequence calls more sequential Telegram REST endpoints before entering the long-poll loop than 2026.4.29 did (getMe + setMyCommands + deleteWebhook + identity validation, in series). On a host where each call has a 156ms RTT and is bounded by a 10s fetch-timeout, the probability of at least one call hitting the timeout in any given startup attempt is non-trivial. Once a timeout fires, the recovery path appears to be either silent or not reachable at all.
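
The effect of adding sequential calls can be put in rough numbers. The sketch below is an illustrative model, not a measurement: it assumes each call times out independently with the same probability, and `startupTimeoutProbability` is a hypothetical helper. Any per-call probability plugged in is an assumption, since the issue does not report one.

```javascript
// Probability that at least one of `calls` independent requests times out,
// given a per-call timeout probability `perCallP` (illustrative model only).
function startupTimeoutProbability(perCallP, calls) {
  return 1 - Math.pow(1 - perCallP, calls);
}

// Hypothetical per-call probability of 2%, compared for 1 and 4 serial calls.
console.log(startupTimeoutProbability(0.02, 1).toFixed(4));
console.log(startupTimeoutProbability(0.02, 4).toFixed(4));
```

With a hypothetical 2% per-call timeout rate, four serial calls fail at least once in roughly 7.8% of startups, compared with 2% for a single call. The identical outcome across three restarts reported above suggests the real failure is more deterministic than this model, so it only shows why a longer serial startup is more exposed.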

Why 4.29 doesn't hit this

The 2026.4.29 telegram channel startup sequence is shorter (no menu sync wait, no separate webhook validation step) and bot.api.deleteWebhook is called inside the polling loop iteration rather than as a precondition gate. Once getUpdates enters the long-poll phase, transient network blips during polling do not block startup.

Suggested fix direction

  1. Make syncTelegramMenuCommands properly awaited or wrap it in a try/catch with explicit log+recovery, so a slow setMyCommands call cannot leave createTelegramBot returning a half-initialized bot.
  2. Move deleteWebhook out of the gating precondition and into the first poll cycle, so a slow deleteWebhook doesn't block polling startup indefinitely.
  3. Surface the retry log line at a higher log level (or always-on) so operators can tell the difference between "polling cleanly silent" vs "stuck in retry storm".
  4. Add a channels.telegram.startupTimeoutMs config knob that bypasses the menu/webhook precondition entirely on timeout and starts polling unconditionally.
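
Suggestions 2 and 4 can be combined into one pattern: treat the pre-poll steps as best effort and always proceed to polling. The following is a minimal sketch under stated assumptions. `settleWithin` and `startPolling` are hypothetical names, the injected `syncMenu`, `deleteWebhook` and `runPolling` functions stand in for the real calls, and `startupTimeoutMs` is applied per step here, which may differ from how a real config knob would behave.

```javascript
// Resolve to { status: "ok" | "error" | "timeout" } and never reject,
// so the caller can always move on to polling.
function settleWithin(promise, timeoutMs) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ status: "timeout" }), timeoutMs);
  });
  const guarded = promise.then(
    (value) => ({ status: "ok", value }),
    (error) => ({ status: "error", error })
  );
  return Promise.race([guarded, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical startup: preconditions are best effort, polling always starts.
async function startPolling({ syncMenu, deleteWebhook, runPolling, startupTimeoutMs, log }) {
  const steps = [
    ["menu sync", syncMenu],
    ["webhook cleanup", deleteWebhook],
  ];
  for (const [name, step] of steps) {
    // Promise.resolve().then(step) also captures synchronous throws.
    const result = await settleWithin(Promise.resolve().then(step), startupTimeoutMs);
    if (result.status !== "ok") {
      // Always-on log line, in the spirit of suggestion 3.
      log(`${name} ${result.status}; continuing to polling`);
    }
  }
  return runPolling();
}
```

Skipping a failed webhook cleanup is tolerable in this sketch because, as the PR research notes state, getUpdates itself is the authoritative conflict signal if a webhook is still active.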

Of these, (3) is the lowest-risk change and would have made this issue trivially diagnosable. (2) is the structural fix.

Environment

  • openclaw: 2026.5.2 (a448042)
  • OS: Linux (Linode VPS, Singapore)
  • Node: bundled with 2026.5.2 (v22.22.0)
  • Channels: telegram (active), discord (token unresolved separately, unrelated)
  • Network: Singapore → api.telegram.org (Amsterdam DC) RTT ~156ms baseline; raw curl getMe consistently ~500ms HTTP 200

Happy to provide stability-bundle JSON, full prep-stage trace, or apply test patches against a fix branch.

extent analysis

TL;DR

The issue can be resolved by making syncTelegramMenuCommands properly awaited or wrapped in a try/catch with explicit log and recovery, and moving deleteWebhook out of the gating precondition.

Guidance

  • Review the createTelegramBot function to ensure syncTelegramMenuCommands is properly awaited or handled to prevent silent failures.
  • Consider moving deleteWebhook into the first poll cycle to prevent it from blocking polling startup indefinitely.
  • Increase the log level for retry attempts to facilitate diagnosis.
  • Introduce a channels.telegram.startupTimeoutMs config option to bypass menu and webhook preconditions on timeout.

Example

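async function createTelegramBot() {
  try {
    // Await the menu sync so a failure surfaces here instead of being lost.
    await syncTelegramMenuCommands();
    // ... remaining bot setup
  } catch (error) {
    // Log the failure explicitly so operators can see it, then recover.
    console.warn("Telegram menu sync failed:", error);
  }
}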

Notes

The provided analysis suggests that the issue is specific to the 2026.5.2 version and is related to the changed startup sequence. The suggested fixes aim to address the potential causes, but further testing and verification are necessary to confirm their effectiveness.

Recommendation

Apply the suggested fix direction, starting with making syncTelegramMenuCommands properly awaited or wrapped in a try/catch, and moving deleteWebhook out of the gating precondition. This should help resolve the issue and improve the overall stability of the Telegram polling loop.
