openclaw - ✅(Solved) Fix Telegram polling silently wedges after stall — transport rebuild never starts new polling cycle (5.4 + 5.5) [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#78473 · Fetched 2026-05-07 03:36:32
Comments: 1 · Participants: 2 · Timeline: 2 · Reactions: 2
Timeline (top): commented ×1, cross-referenced ×1

Error Message

After the transport-rebuild log line, the gateway goes silent: no new polling cycle starts and no error is logged. #runPollingCycle() either never re-enters or hangs in a state that doesn't surface diagnostics.

Fix Action

Workaround

Wait for self-recovery, or run openclaw update --tag <new-version> to replace the npm package and force a fresh load of the JS files.

PR fix notes

PR #78646: fix(telegram): keep polling watchdog on getUpdates liveness

Description (problem / solution / changelog)

Summary

  • Problem: Telegram polling stall recovery treated unrelated outbound Bot API activity as liveness for inbound getUpdates polling.
  • Why it matters: active sendMessage traffic could mask a wedged inbound polling loop, leaving Telegram replies silent until a manual restart.
  • What changed: make the stall watchdog depend on completed/stuck getUpdates liveness only, while keeping unrelated API elapsed time in diagnostics.
  • What did NOT change (scope boundary): this does not redesign Telegram transport rebuild behavior beyond ensuring the watchdog fires when inbound polling is stale.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #78422
  • Related #78473
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: TelegramPollingLivenessTracker.detectStall() returned no stall when either getUpdates elapsed time or generic Bot API elapsed time was still within the threshold.
  • Missing detection / guardrail: tests covered stale polling and stale unrelated API calls, but not the case where stale getUpdates coincides with recent or in-flight non-polling API traffic.
  • Contributing context (if known): outbound Telegram API success proves the Bot API path is alive, but it does not prove inbound long-polling is still progressing.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/telegram/src/polling-liveness.test.ts, extensions/telegram/src/polling-session.test.ts
  • Scenario the test should lock in: stale getUpdates still triggers watchdog restart even when sendMessage recently succeeded or a non-getUpdates API call is in flight.
  • Why this is the smallest reliable guardrail: the regression is in the polling liveness decision and session watchdog behavior, so targeted tracker/session tests catch it without live Telegram credentials.
  • Existing test that already covers this (if any): existing stale polling tests covered the baseline restart path but not unrelated API masking.
  • If no new test is added, why not: N/A
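The scenario this plan locks in can be sketched with a small tracker driven by a fake clock. This is a minimal sketch under stated assumptions, not the code in polling-liveness.ts: the class name LivenessSketch, every method name except noteApiCallSuccess, and the exact message format are hypothetical.

```typescript
// Hypothetical stand-in for the liveness tracker; names and shapes are assumptions.
class LivenessSketch {
  private readonly now: () => number;
  private pollStartedAt: number | null = null; // start of the in-flight getUpdates, if any
  private pollFinishedAt: number; // last completed getUpdates
  private apiSuccessAt: number; // last success of any Bot API call

  constructor(now: () => number) {
    this.now = now;
    this.pollFinishedAt = now();
    this.apiSuccessAt = now();
  }

  noteGetUpdatesStarted(): void {
    this.pollStartedAt = this.now();
  }

  noteGetUpdatesFinished(): void {
    this.pollStartedAt = null;
    this.pollFinishedAt = this.now();
  }

  noteApiCallSuccess(): void {
    this.apiSuccessAt = this.now();
  }

  // Only getUpdates liveness decides; apiElapsedMs is reported but never consulted.
  detectStall(thresholdMs: number): string | null {
    const t = this.now();
    const inFlight = this.pollStartedAt !== null;
    const elapsed = t - (this.pollStartedAt ?? this.pollFinishedAt);
    if (elapsed <= thresholdMs) return null;
    const what = inFlight ? "active getUpdates stuck" : "no completed getUpdates";
    const seconds = Math.round(elapsed / 1000);
    return `Polling stall detected (${what} for ${seconds}s) [diag apiElapsedMs=${t - this.apiSuccessAt}]`;
  }
}

// Scenario: getUpdates hangs from t=0 while an unrelated outbound call succeeds.
let clock = 0;
const tracker = new LivenessSketch(() => clock);
tracker.noteGetUpdatesStarted();
clock = 130_000;
tracker.noteApiCallSuccess(); // e.g. a sendMessage success just now
const verdict = tracker.detectStall(120_000);
console.log(verdict);
```

Injecting the clock keeps the check deterministic and avoids real timers or Telegram credentials, which is the property the plan asks for.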

User-visible / Behavior Changes

Telegram polling recovery now restarts stale inbound polling even if unrelated outbound Telegram API calls are active or recently succeeded.

Diagram (if applicable)

Before:
stale getUpdates + recent sendMessage -> watchdog suppressed -> inbound polling stays wedged

After:
stale getUpdates + recent sendMessage -> watchdog restart -> polling cycle recovers

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Ubuntu 24.04.4 LTS
  • Runtime/container: Node 22 / pnpm
  • Model/provider: N/A
  • Integration/channel (if any): Telegram plugin polling watchdog
  • Relevant config (redacted): targeted regression tests do not require a live token; live proof used TELEGRAM_BOT_TOKEN env fallback from a local redacted token file

Steps

  1. Create a stale getUpdates liveness state.
  2. Record unrelated Telegram API activity such as sendMessage success or an in-flight non-getUpdates API call.
  3. Fire the polling stall watchdog.

Expected

  • Watchdog reports a polling stall and restarts the polling cycle.

Actual

  • Before this fix, recent unrelated API activity suppressed the watchdog.
  • After this fix, stale getUpdates liveness controls the watchdog and restart proceeds.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Validation on the rebased branch:

pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/telegram/src/polling-liveness.ts extensions/telegram/src/polling-liveness.test.ts extensions/telegram/src/polling-session.test.ts
All matched files use the correct format.

pnpm test extensions/telegram/src/polling-liveness.test.ts extensions/telegram/src/polling-session.test.ts -- --reporter=verbose
Test Files 2 passed (2)
Tests 23 passed (23)

Real behavior proof

  • Behavior or issue addressed: Telegram polling watchdog recovery should fire from stale getUpdates liveness even when unrelated outbound Bot API calls are active.
  • Real environment tested: Ubuntu 24.04.4 LTS, PR branch fix/telegram-polling-watchdog-getupdates, commit e301533582, Node v22.22.1, pnpm 10.33.2, real Telegram Bot API token from a local redacted token file, and a private DM chat with the bot.
  • Exact steps or command run after this patch: Called real Telegram Bot API getMe, read a recent private DM via getUpdates, sent a disabled-notification proof message with sendMessage, exercised the PR liveness code to verify stale getUpdates returns STALL after outbound Telegram activity, then ran an isolated source-mode Gateway on port 19986 across the watchdog window with TELEGRAM_BOT_TOKEN supplied via env fallback.
  • Evidence after fix: Copied live output from Ubuntu 24.04.4 LTS, token omitted:
telegram_getMe=ok botId=8656041674 username=set
telegram_recent_chat=found chatType=private updateId=198414331
telegram_sendMessage=ok source=updates chatId=6599824666 messageId=70
live_sendMessage_stale_getUpdates=STALL
live_liveness_message=Polling stall detected (active getUpdates stuck for 120s); forcing restart. [diag inFlight=1 outcome=started startedAt=0 finishedAt=n/a durationMs=n/a offset=123 apiElapsedMs=60001]

Telegram client also showed the real round trip:

[5/6/2026 3:30 PM] Crazy Cat: test
[5/6/2026 3:30 PM] Orinclaw Assistant: OpenClaw PR #78646 live watchdog proof 2026-05-06T22:30:53.827Z

Additional isolated Gateway live proof after the same patch:

branch=fix/telegram-polling-watchdog-getupdates
commit=e301533582
os=Ubuntu 24.04.4 LTS
mode=isolated source-mode Gateway, port 19986, real Telegram bot token from env fallback
telegram_provider_start=[default] starting provider (@orinclaw_ai_bot)
inbound_updates=real pending Telegram DM updates consumed by Gateway poller
window=2026-05-06T22:51:45+00:00..2026-05-06T22:55:47+00:00
samples=5
health=live on every sample
ready=true failing=[] on every sample
polling_stall_count=0
getupdates_conflict_count=0
telegram_provider_start_count=1
final_health={"ok":true,"status":"live"}
final_ready={"ready":true,"failing":[]}
shutdown=clean SIGINT after validation

Before-fix long-lived reproduction on parent commit d05415d603:

scenario=active getUpdates started at t=0, unrelated non-getUpdates API success every 30s, watchdog threshold=120000ms
sample_0 t=0s result=NO_STALL
sample_1 t=30s result=NO_STALL
sample_2 t=60s result=NO_STALL
sample_3 t=90s result=NO_STALL
sample_4 t=120s result=NO_STALL
sample_135s t=135s result=NO_STALL
final_expected=STALL
final_actual=NO_STALL
reproduced_bug=stale getUpdates exceeded threshold but watchdog stayed suppressed by unrelated API liveness
  • Observed result after fix: The bot successfully handled real getMe, getUpdates, and sendMessage; the watchdog returned STALL after real outbound Telegram activity; the isolated Gateway stayed live/ready across the watchdog window with one Telegram provider start, zero false Polling stall detected logs, and zero getUpdates conflict logs.
  • What was not tested: I did not run a multi-hour production soak or model-response verification in the isolated test home. The isolated home intentionally had no OpenAI auth, so agent replies failed after Telegram polling consumed inbound DM updates; that auth failure is separate from Telegram polling liveness.
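The before-fix reproduction above can be replayed offline. The sketch below assumes the pre-fix rule is the || condition quoted in this report and that the post-fix rule compares only the getUpdates clock against the threshold. The function names and the exact post-fix comparison are assumptions, not the shipped code.

```typescript
type Verdict = "STALL" | "NO_STALL";

const THRESHOLD_MS = 120_000; // watchdog threshold from the reproduction
const API_SUCCESS_EVERY_MS = 30_000; // cadence of unrelated non-getUpdates successes

// Pre-fix rule: either clock being within the threshold suppresses the stall.
function ruleBeforeFix(elapsed: number, apiElapsed: number): Verdict {
  return elapsed <= THRESHOLD_MS || apiElapsed <= THRESHOLD_MS ? "NO_STALL" : "STALL";
}

// Post-fix rule (assumed form): only the getUpdates clock is consulted.
function ruleAfterFix(elapsed: number): Verdict {
  return elapsed <= THRESHOLD_MS ? "NO_STALL" : "STALL";
}

function sampleAt(tSec: number): { t: number; before: Verdict; after: Verdict } {
  const tMs = tSec * 1000;
  const elapsed = tMs; // getUpdates started at t=0 and never completed
  const apiElapsed = tMs % API_SUCCESS_EVERY_MS; // time since the latest 30s success
  return { t: tSec, before: ruleBeforeFix(elapsed, apiElapsed), after: ruleAfterFix(elapsed) };
}

// Same sample points as the reproduction log.
const timeline = [0, 30, 60, 90, 120, 135].map(sampleAt);
for (const row of timeline) {
  console.log(`t=${row.t}s before=${row.before} after=${row.after}`);
}
```

Under these assumptions the pre-fix column stays NO_STALL at every sample, including t=135s, while the post-fix column switches to STALL once the getUpdates clock passes the threshold.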

Human Verification (required)

  • Verified scenarios: stale getUpdates with recent non-polling API success, stale getUpdates with recent in-flight non-polling API activity, stale getUpdates with newer in-flight non-polling activity, existing stale polling restart paths, pre-fix long-lived suppression reproduction, real Telegram Bot API getMe/getUpdates/sendMessage, and an isolated live Gateway Telegram polling run across the watchdog window.
  • Edge cases checked: diagnostic output keeps apiElapsedMs for debugging while not using generic API liveness to suppress stale polling recovery.
  • What was not tested: I did not run a multi-hour production soak or model-response verification in the isolated test home; live Telegram polling startup, real update consumption, and watchdog-window stability were verified with a real token.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: Telegram polling may restart while outbound API traffic is healthy.
    • Mitigation: this is intentional; outbound API health is not inbound getUpdates health, and the watchdog threshold/throttling still bounds restarts.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/telegram/src/polling-liveness.test.ts (modified, +10/-6)
  • extensions/telegram/src/polling-liveness.ts (modified, +31/-20)
  • extensions/telegram/src/polling-session.test.ts (modified, +19/-22)

Code Example

if (elapsed <= params.thresholdMs || apiElapsed <= params.thresholdMs) return null;

---

[telegram] Polling stall detected (no completed getUpdates for 149.99s); forcing restart.
[telegram] Polling runner stop timed out after 15s; forcing restart cycle.
[telegram][diag] polling cycle finished reason=polling stall detected
[telegram] Telegram polling runner stopped (...); restarting in 2.22s.
[telegram][diag] rebuilding transport for next polling cycle
Raw issue body

Two related bugs in dist/monitor-polling.runtime-*.js, both reproduced in 2026.5.4 and 2026.5.5.

Symptom

  • Gateway running, telegram channel reports running, connected, mode:polling, works via openclaw channels status --probe
  • Zero TCP connections from the gateway PID to 149.154.x or 91.108.x (Telegram backbone)
  • pending_update_count > 0 on the Telegram side, growing over time
  • No getUpdates / polling log entries for hours
  • Outbound sendMessage works fine (state-drift: gateway reports healthy while inbound is dead)
  • Multiple gateway restarts (systemctl --user restart openclaw-gateway) re-enter the same wedged state
  • Self-recovery eventually (~75 min in one case, indeterminate in another) — mechanism unclear; possibly when the npm package is replaced (e.g. openclaw update)
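One way to confirm the growing pending_update_count symptom from outside the gateway is to call the Telegram Bot API method getWebhookInfo twice and compare the counts. The sketch below is an illustration only: the helper names pendingUpdateCount and backlogIsGrowing and the FetchLike type are hypothetical, and the token must be supplied by the caller.

```typescript
// Minimal fetch-like signature so the sketch does not depend on a specific runtime's types.
type FetchLike = (url: string) => Promise<{ json(): Promise<unknown> }>;

// Extracts pending_update_count from a getWebhookInfo response body.
function pendingUpdateCount(body: unknown): number {
  const b = body as { ok?: boolean; result?: { pending_update_count?: number } } | null;
  if (b?.ok !== true || typeof b.result?.pending_update_count !== "number") {
    throw new Error("unexpected getWebhookInfo response");
  }
  return b.result.pending_update_count;
}

// Hypothetical helper: resolves to true when the backlog grew between two probes.
async function backlogIsGrowing(
  fetchFn: FetchLike,
  token: string,
  waitMs: number,
): Promise<boolean> {
  const url = `https://api.telegram.org/bot${token}/getWebhookInfo`;
  const first = pendingUpdateCount(await (await fetchFn(url)).json());
  await new Promise<void>((resolve) => setTimeout(resolve, waitMs));
  const second = pendingUpdateCount(await (await fetchFn(url)).json());
  return second > first;
}
```

On a runtime with a global fetch, a call such as backlogIsGrowing(fetch, token, 60_000) would work. A growing count only means something if messages are actually being sent to the bot during the wait.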

Bug 1 — masked stall detection

File: dist/monitor-polling.runtime-DjS2STzm.js (5.4) / monitor-polling.runtime-DBv9gGnS.js (5.5)

Line 84:

if (elapsed <= params.thresholdMs || apiElapsed <= params.thresholdMs) return null;

apiElapsed is updated by noteApiCallSuccess() on ANY successful API call (including outbound sendMessage). As a result, stall detection is suppressed during normal outbound activity, even when getUpdates has hung indefinitely. The condition should likely use &&, or simply be if (elapsed <= params.thresholdMs) return null; so that polling elapsed time alone determines a polling stall.
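To compare the shipped condition with the two alternatives named above, the following sketch evaluates all three over the four combinations of fresh and stale clocks. The threshold and sample values are illustrative, and the function names are hypothetical.

```typescript
const T_MS = 120_000; // illustrative threshold

// Each function returns true when a stall would be reported.

// Shipped condition: if (elapsed <= T || apiElapsed <= T) return null;
function stallWithOr(elapsed: number, apiElapsed: number): boolean {
  return !(elapsed <= T_MS || apiElapsed <= T_MS);
}

// Alternative 1: replace || with &&.
function stallWithAnd(elapsed: number, apiElapsed: number): boolean {
  return !(elapsed <= T_MS && apiElapsed <= T_MS);
}

// Alternative 2: consult the polling clock only.
function stallPollingOnly(elapsed: number): boolean {
  return !(elapsed <= T_MS);
}

const FRESH = 5_000;
const STALE = 150_000;

const comparison = [
  { name: "both fresh", elapsed: FRESH, apiElapsed: FRESH },
  { name: "poll stale, api fresh", elapsed: STALE, apiElapsed: FRESH }, // the wedge in this issue
  { name: "poll fresh, api stale", elapsed: FRESH, apiElapsed: STALE },
  { name: "both stale", elapsed: STALE, apiElapsed: STALE },
].map((c) => ({
  name: c.name,
  or: stallWithOr(c.elapsed, c.apiElapsed),
  and: stallWithAnd(c.elapsed, c.apiElapsed),
  pollingOnly: stallPollingOnly(c.elapsed),
}));

console.table(comparison);
```

Both alternatives report the wedged case. They differ only when polling is fresh but the generic API clock is stale, where the && variant would still report a stall. That case may be unreachable if completed getUpdates calls also refresh apiElapsed, which this report does not state.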

Bug 2 — transport-rebuild silent failure

When stall IS detected (e.g. before any outbound activity occurs), the recovery sequence logs:

[telegram] Polling stall detected (no completed getUpdates for 149.99s); forcing restart.
[telegram] Polling runner stop timed out after 15s; forcing restart cycle.
[telegram][diag] polling cycle finished reason=polling stall detected
[telegram] Telegram polling runner stopped (...); restarting in 2.22s.
[telegram][diag] rebuilding transport for next polling cycle

…then silence. No new polling cycle starts, no error logged. #runPollingCycle() either never re-enters or hangs in a state that doesn't surface diagnostics.

Cost / impact

Inbound messaging is completely down for 1–3 hours per occurrence. There were two occurrences in a single day, 2026-05-06.

Trigger

Both occurrences followed an external disruption (network blip from Docker WSL toggle reset; auth-profile failure from Anthropic billing exhaustion). The disruption is recoverable in itself; the polling-restart code path doesn't survive it.

Workaround

Wait for self-recovery, or run openclaw update --tag <new-version> to replace the npm package and force a fresh load of the JS files.

Suggested fix

  1. Drop the apiElapsed check in detectStall — or use && — so stall-detection isn't masked by outbound activity.
  2. Add error/timeout handling in the transport-rebuild path so silent failures surface as logs.
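Suggested fix 2 could take the shape of a timeout wrapper around the rebuild step, so that a hang or rejection becomes a log line rather than silence. This is a minimal sketch under stated assumptions: withTimeout, rebuildWithDiagnostics, the rebuildTransport callback, and the log text are all hypothetical and do not reflect the gateway's actual internals.

```typescript
// Rejects if `work` does not settle within `ms`, so a hung step becomes a visible error.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Hypothetical wrapper: `rebuildTransport` stands in for the real rebuild step.
// Returns true on success; on failure or hang it logs and returns false.
async function rebuildWithDiagnostics(
  rebuildTransport: () => Promise<void>,
  log: (line: string) => void,
  timeoutMs: number,
): Promise<boolean> {
  try {
    await withTimeout(rebuildTransport(), timeoutMs, "transport rebuild");
    return true;
  } catch (err) {
    // Illustrative log text only, not an existing gateway log line.
    const reason = err instanceof Error ? err.message : String(err);
    log(`transport rebuild failed: ${reason}`);
    return false;
  }
}
```

A caller could use the boolean to decide whether to schedule another restart attempt. Note that Promise.race does not cancel the hung rebuild itself. It only stops waiting for it, so real code would probably also need to tear down the stuck transport.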

Versions affected

  • 2026.5.4 and 2026.5.5 (both reproduced)

Environment

  • Node v24.13.0 (nvm), Ubuntu (WSL2 on Windows 11)
  • Gateway managed by systemd-user
