openclaw - ✅ (Solved) Fix: Telegram long polling can leave getUpdates stuck far longer than expected [1 pull request, 1 participant]

openclaw/openclaw#54992 (fetched 2026-04-08 01:33:54)

Telegram long polling can get stuck in a way that is only partially visible from current logs.

In a real deployment, Telegram health probes remained healthy, but OpenClaw repeatedly logged polling stalls and forced restarts. After instrumenting the polling session, the observed pattern was:

  • an in-flight getUpdates request remained stuck for much longer than expected
  • the watchdog detected the stall and forced a restart
  • the polling cycle eventually ended with Network request for 'getUpdates' failed!
  • bot.stop() / runner shutdown paths could add extra noise during recovery

Error Message

[telegram][diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error ... error=Network request for 'getUpdates' failed!

Root Cause

Suspected root cause area

PR fix notes

PR #55014: telegram: rebuild transport after stalled polling cycles

Description (problem / solution / changelog)

Summary

This draft PR builds on the Telegram polling investigation and takes the next recovery step: once a polling cycle is identified as stalled or fails through a polling-network recovery path, the next polling cycle rebuilds the Telegram transport instead of reusing the previous one.

Why draft

This change affects recovery behavior, not just diagnostics. I want to keep the branch and PR ready for review, but test it locally in a real deployment for a few days before marking it ready.

What this changes

  • marks the current transport as dirty after:
    • watchdog-detected polling stalls
    • recoverable polling network errors
    • unhandled polling network failures that trigger forced restart handling
  • rebuilds the Telegram transport on the next polling cycle when marked dirty
  • keeps the richer polling diagnostics and state-aware stall detection needed to verify behavior in production
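
The mark-dirty / rebuild-on-next-cycle idea above can be sketched in a few lines. This is a minimal illustration in Python (to match the example code on this page), not the PR's TypeScript implementation: the class name, the `factory` callable, and the method names are all hypothetical.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RebuildableTransport(Generic[T]):
    """Holds a transport and lazily rebuilds it once it has been marked dirty."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory            # hypothetical: builds a fresh transport
        self._transport = factory()
        self._dirty = False
        self.last_dirty_reason: Optional[str] = None
        self.rebuild_count = 0

    def mark_dirty(self, reason: str) -> None:
        # Called after a watchdog-detected stall or a polling network error.
        self._dirty = True
        self.last_dirty_reason = reason

    def acquire(self) -> T:
        # Called at the start of every polling cycle.
        if self._dirty:
            self._transport = self._factory()  # drop the possibly-bad state
            self._dirty = False
            self.rebuild_count += 1
        return self._transport
```

A clean cycle keeps reusing the same transport; only a cycle that follows `mark_dirty(...)` pays the cost of a rebuild, and `rebuild_count` gives a number to correlate with recovery in the logs.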

Intended effect

If a request wedges in a way that survives normal abort/stop handling, the next cycle should no longer inherit the possibly-bad dispatcher/socket/transport state from the previous cycle.

Local test plan before ready-for-review

  • run in a real Telegram deployment for several days
  • compare stall frequency before/after
  • watch for reduced repeated polling stalls
  • confirm no new regressions in normal operation or in recovery
  • confirm transport rebuild logs correlate with successful recovery
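
One way to make the before/after comparison concrete is to normalise stall counts by observation time. The sketch below assumes only that each stall produces a log line containing "Polling stall detected" (as in the logs quoted on this page); the function names and the idea of passing the observation window in hours are illustrative choices.

```python
def stalls_per_hour(log_lines, hours_observed):
    """Count 'Polling stall detected' lines and normalise by observation time."""
    if hours_observed <= 0:
        raise ValueError("hours_observed must be positive")
    stalls = sum(1 for line in log_lines if "Polling stall detected" in line)
    return stalls / hours_observed


def relative_change(before_rate, after_rate):
    """Fractional change in stall rate; negative means fewer stalls afterwards."""
    if before_rate == 0:
        return None  # no baseline to compare against
    return (after_rate - before_rate) / before_rate
```

Comparing rates rather than raw counts keeps the result meaningful when the "before" and "after" observation windows have different lengths.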

Notes

This is intentionally opened as a draft first. I plan to validate it locally before requesting full upstream review.

Changed files

  • extensions/telegram/src/monitor.ts (modified, +8/-3)
  • extensions/telegram/src/polling-session.ts (modified, +89/-8)

Observed logs

Examples from the instrumented build:

[telegram] Polling stall detected (active getUpdates stuck for 222.22s); forcing restart. [diag inFlight=1 outcome=started ...]
[telegram][diag] stop sequence initiated inFlight=1 outcome=started ...
[telegram][diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error ... error=Network request for 'getUpdates' failed!

Another example showed a much longer stuck request:

durationMs=1000181
error=Network request for 'getUpdates' failed!

That suggests the problem is not just overly aggressive watchdog behavior. The watchdog is often reacting to a genuinely stuck in-flight getUpdates request.
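
To pull numbers such as durationMs out of diagnostic lines like the ones above, a small key=value parser is enough. This sketch assumes fields are written as `key=value` on a single line and that values may contain spaces; the quoted log lines are elided with "...", so the full field set is unknown and an elision marker would be absorbed into the preceding value.

```python
import re

# A key is a word followed by "="; a value runs until the next " key=" or end of line.
_FIELD = re.compile(r"(\w+)=(.*?)(?=\s+\w+=|$)")


def parse_diag_fields(line):
    """Extract key=value fields from a diagnostic log line into a dict."""
    return {key: value.strip() for key, value in _FIELD.findall(line)}


def stuck_seconds(line):
    """Return durationMs converted to seconds, or None if the field is absent."""
    raw = parse_diag_fields(line).get("durationMs")
    return int(raw) / 1000 if raw and raw.isdigit() else None
```

Applied to the second example, `stuck_seconds("durationMs=1000181")` gives roughly 1000 seconds, more than four times the 222.22s reported in the first log line.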

Why this is confusing today

Before adding extra diagnostics, the existing logs mostly showed:

  • Polling stall detected ...
  • Polling runner stop timed out after 15s ...
  • occasional send failures

That made it difficult to answer:

  • was there an active in-flight getUpdates?
  • how long had it been stuck?
  • did the request ever finish or fail?
  • was recovery noise coming from runner.stop(), bot.stop(), or both?
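
The first three questions can be answered by tracking the lifecycle of each getUpdates call explicitly. Below is a minimal sketch of such a tracker; `inFlight` and `outcome` mirror the field names seen in the instrumented logs, while the class, its methods, and the `stuckFor` field are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PollLifecycle:
    """Tracks one getUpdates call so diagnostics can report its state."""
    clock: Callable[[], float] = time.monotonic
    started_at: Optional[float] = None
    finished_at: Optional[float] = None
    outcome: str = "idle"              # idle | started | ok | error
    error: Optional[str] = None

    def start(self):
        self.started_at, self.finished_at = self.clock(), None
        self.outcome, self.error = "started", None

    def finish(self, error=None):
        self.finished_at = self.clock()
        self.outcome = "error" if error else "ok"
        self.error = error

    @property
    def in_flight(self):
        return self.outcome == "started"

    def stuck_for(self):
        """Seconds the current request has been in flight, else 0.0."""
        return self.clock() - self.started_at if self.in_flight else 0.0

    def diag(self):
        return (f"inFlight={int(self.in_flight)} outcome={self.outcome} "
                f"stuckFor={self.stuck_for():.2f}s")
```

Calling `start()` before each request and `finish(...)` in both the success and error paths means any log line can include `diag()` and show whether a request was active, for how long, and how it ended.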

Suspected root cause area

The issue appears to be in the interaction between:

  • Telegram long polling (getUpdates)
  • fetch abort / request timeout behavior
  • watchdog restart logic
  • bot.stop() confirmation behavior

In particular, it looks possible for an in-flight getUpdates request to outlive the expected request timeout by a large margin, which may indicate that abort/timeout is not reliably terminating the underlying request in all cases.
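
If the request's own abort/timeout cannot be trusted, the caller can enforce a separate hard deadline and simply stop waiting. The following is a Python analogue of that idea under stated assumptions (the project itself uses fetch in TypeScript); the function name and return convention are hypothetical. It does not terminate the underlying request, which is why discarding the old transport afterwards still matters.

```python
import concurrent.futures


def call_with_hard_deadline(fn, deadline_s, executor):
    """Run fn in a worker thread and stop waiting after deadline_s seconds.

    Returns ("ok", result), ("error", exc) or ("deadline", None). On "deadline"
    the worker may still be running: the caller has only stopped waiting.
    """
    future = executor.submit(fn)
    done, _ = concurrent.futures.wait([future], timeout=deadline_s)
    if not done:
        return "deadline", None
    exc = future.exception()
    if exc is not None:
        return "error", exc
    return "ok", future.result()
```

A "deadline" result is the signal that the request outlived its expected timeout, which corresponds to the case where the transport should be treated as suspect rather than reused.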

Proposed follow-up directions

  • add richer polling diagnostics around getUpdates lifecycle
  • make watchdog stall detection state-aware (active stuck request vs idle/no completion)
  • reduce duplicate recovery noise during stop/restart
  • investigate whether transport/bot recreation should be more aggressive after certain stuck-request failures
  • investigate whether bot.stop() confirmation behavior contributes to recovery delays

Extended analysis

Fix Plan

To address stuck getUpdates requests in Telegram long polling, the plan has three parts:

  • Enhance request timeout and abort behavior:
    • Set an explicit request timeout for getUpdates that is longer than the long-poll timeout sent to Telegram (e.g., 60 seconds for a 50-second long poll), so healthy idle polls are not aborted.
    • Implement a retry mechanism with exponential backoff for failed requests.
  • Improve watchdog stall detection:
    • Make the watchdog state-aware to distinguish between active stuck requests and idle/no completion.
    • Set the watchdog timeout above the request timeout plus the longest retry backoff, so it fires only when a request has genuinely outlived its own timeout.
  • Improve recovery and reduce recovery noise:
    • Recreate the transport/bot more aggressively after stuck-request failures.
    • Review bot.stop() confirmation behavior to minimize recovery delays and duplicate stop/restart log noise.

Example Code

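import time
from dataclasses import dataclass

import requests  # third-party: pip install requests


@dataclass
class TelegramBot:
    """Minimal stand-in for a bot object; only the token is needed here."""
    token: str


def get_updates_with_retry(bot, poll_timeout=50, retries=3, backoff_factor=2):
    """Long-poll getUpdates, retrying failed requests with exponential backoff."""
    url = f"https://api.telegram.org/bot{bot.token}/getUpdates"
    for attempt in range(retries):
        try:
            # The HTTP read timeout must exceed the long-poll timeout, otherwise
            # a healthy idle poll would be aborted on the client side.
            response = requests.get(
                url,
                params={"timeout": poll_timeout},
                timeout=(10, poll_timeout + 10),  # (connect, read) in seconds
            )
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise  # re-raise with the original traceback
            time.sleep(backoff_factor * (2 ** attempt))


class Watchdog:
    """State-aware watchdog: tells a stuck in-flight request from an idle poller."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.request_started_at = None  # set while a request is in flight

    def detect_stall(self, now=None):
        now = time.monotonic() if now is None else now
        if self.request_started_at is None:
            return "idle"  # nothing in flight
        stuck_for = now - self.request_started_at
        return "stuck" if stuck_for > self.timeout else "ok"


# Usage
if __name__ == "__main__":
    bot = TelegramBot(token="YOUR_TOKEN")
    watchdog = Watchdog(timeout=60)
    watchdog.request_started_at = time.monotonic()
    try:
        updates = get_updates_with_retry(bot)
        # Process updates here
    except requests.RequestException as exc:
        print(f"getUpdates failed (watchdog state: {watchdog.detect_stall()}): {exc}")
    finally:
        watchdog.request_started_at = None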

Verification

To verify the fix, monitor the application logs for:

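  • Reduced occurrences of Polling stall detected and Network request for 'getUpdates' failed! errors.
  • getUpdates requests that finish or fail within the configured request timeout, instead of staying in flight for hundreds of seconds.
  • Watchdog log lines that distinguish an active stuck request from an idle poller with no completion.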

Extra Tips

  • Regularly review and adjust the request timeout, retry mechanism, and watchdog configuration to ensure optimal performance and reliability.
  • Consider implementing additional logging and monitoring to track request failures and stuck requests.
