openclaw - ✅(Solved) Fix: Telegram long polling can leave getUpdates stuck far longer than expected [1 pull request, 1 participant]

openclaw · 2026-03-26 07:12:55

openclaw/openclaw#54992 · fetched 2026-04-08 01:33:54

Author: sinogello

Participants: sinogello

Timeline: cross-referenced ×1

Telegram long polling can get stuck in a way that is only partially visible from current logs.

In a real deployment, Telegram health probes remained healthy, but OpenClaw repeatedly logged polling stalls and forced restarts. After instrumenting the polling session, the observed pattern was:

- an in-flight getUpdates request remained stuck for much longer than expected
- the watchdog detected the stall and forced a restart
- the polling cycle eventually ended with Network request for 'getUpdates' failed!
- bot.stop() / runner shutdown paths could add extra noise during recovery

Error Message

[telegram][diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error ... error=Network request for 'getUpdates' failed!

Root Cause

Suspected root cause area

PR fix notes

PR #55014: telegram: rebuild transport after stalled polling cycles

Repository: openclaw/openclaw
Author: sinogello
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/55014

Description (problem / solution / changelog)

Summary

This draft PR builds on the Telegram polling investigation and takes the next recovery step: once a polling cycle is identified as stalled or fails through a polling-network recovery path, the next polling cycle rebuilds the Telegram transport instead of reusing the previous one.

Why draft

This change affects recovery behavior, not just diagnostics. I want to keep the branch and PR ready for review, but test it locally in a real deployment for a few days before marking it ready.

What this changes

- marks the current transport as dirty after:
  - watchdog-detected polling stalls
  - recoverable polling network errors
  - unhandled polling network failures that trigger forced restart handling
- rebuilds the Telegram transport on the next polling cycle when it is marked dirty
- keeps the richer polling diagnostics and state-aware stall detection needed to verify behavior in production

Intended effect

If a request wedges in a way that survives normal abort/stop handling, the next cycle should no longer inherit the possibly-bad dispatcher/socket/transport state from the previous cycle.
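A minimal Python sketch of the dirty-flag idea described above. The real change lives in extensions/telegram/src/monitor.ts and polling-session.ts (TypeScript) and is not reproduced here; TransportHolder, mark_dirty, and for_next_cycle are hypothetical names, and the factory/close callables stand in for whatever builds and disposes of the HTTP transport.

```python
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TransportHolder(Generic[T]):
    """Hypothetical helper: reuse one transport until a failure marks it dirty."""

    def __init__(self, factory: Callable[[], T], close: Callable[[T], None]) -> None:
        self._factory = factory
        self._close = close
        self._transport: Optional[T] = None
        self._dirty = False
        self.last_reason: Optional[str] = None
        self.builds = 0

    def mark_dirty(self, reason: str) -> None:
        # Call after a watchdog stall or a polling network failure.
        self._dirty = True
        self.last_reason = reason

    def for_next_cycle(self) -> T:
        # Rebuild instead of handing possibly-wedged socket/dispatcher state
        # to the next polling cycle; otherwise reuse the existing transport.
        if self._transport is None or self._dirty:
            if self._transport is not None:
                self._close(self._transport)
            self._transport = self._factory()
            self._dirty = False
            self.builds += 1
        return self._transport
```

With requests, for example, `TransportHolder(requests.Session, lambda s: s.close())` would hand each polling cycle the same `Session` until a stall is reported, and a fresh `Session` (with its own connection pool) afterwards.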

Local test plan before ready-for-review

- run in a real Telegram deployment for several days
- compare stall frequency before/after
- watch for reduced repeated polling stalls
- confirm there are no new regressions, in recovery or elsewhere
- confirm transport rebuild logs correlate with successful recovery

Notes

This is intentionally opened as a draft first. I plan to validate it locally before requesting full upstream review.

Changed files

- extensions/telegram/src/monitor.ts (modified, +8/-3)
- extensions/telegram/src/polling-session.ts (modified, +89/-8)

Log examples

[telegram] Polling stall detected (active getUpdates stuck for 222.22s); forcing restart. [diag inFlight=1 outcome=started ...]
[telegram][diag] stop sequence initiated inFlight=1 outcome=started ...
[telegram][diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error ... error=Network request for 'getUpdates' failed!

---

durationMs=1000181
error=Network request for 'getUpdates' failed!


Summary

Telegram long polling can get stuck in a way that is only partially visible from current logs.

In a real deployment, Telegram health probes remained healthy, but OpenClaw repeatedly logged polling stalls and forced restarts. After instrumenting the polling session, the observed pattern was:

- an in-flight getUpdates request remained stuck for much longer than expected
- the watchdog detected the stall and forced a restart
- the polling cycle eventually ended with Network request for 'getUpdates' failed!
- bot.stop() / runner shutdown paths could add extra noise during recovery

Observed logs

Examples from the instrumented build:

[telegram] Polling stall detected (active getUpdates stuck for 222.22s); forcing restart. [diag inFlight=1 outcome=started ...]
[telegram][diag] stop sequence initiated inFlight=1 outcome=started ...
[telegram][diag] polling cycle finished reason=polling stall detected inFlight=0 outcome=error ... error=Network request for 'getUpdates' failed!

Another example showed a much longer stuck request (about 1,000 s, roughly 16.7 minutes):

durationMs=1000181
error=Network request for 'getUpdates' failed!

That suggests the problem is not just overly aggressive watchdog behavior. The watchdog is often reacting to a genuinely stuck in-flight getUpdates request.

Why this is confusing today

Before adding extra diagnostics, the existing logs mostly showed:

- Polling stall detected ...
- Polling runner stop timed out after 15s ...
- occasional send failures

That made it difficult to answer:

- was there an active in-flight getUpdates?
- how long had it been stuck?
- did the request ever finish or fail?
- was recovery noise coming from runner.stop(), bot.stop(), or both?
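One way to make the first three questions answerable is to record each getUpdates call's start, end, and outcome. The sketch below is a hypothetical tracker (GetUpdatesLifecycle and its methods are illustrative names, not OpenClaw code); the injectable clock exists only so the behavior can be tested without waiting.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class RequestRecord:
    started_at: float
    finished_at: Optional[float] = None
    outcome: str = "started"  # becomes "ok" or "error" when the call ends


@dataclass
class GetUpdatesLifecycle:
    """Hypothetical tracker: is a getUpdates call in flight, and for how long?"""

    clock: Callable[[], float] = time.monotonic
    # Kept unbounded for simplicity; a long-running process would cap this.
    records: Dict[int, RequestRecord] = field(default_factory=dict)
    next_id: int = field(default=0, init=False)

    def start(self) -> int:
        self.next_id += 1
        self.records[self.next_id] = RequestRecord(started_at=self.clock())
        return self.next_id

    def finish(self, request_id: int, ok: bool) -> None:
        record = self.records[request_id]
        record.finished_at = self.clock()
        record.outcome = "ok" if ok else "error"

    def in_flight(self) -> int:
        return sum(1 for r in self.records.values() if r.finished_at is None)

    def oldest_in_flight_age(self) -> Optional[float]:
        # Age in seconds of the longest-running unfinished call, or None.
        now = self.clock()
        ages = [now - r.started_at for r in self.records.values() if r.finished_at is None]
        return max(ages) if ages else None
```

Logging `in_flight()` and `oldest_in_flight_age()` when the watchdog fires would show directly whether a request was active and how long it had been stuck.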

Suspected root cause area

The issue appears to be in the interaction between:

- Telegram long polling (getUpdates)
- fetch abort / request timeout behavior
- watchdog restart logic
- bot.stop() confirmation behavior

In particular, it looks possible for an in-flight getUpdates request to outlive the expected request timeout by a large margin, which may indicate that abort/timeout is not reliably terminating the underlying request in all cases.
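If a request can outlive its timeout, one mitigation is a hard deadline enforced by the caller rather than the transport. This asyncio sketch assumes the getUpdates call is an awaitable that honors cancellation; call_with_deadline and GetUpdatesDeadlineExceeded are hypothetical names. It frees the polling loop on time, but it cannot guarantee that a wedged socket underneath is actually closed, which is why rebuilding the transport (as PR #55014 does) is complementary.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class GetUpdatesDeadlineExceeded(Exception):
    """Hypothetical error raised when a getUpdates call outlives its deadline."""


async def call_with_deadline(request: Callable[[], Awaitable[T]], deadline_s: float) -> T:
    # asyncio.wait_for cancels the awaited work once the deadline passes, so
    # the caller is released even if the transport never reports an error.
    try:
        return await asyncio.wait_for(request(), timeout=deadline_s)
    except asyncio.TimeoutError as exc:
        raise GetUpdatesDeadlineExceeded(
            f"getUpdates exceeded {deadline_s:.1f}s hard deadline"
        ) from exc
```

The deadline would typically be the long-poll timeout plus a margin, so that a normal idle poll never trips it.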

Proposed follow-up directions

- add richer polling diagnostics around getUpdates lifecycle
- make watchdog stall detection state-aware (active stuck request vs idle/no completion)
- reduce duplicate recovery noise during stop/restart
- investigate whether transport/bot recreation should be more aggressive after certain stuck-request failures
- investigate whether bot.stop() confirmation behavior contributes to recovery delays
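For the duplicate-noise item, one approach is to make stop idempotent: every recovery path awaits a single shared stop task with a timeout instead of starting its own. StopOnce is a hypothetical helper, and the timeout is whatever the caller chooses (the 15s in the existing "stop timed out" message is one example).

```python
import asyncio
from typing import Awaitable, Callable, Optional


class StopOnce:
    """Hypothetical guard: concurrent recovery paths share one stop attempt."""

    def __init__(self, stop: Callable[[], Awaitable[None]], timeout_s: float) -> None:
        self._stop = stop
        self._timeout_s = timeout_s
        self._task: Optional["asyncio.Future[None]"] = None

    async def stop(self) -> bool:
        """Return True if the shared stop finished cleanly within the timeout."""
        if self._task is None:
            # Only the first caller starts the real stop; later callers
            # (watchdog, network-error handler, shutdown) wait on the same task.
            self._task = asyncio.ensure_future(self._stop())
        done, _ = await asyncio.wait({self._task}, timeout=self._timeout_s)
        if self._task not in done:
            return False  # still running: report once instead of stopping again
        return self._task.exception() is None
```

Because the stop runs once, a timeout is logged from one place rather than once per recovery path.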

extent analysis

Fix Plan

To address the issue of stuck getUpdates requests in Telegram long polling, we will implement the following steps:

- Enhance request timeout and abort behavior:
  - Bound each getUpdates call: pass a long-poll timeout parameter and set the HTTP client timeout a little above it (e.g., 60 seconds for a 50-second long poll), so idle polls are not mistaken for failures.
  - Implement a retry mechanism with exponential backoff for failed requests.
- Improve watchdog stall detection:
  - Make the watchdog state-aware to distinguish between active stuck requests and idle/no completion.
  - Adjust the watchdog timeout to account for the request timeout and retry mechanism.
- Reduce recovery noise and speed up recovery:
  - Recreate the transport/bot more aggressively after stuck-request failures.
  - Review and optimize bot.stop() confirmation behavior to minimize recovery delays.

Example Code

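import time

import requests

GET_UPDATES_URL = "https://api.telegram.org/bot{token}/getUpdates"


# Long-poll getUpdates with bounded retries and exponential backoff
def get_updates_with_retry(token, offset=None, long_poll_timeout=50,
                           retries=3, backoff_factor=2):
    params = {"timeout": long_poll_timeout}  # without this, Telegram short-polls
    if offset is not None:
        params["offset"] = offset  # acknowledge updates already processed
    # The HTTP read timeout must exceed the long-poll timeout, or every idle
    # poll looks like a failure. requests applies it per socket read, not as
    # a total deadline, so a trickling response can still run longer.
    http_timeout = (10, long_poll_timeout + 10)  # (connect, read) in seconds
    for attempt in range(retries):
        try:
            response = requests.get(GET_UPDATES_URL.format(token=token),
                                    params=params, timeout=http_timeout)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_factor * (2 ** attempt))


# State-aware watchdog: an active stuck request is not the same as idleness
class Watchdog:
    def __init__(self, timeout):
        self.timeout = timeout
        self.request_started_at = None  # set while getUpdates is in flight

    def request_started(self):
        self.request_started_at = time.monotonic()

    def request_finished(self):
        self.request_started_at = None

    def detect_stall(self):
        if self.request_started_at is None:
            return "idle"  # no request in flight, so nothing is stuck
        age = time.monotonic() - self.request_started_at
        return "stuck_request" if age > self.timeout else "in_flight"


# Usage ("YOUR_TOKEN" is a placeholder for a real bot token).
# Timeout covers 3 attempts x 60 s read timeout plus 2 s + 4 s of backoff;
# detect_stall() would be polled from a separate timer thread.
watchdog = Watchdog(timeout=200)
watchdog.request_started()
try:
    updates = get_updates_with_retry("YOUR_TOKEN")
    # Process updates
except requests.RequestException:
    # Handle request failure, e.g. rebuild the HTTP session before retrying
    pass
finally:
    watchdog.request_finished()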

Verification

To verify the fix, monitor the application logs for:

- Reduced occurrences of Polling stall detected and Network request for 'getUpdates' failed! errors.
- getUpdates requests that complete or fail within the configured timeout instead of hanging.
- Stall reports that distinguish an active stuck request from an idle poller.
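To compare stall frequency before and after a change, a small script can count the relevant lines. This sketch assumes one log entry per line and fits its pattern to the log excerpts quoted in the issue; the real log format may differ, so treat STALL_RE and summarize_polling_logs as hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Pattern fitted to: "Polling stall detected (active getUpdates stuck for 222.22s)"
STALL_RE = re.compile(r"Polling stall detected \(active getUpdates stuck for ([\d.]+)s\)")
NETWORK_FAIL = "Network request for 'getUpdates' failed!"


def summarize_polling_logs(lines: Iterable[str]) -> Tuple[Counter, List[float]]:
    """Count stall/failure lines and collect reported stuck durations (seconds)."""
    counts: Counter = Counter()
    stuck_seconds: List[float] = []
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            counts["stall"] += 1
            stuck_seconds.append(float(match.group(1)))
        if NETWORK_FAIL in line:
            counts["getUpdates_failed"] += 1  # counted once per line
    return counts, stuck_seconds
```

Running it over logs from before and after the change gives comparable counts plus the list of reported stuck durations.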

Extra Tips

- Regularly review and adjust the request timeout, retry mechanism, and watchdog configuration to ensure optimal performance and reliability.
- Consider implementing additional logging and monitoring to track request failures and stuck requests.
