openclaw - ✅(Solved) Fix Telegram polling: self-sustaining 409 getUpdates conflict from probe + health-monitor re-triggering transport [2 pull requests, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#50064 · Fetched 2026-04-08 00:59:37
Comments: 2 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): cross-referenced ×3, commented ×2, referenced ×1

The gateway enters a permanent 409 getUpdates conflict loop because the Telegram probe client and polling client create competing getUpdates connections. Once triggered, the loop is self-sustaining because the long-poll timeout (30s) equals the max retry interval (30s), so each retry's server-side connection overlaps with the next.

Root Cause

Dual client creation:

  • channel.ts startAccount() (line ~489) calls probeTelegram() → resolveTelegramTransport() (creates Client #1)
  • monitor.ts monitorTelegramProvider() creates TelegramPollingSession → createTelegramBot() → resolveTelegramTransport() (creates Client #2)

Each call to resolveTelegramTransport() in fetch.ts creates a new dispatcher with no caching. The probe's TCP connection lingers in the socket pool and can race with the polling client's getUpdates call.

Self-sustaining loop:

  • createTelegramRunnerOptions() sets fetch.timeout: 30 (30s long-poll)
  • The grammY runner's max retry interval is also 30s
  • When a 409 occurs, the retry fires a new getUpdates while the previous call's server-side connection is still alive (within the 30s window)
  • This creates a permanent overlap: each retry conflicts with the previous retry
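
The overlap described in the list above can be illustrated with a small timing model. This is a simplified sketch, not gateway code: `simulateRetries` and its parameters are hypothetical, and it assumes that a retry issued while the previous call's server-side window is still open (including the boundary case where both values are equal, as the issue reports) produces another 409.

```typescript
// Simplified timing model of the retry loop (illustration only).
// Assumption: each getUpdates call stays alive server-side for up to
// `longPollSec`; a retry that starts inside or at the edge of that
// window is counted as another 409 conflict.
function simulateRetries(
  longPollSec: number,
  retryDelaySec: number,
  attempts: number,
): number {
  let conflicts = 0;
  let now = 0; // the first call starts at t = 0
  let previousWindowEnd = longPollSec;
  for (let i = 0; i < attempts; i++) {
    now += retryDelaySec; // the retry fires after the backoff delay
    if (now <= previousWindowEnd) {
      conflicts++; // the previous window has not clearly expired yet
    }
    previousWindowEnd = now + longPollSec; // the retry opens its own window
  }
  return conflicts;
}

// 30s long-poll with 30s retries: every retry lands on the previous window.
console.log(simulateRetries(30, 30, 5));
// 10s long-poll with 30s retries: the previous window has expired each time.
console.log(simulateRetries(10, 30, 5));
```

Under this model the 30s/30s configuration conflicts on every attempt, while a 10s long-poll leaves a 20s gap before each retry.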

Health-monitor re-trigger:

  • Even if the initial 409 self-resolves, the health-monitor (300s interval) re-probes via probeTelegram(), creating a fresh competing connection that re-triggers the loop

Fix Action

Workaround

Runtime patch reducing fetch.timeout from 30 to 10 in createTelegramRunnerOptions() resolves the self-sustaining loop. An initial 409 may still occur but recovers within one retry cycle.
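
A minimal sketch of what the workaround amounts to, assuming the timeout is nested as `runner.fetch.timeout` in the options object. `RunnerOptionsSketch`, `createRunnerOptionsSketch`, and `MAX_RETRY_INTERVAL_SEC` are hypothetical names for illustration; the real createTelegramRunnerOptions() sets more fields than shown here.

```typescript
// Local stand-in for the part of the runner options the issue mentions.
// The nesting (runner.fetch.timeout) is an assumption about the real shape.
interface RunnerOptionsSketch {
  runner: { fetch: { timeout: number } };
}

// Max retry interval in seconds, as reported in the issue.
const MAX_RETRY_INTERVAL_SEC = 30;

function createRunnerOptionsSketch(longPollSec = 10): RunnerOptionsSketch {
  // Guard the invariant the workaround relies on: the long-poll window
  // must expire before the slowest retry fires.
  if (longPollSec >= MAX_RETRY_INTERVAL_SEC) {
    throw new RangeError(
      `long-poll timeout must stay below ${MAX_RETRY_INTERVAL_SEC}s`,
    );
  }
  return { runner: { fetch: { timeout: longPollSec } } };
}
```

The guard is optional, but it keeps a later change from silently restoring the 30s = 30s configuration.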

PR fix notes

PR #50505: fix(telegram): avoid self-sustaining polling 409 conflicts

Description (problem / solution / changelog)

Summary

Fix a Telegram polling failure mode where getUpdates can fall into a self-sustaining 409 conflict loop.

This change tackles two parts of the issue:

  • skip the startup bot probe before polling begins, so polling owns getUpdates from the start
  • reduce grammY long-poll fetch timeout from 30s to 10s, so retry attempts do not overlap the previous server-side getUpdates window

Root cause

Issue #50064 describes a failure mode where Telegram probe and polling behavior can combine into repeated 409 Conflict: terminated by other getUpdates request errors.

The key pieces are:

  • startup probe runs before polling starts
  • polling retries can line up with the previous 30s long-poll window
  • once a 409 is triggered, the 30s fetch timeout can keep the overlap going

By removing the startup probe from the polling path and shortening the polling timeout, polling no longer competes with an immediate pre-start probe and retry cycles recover instead of re-triggering the same overlap.

What changed

  • in channel.ts, only run the startup probe for webhook accounts
    • polling accounts now go straight to monitorTelegramProvider(...)
  • in monitor.ts, reduce grammY polling fetch.timeout from 30 to 10
  • add a regression test that verifies polling startup skips the probe
  • update the runner-options test to lock the new timeout
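
The channel.ts change listed above could look roughly like the following. This is a hedged sketch, not the PR's code: `startAccountSketch`, `AccountMode`, and `StartDeps` are hypothetical, the probe and monitor are injected as stand-ins for probeTelegram() and monitorTelegramProvider(...), and it assumes webhook accounts also continue into the monitor after probing.

```typescript
type AccountMode = "polling" | "webhook";

interface StartDeps {
  probe: () => Promise<void>; // stand-in for probeTelegram()
  monitor: () => Promise<void>; // stand-in for monitorTelegramProvider(...)
}

async function startAccountSketch(
  mode: AccountMode,
  deps: StartDeps,
): Promise<void> {
  // Only webhook accounts run the startup probe. Polling accounts skip it
  // so the polling session is the first and only caller of getUpdates.
  if (mode === "webhook") {
    await deps.probe();
  }
  await deps.monitor();
}

// Usage: record which steps run for a polling account.
const calls: string[] = [];
void startAccountSketch("polling", {
  probe: async () => { calls.push("probe"); },
  monitor: async () => { calls.push("monitor"); },
});
```

Injecting the two functions also makes the regression test mentioned above easy to express: assert that the probe stand-in is never called in polling mode.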

Why this is different from nearby Telegram PRs

This is specifically about the startup probe / polling ownership conflict and the 30s long-poll = 30s retry overlap described in #50064.

It is not the same as:

  • #49910: graceful stop timeout / shutdown cleanup race
  • #50368: startup persisted-offset confirmation timeout

Testing

  • pnpm test extensions/telegram/src/monitor.test.ts
  • added coverage in extensions/telegram/src/channel.test.ts

channel.test.ts currently trips an unrelated repo test-environment import problem in this checkout (fake-indexeddb/auto via Matrix runtime mocking), but the new assertion is narrow and the Telegram monitor suite passes locally.

Closes #50064

Changed files

  • extensions/telegram/src/channel.test.ts (modified, +30/-6)
  • extensions/telegram/src/channel.ts (modified, +17/-14)
  • extensions/telegram/src/monitor.test.ts (modified, +3/-0)
  • extensions/telegram/src/monitor.ts (modified, +3/-2)

PR #56324: fix(telegram): add per-token duplicate poller guard to prevent 409 conflicts

Description (problem / solution / changelog)

Summary

  • Add a per-token active polling session registry in monitorTelegramProvider() that detects and waits for an existing session to release before starting a new one
  • Add a 500ms drain pause in the hot-reload channel restart handler between stopChannel and startChannel

Both changes prevent 409 Conflict errors from concurrent getUpdates calls on the same bot token.

Context

The gateway has no protection against duplicate polling sessions for the same bot token. Multiple scenarios can create overlapping pollers:

  1. Hot-reload race: applyHotReload restarts channels via stopChannel then startChannel, but waitForGracefulStop has a 15-second timeout (POLL_STOP_GRACE_MS). If the grammY runner does not stop within that window, the new poller starts while the old one still holds a connection.

  2. External scripts: Any process calling getUpdates on the same token (launchd agents, cron scripts, monitoring tools) creates a competing poller the gateway cannot detect.

  3. Watchdog restart overlap: The 90-second POLL_STALL_THRESHOLD_MS triggers a polling cycle restart that can overlap with the existing session if graceful stop times out.

PR #20930 fixed the SIGUSR1 + config.patch race, but the file-watcher hot-reload path remains unguarded.

Implementation

extensions/telegram/src/monitor.ts (+68 lines) — Module-level Map<string, ActivePollerEntry> keyed by bot token. Before starting polling, monitorTelegramProvider checks the registry and waits up to 5 seconds for any existing session to signal completion via a done promise. The registry is cleaned up in the finally block.

src/gateway/server-reload-handlers.ts (+4 lines) — 500ms setTimeout between stopChannel and startChannel in the hot-reload channel restart path, giving the polling session graceful stop a buffer to fully release.
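
The per-token registry described for monitor.ts can be sketched as follows. Only the `Map<string, ActivePollerEntry>` shape, the done promise, the 5-second wait, and the cleanup in finally come from the PR description; `registerPoller`, `waitWithTimeout`, `startPollingSketch`, and the `release` field are hypothetical names, and the real implementation may differ.

```typescript
interface ActivePollerEntry {
  done: Promise<void>; // resolves when this session has released the token
  release: () => void; // hypothetical helper: unregister and signal completion
}

// Module-level registry keyed by bot token.
const activePollers = new Map<string, ActivePollerEntry>();

function registerPoller(token: string): ActivePollerEntry {
  let resolveDone!: () => void;
  const done = new Promise<void>((resolve) => {
    resolveDone = resolve;
  });
  const entry: ActivePollerEntry = {
    done,
    release: () => {
      // Remove the entry only if it is still ours; a newer session may
      // have replaced it after the wait timed out.
      if (activePollers.get(token) === entry) activePollers.delete(token);
      resolveDone();
    },
  };
  activePollers.set(token, entry);
  return entry;
}

// Resolve when `p` settles or after `ms`, whichever comes first,
// clearing the timer so it does not keep the process alive.
function waitWithTimeout(p: Promise<void>, ms: number): Promise<void> {
  return new Promise<void>((resolve) => {
    const timer = setTimeout(resolve, ms);
    void p.then(() => {
      clearTimeout(timer);
      resolve();
    });
  });
}

async function startPollingSketch(
  token: string,
  poll: () => Promise<void>, // stand-in for the actual polling session
  waitMs = 5000,
): Promise<void> {
  const existing = activePollers.get(token);
  if (existing) await waitWithTimeout(existing.done, waitMs);
  const entry = registerPoller(token);
  try {
    await poll();
  } finally {
    entry.release(); // cleanup runs even if polling throws
  }
}
```

Note that a registry like this only guards sessions inside one process; the external-script scenario from the Context list stays outside its reach.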

Test plan

  • Existing telegram monitor tests pass (23/23)
  • Existing reload handler tests pass (12/12)
  • Verified on a 4-bot macOS setup (jarvis, atlas, forge, trader) — zero 409 errors after 10+ minutes of clean operation
  • Manual test: edit config while gateway is running, verify hot-reload restarts channels without 409s

Fixes #56230

Related: #20893, #43628, #50064, #49822, #33154

Changed files

  • extensions/telegram/src/monitor.ts (modified, +69/-0)
  • src/agents/pi-tools.params.ts (modified, +14/-4)
  • src/gateway/server-reload-handlers.ts (modified, +6/-0)

Raw issue text

Reproduction

  1. Start the gateway with Telegram polling enabled (channels.telegram.enabled: true)
  2. Observe startup logs — two autoSelectFamily + dnsResultOrder log pairs appear (one from probe, one from polling)
  3. Within 30-60s, getUpdates conflict: 409 errors begin
  4. Errors continue indefinitely at ~30s intervals

Suggested Fixes

Fix A — Transport caching (primary): Add a cache to resolveTelegramTransport() in fetch.ts, similar to the existing probeFetcherCache in probe.ts. This ensures probe and polling share the same dispatcher/connection pool.

Fix B — Break the 30s=30s deadlock: Reduce fetch.timeout in createTelegramRunnerOptions() to a value less than the max retry interval (e.g., 10-15s). This ensures the previous server-side connection expires before the next retry fires.

Fix C — Skip probe before polling: In startAccount(), skip the probeTelegram() call when the provider is about to start polling immediately. The probe is useful for health checks but redundant right before monitorTelegramProvider().

Environment

  • OpenClaw latest (0537f3e59)
  • Docker on Windows 10 (gateway runs in Linux container)
  • Single bot token, single container, no webhook
  • Telegram plugin with polling mode

🤖 Generated with Claude Code

extent analysis

Fix Plan

To resolve the getUpdates conflict loop, we will implement Fix A — Transport caching. This involves adding a cache to resolveTelegramTransport() in fetch.ts to ensure the probe and polling clients share the same dispatcher/connection pool.

Code Changes

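// fetch.ts (sketch; createTransport() stands in for the existing
// transport-creation logic and is not a real function in the repo)
const transportCache = new Map<string, Promise<unknown>>();

// cacheKey is an assumed parameter; a real key would likely need to
// reflect whatever options distinguish one transport from another.
async function resolveTelegramTransport(cacheKey = 'telegram-transport') {
  // Cache the promise rather than the resolved value, so that probe and
  // polling calls that start concurrently share one in-flight creation.
  let transport = transportCache.get(cacheKey);
  if (!transport) {
    transport = createTransport();
    // Drop failed attempts so the next call can retry.
    transport.catch(() => transportCache.delete(cacheKey));
    transportCache.set(cacheKey, transport);
  }
  return transport;
}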

Verification

After applying the fix, restart the gateway and verify that the getUpdates conflict: 409 errors no longer occur. Monitor the logs for the presence of a single autoSelectFamily and dnsResultOrder log pair, indicating that only one client is creating a connection.

Extra Tips

  • Consider implementing Fix B — Break the 30s=30s deadlock as an additional precaution to prevent similar issues in the future.
  • Review the probeTelegram() call in startAccount() and consider skipping it when the provider is about to start polling immediately, as suggested in Fix C — Skip probe before polling.
