openclaw - ✅ (Solved) Telegram polling lacks per-token dedup guard: external scripts and hot-reload can silently create duplicate pollers [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#56230 · Fetched 2026-04-08 01:43:15
Comments: 0 · Participants: 1 · Timeline events: 2 · Reactions: 0
Timeline (top): cross-referenced ×1, referenced ×1


PR fix notes

PR #56324: fix(telegram): add per-token duplicate poller guard to prevent 409 conflicts

Description (problem / solution / changelog)

Summary

  • Add a per-token active polling session registry in monitorTelegramProvider() that detects and waits for an existing session to release before starting a new one
  • Add a 500ms drain pause in the hot-reload channel restart handler between stopChannel and startChannel

Both changes prevent 409 Conflict errors from concurrent getUpdates calls on the same bot token.

Context

The gateway has no protection against duplicate polling sessions for the same bot token. Multiple scenarios can create overlapping pollers:

  1. Hot-reload race: applyHotReload restarts channels via stopChannel then startChannel, but waitForGracefulStop has a 15-second timeout (POLL_STOP_GRACE_MS). If the grammY runner does not stop within that window, the new poller starts while the old one still holds a connection.

  2. External scripts: Any process calling getUpdates on the same token (launchd agents, cron scripts, monitoring tools) creates a competing poller the gateway cannot detect.

  3. Watchdog restart overlap: The 90-second POLL_STALL_THRESHOLD_MS triggers a polling cycle restart that can overlap with the existing session if graceful stop times out.

PR #20930 fixed the SIGUSR1 + config.patch race, but the file-watcher hot-reload path remains unguarded.
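
The timeout race in scenario 1 can be illustrated with a minimal sketch. `stopWithGrace` is a hypothetical helper, not the gateway's `waitForGracefulStop`; it only shows how racing a stop against a timer lets the caller continue while the old poller may still be running, and how returning the outcome makes that case visible to the caller.

```javascript
// Hypothetical helper for illustration; not the gateway's waitForGracefulStop.
// Resolves to true if stop() finished within graceMs, false if the timer won.
async function stopWithGrace(stop, graceMs) {
  let timer;
  const timedOut = new Promise((resolve) => {
    timer = setTimeout(() => resolve(false), graceMs);
  });
  try {
    const stopped = Promise.resolve().then(stop).then(() => true);
    return await Promise.race([stopped, timedOut]);
  } finally {
    clearTimeout(timer); // do not keep the event loop alive after the race
  }
}
```

A caller that receives `false` knows the old session may still hold a connection and can delay or skip starting a new poller instead of continuing silently.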

Implementation

extensions/telegram/src/monitor.ts (+68 lines) — Module-level Map<string, ActivePollerEntry> keyed by bot token. Before starting polling, monitorTelegramProvider checks the registry and waits up to 5 seconds for any existing session to signal completion via a done promise. The registry is cleaned up in the finally block.

src/gateway/server-reload-handlers.ts (+4 lines) — 500ms setTimeout between stopChannel and startChannel in the hot-reload channel restart path, giving the polling session graceful stop a buffer to fully release.

Test plan

  • Existing telegram monitor tests pass (23/23)
  • Existing reload handler tests pass (12/12)
  • Verified on a 4-bot macOS setup (jarvis, atlas, forge, trader) — zero 409 errors after 10+ minutes of clean operation
  • Manual test: edit config while gateway is running, verify hot-reload restarts channels without 409s

Fixes #56230

Related: #20893, #43628, #50064, #49822, #33154

Changed files

  • extensions/telegram/src/monitor.ts (modified, +69/-0)
  • src/agents/pi-tools.params.ts (modified, +14/-4)
  • src/gateway/server-reload-handlers.ts (modified, +6/-0)


The Problem

There's no mechanism in monitorTelegramProvider() to detect or prevent duplicate polling sessions for the same bot token. If anything starts a second getUpdates call on the same token — whether it's a hot-reload race, an external script, or a health monitor probe — the gateway enters a permanent 409 Conflict loop with no recovery path short of a full restart.

I ran into this when debugging persistent 409s on a 10-agent setup running macOS + LaunchAgent. Two full debug sessions, hundreds of 409s logged over days. The root cause turned out to be two things happening at once:

  1. An external script (launched via a separate LaunchAgent) was calling getUpdates on the same bot token every 60 seconds — the gateway had no idea this was happening
  2. The hot-reload channel restart path (applyHotReload → stopChannel → startChannel) relies on waitForGracefulStop which has a 15-second timeout (POLL_STOP_GRACE_MS). If the grammY runner doesn't stop within that window, the function returns anyway, and the new poller starts while the old one is still holding a connection

Both scenarios result in two concurrent getUpdates calls on the same token → Telegram returns 409 → the poller enters a retry loop → retries create overlapping connections → self-sustaining conflict.

The Fix (what worked for me)

Added a global Map keyed by bot token in monitorTelegramProvider():

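const __activePollers = new Map();

async function monitorTelegramProvider(opts = {}) {
  // Assumption: token and accountId come from opts; the excerpt omits this step
  const { token, accountId } = opts;

  // Before starting, check for an existing session on this token
  const existingEntry = __activePollers.get(token);
  if (existingEntry) {
    // Wait for the old session to release, giving up after 5 seconds
    let timer;
    await Promise.race([
      existingEntry.done,
      new Promise(r => { timer = setTimeout(r, 5000); })
    ]);
    clearTimeout(timer);
  }

  // Register this session
  let resolvePollerDone;
  const pollerDone = new Promise(r => { resolvePollerDone = r; });
  const entry = { accountId, startedAt: Date.now(), done: pollerDone };
  __activePollers.set(token, entry);

  try {
    // ... existing polling logic ...
  } finally {
    // Remove only this session's entry; after a timed-out wait a newer
    // session may already have replaced it in the Map
    if (__activePollers.get(token) === entry) __activePollers.delete(token);
    resolvePollerDone();
  }
}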

Also added a 500ms drain pause in the hot-reload handler between stopChannel and startChannel:

const restartChannel = async (name) => {
  await params.stopChannel(name);
  await new Promise(r => setTimeout(r, 500)); // let long-poll drain
  await params.startChannel(name);
};
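
One possible extension of the drain pause, sketched under assumptions: a fixed pause does not stop two reload events from interleaving their stop/start calls for the same channel. The names `makeRestartChannel`, `drainMs`, and `sleep` below are illustrative and not part of the gateway; only `stopChannel` and `startChannel` come from the issue.

```javascript
// Sketch only: makeRestartChannel, drainMs and sleep are illustrative names.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function makeRestartChannel({ stopChannel, startChannel, drainMs = 500 }) {
  const inFlight = new Map(); // channel name -> promise of the running restart
  return function restartChannel(name) {
    // Chain behind any restart already running for this channel, so two
    // reload events cannot interleave stop/start for the same name.
    const previous = inFlight.get(name) ?? Promise.resolve();
    const next = previous
      .catch(() => {}) // a failed earlier restart must not block this one
      .then(async () => {
        await stopChannel(name);
        await sleep(drainMs); // let the long poll drain
        await startChannel(name);
      })
      .finally(() => {
        if (inFlight.get(name) === next) inFlight.delete(name);
      });
    inFlight.set(name, next);
    return next;
  };
}
```

With this shape, two back-to-back config edits produce stop, start, stop, start for a channel rather than overlapping sequences. The pause length stays a tunable guess, as in the original patch.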

Both patches are currently applied to the bundled dist files (hacky, I know), but they've been running clean for hours with zero 409s on a 4-bot setup.

Why the existing fix (#20930) doesn't cover this

PR #20930 fixed the SIGUSR1 + config.patch race condition where providers started twice during a signal-triggered restart. But the file-watcher hot-reload path (chokidar → applySnapshot → applyHotReload → channel restart) still has no dedup guard. And nothing in the gateway prevents external processes from polling the same token.

The 90-second POLL_STALL_THRESHOLD_MS also creates a predictable failure window — after exactly 90 seconds of clean operation, the watchdog can trigger a stalled restart that overlaps with the existing poller if waitForGracefulStop times out.
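
A rough sketch of how a watchdog could avoid that overlap, assuming it can tell whether an earlier stop is still pending. The helper `shouldRestartStalledPoller` and its parameter names are hypothetical; only the 90-second threshold is taken from the issue.

```javascript
const POLL_STALL_THRESHOLD_MS = 90_000; // value quoted in the issue

// Hypothetical decision helper: only restart a stalled poller when no
// earlier stop for it is still pending.
function shouldRestartStalledPoller({ lastActivityAt, now, stopPending }) {
  const stalled = now - lastActivityAt >= POLL_STALL_THRESHOLD_MS;
  return stalled && !stopPending;
}
```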

Related Issues

  • #20893 — Original duplicate provider bug (SIGUSR1 path, fixed by #20930)
  • #43628 — macOS LaunchAgent + configure wizard creating duplicate gateway processes
  • #50064 — Self-sustaining 409 loop from probe + health-monitor re-triggering transport
  • #49822 — 409 persists even with fresh bot token and health monitor disabled
  • #33154 — getUpdates conflict spam with multiple bot accounts

These all share the same underlying gap: monitorTelegramProvider has no awareness of whether another session is already polling the same token.

Environment

  • OpenClaw v2026.3.24
  • macOS (Apple Silicon, LaunchAgent-managed gateway)
  • 4 active Telegram bots (was 10, reduced during debugging)
  • Models: kimi-coding/k2p5, minimax-portal/MiniMax-M2.7, openai-codex/gpt-5.4

Suggested Approach

A per-token registry in monitorTelegramProvider (as implemented above) is the simplest fix. It's defense-in-depth — even if the channel manager's stop/start logic is perfect, the Telegram provider itself should refuse to create a second poller for a token that's already being polled. The registry lives in the same module, requires no changes to the channel manager, and adds maybe 20 lines of code.
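
The "refuse" behaviour described here is stricter than the patch above, which waits and then proceeds. A minimal sketch of the strict variant follows; `createPollerRegistry`, `tryRegister`, and `isActive` are illustrative names, not existing gateway functions.

```javascript
// Sketch of a strict variant: refuse instead of waiting. All names here
// (createPollerRegistry, tryRegister, isActive) are illustrative.
function createPollerRegistry() {
  const active = new Map(); // bot token -> { accountId, startedAt }
  return {
    tryRegister(token, accountId, now = Date.now()) {
      if (active.has(token)) return null; // already polled: refuse
      const entry = { accountId, startedAt: now };
      active.set(token, entry);
      // The returned release function only removes its own entry.
      return () => {
        if (active.get(token) === entry) active.delete(token);
      };
    },
    isActive: (token) => active.has(token),
  };
}
```

A provider would call `tryRegister` before polling, bail out when it gets `null`, and call the returned release function in its `finally` block. This still cannot see pollers in other processes, such as the external script described earlier.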

extent analysis

Fix Plan

To prevent duplicate polling sessions for the same bot token, we need to implement a mechanism to detect and prevent concurrent getUpdates calls.

Here are the steps to fix the issue:

  • Create a global Map to store active pollers keyed by bot token.
  • Before starting a new poller, check if an existing session is already polling the same token.
  • If an existing session is found, wait for it to release with a timeout.
  • Register the new session in the Map.
  • After the polling logic is complete, remove the session from the Map.

Example code:

const __activePollers = new Map();

async function monitorTelegramProvider(opts = {}) {
  const token = opts.token;
  const existingEntry = __activePollers.get(token);
  if (existingEntry) {
    await Promise.race([
      existingEntry.done,
      new Promise(r => setTimeout(r, 5000))
    ]);
  }
  
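  let resolvePollerDone;
  const pollerDone = new Promise(r => { resolvePollerDone = r; });
  const entry = { startedAt: Date.now(), done: pollerDone };
  __activePollers.set(token, entry);
  
  try {
    // existing polling logic
  } finally {
    // Remove only this session's entry; a newer session may have replaced it
    if (__activePollers.get(token) === entry) __activePollers.delete(token);
    resolvePollerDone();
  }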
}

Additionally, add a drain pause in the hot-reload handler:

const restartChannel = async (name) => {
  await params.stopChannel(name);
  await new Promise(r => setTimeout(r, 500)); // let long-poll drain
  await params.startChannel(name);
};

Verification

To verify that the fix worked, monitor the gateway logs for 409 Conflict errors. If the errors persist, check the __activePollers Map to ensure that it's correctly tracking active pollers.

Also, test the hot-reload handler to ensure that it's correctly pausing between stopping and starting the channel.
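
Counting conflicts in a log capture can be scripted. The sketch below assumes one log entry per line and that each conflict line contains both "409" and the word "Conflict"; the real gateway log format may differ, so the patterns are a guess to adjust.

```javascript
// Assumes one entry per line, with conflict lines containing "409" and
// "Conflict" (case-insensitive). Adjust the patterns to the real log format.
function countConflicts(logText) {
  return logText
    .split(/\r?\n/)
    .filter((line) => /\b409\b/.test(line) && /conflict/i.test(line)).length;
}
```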

Extra Tips

  • Consider adding logging to track when a poller is waiting for an existing session to release.
  • Review the POLL_STALL_THRESHOLD_MS value to ensure it's not triggering unnecessary restarts.
  • Test the fix with multiple bot tokens and concurrent polling sessions to ensure it's working as expected.
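
The logging tip could look roughly like this. `waitForRelease` and its `log` callback are illustrative and not gateway APIs; `done` stands for the promise stored in the registry entry.

```javascript
// Illustrative only: waitForRelease and the log callback are not gateway APIs.
// Resolves to "released" if done settles first, "timeout" otherwise.
async function waitForRelease(done, timeoutMs, log = console.warn) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve("timeout"), timeoutMs);
  });
  const startedAt = Date.now();
  log(`telegram: existing poller found, waiting up to ${timeoutMs}ms`);
  try {
    const outcome = await Promise.race([done.then(() => "released"), timeout]);
    log(`telegram: wait ended (${outcome}) after ${Date.now() - startedAt}ms`);
    return outcome;
  } finally {
    clearTimeout(timer);
  }
}
```

Logging the outcome separates a clean handover from a timed-out wait, which is the case most likely to precede a 409.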
