hermes - ✅(Solved) Fix Telegram gateway can stay alive with dead polling after conflict retry start_polling failure [2 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#25221 (fetched 2026-05-14 03:47:57)
Comments: 0
Participants: 1
Timeline events: 8 (labeled ×4, cross-referenced ×2, referenced ×2)
Reactions: 0


PR fix notes

PR #25232: fix(gateway): route conflict-retry start_polling failure to reconnect ladder

Description (problem / solution / changelog)

Summary

When start_polling() raises during a conflict retry in _handle_polling_conflict, the exception was logged and swallowed. Polling stayed dead with no further error callbacks, leaving the gateway "alive but deaf" until manual restart.

Root Cause

In gateway/platforms/telegram.py, _handle_polling_conflict retries start_polling() after a 409 Conflict. If that retry call itself fails (timeout, network error), the except block logged the error and returned — without routing the failure into the existing reconnect ladder.

The _handle_polling_network_error handler already has the correct pattern: when start_polling() fails in its retry path, it self-schedules a new _handle_polling_network_error task via asyncio.ensure_future (lines 814-823). The conflict-retry path was missing this.
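The self-scheduling pattern can be illustrated with a toy model. `ToyAdapter`, its `fail_times` knob and the zero-length sleep are hypothetical stand-ins; the real handler waits for a back-off delay and talks to the Telegram updater. Only the names `_handle_polling_network_error`, `_background_tasks`, `has_fatal_error` and the `asyncio.ensure_future` call mirror the PR text. Tasks go into a set because the asyncio event loop keeps only weak references to tasks, and each task removes itself from the set when it finishes.

```python
import asyncio


class ToyAdapter:
    """Hypothetical stand-in for the Telegram adapter; not Hermes code."""

    def __init__(self, fail_times: int) -> None:
        self.name = "telegram"
        self.has_fatal_error = False
        self.polling = False
        self.attempts = 0
        self._fail_times = fail_times
        # Strong references: the event loop only keeps weak references to tasks.
        self._background_tasks = set()

    async def start_polling(self) -> None:
        # Fails for the first `fail_times` calls, then succeeds.
        self.attempts += 1
        if self.attempts <= self._fail_times:
            raise TimeoutError("Timed out")
        self.polling = True

    async def _handle_polling_network_error(self, err: Exception) -> None:
        await asyncio.sleep(0)  # a real handler would sleep for a back-off delay here
        try:
            await self.start_polling()
        except Exception as retry_err:
            # Polling is still dead and no callback will fire: re-schedule ourselves.
            if not self.has_fatal_error:
                task = asyncio.ensure_future(self._handle_polling_network_error(retry_err))
                self._background_tasks.add(task)
                task.add_done_callback(self._background_tasks.discard)


async def main() -> ToyAdapter:
    adapter = ToyAdapter(fail_times=2)
    await adapter._handle_polling_network_error(TimeoutError("initial failure"))
    # Wait until the self-scheduled chain of retries drains.
    while adapter._background_tasks:
        await asyncio.gather(*list(adapter._background_tasks))
    return adapter


adapter = asyncio.run(main())
print(adapter.attempts, adapter.polling)  # 3 True
```

Each failed `start_polling()` leaves exactly one pending task behind, so recovery never depends on an external callback that may not arrive.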

Fix

Route conflict-retry start_polling() failures into _handle_polling_network_error, which handles retries with exponential back-off (5s, 10s, 20s, 40s, 60s cap) and eventual escalation to retryable-fatal so the supervisor can restart the gateway.
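The delays quoted above (5s, 10s, 20s, 40s, then a 60s cap) fit a doubling schedule. The sketch below reproduces that sequence; `BASE_DELAY_S`, `MAX_DELAY_S`, `MAX_ATTEMPTS` and `next_step` are hypothetical names, and the escalation threshold in particular is an assumption because the PR does not state it.

```python
# Hypothetical constants inferred from the PR description; not read from Hermes.
BASE_DELAY_S = 5
MAX_DELAY_S = 60
MAX_ATTEMPTS = 8  # assumption: the real escalation threshold is not stated


def backoff_delay(attempt: int) -> int:
    """Delay in seconds before reconnect attempt `attempt` (0-based), capped."""
    return min(BASE_DELAY_S * 2 ** attempt, MAX_DELAY_S)


def next_step(attempt: int) -> str:
    """Keep climbing the ladder, or escalate so a supervisor can restart the gateway."""
    if attempt >= MAX_ATTEMPTS:
        return "retryable-fatal"
    return f"retry in {backoff_delay(attempt)}s"


print([backoff_delay(n) for n in range(6)])  # [5, 10, 20, 40, 60, 60]
```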

Code Intelligence

  • Analyzed: _handle_polling_conflict (callers: 2, callees: 3, flows: 0)
  • Blast radius: LOW — change is inside an exception handler on a rarely-triggered retry path
  • Related patterns: _handle_polling_network_error already uses the same asyncio.ensure_future self-scheduling pattern (lines 814-823)

Regression Coverage

New test test_conflict_retry_routes_to_reconnect_on_start_polling_failure in tests/gateway/test_telegram_network_reconnect.py:

  • Mocks start_polling to raise during conflict retry
  • Asserts that a reconnect task is scheduled in _background_tasks
  • Mirrors the existing test_reconnect_self_schedules_on_start_polling_failure which covers the network-error path
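The test shape described above can be sketched as a self-contained script. This is not the PR's actual test, which exercises the real adapter; `ToyTelegramAdapter`, `reconnect_errors` and `check_conflict_retry_routes_to_reconnect` are hypothetical. The `except` body follows the pattern from the fix, and the `AsyncMock` line is the one quoted in the issue.

```python
import asyncio
import logging
from unittest.mock import AsyncMock, MagicMock

logger = logging.getLogger(__name__)


class ToyTelegramAdapter:
    """Hypothetical, stripped-down model of the conflict-retry path (not Hermes code)."""

    def __init__(self, app) -> None:
        self.name = "telegram"
        self._app = app
        self.has_fatal_error = False
        self._background_tasks = set()
        self.reconnect_errors = []

    async def _handle_polling_network_error(self, err: Exception) -> None:
        # A real handler would back off and retry; here we only record the call.
        self.reconnect_errors.append(err)

    async def _handle_polling_conflict(self) -> None:
        try:
            await self._app.updater.start_polling()
        except Exception as retry_err:
            logger.warning("[%s] Telegram polling retry failed: %s", self.name, retry_err)
            if not self.has_fatal_error:
                task = asyncio.ensure_future(self._handle_polling_network_error(retry_err))
                self._background_tasks.add(task)
                task.add_done_callback(self._background_tasks.discard)


async def check_conflict_retry_routes_to_reconnect() -> ToyTelegramAdapter:
    mock_app = MagicMock()
    mock_app.updater.start_polling = AsyncMock(side_effect=Exception("Timed out"))
    adapter = ToyTelegramAdapter(mock_app)

    await adapter._handle_polling_conflict()
    # With the fix, a reconnect task is pending instead of nothing at all.
    assert len(adapter._background_tasks) == 1
    assert not adapter.has_fatal_error

    await asyncio.gather(*adapter._background_tasks)
    return adapter


adapter = asyncio.run(check_conflict_retry_routes_to_reconnect())
```

Without the fix, the first assertion fails because the `except` block returns without scheduling anything.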

Testing

tests/gateway/test_telegram_network_reconnect.py — 16 passed
tests/gateway/test_telegram_conflict.py — 6 passed

Fixes #25221 (Telegram gateway can stay alive with dead polling after conflict retry start_polling failure)

Changed files

  • gateway/platforms/telegram.py (modified, +10/-2)
  • tests/gateway/test_telegram_network_reconnect.py (modified, +42/-0)

PR #25285: fix(telegram): route conflict-retry start_polling failure into network reconnect ladder

Description (problem / solution / changelog)

Summary

Fixes #25221 — Telegram gateway stays alive with dead polling after the conflict-retry start_polling call fails.

The Bug

In _handle_polling_conflict(), when the retry start_polling call (line 896) raises an exception, the handler only logged the error and returned silently — no reconnect was scheduled, no further retry was triggered. This left the gateway process alive but with dead Telegram polling, requiring a manual restart.

The Fix

The fix mirrors what _handle_polling_network_error already does for network failures (lines 814-823): when start_polling fails in the conflict-retry path, schedule a background task that calls _handle_polling_network_error(retry_err). This routes the failure through the exponential-backoff reconnect ladder so the adapter recovers automatically instead of going silent.

Changed file: gateway/platforms/telegram.py (lines 904-916)

except Exception as retry_err:
    logger.warning("[%s] Telegram polling retry failed: %s", self.name, retry_err)
    # start_polling failed — polling is dead and no further error
    # callbacks will fire, so schedule the network reconnect handler
    # ourselves, routing the retry failure through the exponential-
    # backoff reconnect ladder so the adapter doesn't go silent.
    if not self.has_fatal_error:
        task = asyncio.ensure_future(
            self._handle_polling_network_error(retry_err)
        )
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
    return

Tests

tests/gateway/test_telegram_polling_conflict_retry.py — 4 regression tests:

  • test_conflict_retry_start_polling_failure_schedules_network_reconnect — core fix validation
  • test_conflict_retry_start_polling_failure_does_not_set_fatal_immediately — confirms transient path
  • test_conflict_retry_start_polling_failure_leaves_adapter_alive — adapter stays running
  • test_successful_conflict_retry_resets_conflict_count — conflict count resets on success

All 4 pass. Run: pytest tests/gateway/test_telegram_polling_conflict_retry.py -v

Changed files

  • gateway/platforms/telegram.py (modified, +10/-2)
  • tests/gateway/test_telegram_polling_conflict_retry.py (added, +177/-0)


Bug Description

The Telegram gateway can enter an alive but deaf state after a polling conflict recovery attempt fails.

In the observed failure, the gateway process stayed alive under systemd, but Telegram polling no longer recovered. Incoming Telegram messages stopped being handled until the gateway service was manually restarted.

Root Cause Hypothesis

In gateway/platforms/telegram.py, the Telegram adapter handles polling conflicts by retrying updater.start_polling() after a conflict/network issue. If that conflict-retry start_polling() call itself times out or raises, the exception path is logged but does not reliably re-enter the existing network reconnect ladder.

That leaves the process waiting for another polling callback/error signal that may never arrive, so the gateway remains running but effectively stops listening.

Expected Behavior

If start_polling() fails during conflict recovery, the gateway should treat that as a transient polling/network failure and continue through the existing reconnect ladder. If recovery keeps failing, it should escalate to the existing retryable/fatal path so systemd or the gateway runner can restart it cleanly.

Actual Behavior

A failed start_polling() inside the conflict-retry path can be swallowed/logged without scheduling the next recovery step. The service remains active, but polling is dead until manual restart.

Reproduction / Regression Shape

A regression test can simulate the failure by making the conflict-retry start_polling() raise, for example:

mock_app.updater.start_polling = AsyncMock(side_effect=Exception("Timed out"))

Then assert that the Telegram platform schedules/enters _handle_polling_network_error(retry_err) or equivalent reconnect handling, rather than returning to an idle state.

Suggested Fix

When the conflict-retry start_polling() call fails, schedule the existing network reconnect handler as a background task, e.g. route the exception into _handle_polling_network_error(retry_err) and track the task in _background_tasks.

This keeps the behavior consistent with other polling/network failures: retry, reconnect, then escalate if unrecoverable.

Observed Environment

  • Hermes gateway running as systemd user services
  • Telegram platform adapter
  • Multi-profile gateway setup (hermes-gateway.service, hermes-gateway-byte.service, etc.)
  • Symptom: systemd reports service active, but Telegram bot stops responding until restart

Labels

Suggested: type/bug, comp/gateway, platform/telegram
