hermes: Cron Telegram live-adapter delivery can silently drop messages after reconnect storms

Bug Description

Scheduled cron jobs with deliver: telegram:CHAT_ID can stop arriving in Telegram after the gateway has been running through sustained Telegram reconnect storms (Bad Gateway / TimedOut). The scheduler still records the delivery as successful:

  • jobs.json shows last_status: ok and last_delivery_error: null
  • the cron output file contains a full, non-empty response
  • scheduler logs say the job was delivered to telegram:CHAT_ID via live adapter
  • but the Telegram message never reaches the user

Restarting the gateway consistently restores delivery. This points at the long-running TelegramAdapter / python-telegram-bot client entering a bad state after reconnect loops, while cron's live-adapter branch still treats sends as successful.

Affected Components

  • cron/scheduler.py: _deliver_result, live-adapter branch using runtime_adapter.send(...) via asyncio.run_coroutine_threadsafe
  • gateway/platforms/telegram.py: TelegramAdapter.send
  • python-telegram-bot 22.7

Observed Behavior

Multiple cron jobs configured with deliver: telegram:... were affected at once, so this does not appear to be job-specific.

Typical evidence from the broken state:

jobs.json: last_status = ok
jobs.json: last_delivery_error = null
~/.hermes/cron/output/{job_id}/...md contains non-empty output
INFO cron.scheduler: Job '{job_id}': delivered to telegram:CHAT_ID via live adapter

Actual result: no Telegram message arrives.

The condition appears after reconnect bursts like:

[Telegram] Telegram network error, scheduling reconnect: Bad Gateway
[Telegram] Telegram network error (attempt 1/10), reconnecting in 5s. Error: Bad Gateway
telegram.error.TimedOut: Timed out

gateway_state.json continues to report platforms.telegram.state == "connected"; its updated_at can remain frozen at the last successful state transition, usually gateway startup.
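
Since the state file can keep reporting "connected" long after the last verified transition, a monitor might treat an old timestamp as "unverified" rather than as proof of health. The following is a minimal sketch only. It assumes the file has the shape {"platforms": {"telegram": {"state": ..., "updated_at": ...}}} with an ISO-8601 updated_at; the per-platform location of updated_at, the timestamp format, and the helper name connected_but_unverified are assumptions and are not confirmed by this report.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def connected_but_unverified(
    state: dict,
    platform: str = "telegram",
    max_age: timedelta = timedelta(hours=6),
    now: Optional[datetime] = None,
) -> bool:
    """Return True when the platform claims 'connected' but the claim is old.

    Assumed (hypothetical) schema: state["platforms"][platform] holds
    "state" and an ISO-8601 "updated_at".
    """
    entry = state.get("platforms", {}).get(platform, {})
    if entry.get("state") != "connected":
        return False  # not claiming health, so nothing to distrust
    raw = entry.get("updated_at")
    if not raw:
        return True  # no timestamp at all: treat the claim as unverified
    updated = datetime.fromisoformat(raw)
    if updated.tzinfo is None:
        updated = updated.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    current = now or datetime.now(timezone.utc)
    return current - updated > max_age
```

A True result does not mean sends are failing; it only means the "connected" flag has not been refreshed recently and should not be taken as evidence that the send path works.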

Expected Behavior

If TelegramAdapter.send() returns SendResult(success=True, message_id="1234"), the message should actually be delivered to the configured chat.

If the live adapter is unhealthy or Telegram refuses/drops the send, the adapter should surface a failure so cron can either:

  1. fall through to the standalone delivery path, or
  2. record a delivery error / retryable delivery failure instead of marking the run as successfully delivered.
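
The two outcomes above can be sketched as a small decision function. This is an illustration under stated assumptions, not the scheduler's actual code: SendResult here is a hypothetical stand-in carrying a subset of the fields visible in the logs, and DeliveryOutcome, Sender, and deliver_with_fallback are invented names. The senders are synchronous callables to keep the example self-contained.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SendResult:
    """Hypothetical stand-in with a subset of the fields seen in the logs."""
    success: bool
    message_id: Optional[str] = None
    error: Optional[str] = None
    retryable: bool = False


@dataclass
class DeliveryOutcome:
    delivered: bool
    path: Optional[str]  # "live", "standalone", or None when both failed
    error: Optional[str] = None


Sender = Callable[[str], SendResult]


def _attempt(send: Sender, text: str) -> SendResult:
    # Convert exceptions into a failed result so the caller sees one shape.
    try:
        return send(text)
    except Exception as exc:
        return SendResult(success=False, error=repr(exc), retryable=True)


def deliver_with_fallback(text: str, live: Sender, standalone: Sender) -> DeliveryOutcome:
    live_result = _attempt(live, text)
    if live_result.success:
        return DeliveryOutcome(True, "live")
    # Option 1: fall through to the standalone delivery path.
    fallback = _attempt(standalone, text)
    if fallback.success:
        return DeliveryOutcome(True, "standalone")
    # Option 2: both paths failed, so surface an error instead of success.
    return DeliveryOutcome(False, None, error=fallback.error or live_result.error)
```

This only helps once the live adapter actually reports failure. In the state described in this report the adapter returns success=True, so the fallback branch would never run without an additional health signal.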

Diagnostic Evidence

After a fresh gateway restart, the same manually triggered cron job delivered successfully and diagnostic logging around the live-adapter call showed:

WARNING cron.scheduler: DIAG cron-deliver job=JOB_ID plat=telegram chat=CHAT_ID
  adapter='TelegramAdapter' loop_running=True text_len=890 skip_live=False
WARNING cron.scheduler: DIAG live-adapter-result job=JOB_ID type=SendResult
  repr=SendResult(success=True, message_id='1245', error=None,
                  raw_response={'message_ids': ['1245']},
                  retryable=False, continuation_message_ids=())
  success_attr=True
INFO cron.scheduler: Job 'JOB_ID': delivered to telegram:CHAT_ID via live adapter

That message arrived. In the broken state before restart, the same job and configuration had repeatedly reported last_status: ok with no delivery.

A standalone send in a separate process using the same bot token, chat id, and platform config succeeded while the cron/live-adapter path was the suspected failure point:

import asyncio

from gateway.config import Platform, load_gateway_config
from tools.send_message_tool import _send_to_platform

async def main():
    cfg = load_gateway_config()
    pconfig = cfg.platforms.get(Platform.TELEGRAM)
    # Standalone path: no long-lived TelegramAdapter is involved.
    return await _send_to_platform(Platform.TELEGRAM, pconfig, "CHAT_ID", "ping")

print(asyncio.run(main()))
# {'success': True, 'platform': 'telegram', 'chat_id': '...', 'message_id': '1243'}

The standalone message arrived. The later live-adapter cron delivery reported a nearby message_id, confirming the same bot/chat backend was being used.

Workaround

Locally, cron delivery was patched to skip the live-adapter branch for Telegram and always use the standalone path. Since standalone delivery is already used by send_message tool calls, this restored cron Telegram delivery in the affected stack.

This is not a complete upstream fix because some platforms may need a live adapter (for example E2EE-only Matrix/Signal paths), but it suggests Telegram cron delivery should not blindly trust a long-lived adapter that has survived repeated reconnect errors.
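
The shape of that local patch can be sketched as a per-platform decision. The function name should_skip_live and the set LIVE_ADAPTER_REQUIRED are hypothetical; the actual patch in cron/scheduler.py is not shown in this report, and the set's contents come only from the Matrix/Signal example above.

```python
# Platforms assumed (from the example in this report) to need the live adapter.
LIVE_ADAPTER_REQUIRED = frozenset({"matrix", "signal"})


def should_skip_live(platform: str) -> bool:
    """Return True when cron should bypass the live adapter for this platform."""
    return platform.lower() not in LIVE_ADAPTER_REQUIRED
```

With this rule, should_skip_live("telegram") is True, so Telegram cron deliveries always take the standalone path, while Matrix and Signal keep using the live adapter.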

Suspected Root Cause

After sustained Bad Gateway / TimedOut reconnect loops, the python-telegram-bot Bot instance held by TelegramAdapter._bot may enter a wedged state where bot.send_message() returns a Message object (so TelegramAdapter.send returns SendResult(success=True, message_id=...)), but the message is not transmitted in a way that reaches the recipient.

The gateway's own state machine still reports Telegram as connected because polling/reconnect state and send-path health are not independently verified.
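
One way to verify send-path health independently of polling state is a bounded probe. The sketch below is generic: probe is any zero-argument coroutine function, for example python-telegram-bot's Bot.get_me or a self-send to a debug channel. The helper name send_path_healthy and the timeout default are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable


async def send_path_healthy(
    probe: Callable[[], Awaitable[object]],
    timeout: float = 10.0,
) -> bool:
    """Run one probe call; treat any error or timeout as unhealthy."""
    try:
        await asyncio.wait_for(probe(), timeout=timeout)
    except Exception:
        # Covers network errors and the timeout raised by wait_for.
        return False
    return True
```

A probe that returns normally is necessary but not sufficient: the failure described here involves calls that appear to succeed, so a probe whose effect can be observed end to end (such as a self-send that is later read back) would be stronger evidence than getMe alone.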

Possible mechanisms:

  1. PTB/httpx client is wedged on a stale connection and incorrectly reports success.
  2. Polling/getUpdates recovers but sendMessage is not healthy.
  3. The request is accepted against an unexpected chat/topic context, though a standalone probe with the same chat id worked.

Suggested Fix Directions

In order of increasing intrusiveness:

  1. Add periodic Telegram adapter health checks (getMe() or a configured debug-channel self-send) and force a full adapter reconnect/rebuild if checks fail.
  2. Count consecutive Bad Gateway / TimedOut reconnect errors. After a threshold, discard and recreate the PTB Bot and Application objects rather than reusing the same client.
  3. In cron delivery, prefer the standalone Telegram path over the live adapter unless the platform explicitly requires live-adapter semantics.
  4. At minimum, fall through if SendResult lacks raw_response, message_id, or other strong delivery evidence. This will not catch the observed real-looking message_id case, but it is still defensive.
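
Direction 2 can be sketched as a small counter that the reconnect loop consults. ReconnectTracker and its default threshold are hypothetical; how the gateway would actually rebuild the PTB objects is outside what this report shows.

```python
class ReconnectTracker:
    """Track consecutive network errors and signal when a rebuild is due."""

    def __init__(self, threshold: int = 5):
        self.threshold = threshold
        self.consecutive_errors = 0

    def record_error(self) -> bool:
        """Count one Bad Gateway / TimedOut error; True means rebuild now."""
        self.consecutive_errors += 1
        return self.consecutive_errors >= self.threshold

    def record_success(self) -> None:
        """Reset the streak after any verified successful call."""
        self.consecutive_errors = 0
```

The reconnect handler would call record_error() on each failure and, when it returns True, discard the existing client objects and create fresh ones instead of retrying on the same instance.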

Steps to Reproduce

This is non-deterministic and depends on Telegram/network instability:

  1. Run hermes-gateway continuously for several days with Telegram enabled.
  2. Let Telegram encounter repeated Bad Gateway / timeout reconnect bursts, or simulate intermittent outbound HTTPS failures to api.telegram.org.
  3. Trigger any cron job with deliver: telegram:CHAT_ID.
  4. Observe that cron reports successful live-adapter delivery but the message does not arrive.
  5. Restart the gateway.
  6. Trigger the same cron job again; delivery resumes.

Environment

  • hermes-agent commit b833d8501 / tag v2026.5.7
  • Python 3.11.2
  • python-telegram-bot 22.7
  • Debian 12, Linux 6.1.0-44 amd64
  • Gateway managed by systemd as a non-root user
  • Telegram was the only configured messaging platform in the affected stack

Related / Not Duplicates

Existing related issues cover nearby symptoms but not this exact false-success live-adapter failure mode:

  • #3173 — Telegram Bad Gateway reconnect loop can make the gateway unresponsive. This issue differs because the gateway and cron keep running, but live-adapter cron delivery silently drops while reporting success.
  • #13566 / #8846 — delivery retry/status separation for transient failures. Useful follow-up, but this issue is specifically that cron never sees a failure.
  • #20915 — standalone send_message_tool._send_telegram lacking Telegram fallback transport on blocked networks. This report is the inverse: standalone send works, live adapter is suspected broken.
  • #22773 / #17139 — Telegram cron routing/target-resolution issues. Here the target resolves and scheduler logs successful live-adapter delivery.
