hermes - ✅(Solved) Fix Gateway exits when Telegram disconnects, killing embedded cron ticker [1 pull requests, 1 comments, 1 participants]

Twsa · 2026-04-17T13:10:32Z



NousResearch/hermes-agent#11614 • Fetched 2026-04-18 05:59:51


Root Cause

`gateway/run.py` lines 1051-1058: when all platforms fail with a `retryable` error, the process exits immediately instead of letting `_platform_reconnect_watcher` handle reconnection.

Fix Action

Fixed

Fixed by PR: fix(gateway): keep cron alive during reconnect backoff (https://github.com/NousResearch/hermes-agent/pull/11691)

PR fix notes

PR #11691: fix(gateway): keep cron alive during reconnect backoff

Repository: NousResearch/hermes-agent
Author: LeonSGP43
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/11691

Description (problem / solution / changelog)

What does this PR do?

Keeps the gateway process alive when the last connected platform fails with a retryable fatal error and has already been queued for background reconnection. This preserves the embedded cron ticker while `_platform_reconnect_watcher` handles reconnect backoff.

Related Issue

Fixes #11614

Type of Change

- [x] 🐛 Bug fix (non-breaking change that fixes an issue)
- [x] ✅ Tests (adding or improving test coverage)

Changes Made

- Remove the forced shutdown path in `gateway/run.py` that ran when all adapters were down but platforms remained queued for reconnect.
- Keep the existing reconnect watcher responsible for recovering the failed platform in the background.
- Update `tests/gateway/test_platform_reconnect.py` to assert that the gateway stays alive and preserves the reconnect queue instead of stopping immediately.

How to Test

1. Reproduce the retryable fatal-error path where the final connected platform disconnects and is placed into `_failed_platforms`.
2. Run `./.venv/bin/python -m pytest -o addopts='' tests/gateway/test_platform_reconnect.py -q`.
3. Confirm the reconnect test asserts `runner.stop()` is not called and the gateway remains alive while reconnect backoff continues.
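The assertion in step 3 can be sketched roughly as follows. This is not the PR's actual test: `FakeRunner` and `handle_adapter_fatal_error` are hypothetical names made up for illustration, and only `_failed_platforms` and `stop()` come from the description above. It shows only the shape of the check, using `unittest.mock.AsyncMock` to verify that `stop()` is never awaited.

```python
import asyncio
from unittest.mock import AsyncMock


class FakeRunner:
    """Hypothetical minimal runner; the real gateway class has far more state."""

    def __init__(self) -> None:
        self.adapters = {"telegram": object()}
        self._failed_platforms: dict = {}
        # Stand-in for the real async stop() so calls can be inspected.
        self.stop = AsyncMock()

    async def handle_adapter_fatal_error(self, name: str, retryable: bool) -> None:
        # Post-fix behavior as described in the PR: queue the platform for
        # background reconnect and leave the process running.
        self.adapters.pop(name, None)
        self._failed_platforms[name] = {"attempts": 0, "retryable": retryable}


async def check_gateway_stays_alive() -> FakeRunner:
    runner = FakeRunner()
    await runner.handle_adapter_fatal_error("telegram", retryable=True)
    # The gateway must not shut down, and the reconnect queue must be kept.
    runner.stop.assert_not_awaited()
    assert "telegram" in runner._failed_platforms
    return runner


runner = asyncio.run(check_gateway_stays_alive())
```

In the real test suite the runner and adapter fixtures would come from the repository's own test helpers rather than a hand-written fake.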

Notes

The issue already points to `_platform_reconnect_watcher` as the intended recovery path. This patch keeps that behavior active instead of exiting the process and interrupting embedded cron jobs during the reconnect window.

Changed files

- `gateway/run.py` (modified, +8/-16)
- `tests/gateway/test_platform_reconnect.py` (modified, +8/-7)
- `tests/gateway/test_runner_fatal_adapter.py` (modified, +5/-4)


Bug Description

When the Telegram connection fails after all retry attempts, the Gateway calls `await self.stop()` and exits. This kills the embedded cron ticker (`run.py` ~9590), so scheduled cron jobs miss execution during the restart window.

Root Cause

`gateway/run.py` lines 1051-1058: when all platforms fail with a `retryable` error, the process exits immediately instead of letting `_platform_reconnect_watcher` handle reconnection.

Fix Applied

Removed the `if adapter.fatal_error_retryable: ... await self.stop()` branch. The gateway now stays alive after both retryable and non-retryable errors: `_platform_reconnect_watcher` handles reconnection with exponential backoff (30s → 60s → 120s → 240s → 300s cap) while the cron ticker continues running.
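The backoff schedule quoted above (start at 30s, double each attempt, cap at 300s) can be reproduced with a short generator. This is a minimal sketch of the arithmetic only. `reconnect_delays` and its parameters are hypothetical and say nothing about how `_platform_reconnect_watcher` is actually written.

```python
from itertools import islice
from typing import Iterator


def reconnect_delays(base: float = 30.0, cap: float = 300.0) -> Iterator[float]:
    """Yield reconnect delays in seconds: double each attempt, never exceed cap."""
    delay = base
    while True:
        yield min(delay, cap)
        delay = min(delay * 2, cap)


# First six delays under the assumed defaults (base=30s, cap=300s).
schedule = list(islice(reconnect_delays(), 6))
print(schedule)
```

Once the cap is reached, every later attempt waits the same 300 seconds, so a long outage is retried at a steady rate rather than with ever-growing gaps.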

Changed in `gateway/run.py` lines 1047-1058:

- Before: retryable errors → `await self.stop()` → process exits, cron killed
- After: all errors → warning log → stay alive, cron keeps running
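The before/after difference can be illustrated with a toy asyncio simulation. Everything here is a hypothetical stand-in (`ToyGateway`, `cron_ticker`, `on_fatal_error`, the `stop_on_retryable` flag), not the code in `gateway/run.py`. It only shows why calling `stop()` on a retryable error silences a ticker that lives in the same process, while logging a warning keeps it ticking during the reconnect window.

```python
import asyncio
import logging

logger = logging.getLogger("gateway_sketch")


class ToyGateway:
    """Hypothetical stand-in for the gateway runner; not the real class."""

    def __init__(self, stop_on_retryable: bool) -> None:
        self.stop_on_retryable = stop_on_retryable  # True models the old behavior
        self.failed_platforms: set = set()
        self.ticks = 0
        self._stopped = asyncio.Event()

    async def cron_ticker(self) -> None:
        # Embedded ticker: it only runs while the gateway itself is alive.
        while not self._stopped.is_set():
            self.ticks += 1
            await asyncio.sleep(0.01)

    async def stop(self) -> None:
        self._stopped.set()

    async def on_fatal_error(self, platform: str, retryable: bool) -> None:
        self.failed_platforms.add(platform)  # queued for background reconnect
        if retryable and self.stop_on_retryable:
            await self.stop()  # before: gateway shuts down, ticker dies with it
            return
        # after: warn and stay alive so the ticker keeps running
        logger.warning("%s is down; staying alive for reconnect", platform)


async def simulate(stop_on_retryable: bool) -> int:
    """Return how many ticks happen after the platform fails."""
    gw = ToyGateway(stop_on_retryable)
    ticker = asyncio.create_task(gw.cron_ticker())
    await asyncio.sleep(0.03)
    await gw.on_fatal_error("telegram", retryable=True)
    ticks_at_failure = gw.ticks
    await asyncio.sleep(0.05)  # stands in for the reconnect window
    await gw.stop()
    await ticker
    return gw.ticks - ticks_at_failure


ticks_before_fix = asyncio.run(simulate(stop_on_retryable=True))
ticks_after_fix = asyncio.run(simulate(stop_on_retryable=False))
```

With the old behavior no ticks occur after the failure. With the new behavior the ticker keeps firing until the simulation ends.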

Environment

- Hermes Agent (NousResearch/hermes-agent)
- Gateway running as systemd service with `Restart=on-failure`
- Single platform: Telegram only

extent analysis

TL;DR

Remove the `if adapter.fatal_error_retryable: ... await self.stop()` branch in `gateway/run.py` to prevent the process from exiting when all platforms fail with a retryable error.

Guidance

- Locate the code responsible for the issue (lines 1051-1058 in `gateway/run.py`) and verify that the `if adapter.fatal_error_retryable` branch has been removed.
- Confirm that `_platform_reconnect_watcher` handles reconnection with exponential backoff as intended.
- Test the Gateway's behavior when all platforms fail with a retryable error to confirm that the process stays alive and the cron ticker continues running.
- Review the systemd service configuration to check whether `Restart=on-failure` is still appropriate now that the Gateway no longer exits on these errors.

Example

No code snippet is provided as the issue already includes the necessary information about the code changes made.

Notes

This fix assumes that `_platform_reconnect_watcher` correctly handles reconnection with exponential backoff. If it does not, additional changes may be necessary.

Recommendation

Apply the fix by removing the `if adapter.fatal_error_retryable: ... await self.stop()` branch. This lets the Gateway stay alive and the cron ticker keep running even when all platforms fail with a retryable error.
