hermes - ✅(Solved) Fix [Bug] Platform initialization failure blocks other platforms (Telegram blocks Feishu) [3 pull requests, 2 comments, 3 participants]

GitHub stats
NousResearch/hermes-agent#17242 · Fetched 2026-04-30 06:48:57
Comments: 2
Participants: 3
Timeline: 17
Reactions: 0
Timeline (top): labeled ×5, cross-referenced ×3, commented ×2, mentioned ×2

Error Message

try:
    init_telegram()
except Exception as e:
    logger.warning(f"Telegram init failed: {e}, continuing...")

PR fix notes

PR #17270: fix(gateway): isolate platform connect failures

Description (problem / solution / changelog)

What does this PR do?

Fixes gateway startup isolation so one blocked platform connection does not prevent later platforms from starting.

Platform adapter startup and reconnect attempts now use a bounded per-platform connect timeout. If Telegram hangs during initialization, it is treated as a retryable platform failure and queued for reconnect, while Feishu and other configured platforms continue starting normally.

The timeout defaults to 30 seconds and can be configured with HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT. Setting it to 0 disables the timeout.

Related Issue

Fixes #17242

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • Added a bounded platform adapter connect helper in gateway/run.py.
  • Routed gateway startup platform connections through the timeout helper.
  • Routed platform reconnect attempts through the same timeout helper.
  • Added tests covering startup continuation when Telegram connect times out and Feishu still connects.

How to Test

  1. Run:
    uv run --with pytest --with pytest-asyncio python -m pytest tests/gateway/test_platform_reconnect.py -q -o addopts=''
  2. Verify the startup isolation test passes.
  3. Optionally set HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT=0.001 to verify hanging adapter connections become retryable timeout failures.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: Ubuntu/Linux

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

16 passed in 2.86s

## Changed files

- `gateway/run.py` (modified, +30/-2)
- `tests/gateway/test_platform_reconnect.py` (modified, +88/-2)


---

# PR #17383: fix(gateway): add per-platform connect timeout to prevent blocking (#17242)

- Repository: NousResearch/hermes-agent
- Author: vominh1919
- State: closed | merged: False
- Link: https://github.com/NousResearch/hermes-agent/pull/17383

## Description (problem / solution / changelog)

## Fixes #17242: Platform initialization failure blocks other platforms

### Problem

When multiple messaging platforms are configured (e.g., Telegram + Feishu/Lark), if one platform's `connect()` call hangs or takes very long, it blocks initialization of all subsequent platforms.

**Real-world scenario:** Telegram's retry loop (8 attempts × 15s exponential backoff = up to 120s) can prevent Feishu from ever starting — common in regions where Telegram is network-restricted.

### Root Cause

The startup loop in `GatewayRunner.start()` iterates platforms sequentially and awaits each `adapter.connect()` without a timeout:

```python
for platform, platform_config in self.config.platforms.items():
    success = await adapter.connect()  # can hang indefinitely
```

Fix

Wrap each platform's connect() call in asyncio.wait_for() with a 90-second timeout:

_platform_connect_timeout = 90  # seconds
success = await asyncio.wait_for(adapter.connect(), timeout=_platform_connect_timeout)

On timeout:

  • The platform is logged as timed out
  • Resources are cleaned up via _safe_adapter_disconnect()
  • The platform is queued for background reconnection (just like other transient failures)
  • The next platform starts immediately
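
The four timeout steps above can be sketched as a small startup loop. This is a minimal illustration, not the PR's code: `StubAdapter`, `safe_disconnect()` (a hypothetical stand-in for `_safe_adapter_disconnect()`, whose real signature is not shown here), `start()` and the shortened timeout value are all assumptions made for the example.

```python
import asyncio
import logging

logger = logging.getLogger("gateway.sketch")

# The PR proposes 90 seconds; shortened here so the sketch finishes quickly.
PLATFORM_CONNECT_TIMEOUT = 0.05


class StubAdapter:
    """Hypothetical adapter: connects at once, or hangs if hangs=True."""

    def __init__(self, hangs: bool):
        self.hangs = hangs
        self.disconnected = False

    async def connect(self) -> bool:
        if self.hangs:
            await asyncio.sleep(3600)  # simulates an unreachable platform
        return True

    async def disconnect(self) -> None:
        self.disconnected = True


async def safe_disconnect(adapter) -> None:
    """Hypothetical stand-in for _safe_adapter_disconnect(): never raises."""
    try:
        await adapter.disconnect()
    except Exception:
        logger.exception("cleanup after failed connect raised")


async def start(adapters: dict):
    """Connect platforms in order; a timeout only affects that platform."""
    connected, reconnect_queue = [], []
    for name, adapter in adapters.items():
        try:
            ok = await asyncio.wait_for(
                adapter.connect(), timeout=PLATFORM_CONNECT_TIMEOUT
            )
        except asyncio.TimeoutError:
            logger.warning("%s connect timed out", name)  # 1. log
            await safe_disconnect(adapter)                # 2. clean up
            reconnect_queue.append(name)                  # 3. queue for retry
            continue                                      # 4. next platform
        (connected if ok else reconnect_queue).append(name)
    return connected, reconnect_queue


adapters = {"telegram": StubAdapter(hangs=True), "feishu": StubAdapter(hangs=False)}
connected, reconnect_queue = asyncio.run(start(adapters))
print(connected, reconnect_queue)
```

In this sketch the loop is still sequential; the timeout only caps how long one platform can hold up the next.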

Why 90 seconds?

  • Telegram's 8-retry loop can take up to ~120s in the worst case
  • 90s is generous enough for slow networks but prevents indefinite blocking
  • The timeout is applied per-platform, so total startup time is the sum of min(90s, connect_time) over the configured platforms, i.e. at most 90s × num_platforms
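
As a quick check of the startup-time bound, the arithmetic can be written out. The connect times below are made-up illustrative figures, not measurements, and `worst_case_startup_secs()` is a hypothetical helper.

```python
def worst_case_startup_secs(connect_times: dict, timeout: float) -> float:
    """Sequential startup: each platform costs at most `timeout` seconds."""
    return sum(min(timeout, t) for t in connect_times.values())


# Hypothetical figures: Telegram retries for 120s, Feishu connects in 2s.
connect_times = {"telegram": 120.0, "feishu": 2.0}

with_90s = worst_case_startup_secs(connect_times, timeout=90.0)
with_30s = worst_case_startup_secs(connect_times, timeout=30.0)
print(with_90s, with_30s)
```

Under these assumed figures Feishu would start after 90 seconds with the 90-second cap, and after 30 seconds with a 30-second cap.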

Testing

  • Syntax verified with ast.parse()
  • Timeout handler is consistent with existing error handling (same cleanup, same reconnection queue)

Discussed in: #17242

Changed files

  • gateway/run.py (modified, +28/-2)

PR #17429: fix(gateway): isolate platform connect failures with per-platform timeout (#17242)

Description (problem / solution / changelog)

Salvage of #17270 by @tmimmanuel.

One platform's slow/hanging adapter.connect() no longer blocks initialization of the others. Telegram's 8-retry connect loop (~140s worst case in network-restricted regions like China) previously prevented Feishu/Lark from ever starting when Telegram was unreachable — users had to comment Telegram out to get any other platform working.

How

Wrap each adapter.connect() call in asyncio.wait_for() via a new helper, _connect_adapter_with_timeout(). Used at both the startup loop and the reconnect watcher, so a platform that stalls mid-retry also does not stall retries for the others. On timeout the platform is queued for background reconnection like any other transient failure.

  • Default timeout: 30s
  • Override: HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT env var
  • Disable: set to 0
  • Invalid values log a warning and fall back to the default
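
The four rules above could be implemented along these lines. This is a sketch under assumptions, not the body of `_platform_connect_timeout_secs()`: the function name, the use of `None` to mean "disabled", and treating negative or non-finite values as invalid are choices made for this example.

```python
import logging
import math
import os

logger = logging.getLogger("gateway.sketch")

ENV_VAR = "HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT"
DEFAULT_TIMEOUT_SECS = 30.0


def platform_connect_timeout_secs(environ=os.environ):
    """Return the connect timeout in seconds, or None when it is disabled."""
    raw = environ.get(ENV_VAR)
    if raw is None or not raw.strip():
        return DEFAULT_TIMEOUT_SECS
    try:
        value = float(raw)
    except ValueError:
        value = math.nan  # handled by the validity check below
    # Assumption: negative, NaN and infinite values count as invalid.
    if not math.isfinite(value) or value < 0:
        logger.warning(
            "Invalid %s=%r; falling back to %ss", ENV_VAR, raw, DEFAULT_TIMEOUT_SECS
        )
        return DEFAULT_TIMEOUT_SECS
    return None if value == 0 else value


print(platform_connect_timeout_secs({}))
print(platform_connect_timeout_secs({ENV_VAR: "0"}))
print(platform_connect_timeout_secs({ENV_VAR: "0.2"}))
print(platform_connect_timeout_secs({ENV_VAR: "soon"}))
```

Passing the environment as a parameter keeps the function easy to test without mutating `os.environ`.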

Changes

  • gateway/run.py: add _platform_connect_timeout_secs() + _connect_adapter_with_timeout(); route both startup and reconnect through it.
  • tests/gateway/test_platform_reconnect.py: two new tests — one proving startup continues past a Telegram timeout and still connects Feishu, one proving the helper raises TimeoutError on hang.
  • scripts/release.py: AUTHOR_MAP entry for tmimmanuel.

Validation

  • scripts/run_tests.sh tests/gateway/test_platform_reconnect.py → 16/16 passed
  • E2E with a HangAdapter:
    • HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT=0.2 → bounded at 0.2s, raises TimeoutError
    • HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT=0 → timeout disabled, connect runs unbounded
    • Fast-connecting adapter returns True under any positive timeout
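
The three E2E cases can be reproduced with a small stand-alone script. The `HangAdapter` and `FastAdapter` classes and the `connect_adapter_with_timeout()` function below are illustrative stand-ins written for this example; the real helper reads its timeout from the environment, whereas this sketch takes it as a parameter.

```python
import asyncio
import contextlib


class HangAdapter:
    async def connect(self) -> bool:
        await asyncio.sleep(3600)  # never finishes within the demo
        return True


class FastAdapter:
    async def connect(self) -> bool:
        return True


async def connect_adapter_with_timeout(adapter, timeout_secs):
    """Await connect(); bound it only when timeout_secs is positive."""
    if not timeout_secs or timeout_secs <= 0:
        return await adapter.connect()  # timeout disabled
    return await asyncio.wait_for(adapter.connect(), timeout=timeout_secs)


async def demo() -> dict:
    results = {}
    # Case 1: a fast adapter succeeds under a positive timeout.
    results["fast"] = await connect_adapter_with_timeout(FastAdapter(), 0.2)
    # Case 2: a hanging adapter is cut off at the timeout.
    try:
        await connect_adapter_with_timeout(HangAdapter(), 0.2)
        results["hang"] = "connected"
    except asyncio.TimeoutError:
        results["hang"] = "timeout"
    # Case 3: with 0 the connect is unbounded, so run it as a task and
    # check that it is still pending after longer than the case 2 timeout.
    task = asyncio.ensure_future(connect_adapter_with_timeout(HangAdapter(), 0))
    await asyncio.sleep(0.3)
    results["disabled_still_pending"] = not task.done()
    task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return results


results = asyncio.run(demo())
print(results)
```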

Salvage notes

Cherry-picked tmimmanuel's gateway/run.py + tests/gateway/test_platform_reconnect.py changes onto current main. Dropped an unrelated cron/scheduler.py portion that was a rebase artifact from their earlier #17139 branch. Also closes #17383 by @vominh1919, which proposed the same timeout approach without tests and only on the startup path (missing the reconnect-watcher site).

Closes #17242

Changed files

  • gateway/run.py (modified, +30/-2)
  • scripts/release.py (modified, +1/-0)
  • tests/gateway/test_platform_reconnect.py (modified, +88/-2)

Code Example

telegram:
  channel_prompts: {}
feishu:
  app_id: "cli_xxx"
  app_secret: "xxx"

---

gateway:
  fail_fast: false  # Default: false - don't block on platform init failure

---

try:
    init_telegram()
except Exception as e:
    logger.warning(f"Telegram init failed: {e}, continuing...")
 ## Bug Description

 When multiple messaging platforms are configured (e.g., Telegram + Feishu/Lark), if one platform
 fails to initialize (e.g., Telegram can't connect due to network issues), it blocks the
 initialization of other platforms that are correctly configured.

 ## Steps to Reproduce

 1. Configure both Telegram and Feishu in `~/.hermes/config.yaml` (or via environment variables)
    ```yaml
    telegram:
      channel_prompts: {}
    feishu:
      app_id: "cli_xxx"
      app_secret: "xxx"
    ```
 2. Start Hermes Gateway: `hermes gateway`
 3. Telegram fails to connect (common in China due to network restrictions)
 4. **Expected**: Feishu should still work normally
 5. **Actual**: Feishu also doesn't work - no response to messages

 ## Root Cause (Suspected)

 The gateway appears to initialize platforms sequentially, and a failure in one platform (like
 Telegram connection timeout) blocks or delays the initialization of subsequent platforms.

 ## Workaround

 Currently, users must:
 1. Comment out or remove the Telegram configuration
 2. Restart the gateway
 3. Feishu then works correctly

 ## Expected Behavior

 - Platform initialization should be **independent**
 - One platform's failure should **NOT block** other platforms
 - Gateway should:
   - Log a warning for the failed platform
   - Continue initializing other platforms
   - Start successfully even if some platforms fail

 ## Suggested Fix

 Add a configuration option like:
 ```yaml
 gateway:
   fail_fast: false  # Default: false - don't block on platform init failure
 ```

 Or handle platform initialization errors gracefully:
 ```python
 try:
     init_telegram()
 except Exception as e:
     logger.warning(f"Telegram init failed: {e}, continuing...")
 ```

 ## Environment

 - **Hermes version**: (run `hermes --version` to check)
 - **OS**: WSL / Linux
 - **Affected platforms**: Telegram (fails) → Feishu/Lark (blocked)
 - **Network**: China (Telegram blocked by GFW)

 ## Related Issues

 - #16586 - Stale gateway lock files blocking platforms
 - #1526 - Telegram connection fails intermittently
 - #16376 - macOS stale Telegram token lock

 ---

 **Additional context**: This is a common issue for users in regions where certain platforms are
 restricted. The gateway should be resilient to individual platform failures.

extent analysis

TL;DR

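A slow or hanging adapter.connect() on one platform (Telegram) blocked every platform initialized after it (Feishu/Lark) at gateway startup. The linked PRs address this by bounding each connect() with a per-platform timeout (30 seconds by default in #17270 and #17429, configurable via HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT) and queuing the timed-out platform for background reconnection. The fail_fast option and the try/except snippet come from the issue's "Suggested Fix" section; none of the PRs implement them.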

Guidance

  • Review the ~/.hermes/config.yaml file to ensure that all platform configurations are correct and properly formatted.
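  • Note that a try-except block around platform initialization only helps when the platform raises; the reported failure is a connect() that hangs or retries for a long time, which is why the PRs bound it with a timeout instead.
  • Test the fix by configuring both Telegram and Feishu, making Telegram unreachable, and verifying that Feishu still connects; PR #17270 suggests setting HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT to a very small value such as 0.001 to force the timeout path. fail_fast: false is only a proposal from the issue and is not described as an existing option.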
  • Investigate the related issues (#16586, #1526, #16376) to ensure that the solution does not introduce any regressions.

Example

try:
    init_telegram()
except Exception as e:
    logger.warning(f"Telegram init failed: {e}, continuing...")

Notes

The issue only suspected sequential initialization as the cause. PR #17383 confirms it: the startup loop in GatewayRunner.start() awaits each adapter.connect() in turn without a timeout. Because the failure mode is a connect() that hangs or retries for a long time rather than one that raises, catching exceptions alone does not unblock the later platforms.
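
To see why exception handling alone does not cover a hanging connect, here is a small sketch with two hypothetical adapters, one that raises and one that hangs. The class and function names are invented for the example, and the outer `asyncio.wait_for()` exists only so the demo terminates (assumes Python 3.8+, where task cancellation is not caught by `except Exception`).

```python
import asyncio


class RaisingAdapter:
    async def connect(self) -> bool:
        raise ConnectionError("network unreachable")


class HangingAdapter:
    async def connect(self) -> bool:
        await asyncio.sleep(3600)  # simulates a connect that never completes
        return True


async def init_with_try_except(adapter):
    """The pattern from the issue: catch, report, and move on."""
    try:
        return await adapter.connect()
    except Exception as exc:
        return f"failed: {exc}"


async def demo():
    # A raising adapter is handled by the except branch.
    raised = await init_with_try_except(RaisingAdapter())
    # A hanging adapter never reaches the except branch on its own.
    try:
        hung = await asyncio.wait_for(
            init_with_try_except(HangingAdapter()), timeout=0.05
        )
    except asyncio.TimeoutError:
        hung = "blocked until the outer timeout fired"
    return raised, hung


raised, hung = asyncio.run(demo())
print(raised)
print(hung)
```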

Recommendation

If your build includes the per-platform connect timeout from PR #17429, keep the 30-second default or tune it with HERMES_GATEWAY_PLATFORM_CONNECT_TIMEOUT (0 disables it). Otherwise, use the workaround from the issue: comment out or remove the Telegram configuration and restart the gateway.
