hermes - ✅ (Solved) Fix QQ Bot adapter silently stops reconnecting without notifying gateway [1 pull request, 1 participant]

GitHub stats

NousResearch/hermes-agent#14539 (fetched 2026-04-24 06:16:36)

  • Comments: 0
  • Participants: 1
  • Timeline: 6
  • Reactions: 0
  • Timeline (top): labeled ×4, cross-referenced ×1, referenced ×1

When the QQ Bot WebSocket gateway connection drops and cannot be re-established within MAX_RECONNECT_ATTEMPTS (100 attempts), the _listen_loop() in gateway/platforms/qqbot/adapter.py calls return silently — it does not notify the GatewayRunner that the adapter has died.

This causes the gateway to believe the QQ platform is still "connected" (the last _mark_connected() state persists), while the adapter is actually dead. The gateway process keeps running, but no messages are received or sent via QQ.

The systemd Restart=on-failure policy never triggers because the process does not exit.

Error Message

No exception is raised. _listen_loop() returns without error, so the adapter is dead but the gateway does not know. The only trace is the log call made just before the return:

logger.error("[%s] Max reconnect attempts reached", self._log_tag)

Root Cause

In gateway/platforms/qqbot/adapter.py, _listen_loop():

if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
    logger.error("[%s] Max reconnect attempts reached", self._log_tag)
    return  # ← Silent return, gateway runner not notified

No call to _set_fatal_error() or equivalent to propagate the failure upward.

Fix Action

Fixed

PR fix notes

PR #14565: fix(qqbot): notify gateway via _set_fatal_error when reconnect loop exhausts

Description (problem / solution / changelog)

What does this PR do?

`_listen_loop()` in `gateway/platforms/qqbot/adapter.py` has three paths where `MAX_RECONNECT_ATTEMPTS` is reached and the loop exits via `return` — but none of them call `_set_fatal_error()` or `_notify_fatal_error()`. The gateway runner keeps the platform state as the last `_mark_connected()` value (connected), while the adapter is silently dead. Because the gateway process itself does not exit, `systemd Restart=on-failure` never fires.

Three exhaustion paths are affected:

  • 4008 rate-limit branch — bare `return` with no `logger.error`, no `_set_fatal_error()`, no `_notify_fatal_error()`
  • QQCloseError general branch — has `logger.error` but no `_set_fatal_error()` or `_notify_fatal_error()`
  • Exception branch — has `logger.error` but no `_set_fatal_error()` or `_notify_fatal_error()`

Fix: add `_set_fatal_error("qq_reconnect_exhausted", ..., retryable=False)` followed by `await self._notify_fatal_error()` to all three paths. `_set_fatal_error()` alone only writes to the status file — `_notify_fatal_error()` is required to invoke the GatewayRunner's `_handle_adapter_fatal_error` handler, which disconnects the adapter and triggers `systemd Restart=on-failure`. This matches the pattern already used in the Telegram adapter (`gateway/platforms/telegram.py` lines 366, 460).
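
The two-step pattern described above can be illustrated with a minimal, self-contained sketch. It is not the Hermes implementation: `SketchAdapter`, the `on_fatal` callback, the simulated `_connect()` failure, and the message string are all hypothetical stand-ins, and the attempt limit is shrunk so the example finishes instantly. Only the method names `_set_fatal_error` and `_notify_fatal_error` and the error code come from the PR description.

```python
import asyncio
import logging

logger = logging.getLogger("qqbot-sketch")

MAX_RECONNECT_ATTEMPTS = 3  # the issue reports 100; kept small for the sketch


class SketchAdapter:
    """Hypothetical stand-in for the QQ adapter, not the real class."""

    def __init__(self, on_fatal):
        self._on_fatal = on_fatal  # async callback assumed to be owned by the runner
        self.fatal_error_code = None
        self.fatal_error_message = None
        self.fatal_error_retryable = None

    def _set_fatal_error(self, code, message, retryable):
        # Records the failure locally; on its own this notifies nobody.
        self.fatal_error_code = code
        self.fatal_error_message = message
        self.fatal_error_retryable = retryable

    async def _notify_fatal_error(self):
        # Hands the recorded failure to the runner so it can react.
        await self._on_fatal(self)

    async def _connect(self):
        # Simulated outage: every connection attempt fails.
        raise ConnectionError("simulated network block")

    async def _listen_loop(self):
        attempts = 0
        while True:
            try:
                await self._connect()
            except ConnectionError as exc:
                attempts += 1
                if attempts >= MAX_RECONNECT_ATTEMPTS:
                    logger.error("Max reconnect attempts reached")
                    self._set_fatal_error(
                        "qq_reconnect_exhausted", str(exc), retryable=False
                    )
                    await self._notify_fatal_error()
                    return
                await asyncio.sleep(0)  # real code would wait out a backoff here


async def demo():
    seen = []

    async def on_fatal(adapter):
        seen.append(adapter.fatal_error_code)

    adapter = SketchAdapter(on_fatal)
    await adapter._listen_loop()
    return adapter, seen


if __name__ == "__main__":
    adapter, seen = asyncio.run(demo())
    print(seen, adapter.fatal_error_retryable)
```

Dropping the `await self._notify_fatal_error()` line reproduces the shape of the bug: the failure is recorded on the adapter, but the callback never runs and the owner of the adapter never learns about it.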

Related Issue

Fixes #14539

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor
  • 🎯 New skill

Changes Made

  • `gateway/platforms/qqbot/adapter.py`: add `_set_fatal_error()` + `await _notify_fatal_error()` before each of the three `return` points in `_listen_loop()` (+18 lines)
  • `tests/gateway/test_qqbot.py`: add `TestListenLoopReconnectExhaustion` with one test per exhaustion path, each asserting both `has_fatal_error` and `_notify_fatal_error` was awaited (+100 lines)
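
As a rough illustration of what one such test could look like, the sketch below checks both conditions named in the change list: the fatal flag is set and the notification coroutine was awaited. `FakeAdapter` and its loop body are hypothetical and much simpler than the real adapter; the actual `TestListenLoopReconnectExhaustion` tests may be structured quite differently. The only library feature relied on is `unittest.mock.AsyncMock` with `assert_awaited_once()`.

```python
import asyncio
from unittest.mock import AsyncMock

MAX_RECONNECT_ATTEMPTS = 2  # shrunk for the sketch


class FakeAdapter:
    """Hypothetical minimal adapter exposing only what the test needs."""

    def __init__(self):
        self.has_fatal_error = False
        self.fatal_error_code = None
        # AsyncMock records whether the coroutine was actually awaited.
        self._notify_fatal_error = AsyncMock()

    def _set_fatal_error(self, code, message, retryable):
        self.has_fatal_error = True
        self.fatal_error_code = code

    async def _listen_loop(self):
        for _ in range(MAX_RECONNECT_ATTEMPTS):
            pass  # every simulated reconnect attempt fails
        self._set_fatal_error("qq_reconnect_exhausted", "gave up", retryable=False)
        await self._notify_fatal_error()


def test_exhaustion_notifies_runner():
    adapter = FakeAdapter()
    asyncio.run(adapter._listen_loop())
    assert adapter.has_fatal_error
    assert adapter.fatal_error_code == "qq_reconnect_exhausted"
    adapter._notify_fatal_error.assert_awaited_once()
```

Asserting on the await, rather than only on `has_fatal_error`, is what distinguishes the fixed behavior from a version that records the error but never notifies the runner.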

How to Test

Requires a live QQ Bot setup. Unit tests cover all three paths:

```bash
pytest tests/gateway/test_qqbot.py::TestListenLoopReconnectExhaustion -v
```

All 3 tests pass.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run `pytest tests/gateway/test_qqbot.py -v` and pre-existing failures are unrelated to this change (they require `pytest-asyncio` which is in dev deps)
  • I've added tests for my changes
  • I've tested on my platform: macOS

Documentation & Housekeeping

  • I've updated relevant documentation — or N/A
  • I've updated `cli-config.yaml.example` — or N/A
  • I've updated `CONTRIBUTING.md` or `AGENTS.md` — or N/A
  • I've considered cross-platform impact — or N/A
  • I've updated tool descriptions/schemas — or N/A

Changed files

  • gateway/platforms/qqbot/adapter.py (modified, +19/-0)
  • tests/gateway/test_qqbot.py (modified, +99/-0)


Steps to reproduce

  1. Run Hermes gateway with QQ bot enabled
  2. QQ WebSocket disconnects (network block or server-side disconnect)
  3. Reconnect attempts exhaust after ~100 × 60s backoff loop (~100 minutes)
  4. _listen_loop() returns without error — adapter is dead but gateway does not know
  5. Gateway shows qqbot: connected in gateway_state.json despite being dead

Expected behavior

When the QQ adapter reconnect loop exhausts, it should call self._set_fatal_error() or otherwise notify the GatewayRunner so that:

  • The platform state is marked as disconnected or fatal
  • The gateway runner marks the adapter as dead
  • (Optionally) the gateway exits so systemd can restart it
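
The runner side of this expectation can be sketched as follows. Everything here is hypothetical: `SketchRunner`, `handle_adapter_fatal_error`, and the in-memory `platform_state` dict only illustrate the three bullets above and are not the Hermes GatewayRunner API. The one external fact relied on is that systemd's `Restart=on-failure` restarts a service whose process exits with a non-zero status.

```python
import asyncio


class SketchRunner:
    """Hypothetical runner; names are illustrative, not the Hermes API."""

    def __init__(self):
        self.platform_state = {"qqbot": "connected"}
        self.exit_code = 0
        self._stop = None  # created inside run(), once the event loop exists

    async def handle_adapter_fatal_error(self, platform, retryable):
        # Mark the platform as fatal instead of leaving it "connected".
        self.platform_state[platform] = "fatal"
        if not retryable:
            # Ask the process to exit non-zero so a supervisor can restart it.
            self.exit_code = 1
            self._stop.set()

    async def run(self, adapter_task):
        self._stop = asyncio.Event()
        task = asyncio.create_task(adapter_task(self))
        await self._stop.wait()
        await task
        return self.exit_code


async def qq_adapter_dies(runner):
    # Stands in for an adapter whose reconnect loop has been exhausted.
    await runner.handle_adapter_fatal_error("qqbot", retryable=False)


def main():
    runner = SketchRunner()
    code = asyncio.run(runner.run(qq_adapter_dies))
    return runner, code


if __name__ == "__main__":
    runner, code = main()
    print(runner.platform_state, code)
    raise SystemExit(code)
```

Exiting with the returned code is the optional third bullet; a runner that serves several platforms might prefer to keep running and only mark the one platform as fatal.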

Environment

  • Hermes Agent version: main (latest, b7e71fb7)
  • QQ Bot adapter version: 1.1.0
  • Platform: Linux, systemd user service
  • QQ Bot WebSocket reconnect backoff: [2, 5, 10, 30, 60] seconds
  • MAX_RECONNECT_ATTEMPTS: 100
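
The "~100 minutes" figure in the reproduction steps can be checked against these values. The sketch assumes the backoff index walks through the list once and then stays on the last entry, which is a guess about the adapter's behavior rather than something stated in the issue; connection timeouts are ignored.

```python
BACKOFF_SECONDS = [2, 5, 10, 30, 60]
MAX_RECONNECT_ATTEMPTS = 100


def total_backoff_seconds(schedule, attempts):
    # Assumption: attempt i waits schedule[i], clamped to the last entry.
    last = len(schedule) - 1
    return sum(schedule[min(i, last)] for i in range(attempts))


total = total_backoff_seconds(BACKOFF_SECONDS, MAX_RECONNECT_ATTEMPTS)
print(total, round(total / 60, 1))  # prints: 5807 96.8
```

Under that assumption the adapter spends about 97 minutes sleeping between attempts before it gives up, consistent with the rough estimate above once per-attempt connection time is added.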

extent analysis

TL;DR

The silent failure can be fixed by modifying _listen_loop() in gateway/platforms/qqbot/adapter.py to call _set_fatal_error() and then await _notify_fatal_error() whenever the maximum number of reconnect attempts is exhausted.

Guidance

  • Modify _listen_loop() to call _set_fatal_error() and then await _notify_fatal_error() on every path where backoff_idx >= MAX_RECONNECT_ATTEMPTS. Per PR #14565, _set_fatal_error() alone only writes to the status file and does not notify the GatewayRunner.
  • Verify that the GatewayRunner correctly marks the adapter as dead and updates the platform state to disconnected or fatal after the modification.
  • Consider adding a check to ensure that the gateway exits after the adapter is marked as dead, allowing systemd to restart the process.
  • Review the systemd configuration to ensure that the Restart=on-failure policy is correctly triggered when the gateway process exits.

Example

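if backoff_idx >= MAX_RECONNECT_ATTEMPTS:
    logger.error("[%s] Max reconnect attempts reached", self._log_tag)
    # Record the fatal state (message argument elided, as in the PR description)
    self._set_fatal_error("qq_reconnect_exhausted", ..., retryable=False)
    # Notify the GatewayRunner; _set_fatal_error() alone does not do this
    await self._notify_fatal_error()
    return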

Notes

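Per PR #14565, _set_fatal_error() only records the failure in the status file; _notify_fatal_error() is what invokes the GatewayRunner's _handle_adapter_fatal_error handler, which disconnects the adapter. Both calls are needed on all three exhaustion paths (the 4008 rate-limit branch, the QQCloseError branch, and the generic Exception branch). Additional changes may still be necessary to ensure that the gateway exits and systemd restarts the process.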

Recommendation

Apply the fix by modifying _listen_loop() to call _set_fatal_error() followed by await _notify_fatal_error() when the maximum reconnect attempts are exhausted, so that the GatewayRunner marks the adapter as dead and updates the platform state.
