hermes - [Bug]: QQ Bot WebSocket silently dies, no reconnection after heartbeat ACK timeout

Describe the bug

The QQ Bot WebSocket adapter has a silent death issue: after some time (typically several minutes), the connection stops working without any error logs or reconnection attempts. The gateway process remains alive, but the QQ bot appears offline from the user's perspective.

Root cause

Two related issues in gateway/platforms/qqbot/adapter.py:

  1. No WebSocket-level keepalive: _open_ws() calls aiohttp.ClientSession.ws_connect() without the heartbeat parameter. When the network path fails (NAT timeout, firewall state table expiry, etc.), the TCP connection still appears alive locally but packets are silently dropped. The await self._ws.receive() call in _read_events() then hangs indefinitely on this half-open connection: no error, no reconnection.

  2. No heartbeat ACK tracking: The _heartbeat_loop() sends op 1 heartbeats at 80% of the server-specified interval but never tracks op 11 (Heartbeat ACK) responses. When the server stops responding to heartbeats (because it already closed the connection), the client keeps sending heartbeats into the void without detecting the failure.
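
To make the first point concrete, here is a minimal stdlib-only sketch of why an unbounded receive() never notices a half-open connection. HalfOpenWS and read_one are hypothetical names invented for this illustration and are not part of the adapter. The timeout is there only to make the stall observable in a demo; it is not the fix proposed in this issue.

```python
import asyncio


class HalfOpenWS:
    """Hypothetical fake socket: receive() never completes, like a half-open link."""

    async def receive(self):
        # Never set: no data frame, no close frame, no error is ever delivered.
        await asyncio.Event().wait()


async def read_one(ws, timeout=None):
    """Return "message" if a frame arrives, or "stalled" if nothing arrives in time."""
    try:
        # With timeout=None this await would block forever, which is the
        # behavior described for _read_events() above.
        await asyncio.wait_for(ws.receive(), timeout=timeout)
        return "message"
    except asyncio.TimeoutError:
        return "stalled"


result = asyncio.run(read_one(HalfOpenWS(), timeout=0.05))
print(result)  # stalled
```

Nothing in the read path raises on its own, so any recovery has to be driven by something outside the blocked await.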

Reproduction

  1. Start gateway with QQ bot configured
  2. Wait several minutes (varies by network conditions)
  3. QQ bot goes offline silently — no error logs, no reconnection attempts
  4. Gateway process still running but WebSocket connection is dead
  5. Requires manual gateway restart to recover

Logs example

From gateway.log, the last activity before silent death:

2026-05-08 09:17:42,717 INFO gateway.platforms.base: [QQBot] Sending response (73 chars) to ...

No subsequent error/warning/reconnect log entries — the adapter simply stopped.

Proposed fix

Two-layer protection:

  1. Add heartbeat=50.0 to ws_connect() for WebSocket-level ping/pong keepalive (below QQ's ~60s server timeout)
  2. Track heartbeat ACKs (op 11) in _heartbeat_loop() — if 2 consecutive heartbeats go unacknowledged, force-close the WebSocket to trigger the existing reconnection logic
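
The two layers above could be sketched roughly as follows. This is not the adapter's actual code: open_ws, HeartbeatMonitor, FakeWS, and the {"op": 1, "d": last_seq} payload shape are assumptions made for illustration, and only the heartbeat=50.0 argument, the op 1 / op 11 opcodes, the 80% interval, and the two-miss threshold come from this issue. aiohttp is imported lazily so the ACK-tracking part runs without it.

```python
import asyncio
import json


async def open_ws(url: str):
    """Layer 1: ask aiohttp for WebSocket-level ping/pong keepalive."""
    import aiohttp  # third-party; imported lazily so the rest runs without it

    session = aiohttp.ClientSession()
    # heartbeat=50.0: ping every 50 s; aiohttp closes the connection
    # if the matching pong does not come back.
    ws = await session.ws_connect(url, heartbeat=50.0)
    return session, ws


class HeartbeatMonitor:
    """Layer 2 (hypothetical helper): close the socket after missed op 11 ACKs."""

    MAX_MISSED = 2  # consecutive unacknowledged heartbeats tolerated

    def __init__(self, ws, interval_s: float, last_seq=None):
        self.ws = ws
        self.interval_s = interval_s  # server-specified heartbeat interval
        self.last_seq = last_seq      # assumed content of the "d" field
        self.missed = 0
        self._awaiting_ack = False

    def on_ack(self) -> None:
        """Call from the read loop whenever an op 11 payload arrives."""
        self._awaiting_ack = False
        self.missed = 0

    async def run(self) -> None:
        while not self.ws.closed:
            if self._awaiting_ack:
                # The previous heartbeat was never acknowledged.
                self.missed += 1
                if self.missed >= self.MAX_MISSED:
                    # Closing is meant to let the existing reconnect path run.
                    await self.ws.close()
                    return
            await self.ws.send_str(json.dumps({"op": 1, "d": self.last_seq}))
            self._awaiting_ack = True
            await asyncio.sleep(self.interval_s * 0.8)


class FakeWS:
    """Test double exposing the three members the monitor uses."""

    def __init__(self):
        self.sent, self.closed = [], False

    async def send_str(self, data: str) -> None:
        self.sent.append(json.loads(data))

    async def close(self) -> None:
        self.closed = True


ws = FakeWS()
asyncio.run(HeartbeatMonitor(ws, interval_s=0.01).run())
print(len(ws.sent), ws.closed)  # 2 True
```

With no ACKs, the monitor sends two heartbeats and closes the socket on the third tick. In the real adapter the read loop would call on_ack() for each op 11 message, which resets the counter and keeps the connection open.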

Additional context

  • This is a known class of issue for long-lived WebSocket connections behind NAT/proxies
  • The existing reconnection logic (_reconnect()) works correctly when it's triggered — the problem is that nothing triggers it when the connection silently dies
  • Similar issues have been reported: #18221, #19648, #17703
