hermes - Gateway: Multi-platform WebSockets share a single event loop, causing cascading disconnections

hermes 2026-05-07 04:28:00

When running multiple messaging platforms simultaneously (WeCom + Feishu + QQBot), the Hermes Gateway experiences cascading WebSocket disconnections. All platform connections share a single Python asyncio event loop. While the agent is processing a message (calling an LLM API, executing tools, etc.), the event loop is blocked, and WebSocket keepalive pings for the other platforms are not serviced in time. The remote servers then drop those connections.

Root Cause

In gateway/run.py, all platform adapters and the agent processing loop run within the same asyncio event loop. Each platform maintains its own WebSocket connection with server-side keepalive expectations. When any single operation blocks the event loop for more than a few seconds (e.g., LLM API call, tool execution), the WebSocket keepalive/ping tasks for ALL other platforms are delayed, causing their respective servers to time out and close the connection.
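The starvation described above can be reproduced in isolation. The following is a minimal sketch, not the gateway's actual code: `heartbeat`, `blocking_handler`, and `measure_max_gap` are hypothetical names, and `time.sleep` stands in for any synchronous LLM or tool call that never yields to the loop.

```python
import asyncio
import time


async def heartbeat(interval: float, gaps: list[float]) -> None:
    """Record the real time elapsed between consecutive heartbeat ticks."""
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def blocking_handler(seconds: float) -> None:
    """Hypothetical stand-in for agent work that blocks the event loop."""
    time.sleep(seconds)  # synchronous call: no other task can run meanwhile


async def measure_max_gap(block_seconds: float, interval: float = 0.05) -> float:
    """Return the longest gap between heartbeat ticks around a blocking call."""
    gaps: list[float] = []
    hb = asyncio.create_task(heartbeat(interval, gaps))
    await asyncio.sleep(interval * 3)      # a few ticks fire on schedule
    await blocking_handler(block_seconds)  # the loop is frozen here
    await asyncio.sleep(interval * 3)      # the heartbeat resumes
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return max(gaps)


if __name__ == "__main__":
    worst = asyncio.run(measure_max_gap(block_seconds=0.5))
    print(f"worst heartbeat gap: {worst:.2f}s (interval was 0.05s)")
```

The longest gap tracks the duration of the blocking call rather than the configured interval, which is the same effect a remote server sees as a missed keepalive.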

Impact

Messages sent on one platform can cause other platforms to disconnect
During the reconnect window (15-30 seconds), messages on the affected platform are lost
Users find the gateway unresponsive and must wait for it to reconnect
Cascading effect: one platform's disconnect can trigger another's during reconnect processing
This prevents users from relying on the gateway for remote access

Reproduction Steps

Configure at least 2 platforms (e.g., WeCom + Feishu)
Start the gateway
Send a message on one platform that triggers agent processing
Observe: while the first message is being processed, other platforms disconnect

Environment

Hermes Agent v0.12.0 (2026.4.30)
macOS (launchd)
Platforms: Feishu + WeCom (and optionally QQBot)
All platforms use WebSocket long-connection mode

Current Workarounds (for affected users)

Set busy_input_mode: queue in config.yaml to serialize incoming messages (prevents cascading from multiple concurrent messages)
Reduce WeCom heartbeat interval from 30s to 15s (gateway/platforms/wecom.py: HEARTBEAT_INTERVAL_SECONDS = 15)
Configure Feishu ping interval explicitly via extra.ws_ping_interval: 10 in config.yaml
These workarounds only reduce how often disconnections occur; they do not eliminate the root cause

Additional Context

This issue manifests as correlated disconnections across platforms, with timestamps within 1-11 seconds of each other. For example, when a user sends a message on WeCom at T+0s, Feishu's WebSocket drops at T+1s (keepalive ping timeout), and WeCom may also drop at T+11s as it tries to recover.

The Feishu platform adapter (lark_oapi SDK) appears to have a server-side session timeout of approximately 19-29 minutes, which exacerbates the issue: even idle connections eventually drop, and the reconnect phase briefly blocks the event loop.
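To check whether disconnections in a gateway log follow the correlated pattern described above, the events can be clustered by time. This is a rough sketch under assumptions: the `(timestamp, platform)` event format and the `group_correlated` helper are hypothetical, and the 11-second window is taken from the range reported in this issue rather than from any documented timeout.

```python
from datetime import datetime, timedelta


def group_correlated(
    events: list[tuple[datetime, str]], window_seconds: float = 11.0
) -> list[list[tuple[datetime, str]]]:
    """Cluster disconnect events that follow each other within the window.

    Only clusters that involve more than one platform are returned, since
    a lone disconnect is not evidence of a cascade.
    """
    clusters: list[list[tuple[datetime, str]]] = []
    for ts, platform in sorted(events):
        previous_ts = clusters[-1][-1][0] if clusters else None
        if previous_ts is not None and (ts - previous_ts).total_seconds() <= window_seconds:
            clusters[-1].append((ts, platform))
        else:
            clusters.append([(ts, platform)])
    return [c for c in clusters if len({p for _, p in c}) > 1]


if __name__ == "__main__":
    t0 = datetime(2026, 5, 7, 4, 0, 0)  # illustrative timestamp, not real log data
    sample = [
        (t0 + timedelta(seconds=1), "feishu"),
        (t0 + timedelta(seconds=11), "wecom"),
        (t0 + timedelta(seconds=300), "qqbot"),
    ]
    for cluster in group_correlated(sample):
        print([platform for _, platform in cluster])
```

Clusters that span several platforms point toward a shared cause such as a blocked event loop, while isolated drops are more likely ordinary network or session timeouts.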

#api #prompt formatting #chain error #conversation history #tool integration

Suggested Solutions

Option A: Isolate each platform's WebSocket in a separate asyncio event loop

Option B: Run agent processing in a thread pool

Option C: Per-platform dedicated heartbeat thread
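Option B could look roughly like the sketch below, which moves the blocking work onto a worker thread with `loop.run_in_executor` so the event loop stays free to service keepalives. It is an illustration under assumptions, not the gateway's implementation: `process_message_blocking`, `heartbeat`, and `handle_offloaded` are hypothetical names, and `time.sleep` stands in for a synchronous LLM or tool call.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def process_message_blocking(text: str, seconds: float) -> str:
    """Hypothetical stand-in for synchronous agent work (LLM call, tool run)."""
    time.sleep(seconds)
    return f"reply to {text!r}"


async def heartbeat(interval: float, gaps: list[float]) -> None:
    """Record the real time elapsed between consecutive heartbeat ticks."""
    last = time.monotonic()
    while True:
        await asyncio.sleep(interval)
        now = time.monotonic()
        gaps.append(now - last)
        last = now


async def handle_offloaded(
    executor: ThreadPoolExecutor, text: str, seconds: float
) -> tuple[str, float]:
    """Run the blocking work in a thread while a heartbeat keeps ticking.

    Returns the reply and the longest heartbeat gap observed meanwhile.
    """
    loop = asyncio.get_running_loop()
    gaps: list[float] = []
    hb = asyncio.create_task(heartbeat(0.05, gaps))
    # The await yields control, so the heartbeat task keeps running.
    reply = await loop.run_in_executor(
        executor, process_message_blocking, text, seconds
    )
    hb.cancel()
    try:
        await hb
    except asyncio.CancelledError:
        pass
    return reply, max(gaps, default=0.0)


if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        reply, worst = asyncio.run(handle_offloaded(pool, "hello", 0.5))
    print(reply, f"(worst heartbeat gap: {worst:.2f}s)")
```

This only helps when the offloaded work releases the GIL, as network I/O and `time.sleep` do; CPU-bound Python code in the worker thread would still compete with the event loop for the interpreter.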
