hermes - 💡(How to fix) Fix [Weixin] Rate limit handling causes message loss — need exponential backoff + session reconnect [1 pull requests]

hermes 2026-05-07 09:54:07


Three related issues affect Weixin channel reliability. Suggesting consolidated fix.

Error Message

rate limited; backing off 3.0s × 5 → send failed → cron delivery error
logger.error("[%s] session expired; attempting reconnect", self.name)

Root Cause

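Three related issues in the Weixin channel: the rate-limit backoff is a fixed 9s while iLink rate limits typically last several minutes; a session-expired response (errcode=-14) does not trigger a reconnect, so polling continues with a stale token; and the fixed 1.5s delay between chunks is short enough that multi-chunk sends trigger rate limits.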

Fix Action

Fixed

Fixed by PR: fix(weixin): exponential rate-limit backoff, session reconnect, dynamic chunk delay (https://github.com/NousResearch/hermes-agent/pull/21135)


Description

Three related issues affect Weixin channel reliability. Suggesting consolidated fix.

1. Rate Limit Backoff Strategy Is Too Simple (P0)

weixin.py (L1582) uses a fixed 9s backoff, but iLink rate limits typically last several minutes. As a result, cron push messages exhaust their retries and fail consecutively:

rate limited; backing off 3.0s × 5 → send failed → cron delivery error

Suggested fix — exponential backoff with jitter:

import random
base_wait = self._send_chunk_retry_delay_seconds * 3
wait = min(base_wait * (2 ** attempt) + random.uniform(0, 2), 60)
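To see what schedule this formula produces, here is a minimal standalone sketch. The function names `backoff_wait` and `backoff_schedule` and the default 3.0s retry delay are assumptions for illustration (the 3.0s figure is taken from the log line above); in the adapter the value would come from `self._send_chunk_retry_delay_seconds`.

```python
import random


def backoff_wait(attempt: int, retry_delay: float = 3.0,
                 cap: float = 60.0, jitter: float = 2.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based).

    Mirrors the suggested formula: the base wait doubles on every
    attempt, a small random jitter is added, and the result is capped.
    """
    base_wait = retry_delay * 3
    return min(base_wait * (2 ** attempt) + random.uniform(0, jitter), cap)


def backoff_schedule(attempts: int, **kwargs) -> list[float]:
    """Waits for attempts 0..attempts-1, useful for eyeballing the curve."""
    return [backoff_wait(a, **kwargs) for a in range(attempts)]


if __name__ == "__main__":
    for attempt, wait in enumerate(backoff_schedule(5)):
        print(f"attempt {attempt}: wait {wait:.1f}s")
```

With a 3.0s retry delay, the waits are roughly 9-11s, 18-20s, and 36-38s, then hit the 60s cap from the fourth attempt on. The jitter keeps concurrent senders from retrying at the same instant.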

Additionally, add a global send semaphore in send_weixin_direct() to serialize concurrent cron deliveries:

_WEIXIN_SEND_SEMAPHORE = asyncio.Semaphore(1)

async def send_weixin_direct(...):
    async with _WEIXIN_SEND_SEMAPHORE:
        # ... existing send logic
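The effect of the semaphore can be checked in isolation. The sketch below is not the adapter code: `send_direct` and `deliver_all` are hypothetical stand-ins, the network call is replaced by a short sleep, and the semaphore is created per run rather than at module level so the example stays self-contained.

```python
import asyncio


async def send_direct(sem: asyncio.Semaphore, message: str,
                      log: list[str]) -> None:
    """Hypothetical send: only one caller may be inside the block at a time."""
    async with sem:
        log.append(f"start {message}")
        await asyncio.sleep(0.01)  # placeholder for the real network call
        log.append(f"end {message}")


async def deliver_all(messages: list[str]) -> list[str]:
    """Launch all sends concurrently; the semaphore serializes them."""
    sem = asyncio.Semaphore(1)
    log: list[str] = []
    await asyncio.gather(*(send_direct(sem, m, log) for m in messages))
    return log


if __name__ == "__main__":
    for line in asyncio.run(deliver_all(["report", "briefing", "digest"])):
        print(line)
```

Every "start" entry is immediately followed by its matching "end", so concurrent cron deliveries never overlap even though they are scheduled together.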

2. Session Expired Does Not Trigger Reconnection (P0)

When _poll_loop (L1282) encounters errcode=-14, it only sleeps for 10 minutes and then continues polling with the stale token. Once the session has expired, recovery requires a manual gateway restart.

Suggested fix — add a _reconnect() method:

async def _reconnect(self) -> bool:
    persisted = load_weixin_account(self._hermes_home, self._account_id)
    if persisted and persisted.get("token"):
        self._token = persisted["token"]
        self._base_url = persisted.get("base_url", self._base_url)
    if self._poll_session and not self._poll_session.closed:
        await self._poll_session.close()
    self._poll_session = aiohttp.ClientSession(
        trust_env=True, connector=_make_ssl_connector()
    )
    self._token_store.restore(self._account_id)
    return True

Then in _poll_loop:

if session_expired:
    logger.error("[%s] session expired; attempting reconnect", self.name)
    if not await self._reconnect():
        await asyncio.sleep(600)
    continue
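The control flow can be exercised without a network. The sketch below is a simplified model under stated assumptions: `PollLoopSketch`, its scripted `responses`, and `stored_token` are hypothetical stand-ins for the adapter, its poll results, and the persisted account. Unlike the snippet above, this `_reconnect()` returns False when no token is stored, so the sleep fallback is actually reachable.

```python
import asyncio

SESSION_EXPIRED = -14  # errcode named in the issue


class PollLoopSketch:
    """Toy model of a poll loop that reconnects on session expiry."""

    def __init__(self, responses, stored_token, retry_sleep=600.0):
        self.responses = list(responses)  # scripted (errcode, payload) pairs
        self.stored_token = stored_token  # stands in for the persisted account
        self.token = "stale"
        self.retry_sleep = retry_sleep
        self.reconnects = 0
        self.handled = []

    async def _reconnect(self) -> bool:
        # Without a persisted token there is nothing to reconnect with.
        if not self.stored_token:
            return False
        self.token = self.stored_token
        self.reconnects += 1
        return True

    async def poll_loop(self) -> None:
        for errcode, payload in self.responses:
            if errcode == SESSION_EXPIRED:
                # Reconnect first; only fall back to sleeping if that fails.
                if not await self._reconnect():
                    await asyncio.sleep(self.retry_sleep)
                continue
            self.handled.append((self.token, payload))


if __name__ == "__main__":
    loop = PollLoopSketch([(0, "m1"), (SESSION_EXPIRED, None), (0, "m2")],
                          stored_token="fresh", retry_sleep=0.0)
    asyncio.run(loop.poll_loop())
    print(loop.handled, loop.reconnects)
```

Messages polled after the expiry are handled with the refreshed token, without a restart.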

3. Multi-Chunk Delay Too Short (P1)

send() (L1671) uses a fixed 1.5s interval between chunks. A response of 5-8 chunks is therefore sent within roughly 10s, which triggers rate limits.

Suggested fix — dynamic delay:

dynamic_delay = max(
    self._send_chunk_delay_seconds,
    2.0 + (len(chunks) * 0.3)
)
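A short worked example shows how the dynamic delay spreads a multi-chunk send. The helper names are hypothetical, the 1.5s default mirrors the fixed interval mentioned above, and the sketch assumes the delay is applied between chunks, so n chunks incur n-1 pauses.

```python
def dynamic_chunk_delay(num_chunks: int, configured_delay: float = 1.5) -> float:
    """Per-chunk pause: grows with chunk count, never below the configured value."""
    return max(configured_delay, 2.0 + num_chunks * 0.3)


def total_pause_time(num_chunks: int, configured_delay: float = 1.5) -> float:
    """Total time spent pausing, assuming one pause between consecutive chunks."""
    pauses = max(num_chunks - 1, 0)
    return pauses * dynamic_chunk_delay(num_chunks, configured_delay)


if __name__ == "__main__":
    for n in (1, 5, 8):
        print(f"{n} chunks: delay {dynamic_chunk_delay(n):.1f}s, "
              f"total pause {total_pause_time(n):.1f}s")
```

Under these assumptions, 5 chunks pause 3.5s each (about 14s in total) and 8 chunks pause 4.4s each (about 31s in total), compared with 6s and 10.5s at the fixed 1.5s interval.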

Environment

Hermes Agent v0.12.0 (2026.4.30)
Python 3.11.6
TencentOS Server 4, kernel 6.6.110
iLink Bot API via ilinkai.weixin.qq.com

Additional Context

These issues primarily affect cron-scheduled push messages (leaderboard reports, daily briefings) where multiple messages are sent in rapid succession. Interactive single-message delivery is less affected but can still hit rate limits during long multi-chunk responses.

Logs reproduced on 2026-05-05 and 2026-05-07.

/cc @joeytao
