hermes - [Weixin] Rate limit handling causes message loss: needs exponential backoff and session reconnect [1 pull request]



Description

Three related issues affect Weixin channel reliability. This report suggests a consolidated fix.

1. Rate Limit Backoff Strategy Is Too Simple (P0)

weixin.py (L1582) uses a fixed 9 s backoff, but iLink rate limits typically last several minutes, so consecutive cron push messages fail:

rate limited; backing off 3.0s × 5 → send failed → cron delivery error

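Suggested fix: exponential backoff with jitter:

```python
import random

# Exponential backoff with jitter: the wait doubles on each attempt
# (base_wait, 2x, 4x, ...), plus 0-2 s of random jitter, capped at 60 s.
base_wait = self._send_chunk_retry_delay_seconds * 3
wait = min(base_wait * (2 ** attempt) + random.uniform(0, 2), 60)
```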
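To make the retry schedule concrete, here is a minimal standalone sketch that wraps the same backoff formula in a retry loop. RateLimitedError, backoff_delay() and send_with_backoff() are hypothetical names, not part of weixin.py. The 9 s base assumes _send_chunk_retry_delay_seconds is 3.0, so that base_wait matches the fixed 9 s backoff described above, and the sleep function is injectable so the schedule can be checked without actually waiting.

```python
import asyncio
import random
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for an iLink rate-limit response."""


def backoff_delay(attempt: int, base: float = 9.0, cap: float = 60.0,
                  jitter: float = 2.0, rng: Optional[random.Random] = None) -> float:
    """Wait before retry `attempt` (0-based): base * 2**attempt + U(0, jitter), capped."""
    source = rng if rng is not None else random
    return min(base * (2 ** attempt) + source.uniform(0, jitter), cap)


async def send_with_backoff(
    send: Callable[[], Awaitable[T]],
    max_attempts: int = 5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Call `send`, retrying on RateLimitedError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await send()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller report the failure
            await sleep(backoff_delay(attempt))
    raise AssertionError("unreachable")
```

With jitter disabled, the delays are 9, 18, 36 and 60 s (72 s capped). Five attempts therefore wait about 123 s in total, which is still shorter than a rate-limit window lasting several minutes. It may be worth tuning max_attempts or the cap to match the observed iLink behavior.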

Additionally, add a global send semaphore in send_weixin_direct() to serialize concurrent cron deliveries:

```python
import asyncio

# Module-level semaphore: allows only one direct Weixin send at a time.
_WEIXIN_SEND_SEMAPHORE = asyncio.Semaphore(1)

async def send_weixin_direct(...):
    async with _WEIXIN_SEND_SEMAPHORE:
        ...  # existing send logic
```

2. Session Expired Does Not Trigger Reconnection (P0)

When _poll_loop (L1282) receives errcode=-14 (session expired), it only sleeps for 10 minutes and then resumes polling with the stale token. Recovering from the disconnection requires a manual gateway restart.

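Suggested fix: add a _reconnect() method that reloads the persisted credentials and rebuilds the polling session. It returns False when no persisted token is available, so the caller's fallback sleep can still trigger:

```python
async def _reconnect(self) -> bool:
    """Reload persisted credentials and rebuild the polling HTTP session."""
    persisted = load_weixin_account(self._hermes_home, self._account_id)
    if not persisted or not persisted.get("token"):
        # No usable credentials on disk; let the caller fall back to sleeping.
        return False
    self._token = persisted["token"]
    self._base_url = persisted.get("base_url", self._base_url)
    # Close the old session so no stale connection state is reused.
    if self._poll_session and not self._poll_session.closed:
        await self._poll_session.close()
    self._poll_session = aiohttp.ClientSession(
        trust_env=True, connector=_make_ssl_connector()
    )
    self._token_store.restore(self._account_id)
    return True
```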

Then in _poll_loop:

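```python
if session_expired:
    logger.error("[%s] session expired; attempting reconnect", self.name)
    if not await self._reconnect():
        await asyncio.sleep(600)  # no fresh credentials: fall back to the old 10-minute wait
    continue  # resume polling, with the reloaded token if reconnect succeeded
```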
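The control flow of this change can be exercised in isolation. The sketch below is a simplified, hypothetical poll_loop() driven by injected poll, reconnect and sleep callables instead of the real adapter. It treats errcode -14 as "session expired" (as in the issue), bounds the loop with max_iterations so it terminates, and counts how each iteration was handled.

```python
import asyncio
from typing import Awaitable, Callable, Dict

SESSION_EXPIRED = -14  # errcode the issue reports for an expired session


async def poll_loop(
    poll: Callable[[], Awaitable[int]],
    reconnect: Callable[[], Awaitable[bool]],
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
    max_iterations: int = 10,
    fallback_sleep: float = 600.0,
) -> Dict[str, int]:
    """Hypothetical poll loop: reconnect on session expiry, sleep only if that fails.

    `poll` returns an errcode (0 means success). `max_iterations` bounds the loop
    so the sketch terminates; a real gateway would keep polling until shutdown.
    """
    stats = {"ok": 0, "reconnects": 0, "fallback_sleeps": 0}
    for _ in range(max_iterations):
        errcode = await poll()
        if errcode == SESSION_EXPIRED:
            if await reconnect():
                stats["reconnects"] += 1  # fresh credentials loaded: poll again right away
            else:
                stats["fallback_sleeps"] += 1
                await sleep(fallback_sleep)  # nothing to reload: wait before retrying
            continue
        stats["ok"] += 1
    return stats
```

One caveat to check in the real adapter: if reconnect() reports success but reloads the same stale token, the loop reconnects on every iteration without pausing. A short delay or a limit on consecutive reconnects may be needed in that case.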

3. Multi-Chunk Delay Too Short (P1)

send() (L1671) waits a fixed 1.5 s between chunks, so a response of 5 to 8 chunks goes out within about 10 s and triggers rate limits.

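Suggested fix: a dynamic delay that grows with the number of chunks:

```python
# Never go below the configured delay; otherwise use a 2 s floor plus 0.3 s per chunk.
dynamic_delay = max(
    self._send_chunk_delay_seconds,
    2.0 + (len(chunks) * 0.3)
)
```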
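The effect of the formula is easiest to see numerically. The sketch below assumes the configured _send_chunk_delay_seconds is the 1.5 s mentioned above and that the delay is applied only between consecutive chunks. chunk_delay() and total_wait() are hypothetical helpers for illustration.

```python
def chunk_delay(num_chunks: int, configured_delay: float = 1.5) -> float:
    """Per-gap delay from the proposal: max(configured, 2.0 + 0.3 * num_chunks)."""
    return max(configured_delay, 2.0 + num_chunks * 0.3)


def total_wait(num_chunks: int, delay: float) -> float:
    """Total time spent waiting between chunks (no wait after the last one)."""
    return max(num_chunks - 1, 0) * delay
```

For 5 chunks the gap becomes 3.5 s (14.0 s of total waiting instead of 6.0 s). For 8 chunks it becomes 4.4 s (about 30.8 s instead of 10.5 s). Because the per-gap delay also grows with the chunk count, total waiting grows roughly quadratically for long responses.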

Environment

  • Hermes Agent v0.12.0 (2026.4.30)
  • Python 3.11.6
  • TencentOS Server 4, kernel 6.6.110
  • iLink Bot API via ilinkai.weixin.qq.com

Additional Context

These issues primarily affect cron-scheduled push messages (leaderboard reports, daily briefings) where multiple messages are sent in rapid succession. Interactive single-message delivery is less affected but can still hit rate limits during long multi-chunk responses.

Logs reproduced on 2026-05-05 and 2026-05-07.

/cc @joeytao
