hermes - ✅(Solved) Fix weixin: cron-initiated push fails with iLink ret=-2 when context_token is stale (not recognized as session-expired) [1 pull requests, 2 comments, 3 participants]

GitHub stats
NousResearch/hermes-agent#17228 · Fetched 2026-04-29 06:36:36
Comments: 2 · Participants: 3 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×4, commented ×2, cross-referenced ×1

The Weixin adapter's session-expired fallback only recognizes errcode=-14 as a stale-session signal. In practice, iLink also returns ret=-2 errmsg="unknown error" for what appears to be the same underlying condition — typically when a cron job tries to push a message to a chat that hasn't had user activity for many hours. The adapter treats ret=-2 as a generic failure, exhausts its retries with the same stale context_token, and the push is dropped.

Error Message

[Weixin] send chunk failed ... attempt=1/3, retrying in 1.00s: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send chunk failed ... attempt=2/3, retrying in 2.00s: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send failed ...: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send chunk failed ... attempt=1/3, retrying in 1.00s: Timeout context manager should be used inside a task
...

Root Cause

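In gateway/platforms/weixin.py, the is_session_expired check matches only -14 (on ret or errcode). A ret=-2 response therefore never triggers the tokenless fallback, so every retry reuses the stale context_token and the send is dropped.

Secondary effect: once the first attempt raises, subsequent retries in the same send cycle surface a misleading Timeout context manager should be used inside a task error (likely aiohttp session left in a bad state after the RuntimeError), which obscures the root cause in last_delivery_error.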

PR fix notes

PR #17287: fix: recognize ret=-2 as stale-session signal in Weixin adapter (#17228)

Description (problem / solution / changelog)

Problem

The Weixin adapter's session-expired fallback only recognizes errcode=-14 as a stale-session signal. In practice, iLink also returns ret=-2 with errmsg="unknown error" for the same underlying condition — typically when a cron job tries to push a message to a chat that hasn't had user activity for many hours.

The adapter treats ret=-2 as a rate-limit, exhausting retries with the same stale context_token instead of refreshing the session. Cron deliveries silently fail on the Weixin leg.

Fix

Added _is_stale_session_ret() helper function that distinguishes ret=-2 with errmsg="unknown error" from genuine rate limits. Updated both the poll loop (_run_poll_loop) and _send_text_chunk to use the helper.

3 changes in gateway/platforms/weixin.py:

  1. Added _is_stale_session_ret() helper (centralizes the dual-meaning logic)
  2. Updated poll-loop stale-session check (line ~1257)
  3. Updated _send_text_chunk stale-session check (lines ~1519-1522)

Before vs After

Scenario | Before | After
iLink ret=-2, errmsg="unknown error" | Treated as rate-limit, retries fail silently | Recognized as stale session, strips context_token and retries
iLink ret=-2, errmsg="freq limit" | Rate-limit backoff (correct) | Rate-limit backoff (unchanged)
iLink errcode=-14 | Session expired pause (correct) | Session expired pause (unchanged)
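
The three scenarios above can be sketched as a small classifier. This is a minimal illustration, not the code from the PR: the body of _is_stale_session_ret() is not shown on this page, so the function names, the AMBIGUOUS_RET constant, and the exact string comparison below are all assumptions. Treating every other ret=-2 message as a rate limit is also an assumption drawn from the "freq limit" row.

```python
from typing import Optional

SESSION_EXPIRED_ERRCODE = -14  # value quoted in the issue's snippet
AMBIGUOUS_RET = -2             # hypothetical name for the dual-meaning ret value


def is_stale_session_ret_sketch(ret: Optional[int], errmsg: Optional[str]) -> bool:
    """Guess whether ret=-2 means a stale session (assumed rule: errmsg is 'unknown error')."""
    if ret != AMBIGUOUS_RET:
        return False
    return (errmsg or "").strip().lower() == "unknown error"


def classify(ret: Optional[int], errcode: Optional[int], errmsg: Optional[str]) -> str:
    """Map an iLink response to one of the handling paths described in the table."""
    if ret == SESSION_EXPIRED_ERRCODE or errcode == SESSION_EXPIRED_ERRCODE:
        return "session_expired"   # existing -14 handling, unchanged
    if is_stale_session_ret_sketch(ret, errmsg):
        return "stale_session"     # new path: strip context_token and retry
    if ret == AMBIGUOUS_RET:
        return "rate_limit"        # assumption: remaining ret=-2 cases keep the backoff path
    return "other"
```

Keeping the errmsg comparison in one helper means the poll loop and the send path cannot drift apart on what counts as a stale session.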

Tests

No existing tests cover this specific behavior. The fix is minimal and follows the same pattern as the existing errcode=-14 handling.

Fixes #17228

Changed files

  • gateway/platforms/weixin.py (modified, +15/-1)

Code Example

[Weixin] send chunk failed ... attempt=1/3, retrying in 1.00s: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send chunk failed ... attempt=2/3, retrying in 2.00s: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send failed ...: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send chunk failed ... attempt=1/3, retrying in 1.00s: Timeout context manager should be used inside a task
...

---

is_session_expired = (
    ret == SESSION_EXPIRED_ERRCODE           # -14
    or errcode == SESSION_EXPIRED_ERRCODE
)

Summary

The Weixin adapter's session-expired fallback only recognizes errcode=-14 as a stale-session signal. In practice, iLink also returns ret=-2 errmsg="unknown error" for what appears to be the same underlying condition — typically when a cron job tries to push a message to a chat that hasn't had user activity for many hours. The adapter treats ret=-2 as a generic failure, exhausts its retries with the same stale context_token, and the push is dropped.

Impact

Cron jobs with deliver: "feishu,weixin" (or single deliver: "weixin") silently fail on the Weixin leg if the target chat hasn't been active recently. Feishu delivery works; Weixin delivery loses the message.

Secondary effect: once the first attempt raises, subsequent retries in the same send cycle surface a misleading Timeout context manager should be used inside a task error (likely aiohttp session left in a bad state after the RuntimeError), which obscures the root cause in last_delivery_error.

Reproduction

  1. Configure a cron job that delivers to Weixin on a schedule (e.g. 0 8 * * *).
  2. Make sure no user message is sent to the bot for several hours before the cron fires.
  3. Observe cron triggers → adapter logs:
[Weixin] send chunk failed ... attempt=1/3, retrying in 1.00s: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send chunk failed ... attempt=2/3, retrying in 2.00s: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send failed ...: iLink sendmessage error: ret=-2 errcode=None errmsg=unknown error
[Weixin] send chunk failed ... attempt=1/3, retrying in 1.00s: Timeout context manager should be used inside a task
...
  4. Send any message from the user to the bot (refreshing the server-side session), then trigger the same cron job. It succeeds.

Evidence

In my setup, three cron deliveries failed on 2026-04-29 at 08:06, 08:17, 09:04 (all with ret=-2). After interactive replies at 09:24 / 09:28 refreshed the session, the 09:30 cron delivery succeeded without any code change.

Root cause (adapter side)

In gateway/platforms/weixin.py, _send_chunk_with_retry (~L1502–1556):

is_session_expired = (
    ret == SESSION_EXPIRED_ERRCODE           # -14
    or errcode == SESSION_EXPIRED_ERRCODE
)

Only -14 triggers the tokenless-fallback path that the docstring explicitly designed for cron push:

On session-expired errors (errcode -14), automatically retries without context_token — iLink accepts tokenless sends as a degraded fallback, which keeps cron-initiated push messages working even when no user message has refreshed the session recently.

When iLink returns ret=-2 (which empirically correlates with the same stale-token condition), the adapter doesn't strip the token, so all 3 retries reuse the bad token and the send is dropped.
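
The retry behaviour described above can be illustrated with a simplified, synchronous sketch. Everything here is hypothetical scaffolding rather than the adapter's real code: SendError, send_with_fallback, and make_fake_sender are invented names, backoff delays are omitted, and the ret=-2 branch reflects the proposed fix, not the current behaviour.

```python
from typing import Callable, List, Optional, Tuple

SESSION_EXPIRED_ERRCODE = -14  # value quoted in the issue's snippet


class SendError(Exception):
    """Hypothetical stand-in for an iLink sendmessage failure."""

    def __init__(self, ret: Optional[int], errcode: Optional[int], errmsg: str):
        super().__init__(f"ret={ret} errcode={errcode} errmsg={errmsg}")
        self.ret, self.errcode, self.errmsg = ret, errcode, errmsg


def send_with_fallback(
    send: Callable[[str, Optional[str]], None],
    text: str,
    context_token: Optional[str],
    max_attempts: int = 3,
) -> bool:
    """Retry a send, dropping the context token once the error looks session-related."""
    token = context_token
    for _attempt in range(max_attempts):
        try:
            send(text, token)
            return True
        except SendError as exc:
            expired = (
                exc.ret == SESSION_EXPIRED_ERRCODE
                or exc.errcode == SESSION_EXPIRED_ERRCODE
                # Proposed addition: treat the ambiguous ret=-2 as stale too.
                or (exc.ret == -2 and exc.errmsg == "unknown error")
            )
            if expired:
                token = None  # tokenless fallback for the remaining attempts
    return False


def make_fake_sender() -> Tuple[Callable[[str, Optional[str]], None], List[Optional[str]]]:
    """Build a fake sender that rejects any non-empty token with ret=-2."""
    seen: List[Optional[str]] = []

    def send(text: str, token: Optional[str]) -> None:
        seen.append(token)
        if token is not None:
            raise SendError(-2, None, "unknown error")

    return send, seen
```

Without the ret=-2 branch, the token is never cleared in this sketch and all three attempts fail with the same error, which mirrors the log lines in the reproduction.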

Underlying issue

The real problem is on the iLink side: ret=-2 errmsg="unknown error" is semantically ambiguous. It would be better for iLink to return a specific error code for stale sessions, rather than forcing clients to heuristically map ret=-2 to "probably session expired". Until that's clarified, the adapter could either:

Option A — pragmatic fix: extend is_session_expired to also match ret=-2 and try the tokenless fallback. Downside: if ret=-2 ever means something else, we mask that error behind a tokenless retry.

Option B — defensive fix: on any retriable error, invalidate the cached context_token after the first failed attempt and retry tokenless. Clearer semantics, but mildly more aggressive.

Option C — upstream: raise this with iLink to get a distinct error code for session expiry, so adapters don't have to guess.
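
For comparison, Option B might look like the sketch below. It is a rough illustration under stated assumptions: the token cache is modelled as a plain dict keyed by chat id, and RetriableSendError, send_option_b, and stale_token_sender are hypothetical names that do not come from the adapter.

```python
from typing import Callable, Dict, Optional


class RetriableSendError(Exception):
    """Hypothetical stand-in for any send failure that is worth retrying."""


def send_option_b(
    send: Callable[[str, str, Optional[str]], None],
    token_cache: Dict[str, str],
    chat_id: str,
    text: str,
    max_attempts: int = 3,
) -> bool:
    """Retry a send, invalidating the cached token after the first failure of any kind."""
    for _attempt in range(max_attempts):
        token = token_cache.get(chat_id)  # None once the entry has been invalidated
        try:
            send(chat_id, text, token)
            return True
        except RetriableSendError:
            # Option B: do not inspect the error code, just drop the cached token.
            token_cache.pop(chat_id, None)
    return False


def stale_token_sender(chat_id: str, text: str, token: Optional[str]) -> None:
    """Fake sender for the demo: any cached token is treated as stale."""
    if token is not None:
        raise RetriableSendError("ret=-2 errmsg=unknown error")
```

The trade-off matches the issue's wording: this avoids guessing what ret=-2 means, at the cost of discarding a token that may still have been valid when the failure had an unrelated cause.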

Environment

  • hermes-agent commit: 3ff3dfb5 (fix(telegram): accept /cmd@botname from bot menu in groups)
  • OS: Ubuntu 22.04
  • Delivery mode: cron native deliver: "feishu,weixin" (comma-separated multi-target — which by the way isn't documented, only shown in _resolve_delivery_targets)

Side note

website/docs/user-guide/features/cron.md doesn't document the comma-separated multi-target deliver syntax that cron/scheduler.py::_resolve_delivery_targets supports. Worth adding — right now users are told to use send_message for the second target, which on Weixin hits a separate aiohttp event-loop issue.

extent analysis

TL;DR

The Weixin adapter should be updated to handle ret=-2 errors as potential session-expired signals, in addition to errcode=-14, to prevent silent failures in cron jobs.

Guidance

  • Update the is_session_expired check in gateway/platforms/weixin.py to include ret==-2 as a potential session-expired error.
  • Consider implementing a tokenless fallback for ret==-2 errors, similar to the existing fallback for errcode==-14 errors.
  • As an alternative, invalidate the cached context_token after the first failed attempt and retry tokenless for any retriable error.
  • Raise the issue with iLink to obtain a distinct error code for session expiry, allowing adapters to handle errors more accurately.

Example

is_session_expired = (
    ret == SESSION_EXPIRED_ERRCODE           # -14
    or errcode == SESSION_EXPIRED_ERRCODE
    or ret == -2  # potential session-expired error
)

Notes

The chosen solution may have implications for error handling and logging, as ret==-2 errors may not always indicate session expiry. It is essential to monitor the adapter's behavior after implementing the fix to ensure it does not introduce new issues.

Recommendation

Apply the pragmatic fix (Option A) by extending the is_session_expired check to include ret==-2, as it provides a straightforward solution to the immediate problem. However, it is crucial to continue pursuing a distinct error code for session expiry from iLink to ensure long-term accuracy and reliability.
