hermes - ✅ (Solved) Fix gateway/feishu: fallback send in _send_with_retry should strip thread_id from metadata [1 pull request, 1 participant]

GitHub stats
NousResearch/hermes-agent#24808 (fetched 2026-05-14 03:51:36)
Comments: 0
Participants: 1
Timeline: 5
Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×1

When a primary send fails and _send_with_retry (base.py) falls back to a plain-text message, it passes the original metadata unchanged to send(). If metadata contains a thread_id, _send_raw_message in feishu.py uses it as receive_id with receive_id_type="thread_id". If the thread is stale or invalid, Feishu returns [99992402] field validation failed, causing the fallback to also fail.

Error Message

Observed error

ERROR gateway.platforms.base: [Feishu] Fallback send also failed: [99992402] field validation failed

Root Cause

The plain-text fallback in _send_with_retry (base.py) reuses the metadata of the send that just failed. _send_raw_message in feishu.py treats any thread_id in that metadata as the receive_id, with receive_id_type="thread_id". When the thread is stale or invalid, Feishu rejects the request with [99992402] field validation failed, so the fallback fails for the same reason the message could not be threaded in the first place.
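The routing described above can be sketched as follows. This is only an illustration: `pick_receive_target` and the sample IDs are hypothetical, and the real `_send_raw_message` is not reproduced in this report, so its exact logic may differ.

```python
from typing import Optional, Tuple


def pick_receive_target(chat_id: str, metadata: Optional[dict]) -> Tuple[str, str]:
    """Hypothetical stand-in for the target selection in _send_raw_message.

    Returns (receive_id_type, receive_id).
    """
    thread_id = (metadata or {}).get("thread_id")
    if thread_id:
        # A thread_id in metadata wins over the chat, even if the thread is stale.
        return "thread_id", thread_id
    return "chat_id", chat_id


# Made-up IDs, for illustration only.
failed_send_metadata = {"thread_id": "omt_stale", "reply_to_message_id": "om_123"}
print(pick_receive_target("oc_chat", failed_send_metadata))
print(pick_receive_target("oc_chat", None))
```

Under this assumption, a fallback that carries the original metadata is still addressed to the thread, which is why it fails the same way as the primary send.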

Fix Action

Fixed

PR fix notes

PR #24813: fix(gateway/feishu): in-process WS reconnect + fallback send strips thread_id

Description (problem / solution / changelog)

Closes #24807, closes #24808

Problems

1. WS thread exit is silently swallowed (no in-process reconnect)

When the Feishu WebSocket thread exits unexpectedly (keepalive ping timeout escaping lark_oapi's internal reconnect loop), the exception was caught by a bare except Exception: pass in _run_official_feishu_ws_client. No fatal error was raised, so the adapter stayed registered but went deaf. Recovery required a full gateway process restart via systemd (exit 75 / TEMPFAIL), which killed any in-flight agent tasks and triggered the 60s drain timeout.

2. Fallback send fails with [99992402] field validation failed

_send_with_retry's plain-text fallback passed the original metadata unchanged. If metadata contained a stale thread_id, Feishu returned [99992402] field validation failed, making the fallback fail too.

Fixes

1. Add a done-callback on _ws_future in _connect_websocket. When the WS thread exits while the adapter is still running, the callback calls _set_fatal_error(retryable=True) and schedules _notify_fatal_error(). This wires into the existing _failed_platforms reconnect infrastructure in run.py (exponential backoff, up to _MAX_ATTEMPTS) — reconnect happens in-process without restarting the gateway.

2. Before the fallback send() call, build a copy of metadata with thread_id and reply_to_message_id stripped, so the fallback degrades to a plain top-level chat message.
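The done-callback idea in fix 1 can be sketched with plain asyncio. Everything here except the names `_ws_future` and `_on_ws_thread_done` is an assumption: `AdapterSketch`, `_blocking_ws_client`, and the `fatal_errors` list are hypothetical stand-ins for the real adapter, the lark_oapi client, and `_set_fatal_error` / `_notify_fatal_error`.

```python
import asyncio


class AdapterSketch:
    """Hypothetical adapter; only the callback wiring mirrors the PR description."""

    def __init__(self):
        self._running = True
        self.fatal_errors = []  # stands in for _set_fatal_error / _notify_fatal_error

    def _blocking_ws_client(self):
        # Stands in for the blocking WebSocket client; here it simply dies.
        raise ConnectionError("keepalive ping timeout")

    def _on_ws_thread_done(self, future):
        if not self._running or future.cancelled():
            return  # normal shutdown, nothing to report
        exc = future.exception()  # None if the thread returned without raising
        self.fatal_errors.append({"retryable": True, "error": repr(exc)})

    async def connect(self):
        loop = asyncio.get_running_loop()
        self._ws_future = loop.run_in_executor(None, self._blocking_ws_client)
        self._ws_future.add_done_callback(self._on_ws_thread_done)


async def main():
    adapter = AdapterSketch()
    await adapter.connect()
    # Wait for the worker thread to exit; the done-callback runs on the event loop.
    await asyncio.wait([adapter._ws_future], timeout=5)
    return adapter.fatal_errors


errors = asyncio.run(main())
print(errors)
```

The point of the callback is that the thread's exit becomes visible to the event loop instead of being swallowed, so a supervisor can schedule a reconnect without restarting the process.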

Changes

  • gateway/platforms/feishu.py: add _on_ws_thread_done done-callback on _ws_future
  • gateway/platforms/base.py: strip thread_id/reply_to_message_id from metadata before fallback send

Changed files

  • gateway/platforms/base.py (modified, +11/-2)
  • gateway/platforms/feishu.py (modified, +13/-0)


Expected behavior

The fallback send path should strip thread_id (and reply_to_message_id) from metadata so it degrades gracefully to a plain chat_id send, which is more likely to succeed.

Suggested fix

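In base.py _send_with_retry, build a filtered copy of metadata and pass that copy to the fallback send() call:

# Copy metadata without the thread-targeting keys so the fallback goes to the chat itself.
fallback_metadata = {k: v for k, v in (metadata or {}).items()
                     if k not in ("thread_id", "reply_to_message_id")}
fallback_result = await self.send(..., metadata=fallback_metadata)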
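A self-contained version of the same filtering, useful for checking its behaviour in isolation. The helper name `strip_thread_keys`, the `THREAD_KEYS` constant, and the sample keys and IDs (including `locale`) are hypothetical and not part of the actual patch.

```python
from typing import Optional

THREAD_KEYS = ("thread_id", "reply_to_message_id")


def strip_thread_keys(metadata: Optional[dict]) -> dict:
    """Return a copy of metadata without thread-targeting keys (hypothetical helper)."""
    return {k: v for k, v in (metadata or {}).items() if k not in THREAD_KEYS}


# Made-up metadata, for illustration only.
original = {"thread_id": "omt_stale", "reply_to_message_id": "om_123", "locale": "en"}
fallback = strip_thread_keys(original)
print(fallback)   # thread keys removed, other keys preserved
print(original)   # the caller's dict is left untouched
```

Building a new dict rather than deleting keys in place keeps the caller's metadata intact, and `metadata or {}` tolerates a `None` value.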

Relevant code

  • gateway/platforms/base.py _send_with_retry (~line 2476) — fallback send passes original metadata unchanged
  • gateway/platforms/feishu.py _send_raw_message (~line 4295) — uses thread_id from metadata as receive_id
