hermes - ✅(Solved) Fix [Bug]: Weixin inbound voice messages skip audio download when text hint exists [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#11686 (fetched 2026-04-18 05:59:22)
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0

PR fix notes

PR #11689: fix(weixin): keep inbound voice audio when text hint exists

Description (problem / solution / changelog)

What does this PR do?

Fixes a Weixin inbound voice handling bug: if voice_item.text is already present, Hermes currently skips downloading the actual voice media.

That is too aggressive. Some Weixin / iLink variants provide both:

  • a text hint
  • a downloadable voice payload

Before this patch, Hermes kept the text hint but dropped the audio entirely. After this patch, Hermes still preserves the text hint, but it also keeps the real voice payload when the media reference is available.

Related Issue

Fixes #11686

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • gateway/platforms/weixin.py
    • stop treating voice_item.text as a reason to skip voice download
    • only skip when there is no usable media reference
    • extract inbound voice hint metadata (voice_text_hint, voice_duration_ms)
    • normalize short playtime values that appear to be in seconds instead of milliseconds
  • gateway/platforms/base.py
    • add a small MessageEvent.metadata dict so adapters can carry normalized inbound metadata without inventing ad-hoc attributes
  • tests/gateway/test_weixin.py
    • add regressions for:
      • keeping audio even when a text hint exists
      • playtime normalization
      • safe no-media fallback
      • attaching normalized voice metadata to the inbound event
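
The metadata extraction and playtime normalization listed above could be sketched roughly as follows. This is not the code from the PR: the `playtime` key, the `SECONDS_CUTOFF` threshold, and both function names are assumptions made for illustration; only the `voice_text_hint` and `voice_duration_ms` keys come from the change list.

```python
from typing import Any, Dict, Optional

# Hypothetical cutoff: values below this are treated as seconds, not milliseconds.
# The threshold actually used by the patch is not shown in the PR description.
SECONDS_CUTOFF = 1000


def normalize_playtime_ms(playtime: Any) -> Optional[int]:
    """Return a duration in milliseconds, or None if the value is unusable."""
    try:
        value = int(playtime)
    except (TypeError, ValueError):
        return None
    if value <= 0:
        return None
    # A voice note reported as 7 is far more likely 7 s than 7 ms.
    return value * 1000 if value < SECONDS_CUTOFF else value


def extract_voice_metadata(voice_item: Dict[str, Any]) -> Dict[str, Any]:
    """Collect normalized hint fields an adapter could attach to event metadata."""
    metadata: Dict[str, Any] = {}
    text = (voice_item.get("text") or "").strip()
    if text:
        metadata["voice_text_hint"] = text
    # "playtime" is an assumed field name for the raw duration.
    duration = normalize_playtime_ms(voice_item.get("playtime"))
    if duration is not None:
        metadata["voice_duration_ms"] = duration
    return metadata
```

Keeping these values in a plain dict mirrors the stated intent of `MessageEvent.metadata`: adapters can pass normalized fields along without adding ad-hoc attributes to the event.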

How to Test

  1. Run: python -m pytest tests/gateway/test_weixin.py tests/gateway/test_platform_base.py tests/gateway/test_stt_config.py -q
  2. Send a Weixin voice note where the inbound payload contains both voice_item.text and downloadable voice media.
  3. Confirm Hermes keeps both:
    • event.text still contains the hint
    • event.media_urls includes the cached voice file
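
The manual check in steps 2 and 3 could also be expressed as a regression test. The sketch below uses stand-in types because the real `MessageEvent` and adapter are not reproduced here; `FakeEvent`, `fake_download`, `build_event`, the `ref` key, and the cache path are all hypothetical.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class FakeEvent:
    """Stand-in for MessageEvent; only the fields named in the PR are modeled."""
    text: str = ""
    media_urls: List[str] = field(default_factory=list)
    metadata: Dict[str, Any] = field(default_factory=dict)


async def fake_download(media: Dict[str, Any]) -> Optional[str]:
    """Hypothetical downloader: pretends to cache the payload and returns a path."""
    return "/tmp/voice-cache/note.silk" if media else None


async def build_event(item: Dict[str, Any]) -> FakeEvent:
    """Simplified inbound handling: keep the hint and the audio side by side."""
    voice_item = item.get("voice_item") or {}
    event = FakeEvent(text=voice_item.get("text") or "")
    path = await fake_download(voice_item.get("media") or {})
    if path:
        event.media_urls.append(path)
    return event


def test_keeps_audio_when_text_hint_exists() -> None:
    item = {"voice_item": {"text": "hello", "media": {"ref": "abc"}}}
    event = asyncio.run(build_event(item))
    assert event.text == "hello"
    assert event.media_urls == ["/tmp/voice-cache/note.silk"]


def test_no_media_falls_back_to_text_only() -> None:
    item = {"voice_item": {"text": "hello"}}
    event = asyncio.run(build_event(item))
    assert event.text == "hello"
    assert event.media_urls == []
```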

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

Screenshots / Logs

Targeted validation run:

python -m pytest tests/gateway/test_weixin.py tests/gateway/test_platform_base.py tests/gateway/test_stt_config.py -q
127 passed

Changed files

  • gateway/platforms/base.py (modified, +4/-0)
  • gateway/platforms/weixin.py (modified, +30/-1)
  • tests/gateway/test_weixin.py (modified, +90/-0)

Bug Description

When a Weixin inbound voice message already includes voice_item.text, Hermes skips downloading the actual voice media.

That means the gateway keeps the text hint, but drops the original audio payload entirely. Anything downstream that wants the real voice note — STT fallback, format-specific handling, future voice analysis, or even just preserving the .silk file — never sees it.

Steps to Reproduce

  1. Run Hermes with the Weixin adapter.
  2. Send a Weixin voice note from a client / bridge variant that populates both:
    • voice_item.text
    • downloadable voice_item.media
  3. Inspect the normalized inbound MessageEvent.
  4. Observe that event.text is present, but event.media_urls is empty for the voice item.

Expected Behavior

If the inbound voice item has downloadable media, Hermes should cache the voice payload even when a text hint already exists.

The text hint should still be preserved, but it should act as a hint, not as a reason to discard the audio.

Actual Behavior

WeixinAdapter._download_voice() returns early when voice_item.text is present, so the voice media is never downloaded.

Affected Component

  • Gateway (Telegram/Discord/Slack/WhatsApp)
  • Agent Core (conversation loop, context compression, memory)
  • Other

Messaging Platform (if gateway-related)

  • Weixin

Additional Logs / Traceback (optional)

Current logic in gateway/platforms/weixin.py:

async def _download_voice(self, item: Dict[str, Any]) -> Optional[str]:
    voice_item = item.get("voice_item") or {}
    media = voice_item.get("media") or {}
    # Bug: returns early whenever a text hint exists, even if media is downloadable.
    if voice_item.get("text"):
        return None

Root Cause Analysis (optional)

The adapter currently treats voice_item.text as if it were a full replacement for the voice media.

That assumption is too strong. In practice, some Weixin/iLink variants provide both:

  • a text hint / transcript
  • a downloadable voice payload

Those two signals should coexist.

Proposed Fix (optional)

Only skip the download when the voice item has no usable media reference.

If media exists, keep downloading and caching the voice payload, and preserve the text hint separately so later stages can use both.

I have a patch ready for this and can open a PR.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The issue can be fixed by modifying the _download_voice method in weixin.py to download the voice media even when voice_item.text is present.

Guidance

  • Review the current logic in gateway/platforms/weixin.py and update the _download_voice method to check for the presence of voice_item.media before deciding whether to download the voice media.
  • Verify that the updated method correctly downloads and caches the voice payload when both voice_item.text and voice_item.media are present.
  • Test the changes with different Weixin client/bridge variants to ensure compatibility.
  • Consider adding logging or debugging statements to monitor the behavior of the updated method.

Example

async def _download_voice(self, item: Dict[str, Any]) -> Optional[str]:
    voice_item = item.get("voice_item") or {}
    media = voice_item.get("media") or {}
    if not media:
        return None
    # Proceed with downloading the voice media

Notes

The proposed fix assumes that the presence of voice_item.media indicates that the voice payload is downloadable. However, additional checks may be necessary to handle cases where voice_item.media is present but the download fails.
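
One way to cover the failed-download case mentioned in the note is to treat a fetch error the same as missing media, so the text hint still gets through. This is a minimal sketch under assumptions, not the adapter's actual code: `download_voice` is written as a free function, and the `fetch` callable stands in for whatever download-and-cache helper the adapter really uses.

```python
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical signature for the adapter's download-and-cache helper.
Fetcher = Callable[[Dict[str, Any]], Awaitable[str]]


async def download_voice(item: Dict[str, Any], fetch: Fetcher) -> Optional[str]:
    """Return the cached file path, or None when media is missing or the fetch fails."""
    voice_item = item.get("voice_item") or {}
    media = voice_item.get("media") or {}
    if not media:
        # No media reference: nothing to download, any text hint stands alone.
        return None
    try:
        return await fetch(media)
    except Exception as exc:  # broad on purpose; a real adapter would narrow this
        logger.warning("voice download failed, keeping text hint only: %s", exc)
        return None
```

Returning None on failure keeps the caller's logic unchanged: the event is still built from the text hint, and the warning makes the dropped audio visible in the logs.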

Recommendation

Update the _download_voice method so that it downloads the voice media even when voice_item.text is present. The gateway then preserves both the text hint and the original audio payload.
