hermes: WeChat voice messages use Tencent Cloud STT, which garbles non-Chinese languages; they should route through Hermes' own STT pipeline


Summary

When Hermes receives a voice message on WeChat (Weixin), the gateway uses the transcription supplied by Tencent Cloud STT (voice_item.text) as the message text. For non-Chinese languages (Russian, European languages, Arabic, etc.), Tencent Cloud STT falls back to English or Chinese transcription, which produces garbled, unintelligible text. The proposed fix is to always download the raw SILK/Opus audio and route it through Hermes' own STT pipeline (mlx-whisper, whisper.cpp, etc.), which handles multiple languages well.

Affected Users

All Hermes users who send WeChat voice messages in a non-Chinese language. For these users, voice messages through the WeChat gateway are effectively unusable, and they likely make up a large portion of the international Hermes userbase.

Root Cause

Two locations in gateway/platforms/weixin.py short-circuit on Tencent Cloud's text output:

1. _download_voice() (line ~1529–1543) — Skips downloading audio entirely

async def _download_voice(self, item: Dict[str, Any]) -> Optional[str]:
    voice_item = item.get("voice_item") or {}
    media = voice_item.get("media") or {}
    if voice_item.get("text"):          # ← If Tencent already transcribed → return None
        return None
    try:
        data = await _download_and_decrypt_media(...)
        return cache_audio_from_bytes(data, ".silk")
    except Exception as exc:
        logger.warning("[%s] voice download failed: %s", self.name, exc)
        return None

The guard if voice_item.get("text"): return None prevents the raw audio from ever being downloaded. Since Tencent always produces some text (even when wrong), this branch always fires for non-Chinese audio.

2. _extract_text() (line ~1001–1004) — Returns Tencent's text directly

for item in item_list:
    if item.get("type") == ITEM_VOICE:
        voice_text = str((item.get("voice_item") or {}).get("text") or "")
        if voice_text:
            return voice_text           # ← Returns garbled transcription

This returns Tencent's (incorrect) text as the message content, bypassing any STT processing Hermes would normally do on audio attachments.
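
To make the two short-circuits concrete, the following standalone sketch mimics them on a hand-built item. It is not the gateway's code: the value of ITEM_VOICE, the function names, the item shape, and the sample transcription are all assumptions chosen for illustration.

```python
from typing import Any, Dict, List, Optional

ITEM_VOICE = 3  # assumed placeholder; the real constant is defined in weixin.py


def would_download_voice(item: Dict[str, Any]) -> bool:
    """Mirror the guard in _download_voice(): any Tencent text blocks the download."""
    voice_item = item.get("voice_item") or {}
    return not voice_item.get("text")


def extract_text(item_list: List[Dict[str, Any]]) -> Optional[str]:
    """Mirror the loop in _extract_text(): Tencent's text becomes the message body."""
    for item in item_list:
        if item.get("type") == ITEM_VOICE:
            voice_text = str((item.get("voice_item") or {}).get("text") or "")
            if voice_text:
                return voice_text
    return None


# A Russian voice note with a made-up, mis-transcribed text attached by Tencent.
russian_note = {
    "type": ITEM_VOICE,
    "voice_item": {"text": "pre vet cack de la", "media": {}},
}

print(would_download_voice(russian_note))  # False: the audio is never fetched
print(extract_text([russian_note]))        # the garbled text is used as the message
```

Both checks key off the mere presence of voice_item.text, not its correctness, which is why a wrong transcription is enough to bypass Hermes STT.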

How the Bug Manifests

  1. User sends a voice message in Russian on WeChat
  2. Tencent Cloud STT transcribes it as English gibberish (e.g., random English phonemes that sort-of match the Russian sounds)
  3. Hermes receives the garbled English text as the user's message
  4. Hermes responds to nonsense text — completely useless

Proposed Fix

The fix is small, self-contained, and well-suited for a first-time contributor or quick PR:

Option A (Recommended): Always download voice, use Hermes STT

  1. In _download_voice(), remove the if voice_item.get("text"): return None guard so the raw audio is always downloaded
  2. In _extract_text(), stop returning Tencent's voice_item.text for voice items; return the audio path (or an empty string) instead, so the audio goes through the normal media pipeline to Hermes STT
  3. The downstream Hermes STT (mlx-whisper, whisper.cpp, etc.) already handles multiple languages correctly
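
The steps above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the actual patch: fetch_audio() and cache_audio() are hypothetical stand-ins for _download_and_decrypt_media(...) and cache_audio_from_bytes(), and the ITEM_VOICE value and the text_item field are invented so the example runs on its own.

```python
import asyncio
import logging
from typing import Any, Dict, List, Optional

logger = logging.getLogger("weixin_sketch")

ITEM_VOICE = 3  # assumed placeholder; the real constant is defined in weixin.py


async def fetch_audio(media: Dict[str, Any]) -> bytes:
    """Hypothetical stand-in for _download_and_decrypt_media(...)."""
    return bytes(media.get("fake_bytes", b""))


def cache_audio(data: bytes, suffix: str) -> str:
    """Hypothetical stand-in for cache_audio_from_bytes(); returns a fake path."""
    return f"/tmp/voice-{len(data)}{suffix}"


async def download_voice(item: Dict[str, Any]) -> Optional[str]:
    """Step 1: no early return on voice_item['text']; always fetch the audio."""
    media = (item.get("voice_item") or {}).get("media") or {}
    try:
        data = await fetch_audio(media)
        return cache_audio(data, ".silk")
    except Exception as exc:
        logger.warning("voice download failed: %s", exc)
        return None


def extract_text(item_list: List[Dict[str, Any]]) -> str:
    """Step 2: voice items contribute no text; Hermes STT supplies it later."""
    parts: List[str] = []
    for item in item_list:
        if item.get("type") == ITEM_VOICE:
            continue  # ignore Tencent's transcription entirely
        text = (item.get("text_item") or {}).get("text")  # hypothetical field
        if text:
            parts.append(str(text))
    return "\n".join(parts)


sample = {
    "type": ITEM_VOICE,
    "voice_item": {"text": "garbled", "media": {"fake_bytes": b"abc"}},
}
audio_path = asyncio.run(download_voice(sample))
print(audio_path)                     # cached audio path handed to the media pipeline
print(repr(extract_text([sample])))   # empty: no Tencent text leaks into the message
```

The existing try/except is kept as in the original method, so a failed download still degrades to None instead of raising.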

This could be gated behind a config option for backward compatibility:

# In ~/.hermes/config.yaml
weixin:
  always_download_voice: true   # Default: true — prefer Hermes STT over Tencent Cloud
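
A sketch of how such a flag might be consulted, assuming the YAML has already been parsed into a plain dict. The helper names are hypothetical, and how Hermes actually loads ~/.hermes/config.yaml is not shown in this issue.

```python
from typing import Any, Mapping


def always_download_voice(config: Mapping[str, Any]) -> bool:
    """Read weixin.always_download_voice, defaulting to True when it is unset."""
    weixin = config.get("weixin") or {}
    return bool(weixin.get("always_download_voice", True))


def should_skip_download(voice_item: Mapping[str, Any], config: Mapping[str, Any]) -> bool:
    """Keep the old short-circuit only when the flag is explicitly turned off."""
    if always_download_voice(config):
        return False
    return bool(voice_item.get("text"))


voice_item = {"text": "garbled", "media": {}}
print(should_skip_download(voice_item, {}))                                            # False
print(should_skip_download(voice_item, {"weixin": {"always_download_voice": False}}))  # True
```

With the default of true, users who want the previous behavior would have to opt out explicitly.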

Option B: Language-match check

Check Tencent's language hint (if available) and only skip download for zh or en; fall through to download for everything else.
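
A sketch of what that check might look like. Whether Tencent's payload carries a language hint at all is unconfirmed in this issue, so the "lang" field below is purely hypothetical, as is the helper name.

```python
from typing import Any, Mapping

TRUSTED_LANGS = {"zh", "en"}  # languages for which Tencent's text would be kept


def can_trust_tencent_text(voice_item: Mapping[str, Any]) -> bool:
    """Return True only when text exists and the hint names a trusted language."""
    text = voice_item.get("text")
    hint = str(voice_item.get("lang") or "").lower()  # hypothetical field
    primary = hint.split("-")[0].split("_")[0]        # "zh-CN" or "zh_CN" -> "zh"
    return bool(text) and primary in TRUSTED_LANGS


print(can_trust_tencent_text({"text": "你好", "lang": "zh-CN"}))  # True
print(can_trust_tencent_text({"text": "pre vet", "lang": "ru"}))  # False
print(can_trust_tencent_text({"text": "pre vet"}))                # False: no hint
```

A missing hint is treated as untrusted, so the audio is downloaded and sent to Hermes STT whenever the language is unknown.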

Implementation Scope

  • File: gateway/platforms/weixin.py
  • Methods to modify: _download_voice(), _extract_text()
  • Config key: weixin.always_download_voice (or similar)
  • No new dependencies required — Hermes already has STT configured

Why This Matters

This issue effectively blocks international users from using voice on the WeChat platform: without the fix, WeChat voice is broken for anyone whose language Tencent Cloud STT mishandles. The fix is roughly 10 lines of code and reuses STT infrastructure that Hermes already has and that already works well.


Discovered and patched locally; filing as a formal issue for upstream inclusion.
