hermes - (How to fix) Fix Weixin (WeChat) voice messages in Silk format skip STT: Russian transcription broken

Problem

WeChat sends voice messages in Silk, a speech codec originally developed by Skype; WeChat uses its own variant of the file format. The Weixin platform in Hermes has two issues:

  1. _extract_text() returns voice_item.text from WeChat directly. That text comes from WeChat's own transcription pipeline, which targets Chinese, so for Russian (and likely other non-Chinese languages) the transcription is garbage. The configured STT (mlx-whisper) is never called, because the propagated text short-circuits the media pipeline.

  2. _download_voice() has an early return (if voice_item.get("text"): return None), so if WeChat returned any transcription (even a bad one), the .silk file is never downloaded. Even when it is downloaded, the raw .silk bytes are returned, but the Hermes gateway/STT expects WAV/MP3.
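To make the second issue concrete, here is a minimal sketch of how a gateway could tell raw Silk bytes apart from WAV before handing audio to STT. The function name sniff_audio_format is hypothetical and not part of Hermes. The magic strings are assumptions based on the commonly seen Silk v3 file header (#!SILK_V3, with an extra leading 0x02 byte in WeChat's variant), so verify them against a real downloaded voice message.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container/codec from the first bytes of a blob (hypothetical helper)."""
    head = data[:16]
    # Assumption: Silk v3 files start with "#!SILK_V3"; WeChat's variant
    # is commonly seen with one extra 0x02 byte in front of that marker.
    if head.startswith(b"\x02#!SILK_V3") or head.startswith(b"#!SILK_V3"):
        return "silk"
    # RIFF/WAVE header: "RIFF" <4-byte size> "WAVE"
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "wav"
    return "unknown"
```

A blob reported as "silk" would need conversion before the STT step; one reported as "wav" could be passed through unchanged.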

Environment

  • Hermes Agent v0.14.0 (2026.5.16)
  • macOS (Apple Silicon)
  • STT: mlx-community/whisper-small-mlx via mlx-whisper
  • WeChat (Weixin) bot account

Fix (local, not yet submitted as a PR)

1. _extract_text() — ignore voice_item.text

for item in item_list:
    if item.get("type") == ITEM_VOICE:
        return ""        # ← always, STT must handle it
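The fragment above can be sketched as a self-contained function to show the intended behavior. This is only an illustration: the constant values, the text_item field name, and the standalone extract_text function are assumptions made for the example, not the actual Hermes code or the WeChat payload schema.

```python
# Hypothetical item type constants; the real values live in weixin.py.
ITEM_TEXT = 1
ITEM_VOICE = 3


def extract_text(item_list):
    """Return message text, but never the WeChat-side voice transcription."""
    parts = []
    for item in item_list:
        if item.get("type") == ITEM_VOICE:
            # Ignore voice_item.text entirely so the configured STT runs instead.
            return ""
        if item.get("type") == ITEM_TEXT:
            # "text_item" is an assumed field name for this sketch.
            parts.append(item.get("text_item", {}).get("text", ""))
    return "".join(parts)
```

With this shape, a voice message yields an empty string even when WeChat attached a transcription, so nothing short-circuits the media pipeline.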

2. _download_voice() — always download, convert Silk→WAV

Remove the early return. After downloading the .silk blob, convert it to WAV with pysilk and Python's built-in wave module:

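import io
import wave

import pysilk  # provided by the silk-python package

# Decode the Silk stream into raw 16-bit PCM at 24 kHz.
with open(silk_path, "rb") as f:
    pcm_buf = io.BytesIO()
    pysilk.decode(f, pcm_buf, 24000)
    pcm_data = pcm_buf.getvalue()

# Wrap the PCM in a mono, 16-bit, 24 kHz WAV container.
with wave.open(wav_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(pcm_data)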

The .wav is then picked up by the standard gateway STT pipeline.
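The WAV-writing half of the conversion uses only the standard library, so it can be checked without pysilk. The sketch below wraps raw PCM in a WAV container in memory and reads it back to confirm the parameters. The helper names pcm_to_wav_bytes and wav_duration_seconds are hypothetical, and the silent PCM in the usage note stands in for real decoder output.

```python
import io
import wave


def pcm_to_wav_bytes(pcm_data: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap mono 16-bit PCM in a WAV container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)           # mono
        w.setsampwidth(2)           # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(pcm_data)
    return buf.getvalue()


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Read the WAV header back and compute the clip duration."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        return r.getnframes() / r.getframerate()
```

For example, passing one second of silence (b"\x00\x00" * 24000) through pcm_to_wav_bytes and then wav_duration_seconds is a quick way to confirm that the sample rate written to the header matches the rate used for decoding. A mismatch there would make the audio play too fast or too slow and hurt STT accuracy.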

Dependencies

Install with pip install silk-python (the module name is pysilk; it is Cython-backed and is the correct package). Do NOT use pip install pysilk, which is an empty stub (version 0.0.1).
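Because both packages import as pysilk, a startup check can catch the wrong one early. The sketch below is one possible guard, assuming that the stub package would lack a callable decode attribute; the function name pysilk_is_usable is hypothetical.

```python
import importlib


def pysilk_is_usable() -> bool:
    """Return True only if a pysilk module with a callable decode() is importable."""
    try:
        mod = importlib.import_module("pysilk")
    except ImportError:
        return False
    # Assumption: the empty stub package exposes no decode() function.
    return callable(getattr(mod, "decode", None))
```

If the check fails, the platform could log a hint to run pip install silk-python rather than failing later inside the voice download path.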

Files

gateway/platforms/weixin.py: _extract_text() (~line 1000) and _download_voice() (~line 1529).
