hermes - Fix Weixin (WeChat) voice messages in Silk format skipping STT: Russian transcription broken

hermes, 2026-05-25 18:31:14

Problem

WeChat sends voice messages encoded as Silk (SILK, a speech codec originally developed by Skype, which WeChat uses in its own variant). The Weixin platform in Hermes mishandles them in two ways:

1. `_extract_text()` returns `voice_item.text` straight from WeChat. That text comes from WeChat's own transcription pipeline, which targets Chinese, so for Russian (and likely other non-Chinese languages) the result is garbage. Because the message already carries text, the media pipeline is short-circuited and the configured STT (mlx-whisper) is never called.
2. `_download_voice()` has an early `if voice_item.get("text"): return None`, so whenever WeChat returned any transcription (even a bad one) the .silk file is never downloaded. Even when it is downloaded, the raw .silk bytes are returned, but the Hermes gateway/STT expects WAV/MP3.
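Because the gateway expects WAV/MP3 while the download yields raw Silk bytes, it can help to check what a downloaded blob actually contains before handing it on. The sketch below is illustrative only: `sniff_audio_format` is a hypothetical helper, not part of Hermes, and the Silk check assumes the commonly described file layout in which the data starts with `#!SILK_V3`, optionally preceded by a single `0x02` byte in WeChat's variant.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the container/codec of an audio blob from its leading bytes.

    Hypothetical helper for illustration; the magic values for the WeChat
    Silk variant are an assumption, not something taken from Hermes.
    """
    # RIFF/WAVE: "RIFF" <4-byte size> "WAVE"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # MP3: either an ID3v2 tag or an MPEG frame sync (11 set bits)
    if data[:3] == b"ID3":
        return "mp3"
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    # Silk v3: "#!SILK_V3", optionally preceded by one 0x02 byte (assumed)
    head = data[1:] if data[:1] == b"\x02" else data
    if head.startswith(b"#!SILK_V3"):
        return "silk"
    return "unknown"


if __name__ == "__main__":
    print(sniff_audio_format(b"\x02#!SILK_V3" + b"\x00" * 8))
```

A blob reported as "silk" would need conversion before STT; one reported as "wav" or "mp3" could be passed through unchanged.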

Environment

Hermes Agent v0.14.0 (2026.5.16)
macOS (Apple Silicon)
STT: mlx-community/whisper-small-mlx via mlx-whisper
WeChat (Weixin) bot account

Fix (local, not yet PR)

1. `_extract_text()` — ignore `voice_item.text`

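for item in item_list:
    if item.get("type") == ITEM_VOICE:
        # Always return empty text for voice items: voice_item.text is
        # ignored so that the configured STT transcribes the audio instead.
        return ""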
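To show the effect of this change in isolation, here is a minimal standalone sketch. It is not the actual Hermes function: the constant values, the `text_item` key, and the shape of the item dictionaries are all assumptions made for illustration, and only the voice branch mirrors the fix described above.

```python
# Hypothetical item type constants; the real values in weixin.py may differ.
ITEM_TEXT = 1
ITEM_VOICE = 3


def extract_text(item_list):
    """Return message text, deliberately ignoring WeChat's own voice transcript."""
    parts = []
    for item in item_list:
        if item.get("type") == ITEM_VOICE:
            # Voice message: return no text so the STT pipeline handles the audio.
            return ""
        if item.get("type") == ITEM_TEXT:
            # Assumed layout: text lives under item["text_item"]["text"].
            parts.append(item.get("text_item", {}).get("text", ""))
    return "".join(parts)


if __name__ == "__main__":
    voice = [{"type": ITEM_VOICE, "voice_item": {"text": "bad transcript"}}]
    print(repr(extract_text(voice)))
```

With this behavior, a voice message never carries pre-filled text, so nothing short-circuits the media pipeline regardless of what WeChat put in `voice_item.text`.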

2. `_download_voice()` — always download, convert Silk→WAV

The early return is removed. After downloading the .silk blob, it is decoded with pysilk and written out as WAV with Python's wave module:

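import io
import wave

import pysilk  # provided by the silk-python package

# Decode the Silk file to raw PCM at 24 kHz.
with open(silk_path, "rb") as f:
    pcm_buf = io.BytesIO()
    pysilk.decode(f, pcm_buf, 24000)
    pcm_data = pcm_buf.getvalue()

# Wrap the PCM in a WAV container: mono, 16-bit samples, 24 kHz.
with wave.open(wav_path, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(24000)
    w.writeframes(pcm_data)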

The .wav is then picked up by the standard gateway STT pipeline.
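Before handing the file to STT, it may be worth confirming that the WAV header matches what was intended (mono, 16-bit, 24 kHz). The following is a rough sketch using only the standard library; `pcm_to_wav_bytes` and `describe_wav` are hypothetical helper names, and one second of silence stands in for decoded Silk audio so the example runs without pysilk.

```python
import io
import wave


def pcm_to_wav_bytes(pcm_data: bytes, sample_rate: int = 24000) -> bytes:
    """Wrap raw mono 16-bit PCM in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(pcm_data)
    return buf.getvalue()


def describe_wav(wav_bytes: bytes) -> dict:
    """Read back the WAV header fields and compute the duration in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        rate = r.getframerate()
        return {
            "channels": r.getnchannels(),
            "sample_width": r.getsampwidth(),
            "sample_rate": rate,
            "duration_s": r.getnframes() / rate,
        }


# Stand-in for decoded audio: 24000 zero samples of 2 bytes each (one second).
silence = b"\x00\x00" * 24000
info = describe_wav(pcm_to_wav_bytes(silence))

if __name__ == "__main__":
    print(info)
```

A duration of zero or an unexpected sample rate here would point at the decode step rather than at the STT backend.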

Dependencies

Install with pip install silk-python. The importable module is named pysilk (Cython-backed, and the correct one). Do NOT run pip install pysilk: that package is an empty stub (0.0.1).
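Since the two packages share the import name, a quick startup check can catch the wrong one early. This is a sketch under the assumption that the stub package does not expose a `decode` function; `module_has_callable` is a hypothetical helper, not part of Hermes or silk-python.

```python
import importlib


def module_has_callable(module_name: str, attr: str) -> bool:
    """Return True if the module imports and exposes a callable named attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return callable(getattr(module, attr, None))


if __name__ == "__main__":
    if not module_has_callable("pysilk", "decode"):
        # Assumption: a missing decode() means silk-python is not installed
        # (or the empty stub package is shadowing it).
        print("pysilk.decode not available; try: pip install silk-python")
```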

Files

gateway/platforms/weixin.py — _extract_text() (~line 1000) and _download_voice() (~line 1529).
