hermes - 💡(How to fix) Fix Feishu voice messages incorrectly classified as AUDIO (file attachment), skipping STT transcription

Feishu voice messages (Opus/OGG, real-time voice) are classified as MessageType.AUDIO by the Feishu adapter. Since commit b93996c35 ("fix(gateway): route Telegram audio file attachments away from STT pipeline"), MessageType.AUDIO is treated as an audio file attachment and skips STT transcription entirely, while only MessageType.VOICE goes through the STT pipeline.

This means Feishu voice messages are never auto-transcribed, breaking a core feature for Feishu users.

Root Cause

Commit b93996c35 introduced a split in gateway/run.py:

# Before (0.13.0): both AUDIO and VOICE went through STT
if mtype.startswith("audio/") or event.message_type in {MessageType.VOICE, MessageType.AUDIO}:
    audio_paths.append(path)

# After (0.14.0): AUDIO = file attachment (no STT), VOICE = voice message (STT)
if event.message_type == MessageType.AUDIO:
    audio_file_paths.append(path)  # never STT
elif event.message_type == MessageType.VOICE or ...:
    audio_paths.append(path)  # always STT

This split was correct for Telegram (where AUDIO = .mp3/.m4a file attachments and VOICE = Opus/OGG voice messages), but Feishu sends all voice messages as AUDIO type and has no VOICE equivalent, so Feishu users lose STT entirely.
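To make the failure concrete, here is a small, self-contained model of the 0.14.0 routing combined with the pre-fix platform classification. The `MessageType` enum, the `route_v014` helper, and the `(platform, kind)` table are stand-ins written for illustration, not Hermes code. The elided `or ...` condition is left out; that does not change the outcome for this issue, because the AUDIO branch is checked first and catches every AUDIO message before any other condition is evaluated.

```python
from enum import Enum, auto


class MessageType(Enum):
    # Minimal stand-in for the gateway's MessageType (assumed members only).
    AUDIO = auto()
    VOICE = auto()
    DOCUMENT = auto()


def route_v014(message_type: MessageType) -> str:
    """Model of the 0.14.0 split in gateway/run.py: which bucket a media path lands in."""
    if message_type == MessageType.AUDIO:
        return "audio_file_paths"  # treated as an attachment, never STT
    if message_type == MessageType.VOICE:
        return "audio_paths"  # voice message, always STT
    return "other"


# Hypothetical pre-fix classification: (platform, native kind) -> MessageType.
PRE_FIX = {
    ("telegram", "voice"): MessageType.VOICE,  # Opus/OGG voice note
    ("telegram", "audio"): MessageType.AUDIO,  # .mp3/.m4a file attachment
    ("feishu", "audio"): MessageType.AUDIO,  # Feishu voice message (Opus/OGG)
}

for (platform, kind), mtype in PRE_FIX.items():
    print(f"{platform}/{kind}: {mtype.name} -> {route_v014(mtype)}")
```

Running it prints `feishu/audio: AUDIO -> audio_file_paths`: the Feishu voice message lands in the same bucket as a Telegram .mp3 attachment, and no Feishu input ever reaches `audio_paths`.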

Fix

The fix belongs in the Feishu adapter (gateway/platforms/feishu.py): when the adapter normalizes an audio message, map it to MessageType.VOICE instead of MessageType.AUDIO:

if preferred == "audio":
    # Feishu audio = voice message (Opus/OGG), not file attachment
    return MessageType.VOICE

This ensures:

  • Feishu voice messages → STT transcription ✅
  • Feishu file attachments (.mp3 etc.) → still treated as DOCUMENT, no STT ✅
  • Telegram AUDIO/VOICE split → unaffected ✅
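The guarantees above can be checked with a small regression sketch that pairs a post-fix Feishu mapping with the unchanged 0.14.0 routing. Everything here is an assumption for illustration: `feishu_message_type` is a hypothetical reduction of the adapter's normalization step (only the `"audio"` branch comes from the issue; the `"file"` branch follows the DOCUMENT bullet), `telegram_message_type` is a hypothetical mapping, and `goes_to_stt` ignores the elided `or ...` conditions in gateway/run.py.

```python
from enum import Enum, auto
from typing import Optional


class MessageType(Enum):
    # Stand-in for the gateway enum; only the members this issue mentions.
    AUDIO = auto()
    VOICE = auto()
    DOCUMENT = auto()


def feishu_message_type(preferred: str) -> Optional[MessageType]:
    """Hypothetical post-fix Feishu normalization (preferred = Feishu msg_type)."""
    if preferred == "audio":
        # Feishu audio = voice message (Opus/OGG), not file attachment
        return MessageType.VOICE
    if preferred == "file":
        # Uploaded files, including .mp3, stay documents
        return MessageType.DOCUMENT
    return None  # other kinds are out of scope for this sketch


def telegram_message_type(kind: str) -> Optional[MessageType]:
    """Hypothetical Telegram mapping, untouched by the fix."""
    return {"voice": MessageType.VOICE, "audio": MessageType.AUDIO}.get(kind)


def goes_to_stt(message_type: Optional[MessageType]) -> bool:
    """0.14.0 split in gateway/run.py, ignoring the elided `or ...` conditions."""
    if message_type == MessageType.AUDIO:
        return False  # audio_file_paths: never STT
    return message_type == MessageType.VOICE  # audio_paths: always STT


assert goes_to_stt(feishu_message_type("audio"))        # Feishu voice -> STT
assert not goes_to_stt(feishu_message_type("file"))     # Feishu .mp3 upload -> DOCUMENT, no STT
assert goes_to_stt(telegram_message_type("voice"))      # Telegram voice -> STT
assert not goes_to_stt(telegram_message_type("audio"))  # Telegram audio file -> no STT
print("all checks passed")
```

Keeping the change in the Feishu adapter means `goes_to_stt` (the shared router) needs no platform-specific branches, which is why the Telegram behavior is unaffected.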

Environment

  • Hermes Agent v0.14.0 (2026.5.16)
  • Platform: Feishu/Lark
  • STT provider: local_http (faster-whisper)

Related

  • Commit: b93996c35e71518d4a68313f4dd7bef63e72b870
