hermes: Fix Feishu voice messages incorrectly classified as AUDIO (file attachment), skipping STT transcription

hermes, 2026-05-23 08:14:53


Feishu voice messages (Opus/OGG, real-time voice) are classified as MessageType.AUDIO by the Feishu adapter. Since commit b93996c35 ("fix(gateway): route Telegram audio file attachments away from STT pipeline"), MessageType.AUDIO is treated as an audio file attachment and skips STT transcription entirely, while only MessageType.VOICE goes through the STT pipeline.

This means Feishu voice messages are never auto-transcribed, breaking a core feature for Feishu users.

Root Cause

Commit b93996c35 introduced a split in gateway/run.py:

# Before (0.13.0): both AUDIO and VOICE went through STT
if mtype.startswith("audio/") or event.message_type in {MessageType.VOICE, MessageType.AUDIO}:
    audio_paths.append(path)

# After (0.14.0): AUDIO = file attachment (no STT), VOICE = voice message (STT)
if event.message_type == MessageType.AUDIO:
    audio_file_paths.append(path)  # never STT
elif event.message_type == MessageType.VOICE or ...:
    audio_paths.append(path)  # always STT

This split is correct for Telegram, where AUDIO means .mp3/.m4a file attachments and VOICE means Opus/OGG voice messages. Feishu, however, has no VOICE equivalent and sends all voice messages as AUDIO type. Every Feishu voice message therefore lands in the attachment branch, and Feishu users lose STT entirely.
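The effect of the 0.14.0 branch on a Feishu voice message can be reproduced with a minimal sketch. The `MessageType` enum and the `split_audio` helper below are hypothetical stand-ins, not the actual code in gateway/run.py, and the elided part of the `elif` condition is deliberately left out:

```python
from enum import Enum, auto


class MessageType(Enum):
    # Hypothetical stand-in for the gateway's enum; only two members are modeled.
    AUDIO = auto()
    VOICE = auto()


def split_audio(events):
    """Sort (message_type, path) pairs the way the 0.14.0 branch does.

    Returns (audio_paths, audio_file_paths): the first list is sent to STT,
    the second is treated as plain file attachments.
    """
    audio_paths = []
    audio_file_paths = []
    for message_type, path in events:
        if message_type == MessageType.AUDIO:
            audio_file_paths.append(path)  # never STT
        elif message_type == MessageType.VOICE:
            audio_paths.append(path)  # always STT
    return audio_paths, audio_file_paths


# A Telegram voice note arrives as VOICE; a Feishu voice message arrives as AUDIO.
stt_paths, file_paths = split_audio([
    (MessageType.VOICE, "telegram_voice.ogg"),
    (MessageType.AUDIO, "feishu_voice.opus"),
])
print(stt_paths)   # only the Telegram file reaches STT
print(file_paths)  # the Feishu voice message is filed as an attachment
```

Under these assumptions the Feishu file never reaches the STT list, which matches the reported behavior.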

Fix

The fix is in the Feishu adapter (gateway/platforms/feishu.py). When the Feishu adapter normalizes an audio message, map it to MessageType.VOICE instead of MessageType.AUDIO:

if preferred == "audio":
    # Feishu audio = voice message (Opus/OGG), not file attachment
    return MessageType.VOICE

This ensures:

Feishu voice messages → STT transcription ✅
Feishu file attachments (.mp3 etc.) → still treated as DOCUMENT, no STT ✅
Telegram AUDIO/VOICE split → unaffected ✅
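The three guarantees above can be checked with a small sketch of the adapter mapping. The function name `normalize_feishu_type`, the `"file"` branch, and the `TEXT` fallback are assumptions made for illustration; only the `preferred == "audio"` branch comes from the proposed fix:

```python
from enum import Enum, auto


class MessageType(Enum):
    # Hypothetical stand-in for the gateway's enum.
    TEXT = auto()
    AUDIO = auto()
    VOICE = auto()
    DOCUMENT = auto()


def normalize_feishu_type(preferred: str) -> MessageType:
    """Map a Feishu message kind to a gateway MessageType (illustrative only)."""
    if preferred == "audio":
        # Feishu audio = voice message (Opus/OGG), not file attachment
        return MessageType.VOICE
    if preferred == "file":
        # Assumed: uploaded files (including .mp3) stay documents, so no STT.
        return MessageType.DOCUMENT
    # Assumed fallback for everything this sketch does not model.
    return MessageType.TEXT


def goes_to_stt(message_type: MessageType) -> bool:
    """Mirror the 0.14.0 gateway rule: only VOICE is transcribed."""
    return message_type == MessageType.VOICE


print(goes_to_stt(normalize_feishu_type("audio")))  # Feishu voice message
print(goes_to_stt(normalize_feishu_type("file")))   # Feishu file attachment
```

Because the change lives in the Feishu adapter, the gateway's AUDIO/VOICE branch is left alone, which is why Telegram routing should be unaffected.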

Environment

Hermes Agent v0.14.0 (2026.5.16)
Platform: Feishu/Lark
STT provider: local_http (faster-whisper)

Commit: b93996c35e71518d4a68313f4dd7bef63e72b870
