hermes: [BUG] Audio attachments cached with wrong .ogg extension (real format MP4/QuickTime+AAC) → on-device transcription fails


Summary

Incoming audio attachments are cached with a hardcoded/fallback .ogg extension that does not match the file's real container. When the audio is later handed to an extension-based transcriber (e.g. an on-device Apple AVFoundation / Speech transcribe tool), demuxer selection keys off the bogus .ogg extension and fails — even though the bytes are a perfectly readable format.

Observed with a Discord audio attachment whose real content is QuickTime/MP4 + AAC, saved as audio_<hash>.ogg. Apple transcription returned:

{"error":"Error executing tool transcribe_audio: The operation could not be completed"}

Root cause

Two layers both default to .ogg:

1. Discord adapter: MIME allowlist falls back to .ogg
plugins/platforms/discord/adapter.py (audio attachment branch):

elif content_type.startswith("audio/"):
    ext = "." + content_type.split("/")[-1].split(";")[0]
    if ext not in {".ogg", ".mp3", ".wav", ".webm", ".m4a"}:
        ext = ".ogg"   # <-- any audio MIME outside the 5-item allowlist gets mislabeled

A QuickTime/AAC attachment arrives as audio/mp4 / audio/x-m4a / audio/aac, which map to .mp4 / .x-m4a / .aac — none are in the allowlist, so the file is forced to .ogg while its bytes are MP4/QuickTime.
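The mapping can be reproduced in isolation. The sketch below copies the quoted branch into a standalone function and runs a few MIME types through it. The names derive_discord_audio_ext and ALLOWED_EXTS are hypothetical, introduced only for illustration, and are not taken from the codebase.

```python
# Standalone reproduction of the quoted adapter branch.
# derive_discord_audio_ext and ALLOWED_EXTS are hypothetical names.
ALLOWED_EXTS = {".ogg", ".mp3", ".wav", ".webm", ".m4a"}


def derive_discord_audio_ext(content_type: str) -> str:
    """Mirror the adapter logic: the MIME subtype becomes the extension,
    and anything outside the allowlist is replaced with .ogg."""
    ext = "." + content_type.split("/")[-1].split(";")[0]
    if ext not in ALLOWED_EXTS:
        ext = ".ogg"
    return ext


for mime in ("audio/mp4", "audio/x-m4a", "audio/aac", "audio/ogg; codecs=opus"):
    print(f"{mime!r:28} -> {derive_discord_audio_ext(mime)}")
```

All four inputs come out as .ogg, but only the last one is actually Ogg. By the same logic audio/mpeg, the registered MIME type for MP3, would also be relabeled, since .mpeg is not in the allowlist.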

2. Shared cache helper defaults to .ogg
gateway/platforms/base.py:

def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
    ...
    filename = f"audio_{uuid.uuid4().hex[:12]}{ext}"

This helper is shared by every platform adapter (Discord, Telegram, Signal, Matrix, WhatsApp, Mattermost, google_chat, simplex), and several callers pass ext or ".ogg", so an empty extension also becomes .ogg. The mislabel is therefore not Discord-specific: any platform whose reported MIME falls outside the allowlist, or that supplies no extension at all, produces a .ogg file with non-Ogg content.

Evidence

$ ffprobe -show_entries format=format_name ... audio_61de14323054.ogg
format_name=mov,mp4,m4a,3gp,3g2,mj2     # QuickTime / MOV, NOT ogg
$ xxd audio_61de14323054.ogg | head -1
00000000: 0000 0014 6674 7970 7174 2020   ....ftypqt      # 'ftypqt' = QuickTime
$ file audio_61de14323054.ogg
... ISO Media, Apple QuickTime movie

The stream is aac, 44100 Hz, mono, 27.4s — natively readable by Apple once the extension is correct.
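The same check can be done without ffprobe or xxd. A minimal Python sketch, assuming the ISO base media layout in which bytes 4 to 8 hold the box type ftyp and bytes 8 to 12 hold the major brand. read_ftyp_brand is a hypothetical helper name, and the sample header is the twelve bytes from the xxd output above.

```python
from typing import Optional


def read_ftyp_brand(header: bytes) -> Optional[str]:
    """Return the major brand of an ISO-BMFF / QuickTime file, or None
    when the header does not start with an ftyp box."""
    if len(header) < 12 or header[4:8] != b"ftyp":
        return None
    return header[8:12].decode("ascii", errors="replace")


# The first 12 bytes reported by xxd for audio_61de14323054.ogg.
sample = bytes.fromhex("000000146674797071742020")
print(repr(read_ftyp_brand(sample)))        # 'qt  '
print(read_ftyp_brand(b"OggS" + bytes(8)))  # None
```

A brand of 'qt  ' marks a QuickTime file, which matches what file and ffprobe report.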

Impact

  • On-device / extension-sensitive transcription silently fails for any audio whose real format ≠ .ogg but gets mislabeled .ogg.
  • Affects all platform adapters via the shared cache_audio_from_bytes.
  • Forces consumers into unnecessary transcoding workarounds for files that were already in a supported format.

Proposed fix

Stop trusting the platform-reported MIME / caller-supplied extension. Sniff the real container from the leading magic bytes inside cache_audio_from_bytes (single central fix that covers all adapters):

def _sniff_audio_ext(data: bytes, fallback: str) -> str:
    if data[4:8] == b"ftyp":                       # MP4 / QuickTime / M4A
        return ".m4a"
    if data[:4] == b"OggS":                        # Ogg (Opus/Vorbis)
        return ".ogg"
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    if data[:3] == b"ID3" or data[:2] in (b"\xff\xfb", b"\xff\xf3", b"\xff\xf2"):
        return ".mp3"
    if data[:4] == b"fLaC":
        return ".flac"
    if data[:4] == b"\x1aE\xdf\xa3":               # EBML -> webm/mkv
        return ".webm"
    return fallback

def cache_audio_from_bytes(data: bytes, ext: str = ".ogg") -> str:
    ext = _sniff_audio_ext(data, ext)
    ...

For every container the sniffer recognizes, the on-disk extension then matches the real content on every platform, with no transcoding needed for already-supported formats. Data it does not recognize keeps the caller-supplied extension.

Environment

  • hermes-agent @ commit 5cbc3fbd (fix(cli): /yolo in chat must enable session bypass)
  • macOS, on-device Apple transcription path
