hermes - ✅(Solved) Fix Matrix voice replies sent as mp3 instead of ogg/opus — render as broken attachments [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#14841 (fetched 2026-04-24 06:14:31)
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×5, cross-referenced ×1

Fix Action

Fix / Workaround

Workaround

Patching both lines locally after each install. Diff in datacenter ansible role: https://github.com/winnersight/datacenter/blob/main/ansible/roles/hermes/files/patch-tts-ogg.sh

PR fix notes

PR #14900: fix: send voice replies as Opus/OGG on Matrix

Description (problem / solution / changelog)

Summary

Matrix requires Opus codec in an OGG container for proper voice bubbles per MSC3245. When sent as MP3, Matrix treats voice replies as generic file attachments — on mobile clients they appear as broken or unplayable.
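For orientation, here is a minimal sketch of what a voice-message event body might look like once the audio is Ogg/Opus. The helper name `build_voice_content` is hypothetical and not part of Hermes; the unstable-prefixed keys are an assumption based on how clients such as Element mark voice messages under MSC3245, so check the MSC before relying on them.

```python
def build_voice_content(mxc_url: str, filename: str,
                        size_bytes: int, duration_ms: int) -> dict:
    """Build m.room.message content flagged as a voice message.

    Hypothetical helper for illustration only; not Hermes code.
    """
    return {
        "msgtype": "m.audio",
        "body": filename,
        "url": mxc_url,
        "info": {
            "mimetype": "audio/ogg",  # Ogg container carrying Opus audio
            "size": size_bytes,
            "duration": duration_ms,
        },
        # Assumed unstable keys (MSC1767 audio details, MSC3245 voice flag).
        "org.matrix.msc1767.audio": {"duration": duration_ms},
        "org.matrix.msc3245.voice": {},
    }


content = build_voice_content("mxc://example.org/abc", "reply.ogg", 1234, 2500)
print(content["info"]["mimetype"])
```

The point of the sketch is that the voice flag only helps if the uploaded file really is Ogg/Opus; an MP3 upload with the same flag would still not match what clients expect.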

Two code paths were affected:

1. tools/tts_tool.py — model-invoked text_to_speech tool

The want_opus predicate at line 960 only matched "telegram", so when the model called the TTS tool directly on a Matrix session, the output was always MP3 regardless of provider capabilities.

Fix: Extended the check to platform in ("telegram", "matrix").
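A minimal sketch of the extended predicate, assuming the platform name arrives as a plain string. The names `OPUS_PLATFORMS` and `wants_opus` are illustrative and do not appear in the PR.

```python
# Platforms that render Ogg/Opus uploads as voice bubbles (per the PR notes).
OPUS_PLATFORMS = frozenset({"telegram", "matrix"})


def wants_opus(platform: str) -> bool:
    """Return True when the session platform should receive Ogg/Opus."""
    return platform.strip().lower() in OPUS_PLATFORMS


for name in ("telegram", "Matrix", "discord", ""):
    print(repr(name), wants_opus(name))
```

Keeping the platform names in one set means a future platform can be added in a single place rather than by editing the comparison itself.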

2. gateway/run.py — auto voice-reply in _send_voice_reply()

The output path was hardcoded to .mp3 regardless of the target platform. For providers that support native Opus output (OpenAI, ElevenLabs, Mistral, Gemini), passing .ogg as the extension allows the tool to request response_format=opus directly — no ffmpeg conversion needed.

Fix: Platform-aware extension selection (.ogg for Telegram/Matrix, .mp3 otherwise).
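The platform-aware selection could look roughly like the sketch below. It reuses the path layout shown in the issue; the function name `voice_reply_path` and its `platform` parameter are assumptions for illustration, not the actual code in _send_voice_reply().

```python
import os
import tempfile
import uuid


def voice_reply_path(platform: str) -> str:
    """Pick a temp output path whose extension matches the target platform."""
    # .ogg lets Opus-capable providers emit Opus directly; .mp3 otherwise.
    ext = ".ogg" if platform.lower() in ("telegram", "matrix") else ".mp3"
    return os.path.join(
        tempfile.gettempdir(), "hermes_voice",
        f"tts_reply_{uuid.uuid4().hex[:12]}{ext}",
    )


print(voice_reply_path("matrix"))
print(voice_reply_path("discord"))
```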

Testing

  • py_compile passes for both modified files
  • All 7 tests in tests/gateway/test_matrix_voice.py pass
  • All TTS-related tests pass (43 passed, 3 pre-existing failures unrelated to this change)
  • Broader gateway + tools test suite: 312 passed (3 pre-existing failures)

Closes #14841

Changed files

  • gateway/run.py (modified, +5/-2)
  • tools/tts_tool.py (modified, +4/-4)

Code Example

# gateway/run.py (before the fix): extension hardcoded to .mp3
# Use .mp3 extension so edge-tts conversion to opus works correctly.
# The TTS tool may convert to .ogg — use file_path from result.
audio_path = os.path.join(
    tempfile.gettempdir(), "hermes_voice",
    f"tts_reply_{_uuid.uuid4().hex[:12]}.mp3",
)

---

# tools/tts_tool.py (before the fix): want_opus only matches Telegram
platform = get_session_env("HERMES_SESSION_PLATFORM", "").lower()
want_opus = (platform == "telegram")
...
if want_opus and provider in ("openai", "elevenlabs", "mistral", "gemini"):
    file_path = out_dir / f"tts_{timestamp}.ogg"
else:
    file_path = out_dir / f"tts_{timestamp}.mp3"

Summary

Hermes sends voice replies to Matrix as mp3, which Matrix (per MSC3245) treats as a generic file attachment, not as a playable voice bubble. On mobile clients the message often shows as a broken attachment or won't play at all. Matrix voice messages require Opus codec in an OGG container.

Two places in the code default to mp3 for Matrix even when the provider can produce opus natively:

1. gateway/run.py — auto voice-reply output extension

# Use .mp3 extension so edge-tts conversion to opus works correctly.
# The TTS tool may convert to .ogg — use file_path from result.
audio_path = os.path.join(
    tempfile.gettempdir(), "hermes_voice",
    f"tts_reply_{_uuid.uuid4().hex[:12]}.mp3",
)

When the TTS provider is OpenAI-compatible (e.g. a self-hosted Kokoro TTS speaking the OpenAI audio.speech API), tts_tool._generate_openai_tts() already maps a .ogg output path to response_format="opus". Hardcoding .mp3 here means that path is never taken.
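The extension-to-format routing described above can be pictured with a small sketch. This is not the body of tts_tool._generate_openai_tts(); `response_format_for` and the mapping table are hypothetical, and the fallback to "mp3" for unknown extensions is an assumption.

```python
from pathlib import Path

# Assumed mapping from output file extension to the audio.speech
# response_format value; only the two formats discussed here are listed.
_EXT_TO_FORMAT = {".ogg": "opus", ".mp3": "mp3"}


def response_format_for(output_path: str) -> str:
    """Derive the response_format to request from the output extension."""
    return _EXT_TO_FORMAT.get(Path(output_path).suffix.lower(), "mp3")


print(response_format_for("/tmp/hermes_voice/tts_reply_abc.ogg"))
print(response_format_for("/tmp/hermes_voice/tts_reply_abc.mp3"))
```

With this kind of routing, the caller's choice of extension is the only signal the provider call sees, which is why a hardcoded .mp3 upstream rules out native Opus output.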

2. tools/tts_tool.py — model-invoked text_to_speech tool

platform = get_session_env("HERMES_SESSION_PLATFORM", "").lower()
want_opus = (platform == "telegram")
...
if want_opus and provider in ("openai", "elevenlabs", "mistral", "gemini"):
    file_path = out_dir / f"tts_{timestamp}.ogg"
else:
    file_path = out_dir / f"tts_{timestamp}.mp3"

Matrix is absent from the want_opus predicate, so when the model calls the text_to_speech tool directly, the output is always mp3 regardless of provider support.

Environment

  • hermes-agent on matrix channel, OpenAI-compatible TTS provider (self-hosted Kokoro, tts.openai.base_url = http://nova:11438/v1, model = kokoro, voice = af_heart).
  • Nova/Kokoro supports response_format=opus natively and produces valid Ogg/Opus containers.
  • Matrix homeserver: Synapse + MAS (ESS).
  • Tested with Element on desktop + iOS mobile: mp3 uploads render as broken/unplayable attachments; opus uploads render as proper voice bubbles.
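To confirm that a provider really returns an Ogg/Opus container rather than MP3 with a renamed extension, a quick header check can help. The sketch below relies on two format facts: every Ogg page begins with the capture pattern "OggS" followed by a 27-byte fixed header, and the first packet of an Opus stream begins with "OpusHead". The function name is illustrative and the check is a sanity test, not a full parser.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristic check that data starts with an Ogg page holding OpusHead."""
    # Fixed Ogg page header is 27 bytes; byte 26 is the segment count.
    if len(data) < 27 or data[:4] != b"OggS":
        return False
    n_segments = data[26]
    # The segment table follows the fixed header, then the packet data.
    payload_start = 27 + n_segments
    return data[payload_start:payload_start + 8] == b"OpusHead"


# Synthetic first page: capture pattern, 22 header bytes, 1 segment of 19 bytes.
fake_page = b"OggS" + bytes(22) + bytes([1]) + bytes([19]) + b"OpusHead" + bytes(11)
print(looks_like_ogg_opus(fake_page))
print(looks_like_ogg_opus(b"ID3" + bytes(40)))
```

In practice the bytes would come from reading the first few hundred bytes of the generated file before uploading it.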

Proposed fix

  1. In gateway/run.py, either:
    • use .ogg by default and rely on tts_tool._generate_openai_tts() to negotiate opus, OR
    • pick the extension per-platform (mp3 for platforms that can't do voice bubbles; ogg for telegram and matrix).
  2. In tools/tts_tool.py, change want_opus = (platform == "telegram") to include matrix — platform in ("telegram", "matrix") — or refactor to a platform→preferred-format map.
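The platform→preferred-format map mentioned in the second item could be sketched as follows. The provider list is taken from the snippet in this issue; `PREFERRED_FORMAT`, `OPUS_CAPABLE_PROVIDERS`, and `pick_extension` are hypothetical names, and defaulting unknown platforms to mp3 is an assumption.

```python
# Preferred container per platform; anything unlisted falls back to mp3.
PREFERRED_FORMAT = {"telegram": "ogg", "matrix": "ogg"}

# Providers the issue lists as able to produce Opus natively.
OPUS_CAPABLE_PROVIDERS = frozenset({"openai", "elevenlabs", "mistral", "gemini"})


def pick_extension(platform: str, provider: str) -> str:
    """Return the output extension for a platform/provider pair."""
    preferred = PREFERRED_FORMAT.get(platform.lower(), "mp3")
    if preferred == "ogg" and provider.lower() not in OPUS_CAPABLE_PROVIDERS:
        # Provider cannot emit Opus directly, so keep the mp3 path.
        return ".mp3"
    return f".{preferred}"


print(pick_extension("matrix", "openai"))
print(pick_extension("matrix", "edge"))
print(pick_extension("discord", "openai"))
```

A map like this keeps the platform decision and the provider capability check separate, so either can change without touching the other.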

Workaround

Patching both lines locally after each install. Diff in datacenter ansible role: https://github.com/winnersight/datacenter/blob/main/ansible/roles/hermes/files/patch-tts-ogg.sh

Happy to send a PR if you'd like — the change is small and self-contained.

extent analysis

TL;DR

Hermes sends voice replies to Matrix as mp3, but Matrix needs Opus in an OGG container to show a voice bubble. The fix is to select the .ogg extension for Matrix in gateway/run.py and to add "matrix" to the want_opus check in tools/tts_tool.py.

Guidance

  • Update the gateway/run.py file to use .ogg as the default extension or pick the extension per-platform, allowing Matrix to receive voice messages in the correct format.
  • Modify the tools/tts_tool.py file to include Matrix in the want_opus predicate, ensuring that the text_to_speech tool produces Opus files for Matrix.
  • Verify that the TTS provider (e.g., OpenAI-compatible Kokoro) supports response_format=opus natively and produces valid Ogg/Opus containers.
  • Test the changes with Element on desktop and iOS mobile to ensure that voice messages are rendered as proper voice bubbles.

Example

# In gateway/run.py
audio_path = os.path.join(
    tempfile.gettempdir(), "hermes_voice",
    f"tts_reply_{_uuid.uuid4().hex[:12]}.ogg",
)

# In tools/tts_tool.py
want_opus = (platform in ("telegram", "matrix"))

Notes

The proposed fix requires modifying the Hermes code to accommodate the specific requirements of the Matrix platform. The changes are self-contained and can be applied locally or through a PR.

Recommendation

Until the fix from PR #14900 is present in your installed version, apply the workaround: patch both lines locally after each install, as described in the diff in the datacenter ansible role.
