hermes - ✅(Solved) Fix Matrix voice replies sent as mp3 instead of ogg/opus — render as broken attachments [1 pull requests, 1 participants]

hermes · 2026-04-24 01:19:12

GitHub stats

NousResearch/hermes-agent#14841•Fetched 2026-04-24 06:14:31

Author

vkrmch

Participants

vkrmch

Timeline (top)

labeled ×5, cross-referenced ×1

Fix Action

Fix / Workaround

Workaround

Patching both lines locally after each install. Diff in datacenter ansible role: https://github.com/winnersight/datacenter/blob/main/ansible/roles/hermes/files/patch-tts-ogg.sh

PR fix notes

PR #14900: fix: send voice replies as Opus/OGG on Matrix

Repository: NousResearch/hermes-agent
Author: Kailigithub
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14900

Description (problem / solution / changelog)

Summary

Matrix requires Opus codec in an OGG container for proper voice bubbles per MSC3245. When sent as MP3, Matrix treats voice replies as generic file attachments — on mobile clients they appear as broken or unplayable.

Two code paths were affected:

1. `tools/tts_tool.py` — model-invoked `text_to_speech` tool

The want_opus predicate at line 960 only matched "telegram", so when the model calls the TTS tool directly on a Matrix session the output was always MP3 regardless of provider capabilities.

Fix: Extended the check to platform in ("telegram", "matrix").
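A minimal sketch of the extended predicate, assuming the surrounding logic matches the snippet quoted in this issue. `choose_tts_extension` and the two constant names are hypothetical and not part of hermes-agent; the provider tuple is copied from the quoted code.

```python
# Hypothetical helper for illustration; not the actual hermes-agent code.
OPUS_PLATFORMS = ("telegram", "matrix")
NATIVE_OPUS_PROVIDERS = ("openai", "elevenlabs", "mistral", "gemini")


def choose_tts_extension(platform: str, provider: str) -> str:
    """Return ".ogg" when the platform wants Opus and the provider can emit it."""
    want_opus = platform.lower() in OPUS_PLATFORMS
    if want_opus and provider in NATIVE_OPUS_PROVIDERS:
        return ".ogg"
    return ".mp3"
```

With this shape, a Matrix session on an Opus-capable provider gets `.ogg`, while any other provider still falls back to `.mp3`.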

2. `gateway/run.py` — auto voice-reply in `_send_voice_reply()`

The output path was hardcoded to .mp3 regardless of the target platform. For providers that support native Opus output (OpenAI, ElevenLabs, Mistral, Gemini), passing .ogg as the extension allows the tool to request response_format=opus directly — no ffmpeg conversion needed.

Fix: Platform-aware extension selection (.ogg for Telegram/Matrix, .mp3 otherwise).
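The platform-aware selection could look roughly like the sketch below. It reuses the path layout from the quoted `gateway/run.py` snippet; `voice_reply_path` and `VOICE_BUBBLE_PLATFORMS` are hypothetical names, and how the platform string reaches this code is an assumption.

```python
import os
import tempfile
import uuid

# Assumption: these are the platforms that render Ogg/Opus as voice bubbles.
VOICE_BUBBLE_PLATFORMS = {"telegram", "matrix"}


def voice_reply_path(platform: str) -> str:
    """Build a temp output path whose extension depends on the target platform."""
    ext = ".ogg" if platform.lower() in VOICE_BUBBLE_PLATFORMS else ".mp3"
    return os.path.join(
        tempfile.gettempdir(), "hermes_voice",
        f"tts_reply_{uuid.uuid4().hex[:12]}{ext}",
    )
```

As the original comment in `gateway/run.py` notes, the TTS tool may still change the extension, so the caller should keep using the `file_path` returned in the result.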

Testing

py_compile passes for both modified files
All 7 tests in tests/gateway/test_matrix_voice.py pass
All TTS-related tests pass (43 passed, 3 pre-existing failures unrelated to this change)
Broader gateway + tools test suite: 312 passed (3 pre-existing failures)

Closes #14841

Changed files

gateway/run.py (modified, +5/-2)
tools/tts_tool.py (modified, +4/-4)

Code Example

# Use .mp3 extension so edge-tts conversion to opus works correctly.
# The TTS tool may convert to .ogg — use file_path from result.
audio_path = os.path.join(
    tempfile.gettempdir(), "hermes_voice",
    f"tts_reply_{_uuid.uuid4().hex[:12]}.mp3",
)

---

platform = get_session_env("HERMES_SESSION_PLATFORM", "").lower()
want_opus = (platform == "telegram")
...
if want_opus and provider in ("openai", "elevenlabs", "mistral", "gemini"):
    file_path = out_dir / f"tts_{timestamp}.ogg"
else:
    file_path = out_dir / f"tts_{timestamp}.mp3"

RAW_BUFFER

Summary

Hermes sends voice replies to Matrix as mp3, which Matrix (per MSC3245) treats as a generic file attachment, not as a playable voice bubble. On mobile clients the message often shows as a broken attachment or won't play at all. Matrix voice messages require Opus codec in an OGG container.

Two places in the code default to mp3 for Matrix even when the provider can produce opus natively:

1. `gateway/run.py` — auto voice-reply output extension

# Use .mp3 extension so edge-tts conversion to opus works correctly.
# The TTS tool may convert to .ogg — use file_path from result.
audio_path = os.path.join(
    tempfile.gettempdir(), "hermes_voice",
    f"tts_reply_{_uuid.uuid4().hex[:12]}.mp3",
)

When the tts provider is OpenAI-compatible (e.g. a self-hosted Kokoro TTS speaking the OpenAI audio.speech API), tts_tool._generate_openai_tts() already routes .ogg → response_format="opus". Hardcoding .mp3 here discards that.
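To illustrate why the extension matters, here is a sketch of an extension-to-format lookup. It is not the body of `_generate_openai_tts()`; the issue only states the `.ogg` to `opus` route, so the helper name, the table, and the mp3 fallback are assumptions.

```python
from pathlib import Path

# Assumed mapping; only the ".ogg" -> "opus" route is stated in the issue.
_EXT_TO_FORMAT = {".ogg": "opus", ".mp3": "mp3"}


def response_format_for(output_path: str) -> str:
    """Pick a response_format value from the output file extension."""
    suffix = Path(output_path).suffix.lower()
    # Falling back to mp3 for unknown extensions is an assumption of this sketch.
    return _EXT_TO_FORMAT.get(suffix, "mp3")
```

Under this model, a hardcoded `.mp3` path can never yield `opus`, whatever the provider supports.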

2. `tools/tts_tool.py` — model-invoked `text_to_speech` tool

platform = get_session_env("HERMES_SESSION_PLATFORM", "").lower()
want_opus = (platform == "telegram")
...
if want_opus and provider in ("openai", "elevenlabs", "mistral", "gemini"):
    file_path = out_dir / f"tts_{timestamp}.ogg"
else:
    file_path = out_dir / f"tts_{timestamp}.mp3"

Matrix is absent from the want_opus predicate, so when the model calls the speak tool directly the output is always mp3 regardless of provider support.

Environment

hermes-agent on matrix channel, OpenAI-compatible TTS provider (self-hosted Kokoro, tts.openai.base_url = http://nova:11438/v1, model = kokoro, voice = af_heart).
Nova/Kokoro supports response_format=opus natively and produces valid Ogg/Opus containers.
Matrix homeserver: Synapse + MAS (ESS).
Tested with Element on desktop + iOS mobile: mp3 uploads render as broken/unplayable attachments; opus uploads render as proper voice bubbles.

Proposed fix

In gateway/run.py, either:
- use .ogg by default and rely on tts_tool._generate_openai_tts() to negotiate opus, OR
- pick the extension per-platform (mp3 for platforms that can't do voice bubbles; ogg for telegram and matrix).
In tools/tts_tool.py, change want_opus = (platform == "telegram") to include matrix — platform in ("telegram", "matrix") — or refactor to a platform→preferred-format map.
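The platform→preferred-format map mentioned above might be sketched as follows. All names here are hypothetical, only telegram and matrix come from the issue, and the existing provider check would still need to be applied on top of this lookup.

```python
from pathlib import Path

# Assumed table; only telegram and matrix are named in the issue.
PREFERRED_AUDIO_EXT = {"telegram": ".ogg", "matrix": ".ogg"}
DEFAULT_AUDIO_EXT = ".mp3"


def preferred_extension(platform: str) -> str:
    """Look up the preferred audio extension, defaulting to mp3."""
    return PREFERRED_AUDIO_EXT.get(platform.strip().lower(), DEFAULT_AUDIO_EXT)


def tts_output_path(out_dir: Path, timestamp: str, platform: str) -> Path:
    """Mirror the tts_{timestamp} naming from the quoted snippet."""
    return out_dir / f"tts_{timestamp}{preferred_extension(platform)}"
```

A table like this keeps the per-platform decision in one place, so supporting another voice-bubble platform becomes a one-line change.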

Workaround

Patching both lines locally after each install. Diff in datacenter ansible role: https://github.com/winnersight/datacenter/blob/main/ansible/roles/hermes/files/patch-tts-ogg.sh

Happy to send a PR if you'd like — the change is small and self-contained.

extent analysis

TL;DR

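Hermes sends Matrix voice replies as mp3, which Matrix clients show as generic file attachments instead of voice bubbles. The fix is to write the auto voice-reply to an .ogg path in gateway/run.py and to add matrix to the want_opus predicate in tools/tts_tool.py, so that providers with native Opus support return Ogg/Opus.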

Guidance

Update the gateway/run.py file to use .ogg as the default extension or pick the extension per-platform, allowing Matrix to receive voice messages in the correct format.
Modify the tools/tts_tool.py file to include Matrix in the want_opus predicate, ensuring that the text_to_speech tool produces Opus files for Matrix.
Verify that the TTS provider (e.g., OpenAI-compatible Kokoro) supports response_format=opus natively and produces valid Ogg/Opus containers.
Test the changes with Element on desktop and iOS mobile to ensure that voice messages are rendered as proper voice bubbles.
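For the provider check in the guidance above, a rough byte-level heuristic can confirm that a file is Ogg/Opus before it is uploaded. This sketch relies only on the Ogg capture pattern (`OggS`) and the Opus identification header (`OpusHead`); the function name and the 64-byte window are choices of this example, and it does not validate the full stream.

```python
def looks_like_ogg_opus(data: bytes) -> bool:
    """Heuristic: Ogg capture pattern at offset 0 and OpusHead in the first page."""
    return data[:4] == b"OggS" and b"OpusHead" in data[:64]


def file_looks_like_ogg_opus(path: str) -> bool:
    """Read only the first 64 bytes of the file and apply the heuristic."""
    with open(path, "rb") as fh:
        return looks_like_ogg_opus(fh.read(64))
```

An mp3 file typically starts with an `ID3` tag or an MPEG frame sync, so it fails the first comparison right away.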

Example

# In gateway/run.py
audio_path = os.path.join(
    tempfile.gettempdir(), "hermes_voice",
    f"tts_reply_{_uuid.uuid4().hex[:12]}.ogg",
)

# In tools/tts_tool.py
want_opus = (platform in ("telegram", "matrix"))

Notes

The proposed fix requires modifying the Hermes code to accommodate the specific requirements of the Matrix platform. The changes are self-contained and can be applied locally or through a PR.

Recommendation

Until PR #14900 is merged (it is listed above as open and unmerged), keep patching both lines locally after each install, as in the diff from the datacenter ansible role linked in the Workaround section.
