hermes - ✅(Solved) Fix TTS: Suboptimal Opus encoding for Gemini/Edge TTS causes quality loss in Telegram voice messages [1 pull request, 1 participant]

hermes · 2026-05-02 12:19:15


GitHub stats

NousResearch/hermes-agent#18818 · Fetched 2026-05-03 04:54:06

Author: Qwinty
Participants: Qwinty
Timeline (top): labeled ×5, cross-referenced ×1

Root Cause

The issue is in tools/tts_tool.py:

_generate_gemini_tts() (around line 1199)
_convert_to_opus() (around line 712)

Both use this ffmpeg command:

ffmpeg -i input -acodec libopus -ac 1 -b:a 64k -vbr off output.ogg -y

Problems:

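-vbr off forces constant bitrate (CBR) instead of libopus's default variable bitrate (VBR)
Missing -application voip (important for speech optimization)
Missing -compression_level 10 for maximum encoder complexity (ffmpeg's libopus wrapper already defaults to 10, so setting it mainly makes the choice explicit)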
This is especially noticeable on Telegram, which expects well-encoded Opus for voice bubbles
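The three problems listed above can be checked mechanically. Below is a minimal Python sketch (Python because tools/tts_tool.py is a Python module) that audits an ffmpeg argument list for them. `audit_opus_args` is a hypothetical helper, not part of hermes-agent; it only inspects the arguments and never runs ffmpeg.

```python
def audit_opus_args(argv: list[str]) -> list[str]:
    """Return the encoding problems from this report found in an ffmpeg argv.

    Hypothetical helper for illustration only; it does not invoke ffmpeg.
    """
    # Pair each option with the token that follows it (e.g. "-vbr" -> "off").
    # The last token is skipped because it has nothing after it.
    opts = {
        argv[i]: argv[i + 1]
        for i in range(len(argv) - 1)
        if argv[i].startswith("-")
    }
    problems = []
    if opts.get("-vbr") == "off":
        problems.append("-vbr off forces CBR")
    if opts.get("-application") != "voip":
        problems.append("-application voip not set")
    if "-compression_level" not in opts:
        problems.append("-compression_level not set explicitly")
    return problems


# The command quoted in this issue, split into an argument list.
before = ["ffmpeg", "-i", "input", "-acodec", "libopus", "-ac", "1",
          "-b:a", "64k", "-vbr", "off", "output.ogg", "-y"]

for problem in audit_opus_args(before):
    print(problem)
```

Run against the quoted command, it reports all three problems; pointing the same check at both call sites is one way to confirm they stay consistent.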

Fix Action

Fixed

Fixed by PR: fix(tts): improve Opus encoding quality for Telegram voice messages (https://github.com/NousResearch/hermes-agent/pull/18861)

PR fix notes

PR #18861: fix(tts): improve Opus encoding quality for Telegram voice messages

Repository: NousResearch/hermes-agent
Author: shellybotmoyer
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18861

Description (problem / solution / changelog)

Summary

Fixes #18818 — suboptimal Opus encoding for Gemini/Edge TTS causing quality loss in Telegram voice messages.

Problem

Both _convert_to_opus() and _generate_gemini_tts() use ffmpeg with suboptimal Opus encoding settings:

ffmpeg -i input -acodec libopus -ac 1 -b:a 64k -vbr off output.ogg -y

Issues:

-vbr off forces Constant Bitrate (CBR): wastes bits on silence and starves complex segments
Missing -application voip: the Opus encoder defaults to audio mode, which is not optimized for speech
Missing -compression_level 10: encoder complexity is not set explicitly (ffmpeg's libopus wrapper already defaults to 10)

This causes:

Noticeably lower speech quality compared to properly-encoded Opus
Inconsistent Telegram voice bubble rendering (some files appear as regular audio instead of voice messages)

Fix

Changed ffmpeg encoding parameters at both call sites:

| Parameter | Before | After | Why |
|---|---|---|---|
| `-b:a` | `64k` | `48k` | 48k VBR is ample for mono speech; VBR spends bits more efficiently |
| `-vbr` | `off` (CBR) | `on` (VBR) | VBR allocates bits where the speech needs them |
| `-application` | (missing) | `voip` | Tunes Opus for voice intelligibility |
| `-compression_level` | (missing) | `10` | Maximum encoder complexity (already ffmpeg's libopus default; now explicit) |
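The "After" column can also be kept in one place so the two call sites cannot drift apart. This is a minimal sketch assuming a shared module-level constant; `OPUS_VOICE_ARGS` and `build_opus_command` are hypothetical names, whereas the actual PR edits the two existing ffmpeg invocations in place (+2/-2).

```python
# Encoder settings from the "After" column of PR #18861, kept in one tuple
# so that _convert_to_opus() and _generate_gemini_tts() could share them.
OPUS_VOICE_ARGS = (
    "-acodec", "libopus",
    "-ac", "1",                   # mono
    "-b:a", "48k",                # target bitrate; VBR varies around it
    "-vbr", "on",                 # variable bitrate
    "-application", "voip",       # tune the encoder for speech
    "-compression_level", "10",   # maximum encoder complexity
)


def build_opus_command(src: str, dst: str) -> list[str]:
    """Return an ffmpeg argv that encodes src to mono Opus in an OGG file at dst."""
    # "-y" (overwrite output) is a global option, so it is placed up front.
    return ["ffmpeg", "-y", "-i", src, *OPUS_VOICE_ARGS, dst]


print(" ".join(build_opus_command("input.wav", "output.ogg")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting issues with file paths.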

Testing

-application voip is a standard option of ffmpeg's libopus encoder; it selects OPUS_APPLICATION_VOIP (value 2048), which tunes the encoder for speech intelligibility
48k VBR is a generous bitrate for mono speech in Opus
No functional changes to the conversion pipeline — only encoding parameters changed
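The notes above reason only about the flags. One way to check an actual output file is to inspect it with ffprobe. This is a minimal sketch assuming ffprobe is on PATH; `probe` and `looks_like_voice_note` are hypothetical names, and the check covers only the properties this report names (a single mono Opus stream in an OGG container), not Telegram's full acceptance logic.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Run ffprobe and return its JSON description of the file's streams and format."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-show_format",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def looks_like_voice_note(info: dict) -> bool:
    """True if the probe result describes one mono Opus audio stream in an OGG container."""
    audio = [s for s in info.get("streams", []) if s.get("codec_type") == "audio"]
    if len(audio) != 1:
        return False
    stream = audio[0]
    # ffprobe reports the demuxer name in format_name; for Ogg files it is "ogg".
    return (
        info.get("format", {}).get("format_name") == "ogg"
        and stream.get("codec_name") == "opus"
        and stream.get("channels") == 1
    )
```

Usage would be `looks_like_voice_note(probe("output.ogg"))`, run once on output from each TTS provider.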

Affected Code

tools/tts_tool.py:712 — _convert_to_opus() (used by Edge TTS, xAI TTS, and other providers)
tools/tts_tool.py:1201 — _generate_gemini_tts() inline conversion

Changed files

tools/tts_tool.py (modified, +2/-2)


Original issue report

Bug Description

When using Gemini TTS (or Edge TTS) with Telegram, the resulting audio files have noticeably lower quality compared to xAI TTS. Additionally, some outputs are delivered as regular audio files instead of native voice messages (with waveform).


Steps to Reproduce

Use Gemini TTS provider (or Edge TTS)
Generate audio and request .ogg output (for Telegram)
Send the file to Telegram
Observe quality drop compared to xAI TTS and/or file appearing as regular audio instead of voice message

Expected Behavior

High-quality Opus encoding optimized for speech
Files should consistently appear as native Telegram voice messages (waveform + voice bubble)

Actual Behavior

Noticeable quality degradation after conversion
Inconsistent behavior between providers (xAI works better)

Affected Component

Tools (TTS)

Messaging Platform

This is specific to Telegram voice message requirements (Opus in OGG container with specific encoding flags).

Note: A local fix has been tested and significantly improves both quality and voice message display.

extent analysis

TL;DR

Modify the ffmpeg command in tools/tts_tool.py to optimize Opus encoding for speech and ensure compatibility with Telegram's voice message requirements.

Guidance

Update the _generate_gemini_tts() and _convert_to_opus() functions in tools/tts_tool.py to use the corrected ffmpeg command with -application voip, -compression_level 10, and remove -vbr off to allow for variable bitrate (VBR) encoding.
Verify the fix by generating audio files using the updated code and checking their quality and compatibility with Telegram's voice messages.
Test the updated code with different TTS providers (e.g., Gemini, Edge) to ensure consistent behavior and quality.
Consider adding error handling and logging to monitor the encoding process and detect potential issues.
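The guidance above, including the suggested error handling and logging, could look roughly like the following. This is a minimal sketch, not the hermes-agent implementation: `convert_to_opus_voice` is a hypothetical name, the flags follow the PR #18861 "After" values, and the 60-second timeout is an arbitrary assumption.

```python
import logging
import shutil
import subprocess
from pathlib import Path

logger = logging.getLogger(__name__)


def convert_to_opus_voice(src: str, dst: str, timeout: float = 60.0) -> Path:
    """Encode src to a mono Opus/OGG voice file at dst, logging any failure."""
    src_path = Path(src)
    if not src_path.is_file():
        raise FileNotFoundError(f"TTS input not found: {src_path}")
    ffmpeg = shutil.which("ffmpeg")
    if ffmpeg is None:
        raise RuntimeError("ffmpeg is not installed or not on PATH")

    cmd = [
        ffmpeg, "-y", "-i", str(src_path),
        "-acodec", "libopus", "-ac", "1", "-b:a", "48k", "-vbr", "on",
        "-application", "voip", "-compression_level", "10",
        dst,
    ]
    logger.debug("Running: %s", " ".join(cmd))
    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True, timeout=timeout)
    except subprocess.CalledProcessError as exc:
        # ffmpeg writes its diagnostics to stderr; log only the tail.
        logger.error("Opus encoding failed (exit %s): %s",
                     exc.returncode, (exc.stderr or "")[-2000:])
        raise
    except subprocess.TimeoutExpired:
        logger.error("Opus encoding timed out after %.0f s for %s", timeout, src_path)
        raise

    dst_path = Path(dst)
    if not dst_path.is_file() or dst_path.stat().st_size == 0:
        raise RuntimeError(f"ffmpeg produced no output at {dst_path}")
    return dst_path
```

The input is checked before ffmpeg is located, so a missing TTS file fails fast with a clear error instead of an opaque ffmpeg exit code.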

Example

ffmpeg -i input -acodec libopus -ac 1 -b:a 64k -application voip -compression_level 10 output.ogg -y

Notes

The provided fix is specific to Telegram's voice message requirements and may not be applicable to other messaging platforms. The updated ffmpeg command should improve the quality and compatibility of the generated audio files, but further testing and verification are necessary to ensure the fix works as expected.

Recommendation

Apply the workaround by updating the ffmpeg command in tools/tts_tool.py so that Opus encoding is tuned for speech and meets Telegram's voice message requirements. The issue author reports that a local test of this change significantly improved both audio quality and voice message display.
