hermes - ✅(Solved) Fix TTS: Suboptimal Opus encoding for Gemini/Edge TTS causes quality loss in Telegram voice messages [1 pull request, 1 participant]

GitHub stats
NousResearch/hermes-agent#18818 · Fetched 2026-05-03 04:54:06
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×5, cross-referenced ×1

Root Cause

The issue is in tools/tts_tool.py:

  • _generate_gemini_tts() (around line 1199)
  • _convert_to_opus() (around line 712)

Both use this ffmpeg command:

ffmpeg -i input -acodec libopus -ac 1 -b:a 64k -vbr off output.ogg -y

Problems:

  • -vbr off forces Constant Bitrate (CBR)
  • Missing -application voip (important for speech optimization)
  • Missing -compression_level 10 for better encoding quality
  • This is especially noticeable on Telegram, which expects well-encoded Opus for voice bubbles

Fix Action

Fixed

PR fix notes

PR #18861: fix(tts): improve Opus encoding quality for Telegram voice messages

Description (problem / solution / changelog)

Summary

Fixes #18818 — suboptimal Opus encoding for Gemini/Edge TTS causing quality loss in Telegram voice messages.

Problem

Both _convert_to_opus() and _generate_gemini_tts() use ffmpeg with suboptimal Opus encoding settings:

ffmpeg -i input -acodec libopus -ac 1 -b:a 64k -vbr off output.ogg -y

Issues:

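  • -vbr off forces constant bitrate (CBR): it spends bits on silence and starves complex segments
  • Missing -application voip: ffmpeg's libopus encoder defaults to the audio application, which is not tuned for speech
  • Missing -compression_level 10: encoder complexity is not pinned (ffmpeg's libopus documentation lists 10 as the default, so this flag mainly makes the setting explicit)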

This causes:

  1. Noticeably lower speech quality compared to properly-encoded Opus
  2. Inconsistent Telegram voice bubble rendering (some files appear as regular audio instead of voice messages)

Fix

Changed ffmpeg encoding parameters at both call sites:

Parameter           Before      After     Why
-b:a                64k         48k       48k VBR is standard for VoIP; VBR uses bits more efficiently
-vbr                off (CBR)   on (VBR)  VBR allocates bits where needed for speech
-application        (missing)   voip      Optimizes Opus for voice intelligibility
-compression_level  (missing)   10        Maximum encoding quality
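
As a rough illustration of the table above, the updated parameters could be assembled like this. The helper name build_opus_command and the file names are hypothetical and are not taken from tools/tts_tool.py; only the ffmpeg flags come from the PR description.

```python
from typing import List


def build_opus_command(input_path: str, output_path: str, bitrate: str = "48k") -> List[str]:
    """Return an ffmpeg argument list using the settings from the table.

    Hypothetical helper for illustration; the real call sites in
    tools/tts_tool.py may be structured differently.
    """
    return [
        "ffmpeg",
        "-y",                        # overwrite the output file if it already exists
        "-i", input_path,
        "-acodec", "libopus",
        "-ac", "1",                  # mono, unchanged from the original command
        "-b:a", bitrate,             # target bitrate (48k after the fix, 64k before)
        "-vbr", "on",                # variable bitrate instead of CBR
        "-application", "voip",      # speech-oriented encoder tuning
        "-compression_level", "10",  # highest encoder complexity
        output_path,
    ]


cmd = build_opus_command("speech.wav", "speech.ogg")
print(" ".join(cmd))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when file paths contain spaces.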

Testing

  • -application voip is a standard ffmpeg libopus option. It maps to OPUS_APPLICATION_VOIP (value 2048), which biases the encoder toward its speech-oriented SILK and hybrid modes rather than forcing a single mode
  • 48k VBR is a commonly used bitrate for high-quality speech with Opus
  • No functional changes to the conversion pipeline; only encoding parameters changed

Affected Code

  • tools/tts_tool.py:712, _convert_to_opus() (used by Edge TTS, xAI TTS, and other providers)
  • tools/tts_tool.py:1201, _generate_gemini_tts() inline conversion

Changed files

  • tools/tts_tool.py (modified, +2/-2)


Bug Description

When using Gemini TTS (or Edge TTS) with Telegram, the resulting audio files have noticeably lower quality compared to xAI TTS. Additionally, some outputs are delivered as regular audio files instead of native voice messages (with waveform).

Steps to Reproduce

  1. Use Gemini TTS provider (or Edge TTS)
  2. Generate audio and request .ogg output (for Telegram)
  3. Send the file to Telegram
  4. Observe quality drop compared to xAI TTS and/or file appearing as regular audio instead of voice message

Expected Behavior

  • High-quality Opus encoding optimized for speech
  • Files should consistently appear as native Telegram voice messages (waveform + voice bubble)

Actual Behavior

  • Noticeable quality degradation after conversion
  • Inconsistent behavior between providers (xAI works better)

Affected Component

  • Tools (TTS)

Messaging Platform

  • Telegram

This is specific to Telegram voice message requirements (Opus in OGG container with specific encoding flags).
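
One way to sanity-check a converted file is to inspect it with ffprobe (for example, ffprobe -v error -show_format -show_streams -of json speech.ogg) and confirm it is a single Opus stream in an OGG container. The sketch below only parses that JSON; the function name looks_like_voice_note is hypothetical, and the mono check mirrors the -ac 1 flag used in this issue rather than a documented Telegram rule.

```python
import json


def looks_like_voice_note(probe_json: str) -> bool:
    """Check ffprobe JSON for a single mono Opus stream in an OGG container.

    Assumption: the JSON was produced with -show_format -show_streams -of json.
    """
    info = json.loads(probe_json)
    audio = [s for s in info.get("streams", []) if s.get("codec_type") == "audio"]
    if len(audio) != 1:
        return False
    stream = audio[0]
    container = info.get("format", {}).get("format_name", "")
    return (
        stream.get("codec_name") == "opus"
        and stream.get("channels") == 1
        and "ogg" in container.split(",")
    )


# Hand-written sample in the shape ffprobe emits, so the sketch runs without ffprobe.
sample = json.dumps({
    "streams": [{"codec_type": "audio", "codec_name": "opus", "channels": 1}],
    "format": {"format_name": "ogg"},
})
print(looks_like_voice_note(sample))
```

This confirms only the container and codec; it says nothing about perceived quality, which still has to be judged by listening.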


Note: A local fix has been tested and significantly improves both quality and voice message display.

extent analysis

TL;DR

Modify the ffmpeg command in tools/tts_tool.py to optimize Opus encoding for speech and ensure compatibility with Telegram's voice message requirements.

Guidance

  • Update the ffmpeg command in both _generate_gemini_tts() and _convert_to_opus() in tools/tts_tool.py: add -application voip and -compression_level 10, and replace -vbr off with VBR encoding (the merged PR uses -vbr on and lowers -b:a from 64k to 48k).
  • Verify the fix by generating audio files using the updated code and checking their quality and compatibility with Telegram's voice messages.
  • Test the updated code with different TTS providers (e.g., Gemini, Edge) to ensure consistent behavior and quality.
  • Consider adding error handling and logging to monitor the encoding process and detect potential issues.
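
The error-handling suggestion in the last bullet could look roughly like the following. This is a minimal sketch, not the project's code: run_encoder, the logger name, and the injectable runner parameter are all hypothetical, and the fake runner exists only so the example executes without ffmpeg installed.

```python
import logging
import subprocess
from typing import Callable, Sequence

logger = logging.getLogger("tts.opus")  # hypothetical logger name


def run_encoder(
    cmd: Sequence[str],
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
    timeout: float = 30.0,
) -> bool:
    """Run an encoder command and log failures instead of raising."""
    try:
        result = runner(list(cmd), capture_output=True, text=True, timeout=timeout)
    except FileNotFoundError:
        logger.error("encoder executable not found: %s", cmd[0])
        return False
    except subprocess.TimeoutExpired:
        logger.error("encoder timed out after %.0f s", timeout)
        return False
    if result.returncode != 0:
        # Keep only the tail of stderr; ffmpeg output can be long.
        logger.warning("encoder exited with %d: %s",
                       result.returncode, (result.stderr or "")[-500:])
        return False
    return True


def fake_runner(cmd, **kwargs):
    """Stand-in for subprocess.run that always reports a failed encode."""
    return subprocess.CompletedProcess(cmd, 1, stdout="", stderr="simulated failure")


print(run_encoder(["ffmpeg", "-version"], runner=fake_runner))
```

Returning a boolean lets the caller decide whether to fall back to sending the unconverted audio; whether that fallback is desirable depends on the provider.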

Example

ffmpeg -i input -acodec libopus -ac 1 -b:a 48k -vbr on -application voip -compression_level 10 output.ogg -y

Notes

The provided fix is specific to Telegram's voice message requirements and may not be applicable to other messaging platforms. The updated ffmpeg command should improve the quality and compatibility of the generated audio files, but further testing and verification are necessary to ensure the fix works as expected.

Recommendation

Apply the fix from PR #18861 by updating the ffmpeg command at both call sites in tools/tts_tool.py. The issue author reports that a local version of this change significantly improved both speech quality and voice message display in Telegram.
