hermes - 💡(How to fix) Fix feat(tts): serial queue to prevent overlapping audio playback [1 comments, 2 participants]

GitHub stats

NousResearch/hermes-agent#23065 · Fetched 2026-05-11 03:31:28
Comments: 1 · Participants: 2 · Timeline events: 8 · Reactions: 0
Timeline (top): labeled ×5, commented ×1, mentioned ×1, subscribed ×1

Root Cause

Overlapping audio occurs because the current TTS pipeline has no notion of a playback queue: each text_to_speech call produces a new file independently, with no awareness of whether the client is still playing a previous clip.

Fix Action

Workaround

Currently, users can only mitigate the overlap by using slower TTS providers or keeping agent responses shorter. No configuration option exists to prevent it.

RAW_BUFFER

Problem

When voice.auto_tts is enabled (CLI or gateway), TTS audio is generated for each agent response. If the agent generates a new response before the previous TTS audio has finished playing, two audio clips overlap — effectively two voices talking at once.

This happens because the current TTS pipeline has no notion of a playback queue: each text_to_speech call produces a new file independently, with no awareness of whether a previous clip is still being consumed by the client.

Expected behavior

TTS output should be serialized — a new clip should not start playing (or being sent to the client) until the previous one has completed. In practice this means:

  1. The gateway/CLI tracks TTS playback state per session/chat
  2. If a new agent response arrives while audio is still playing, queue it
  3. Play the queued clip after the current one finishes
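
The three steps above can be sketched as a small per-chat queue. This is only an illustration under stated assumptions, not code from hermes-agent: the class name SerialTTSQueue, the play callback, and the chat_id key are all hypothetical, and the sketch assumes the callback returns only once the clip has finished playing or sending.

```python
import asyncio
import logging
from typing import Awaitable, Callable, Dict, List

# Hypothetical callback: plays or sends one clip and returns when it is done.
PlayFn = Callable[[str, str], Awaitable[None]]


class SerialTTSQueue:
    """Plays clips one at a time per chat; separate chats do not block each other."""

    def __init__(self, play: PlayFn) -> None:
        self._play = play
        self._queues: Dict[str, asyncio.Queue] = {}
        self._workers: Dict[str, asyncio.Task] = {}

    def enqueue(self, chat_id: str, clip_path: str) -> None:
        """Queue a clip instead of playing it immediately (call inside a running loop)."""
        if chat_id not in self._queues:
            self._queues[chat_id] = asyncio.Queue()
            self._workers[chat_id] = asyncio.create_task(self._drain(chat_id))
        self._queues[chat_id].put_nowait(clip_path)

    async def _drain(self, chat_id: str) -> None:
        """Worker: start the next clip only after the current one has finished."""
        pending = self._queues[chat_id]
        while True:
            clip = await pending.get()
            try:
                await self._play(chat_id, clip)
            except Exception:
                # A failed clip should not block the clips queued behind it.
                logging.exception("TTS playback failed for %s", clip)
            finally:
                pending.task_done()

    async def join(self) -> None:
        """Wait until every queued clip has been played."""
        for pending in list(self._queues.values()):
            await pending.join()

    async def close(self) -> None:
        """Cancel the per-chat workers."""
        workers = list(self._workers.values())
        for task in workers:
            task.cancel()
        await asyncio.gather(*workers, return_exceptions=True)


async def demo() -> List[str]:
    events: List[str] = []

    async def fake_play(chat_id: str, clip: str) -> None:
        events.append(f"start {clip}")
        await asyncio.sleep(0.01)  # stands in for the clip duration
        events.append(f"end {clip}")

    tts = SerialTTSQueue(fake_play)
    tts.enqueue("chat-1", "a.ogg")
    tts.enqueue("chat-1", "b.ogg")  # arrives while a.ogg is still "playing"
    await tts.join()
    await tts.close()
    return events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping one queue and one worker per chat means a long clip in one conversation never delays audio in another, which matches the per session/chat tracking described in step 1.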

Affected platforms

  • CLI (local audio playback)
  • Gateway platforms that deliver voice bubbles (Telegram, Discord, Matrix, etc.)

Proposed approaches

  • Gateway: Add a per-chat TTS queue in gateway/run.py / base.py — buffer audio files and send them sequentially with a small gap.
  • CLI: Add a playback tracker in the CLI audio handler that defers sending new audio until the current one finishes.
  • Config option: voice.tts_queue: true (default on) to enable/disable.
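
As a rough sketch of how the CLI approach and the config option could fit together, the example below defers each new clip until the current one finishes and falls back to immediate playback when the queue is disabled. Everything here is an assumption for illustration: CliPlaybackTracker, queue_enabled, the gap_seconds parameter, and the dict-shaped config are hypothetical and do not come from the hermes-agent codebase. Only the voice.tts_queue key and its default of on are taken from the proposal.

```python
import logging
import queue
import threading
import time
from typing import Callable, List, Optional, Tuple


def queue_enabled(config: dict) -> bool:
    """Read voice.tts_queue from a dict-shaped config (assumed layout); default on."""
    return bool(config.get("voice", {}).get("tts_queue", True))


class CliPlaybackTracker:
    """Runs a blocking play function for one clip at a time on a worker thread."""

    def __init__(self, play_blocking: Callable[[str], None],
                 enabled: bool = True, gap_seconds: float = 0.0) -> None:
        self._play = play_blocking
        self._enabled = enabled
        self._gap = gap_seconds
        self._pending: "queue.Queue[Optional[str]]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, clip_path: str) -> None:
        if self._enabled:
            self._pending.put(clip_path)
        else:
            # Queue disabled: keep the old behavior and play right away.
            threading.Thread(target=self._play, args=(clip_path,), daemon=True).start()

    def _run(self) -> None:
        while True:
            clip = self._pending.get()
            try:
                if clip is None:  # sentinel pushed by close()
                    return
                self._play(clip)
                if self._gap:
                    time.sleep(self._gap)  # small gap between clips
            except Exception:
                logging.exception("Playback failed for %s", clip)
            finally:
                self._pending.task_done()

    def wait_idle(self) -> None:
        """Block until all submitted clips have finished."""
        self._pending.join()

    def close(self) -> None:
        self._pending.put(None)
        self._worker.join()


def demo_cli() -> Tuple[List[str], int]:
    order: List[str] = []
    active = 0
    max_active = 0
    lock = threading.Lock()

    def fake_play(clip: str) -> None:
        nonlocal active, max_active
        with lock:
            active += 1
            max_active = max(max_active, active)
            order.append(clip)
        time.sleep(0.01)  # stands in for the clip duration
        with lock:
            active -= 1

    tracker = CliPlaybackTracker(fake_play, enabled=queue_enabled({}))
    tracker.submit("a.wav")
    tracker.submit("b.wav")
    tracker.wait_idle()
    tracker.close()
    return order, max_active
```

A single worker thread is what enforces the ordering here: max_active never exceeds one because only that thread calls the play function while the queue is enabled. A gateway variant would replace the blocking play call with a send to the platform and would still need some estimate of when the client has finished the clip.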
