hermes - 💡(How to fix) Fix Auto voice reply silently dropped on long-running voice-in turns (streaming + tool calls)

Error Message

_send_voice_reply either was not invoked, or was invoked and exited silently. The except Exception wrapper in _send_voice_reply would have logged "Auto voice reply failed" at WARNING level, but no such line appears in gateway.log, errors.log, or mcp-stderr.log for that window. Observability is limited because the function has more than one silent exit path: the if not tts_text: return early exit, and the fact that except Exception does not catch asyncio.CancelledError in Python 3.8+ (it inherits from BaseException, not Exception). If the parent task is cancelled mid-await on asyncio.to_thread(text_to_speech_tool, ...), which can take 60–90s for a 1000+-char ElevenLabs Multilingual v2 generation, the cancellation propagates silently: the finally cleanup runs but no log is emitted. The observed symptom is that text arrives, audio does not, and no error is logged.

Fix Action

In _send_voice_reply, change except Exception to except BaseException, or add an explicit except asyncio.CancelledError branch that logs the cancellation and then re-raises it.

Workaround

A possible workaround is to have the agent explicitly call text_to_speech when the v0.13 voice-transcript wrapper format ([The user sent a voice message~ Here's what they said: "..."]) is detected in the user message. This routes the audio reply through the has_agent_tts dedup path that the runner respects, sidestepping _send_voice_reply entirely.
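
The detection step of this workaround could look roughly like the sketch below. It assumes the wrapper always begins with the literal prefix quoted above; is_voice_transcript and VOICE_WRAPPER_PREFIX are hypothetical names for illustration, not part of Hermes, and the real agent-side hook may look different.

```python
# Hypothetical helper: detect the v0.13 voice-transcript wrapper in a user message.
# The prefix is copied from the wrapper format quoted in this report (an assumption
# that the gateway emits it byte-for-byte in this form).
VOICE_WRAPPER_PREFIX = "[The user sent a voice message~ Here's what they said:"


def is_voice_transcript(message: str) -> bool:
    """Return True when the message starts with the voice-transcript wrapper."""
    return message.lstrip().startswith(VOICE_WRAPPER_PREFIX)
```

When this returns True, the agent would call text_to_speech itself instead of relying on the runner's automatic voice reply.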

Environment

Hermes Agent v0.13.0 (2026.5.7) (commit 498bfc7)
Python 3.11
Platform: Telegram, DM chat
Voice mode: voice_only (set via /voice on), persisted in gateway_voice_mode.json
TTS provider: ElevenLabs (reproduced on both eleven_multilingual_v2 and eleven_v3)
Streaming enabled (default), reasoning lane = deep_reasoning (L5)

What happened

User sent a voice message in a non-English language (~149-char STT result). The agent ran for ~142s, made 5 OpenRouter API calls and several MCP tool calls, including ~10 failed tool calls returning HTTP 400 (separate, unrelated tool bug). The agent eventually produced a ~1000-char text response, which streamed to Telegram successfully.

No auto voice reply (auto-TTS) was sent. The user had to send a follow-up message asking for audio to get the audio version, which then succeeded via an explicit text_to_speech tool call from the agent.

For shorter voice-in turns on the same chat earlier in the same session (under 200 chars, single API call, no tool failures), auto voice reply fired correctly. The difference correlates with response length, total turn duration, and the presence of failed tool calls, but I have not isolated which of these is the trigger.

Expected

For chats in voice_mode=voice_only, voice input should always trigger an auto-TTS audio reply alongside the text response. That is the purpose of the runner's _send_voice_reply fallback: it covers the case where streaming has consumed the response and the base adapter's _process_message auto-TTS path can't fire.

Actual

Relevant gateway log excerpt (sanitized)

[T+0s]    INFO gateway.platforms.telegram: [Telegram] Cached user voice at .../audio_<hash>.ogg
[T+0s]    INFO gateway.run: inbound message: platform=telegram chat=<id> msg=''
[T+4s]    INFO tools.transcription_tools: Transcribed audio_<hash>.ogg via OpenAI API (149 chars)
[T+4s]    INFO gateway.run: lane_router ... input_kind=voice-transcript text_len=149 lane=L5 ...
   [10 failed tool calls between T+115s and T+128s — separate, unrelated issue]
[T+143s]  INFO gateway.run: Suppressing normal final send for session ...: final delivery already confirmed (streamed=True previewed=False).
[T+143s]  INFO gateway.run: response ready: platform=telegram time=142.6s api_calls=5 response=1088 chars
   [no "Auto voice reply" log line of any kind — neither success nor warning]
[T+207s]  INFO gateway.run: inbound message: ... msg='Audio?'   # user follow-up
[T+243s]  INFO gateway.run: response ready: ... time=36.4s api_calls=2 response=93 chars
   # tts_<timestamp>.ogg appeared in the audio cache — audio delivered for this follow-up turn only

voice_mode state at the time:

{"telegram:<chat_id>": "voice_only"}

Diagnostic notes

_should_send_voice_reply (gateway/run.py:8983) has a dedup branch:

# Dedup: base adapter auto-TTS already handles voice input
# (play_tts plays in VC when connected, so runner can skip).
# When streaming already delivered the text (already_sent=True),
# the base adapter will receive None and can't run auto-TTS,
# so the runner must take over.
if is_voice_input and not already_sent:
    return False

For the failing turn, already_sent=True is confirmed set by the "Suppressing normal final send" log line (emitted at gateway/run.py:14981 inside _run_agent, before that function returns to _handle_message_with_agent). So this branch should not fire and the runner should proceed to _send_voice_reply.
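
To make the reasoning above concrete, here is a minimal sketch of the quoted dedup branch in isolation. dedup_branch_skips_runner is a hypothetical name; the real _should_send_voice_reply contains other checks that are not shown in this report, so this models only the one condition.

```python
def dedup_branch_skips_runner(is_voice_input: bool, already_sent: bool) -> bool:
    """Return True when the quoted dedup branch would make the runner skip.

    Models only the `if is_voice_input and not already_sent: return False`
    condition; any other checks in the real function are out of scope here.
    """
    return is_voice_input and not already_sent


# Enumerate the four input combinations for the branch.
for is_voice_input in (True, False):
    for already_sent in (True, False):
        skipped = dedup_branch_skips_runner(is_voice_input, already_sent)
        print(f"voice={is_voice_input} already_sent={already_sent} -> skip={skipped}")
```

For the failing turn (voice input, already_sent=True) the branch does not skip, which is why the investigation moves on to _send_voice_reply itself.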

If _send_voice_reply did run, the silent failure is the issue. Two candidate exits in that function:

1. if not tts_text: return after _strip_markdown_for_tts(text[:4000]): a silent return with no log if the markdown stripper returns empty for the input.
2. except Exception does not catch asyncio.CancelledError in Python 3.8+ (it inherits from BaseException, not Exception). If the parent task is cancelled mid-await on asyncio.to_thread(text_to_speech_tool, ...), which can take 60–90s for a 1000+-char ElevenLabs Multilingual v2 generation, the cancellation propagates silently. The finally cleanup runs but no log is emitted.
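
The second exit path can be demonstrated without Hermes at all. The sketch below is a standalone illustration, not the gateway code: slow_tts is a hypothetical stand-in for the blocking TTS call, and the events list stands in for log lines. It requires Python 3.9+ for asyncio.to_thread.

```python
import asyncio
import time

events: list[str] = []  # stands in for log output


def slow_tts() -> str:
    """Hypothetical stand-in for a long blocking TTS call."""
    time.sleep(0.2)
    return "audio.ogg"


async def send_voice_reply() -> None:
    try:
        await asyncio.to_thread(slow_tts)
        events.append("sent")
    except Exception:
        # Not reached on cancellation: CancelledError derives from BaseException.
        events.append("warning logged")
    finally:
        events.append("cleanup")


async def main() -> None:
    task = asyncio.create_task(send_voice_reply())
    await asyncio.sleep(0.05)  # let the task reach the to_thread await
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")


asyncio.run(main())
print(events)  # ['cleanup', 'cancelled']
```

The "warning logged" entry never appears, which matches the symptom in the gateway log: cleanup happens, but nothing is written at WARNING level.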

Repro hypothesis (untested)

1. Set voice_mode=voice_only for a Telegram chat
2. Send a voice message that triggers a long agent run (>60s) with multiple tool calls
3. Expect auto-TTS audio reply alongside the streamed text response
4. Observe: text arrives, audio does not, no error logged

Suggested investigation

1. Log at INFO level when _send_voice_reply is entered, and at INFO/DEBUG when each exit path is taken (including the if not tts_text early return)
2. Change except Exception to except BaseException in _send_voice_reply, or add an explicit except asyncio.CancelledError branch that logs and then re-raises
3. Check whether the parent message-handler task can complete (return) while _send_voice_reply is still awaiting asyncio.to_thread(text_to_speech_tool, ...); if so, the long TTS generation may be at risk of cancellation when the next inbound message arrives
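
The logging and cancellation-handling suggestions could be combined along the lines of the sketch below. This is an illustration under stated assumptions, not the actual Hermes implementation: the real signature of _send_voice_reply is not shown in this report, the two helper functions are placeholders for the real ones, the log messages are invented, and the step that delivers the audio to the chat is omitted.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger("gateway.run")


def _strip_markdown_for_tts(text: str) -> str:
    """Placeholder for the real stripper; here it only drops '*' and '#'."""
    return text.replace("*", "").replace("#", "").strip()


def text_to_speech_tool(text: str) -> str:
    """Placeholder for the real blocking TTS call; returns a fake path."""
    return "tts_placeholder.ogg"


async def send_voice_reply(text: str) -> Optional[str]:
    """Hypothetical instrumented variant: every exit path emits a log line."""
    logger.info("Auto voice reply: entered (%d chars)", len(text))
    tts_text = _strip_markdown_for_tts(text[:4000])
    if not tts_text:
        logger.info("Auto voice reply: skipped, text empty after markdown strip")
        return None
    try:
        path = await asyncio.to_thread(text_to_speech_tool, tts_text)
        logger.info("Auto voice reply: generated %s", path)
        return path
    except asyncio.CancelledError:
        logger.warning("Auto voice reply: cancelled during TTS generation")
        raise  # re-raise so the caller still sees the cancellation
    except Exception as exc:
        logger.warning("Auto voice reply failed: %s", exc)
        return None
```

Catching asyncio.CancelledError explicitly and re-raising it keeps normal task cancellation semantics intact, whereas a bare except BaseException would also intercept KeyboardInterrupt and SystemExit unless it re-raises them too.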
