hermes: Telegram voice batching can drop the in-flight response when subsequent voices arrive mid-processing [3 pull requests]

In a Telegram group chat with multiple rapid-fire voice messages, the gateway's batching pipeline can silently drop the in-flight assistant response for a middle message when newer voices arrive before that response is sent.

Fix Action

Fixed

Summary

In a Telegram group chat with multiple rapid-fire voice messages, the gateway's batching pipeline can silently drop the in-flight assistant response for a middle message when newer voices arrive before that response is sent.

Environment

  • Hermes Agent 0.14.0
  • Provider: anthropic, model claude-haiku-4-5-20251001
  • Telegram via python-telegram-bot polling mode
  • STT: local faster-whisper, model=base/small
  • Voice messages, ~2s each, sent ~3 seconds apart

Repro

  1. In a group chat where messages reach the agent (we used free_response_chats to bypass the mention filter for the test), send voice 1 and wait ~60 seconds. → response arrives normally.
  2. Send voice 2, then voice 3 ~3 seconds later, then voice 4 ~3 seconds after that (rapid-fire).
  3. Voices 3 + 4 are batched into one agent turn (expected, documented).
  4. Voice 2 is transcribed and a conversation turn at history=N is logged for it, but no "response ready" or "Sending response" line follows. The session jsonl has no record of voice 2 or of any assistant reply to it.
  5. The combined response that gets sent acknowledges voices 3 + 4 but not voice 2.

Observed (anonymized log excerpt)

HH:MM:00  [Telegram] Cached user voice at audio_AAA.ogg
HH:MM:00  inbound message: msg=''
HH:MM:12  response ready: time=11.6s response=78 chars       # voice 1 — fine

HH:MM:72  [Telegram] Cached user voice at audio_BBB.ogg      # voice 2
HH:MM:72  inbound message: msg=''
HH:MM:75  [Telegram] Cached user voice at audio_CCC.ogg      # voice 3
HH:MM:77  tools.transcription_tools: Transcribed audio_BBB ('hermes test 2')
HH:MM:78  [Telegram] Cached user voice at audio_DDD.ogg      # voice 4
HH:MM:79  run_agent: conversation turn: history=6 msg='[The user sent a voice message~ Here\'s what they said: "hermes test 2"]'
HH:MM:86  tools.transcription_tools: Transcribed audio_CCC ('test 3')
HH:MM:90  tools.transcription_tools: Transcribed audio_DDD ('test 4')
HH:MM:91  run_agent: conversation turn: history=8 msg='[The user sent a voice message~ Here\'s what they said: "test 3"]  [The user sent a voice message~ Here\'s what they said: "test 4"]'
HH:MM:93  response ready: time=21.0s response=71 chars       # voices 3+4 — fine

No "response ready" line appears between the two "conversation turn" entries. Voice 2's history=6 turn appears to start, but neither its response nor the user message itself reaches the session jsonl. The final session has entries for voice 1, the batched voices 3+4, and their two responses, but nothing for voice 2.
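A rough way to spot this pattern in a log, assuming the line formats shown in the excerpt above. The function name find_orphaned_turns is made up for illustration and is not part of Hermes; the sample lines are abbreviated copies of the excerpt.

```python
import re

# Patterns assume the log line formats from the excerpt in this report.
TURN = re.compile(r"run_agent: conversation turn: history=(\d+)")
READY = re.compile(r"response ready:")

def find_orphaned_turns(lines):
    """Return history values of turns that were followed by another
    'conversation turn' line before any 'response ready' line appeared.
    A turn still pending at the end of the log is not flagged, since it
    may simply be running."""
    orphaned = []
    pending = None  # history value of the turn still waiting for a response
    for line in lines:
        match = TURN.search(line)
        if match:
            if pending is not None:
                orphaned.append(pending)
            pending = int(match.group(1))
        elif READY.search(line):
            pending = None
    return orphaned

# Abbreviated lines from the excerpt (msg contents shortened to '...').
SAMPLE = [
    "HH:MM:12  response ready: time=11.6s response=78 chars",
    "HH:MM:79  run_agent: conversation turn: history=6 msg='...'",
    "HH:MM:91  run_agent: conversation turn: history=8 msg='...'",
    "HH:MM:93  response ready: time=21.0s response=71 chars",
]

print(find_orphaned_turns(SAMPLE))  # [6]
```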

Expected

Voice 2 should either:

  1. Get its own response, or
  2. Be batched into the voices 3+4 turn (so the combined message acknowledges all three voice notes — like other media batching paths)

Either is fine; silent loss isn't.
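Option 2 could look roughly like the sketch below: when a newer voice interrupts a turn, the interrupted turn's inputs are carried into the next batch instead of being discarded. MergingBatcher, fake_agent, and fake_send are hypothetical stand-ins, not Hermes APIs, and this has not been tested against the real gateway.

```python
import asyncio

class MergingBatcher:
    """Hypothetical sketch: a cancelled turn's inputs stay in the buffer,
    so the next batch still contains them."""

    def __init__(self, agent, send):
        self._agent = agent
        self._send = send
        self._unanswered = []  # transcripts with no reply sent yet
        self._task = None

    def add(self, text):
        self._unanswered.append(text)
        # Interrupt the in-flight turn, then restart with everything unanswered.
        if self._task is not None and not self._task.done():
            self._task.cancel()
        self._task = asyncio.create_task(self._run(list(self._unanswered)))

    async def _run(self, batch):
        reply = await self._agent(batch)
        await self._send(reply)
        # Forget inputs only after the reply has actually been sent.
        del self._unanswered[:len(batch)]

    async def wait_idle(self):
        if self._task is not None:
            await self._task


async def demo():
    sent = []

    async def fake_agent(batch):
        await asyncio.sleep(0.05)  # stand-in for model latency
        return "ack: " + " | ".join(batch)

    async def fake_send(reply):
        sent.append(reply)

    batcher = MergingBatcher(fake_agent, fake_send)
    for text in ["hermes test 2", "test 3", "test 4"]:
        batcher.add(text)
        await asyncio.sleep(0.01)  # next voice arrives mid-turn
    await batcher.wait_idle()
    return sent

print(asyncio.run(demo()))
```

The trade-off in this sketch is that an interrupted turn's partial work is thrown away and redone, and a reply could in principle be sent twice if cancellation lands right after the send; it favors duplication over silent loss.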

Source pointer

In gateway/platforms/telegram.py, _handle_media_message enqueues per voice; the batch flush path appears to cancel the prior pending media turn when a new voice arrives within the batching window, but only if the previous turn was actually pending in the queue. If the prior turn has already started but not finished, its response gets orphaned.

I haven't tested the fix, but I think the resolution is to either (a) wait for in-flight turns to finish before starting a new batch, or (b) merge late-arriving voices into the in-flight turn instead of starting a new one.
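For approach (a), a minimal sketch might serialize turns per chat with an asyncio.Lock, so a new batch waits for the in-flight turn rather than displacing it. ChatTurnSerializer, fake_agent, and fake_send are invented for illustration; the actual gateway code may be structured quite differently.

```python
import asyncio
from collections import defaultdict

class ChatTurnSerializer:
    """Hypothetical helper: one asyncio.Lock per chat, so each turn's
    reply is sent before the next turn for that chat starts."""

    def __init__(self):
        self._locks = defaultdict(asyncio.Lock)

    async def run_turn(self, chat_id, texts, agent, send):
        async with self._locks[chat_id]:
            reply = await agent(texts)
            await send(chat_id, reply)


async def demo():
    sent = []

    async def fake_agent(texts):
        await asyncio.sleep(0.01)  # stand-in for model latency
        return "ack: " + " | ".join(texts)

    async def fake_send(chat_id, reply):
        sent.append(reply)

    serializer = ChatTurnSerializer()
    # Voice 2 starts its turn; the voices 3+4 batch arrives while it runs.
    first = asyncio.create_task(
        serializer.run_turn(1, ["hermes test 2"], fake_agent, fake_send))
    await asyncio.sleep(0)  # let the first turn acquire the lock
    second = asyncio.create_task(
        serializer.run_turn(1, ["test 3", "test 4"], fake_agent, fake_send))
    await asyncio.gather(first, second)
    return sent

print(asyncio.run(demo()))
```

This keeps every reply at the cost of latency: the later batch cannot start until the earlier turn has been answered.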

Related

While investigating, also noticed: _should_process_message filters voice messages against mention_patterns before transcription. Since voice has no text/caption at that point, voice messages never match a regex like "hermes". This forces users to either set free_response_chats (which lets everything through) or only accept voice via reply-to-bot. Happy to file that as a separate issue if useful.
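To illustrate the ordering problem, here is a small sketch in which the same regex check is applied before and after transcription. The functions should_process and should_process_voice are simplified stand-ins written for this example, not the real _should_process_message, and the transcribe callable is an assumption.

```python
import re

MENTION = re.compile(r"hermes", re.IGNORECASE)  # example pattern from this report

def should_process(text):
    """Text-only mention check, standing in for the filter described above."""
    return bool(MENTION.search(text or ""))

def should_process_voice(transcribe, audio_path):
    """Hypothetical alternative: transcribe first, then apply the same check.
    Returns (decision, transcript) so the transcript can be reused."""
    transcript = transcribe(audio_path)
    return should_process(transcript), transcript

# A voice message has no text or caption at filter time, so it never matches.
print(should_process(""))  # False

# With a stub transcriber, the check runs against what was actually said.
print(should_process_voice(lambda path: "hermes test 2", "audio_BBB.ogg"))
# (True, 'hermes test 2')
```

Transcribing before filtering means every voice message in the group gets transcribed, which may or may not be acceptable depending on STT cost.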

Thanks for the great framework.
