hermes - ✅ (Solved) Fix Telegram: audio file attachments misclassified as voice messages, routed to STT pipeline [5 pull requests, 1 participant]

hermes 2026-05-13 07:20:27

GitHub stats

NousResearch/hermes-agent#24870•Fetched 2026-05-14 03:51:01


Author

jisi-assist

Participants

jisi-assist

Timeline (top)

cross-referenced ×5, labeled ×5

Fix Action

Proposed fixes (none merged at fetch time: four PRs open, one closed)

Fixed by PR: fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870) (https://github.com/NousResearch/hermes-agent/pull/24879)
Fixed by PR: fix(telegram): keep audio attachments as files (https://github.com/NousResearch/hermes-agent/pull/24883)
Fixed by PR: fix(gateway): route audio file attachments as files, not STT input (https://github.com/NousResearch/hermes-agent/pull/25097)
Fixed by PR: feat(telegram): skip-STT audio path + 2GB cap via local Bot API server (https://github.com/NousResearch/hermes-agent/pull/25274)
Fixed by PR: feat(telegram): skip-STT audio path + 2GB cap via local Bot API server (https://github.com/NousResearch/hermes-agent/pull/25280)

PR fix notes

PR #24879: fix(gateway): route Telegram audio file attachments away from STT pipeline (#24870)

Repository: NousResearch/hermes-agent
Author: Bartok9
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/24879

Description (problem / solution / changelog)

Summary

Fixes #24870 — Telegram audio file attachments were being misclassified as voice messages and auto-transcribed by the STT pipeline.

Root Cause

gateway/run.py's inbound message routing block matched both MessageType.VOICE and MessageType.AUDIO into audio_paths, which were then fed unconditionally to _enrich_message_with_transcription.

Per the Telegram Bot API, three distinct payload fields exist:

| Field | Type | Correct handling |
| --- | --- | --- |
| `message.voice` | Opus/OGG voice message | STT pipeline |
| `message.audio` | Audio file attachment (.mp3, .m4a, etc.) | Save as file, NOT STT (was broken) |
| `message.document` (audio mime) | Generic file | Existing document route |

Fix

Introduce a new audio_file_paths list populated exclusively by MessageType.AUDIO events.
Narrow the audio_paths selector to MessageType.VOICE (and bare audio/ MIME-type events that are not explicitly AUDIO or DOCUMENT).
After the STT block, inject a document-style context note for each audio file path, giving the agent the file path and asking what to do with it — consistent with how plain documents are handled.
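The three steps above can be sketched roughly as follows. This is a minimal illustration, not the code from gateway/run.py: the `MessageType` enum here is a local stand-in (its `TEXT` member is added only to exercise the "bare audio/ MIME type" case), and `MediaItem` and `split_audio_media` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class MessageType(Enum):
    # Local stand-in for the gateway enum; VOICE, AUDIO and DOCUMENT are the
    # names used in the PR, TEXT is an assumption added for the example.
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"
    TEXT = "text"


@dataclass
class MediaItem:
    # Hypothetical container: one cached file and the MIME type reported for it.
    path: str
    mime_type: str


def split_audio_media(message_type, items):
    """Return (audio_paths, audio_file_paths) for one inbound event."""
    audio_paths = []        # will be transcribed by the STT pipeline
    audio_file_paths = []   # kept as plain files, never transcribed
    for item in items:
        if message_type is MessageType.AUDIO:
            # Audio file attachment: keep the file, skip STT.
            audio_file_paths.append(item.path)
        elif message_type is MessageType.VOICE:
            # Voice note: unchanged behaviour, goes to STT.
            audio_paths.append(item.path)
        elif item.mime_type.startswith("audio/") and message_type is not MessageType.DOCUMENT:
            # Bare audio/ MIME type on an event that is neither AUDIO nor DOCUMENT.
            audio_paths.append(item.path)
    return audio_paths, audio_file_paths
```

Documents with an audio MIME type land in neither list here, on the assumption that the existing document route picks them up elsewhere.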

Before / After

Before — sending song.mp3 via Telegram attachment:

[The user said: "[STT transcript of your mp3 here]"]

…the transcribe skill never received the file path.

After — sending song.mp3 via Telegram attachment:

[The user sent an audio file attachment: 'song.mp3'. It is saved at: /path/to/cache/song.mp3.
Ask the user what they'd like you to do with it, or pass the path to a transcription or media tool.]
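A note in the "After" format could be built along these lines. The wording is copied from the example above; the function name `audio_attachment_note`, the `display_name` parameter, and the line break between the two sentences are assumptions made for illustration.

```python
import os


def audio_attachment_note(path, display_name=None):
    """Build a context note for one audio file attachment (hypothetical helper)."""
    # Fall back to the cached file's basename when no display name is known.
    name = display_name or os.path.basename(path)
    return (
        f"[The user sent an audio file attachment: '{name}'. "
        f"It is saved at: {path}.\n"
        "Ask the user what they'd like you to do with it, "
        "or pass the path to a transcription or media tool.]"
    )
```

Calling `audio_attachment_note("/path/to/cache/song.mp3")` yields a note naming 'song.mp3' and its cache path, which gives a downstream skill the file location instead of a transcript.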

Testing

5 new tests in tests/gateway/test_telegram_audio_vs_voice.py:

test_voice_message_still_transcribed — regression guard, VOICE still goes to STT
test_audio_attachment_skips_stt — core fix, AUDIO never calls transcribe_audio
test_audio_attachment_context_note_format — verifies note content and display name
test_audio_attachment_skips_stt_when_stt_disabled — STT-disabled notice must not appear for file attachments
test_telegram_media_type_detection_audio_vs_voice — sanity: AUDIO != VOICE enum values

All 5 new tests + existing test_stt_config.py (5 tests) pass.

Changed files

gateway/run.py (modified, +24/-1)
tests/gateway/test_telegram_audio_vs_voice.py (added, +184/-0)

PR #24883: fix(telegram): keep audio attachments as files

Repository: NousResearch/hermes-agent
Author: felix-windsor
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/24883

Description (problem / solution / changelog)

Summary

Treat Telegram message.audio as an attached file/document instead of a voice message.
Allow audio documents such as mp3/m4a/ogg/wav/flac to be cached and passed to the agent as files.
Skip automatic STT for document audio while preserving STT for Telegram voice notes.

Fixes #24870

Tests

scripts/run_tests.sh tests/gateway/test_telegram_documents.py tests/gateway/test_stt_config.py tests/gateway/test_tts_media_routing.py
.venv/bin/ruff check gateway/platforms/base.py gateway/platforms/telegram.py gateway/run.py tests/gateway/test_telegram_documents.py tests/gateway/test_stt_config.py

Changed files

gateway/platforms/base.py (modified, +6/-0)
gateway/platforms/telegram.py (modified, +19/-5)
gateway/run.py (modified, +12/-2)
tests/gateway/test_stt_config.py (modified, +42/-0)
tests/gateway/test_telegram_documents.py (modified, +48/-0)

PR #25097: fix(gateway): route audio file attachments as files, not STT input

Repository: NousResearch/hermes-agent
Author: zccyman
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/25097

Description (problem / solution / changelog)

Summary

Telegram distinguishes between msg.voice (voice messages) and msg.audio (audio file attachments). The gateway was routing both types to the STT pipeline, causing:

Audio files (.mp3, .m4a, etc.) sent as file attachments being auto-transcribed instead of preserved as files
No way to bypass STT for audio file attachments
The transcribe skill receiving transcribed text instead of the actual audio file

Root Cause

gateway/run.py:6769 included MessageType.AUDIO in the STT routing condition alongside MessageType.VOICE:

# Before (buggy)
if mtype.startswith("audio/") or event.message_type in {MessageType.VOICE, MessageType.AUDIO}:
    audio_paths.append(path)

Fix

Changed the condition to only match MessageType.VOICE:

# After (fixed)
if mtype.startswith("audio/") and event.message_type == MessageType.VOICE:
    audio_paths.append(path)

Audio files (MessageType.AUDIO) now fall through to the media URL text placeholder ([User sent audio: /path]) and remain accessible as file attachments, while voice messages continue to be transcribed normally.
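To see what the two conditions quoted above accept, they can be evaluated side by side. The sketch below copies both predicates verbatim into small functions; the `MessageType` enum is a local stand-in for the gateway's enum, and the function names are hypothetical.

```python
from enum import Enum


class MessageType(Enum):
    # Local stand-in; only these member names appear in the PR description.
    VOICE = "voice"
    AUDIO = "audio"
    DOCUMENT = "document"


def routed_to_stt_before(mtype, message_type):
    """The condition quoted as 'Before (buggy)'."""
    return mtype.startswith("audio/") or message_type in {MessageType.VOICE, MessageType.AUDIO}


def routed_to_stt_after(mtype, message_type):
    """The condition quoted as 'After (fixed)'."""
    return mtype.startswith("audio/") and message_type == MessageType.VOICE


def truth_table(mime_types=("audio/ogg", "")):
    """Rows of (message type, MIME type, before, after)."""
    rows = []
    for message_type in MessageType:
        for mtype in mime_types:
            rows.append((
                message_type.name,
                mtype,
                routed_to_stt_before(mtype, message_type),
                routed_to_stt_after(mtype, message_type),
            ))
    return rows
```

Printing `truth_table()` shows AUDIO and DOCUMENT events with an audio/ MIME type leaving the STT route, as intended. It also shows that, as the new condition is written, a VOICE event whose MIME type does not start with audio/ would skip STT too; whether that case can occur depends on how the adapter fills in mtype, which the PR description does not say.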

Testing

6 new regression tests in tests/gateway/test_audio_voice_routing.py
237 existing related tests passing (STT, telegram documents, voice commands)
Zero regressions

Closes #24870

Changed files

gateway/platforms/telegram.py (modified, +3/-1)
gateway/run.py (modified, +4/-1)
hermes_cli/model_switch.py (modified, +32/-1)
tests/gateway/test_audio_voice_routing.py (added, +161/-0)
tests/hermes_cli/test_model_switch_token_validation.py (added, +121/-0)

PR #25274: feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Repository: NousResearch/hermes-agent
Author: alber70g
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/25274

Description (problem / solution / changelog)

Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on attachments larger than the public Bot API's 20MB getFile ceiling.

What's new

stt.enabled: false no longer drops voice/audio with a generic "transcription disabled" note. The gateway probes the cached file's duration (wave → mutagen → ffprobe ladder) and surfaces [The user sent a voice message: <abs path> (duration: M:SS)] to the agent so a skill or tool can pick up the raw file. The previous placeholder is replaced rather than appended when present.
platforms.telegram.extra.base_url set → adapter auto-lifts its document size cap from 20MB to 2GB (the local telegram-bot-api --local ceiling) and the "too large" reply reports the active limit dynamically. No new config knob; presence of base_url is the opt-in.
platforms.telegram.extra.local_mode: true wires Application.builder().local_mode(True) on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when telegram-bot-api runs in --local mode (the server returns absolute filesystem paths, not /file/bot... URLs).
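The wave → mutagen → ffprobe ladder mentioned in the first item might look something like this. It is a sketch under assumptions: the function name `probe_audio_duration` is hypothetical (the PR's helper is `_probe_audio_duration`, whose body is not shown here), and the ffprobe step uses a synchronous `subprocess.run` for brevity rather than an asyncio subprocess.

```python
import shutil
import subprocess
import wave


def probe_audio_duration(path):
    """Return the duration in seconds, or None if every probe fails."""
    # Step 1: stdlib wave module, which only understands PCM WAV files.
    try:
        with wave.open(path, "rb") as wav:
            rate = wav.getframerate()
            if rate > 0:
                return wav.getnframes() / rate
    except (wave.Error, EOFError, OSError):
        pass

    # Step 2: mutagen, an optional third-party package (may not be installed).
    try:
        import mutagen
        audio = mutagen.File(path)
        info = getattr(audio, "info", None)
        if info is not None:
            return float(info.length)
    except Exception:
        pass

    # Step 3: ffprobe, if the binary is on PATH.
    ffprobe = shutil.which("ffprobe")
    if ffprobe:
        try:
            result = subprocess.run(
                [ffprobe, "-v", "error", "-show_entries", "format=duration",
                 "-of", "default=noprint_wrappers=1:nokey=1", path],
                capture_output=True, text=True, timeout=10, check=True,
            )
            return float(result.stdout.strip())
        except (subprocess.SubprocessError, ValueError, OSError):
            pass

    return None
```

Ordering the probes from cheapest to most expensive means a WAV file never needs an external dependency, while OGG or MP3 input falls through to whichever of mutagen or ffprobe is available.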

Files

gateway/run.py: rewrites the stt.enabled: false branch of _enrich_message_with_transcription. New _format_duration + _probe_audio_duration helpers.
gateway/platforms/telegram.py: _max_doc_bytes instance attribute derived from extra.base_url; local_mode builder wiring; dynamic "too large" message.
tests/gateway/test_stt_config.py: covers path-surfacing with and without an existing user message, and placeholder replacement.
tests/gateway/test_telegram_max_doc_bytes.py: 3 cases — default 20MB without base_url, 2GB when set, empty-string base_url keeps default.
website/docs/user-guide/messaging/telegram.md: new "Skipping STT" subsection under Voice Messages and a full "Large Files (>20MB) via Local Bot API Server" walkthrough (api_id/api_hash, docker-compose, one-time logOut migration, platforms.telegram.extra config, the local_mode disk-access requirement, the silent HTTP-fallback 404).
website/docs/user-guide/features/voice-mode.md: documents the stt.enabled knob in the config reference.

Validation

pytest tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py → 9/9 passing.
Verified end-to-end on a live deployment: gateway log shows Using custom Telegram base_url: http://... and Using Telegram local_mode (read files from disk) on startup; voice messages above 20MB cache to disk and surface their path to the agent.

What does this PR do?

Related Issue

Fixes #24870 #15145

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

With STT disabled (stt.enabled: false), the gateway forwards the audio file path and duration to the agent.
Setting base_url for Telegram enables a custom telegram-bot-api server and file sizes above 20 MB, up to 2 GB.

How to Test

Disable STT (stt.enabled: false) and set up the telegram-bot-api Docker container according to the docs.
Send an audio file larger than 20 MB.
Observe the logs.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: Ubuntu

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

This skill is broadly useful to most users (if bundled) — see Contributing Guide
SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

Changed files

gateway/platforms/telegram.py (modified, +19/-3)
gateway/run.py (modified, +71/-9)
tests/gateway/test_stt_config.py (modified, +28/-2)
tests/gateway/test_telegram_max_doc_bytes.py (added, +56/-0)
website/docs/user-guide/features/voice-mode.md (modified, +5/-0)
website/docs/user-guide/messaging/telegram.md (modified, +148/-0)

PR #25280: feat(telegram): skip-STT audio path + 2GB cap via local Bot API server

Repository: NousResearch/hermes-agent
Author: alber70g
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/25280

Description (problem / solution / changelog)

What & Why

Two coordinated changes that unblock downstream audio pipelines (diarization, custom transcription, archival) on Telegram attachments larger than the public Bot API's 20 MB getFile ceiling.

1. `stt.enabled: false` surfaces audio file paths to the agent

Previously a no-op note: "transcription disabled." Now the gateway still caches the voice/audio attachment, probes its duration (`wave` → `mutagen` → `ffprobe` ladder), and surfaces:

```
[The user sent a voice message: /home/<user>/.hermes/cache/audio/<hash>.ogg (duration: 12:34)]
```

…so a skill or tool can pick up the raw file. The previous `(The user sent a message with no text content)` placeholder is replaced rather than appended when present.
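The note format and the replace-rather-than-append rule could be sketched as below. The placeholder string and the note wording are quoted from this description; the helper names, the choice to omit the duration when it is unknown, and the blank line used when appending are assumptions.

```python
import os

# Placeholder text quoted from the PR description.
NO_TEXT_PLACEHOLDER = "(The user sent a message with no text content)"


def format_duration(seconds):
    """Format seconds as M:SS (minutes are not wrapped into hours here)."""
    minutes, secs = divmod(int(round(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def voice_path_note(path, duration=None):
    """Build the note that surfaces the cached file to the agent."""
    abs_path = os.path.abspath(path)
    if duration is None:
        # Assumption: leave the duration out when no probe succeeded.
        return f"[The user sent a voice message: {abs_path}]"
    return f"[The user sent a voice message: {abs_path} (duration: {format_duration(duration)})]"


def merge_note(message_text, note):
    """Replace the no-text placeholder with the note, otherwise append it."""
    text = (message_text or "").strip()
    if not text or text == NO_TEXT_PLACEHOLDER:
        return note
    return f"{text}\n\n{note}"
```

With these helpers a 754-second recording is reported as `12:34`, and a voice-only message ends up containing just the note instead of the placeholder followed by the note.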

2. Local Bot API server unlocks 2 GB downloads

When `platforms.telegram.extra.base_url` is set, the adapter:

Auto-lifts the document size cap from 20 MB → 2 GB (the `telegram-bot-api` `--local` ceiling).
Reports the active limit dynamically in the "too large" reply.
No new top-level config knob: presence of `base_url` is the opt-in.

A new `platforms.telegram.extra.local_mode: true` wires `Application.builder().local_mode(True)` on the python-telegram-bot builder. PTB then reads files from disk instead of HTTP, which is required when `telegram-bot-api` runs in `--local` mode (the server returns absolute filesystem paths, not `/file/bot...` URLs).
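The size-cap rule above (presence of `base_url` is the opt-in) can be expressed in a few lines. This is an illustrative sketch: the function names and the reply wording are hypothetical, and treating 20 MB and 2 GB as binary multiples is an assumption, since the description does not say which convention the adapter uses.

```python
DEFAULT_MAX_DOC_BYTES = 20 * 1024 * 1024               # public Bot API getFile ceiling
LOCAL_SERVER_MAX_DOC_BYTES = 2 * 1024 * 1024 * 1024    # telegram-bot-api --local ceiling


def max_doc_bytes(extra):
    """Derive the document size cap from the adapter's `extra` config mapping."""
    base_url = (extra or {}).get("base_url")
    # An empty or missing base_url keeps the default cap.
    if isinstance(base_url, str) and base_url.strip():
        return LOCAL_SERVER_MAX_DOC_BYTES
    return DEFAULT_MAX_DOC_BYTES


def format_limit(num_bytes):
    """Render the active cap for the 'too large' reply."""
    if num_bytes >= 1024 ** 3:
        return f"{num_bytes // 1024 ** 3} GB"
    return f"{num_bytes // 1024 ** 2} MB"


def too_large_reply(extra):
    """Hypothetical reply text that reports the active limit dynamically."""
    return f"That file is too large; the current limit is {format_limit(max_doc_bytes(extra))}."
```

Deriving the limit in one place keeps the download check and the user-facing message in agreement, which is the point of reporting the limit dynamically.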

How to test

Path 1 — STT-skip path (no local server required)

Set `stt.enabled: false` in `~/.hermes/config.yaml`.
Restart the gateway.
Send the bot a voice note ≤ 20 MB.
Check the inbound log message contains `[The user sent a voice message: /path/to/cache/audio/<hash>.ogg (duration: M:SS)]`.

Path 2 — Local Bot API server (full pipeline)

Follow the new docs at `website/docs/user-guide/messaging/telegram.md` → Large Files (>20MB) via Local Bot API Server. Six steps cover: getting api_id/api_hash, running the docker container with `TELEGRAM_LOCAL=1`, the one-time `logOut` migration, Hermes config, the `local_mode` disk-access requirement, and a smoke test with a >20 MB voice message.

Successful startup log lines:

```
[Telegram] Using custom Telegram base_url: http://...
[Telegram] Using Telegram local_mode (read files from disk)
```

Automated

`scripts/run_tests.sh tests/gateway/test_telegram_max_doc_bytes.py tests/gateway/test_stt_config.py` → 9/9 passing.
`scripts/check-windows-footguns.py --diff main` → clean.

Test plan

Unit tests for both code paths (`test_telegram_max_doc_bytes.py`, `test_stt_config.py`)
CI-parity test runner (`scripts/run_tests.sh`) green on touched files
Windows-footguns check clean
Manual end-to-end on Linux: bot connects to local server, voice messages above 20 MB cache to disk, audio path surfaced to agent
Pre-existing test failures (`test_tts_media_routing.py` × 3, `test_api_server.py` etc. import errors) reproduce against `HEAD~1` — not introduced by this PR.

Platforms tested

Linux (Ubuntu 24.04). The new code uses portable APIs (`os.path.abspath`, `asyncio.create_subprocess_exec` with try/except fallback for ffprobe); no Unix-only syscalls introduced.

Security note

The new docs include a prominent warning that the local Bot API server takes the bot token in the URL path with no additional auth — operators must keep it on a private network and not expose port 8081 publicly. No change to Hermes-side security posture; the warning is purely advisory for operators running the optional local server.

Out of scope (deferred)

Slack's 20 MB cap and WeCom's 20 MB cap (other adapters; operator confirmed Telegram is the blocker).
MTProto migration (much larger blast radius; local Bot API server covers the use case).
Streaming-to-disk for ≥ 1 GB downloads (PTB's `download_as_bytearray` still loads the full payload into memory; worth revisiting under measured memory pressure).

Related Issue

Fixes #24870 #15145

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

With STT disabled (stt.enabled: false), the gateway forwards the audio file path and duration to the agent.
Setting base_url for Telegram enables a custom telegram-bot-api server and file sizes above 20 MB, up to 2 GB.

How to Test

Disable STT (stt.enabled: false) and set up the telegram-bot-api Docker container according to the docs.
Send an audio file larger than 20 MB.
Observe the logs.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: Ubuntu

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

🤖 Generated with Claude Code

Changed files

gateway/platforms/telegram.py (modified, +19/-3)
gateway/run.py (modified, +71/-9)
tests/gateway/test_stt_config.py (modified, +28/-2)
tests/gateway/test_telegram_max_doc_bytes.py (added, +56/-0)
website/docs/user-guide/features/voice-mode.md (modified, +5/-0)
website/docs/user-guide/messaging/telegram.md (modified, +148/-0)


Bug Description

On Telegram, Hermes Agent fails to distinguish between message.audio (audio file attachments) and message.voice (voice messages). Both types are routed to the STT pipeline, resulting in:

Audio files sent as file attachments being auto-transcribed instead of saved as files
The transcribe skill never receives the actual audio file, making it unusable
No way to bypass STT for audio file attachments

Steps to Reproduce

Send an audio file via Telegram attachment (any format: .mp3, .m4a, .ogg, .wav, etc.)
Alternatively: save audio to Files app, then attach via Telegram
Observe that Hermes Agent treats it as a voice message and runs STT

Expected Behavior

Per Telegram API, there are three distinct message fields:

message.voice → voice messages (Opus/OGG), should go to STT
message.audio → audio files/music, should be saved as files, NOT to STT
message.document → generic files, need mime type check

The correct cascading logic should be:

if msg.voice:
    pass  # Voice note: route to the STT pipeline
elif msg.audio:
    pass  # Audio file attachment: save as a file only, do NOT run STT
elif msg.document:
    pass  # Generic file: check the MIME type; if audio, save as a file only
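A runnable version of this cascade, using stand-in types instead of the real Telegram objects, might look like the following. `Attachment`, `Message`, `classify` and the returned route labels are all hypothetical; only the field names voice, audio and document come from the Telegram API as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Attachment:
    # Hypothetical stand-in for a Telegram file payload.
    mime_type: Optional[str] = None


@dataclass
class Message:
    # Hypothetical stand-in exposing the three fields discussed above.
    voice: Optional[Attachment] = None
    audio: Optional[Attachment] = None
    document: Optional[Attachment] = None


def classify(msg):
    """Return a route label: 'stt', 'audio_file', 'document' or 'other'."""
    if msg.voice is not None:
        return "stt"            # voice note: transcribe
    if msg.audio is not None:
        return "audio_file"     # audio attachment: save only, no STT
    if msg.document is not None:
        mime = msg.document.mime_type or ""
        # Audio sent as a generic document is also saved without STT.
        return "audio_file" if mime.startswith("audio/") else "document"
    return "other"
```

For example, `classify(Message(audio=Attachment("audio/mpeg")))` returns `"audio_file"`, while only a message with the voice field set reaches `"stt"`.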

Actual Behavior

Hermes Agent routes all audio-related fields (voice, audio, and audio document) to the STT pipeline without distinguishing between them.

Environment

Hermes Agent version: latest
OS: macOS
Platform: Telegram
STT provider: local (faster-whisper)

Additional Context

The transcribe skill depends on receiving actual audio file paths to process with Whisper CLI. Since all audio is routed through STT, the skill is effectively broken for Telegram platform.
