hermes - 💡(How to fix) Fix [Feature]: Option to disable automatic speech-to-text for voice messages [1 participants]

hermes • 2026-04-24 13:05:00


GitHub stats

NousResearch/hermes-agent#15145 • Fetched 2026-04-25 06:24:20


Author

alber70g

Participants

alber70g

Timeline (top)

labeled ×4

Fix Action

Fix / Workaround

Use case / motivation

I maintain a personal knowledge wiki where meeting recordings are diarized with parakeet-rs, speaker identities are resolved by extracting audio snippets, and transcripts are stored via a structured ingestion pipeline. The automatic STT bypasses this entire workflow and provides no speaker separation.



Problem or Use Case

Current behavior

When a user sends a voice message, Hermes automatically transcribes it to text and injects the transcript into the conversation context. The agent never sees or receives the actual audio file.

Example of what the agent currently receives:

[The user sent a voice message~ Here's what they said: "Ja, we moeten even..."]

Problem

This makes it impossible for the agent to:

Run speaker diarization (e.g. with parakeet-rs) to identify who spoke when
Perform audio quality checks or noise analysis
Extract speaker snippets to ask the user "who is Speaker 0?"
Use custom transcription pipelines (different models, languages, formatting)
Archive the original audio alongside transcripts

Desired behavior

Provide a way (per-message or per-conversation) to receive the voice message as an audio file instead of (or in addition to) the automatic transcript.

Ideally, the agent should receive something like:

[The user sent a voice message: /path/to/downloaded/audio.ogg (duration: 12:34)]

Then the agent can decide what to do — transcribe it itself, run diarization, store it, etc.
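On the agent side, the two bracketed forms quoted above could be told apart with a small parser. This is a minimal sketch, not Hermes code: `VoiceMessage` and `parse_voice_placeholder` are hypothetical names, the file-path form is the format proposed in this issue rather than something Hermes is known to emit, and the duration is assumed to be minutes:seconds.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns mirror the two bracketed examples quoted in this issue.
# The file-path form is the *proposed* format (an assumption, not current behavior).
_TRANSCRIPT_RE = re.compile(
    r"""^\[The user sent a voice message~ Here's what they said: "(?P<text>.*)"\]$""",
    re.DOTALL,
)
_FILE_RE = re.compile(
    r"^\[The user sent a voice message: (?P<path>.+?) "
    r"\(duration: (?P<mm>\d+):(?P<ss>\d{2})\)\]$"
)


@dataclass
class VoiceMessage:
    """Hypothetical container for whatever the agent was given."""
    transcript: Optional[str] = None
    audio_path: Optional[str] = None
    duration_seconds: Optional[int] = None


def parse_voice_placeholder(line: str) -> Optional[VoiceMessage]:
    """Return a VoiceMessage for either form, or None for ordinary text."""
    line = line.strip()
    m = _FILE_RE.match(line)
    if m:
        # Assumption: "12:34" means 12 minutes 34 seconds.
        seconds = int(m["mm"]) * 60 + int(m["ss"])
        return VoiceMessage(audio_path=m["path"], duration_seconds=seconds)
    m = _TRANSCRIPT_RE.match(line)
    if m:
        return VoiceMessage(transcript=m["text"])
    return None
```

When `audio_path` is set, the agent could hand the file to its own diarization or transcription pipeline; when only `transcript` is set, it falls back to the current behavior.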

Proposed Solution

Possible solutions

Per-message opt-out: A user prefix or command (e.g. /voice or !nostt) that tells Hermes "send me the file, not the transcript"
Agent-side preference: A setting the agent can toggle: "for this conversation, request raw audio for voice messages"
Always provide both: Send the audio file path and the transcript, letting the agent choose which to use
Platform-level config: A setting in Hermes config to disable automatic STT globally or per-platform
On-demand transcription: An MCP tool or skill that the agent can invoke to transcribe the audio when it sees fit
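The first four options above can coexist if they are resolved in a fixed order of precedence. The sketch below assumes one possible order (per-message prefix, then conversation preference, then per-platform config, then a global default). Every name in it is hypothetical: `VoiceMode`, `resolve_voice_mode`, and the `stt.voice_mode` / `stt.platforms` config keys are illustrations, not existing Hermes settings.

```python
from enum import Enum
from typing import Any, Mapping, Optional


class VoiceMode(Enum):
    TRANSCRIPT = "transcript"  # current behavior: automatic STT
    FILE = "file"              # audio file path only
    BOTH = "both"              # audio file path plus transcript


# Hypothetical per-message prefixes, taken from the examples in this issue.
OPT_OUT_PREFIXES = ("/voice", "!nostt")


def resolve_voice_mode(
    caption: str,
    conversation_pref: Optional[VoiceMode],
    platform: str,
    config: Mapping[str, Any],
) -> VoiceMode:
    """Pick how a voice message should be delivered to the agent."""
    # 1. Per-message opt-out: compare the first token only, so that
    #    "/voiceover" is not mistaken for "/voice".
    tokens = caption.strip().split(maxsplit=1)
    if tokens and tokens[0].lower() in OPT_OUT_PREFIXES:
        return VoiceMode.FILE
    # 2. Agent-side preference for this conversation.
    if conversation_pref is not None:
        return conversation_pref
    # 3. Per-platform config, then 4. global default (assumed key names).
    stt = config.get("stt", {})
    default = stt.get("voice_mode", VoiceMode.TRANSCRIPT.value)
    raw = stt.get("platforms", {}).get(platform, default)
    return VoiceMode(raw)  # raises ValueError on an unknown mode string
```

Defaulting to `transcript` when nothing is configured keeps the existing behavior for users who never touch the setting.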

Alternatives Considered

No response

Feature Type

Configuration option

Scope

Small (single file, < 50 lines)

Contribution

I'd like to implement this myself and submit a PR

Debug Report (optional)

extent analysis

TL;DR

Add a configuration option or command that lets the agent opt out of automatic speech-to-text transcription and receive the original audio file instead of, or alongside, the transcript.

Guidance

Investigate adding a user prefix or command (e.g., /voice or !nostt) to instruct Hermes to send the audio file rather than the transcript.
Explore implementing an agent-side preference to request raw audio for voice messages in specific conversations.
Consider sending both the audio file path and the transcript, allowing the agent to choose which to use.
Review the Hermes configuration to determine if a platform-level setting can be added to disable automatic STT globally or per-platform.

Example

No code example is provided due to the lack of specific technical details in the issue.
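That said, the message-formatting step can be illustrated independently of Hermes internals. The following is a sketch under stated assumptions: `render_voice_context` and `format_duration` are hypothetical helpers, the `transcript` and `file` strings copy the two examples quoted in the issue, and the wording of the `both` form is invented here for illustration.

```python
from typing import Optional


def format_duration(seconds: int) -> str:
    """Render a duration as minutes:seconds, e.g. 754 -> '12:34'."""
    minutes, secs = divmod(max(0, int(seconds)), 60)
    return f"{minutes}:{secs:02d}"


def render_voice_context(
    audio_path: str,
    duration_seconds: int,
    transcript: Optional[str] = None,
    mode: str = "transcript",
) -> str:
    """Build the bracketed line injected into the conversation context."""
    file_part = f"{audio_path} (duration: {format_duration(duration_seconds)})"
    if mode == "file":
        # Proposed format from the issue: path and duration, no transcript.
        return f"[The user sent a voice message: {file_part}]"
    if transcript is None:
        raise ValueError(f"mode {mode!r} requires a transcript")
    if mode == "transcript":
        # Current format as quoted in the issue.
        return f"[The user sent a voice message~ Here's what they said: \"{transcript}\"]"
    if mode == "both":
        # Assumed wording: the issue does not specify a combined format.
        return (
            f"[The user sent a voice message: {file_part}. "
            f"Here's what they said: \"{transcript}\"]"
        )
    raise ValueError(f"unknown voice mode: {mode!r}")
```

In `file` mode no transcript is needed, so the STT call could be skipped entirely for long recordings.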

Notes

The ideal solution will depend on the specific requirements and constraints of the Hermes system, including its current architecture and configuration options.

Recommendation

Implement a configuration option or command to opt out of automatic STT transcription. Based on the information in the issue, this appears to be the most feasible and flexible approach.

