hermes - [Feature]: First-class Soniox STT provider support

Problem or Use Case

Hermes today supports Whisper-family STT backends (OpenAI Whisper, Groq Whisper, local faster-whisper) plus a generic HERMES_LOCAL_STT_COMMAND hook. For multilingual voice workflows (Telegram, Discord, CLI) and low-latency streaming, Soniox is a stronger fit, offering:

  • real-time WebSocket STT with partial results
  • automatic language identification and code-switching across 60+ languages within a single stream (no need to pre-declare the locale)
  • speaker diarization
  • built-in PII redaction
  • async batch endpoint as well

There is no first-class config path today. Users would have to wrap Soniox behind HERMES_LOCAL_STT_COMMAND, which loses streaming/partials and the OpenAI-shaped response surface that downstream code expects.

Proposed Solution

Add soniox as a provider option alongside openai / groq / local. Config sketch:

stt:
  enabled: true
  provider: soniox
  soniox:
    api_key: ${SONIOX_API_KEY}
    model: stt-rt-preview        # or stt-async-preview for batch
    language_hints: [en, hu]     # optional; auto-detect when omitted
    enable_speaker_diarization: false
    enable_endpoint_detection: true
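
A minimal Python sketch of how the proposed stt block might be read, assuming it has already been parsed from YAML into a plain dict. SonioxConfig and load_soniox_config are hypothetical names, not existing Hermes code; the defaults simply mirror the config sketch above, and the ${SONIOX_API_KEY} reference is expanded from the environment.

```python
import os
from dataclasses import dataclass, field
from typing import Any, Mapping, Optional


@dataclass
class SonioxConfig:
    # Hypothetical container mirroring the proposed stt.soniox keys.
    api_key: str
    model: str = "stt-rt-preview"
    language_hints: list[str] = field(default_factory=list)  # empty = auto-detect
    enable_speaker_diarization: bool = False
    enable_endpoint_detection: bool = True


def load_soniox_config(stt: Mapping[str, Any]) -> Optional[SonioxConfig]:
    """Return a SonioxConfig if the already-parsed stt block selects Soniox, else None."""
    if not stt.get("enabled", False) or stt.get("provider") != "soniox":
        return None
    raw = dict(stt.get("soniox") or {})
    # os.path.expandvars leaves unset variables untouched, so a value that
    # still starts with "$" means SONIOX_API_KEY was not set.
    api_key = os.path.expandvars(str(raw.get("api_key", "")))
    if not api_key or api_key.startswith("$"):
        raise ValueError("stt.soniox.api_key is missing or SONIOX_API_KEY is unset")
    return SonioxConfig(
        api_key=api_key,
        model=raw.get("model", "stt-rt-preview"),
        language_hints=list(raw.get("language_hints") or []),
        enable_speaker_diarization=bool(raw.get("enable_speaker_diarization", False)),
        enable_endpoint_detection=bool(raw.get("enable_endpoint_detection", True)),
    )
```

Returning None for other providers lets a dispatcher fall through to the existing openai / groq / local paths unchanged.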

Endpoints:

  • realtime: wss://stt-rt.soniox.com/transcribe-websocket
  • async: https://api.soniox.com/v1/transcriptions
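
A sketch of how an adapter could pick between these two endpoints and build the opening WebSocket message. It assumes realtime model names contain "-rt-" and async ones contain "-async-" (true of the two models named in the config sketch, not a documented rule), and that the first frame is a JSON object whose fields mirror the config keys above. pick_endpoint and realtime_start_message are hypothetical; the real message schema, including any audio-format fields omitted here, should be checked against Soniox's API reference.

```python
import json
from typing import Any

REALTIME_URL = "wss://stt-rt.soniox.com/transcribe-websocket"
ASYNC_URL = "https://api.soniox.com/v1/transcriptions"


def pick_endpoint(model: str) -> str:
    # Assumption: the model name encodes the mode (stt-rt-* vs stt-async-*).
    if "-rt-" in model:
        return REALTIME_URL
    if "-async-" in model:
        return ASYNC_URL
    raise ValueError(f"cannot infer Soniox endpoint for model {model!r}")


def realtime_start_message(api_key: str, model: str, language_hints: list[str],
                           diarization: bool, endpoint_detection: bool) -> str:
    # Hypothetical first text frame; field names are assumed to match the
    # config keys and are NOT confirmed against Soniox's schema.
    payload: dict[str, Any] = {
        "api_key": api_key,
        "model": model,
        "enable_speaker_diarization": diarization,
        "enable_endpoint_detection": endpoint_detection,
    }
    if language_hints:  # omitted entirely so Soniox can auto-detect
        payload["language_hints"] = language_hints
    return json.dumps(payload)
```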

The adapter would normalize Soniox's tokens[] array (which carries per-token language and speaker labels) into the existing transcript shape Hermes consumes.
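
One way the normalization step could look, assuming each token is a dict with text, start_ms, end_ms, language, speaker and is_final fields, and that the target is a Whisper-style verbose result (text, language, segments). normalize_tokens is hypothetical, and the exact transcript shape Hermes consumes is not specified in this issue.

```python
from collections import Counter
from typing import Any


def normalize_tokens(tokens: list[dict[str, Any]], final_only: bool = True) -> dict[str, Any]:
    """Fold Soniox-style tokens into a Whisper-like {text, language, segments} dict.

    Assumed token fields: text, start_ms, end_ms, language, speaker, is_final.
    """
    if final_only:
        # Drop streaming partials; tokens without the flag are treated as final.
        tokens = [t for t in tokens if t.get("is_final", True)]
    segments: list[dict[str, Any]] = []
    lang_counts: Counter = Counter()
    for tok in tokens:
        lang, speaker = tok.get("language"), tok.get("speaker")
        if lang:
            lang_counts[lang] += 1
        last = segments[-1] if segments else None
        # Start a new segment whenever the speaker or the language changes.
        if last is None or last["speaker"] != speaker or last["language"] != lang:
            segments.append({"start": tok["start_ms"] / 1000, "end": tok["end_ms"] / 1000,
                             "text": tok["text"], "speaker": speaker, "language": lang})
        else:
            last["end"] = tok["end_ms"] / 1000
            last["text"] += tok["text"]
    for seg in segments:
        seg["text"] = seg["text"].strip()
    return {
        # Token texts are assumed to carry their own leading spaces.
        "text": "".join(t["text"] for t in tokens).strip(),
        # Dominant language by token count; None if no token had a language.
        "language": lang_counts.most_common(1)[0][0] if lang_counts else None,
        "segments": segments,
    }
```

Splitting on speaker and language changes keeps code-switched passages in separate segments, and dropping non-final tokens by default keeps streaming partials out of the final transcript.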

Alternatives Considered

  1. Generic HERMES_LOCAL_STT_COMMAND wrapper: works for batch, but drops streaming/partials and forces every user to write their own JSON-shape translation.
  2. Stick with Whisper: fine for English, but weaker on code-switched speech and higher-latency than Soniox realtime.
  3. Deepgram (already on the #1166 wishlist): comparable capability; this request is additive, not a replacement.

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)