hermes - 💡(How to fix) Fix [Feature]: First-class Soniox STT provider support

hermes · 2026-05-09 09:08:35


Problem or Use Case

Hermes today supports Whisper-family STT backends (OpenAI Whisper, Groq Whisper, local faster-whisper) plus a generic HERMES_LOCAL_STT_COMMAND hook. For multilingual voice workflows (Telegram/Discord/CLI) and low-latency streaming, Soniox is a stronger fit. It offers:

real-time WebSocket STT with partial results
automatic language ID + code-switching across 60+ languages in a single stream (no need to pre-declare locale)
speaker diarization
built-in PII redaction
async batch endpoint as well

There is no first-class config path for Soniox today. Users have to wrap it behind HERMES_LOCAL_STT_COMMAND, which loses streaming and partial results, as well as the OpenAI-shaped response that downstream code expects.

Proposed Solution

Add soniox as a provider option alongside openai / groq / local. Config sketch:

stt:
  enabled: true
  provider: soniox
  soniox:
    api_key: ${SONIOX_API_KEY}
    model: stt-rt-preview        # or stt-async-preview for batch
    language_hints: [en, hu]     # optional; auto-detect when omitted
    enable_speaker_diarization: false
    enable_endpoint_detection: true
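
As a rough illustration of how the config sketch above might be consumed, the following assumes the `stt` section has already been parsed into a plain dict and that `${NAME}` references are resolved from the environment. The names `SonioxConfig`, `expand_env`, and `load_soniox_config` are hypothetical and not part of Hermes; the defaults simply mirror the values in the sketch.

```python
import os
import re
from dataclasses import dataclass
from typing import Any, Mapping

# Matches ${NAME} references such as ${SONIOX_API_KEY}.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Mapping[str, str]) -> str:
    """Replace ${NAME} references; raise KeyError if a variable is unset."""
    def _sub(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _ENV_REF.sub(_sub, value)


@dataclass(frozen=True)
class SonioxConfig:  # hypothetical container, not an existing Hermes class
    api_key: str
    model: str = "stt-rt-preview"
    language_hints: tuple[str, ...] = ()  # empty means auto-detect
    enable_speaker_diarization: bool = False
    enable_endpoint_detection: bool = True


def load_soniox_config(
    stt: Mapping[str, Any], env: Mapping[str, str] = os.environ
) -> SonioxConfig:
    """Build a SonioxConfig from the parsed `stt` section of the config."""
    if stt.get("provider") != "soniox":
        raise ValueError("stt.provider is not 'soniox'")
    raw = stt.get("soniox") or {}
    return SonioxConfig(
        # Assumption: fall back to ${SONIOX_API_KEY} when api_key is omitted.
        api_key=expand_env(str(raw.get("api_key", "${SONIOX_API_KEY}")), env),
        model=raw.get("model", "stt-rt-preview"),
        language_hints=tuple(raw.get("language_hints") or ()),
        enable_speaker_diarization=bool(raw.get("enable_speaker_diarization", False)),
        enable_endpoint_detection=bool(raw.get("enable_endpoint_detection", True)),
    )
```

Failing loudly on an unset variable is a design choice here: a missing key then surfaces at startup rather than as an authentication error on the first transcription.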

Endpoints:

realtime: wss://stt-rt.soniox.com/transcribe-websocket
async: https://api.soniox.com/v1/transcriptions
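
One way the adapter could pick between these two endpoints is by model name. The sketch below assumes the `stt-rt-` and `stt-async-` prefixes seen in the config sketch reliably indicate the mode; that convention is inferred from the two model names in this issue, and `endpoint_for_model` is a hypothetical helper.

```python
REALTIME_URL = "wss://stt-rt.soniox.com/transcribe-websocket"
ASYNC_URL = "https://api.soniox.com/v1/transcriptions"


def endpoint_for_model(model: str) -> tuple[str, str]:
    """Return (mode, url) for a Soniox model name.

    Assumption: the model-name prefix marks realtime vs. async models.
    """
    if model.startswith("stt-rt-"):
        return "realtime", REALTIME_URL
    if model.startswith("stt-async-"):
        return "async", ASYNC_URL
    raise ValueError(f"cannot infer Soniox endpoint for model {model!r}")
```

An explicit `mode` key in the config would be an alternative if the naming convention turns out not to hold.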

The adapter would normalize Soniox tokens[] (each token carrying its own language and speaker) into the existing transcript shape that Hermes consumes.
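
A minimal sketch of that normalization step follows. It assumes each token is a dict with `text`, `start_ms`, `end_ms`, `is_final`, `speaker`, and `language` keys, and that token text already includes any leading spaces. The output shape (`text`, `language`, `segments`) is a guess at an OpenAI-style transcript, not the confirmed Hermes schema, and `normalize_tokens` is a hypothetical name.

```python
from collections import Counter
from typing import Any, Iterable


def normalize_tokens(tokens: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Merge final tokens into segments split on speaker or language change."""
    segments: list[dict[str, Any]] = []
    lang_counts: Counter = Counter()
    for tok in tokens:
        if not tok.get("is_final", True):
            continue  # partial tokens may still change, so leave them out
        speaker, language = tok.get("speaker"), tok.get("language")
        if language:
            lang_counts[language] += 1
        last = segments[-1] if segments else None
        if last and (last["speaker"], last["language"]) == (speaker, language):
            # Same speaker and language: extend the current segment.
            last["text"] += tok["text"]
            last["end"] = tok.get("end_ms", 0) / 1000.0
        else:
            segments.append({
                "text": tok["text"],
                "start": tok.get("start_ms", 0) / 1000.0,  # seconds
                "end": tok.get("end_ms", 0) / 1000.0,
                "speaker": speaker,
                "language": language,
            })
    # Top-level language is the one carried by the most final tokens.
    dominant = lang_counts.most_common(1)[0][0] if lang_counts else None
    text = "".join(seg["text"] for seg in segments).strip()
    return {"text": text, "language": dominant, "segments": segments}
```

Downstream code that only reads `text` would keep working unchanged, while callers that care about code-switching or diarization can read the per-segment `language` and `speaker` fields.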

Alternatives Considered

Generic HERMES_LOCAL_STT_COMMAND wrapper: works for batch, but drops streaming/partials and forces each user to write a custom JSON-shape translation.
Stick with Whisper: fine for English, but weaker on code-switched speech and higher-latency than Soniox realtime.
Deepgram (already on the #1166 wishlist): comparable capability; this request is additive, not a replacement.

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)
