openclaw - [Feature]: Support local/self-hosted STT engine for Talk Mode realtime voice [1 comment, 2 participants]

GitHub stats: openclaw/openclaw#84143 (fetched 2026-05-20 03:43:32)
Comments: 1 · Participants: 2 · Timeline events: 4 · Reactions: 2
Timeline (top): labeled ×2, closed ×1, commented ×1


Summary

Add support for local or self-hosted Speech-to-Text (STT) engines in Talk Mode, so users without cloud API keys can still use high-quality realtime voice conversations.

Problem to Solve

Talk Mode currently has three STT paths, each with significant limitations:

Path                 | STT Provider                         | Configurable?
---------------------|--------------------------------------|--------------------------------------------------
macOS Native Talk    | macOS Speech framework (built-in)    | Not replaceable; quality mediocre for non-English
Browser/WebChat Talk | OpenAI Realtime API (WebRTC)         | No baseUrl override; requires OpenAI API key
Gateway Relay        | Deepgram / ElevenLabs / Mistral / xAI | All cloud APIs; requires separate API keys

No path supports a local or self-hosted STT server. Users who cannot or prefer not to use cloud APIs (those in regions with network restrictions, privacy-conscious users, and users who want zero API cost) are locked into macOS native speech recognition, whose quality is limited.

Proposed Solution

Allow talk.realtime.providers.<provider> to accept a baseUrl configuration, similar to how messages.tts.providers.openai.baseUrl already works:

{
  talk: {
    realtime: {
      provider: "openai",
      providers: {
        openai: {
          baseUrl: "http://localhost:8080",  // proposed new field
          apiKey: "not-needed",
          model: "whisper-large-v3",
        },
      },
    },
  },
}

This would enable pointing Talk Mode to:

  • whisper.cpp server (OpenAI-compatible WebSocket)
  • faster-whisper streaming endpoint
  • SenseVoice or other local ASR engines
  • Any self-hosted OpenAI Realtime API-compatible STT service
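On the client side, the proposal amounts to an endpoint lookup with a fallback. Below is a minimal TypeScript sketch of that lookup. The helper `resolveRealtimeEndpoint` and the types are hypothetical, written to mirror the config example above; none of these names come from the OpenClaw codebase, and the default URL is an assumption (OpenAI's public API base).

```typescript
// Hypothetical types mirroring the proposed config shape; not OpenClaw's actual types.
interface RealtimeProviderConfig {
  baseUrl?: string; // proposed new field
  apiKey?: string;
  model?: string;
}

interface TalkRealtimeConfig {
  provider: string;
  providers: Record<string, RealtimeProviderConfig>;
}

interface ResolvedEndpoint {
  baseUrl: string;
  apiKey: string | undefined;
  model: string | undefined;
  isCustom: boolean;
}

// Assumption: without an override, the OpenAI provider talks to the public API.
const DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";

function resolveRealtimeEndpoint(cfg: TalkRealtimeConfig): ResolvedEndpoint {
  const provider = cfg.providers[cfg.provider];
  if (!provider) {
    throw new Error(`No settings found for provider "${cfg.provider}"`);
  }
  const override = provider.baseUrl?.trim();
  if (override) {
    // Reject anything that is not a plain http(s) URL (new URL throws on garbage).
    const parsed = new URL(override);
    if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
      throw new Error(`Unsupported baseUrl scheme: ${parsed.protocol}`);
    }
  }
  // Strip trailing slashes so later path joins stay predictable.
  const baseUrl = (override || DEFAULT_OPENAI_BASE_URL).replace(/\/+$/, "");
  return {
    baseUrl,
    apiKey: provider.apiKey,
    model: provider.model,
    isCustom: Boolean(override),
  };
}
```

With the config from the proposal, this would return `http://localhost:8080` and flag the endpoint as custom. A caller could use that flag to skip API-key validation for self-hosted servers.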

For Gateway relay, similarly allow streaming.providers.<provider> to accept custom endpoints.
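For the relay path, a self-hosted streaming server would likely be reached over WebSocket, so an http(s) base URL needs mapping to ws(s). The sketch below assumes a `baseUrl` field under `streaming.providers.<provider>`. That field name is hypothetical, since the issue only asks for "custom endpoints", and the helper names and paths are illustrative.

```typescript
// Hypothetical shape for a Gateway relay provider entry. The field name
// `baseUrl` is an assumption; the issue does not specify one.
interface StreamingProviderConfig {
  baseUrl?: string;
  apiKey?: string;
}

// Map an http(s) base URL to its ws(s) equivalent and append a path.
// ws(s) URLs pass through unchanged.
function toStreamingUrl(baseUrl: string, path: string): string {
  if (!/^(https?|wss?):\/\//.test(baseUrl)) {
    throw new Error(`Expected an http(s) or ws(s) URL, got: ${baseUrl}`);
  }
  const wsBase = baseUrl
    .replace(/^http(s?):\/\//, "ws$1://") // http -> ws, https -> wss
    .replace(/\/+$/, "");                 // drop trailing slashes
  const cleanPath = path.replace(/^\/+/, "");
  return `${wsBase}/${cleanPath}`;
}

// Use the custom endpoint when configured, otherwise the provider's default.
function pickStreamingUrl(
  cfg: StreamingProviderConfig,
  defaultUrl: string,
  path: string,
): string {
  return toStreamingUrl(cfg.baseUrl ?? defaultUrl, path);
}
```

Keeping the override optional means existing cloud configurations would continue to work unchanged, and only users who set the field are routed to their own server.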

Combined with existing messages.tts.providers.openai.baseUrl (local TTS) and talk.providers.mlx (local TTS), this enables fully local, high-quality realtime voice.
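As an illustration of what "fully local" could look like, the sketch below pairs the proposed STT override with the existing TTS override and checks that every endpoint stays on the local network. The ports are placeholders, and `isLocalEndpoint` is a hypothetical helper rather than part of OpenClaw.

```typescript
// `talk.realtime.providers.openai.baseUrl` is the proposed field;
// `messages.tts.providers.openai.baseUrl` is described in the issue as
// already supported. Ports are placeholders.
const localVoiceConfig = {
  talk: {
    realtime: {
      provider: "openai",
      providers: {
        openai: {
          baseUrl: "http://localhost:8080",
          apiKey: "not-needed",
          model: "whisper-large-v3",
        },
      },
    },
  },
  messages: {
    tts: {
      providers: {
        openai: { baseUrl: "http://localhost:8081" },
      },
    },
  },
};

// True when the URL's host is loopback or an RFC 1918 private IPv4 address.
function isLocalEndpoint(rawUrl: string): boolean {
  const host = new URL(rawUrl).hostname;
  if (host === "localhost" || host === "[::1]") return true;
  const octets = host.split(".").map(Number);
  const validIpv4 =
    octets.length === 4 &&
    octets.every((o) => Number.isInteger(o) && o >= 0 && o <= 255);
  if (!validIpv4) return false;
  const [a, b] = octets;
  return (
    a === 127 ||
    a === 10 ||
    (a === 192 && b === 168) ||
    (a === 172 && b >= 16 && b <= 31)
  );
}

const endpoints = [
  localVoiceConfig.talk.realtime.providers.openai.baseUrl,
  localVoiceConfig.messages.tts.providers.openai.baseUrl,
];
const fullyLocal = endpoints.every(isLocalEndpoint);
```

A check like this could back the privacy use case by warning when any configured voice endpoint would send audio off the local network.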

Use Cases

  1. Users in regions where OpenAI/Google APIs are inaccessible (GFW, etc.)
  2. Privacy-first users — voice data never leaves local network
  3. Zero API cost — self-host on Apple Silicon
  4. Custom language support — local models for regional languages

Environment

  • OpenClaw version: 2026.5.18
  • Hardware: Mac Mini M4 Pro (Gateway) + Mac Studio (model server)
  • Talk mode tested: macOS native (quality limited) and WebChat (API key required, unavailable)
