openclaw: [Feature]: Language-aware TTS routing for inbound voice replies [2 comments, 3 participants]

openclaw/openclaw#74722 (fetched 2026-04-30 06:20:46)

Summary

Add language-aware TTS provider/model/voice routing so auto-TTS replies can use the detected input language instead of a single fixed voice.

Problem to solve

OpenClaw currently supports messages.tts.auto: "inbound", which correctly controls when audio is returned: text input gets text output, voice/audio input gets spoken output.

However, the spoken output still uses one fixed configured TTS provider/model/voice. This is limiting for multilingual agents and language tutors.

Example desired behavior:

  • German text in → German text out
  • German voice in → German spoken reply using a German voice
  • English text in → English text out
  • English voice in → English spoken reply using an English voice
  • Spanish text in → Spanish text out
  • Spanish voice in → Spanish spoken reply using a Spanish voice

Today, an agent can decide whether to reply in the user’s language at the text level, but TTS synthesis remains tied to the fixed configured voice/model. A Spanish voice message may therefore get a Spanish text response synthesized by a German voice, or require manual per-agent/provider switching.

Proposed solution

Add optional language-aware routing to TTS configuration. auto: "inbound" would still decide whether audio should be generated, while a new routing layer decides which TTS provider/model/voice should synthesize the reply.

Possible config shape:

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "provider": "openai",
      "providers": {
        "openai": {
          "baseUrl": "http://local-german-tts/v1",
          "model": "orpheus-german-fix",
          "voice": "jana"
        },
        "spanish": {
          "provider": "openai",
          "baseUrl": "http://local-spanish-tts/v1",
          "model": "f5-spanish",
          "voice": "sofia"
        }
      },
      "languageRouting": {
        "enabled": true,
        "source": "input",
        "fallbackLanguage": "de",
        "routes": {
          "de": { "provider": "openai", "model": "orpheus-german-fix", "voice": "jana" },
          "en": { "provider": "openai", "model": "orpheus-en", "voice": "tara" },
          "es": { "provider": "spanish", "model": "f5-spanish", "voice": "sofia" }
        }
      }
    }
  }
}
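One way to read the config shape above is that a route names a key in `providers` and then overrides that entry's `model` and `voice`. The following is a minimal Python sketch of that reading, not OpenClaw's implementation: the function name `effective_settings` and the `providerKey` field are hypothetical, and the overlay rule (route keys win over provider-entry keys) is an assumption.

```python
from typing import Any

# The proposed config from the issue, reduced to the messages.tts subtree.
TTS_CONFIG: dict[str, Any] = {
    "auto": "inbound",
    "provider": "openai",
    "providers": {
        "openai": {
            "baseUrl": "http://local-german-tts/v1",
            "model": "orpheus-german-fix",
            "voice": "jana",
        },
        "spanish": {
            "provider": "openai",
            "baseUrl": "http://local-spanish-tts/v1",
            "model": "f5-spanish",
            "voice": "sofia",
        },
    },
    "languageRouting": {
        "enabled": True,
        "source": "input",
        "fallbackLanguage": "de",
        "routes": {
            "de": {"provider": "openai", "model": "orpheus-german-fix", "voice": "jana"},
            "en": {"provider": "openai", "model": "orpheus-en", "voice": "tara"},
            "es": {"provider": "spanish", "model": "f5-spanish", "voice": "sofia"},
        },
    },
}


def effective_settings(tts: dict[str, Any], language: str) -> dict[str, Any]:
    """Overlay the route for `language` on the provider entry it names.

    Hypothetical helper: assumes route["provider"] is a key in tts["providers"]
    and that route-level model/voice override the entry's defaults.
    """
    route = tts["languageRouting"]["routes"][language]
    entry = tts["providers"].get(route["provider"], {})
    # Keep the entry's own "provider" (the backend kind) and record the key separately.
    overrides = {k: v for k, v in route.items() if k != "provider"}
    return {**entry, **overrides, "providerKey": route["provider"]}


spanish = effective_settings(TTS_CONFIG, "es")
english = effective_settings(TTS_CONFIG, "en")
print(spanish["baseUrl"], spanish["voice"])
print(english["baseUrl"], english["voice"])
```

Under this reading, the `en` route reuses the `openai` entry's `baseUrl`, so English synthesis would go to the same endpoint as German with a different model and voice. If a separate English endpoint is intended, it would need its own `providers` entry.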

Routing source options could be:

  • input: use detected language from inbound audio transcription when available; for text input, use lightweight text language detection.
  • reply: use detected language of the assistant reply text.
  • auto: prefer transcript/input language, fall back to reply language.
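The three source options can be sketched as a small selection function. This is an illustrative Python sketch only: `pick_routing_language` is a hypothetical name, and it assumes the input and reply languages have already been detected (or are `None` when detection produced nothing).

```python
from typing import Optional


def pick_routing_language(
    source: str,
    input_language: Optional[str],
    reply_language: Optional[str],
) -> Optional[str]:
    """Choose which detected language drives TTS routing.

    - "input": language of the inbound message (transcript or text)
    - "reply": language of the assistant's reply text
    - "auto":  input language if known, otherwise reply language
    """
    if source == "input":
        return input_language
    if source == "reply":
        return reply_language
    if source == "auto":
        return input_language or reply_language
    raise ValueError(f"unknown languageRouting.source: {source!r}")
```

Returning `None` leaves the decision to the caller, which can then apply whatever fallback the configuration defines.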

For audio input, if the STT provider returns detected language metadata, OpenClaw should preserve it in the message context so TTS can route from it. If unavailable, OpenClaw can fall back to text language detection on the transcript or reply.
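A rough Python sketch of carrying the STT language through the message context and falling back to text detection follows. Everything here is hypothetical: the `InboundMessage` type, its field names, and the `detect_text` callback stand in for whatever OpenClaw's message context and language detector actually look like.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class InboundMessage:
    text: str                           # transcript for audio, raw text otherwise
    is_audio: bool
    stt_language: Optional[str] = None  # language reported by the STT provider, if any


def normalize_language(tag: Optional[str]) -> Optional[str]:
    """Reduce a tag such as 'de-DE' or 'ES_mx' to its lowercase primary subtag."""
    if not tag:
        return None
    primary = tag.strip().replace("_", "-").split("-")[0].lower()
    return primary or None


def input_language(
    msg: InboundMessage,
    detect_text: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Prefer STT metadata for audio input; otherwise detect from the text."""
    from_stt = normalize_language(msg.stt_language) if msg.is_audio else None
    if from_stt:
        return from_stt
    return normalize_language(detect_text(msg.text))
```

The normalizer only handles code-style tags. An STT provider that reports a language name rather than a code would need an extra mapping step before route lookup.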

This should also support per-agent overrides via agents.list[].tts, deep-merging with global messages.tts, consistent with existing TTS override behavior.
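To illustrate what deep-merging a per-agent override would mean for `languageRouting`, here is a generic recursive merge in Python. It is a sketch of the general technique, not OpenClaw's actual merge logic, whose handling of edge cases is not described in the issue. The voice name `leo` is made up for the example.

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where nested dicts are merged and other values are replaced."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


global_tts = {
    "auto": "inbound",
    "languageRouting": {
        "enabled": True,
        "fallbackLanguage": "de",
        "routes": {
            "de": {"provider": "openai", "model": "orpheus-german-fix", "voice": "jana"},
            "en": {"provider": "openai", "model": "orpheus-en", "voice": "tara"},
        },
    },
}

# Hypothetical agents.list[].tts override: change one voice and the fallback language.
agent_tts = {
    "languageRouting": {
        "fallbackLanguage": "en",
        "routes": {"en": {"voice": "leo"}},
    },
}

agent_effective = deep_merge(global_tts, agent_tts)
print(agent_effective["languageRouting"]["routes"]["en"])
```

With a merge like this, an agent only needs to state the keys it changes; the `de` route and the `en` route's provider and model are inherited from the global config.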

Alternatives considered

  1. Create one agent per language with fixed TTS settings.
    • Works for dedicated language bots, but not for multilingual conversations or users switching languages mid-chat.
  2. Use prompt instructions to force the assistant to reply in a certain language.
    • Helps text generation, but does not select the correct TTS voice/model.
  3. Manually switch /tts provider or config before conversations.
    • Too much manual work and not viable for mixed-language voice chats.
  4. Use one multilingual TTS voice for all languages.
    • Sometimes acceptable, but quality and accent are worse than dedicated voices/models, especially for local custom TTS stacks.

Impact

Affected users/systems/channels:

  • Telegram/voice-note users
  • Multilingual assistants
  • Language tutor agents
  • Self-hosted/local TTS setups with separate models per language

Severity: Medium. Existing auto: "inbound" handles text-vs-voice modality well, but multilingual voice output feels wrong when the voice/model language is fixed.

Frequency: Common for bilingual/multilingual users and language-learning workflows; occurs whenever the user switches language.

Consequences:

  • Wrong-language/accent TTS output
  • Extra manual config/provider switching
  • Need for separate agents per language
  • Less natural voice UX despite correct text replies

Evidence/examples

Example setup:

  • German TTS: local OpenAI-compatible Orpheus endpoint, model orpheus-german-fix, voice jana
  • Spanish TTS: local OpenAI-compatible F5-Spanish endpoint, model f5-spanish, voice sofia
  • Audio transcription: local OpenAI-compatible Whisper endpoint

Current behavior: auto: "inbound" correctly returns audio only when the user sends audio, but the selected TTS provider/voice is fixed.

Desired behavior: A Spanish voice note should produce a Spanish spoken reply with the Spanish TTS route; a German voice note should use the German TTS route; text messages should remain text-only unless explicitly configured otherwise.

Additional information

This should be backward-compatible:

  • If languageRouting is absent or disabled, current TTS behavior remains unchanged.
  • Existing provider, providers, model, voice, and per-agent tts overrides should continue to work.
  • If language detection fails or no route matches, use the current configured provider/model/voice as fallback.
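The backward-compatibility rules above can be sketched as a guard that returns no route whenever routing should not apply, so the caller keeps using the existing provider/model/voice. This Python sketch is one interpretation, not a specification: `select_route` is a hypothetical name, and trying `fallbackLanguage` before giving up is an assumption, since the issue does not spell out how `fallbackLanguage` interacts with the configured default.

```python
from typing import Any, Optional


def select_route(
    tts: dict[str, Any],
    detected_language: Optional[str],
) -> Optional[dict[str, Any]]:
    """Return the matching language route, or None to keep current TTS behavior."""
    routing = tts.get("languageRouting")
    # Absent or disabled routing: behave exactly as before.
    if not routing or not routing.get("enabled", False):
        return None
    routes = routing.get("routes", {})
    # Assumed order: detected language first, then the configured fallback language.
    for candidate in (detected_language, routing.get("fallbackLanguage")):
        if candidate and candidate in routes:
            return routes[candidate]
    # No match at all: caller falls back to the configured provider/model/voice.
    return None
```

A `None` result means "use `provider`, `providers`, `model`, and `voice` as they are resolved today", which keeps configurations without `languageRouting` unchanged.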

It may also be useful to expose the detected STT language in logs/status or transcript metadata for debugging.

extent analysis

TL;DR

Implement language-aware TTS routing by adding a languageRouting configuration to the messages.tts settings, allowing for dynamic selection of TTS providers and voices based on the detected input language.

Guidance

  • Add a languageRouting section to the messages.tts configuration, including enabled, source, fallbackLanguage, and routes settings.
  • Define routes for each supported language, specifying the corresponding TTS provider, model, and voice.
  • Use the source option to choose which detected language drives routing: input (the inbound transcript or text), reply (the assistant's reply text), or auto (input first, then reply). fallbackLanguage is a separate setting.
  • Ensure backward compatibility by preserving existing TTS behavior when languageRouting is absent or disabled.

Example

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "languageRouting": {
        "enabled": true,
        "source": "input",
        "fallbackLanguage": "de",
        "routes": {
          "de": { "provider": "openai", "model": "orpheus-german-fix", "voice": "jana" },
          "en": { "provider": "openai", "model": "orpheus-en", "voice": "tara" },
          "es": { "provider": "spanish", "model": "f5-spanish", "voice": "sofia" }
        }
      }
    }
  }
}

Notes

The proposed solution requires updates to the TTS configuration and potentially the STT transcription pipeline to preserve detected language metadata. The languageRouting feature should be thoroughly tested to ensure correct behavior for various input languages and scenarios.

Recommendation

Implement the proposed languageRouting configuration, as it provides a flexible and scalable way to support multilingual TTS. This approach allows for easy addition of new languages and TTS
