openclaw - 💡(How to fix) Fix [Feature]: Language-aware TTS routing for inbound voice replies [2 comments, 3 participants]

Summary

Add language-aware TTS provider/model/voice routing so auto-TTS replies can use the detected input language instead of a single fixed voice.

Problem to solve

OpenClaw currently supports messages.tts.auto: "inbound", which correctly controls when audio is returned: text input gets text output, voice/audio input gets spoken output.

However, the spoken output still uses one fixed configured TTS provider/model/voice. This is limiting for multilingual agents and language tutors.

Example desired behavior:

German text in → German text out
German voice in → German spoken reply using a German voice
English text in → English text out
English voice in → English spoken reply using an English voice
Spanish text in → Spanish text out
Spanish voice in → Spanish spoken reply using a Spanish voice

Today, an agent can decide whether to reply in the user’s language at the text level, but TTS synthesis remains tied to the fixed configured voice/model. A Spanish voice message may therefore get a Spanish text response synthesized by a German voice, or require manual per-agent/provider switching.

Proposed solution

Add optional language-aware routing to TTS configuration. auto: "inbound" would still decide whether audio should be generated, while a new routing layer decides which TTS provider/model/voice should synthesize the reply.
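The separation between the two decisions can be sketched as follows. This is a minimal illustration in Python, not OpenClaw's implementation: should_speak, plan_reply, inbound_kind, and the returned dict shape are hypothetical names, and only the "inbound" mode described above is modeled.

```python
from typing import Optional

# Hypothetical per-language routes (provider/model/voice), for illustration only.
ROUTES = {
    "de": {"provider": "openai", "model": "orpheus-german-fix", "voice": "jana"},
    "es": {"provider": "spanish", "model": "f5-spanish", "voice": "sofia"},
}


def should_speak(auto_mode: str, inbound_kind: str) -> bool:
    """Step 1: decide WHETHER audio is generated (only "inbound" is modeled)."""
    return auto_mode == "inbound" and inbound_kind == "audio"


def plan_reply(auto_mode: str, inbound_kind: str, language: Optional[str]) -> dict:
    """Step 2: if audio is wanted, decide WHICH route synthesizes it."""
    if not should_speak(auto_mode, inbound_kind):
        return {"mode": "text"}
    # A route of None means nothing matched; the caller would keep the fixed voice.
    return {"mode": "audio", "route": ROUTES.get(language)}
```

Keeping the gate and the route lookup in separate functions means the existing auto behavior does not need to change when routing is added.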

Possible config shape:

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "provider": "openai",
      "providers": {
        "openai": {
          "baseUrl": "http://local-german-tts/v1",
          "model": "orpheus-german-fix",
          "voice": "jana"
        },
        "spanish": {
          "provider": "openai",
          "baseUrl": "http://local-spanish-tts/v1",
          "model": "f5-spanish",
          "voice": "sofia"
        }
      },
      "languageRouting": {
        "enabled": true,
        "source": "input",
        "fallbackLanguage": "de",
        "routes": {
          "de": { "provider": "openai", "model": "orpheus-german-fix", "voice": "jana" },
          "en": { "provider": "openai", "model": "orpheus-en", "voice": "tara" },
          "es": { "provider": "spanish", "model": "f5-spanish", "voice": "sofia" }
        }
      }
    }
  }
}

Routing source options could be:

input: use detected language from inbound audio transcription when available; for text input, use lightweight text language detection.
reply: use detected language of the assistant reply text.
auto: prefer transcript/input language, fall back to reply language.

For audio input, if the STT provider returns detected language metadata, OpenClaw should preserve it in the message context so TTS can route from it. If unavailable, OpenClaw can fall back to text language detection on the transcript or reply.
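One way the three source options could resolve a language code is sketched below in Python. The function name and parameters are hypothetical, and language detection itself is assumed to happen elsewhere; the sketch only shows the precedence between STT metadata, input text detection, and reply text detection.

```python
from typing import Optional


def resolve_language(
    source: str,
    stt_language: Optional[str] = None,
    input_text_language: Optional[str] = None,
    reply_language: Optional[str] = None,
) -> Optional[str]:
    """Pick the language code used for TTS routing.

    stt_language: language reported by the STT provider, if any.
    input_text_language: text detection result for the input or transcript.
    reply_language: text detection result for the assistant reply.
    """
    # On the inbound side, prefer STT metadata over text detection.
    input_language = stt_language or input_text_language
    if source == "input":
        return input_language
    if source == "reply":
        return reply_language
    if source == "auto":
        return input_language or reply_language
    raise ValueError(f"unknown languageRouting.source: {source!r}")
```

A return value of None signals that nothing was detected, which leaves the fallback decision to the caller.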

This should also support per-agent overrides via agents.list[].tts, deep-merging with global messages.tts, consistent with existing TTS override behavior.
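The deep-merge behavior for a per-agent override might look like the following Python sketch. The deep_merge helper and the sample agent override are illustrative assumptions; OpenClaw's exact merge rules (for example, how lists are handled) are not specified here.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where override wins; nested dicts merge key by key."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Global messages.tts settings (abbreviated).
global_tts = {
    "auto": "inbound",
    "languageRouting": {
        "enabled": True,
        "fallbackLanguage": "de",
        "routes": {"de": {"provider": "openai", "voice": "jana"}},
    },
}

# Hypothetical agents.list[].tts override, e.g. for a Spanish tutor agent.
agent_tts = {
    "languageRouting": {
        "fallbackLanguage": "es",
        "routes": {"es": {"provider": "spanish", "voice": "sofia"}},
    }
}

effective = deep_merge(global_tts, agent_tts)
```

With this merge, the agent inherits auto and the German route from the global settings while changing only its fallback language and adding a Spanish route.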

Alternatives considered

Create one agent per language with fixed TTS settings.
- Works for dedicated language bots, but not for multilingual conversations or users switching languages mid-chat.
Use prompt instructions to force the assistant to reply in a certain language.
- Helps text generation, but does not select the correct TTS voice/model.
Manually switch /tts provider or config before conversations.
- Too much manual work and not viable for mixed-language voice chats.
Use one multilingual TTS voice for all languages.
- Sometimes acceptable, but quality and accent are worse than dedicated voices/models, especially for local custom TTS stacks.

Impact

Affected users/systems/channels:

Telegram/voice-note users
Multilingual assistants
Language tutor agents
Self-hosted/local TTS setups with separate models per language

Severity: Medium. Existing auto: "inbound" handles text-vs-voice modality well, but multilingual voice output feels wrong when the voice/model language is fixed.

Frequency: Common for bilingual/multilingual users and language-learning workflows; occurs whenever the user switches language.

Consequences:

Wrong-language/accent TTS output
Extra manual config/provider switching
Need for separate agents per language
Less natural voice UX despite correct text replies

Evidence/examples

Example setup:

German TTS: local OpenAI-compatible Orpheus endpoint, model orpheus-german-fix, voice jana
Spanish TTS: local OpenAI-compatible F5-Spanish endpoint, model f5-spanish, voice sofia
Audio transcription: local OpenAI-compatible Whisper endpoint

Current behavior: auto: "inbound" correctly returns audio only when the user sends audio, but the selected TTS provider/voice is fixed.

Desired behavior: A Spanish voice note should produce a Spanish spoken reply with the Spanish TTS route; a German voice note should use the German TTS route; text messages should remain text-only unless explicitly configured otherwise.

Additional information

This should be backward-compatible:

If languageRouting is absent or disabled, current TTS behavior remains unchanged.
Existing provider, providers, model, voice, and per-agent tts overrides should continue to work.
If language detection fails or no route matches, use the current configured provider/model/voice as fallback.
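These fallback rules can be sketched in Python as below. The function name is hypothetical, and two points are assumptions rather than settled design: that the fixed settings are read from providers[provider], and that fallbackLanguage is tried after the detected language but before the fixed settings.

```python
from typing import Optional


def select_tts_settings(tts: dict, detected_language: Optional[str] = None) -> dict:
    """Return the provider/model/voice to use for one spoken reply."""
    # Pre-existing fixed settings: top-level provider plus its provider block
    # (assumed layout, based on the proposed config shape).
    provider = tts.get("provider")
    block = tts.get("providers", {}).get(provider, {})
    fixed = {
        "provider": provider,
        "model": block.get("model"),
        "voice": block.get("voice"),
    }

    routing = tts.get("languageRouting") or {}
    if not routing.get("enabled"):
        return fixed  # absent or disabled: behavior unchanged

    routes = routing.get("routes", {})
    # Assumed order: detected language first, then fallbackLanguage.
    for language in (detected_language, routing.get("fallbackLanguage")):
        if language in routes:
            return routes[language]
    return fixed  # detection failed and no route matched
```

Because every path that cannot resolve a route returns the fixed settings, a config without languageRouting produces the same result as today.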

It may also be useful to expose the detected STT language in logs/status or transcript metadata for debugging.

extent analysis

TL;DR

Implement language-aware TTS routing by adding a languageRouting configuration to the messages.tts settings, allowing for dynamic selection of TTS providers and voices based on the detected input language.

Guidance

Add a languageRouting section to the messages.tts configuration, including enabled, source, fallbackLanguage, and routes settings.
Define routes for each supported language, specifying the corresponding TTS provider, model, and voice.
Use the source option to choose whether routing follows the language detected from the inbound message (input), the language of the reply text (reply), or the input language with a fallback to the reply language (auto).
Ensure backward compatibility by preserving existing TTS behavior when languageRouting is absent or disabled.

Example

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "languageRouting": {
        "enabled": true,
        "source": "input",
        "fallbackLanguage": "de",
        "routes": {
          "de": { "provider": "openai", "model": "orpheus-german-fix", "voice": "jana" },
          "en": { "provider": "openai", "model": "orpheus-en", "voice": "tara" },
          "es": { "provider": "spanish", "model": "f5-spanish", "voice": "sofia" }
        }
      }
    }
  }
}

Notes

The proposed solution requires updates to the TTS configuration and potentially the STT transcription pipeline to preserve detected language metadata. The languageRouting feature should be thoroughly tested to ensure correct behavior for various input languages and scenarios.

Recommendation

Implement the proposed languageRouting configuration, as it provides a flexible and scalable solution for multilingual TTS support. This approach allows for easy addition of new languages and TTS
