openclaw - 💡(How to fix) Fix [Feature]: xAI TTS/STT Speech Provider



Summary

Add native xAI TTS/STT speech provider support for cost-effective voice features

Problem to solve

OpenClaw currently lacks native support for xAI's TTS/STT APIs, which offer significantly lower costs (~$2-6/M tokens) and high-quality voices. Users who want cost-effective voice features must implement custom workarounds instead of using a simple config change.

Proposed solution

Add a new speech provider extension for xAI in the existing xAI plugin (similar to the OpenAI speech provider). The provider would:

  1. Support TTS synthesis with all 5 xAI voices (eve, ara, rex, sal, leo)
  2. Support STT transcription with word-level timestamps and speaker diarization
  3. Allow configuration via messages.tts.providers.xai in openclaw.json
  4. Enable voice selection via directive tokens (e.g., [[voice: rex]])
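Item 4 could be handled by a small parser that pulls a `[[voice: …]]` directive out of the message text and checks it against the five voices listed in this issue. This is a minimal sketch that assumes the directive syntax shown above; `parseVoiceDirective` and its return shape are hypothetical and are not OpenClaw's actual directive handling.

```typescript
// Voice IDs as listed in this issue; confirm them against the xAI TTS docs.
const XAI_VOICES = ["eve", "ara", "rex", "sal", "leo"] as const;
type XaiVoice = (typeof XAI_VOICES)[number];

function isXaiVoice(value: string): value is XaiVoice {
  return (XAI_VOICES as readonly string[]).indexOf(value) !== -1;
}

interface DirectiveResult {
  text: string; // message text with the directive removed
  voice?: XaiVoice; // selected voice, only set when the name is a known xAI voice
}

// Hypothetical helper: finds the first [[voice: name]] token (case-insensitive).
// An unknown voice name is still stripped from the text but otherwise ignored.
function parseVoiceDirective(message: string): DirectiveResult {
  const pattern = /\[\[\s*voice\s*:\s*([a-z]+)\s*\]\]/i;
  const match = pattern.exec(message);
  if (!match) {
    return { text: message };
  }
  const candidate = match[1].toLowerCase();
  // Remove the directive and collapse the double space it leaves behind.
  const text = message.replace(match[0], "").replace(/\s{2,}/g, " ").trim();
  return isXaiVoice(candidate) ? { text, voice: candidate } : { text };
}
```

Keeping the parser independent of the provider means the same directive can later drive Discord voice playback without touching the TTS request code.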

I have drafted a complete implementation (~300 lines of TypeScript) following the existing OpenAI speech provider pattern and am willing to submit a PR.

Alternatives considered

  1. Continue using OpenAI/DeepInfra TTS — Works but more expensive and lower quality

  2. Custom script workaround — Users can call xAI API directly via exec, but this bypasses OpenClaw's TTS tool, requires manual file handling, and doesn't integrate with Discord voice or directive tokens

  3. Use Microsoft Edge TTS — Free but requires different setup, fewer voice personality options, and no STT integration

  4. ElevenLabs — High quality but premium pricing (more expensive than OpenAI)

Impact

Affected users: OpenClaw users who want voice features (TTS for message playback, STT for voice input, future Discord voice channel integration)

Severity: Not a blocker — current TTS providers work fine. However, the high cost of existing providers (OpenAI, ElevenLabs) is a barrier for:

  • Users running high-volume voice features
  • Hobbyists and small projects on tight budgets
  • Anyone wanting to experiment with premium voice at a lower cost

Frequency: Always present for users considering voice features

Consequences:

  • Users pay 3-5x more than necessary for TTS
  • Some users may avoid voice features entirely due to cost
  • Manual workarounds (custom scripts) are needed for xAI integration
  • Missed opportunity for community benefit from xAI's competitive pricing

This is a quality-of-life and cost optimization enhancement rather than a critical fix.

Evidence/examples

xAI TTS Documentation: https://docs.x.ai/developers/model-capabilities/audio/text-to-speech

xAI STT Documentation: https://docs.x.ai/developers/model-capabilities/audio/speech-to-text

Available Voices:

  • eve (energetic, upbeat)
  • ara (warm, friendly)
  • rex (confident, professional)
  • sal (smooth, balanced)
  • leo (authoritative, strong)

Supported Formats: MP3, WAV, PCM, μ-law, A-law (8kHz–48kHz)
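A provider would need to map each output format to a file extension and MIME type and reject sample rates outside the stated 8 to 48 kHz range. The sketch below assumes the codec identifiers `mp3`, `wav`, `pcm`, `mulaw`, and `alaw`, assumes 16-bit samples for raw PCM, and picks the MIME types itself; the real parameter names and accepted rates should be taken from the xAI TTS documentation.

```typescript
// Hypothetical codec identifiers; the xAI API may name them differently.
type XaiAudioFormat = "mp3" | "wav" | "pcm" | "mulaw" | "alaw";

interface FormatInfo {
  extension: string;
  mimeType: string;
}

const FORMAT_INFO: Record<XaiAudioFormat, FormatInfo> = {
  mp3: { extension: "mp3", mimeType: "audio/mpeg" },
  wav: { extension: "wav", mimeType: "audio/wav" },
  pcm: { extension: "pcm", mimeType: "audio/L16" }, // raw PCM, 16-bit assumed
  mulaw: { extension: "ulaw", mimeType: "audio/PCMU" }, // G.711 μ-law
  alaw: { extension: "alaw", mimeType: "audio/PCMA" }, // G.711 A-law
};

// Range taken from the "Supported Formats" line in this issue.
const MIN_SAMPLE_RATE_HZ = 8000;
const MAX_SAMPLE_RATE_HZ = 48000;

// Validates the sample rate and returns the metadata needed to save/serve the audio.
function describeOutput(
  format: XaiAudioFormat,
  sampleRateHz: number,
): FormatInfo & { sampleRateHz: number } {
  const outOfRange = sampleRateHz < MIN_SAMPLE_RATE_HZ || sampleRateHz > MAX_SAMPLE_RATE_HZ;
  // Math.floor(NaN) !== NaN, so NaN is rejected here as well.
  if (Math.floor(sampleRateHz) !== sampleRateHz || outOfRange) {
    throw new RangeError(
      `Unsupported sample rate ${sampleRateHz} Hz (expected ${MIN_SAMPLE_RATE_HZ}-${MAX_SAMPLE_RATE_HZ})`,
    );
  }
  const info = FORMAT_INFO[format];
  return { extension: info.extension, mimeType: info.mimeType, sampleRateHz };
}
```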

Languages: 20+ languages with auto-detect support

Pricing Reference: https://docs.x.ai/developers/model-capabilities/audio/voice-agent#pricing (approximately $2-6 per million tokens vs OpenAI's $15-30/M)
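To see what these ranges mean for a given volume, the helper below estimates spend per provider. It only uses the per-million-token figures quoted in this issue, which have not been checked against current xAI or OpenAI price lists; substitute current prices before relying on the output.

```typescript
interface PriceRange {
  lowUsdPerMillion: number;
  highUsdPerMillion: number;
}

// Figures as quoted in this issue; not verified against current price lists.
const XAI_QUOTED: PriceRange = { lowUsdPerMillion: 2, highUsdPerMillion: 6 };
const OPENAI_QUOTED: PriceRange = { lowUsdPerMillion: 15, highUsdPerMillion: 30 };

// Returns [lowest, highest] estimated cost in USD for the given token count.
function costRangeUsd(tokens: number, price: PriceRange): [number, number] {
  const millions = tokens / 1e6;
  return [millions * price.lowUsdPerMillion, millions * price.highUsdPerMillion];
}

// Ratio of the two range midpoints: a rough "times cheaper" figure.
function midpointRatio(expensive: PriceRange, cheap: PriceRange): number {
  const mid = (p: PriceRange) => (p.lowUsdPerMillion + p.highUsdPerMillion) / 2;
  return mid(expensive) / mid(cheap);
}
```

With these inputs, 5 million tokens comes to $10 to $30 at the quoted xAI rates versus $75 to $150 at the quoted OpenAI rates, and `midpointRatio(OPENAI_QUOTED, XAI_QUOTED)` is 5.625.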

Existing Pattern: OpenClaw already has speech provider extensions for OpenAI, Microsoft Edge, MiniMax, ElevenLabs, and Vydra. The xAI provider would follow the same architecture.

Additional information

I could submit a PR with the implementation I have drafted:

  1. xai-speech-provider.ts — Full speech provider implementation (~300 lines)

    • TTS synthesis with all 5 voices
    • STT transcription with word-level timestamps
    • Speaker diarization support
    • Multiple codec support (MP3, WAV, PCM, μ-law, A-law)
  2. Integration guide with step-by-step instructions for:

    • File placement in the xAI extension
    • Registering the provider in the plugin index
    • Configuration examples
    • Testing commands
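Word-level timestamps with speaker diarization become most useful once consecutive words from the same speaker are merged into turns. The sketch below assumes each transcribed word carries `text`, `start`, `end` (in seconds), and a `speaker` label, and that words arrive in chronological order; that response shape is hypothetical and must be matched to the actual xAI STT output.

```typescript
// Hypothetical shape of one word in a diarized STT response.
interface TranscribedWord {
  text: string;
  start: number; // seconds
  end: number; // seconds
  speaker: string;
}

interface SpeakerTurn {
  speaker: string;
  start: number;
  end: number;
  text: string;
}

// Merges consecutive words from the same speaker into one turn.
// Assumes the input is already sorted by start time.
function groupIntoTurns(words: TranscribedWord[]): SpeakerTurn[] {
  const turns: SpeakerTurn[] = [];
  for (const word of words) {
    const last = turns[turns.length - 1];
    if (last && last.speaker === word.speaker) {
      // Same speaker keeps talking: extend the current turn.
      last.end = word.end;
      last.text += " " + word.text;
    } else {
      // Speaker changed (or first word): start a new turn.
      turns.push({ speaker: word.speaker, start: word.start, end: word.end, text: word.text });
    }
  }
  return turns;
}
```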

This would be my first open-source contribution, so I'd appreciate guidance on:

  • Preferred file structure/location within the extensions/xai/ directory
  • Testing requirements or conventions
  • Any xAI-specific considerations

xAI API Key: Uses existing XAI_API_KEY environment variable (already supported by OpenClaw's xAI plugin for other features like web search and code execution).
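A sketch of how the provider might resolve its settings: values under `messages.tts.providers.xai` take precedence, and the API key falls back to `XAI_API_KEY`. The field names (`apiKey`, `voice`, `format`, `sampleRateHz`) and the defaults are assumptions for illustration, not OpenClaw's actual config schema.

```typescript
// Hypothetical shape of messages.tts.providers.xai in openclaw.json.
interface XaiSpeechConfig {
  apiKey?: string;
  voice?: string;
  format?: string;
  sampleRateHz?: number;
}

interface ResolvedXaiSpeechConfig {
  apiKey: string;
  voice: string;
  format: string;
  sampleRateHz: number;
}

// Assumed defaults; the maintainers may prefer different ones.
const DEFAULTS = { voice: "eve", format: "mp3", sampleRateHz: 24000 };

// env is passed in (for example process.env) so the function is easy to test.
function resolveXaiSpeechConfig(
  config: XaiSpeechConfig | undefined,
  env: Record<string, string | undefined>,
): ResolvedXaiSpeechConfig {
  // `||` so an empty string in the config still falls back to the env var.
  const apiKey = config?.apiKey || env["XAI_API_KEY"];
  if (!apiKey) {
    throw new Error("xAI speech provider: set XAI_API_KEY or messages.tts.providers.xai.apiKey");
  }
  return {
    apiKey,
    voice: config?.voice ?? DEFAULTS.voice,
    format: config?.format ?? DEFAULTS.format,
    sampleRateHz: config?.sampleRateHz ?? DEFAULTS.sampleRateHz,
  };
}
```

Reusing the same environment variable as the rest of the xAI plugin means existing users would not need a second credential.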

And yes, my OpenClaw agent helped me draft this, but I am a coder and discussed and reviewed it extensively!

extent analysis

TL;DR

To add native xAI TTS/STT speech provider support, implement a new speech provider extension for xAI in the existing xAI plugin, following the pattern of the OpenAI speech provider.

Guidance

  1. Review the proposed solution: Examine the drafted implementation (~300 lines of TypeScript) for the xAI speech provider, ensuring it aligns with OpenClaw's architecture and supports TTS synthesis, STT transcription, and configuration via messages.tts.providers.xai in openclaw.json.
  2. Verify xAI API compatibility: Confirm that the xAI API key environment variable (XAI_API_KEY) is properly utilized and that the implementation supports all 5 xAI voices and required features like word-level timestamps and speaker diarization.
  3. Test the implementation: Follow the provided integration guide and testing commands to ensure the xAI speech provider works as expected, including voice selection via directive tokens and compatibility with different audio formats.
  4. Submit a PR with documentation: Include the implementation, integration guide, and any necessary adjustments to the plugin index, ensuring that the contribution follows OpenClaw's coding standards and testing conventions.

Example

The full implementation is too large to reproduce here, but the xai-speech-provider.ts file should contain the core logic for TTS synthesis and STT transcription, calling the xAI API and following OpenClaw's speech provider extension pattern.
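At a much smaller scale, the TTS path can be sketched as a pure request builder plus a thin async wrapper that takes an injectable `fetch`, which keeps it unit-testable without network access. The endpoint path and JSON field names (`text`, `voice`, `format`, `sample_rate`) are placeholders rather than the documented xAI API, and Bearer-token auth is an assumption; take the real values from the xAI text-to-speech documentation.

```typescript
// Minimal fetch-compatible signature so the sketch needs no DOM typings.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; arrayBuffer(): Promise<ArrayBuffer> }>;

interface TtsParams {
  text: string;
  voice: string;
  format: string;
  sampleRateHz: number;
}

// PLACEHOLDER path: replace with the endpoint from the xAI TTS docs.
const XAI_TTS_URL = "https://api.x.ai/<tts-endpoint>";

// Pure function: builds the request without sending it.
function buildTtsRequest(apiKey: string, params: TtsParams) {
  return {
    url: XAI_TTS_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        "Content-Type": "application/json",
      },
      // Placeholder field names; align with the real request schema.
      body: JSON.stringify({
        text: params.text,
        voice: params.voice,
        format: params.format,
        sample_rate: params.sampleRateHz,
      }),
    },
  };
}

// Sends the request and returns the raw audio bytes.
async function synthesize(apiKey: string, params: TtsParams, fetchImpl: FetchLike): Promise<Uint8Array> {
  if (params.text.trim() === "") {
    throw new Error("xAI TTS: empty text");
  }
  const { url, init } = buildTtsRequest(apiKey, params);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    // Report only the status code so the API key never ends up in logs.
    throw new Error(`xAI TTS request failed with HTTP ${response.status}`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

In production `fetchImpl` would simply be the global `fetch`; in tests a stub returning a fixed buffer exercises the same code path.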

Notes

The success of this implementation depends on the accuracy of the drafted code and its adherence to OpenClaw's architecture and testing standards. It's also crucial to ensure that the xAI API key is securely handled and that the implementation does not introduce any security vulnerabilities.
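One concrete way to keep the key out of logs is to mask it in any message or header set before it is printed. The two helpers below are hypothetical utilities, a minimal sketch rather than anything OpenClaw is known to provide.

```typescript
// Replaces every occurrence of the secret in a log message.
function redactSecret(message: string, secret: string): string {
  if (secret.length === 0) {
    return message; // nothing to redact; avoid splitting on ""
  }
  return message.split(secret).join("[REDACTED]");
}

// Returns a copy of the headers with Authorization masked (case-insensitive).
function redactHeaders(headers: Record<string, string>): Record<string, string> {
  const copy: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    copy[name] = name.toLowerCase() === "authorization" ? "[REDACTED]" : headers[name];
  }
  return copy;
}
```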

Recommendation

Implement the proposed xAI speech provider extension. It offers a cost-effective option with high-quality voices and follows the same architecture as OpenClaw's existing speech providers, so users could switch to xAI with a simple configuration change instead of maintaining custom workarounds.
