openclaw - ✅(Solved) Fix [Bug]: Groq Orpheus TTS fails with "response_format must be one of [wav]" — OpenAI provider hardcodes mp3/opus [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#62215 · Fetched 2026-04-08 03:07:33
Comments: 1 · Participants: 2 · Timeline events: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1

OpenClaw's TTS tool cannot use Groq's Orpheus text-to-speech models because the OpenAI-compatible speech provider hardcodes response_format to mp3 or opus, but Groq's Orpheus endpoint only accepts wav format.

Error Message

TTS conversion failed: openai: OpenAI TTS API error (400): response_format must be one of [wav] [type=invalid_request_error]

Root Cause

OpenClaw's OpenAI speech provider (speech-provider-FUNXvtFQ.js) hardcodes the response format:

const responseFormat = req.target === "voice-note" ? "opus" : "mp3";

Groq's Orpheus TTS API only supports wav format per official docs:

curl https://api.groq.com/openai/v1/audio/speech \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-v1-english",
    "input": "Welcome to Orpheus text-to-speech.",
    "voice": "austin",
    "response_format": "wav"
  }' \
  --output orpheus-english.wav

Fix Action

Fixed

PR fix notes

PR #62233: fix(tts): use wav for Groq speech on OpenAI provider

Description (problem / solution / changelog)

Summary

  • Problem: the OpenAI speech provider always picked mp3 for audio files and opus for voice notes, so Groq Orpheus requests were sent with unsupported formats.
  • Why it matters: Groq TTS requests fail before audio delivery, which blocks TTS for Groq users configured through the OpenAI-compatible provider path.
  • What changed: the OpenAI speech provider now auto-selects wav for Groq endpoints, honors an explicit responseFormat override for proxied OpenAI-compatible backends, and only marks voice-note output as voice-compatible when the actual format is opus.
  • What did NOT change (scope boundary): this PR does not add a new dedicated Groq TTS provider or change telephony synthesis behavior.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #62215
  • Related #62215
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: extensions/openai/speech-provider.ts derived responseFormat only from the synthesis target and never from the configured endpoint, so Groq-compatible OpenAI TTS requests were always sent as mp3 or opus.
  • Missing detection / guardrail: the provider had no regression test covering Groq-compatible speech endpoints or non-opus voice-note metadata.
  • Contributing context (if known): issue #62215 reported Groq Orpheus failing because the OpenAI-compatible provider path hardcoded unsupported formats.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/openai/speech-provider.test.ts
  • Scenario the test should lock in: Groq-compatible base URLs send wav, and explicit responseFormat: "wav" overrides keep voice-note metadata honest by returning voiceCompatible: false.
  • Why this is the smallest reliable guardrail: the bug is in a small provider-local format selection branch, and the provider unit tests can assert both the outgoing request body and returned metadata directly.
  • Existing test that already covers this (if any): N/A
  • If no new test is added, why not: N/A
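The scenario in the test plan above can be sketched without network access by injecting a recording fake in place of fetch. This is only an illustration: `synthesize`, `createRecordingFetch`, and `checkGroqVoiceNote` are hypothetical names, and the stand-in provider is far simpler than the real code in extensions/openai/speech-provider.ts.

```javascript
// Hypothetical stand-in for the provider under test. The real provider has a
// different shape; this only mirrors the behavior the test plan describes.
async function synthesize(config, req, fetchImpl) {
  const isGroq = String(config.baseUrl ?? "").includes("groq.com");
  const format =
    config.responseFormat ??
    (isGroq ? "wav" : req.target === "voice-note" ? "opus" : "mp3");
  const res = await fetchImpl(`${config.baseUrl}/audio/speech`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: config.model,
      input: req.text,
      voice: config.voice,
      response_format: format,
    }),
  });
  return {
    audio: await res.arrayBuffer(),
    voiceCompatible: req.target === "voice-note" && format === "opus",
  };
}

// Fake fetch that records each request instead of touching the network.
function createRecordingFetch() {
  const calls = [];
  const fetchImpl = async (url, init) => {
    calls.push({ url, body: JSON.parse(init.body) });
    return { ok: true, arrayBuffer: async () => new ArrayBuffer(0) };
  };
  return { calls, fetchImpl };
}

// Runs one voice-note synthesis against a Groq base URL and reports
// the format that was sent plus the returned metadata.
async function checkGroqVoiceNote() {
  const { calls, fetchImpl } = createRecordingFetch();
  const result = await synthesize(
    {
      baseUrl: "https://api.groq.com/openai/v1",
      model: "canopylabs/orpheus-v1-english",
      voice: "austin",
    },
    { text: "hello", target: "voice-note" },
    fetchImpl,
  );
  return {
    sentFormat: calls[0].body.response_format,
    voiceCompatible: result.voiceCompatible,
  };
}
```

A real test would assert that `sentFormat` is "wav" and `voiceCompatible` is false, which covers both the outgoing request body and the returned metadata in one case.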

User-visible / Behavior Changes

  • Groq TTS configured through the OpenAI-compatible speech provider now requests wav instead of unsupported mp3/opus formats.
  • Proxied OpenAI-compatible TTS backends can now opt into responseFormat: "wav" explicitly.
  • Voice-note metadata is only marked voice-compatible when the returned audio format is actually opus.

Diagram (if applicable)

Before:
[Groq TTS request] -> [OpenAI speech provider picks mp3/opus from target] -> [Groq rejects unsupported format]

After:
[Groq TTS request] -> [OpenAI speech provider picks wav or configured override] -> [request matches backend format requirements]
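The "After" flow amounts to a precedence order: explicit override first, then Groq detection, then the original target-based default. The sketch below shows one way that order could look, assuming a substring check for Groq as in the issue's suggested fix; `resolveSpeechOutput` and its parameter names are hypothetical and not taken from the PR diff.

```javascript
// Hypothetical sketch of the selection order described in the PR summary.
function resolveSpeechOutput({ baseUrl, target, responseFormat }) {
  // 1. An explicit override wins, so proxied backends can opt into a format.
  let format = responseFormat;

  // 2. Otherwise, endpoints that look like Groq get wav.
  if (!format && typeof baseUrl === "string" && baseUrl.includes("groq.com")) {
    format = "wav";
  }

  // 3. Otherwise, keep the original target-based default.
  if (!format) {
    format = target === "voice-note" ? "opus" : "mp3";
  }

  return {
    format,
    // Only claim voice-note compatibility when the audio really is opus.
    voiceCompatible: target === "voice-note" && format === "opus",
  };
}
```

With a Groq base URL and a voice-note target this yields wav with `voiceCompatible: false`, which is the "honest metadata" behavior the PR describes.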

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS 25.3.0
  • Runtime/container: local pnpm/Bun repo workflow
  • Model/provider: OpenAI speech provider with Groq-compatible endpoint (canopylabs/orpheus-v1-english)
  • Integration/channel (if any): TTS
  • Relevant config (redacted): messages.tts.providers.openai with Groq base URL or explicit responseFormat: "wav"

Steps

  1. Configure messages.tts.providers.openai.baseUrl to a Groq-compatible OpenAI endpoint and use an Orpheus model.
  2. Trigger speech synthesis for an audio file or voice note.
  3. Inspect the outgoing response_format and returned metadata.

Expected

  • Groq-compatible endpoints receive response_format: "wav".
  • Voice-note output is only marked voice-compatible when the returned format is opus.

Actual

  • Before this fix, audio-file requests always sent mp3 and voice-note requests always sent opus, regardless of backend requirements.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • pnpm test extensions/openai/speech-provider.test.ts extensions/openai/tts.test.ts
    • pnpm build
    • local provider probe confirmed current Groq-targeted synth requests now use wav
  • Edge cases checked:
    • Groq-compatible base URLs auto-select wav
    • explicit responseFormat: "wav" overrides work for proxied OpenAI-compatible backends
    • voice-note compatibility stays false when output is not opus
  • What you did not verify:
    • live Groq API synthesis against a real key
    • pnpm check, which still fails on unrelated existing repo issues in extensions/acpx and extensions/elevenlabs
    • pnpm test:extension openai, which hit an unrelated timeout in extensions/openai/openai-provider.test.ts

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: some OpenAI-compatible TTS backends may need a non-default audio format without using a Groq hostname.
    • Mitigation: this PR adds an explicit responseFormat override so proxied or custom endpoints can opt into wav without a larger provider redesign.

AI Assistance

  • This PR was prepared with AI assistance.
  • Testing level: locally verified with focused regression tests plus pnpm build; broader unrelated repo failures are noted above.

Made with Cursor

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/openai/speech-provider.test.ts (modified, +93/-1)
  • extensions/openai/speech-provider.ts (modified, +68/-3)
  • extensions/openai/tts.ts (modified, +1/-1)


Description

OpenClaw's TTS tool cannot use Groq's Orpheus text-to-speech models because the OpenAI-compatible speech provider hardcodes response_format to mp3 or opus, but Groq's Orpheus endpoint only accepts wav format.

Error

TTS conversion failed: openai: OpenAI TTS API error (400): response_format must be one of [wav] [type=invalid_request_error]

Reproduction Steps

  1. Configure Groq as OpenAI-compatible TTS provider in openclaw.json:
{
  "messages": {
    "tts": {
      "provider": "openai",
      "providers": {
        "openai": {
          "apiKey": "gsk_...",
          "baseUrl": "https://api.groq.com/openai/v1",
          "model": "canopylabs/orpheus-v1-english",
          "voice": "daniel"
        }
      }
    }
  }
}
  2. Trigger TTS via /tts command or voice reply
  3. Observe 400 error from Groq API

Root Cause

OpenClaw's OpenAI speech provider (speech-provider-FUNXvtFQ.js) hardcodes the response format:

const responseFormat = req.target === "voice-note" ? "opus" : "mp3";

Groq's Orpheus TTS API only supports wav format per official docs:

curl https://api.groq.com/openai/v1/audio/speech \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-v1-english",
    "input": "Welcome to Orpheus text-to-speech.",
    "voice": "austin",
    "response_format": "wav"
  }' \
  --output orpheus-english.wav

Working Example (Direct API)

docker exec openclaw-dejx-openclaw-1 curl -s \
  -X POST "https://api.groq.com/openai/v1/audio/speech" \
  -H "Authorization: Bearer gsk_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"canopylabs/orpheus-v1-english","input":"Good evening sir.","voice":"daniel","response_format":"wav"}' \
  -o /tmp/test.wav

# Result: 61KB WAV file generated successfully
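For comparison with the provider code, the same direct call can be written in JavaScript. This is a sketch, not OpenClaw code: `buildGroqSpeechRequest` and `synthesizeToBuffer` are hypothetical helper names, and the second function assumes Node 18 or later for the global `fetch`.

```javascript
// Builds the same request as the curl call above.
function buildGroqSpeechRequest(apiKey, input, voice = "daniel") {
  return {
    url: "https://api.groq.com/openai/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: "canopylabs/orpheus-v1-english",
        input,
        voice,
        response_format: "wav", // the only format the endpoint accepted in this report
      }),
    },
  };
}

// Sends the request and returns the audio bytes as a Buffer.
async function synthesizeToBuffer(apiKey, input) {
  const { url, init } = buildGroqSpeechRequest(apiKey, input);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Groq TTS failed: ${res.status} ${await res.text()}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

Keeping the request builder separate from the network call makes the `response_format` value easy to inspect without an API key.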

Suggested Fix

Option 1: Detect Groq endpoint and use wav

function resolveResponseFormat(baseUrl, target) {
  if (baseUrl?.includes('groq.com')) return 'wav';
  return target === "voice-note" ? "opus" : "mp3";
}
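One caveat with Option 1 is that `includes('groq.com')` also matches unrelated URLs that merely contain that text, for example in a query string or in a hostname such as notgroq.com. A stricter variant, offered only as a sketch, compares the parsed hostname instead; `isGroqEndpoint` is a hypothetical helper, and whether the merged fix detects Groq this way is not stated in the source.

```javascript
// Returns true only when the URL's hostname is groq.com or a subdomain of it.
function isGroqEndpoint(baseUrl) {
  if (!baseUrl) return false;
  try {
    const { hostname } = new URL(baseUrl);
    return hostname === "groq.com" || hostname.endsWith(".groq.com");
  } catch {
    return false; // not a parseable absolute URL
  }
}

// Same contract as the suggested fix, with the stricter endpoint check.
function resolveResponseFormat(baseUrl, target) {
  if (isGroqEndpoint(baseUrl)) return "wav";
  return target === "voice-note" ? "opus" : "mp3";
}
```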

Option 2: Add responseFormat config option

{
  "messages": {
    "tts": {
      "providers": {
        "openai": {
          "apiKey": "...",
          "baseUrl": "https://api.groq.com/openai/v1",
          "model": "canopylabs/orpheus-v1-english",
          "responseFormat": "wav"
        }
      }
    }
  }
}
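If Option 2 is adopted, the configured value is worth validating before it reaches the API, so a typo fails with a clear local error rather than a 400 from the backend. The sketch below assumes an allowlist based on the formats OpenAI's speech endpoint documents (mp3, opus, aac, flac, wav, pcm); the set OpenClaw actually accepts is not given in the source, and `readResponseFormat` is a hypothetical helper.

```javascript
// Assumed allowlist: formats documented for OpenAI's speech endpoint.
const KNOWN_FORMATS = new Set(["mp3", "opus", "aac", "flac", "wav", "pcm"]);

// Reads the optional responseFormat from a provider config object.
// Returns undefined when unset so the caller can fall back to auto-selection.
function readResponseFormat(providerConfig) {
  const value = providerConfig?.responseFormat;
  if (value === undefined) return undefined;
  if (typeof value !== "string" || !KNOWN_FORMATS.has(value.toLowerCase())) {
    throw new Error(`Unsupported responseFormat: ${JSON.stringify(value)}`);
  }
  return value.toLowerCase();
}
```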

Environment

  • OpenClaw: 2026.4.5
  • Node: v22.22.2
  • Provider: Groq (Orpheus TTS)
  • Model: canopylabs/orpheus-v1-english

Additional Context

  • Groq TTS works perfectly via direct API calls — issue is purely in OpenClaw's TTS tool wiring
  • ElevenLabs TTS works with OpenClaw's mp3 format (but requires paid subscription for API access)
  • xAI Grok TTS provider (PR #50544) correctly handles response_format — similar fix needed for OpenAI provider when used with Groq

extent analysis

TL;DR

To fix the issue, update OpenClaw's OpenAI speech provider to request wav when a Groq endpoint is detected.

Guidance

  • Update the OpenAI speech provider so that it detects the Groq endpoint and returns wav, for example with the suggested resolveResponseFormat function. The PR makes this change in extensions/openai/speech-provider.ts; speech-provider-FUNXvtFQ.js, the file named in the issue, appears to be the bundled build of that source.
  • Alternatively, set the responseFormat option under messages.tts.providers.openai in openclaw.json, as shown in the suggested fix, to request wav explicitly.
  • Verify the fix by triggering a TTS conversion and checking that the outgoing response_format is wav and the request no longer returns a 400 error.
  • Test both synthesis targets (audio file and voice note), since the original code chose a different format for each.
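One way to confirm that the returned audio is really WAV is to check the file header: a WAV file starts with the ASCII tag RIFF at bytes 0-3 and WAVE at bytes 8-11. The helper below is a minimal sketch with a hypothetical name (`looksLikeWav`); it checks only the container header, not whether the audio data is valid.

```javascript
// Returns true when the bytes begin with a RIFF/WAVE container header.
// Accepts a Uint8Array or a Node.js Buffer.
function looksLikeWav(bytes) {
  if (bytes.length < 12) return false;
  const tag = (start) =>
    String.fromCharCode(...bytes.subarray(start, start + 4));
  return tag(0) === "RIFF" && tag(8) === "WAVE";
}
```

In Node.js this can be applied to a saved file by passing the result of `fs.readFileSync` for the output path, such as the /tmp/test.wav file from the direct API example.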

Example

The suggested resolveResponseFormat function can be used as a replacement for the hardcoded responseFormat variable:

function resolveResponseFormat(baseUrl, target) {
  if (baseUrl?.includes('groq.com')) return 'wav';
  return target === "voice-note" ? "opus" : "mp3";
}

It returns wav for any base URL containing groq.com and otherwise keeps the original target-based choice: opus for voice notes, mp3 for everything else.

Notes

The URL check covers only Groq. Other OpenAI-compatible backends that require wav are not detected by it and need the explicit responseFormat setting instead.

Recommendation

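Run a build that includes PR #62233, which auto-selects wav for Groq endpoints and adds the explicit responseFormat override. Hand-patching the bundled speech-provider-FUNXvtFQ.js is at most a stopgap, because the change belongs in the source file extensions/openai/speech-provider.ts.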
