openclaw - ✅(Solved) Fix [Bug]: Groq Orpheus TTS fails with "response_format must be one of [wav]" — OpenAI provider hardcodes mp3/opus [1 pull requests, 1 comments, 2 participants]

darrenfurr · 2026-04-07T00:55:41Z

[openclaw] Issue #62215 (Groq Orpheus TTS rejects the hardcoded mp3/opus response_format) was fixed by PR #62233, "fix(tts): use wav for Groq speech on OpenAI provider" (merged; change type: bug fix; scope: integrations; regression coverage: unit test). The issue summary and the full PR description follow below.

GitHub stats

openclaw/openclaw#62215 · Fetched 2026-04-08 03:07:33

Author

darrenfurr

Participants

darrenfurr

neeravmakwana

Timeline (top)

closed ×1 · commented ×1 · cross-referenced ×1

OpenClaw's TTS tool cannot use Groq's Orpheus text-to-speech models because the OpenAI-compatible speech provider hardcodes response_format to mp3 or opus, but Groq's Orpheus endpoint only accepts wav format.

Error Message

TTS conversion failed: openai: OpenAI TTS API error (400): response_format must be one of [wav] [type=invalid_request_error]
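The error text itself names the accepted formats, so a caller could in principle read them back out of it. The helper below is a minimal sketch, assuming the message keeps the "must be one of [...]" wording shown above; `parseAllowedFormats` is a hypothetical name, and the wording is not a documented contract.

```typescript
// Hypothetical helper: extracts the formats a backend says it accepts from
// an error string shaped like the one reported in this issue.
// Assumption: the "must be one of [...]" wording may change without notice.
function parseAllowedFormats(message: string): string[] {
  const match = /response_format must be one of \[([^\]]*)\]/.exec(message);
  if (!match) return [];
  const list = match[1] ?? "";
  return list
    .split(",")
    .map((item) => item.trim())
    .filter((item) => item.length > 0);
}

// The exact message from this report.
const reportedError =
  "TTS conversion failed: openai: OpenAI TTS API error (400): " +
  "response_format must be one of [wav] [type=invalid_request_error]";

const allowedFormats = parseAllowedFormats(reportedError);
```

For the message in this report, `allowedFormats` holds the single entry `wav`; an unrelated message yields an empty list.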

Root Cause

OpenClaw's OpenAI speech provider (speech-provider-FUNXvtFQ.js) hardcodes the response format:

const responseFormat = req.target === "voice-note" ? "opus" : "mp3";

Groq's Orpheus TTS API only supports wav format per official docs:

curl https://api.groq.com/openai/v1/audio/speech \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-v1-english",
    "input": "Welcome to Orpheus text-to-speech.",
    "voice": "austin",
    "response_format": "wav"
  }' \
  --output orpheus-english.wav

Fix Action

Fixed

Fixed by PR: fix(tts): use wav for Groq speech on OpenAI provider (https://github.com/openclaw/openclaw/pull/62233)

PR fix notes

PR #62233: fix(tts): use wav for Groq speech on OpenAI provider

Repository: openclaw/openclaw
Author: neeravmakwana
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/62233

Description (problem / solution / changelog)

Summary

Problem: the OpenAI speech provider always picked mp3 for audio files and opus for voice notes, so Groq Orpheus requests were sent with unsupported formats.
Why it matters: Groq TTS requests fail before audio delivery, which blocks TTS for Groq users configured through the OpenAI-compatible provider path.
What changed: the OpenAI speech provider now auto-selects wav for Groq endpoints, honors an explicit responseFormat override for proxied OpenAI-compatible backends, and only marks voice-note output as voice-compatible when the actual format is opus.
What did NOT change (scope boundary): this PR does not add a new dedicated Groq TTS provider or change telephony synthesis behavior.
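The selection order described in the summary can be sketched as follows. This is an illustration under stated assumptions, not the code merged in PR #62233: `decideFormat`, `SpeechTarget`, and `FormatDecision` are hypothetical names, the `"audio-file"` label is assumed, and giving the explicit override priority over Groq detection is a guess about precedence.

```typescript
// Hypothetical sketch of the selection order in the summary above.
// These names are illustrative and are not taken from the OpenClaw source.
type SpeechTarget = "voice-note" | "audio-file"; // "audio-file" is an assumed label
type ResponseFormat = "mp3" | "opus" | "wav";

interface FormatDecision {
  responseFormat: ResponseFormat;
  voiceCompatible: boolean;
}

function decideFormat(
  target: SpeechTarget,
  isGroqEndpoint: boolean,
  override?: ResponseFormat,
): FormatDecision {
  // Assumed precedence: explicit override, then Groq detection, then the
  // original target-based default (opus for voice notes, mp3 otherwise).
  const responseFormat: ResponseFormat =
    override ?? (isGroqEndpoint ? "wav" : target === "voice-note" ? "opus" : "mp3");

  // Only a voice note that is actually opus counts as voice-compatible.
  const voiceCompatible = target === "voice-note" && responseFormat === "opus";
  return { responseFormat, voiceCompatible };
}
```

Keeping the format choice and the `voiceCompatible` flag in one function makes it harder for the metadata to drift from the format that was really requested.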

Change Type (select all)

Bug fix

Scope (select all touched areas)

Integrations

Linked Issue/PR

Closes #62215
Related #62215
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: extensions/openai/speech-provider.ts derived responseFormat only from the synthesis target and never from the configured endpoint, so Groq-compatible OpenAI TTS requests were always sent as mp3 or opus.
Missing detection / guardrail: the provider had no regression test covering Groq-compatible speech endpoints or non-opus voice-note metadata.
Contributing context (if known): issue #62215 reported Groq Orpheus failing because the OpenAI-compatible provider path hardcoded unsupported formats.

Regression Test Plan (if applicable)

Coverage level that should have caught this: Unit test
Target test or file: extensions/openai/speech-provider.test.ts
Scenario the test should lock in: Groq-compatible base URLs send wav, and explicit responseFormat: "wav" overrides keep voice-note metadata honest by returning voiceCompatible: false.
Why this is the smallest reliable guardrail: the bug is in a small provider-local format selection branch, and the provider unit tests can assert both the outgoing request body and returned metadata directly.
Existing test that already covers this (if any): N/A
If no new test is added, why not: N/A
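The two scenarios in the test plan could be checked along these lines. This is a sketch only: `buildSpeechCall` is a hypothetical stand-in for the provider in extensions/openai/speech-provider.ts, whose real API is not shown in this report, and the substring-based Groq detection is an assumption borrowed from the issue's suggested fix.

```typescript
// Stand-in for the provider under test; the real module's API is not shown
// in this report, so every name here is hypothetical.
interface ProviderConfig {
  baseUrl?: string;
  responseFormat?: "mp3" | "opus" | "wav";
}

interface SpeechCall {
  body: { response_format: string };
  voiceCompatible: boolean;
}

function buildSpeechCall(config: ProviderConfig, target: "voice-note" | "audio-file"): SpeechCall {
  const isGroq = (config.baseUrl ?? "").includes("groq.com"); // detection rule assumed
  const format =
    config.responseFormat ?? (isGroq ? "wav" : target === "voice-note" ? "opus" : "mp3");
  return {
    body: { response_format: format },
    voiceCompatible: target === "voice-note" && format === "opus",
  };
}

// Tiny assertion helper so the sketch needs no test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  if (actual !== expected) {
    throw new Error(`${label}: expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Scenario 1: a Groq-compatible base URL sends wav.
const groqCall = buildSpeechCall({ baseUrl: "https://api.groq.com/openai/v1" }, "audio-file");
expectEqual(groqCall.body.response_format, "wav", "groq format");

// Scenario 2: an explicit wav override keeps voice-note metadata honest.
const overrideCall = buildSpeechCall({ responseFormat: "wav" }, "voice-note");
expectEqual(overrideCall.body.response_format, "wav", "override format");
expectEqual(overrideCall.voiceCompatible, false, "override voiceCompatible");
```

Asserting on both the outgoing body and the returned metadata mirrors the plan's point that a provider-level unit test can see both sides directly.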

User-visible / Behavior Changes

Groq TTS configured through the OpenAI-compatible speech provider now requests wav instead of unsupported mp3/opus formats.
Proxied OpenAI-compatible TTS backends can now opt into responseFormat: "wav" explicitly.
Voice-note metadata is only marked voice-compatible when the returned audio format is actually opus.

Diagram (if applicable)

Before:
[Groq TTS request] -> [OpenAI speech provider picks mp3/opus from target] -> [Groq rejects unsupported format]

After:
[Groq TTS request] -> [OpenAI speech provider picks wav or configured override] -> [request matches backend format requirements]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: macOS 25.3.0
Runtime/container: local pnpm/Bun repo workflow
Model/provider: OpenAI speech provider with Groq-compatible endpoint (canopylabs/orpheus-v1-english)
Integration/channel (if any): TTS
Relevant config (redacted): messages.tts.providers.openai with Groq base URL or explicit responseFormat: "wav"

Steps

Configure messages.tts.providers.openai.baseUrl to a Groq-compatible OpenAI endpoint and use an Orpheus model.
Trigger speech synthesis for an audio file or voice note.
Inspect the outgoing response_format and returned metadata.

Expected

Groq-compatible endpoints receive response_format: "wav".
Voice-note output is only marked voice-compatible when the returned format is opus.

Actual

Before this fix, audio-file requests always sent mp3 and voice-note requests always sent opus, regardless of backend requirements.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios:
- pnpm test extensions/openai/speech-provider.test.ts extensions/openai/tts.test.ts
- pnpm build
- local provider probe confirmed current Groq-targeted synth requests now use wav
Edge cases checked:
- Groq-compatible base URLs auto-select wav
- explicit responseFormat: "wav" overrides work for proxied OpenAI-compatible backends
- voice-note compatibility stays false when output is not opus
What you did not verify:
- live Groq API synthesis against a real key
- pnpm check, which still fails on unrelated existing repo issues in extensions/acpx and extensions/elevenlabs
- pnpm test:extension openai, which hit an unrelated timeout in extensions/openai/openai-provider.test.ts

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps:

Risks and Mitigations

Risk: some OpenAI-compatible TTS backends may need a non-default audio format without using a Groq hostname.
- Mitigation: this PR adds an explicit responseFormat override so proxied or custom endpoints can opt into wav without a larger provider redesign.

AI Assistance

This PR was prepared with AI assistance.
Testing level: locally verified with focused regression tests plus pnpm build; broader unrelated repo failures are noted above.

Made with Cursor

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/openai/speech-provider.test.ts (modified, +93/-1)
extensions/openai/speech-provider.ts (modified, +68/-3)
extensions/openai/tts.ts (modified, +1/-1)

Description

Error

TTS conversion failed: openai: OpenAI TTS API error (400): response_format must be one of [wav] [type=invalid_request_error]

Reproduction Steps

Configure Groq as OpenAI-compatible TTS provider in openclaw.json:

{
  "messages": {
    "tts": {
      "provider": "openai",
      "providers": {
        "openai": {
          "apiKey": "gsk_...",
          "baseUrl": "https://api.groq.com/openai/v1",
          "model": "canopylabs/orpheus-v1-english",
          "voice": "daniel"
        }
      }
    }
  }
}

Trigger TTS via /tts command or voice reply
Observe 400 error from Groq API

Root Cause

OpenClaw's OpenAI speech provider (speech-provider-FUNXvtFQ.js) hardcodes the response format:

const responseFormat = req.target === "voice-note" ? "opus" : "mp3";

Groq's Orpheus TTS API only supports wav format per official docs:

curl https://api.groq.com/openai/v1/audio/speech \
  -H "Authorization: Bearer $GROQ_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "canopylabs/orpheus-v1-english",
    "input": "Welcome to Orpheus text-to-speech.",
    "voice": "austin",
    "response_format": "wav"
  }' \
  --output orpheus-english.wav

Working Example (Direct API)

docker exec openclaw-dejx-openclaw-1 curl -s \
  -X POST "https://api.groq.com/openai/v1/audio/speech" \
  -H "Authorization: Bearer gsk_..." \
  -H "Content-Type: application/json" \
  -d '{"model":"canopylabs/orpheus-v1-english","input":"Good evening sir.","voice":"daniel","response_format":"wav"}' \
  -o /tmp/test.wav

# Result: 61KB WAV file generated successfully
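For readers who prefer to reproduce the direct call from code rather than curl, the request can be described as plain data. This is a minimal sketch: `buildGroqSpeechRequest` and `SpeechHttpRequest` are hypothetical names, and the URL, model, and voice are copied from the curl command above.

```typescript
// Hypothetical helper mirroring the curl call above; not part of OpenClaw.
interface SpeechHttpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildGroqSpeechRequest(apiKey: string, input: string, voice: string): SpeechHttpRequest {
  return {
    url: "https://api.groq.com/openai/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "canopylabs/orpheus-v1-english",
      input,
      voice,
      response_format: "wav", // the only format the error message lists
    }),
  };
}

// Same payload as the docker/curl example, with the key elided as in the issue.
const speechRequest = buildGroqSpeechRequest("gsk_...", "Good evening sir.", "daniel");
```

Passing `speechRequest.url` together with the method, headers, and body to `fetch` would send the same payload as the curl command; the sketch stops short of the network call so it can be inspected offline.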

Suggested Fix

Option 1: Detect Groq endpoint and use wav

function resolveResponseFormat(baseUrl, target) {
  if (baseUrl?.includes('groq.com')) return 'wav';
  return target === "voice-note" ? "opus" : "mp3";
}
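One caveat with the substring check in Option 1 is that it also matches URLs that merely contain groq.com somewhere, for example in a path. A hostname comparison is stricter. The variant below is a sketch under the assumption that Groq endpoints are served from groq.com or a subdomain of it; `isGroqBaseUrl` is a hypothetical name, and this is not necessarily how the merged fix detects Groq.

```typescript
// Hypothetical alternative to the substring check in Option 1.
// Assumption: Groq endpoints are served from groq.com or a subdomain of it.
function isGroqBaseUrl(baseUrl?: string): boolean {
  if (!baseUrl) return false;
  try {
    const host = new URL(baseUrl).hostname.toLowerCase();
    return host === "groq.com" || host.endsWith(".groq.com");
  } catch {
    return false; // unparseable URL: fall back to the default formats
  }
}

function resolveResponseFormat(
  baseUrl: string | undefined,
  target: string,
): "wav" | "opus" | "mp3" {
  if (isGroqBaseUrl(baseUrl)) return "wav";
  return target === "voice-note" ? "opus" : "mp3";
}
```

With this variant, `https://api.groq.com/openai/v1` is treated as Groq, while a proxy such as `https://example.com/groq.com/v1` is not and would need an explicit format setting instead.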

Option 2: Add `responseFormat` config option

{
  "messages": {
    "tts": {
      "providers": {
        "openai": {
          "apiKey": "...",
          "baseUrl": "https://api.groq.com/openai/v1",
          "model": "canopylabs/orpheus-v1-english",
          "responseFormat": "wav"
        }
      }
    }
  }
}
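If a `responseFormat` option is read from config, it helps to validate it before it reaches the request body. The sketch below types the fragment above and accepts only known values. The field names follow the JSON in this issue, but `readConfiguredFormat`, the accepted-value list, and the trimming and lowercasing are assumptions, not OpenClaw behavior.

```typescript
// Hypothetical typing for the config fragment above; field names follow the
// JSON in this issue, but the validation rule is an assumption.
type ResponseFormat = "mp3" | "opus" | "wav";

interface OpenAiTtsProviderConfig {
  apiKey?: string;
  baseUrl?: string;
  model?: string;
  voice?: string;
  responseFormat?: string;
}

// Assumed list: the three formats mentioned in this issue.
const KNOWN_FORMATS: readonly ResponseFormat[] = ["mp3", "opus", "wav"];

// Returns the configured format when it is one of the known values,
// otherwise undefined so the caller can fall back to its default.
function readConfiguredFormat(config: OpenAiTtsProviderConfig): ResponseFormat | undefined {
  const raw = config.responseFormat?.trim().toLowerCase();
  return KNOWN_FORMATS.find((format) => format === raw);
}

const exampleConfig: OpenAiTtsProviderConfig = {
  apiKey: "...",
  baseUrl: "https://api.groq.com/openai/v1",
  model: "canopylabs/orpheus-v1-english",
  responseFormat: "wav",
};

const configuredFormat = readConfiguredFormat(exampleConfig);
```

Returning `undefined` for unknown values leaves the fallback decision with the caller, which keeps this option independent of any endpoint detection.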

Environment

OpenClaw: 2026.4.5
Node: v22.22.2
Provider: Groq (Orpheus TTS)
Model: canopylabs/orpheus-v1-english

Additional Context

Groq TTS works perfectly via direct API calls — issue is purely in OpenClaw's TTS tool wiring
ElevenLabs TTS works with OpenClaw's mp3 format (but requires paid subscription for API access)
xAI Grok TTS provider (PR #50544) correctly handles response_format — similar fix needed for OpenAI provider when used with Groq

extent analysis

TL;DR

PR #62233 fixes this: OpenClaw's OpenAI speech provider now uses wav when a Groq endpoint is detected and honors an explicit responseFormat override for other OpenAI-compatible backends.

Guidance

Update to a build that includes PR #62233. The change lives in extensions/openai/speech-provider.ts; the speech-provider-FUNXvtFQ.js file named in the issue appears to be a bundled build artifact, so a manual patch there would likely be overwritten by the next update.
For proxied or custom OpenAI-compatible endpoints that do not use a Groq hostname, set responseFormat: "wav" under messages.tts.providers.openai in openclaw.json, as shown in the suggested fix.
Verify the fix by triggering TTS (via the /tts command or a voice reply) and checking that the outgoing response_format is wav.
Test both audio-file and voice-note targets; voice-note output should be marked voice-compatible only when the returned format is opus.

Example

The suggested resolveResponseFormat function can be used as a replacement for the hardcoded responseFormat variable:

function resolveResponseFormat(baseUrl, target) {
  if (baseUrl?.includes('groq.com')) return 'wav';
  return target === "voice-note" ? "opus" : "mp3";
}

It picks the response format from the baseUrl and target parameters. Note that the substring check also matches any URL that merely contains groq.com, such as a proxy URL with groq.com in its path.

Notes

The example assumes that Groq is the only endpoint that requires wav. PR #62233 covers other backends through the explicit responseFormat override rather than through additional hostname checks.

Recommendation

Prefer upgrading to a build that includes PR #62233 over patching speech-provider-FUNXvtFQ.js by hand, and use the explicit responseFormat setting where Groq endpoint detection does not apply.
