openclaw - ✅(Solved) Fix Audio auto transcription can prefer local Whisper over API provider and break Groq multipart uploads [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#68727 · Fetched 2026-04-19 15:08:15
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0

Telegram/audio transcription can unexpectedly use a local Whisper CLI even when an API provider is configured/available, and Groq OpenAI-compatible audio transcription can fail when OpenClaw passes a proxy-wrapped fetch into the audio provider.

Error Message

Audio transcription failed (HTTP 400): {"error":{"message":"request Content-Type isn't multipart/form-data","type":"invalid_request_error"}}

Root Cause

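In audio auto mode, resolveAutoEntries() checks local audio CLIs before resolveKeyEntry(), so an installed whisper/whisper-cli can win over a configured API provider. Separately, passing resolveProxyFetchFromEnv() (a proxy-wrapped fetch) into the OpenAI-compatible audio provider can break multipart uploads, and Groq then rejects the request with the HTTP 400 shown above.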

Fix / Workaround

I have a small patch ready with focused Vitest coverage.

PR fix notes

PR #68733: Prefer API audio providers before local fallback

Description (problem / solution / changelog)

Summary

Fixes #68727.

This changes audio media-understanding fallback behavior so API-provider audio transcription wins before local CLI auto-detection, and avoids passing a pre-wrapped proxy fetch into audio providers. Audio providers already route requests through the shared provider HTTP helpers, which handle proxy env vars and NO_PROXY at the request URL boundary.

Why

A gateway with GROQ_API_KEY/Groq audio configured and a local /opt/homebrew/bin/whisper could still take the local audio path in auto mode. When the Groq provider path was selected, passing resolveProxyFetchFromEnv() into OpenAI-compatible audio transcription could break multipart uploads with Groq returning request Content-Type isn't multipart/form-data.

Changes

  • Prefer resolveKeyEntry() before local audio CLI fallback in audio auto mode.
  • Do not pass a direct proxy-wrapped fetchFn into audio providers; leave audio proxy handling to the shared provider HTTP layer.
  • Add regression coverage for provider-key priority when a local whisper executable exists.
  • Update proxy passthrough coverage for the audio path.

Validation

  • pnpm exec vitest run src/media-understanding/runner.auto-audio.test.ts src/media-understanding/runner.proxy.test.ts
  • pre-commit pnpm check passed locally during commit.

Changed files

  • src/media-understanding/runner.auto-audio.test.ts (modified, +33/-1)
  • src/media-understanding/runner.entries.ts (modified, +5/-3)
  • src/media-understanding/runner.proxy.test.ts (modified, +3/-2)
  • src/media-understanding/runner.ts (modified, +4/-0)

Summary

Telegram/audio transcription can unexpectedly use a local Whisper CLI even when an API provider is configured/available, and Groq OpenAI-compatible audio transcription can fail when OpenClaw passes a proxy-wrapped fetch into the audio provider.

Local reproduction

Environment:

  • macOS gateway
  • tools.media.audio.models configured with provider: "groq", model: "whisper-large-v3"
  • GROQ_API_KEY set
  • HTTP_PROXY / HTTPS_PROXY set
  • /opt/homebrew/bin/whisper present

Observed behavior:

  1. In the audio auto path, resolveAutoEntries() checks local audio CLIs before resolveKeyEntry(), so an installed local whisper/whisper-cli can win over API-provider transcription.
  2. After forcing the provider path to Groq, the request failed with:
Audio transcription failed (HTTP 400): {"error":{"message":"request Content-Type isn't multipart/form-data","type":"invalid_request_error"}}

Raw curl to https://api.groq.com/openai/v1/audio/transcriptions with the same key/audio file succeeded, and calling the OpenAI-compatible audio provider directly without the proxy-wrapped fetchFn also succeeded.
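For comparison with the curl check, here is a rough sketch of the same upload built with the standard fetch primitives. The issue does not say how the proxy-wrapped fetch damages the request. One plausible failure mode, which is an assumption here, is that some layer replaces or drops the Content-Type header that carries the multipart boundary. buildTranscriptionRequest and hasMultipartBoundary are hypothetical helper names. The file and model form fields follow the OpenAI-compatible transcription API.

```typescript
// Builds the upload without sending it, so the headers can be inspected locally.
function buildTranscriptionRequest(audio: Blob, filename: string, apiKey: string): Request {
  const form = new FormData();
  form.append("file", audio, filename);
  form.append("model", "whisper-large-v3");
  // No Content-Type is set by hand: the runtime derives
  // "multipart/form-data; boundary=..." from the FormData body.
  return new Request("https://api.groq.com/openai/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
}

// True when the request still advertises a multipart body with a boundary.
function hasMultipartBoundary(request: Request): boolean {
  const contentType = request.headers.get("content-type") ?? "";
  return contentType.toLowerCase().startsWith("multipart/form-data; boundary=");
}
```

A check like hasMultipartBoundary() on the request that actually leaves the process can help show whether a fetch wrapper altered the header before Groq saw it.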

Expected behavior

If an audio API provider is configured or available via provider key, OpenClaw should prefer that provider before auto-detecting local CLIs. For OpenAI-compatible audio uploads, multipart form requests should not be broken by an EnvHttpProxyAgent fetch wrapper; the shared provider HTTP helper already has URL-aware proxy/NO_PROXY handling.
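To make "URL-aware proxy/NO_PROXY handling" concrete, here is a simplified sketch of a per-request proxy decision. It is not the shared provider HTTP helper itself: shouldBypassProxy, pickProxyUrl, and the ProxyEnv shape are hypothetical, and the matching covers only plain host and domain-suffix rules (no ports, no CIDR ranges).

```typescript
interface ProxyEnv {
  HTTP_PROXY?: string;
  HTTPS_PROXY?: string;
  NO_PROXY?: string;
}

// Returns true when the request host matches a NO_PROXY rule.
function shouldBypassProxy(requestUrl: string, noProxy: string | undefined): boolean {
  if (!noProxy) {
    return false;
  }
  const host = new URL(requestUrl).hostname.toLowerCase();
  return noProxy
    .split(",")
    .map((rule) => rule.trim().toLowerCase())
    .filter((rule) => rule.length > 0)
    .some((rule) => {
      if (rule === "*") {
        return true;
      }
      // ".example.com" and "example.com" both match the domain and its subdomains.
      const suffix = rule.startsWith(".") ? rule.slice(1) : rule;
      return host === suffix || host.endsWith("." + suffix);
    });
}

// Chooses the proxy for one request URL, or undefined for a direct connection.
function pickProxyUrl(requestUrl: string, env: ProxyEnv): string | undefined {
  if (shouldBypassProxy(requestUrl, env.NO_PROXY)) {
    return undefined;
  }
  return new URL(requestUrl).protocol === "https:" ? env.HTTPS_PROXY : env.HTTP_PROXY;
}
```

Deciding per request URL is what lets NO_PROXY take effect. A fetch wrapped once up front, before the target URL is known, has less room to make that decision.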

Proposed fix

  • In audio resolveAutoEntries(), try resolveKeyEntry() before local CLI fallback.
  • For audio provider entries, do not pass resolveProxyFetchFromEnv() directly; let provider HTTP helpers handle proxy env and NO_PROXY at request time.
  • Add regression coverage for provider-key priority when a local whisper binary exists, and for audio proxy fetch behavior.
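The second bullet might look roughly like this in the runner. The Capability union, ProviderCallOptions, and buildProviderCallOptions are hypothetical names, not the code in runner.ts. The sketch also assumes that non-audio capabilities keep receiving the proxy-wrapped fetch, which the PR notes imply but do not state outright.

```typescript
// Hypothetical types for illustration only.
type Capability = "audio" | "image" | "video";
type FetchLike = (input: string, init?: Record<string, unknown>) => Promise<unknown>;

interface ProviderCallOptions {
  capability: Capability;
  fetchFn?: FetchLike;
}

// Audio entries get no pre-wrapped fetch, so the provider HTTP helpers can
// apply proxy env and NO_PROXY themselves at request time.
function buildProviderCallOptions(
  capability: Capability,
  proxyFetch: FetchLike | undefined,
): ProviderCallOptions {
  if (capability === "audio" || !proxyFetch) {
    return { capability };
  }
  return { capability, fetchFn: proxyFetch };
}
```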

I have a small patch ready with focused Vitest coverage.

extent analysis

TL;DR

Modify resolveAutoEntries() so that resolveKeyEntry() is tried before the local CLI fallback, and stop passing resolveProxyFetchFromEnv() into audio provider entries so that the shared provider HTTP helpers handle proxy env and NO_PROXY at request time.

Guidance

  • Update the resolveAutoEntries() function to try resolveKeyEntry() before falling back to local CLI detection to ensure API providers are preferred.
  • Modify the audio provider entries to not pass resolveProxyFetchFromEnv() directly, allowing provider HTTP helpers to handle proxy env and NO_PROXY at request time.
  • Verify the fix by testing with a local whisper binary installed and an API provider configured, ensuring the API provider is used instead of the local CLI.
  • Test the audio provider with a proxy-wrapped fetch to ensure multipart form requests are not broken.

Example

No code snippet is provided as the issue does not contain sufficient code context.

Notes

The proposed fix assumes that the resolveKeyEntry() function is correctly implemented and that the provider HTTP helpers can handle proxy env and NO_PROXY correctly.

Recommendation

Apply the proposed fix to update the resolveAutoEntries() function and audio provider entries, as it addresses the root cause of the issue and ensures API providers are preferred over local CLIs.
