openclaw - ✅ (Solved) Audio transcription fails on 2026.4.5: SSRF guard corrupts multipart FormData for Whisper API [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#62173 · Fetched 2026-04-08 03:08:04
Comments: 1 · Participants: 2 · Timeline events: 8 · Reactions: 0
Timeline (top): subscribed ×3, mentioned ×2, commented ×1, cross-referenced ×1

Root Cause

The postTranscriptionRequest call in media-understanding-*.js uses the SSRF-guarded fetch with DNS pinning enabled. The pinned DNS dispatcher corrupts the multipart boundary in FormData requests, preventing the API from parsing the file and model fields.

Fix Action

Workaround

Patch media-understanding-*.js to add pinDns: false to the postTranscriptionRequest fetch options, matching the existing Google image generation pattern.

PR fix notes

PR #62174: fix(audio): disable DNS pinning for multipart audio transcription requests

Description (problem / solution / changelog)

The SSRF guard's pinned DNS dispatcher corrupts multipart FormData boundaries when posting to OpenAI's /audio/transcriptions endpoint, causing the API to reject requests with "you must provide a model parameter" even though the model field is present in the form data.

This is the same class of issue already acknowledged for other multipart provider paths. Adding pinDns: false to the postTranscriptionRequest call in the OpenAI-compatible audio transcription path bypasses the broken dispatcher for these requests.

Reproduction

  1. Configure tools.media.audio.enabled: true with OpenAI whisper-1
  2. Send a voice note via Telegram
  3. Transcription silently fails — multipart boundary is corrupted by the pinned DNS dispatcher
  4. Direct curl to the same endpoint with the same file works fine

Fix

One-line addition: pass pinDns: false to postTranscriptionRequest in openai-compatible-audio.ts, matching the pattern used for other multipart provider paths.

Fixes #62173

Changed files

  • src/media-understanding/openai-compatible-audio.ts (modified, +1/-0)

Original issue text

Voice note transcription via OpenAI Whisper API fails on 2026.4.5. The fetchWithSsrFGuard function uses undici's fetch with a pinned DNS dispatcher, which corrupts multipart FormData boundaries. OpenAI rejects the request with "you must provide a model parameter" even though it's present in the form data.

This is the same class of bug already fixed for Google image generation (pinDns: false), just not applied to the audio transcription code path in postTranscriptionRequest.

Steps to Reproduce

  1. Configure tools.media.audio.enabled: true with models: [{provider: "openai", model: "whisper-1"}]
  2. Set OPENAI_API_KEY in environment
  3. Send a voice note via Telegram
  4. Transcription silently fails — no transcript is injected into the message
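
For reference, the settings from step 1 can be pictured as the object below. This is only a sketch: the key names and values are the ones quoted in the steps, but the nesting of `models` under `tools.media.audio` and the `AudioToolConfig` type are assumptions, and the actual OpenClaw config file format may differ.

```typescript
// Hypothetical shape. Only the key names and values quoted in the steps
// above come from the issue; the nesting of `models` is an assumption.
interface AudioModelRef {
  provider: string;
  model: string;
}

interface AudioToolConfig {
  tools: {
    media: {
      audio: {
        enabled: boolean;
        models: AudioModelRef[];
      };
    };
  };
}

const reproConfig: AudioToolConfig = {
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [{ provider: "openai", model: "whisper-1" }],
      },
    },
  },
};
```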

Root Cause

The postTranscriptionRequest call in media-understanding-*.js uses the SSRF-guarded fetch with DNS pinning enabled. The pinned DNS dispatcher corrupts the multipart boundary in FormData requests, preventing the API from parsing the file and model fields.

Workaround

Patch media-understanding-*.js to add pinDns: false to the postTranscriptionRequest fetch options, matching the existing Google image generation pattern.

Environment

  • OpenClaw: 2026.4.5 (3e72c03)
  • OS: Ubuntu 24.04 / Linux 6.8.0-90-generic (x64)
  • Node: v22.22.0
  • Provider: OpenAI (whisper-1)
  • Channel: Telegram

Note: Direct curl calls to the same OpenAI /audio/transcriptions endpoint with the same audio file and API key work perfectly — confirming the issue is in the fetch/SSRF layer, not the API or audio file.
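
To narrow this down further, it can help to see what a well-formed multipart body looks like before the guarded fetch touches it. The sketch below encodes a FormData locally (no network call) and checks that the boundary advertised in the Content-Type header is the one used as the delimiter in the body, which is what the API needs in order to find the `file` and `model` fields. The function name `inspectMultipart` and the placeholder file contents are illustrative assumptions; this does not exercise OpenClaw's SSRF guard, so it only provides a baseline to compare captured traffic against.

```typescript
// Serialize a FormData body locally and check that the boundary advertised in
// the Content-Type header is the one actually used in the body.
interface MultipartReport {
  contentType: string;
  boundary: string;
  boundaryMatchesBody: boolean;
  hasModelField: boolean;
}

async function inspectMultipart(form: FormData): Promise<MultipartReport> {
  // Wrapping the form in a Response makes the runtime encode it without
  // sending anything over the network.
  const encoded = new Response(form);
  const contentType = encoded.headers.get("content-type") ?? "";
  const match = /boundary="?([^";]+)"?/.exec(contentType);
  const boundary = match?.[1] ?? "";
  const body = await encoded.text();
  return {
    contentType,
    boundary,
    // Each part starts with "--<boundary>" and the body ends with "--<boundary>--".
    boundaryMatchesBody:
      boundary !== "" &&
      body.includes(`--${boundary}\r\n`) &&
      body.includes(`--${boundary}--`),
    hasModelField: body.includes('name="model"'),
  };
}

// Placeholder bytes stand in for a real voice note.
const sampleForm = new FormData();
sampleForm.append("file", new Blob(["not real audio"], { type: "audio/ogg" }), "note.ogg");
sampleForm.append("model", "whisper-1");
```

Calling `inspectMultipart(sampleForm)` resolves to a report whose `boundaryMatchesBody` and `hasModelField` flags describe the healthy case. A request captured behind the pinned DNS dispatcher would be expected to fail the boundary check if the diagnosis in this issue is right.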

extent analysis

TL;DR

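Voice note transcription via the OpenAI Whisper API fails on 2026.4.5 because the pinned DNS dispatcher corrupts the multipart boundary. As a workaround, patch media-understanding-*.js to add pinDns: false to the postTranscriptionRequest fetch options. PR #62174 makes the same one-line change in src/media-understanding/openai-compatible-audio.ts.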

Guidance

  • Verify the issue by checking if the transcription silently fails when sending a voice note via Telegram with the current configuration.
  • Confirm the workaround by applying the patch and checking if the transcription works as expected.
  • Test the postTranscriptionRequest call with pinDns: false to ensure it resolves the multipart boundary corruption issue.
  • Review the existing Google image generation code path to ensure consistency in handling DNS pinning.

Example

The issue does not include the patched code itself. The change adds pinDns: false to the postTranscriptionRequest fetch options, following the existing pattern in the Google image generation code.
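
As a rough sketch of what such a change could look like, the snippet below builds the multipart form and returns request options with `pinDns: false` set. Only the `pinDns` option, the `/audio/transcriptions` path, and the `file` and `model` fields come from the issue. The `GuardedRequestOptions` type, the `buildTranscriptionRequest` helper, and the assumption that pinning is on by default are hypothetical, since the real signature of postTranscriptionRequest is not shown here.

```typescript
// Hypothetical option shape: only `pinDns` is named in the issue. Everything
// else is an assumption made for illustration.
interface GuardedRequestOptions {
  url: string;
  method: "POST";
  body: FormData;
  pinDns?: boolean; // assumed to default to pinning when omitted
}

function buildTranscriptionRequest(
  baseUrl: string,
  audio: Blob,
  fileName: string,
  model: string,
): GuardedRequestOptions {
  const form = new FormData();
  form.append("file", audio, fileName);
  form.append("model", model);
  return {
    url: `${baseUrl}/audio/transcriptions`,
    method: "POST",
    body: form,
    // The one-line fix: opt this multipart request out of DNS pinning.
    pinDns: false,
  };
}
```

The explicit `pinDns: false` keeps the opt-out local to this multipart request instead of changing the guard's behavior for every caller.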

Notes

This workaround targets the OpenAI-compatible audio transcription path (reported with whisper-1) and has not been confirmed for other providers or endpoints. The issue is isolated to the fetch/SSRF layer: direct curl calls to the same endpoint with the same audio file and API key succeed.

Recommendation

Patch media-understanding-*.js to pass pinDns: false in the postTranscriptionRequest fetch options. The same setting already fixed this class of bug in the Google image generation path, and PR #62174 applies it to the audio transcription source.
