openclaw: [whatsapp] Audio transcription replaces body without preserving media marker


Bug Description

When a WhatsApp audio message is transcribed via preflight STT, the plugin replaces msg.body entirely with the transcript text, losing the original <media:audio> marker. This means the agent receives only plain text and cannot know the original message was audio, so it cannot respond with a voice message as expected by the tools.media.audio.enabled configuration.

Root Cause

File: @openclaw/whatsapp/dist/monitor-ClhD-fQ6.js (bundled)

The relevant code path:

const hasAudioBody = params.msg.mediaType?.startsWith("audio/") === true && params.msg.body === "<media:audio>";
if (params.preflightAudioTranscript === void 0 && hasAudioBody && params.msg.mediaPath) {
    const { transcribeFirstAudio } = await import("./audio-preflight.runtime-C_glQhZY.js");
    audioTranscript = await transcribeFirstAudio({ /* ... */ });
}

const msgForAgent = audioTranscript !== void 0 ? {
    ...params.msg,
    body: audioTranscript  // <-- body is fully replaced with plain transcript text
} : params.msg;

The original body, "<media:audio>", is completely overwritten, and the [Audio] block that the core gateway uses (per docs/nodes/audio.md) is never injected.
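
The effect of the spread-and-override can be reproduced in isolation. The sketch below is not the plugin's code: `buildMsgForAgent` is a hypothetical wrapper around the ternary quoted above, and the message object is made-up sample data whose field names follow the excerpt.

```javascript
// Hypothetical stand-in for the msgForAgent ternary in the bundled monitor code.
function buildMsgForAgent(msg, audioTranscript) {
    return audioTranscript !== undefined
        ? { ...msg, body: audioTranscript } // body is overwritten, so the marker is lost
        : msg;
}

// Made-up sample message; only the field names come from the excerpt.
const inbound = {
    body: "<media:audio>",
    mediaType: "audio/ogg; codecs=opus",
    mediaPath: "/tmp/voice-note.ogg",
};

const forAgent = buildMsgForAgent(inbound, "hello, can you call me back?");

console.log(forAgent.body.includes("<media:audio>")); // false
console.log(forAgent.mediaType); // "audio/ogg; codecs=opus" (unchanged)
```

Only `body` changes; the media fields survive the spread, which is why the marker could in principle be rebuilt later.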

Expected Behavior

After successful transcription, the body should preserve the audio marker so the agent knows this was a voice message. Per docs/nodes/audio.md:

On success, it replaces Body with an [Audio] block and sets {{Transcript}}.

The WhatsApp plugin should follow the same pattern — either prepend <media:audio> before the transcript text in body, or use the core gateway's [Audio] block format with transcript embedded.
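
A minimal sketch of the first option (prepending the marker) might look like the following. The helper name `withAudioTranscript` and the marker-then-newline layout are assumptions made for illustration; the exact [Audio] block layout is not shown in this report, so that variant is not attempted here.

```javascript
const AUDIO_MARKER = "<media:audio>";

// Hypothetical replacement for the msgForAgent ternary: keep the marker and
// append the transcript instead of overwriting the body.
function withAudioTranscript(msg, audioTranscript) {
    if (audioTranscript === undefined) return msg; // no transcript: pass through unchanged
    const transcript = audioTranscript.trim();
    return {
        ...msg,
        // Assumed layout: marker on the first line, transcript below it.
        body: transcript ? `${AUDIO_MARKER}\n${transcript}` : AUDIO_MARKER,
    };
}

// Made-up sample message for demonstration.
const msgForAgent = withAudioTranscript(
    { body: AUDIO_MARKER, mediaType: "audio/ogg", mediaPath: "/tmp/voice-note.ogg" },
    "hello, can you call me back?",
);

console.log(msgForAgent.body.startsWith(AUDIO_MARKER)); // true
```

Keeping the marker as a prefix means any downstream check of the form `body.startsWith("<media:audio>")` would keep working, though strict equality checks such as the `hasAudioBody` test would need the untouched original message.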

Actual Behavior

Body contains only the plain transcript text. The agent receives no indication that the original message was audio, and therefore responds with text instead of a voice message.

Reproduction Steps

  1. Configure WhatsApp channel with tools.media.audio.enabled: true
  2. Send a voice message to the agent via WhatsApp
  3. Observe that the agent receives only the plain-text transcript (no <media:audio> marker)
  4. Agent responds with text instead of audio

Environment

  • OpenClaw: 2026.5.22
  • Plugin: @openclaw/whatsapp
  • STT: Whisper CLI (local)
  • Channel: WhatsApp direct message

Additional Context

The mediaType and mediaPath fields remain intact on the msg object after transcription, but they are not used to reconstruct the audio marker in the body. The formatInboundEnvelope core function receives the already-modified body and cannot recover the marker.
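
Because those fields survive, the marker could be rebuilt from them as a stopgap. The sketch below is one possible approach, not existing OpenClaw code: `ensureAudioMarker` is a hypothetical helper, and it assumes that an audio `mediaType` combined with a body lacking the marker means the body holds a transcript.

```javascript
// Hypothetical helper: re-add the audio marker when mediaType says the message
// was audio but the body no longer carries the marker.
function ensureAudioMarker(msg) {
    const marker = "<media:audio>";
    const body = msg.body ?? "";
    const isAudio = typeof msg.mediaType === "string" && msg.mediaType.startsWith("audio/");
    // Leave non-audio messages, messages without a media file, and
    // already-marked bodies untouched.
    if (!isAudio || !msg.mediaPath || body.startsWith(marker)) return msg;
    return { ...msg, body: body ? `${marker}\n${body}` : marker };
}

// Made-up message in the state the agent currently receives it.
const recovered = ensureAudioMarker({
    body: "hello, can you call me back?",
    mediaType: "audio/ogg",
    mediaPath: "/tmp/voice-note.ogg",
});

console.log(recovered.body.split("\n")[0]); // "<media:audio>"
```

A recovery step like this only masks the problem; fixing the body at the point where the transcript is applied avoids the guesswork.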
