openclaw: Bug: Voice memos delivered twice (duplicate) when audioAsVoice is set

openclaw/openclaw#68603 (fetched 2026-04-19 15:09:43)

OpenClaw Bug Report: Voice Memo Double Delivery on Telegram

Status: Confirmed bug in pi-embedded-runner-DN0VbqlW.js
Severity: Low (user-facing annoyance, no data loss)
Affected channels: Telegram (likely others too)
Symptom: Audio files generated by tools (music_generate, etc.) with [[audio_as_voice]] are delivered twice: once as a tool result and once merged into the final reply.

Reproduction

  1. Have TTS disabled (so OpenClaw doesn't auto-generate audio)
  2. Ask the agent to create a voice memo using music_generate or manually generate audio
  3. Agent outputs text containing MEDIA:/path/to/audio.mp3 and [[audio_as_voice]]
  4. Result: Two identical voice messages arrive in Telegram
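To make step 3 concrete, here is a minimal sketch of how text carrying a MEDIA: line and the [[audio_as_voice]] marker could be split into media URLs and a voice flag. The function name parseAgentOutput and its parsing rules are assumptions made for illustration; the report does not show OpenClaw's actual parser.

```javascript
// Hypothetical parser: not OpenClaw's real implementation.
function parseAgentOutput(text) {
  const mediaUrls = [];
  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    // Collect the path that follows each MEDIA: prefix.
    if (trimmed.startsWith("MEDIA:")) {
      mediaUrls.push(trimmed.slice("MEDIA:".length).trim());
    }
  }
  // The marker anywhere in the text requests voice delivery.
  return { mediaUrls, audioAsVoice: text.includes("[[audio_as_voice]]") };
}

const parsed = parseAgentOutput(
  "Here is your memo.\nMEDIA:/path/to/audio.mp3\n[[audio_as_voice]]"
);
console.log(parsed);
```

With this input, the sketch yields one media URL and audioAsVoice set to true, which is the combination the rest of the report is concerned with.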

Root Cause

The bug is in the tool result handling flow in pi-embedded-runner-DN0VbqlW.js, specifically in handleToolResultEnd (around line 1960).

The Flow

When a tool like music_generate completes:

  1. Tool result text is emitted via emitToolOutput → emitToolResultMessage → params.onToolResult({ text, mediaUrls }). This parses the tool's text output (which contains MEDIA:<path>) and delivers it as a tool result payload to the dispatcher. First delivery.

  2. Structured media from details.media is extracted via extractToolResultMediaArtifact, which returns { mediaUrls: ["/path/audio.mp3"], audioAsVoice: true }.

  3. The code then decides whether to queue these media URLs as "pending tool media":

    const pendingMediaUrls = mediaReply.audioAsVoice || emittedToolOutputMediaUrls.length === 0 
      ? mediaUrls 
      : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

    Because audioAsVoice is true, the entire mediaUrls array is used without filtering against emittedToolOutputMediaUrls. This bypasses the deduplication that would normally prevent re-queueing URLs already sent in step 1.

  4. queuePendingToolMedia(ctx, { mediaUrls: pendingMediaUrls, audioAsVoice: true }) adds the URL to state.pendingToolMediaUrls.

  5. When the next block reply is emitted (the agent's final text), consumePendingToolMediaIntoReply merges pendingToolMediaUrls into the payload, attaching the audio file again. Second delivery.
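The five steps above can be condensed into a small simulation. Only the pendingMediaUrls expression is taken from the report; simulateToolResult and the shape of the deliveries array are hypothetical stand-ins for the dispatcher and runner state, which the report does not show.

```javascript
// Hypothetical simulation of the flow; only the ternary mirrors the report.
function simulateToolResult({ mediaUrls, audioAsVoice, emittedToolOutputMediaUrls }) {
  const deliveries = [];

  // Step 1: the tool output text (with its MEDIA: lines) goes out as a tool result.
  if (emittedToolOutputMediaUrls.length > 0) {
    deliveries.push({ kind: "tool-result", mediaUrls: [...emittedToolOutputMediaUrls] });
  }

  // Step 3: the pendingMediaUrls expression quoted in the report.
  const pendingMediaUrls = audioAsVoice || emittedToolOutputMediaUrls.length === 0
    ? mediaUrls
    : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

  // Steps 4-5: anything still pending is merged into the next block reply.
  if (pendingMediaUrls.length > 0) {
    deliveries.push({ kind: "final-reply", mediaUrls: [...pendingMediaUrls] });
  }
  return deliveries;
}

const voiceRun = simulateToolResult({
  mediaUrls: ["/path/audio.mp3"],
  audioAsVoice: true,
  emittedToolOutputMediaUrls: ["/path/audio.mp3"],
});
console.log(voiceRun.length); // 2: the same file is attached to both payloads

const plainRun = simulateToolResult({
  mediaUrls: ["/path/audio.mp3"],
  audioAsVoice: false,
  emittedToolOutputMediaUrls: ["/path/audio.mp3"],
});
console.log(plainRun.length); // 1: the dedup filter drops the already-sent URL
```

The only difference between the two runs is the audioAsVoice flag, which is enough to turn one delivery into two.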

The Flawed Logic

The audioAsVoice flag short-circuits the dedup filter:

// Line ~1968 in pi-embedded-runner-DN0VbqlW.js
const pendingMediaUrls = mediaReply.audioAsVoice || emittedToolOutputMediaUrls.length === 0
    ? mediaUrls                                              // ← BUG: uses ALL mediaUrls when audioAsVoice=true
    : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));  // ← correct dedup

The intent was likely: "If the tool says to send as voice, always queue the media as pending so it gets the voice treatment." But this ignores that the media was already delivered in step 1 via onToolResult.
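Because `||` binds more tightly than the conditional operator, the condition reads as (audioAsVoice || emitted list is empty). The sketch below tabulates the four combinations; skipsDedup is a hypothetical helper that isolates the condition and is not a function in the runner.

```javascript
// Hypothetical helper that isolates the condition from the report.
function skipsDedup(audioAsVoice, emittedCount) {
  // Either operand being true selects the unfiltered branch of the ternary.
  return audioAsVoice || emittedCount === 0;
}

const cases = [
  { audioAsVoice: false, emittedCount: 0 },
  { audioAsVoice: false, emittedCount: 1 },
  { audioAsVoice: true, emittedCount: 0 },
  { audioAsVoice: true, emittedCount: 1 },
];

const table = cases.map((c) => ({
  ...c,
  skipsDedup: skipsDedup(c.audioAsVoice, c.emittedCount),
}));
console.table(table);
```

The last row is the problematic one: a URL has already been emitted with the tool output, yet the filter is skipped because audioAsVoice is true.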

Why It Only Affects Manual Voice Memos

  • When TTS is on: The maybeApplyTtsToPayload function generates audio natively. The tool result has no MEDIA: line in its text output and no audioAsVoice in structured media. The TTS system handles delivery directly — one message.

  • When TTS is off + manual voice memo: The agent explicitly writes MEDIA:<path> and [[audio_as_voice]] in its output. The tool result text includes MEDIA:<path>. The structured media has audioAsVoice: true. Both paths fire → double delivery.

Proposed Fix

Option A (minimal): Filter against emittedToolOutputMediaUrls even when audioAsVoice is true:

// Before (buggy): audioAsVoice short-circuits the dedup filter.
const pendingMediaUrls = mediaReply.audioAsVoice || emittedToolOutputMediaUrls.length === 0
    ? mediaUrls
    : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

// After (fixed): always drop URLs already emitted with the tool output text.
// filter() against an empty emittedToolOutputMediaUrls keeps every URL, so neither
// the empty-list case nor audioAsVoice needs a separate branch.
const pendingMediaUrls = mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

This ensures that if the URL was already emitted in the tool output text, it's not queued again as pending — regardless of audioAsVoice.
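A quick way to check Option A is to compare the original expression with an always-filter version on the inputs from this report. The wrappers pendingBuggy and pendingFixed are hypothetical; they exist only to make the two expressions callable outside the runner.

```javascript
// Hypothetical wrappers around the two expressions, for comparison only.
function pendingBuggy(mediaUrls, emitted, audioAsVoice) {
  return audioAsVoice || emitted.length === 0
    ? mediaUrls
    : mediaUrls.filter((url) => !emitted.includes(url));
}

function pendingFixed(mediaUrls, emitted) {
  // Filtering against an empty list keeps every URL, so no special case is needed.
  return mediaUrls.filter((url) => !emitted.includes(url));
}

const emitted = ["/path/audio.mp3"];

// The reported case: the voice memo was already sent with the tool output.
console.log(pendingBuggy(["/path/audio.mp3"], emitted, true)); // [ '/path/audio.mp3' ]
console.log(pendingFixed(["/path/audio.mp3"], emitted));       // []

// Media that was not in the tool output text is still queued.
console.log(pendingFixed(["/path/audio.mp3", "/path/extra.mp3"], emitted)); // [ '/path/extra.mp3' ]

// Nothing emitted yet: everything is queued, as before.
console.log(pendingFixed(["/path/audio.mp3"], [])); // [ '/path/audio.mp3' ]
```

The second and third calls show the intended property: already-delivered URLs are dropped, while media that only appears in the structured result still reaches the final reply.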

Option B (more robust): Track which media URLs have been delivered via onToolResult in the state object, and use that as a cross-check in consumePendingToolMediaIntoReply. This would also catch edge cases where the same URL appears in multiple tool results.
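One possible shape for Option B is sketched below. Every name here (createState, deliveredToolMediaUrls, recordToolResultDelivery, queuePending, consumePendingIntoReply) is hypothetical; the report does not show the real state object or the signature of consumePendingToolMediaIntoReply, so treat this as an illustration of the idea rather than a patch.

```javascript
// Hypothetical state and helpers; the real runner's structures are not shown in the report.
function createState() {
  return { pendingToolMediaUrls: [], deliveredToolMediaUrls: new Set() };
}

// Called whenever media goes out with a tool result (step 1 of the flow).
function recordToolResultDelivery(state, mediaUrls) {
  for (const url of mediaUrls) state.deliveredToolMediaUrls.add(url);
}

function queuePending(state, mediaUrls) {
  state.pendingToolMediaUrls.push(...mediaUrls);
}

// Merge pending media into a reply, skipping anything delivered earlier.
function consumePendingIntoReply(state, payload) {
  const existing = payload.mediaUrls ?? [];
  const fresh = state.pendingToolMediaUrls.filter(
    (url) => !state.deliveredToolMediaUrls.has(url)
  );
  const merged = [...new Set([...existing, ...fresh])];
  state.pendingToolMediaUrls = [];
  recordToolResultDelivery(state, merged); // remember what this reply delivers
  return { ...payload, mediaUrls: merged };
}

const state = createState();
recordToolResultDelivery(state, ["/path/audio.mp3"]);
queuePending(state, ["/path/audio.mp3", "/path/cover.png"]);
const reply = consumePendingIntoReply(state, { text: "Here is your memo." });
console.log(reply.mediaUrls); // [ '/path/cover.png' ]
```

Because the delivered set lives on the state rather than on a single payload, the same check also covers a URL that shows up in several tool results.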

Files Affected

  • pi-embedded-runner-DN0VbqlW.js: line ~1968 (the pendingMediaUrls calculation)
  • Possibly attempt.tool-run-context-CgVg2Nu2.js: extractToolResultMediaArtifact and filterToolResultMediaUrls

Notes

  • The consumePendingToolMediaIntoReply function does dedup via Set, but only within a single payload. It doesn't know that the URL was already delivered in a previous payload (the tool result).
  • The block reply pipeline's createBlockReplyPayloadKey dedup also can't catch this because the tool result payload and the final reply payload have different text content — they produce different keys.
  • This bug is specific to the combination of: tool-generated media + audioAsVoice: true + the tool output text containing MEDIA: lines.
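The first two notes can be illustrated with a short sketch. The payloadKey function is a hypothetical stand-in for createBlockReplyPayloadKey; the report only states that the key depends on the text content, so the exact key format here is an assumption.

```javascript
// Hypothetical key function: assumes the key is derived from text plus media.
function payloadKey(payload) {
  return JSON.stringify({
    text: payload.text,
    mediaUrls: [...new Set(payload.mediaUrls)],
  });
}

const toolResultPayload = {
  text: "MEDIA:/path/audio.mp3",
  mediaUrls: ["/path/audio.mp3"],
};
const finalReplyPayload = {
  text: "Here is your voice memo.",
  mediaUrls: ["/path/audio.mp3", "/path/audio.mp3"],
};

// A Set removes repeats inside one payload...
const withinPayload = [...new Set(finalReplyPayload.mediaUrls)];
console.log(withinPayload.length); // 1

// ...but the two payloads still produce different keys, so key-based dedup keeps both.
const sameKey = payloadKey(toolResultPayload) === payloadKey(finalReplyPayload);
console.log(sameKey); // false
```

Both mechanisms work as designed within their own scope; neither compares media across two separate payloads, which is where the duplicate arises.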

extent analysis

TL;DR

The most likely fix for the double delivery issue is to filter against emittedToolOutputMediaUrls even when audioAsVoice is true, ensuring that media URLs already delivered in the tool output text are not queued again as pending.

Guidance

  • Review the handleToolResultEnd function in pi-embedded-runner-DN0VbqlW.js to understand how tool results are processed and how media URLs are handled.
  • Implement the proposed fix by modifying the pendingMediaUrls calculation to filter out media URLs that have already been emitted in the tool output text, regardless of the audioAsVoice flag.
  • Consider implementing a more robust solution by tracking delivered media URLs in the state object and using this information to prevent duplicate deliveries.
  • Verify the fix by testing the scenario with manual voice memos and TTS disabled, ensuring that only one voice message is delivered.

Example

The proposed fix can be implemented as follows:

// Always drop URLs already emitted with the tool output text, whether or not audioAsVoice is set.
// filter() against an empty emittedToolOutputMediaUrls keeps every URL.
const pendingMediaUrls = mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

This code ensures that media URLs are deduplicated even when audioAsVoice is true.

Notes

The fix assumes that the emittedToolOutputMediaUrls array is correctly populated with media URLs that have already been delivered in the tool output text. If this array is not correctly populated, the fix may not work as intended.
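One concrete way this assumption could fail: Array.prototype.includes compares strings exactly, so the same file written two different ways would not match. Whether the runner ever produces differing spellings is not stated in the report; the file:// form below is a made-up example to show the effect.

```javascript
const emittedToolOutputMediaUrls = ["/path/audio.mp3"];

// Same filter as the proposed fix, wrapped in a hypothetical helper.
const dedup = (urls) =>
  urls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

// Identical spelling: the URL is recognised as already emitted.
const sameSpelling = dedup(["/path/audio.mp3"]);
console.log(sameSpelling); // []

// Hypothetical alternate spelling of the same file: the exact match fails.
const differentSpelling = dedup(["file:///path/audio.mp3"]);
console.log(differentSpelling); // [ 'file:///path/audio.mp3' ]
```

If both lists are built from the same source strings, exact matching is sufficient; if not, normalizing paths before comparing would be worth considering.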

Recommendation

Apply the proposed fix to pi-embedded-runner-DN0VbqlW.js to prevent double delivery of voice messages. This fix is a minimal change that addresses the specific issue described in the bug report.
