Fix Action

Fix / Workaround

Status: Confirmed bug in pi-embedded-runner-DN0VbqlW.js
Severity: Low (user-facing annoyance, no data loss)
Affected channels: Telegram (likely others too)
Symptom: Audio files generated by tools (music_generate, etc.) with [[audio_as_voice]] are delivered twice: once as a tool result and once merged into the final reply.

OpenClaw Bug Report: Voice Memo Double Delivery on Telegram

Reproduction

1. Have TTS disabled (so OpenClaw doesn't auto-generate audio).
2. Ask the agent to create a voice memo using music_generate, or manually generate audio.
3. The agent outputs text containing MEDIA:/path/to/audio.mp3 and [[audio_as_voice]].

Result: Two identical voice messages arrive in Telegram.

Root Cause

The bug is in the tool result handling flow in pi-embedded-runner-DN0VbqlW.js, specifically in handleToolResultEnd (around line 1960).

The Flow

When a tool like music_generate completes:

1. Tool result text is emitted via emitToolOutput → emitToolResultMessage → params.onToolResult({ text, mediaUrls }). This parses the tool's text output (which contains MEDIA:<path>) and delivers it as a tool result payload to the dispatcher. First delivery.
2. Structured media from details.media is extracted via extractToolResultMediaArtifact, which returns { mediaUrls: ["/path/audio.mp3"], audioAsVoice: true }.
3. The code then decides whether to queue these media URLs as "pending tool media":
   ```
   const pendingMediaUrls = mediaReply.audioAsVoice || emittedToolOutputMediaUrls.length === 0
     ? mediaUrls
     : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));
   ```
   Because audioAsVoice is true, the entire mediaUrls array is used without filtering against emittedToolOutputMediaUrls. This bypasses the deduplication that would normally prevent re-queueing URLs already sent in step 1.
4. queuePendingToolMedia(ctx, { mediaUrls: pendingMediaUrls, audioAsVoice: true }) adds the URL to state.pendingToolMediaUrls.
5. When the next block reply is emitted (the agent's final text), consumePendingToolMediaIntoReply merges pendingToolMediaUrls into the payload, attaching the audio file again. Second delivery.
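The five steps above can be replayed in a small standalone simulation. This is only a sketch: simulateBuggyFlow, the delivered array, and the MEDIA: line parsing are hypothetical stand-ins for the runner's dispatcher and state, and only the pendingMediaUrls expression is copied from the report.

```javascript
// Hypothetical stand-in for the runner; not the actual OpenClaw code.
function simulateBuggyFlow(toolText, structuredMedia) {
  const delivered = []; // every payload handed to the mock dispatcher
  const state = { pendingToolMediaUrls: [] };

  // Step 1: MEDIA:<path> lines in the tool text go out as a tool result.
  const emittedToolOutputMediaUrls = toolText
    .split("\n")
    .filter((line) => line.startsWith("MEDIA:"))
    .map((line) => line.slice("MEDIA:".length).trim());
  delivered.push({ kind: "tool-result", mediaUrls: emittedToolOutputMediaUrls });

  // Steps 2-3: structured media, then the expression quoted in the report.
  const mediaReply = structuredMedia;
  const mediaUrls = mediaReply.mediaUrls;
  const pendingMediaUrls =
    mediaReply.audioAsVoice || emittedToolOutputMediaUrls.length === 0
      ? mediaUrls
      : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

  // Step 4: queue the result as pending tool media.
  state.pendingToolMediaUrls.push(...pendingMediaUrls);

  // Step 5: the final reply picks up whatever is still pending.
  delivered.push({
    kind: "final-reply",
    mediaUrls: [...new Set(state.pendingToolMediaUrls)],
  });
  return delivered;
}

const deliveries = simulateBuggyFlow(
  "MEDIA:/path/audio.mp3\n[[audio_as_voice]]",
  { mediaUrls: ["/path/audio.mp3"], audioAsVoice: true },
);
const timesSent = deliveries.filter((d) =>
  d.mediaUrls.includes("/path/audio.mp3"),
).length;
console.log(timesSent); // 2
```

With audioAsVoice set to false in the same call, the filter branch runs and the file appears in only one payload.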

The Flawed Logic

The audioAsVoice flag short-circuits the dedup filter:

```
// Line ~1968 in pi-embedded-runner-DN0VbqlW.js
const pendingMediaUrls = mediaReply.audioAsVoice || emittedToolOutputMediaUrls.length === 0
    ? mediaUrls                                              // ← BUG: uses ALL mediaUrls when audioAsVoice=true
    : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));  // ← correct dedup
```

The intent was likely: "If the tool says to send as voice, always queue the media as pending so it gets the voice treatment." But this ignores that the media was already delivered in step 1 via onToolResult.
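Because the conditional operator binds more loosely than ||, the expression reads as (audioAsVoice || emitted.length === 0) ? mediaUrls : filtered. A minimal sketch of the resulting branch table follows; takesDedupBranch is a hypothetical helper written only to mirror that condition.

```javascript
// Hypothetical helper mirroring the condition quoted in the report.
function takesDedupBranch(audioAsVoice, emittedCount) {
  // (audioAsVoice || emittedCount === 0) selects the unfiltered branch,
  // so the dedup filter runs only when both operands are false.
  return !(audioAsVoice || emittedCount === 0);
}

const rows = [];
for (const audioAsVoice of [false, true]) {
  for (const emittedCount of [0, 1]) {
    rows.push({
      audioAsVoice,
      emittedCount,
      dedup: takesDedupBranch(audioAsVoice, emittedCount),
    });
  }
}
console.table(rows);
```

Only the row with audioAsVoice false and one emitted URL reaches the filter. The row with audioAsVoice true and one emitted URL is the case described in this report.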

Why It Only Affects Manual Voice Memos

When TTS is on: The maybeApplyTtsToPayload function generates audio natively. The tool result has no MEDIA: line in its text output and no audioAsVoice in structured media. The TTS system handles delivery directly — one message.
When TTS is off + manual voice memo: The agent explicitly writes MEDIA:<path> and [[audio_as_voice]] in its output. The tool result text includes MEDIA:<path>. The structured media has audioAsVoice: true. Both paths fire → double delivery.

Proposed Fix

Option A (minimal): Filter against emittedToolOutputMediaUrls even when audioAsVoice is true:

```
// Before (buggy):
const pendingMediaUrls = mediaReply.audioAsVoice || emittedToolOutputMediaUrls.length === 0
    ? mediaUrls
    : mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url));

// After (fixed):
const dedupedMediaUrls = emittedToolOutputMediaUrls.length > 0
    ? mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url))
    : mediaUrls;
const pendingMediaUrls = mediaReply.audioAsVoice
    ? dedupedMediaUrls
    : dedupedMediaUrls.length > 0
        ? dedupedMediaUrls
        : [];
```

This ensures that if the URL was already emitted in the tool output text, it's not queued again as pending — regardless of audioAsVoice.
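To compare the two versions side by side, both expressions can be wrapped as pure functions and run over a few inputs. This is a sketch under assumptions: pendingBuggy, pendingFixed, and the sample paths are hypothetical names, while the function bodies follow the before/after code of Option A.

```javascript
// Hypothetical wrappers around the "before" and "after" expressions.
function pendingBuggy(mediaUrls, emitted, audioAsVoice) {
  return audioAsVoice || emitted.length === 0
    ? mediaUrls
    : mediaUrls.filter((url) => !emitted.includes(url));
}

function pendingFixed(mediaUrls, emitted, audioAsVoice) {
  const deduped = emitted.length > 0
    ? mediaUrls.filter((url) => !emitted.includes(url))
    : mediaUrls;
  return audioAsVoice ? deduped : deduped.length > 0 ? deduped : [];
}

const cases = [
  { name: "voice, already emitted", mediaUrls: ["/a.mp3"], emitted: ["/a.mp3"], audioAsVoice: true },
  { name: "voice, not emitted", mediaUrls: ["/a.mp3"], emitted: [], audioAsVoice: true },
  { name: "voice, partly emitted", mediaUrls: ["/a.mp3", "/b.mp3"], emitted: ["/a.mp3"], audioAsVoice: true },
  { name: "non-voice, already emitted", mediaUrls: ["/a.png"], emitted: ["/a.png"], audioAsVoice: false },
];

for (const c of cases) {
  console.log(
    c.name,
    pendingBuggy(c.mediaUrls, c.emitted, c.audioAsVoice),
    pendingFixed(c.mediaUrls, c.emitted, c.audioAsVoice),
  );
}
```

In the first case the buggy version re-queues /a.mp3 while the fixed version queues nothing. The other voice cases show that media not yet emitted is still queued. In this sketch the output of pendingFixed no longer depends on audioAsVoice, since both branches return the deduplicated list.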

Option B (more robust): Track which media URLs have been delivered via onToolResult in the state object, and use that as a cross-check in consumePendingToolMediaIntoReply. This would also catch edge cases where the same URL appears in multiple tool results.
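One way Option B might look is sketched below. The state field deliveredToolMediaUrls and the helpers recordToolResultDelivery, queuePending, and consumePendingIntoReply are hypothetical and do not reflect the runner's actual state shape or function signatures. Only the idea of cross-checking delivered URLs at merge time comes from the report.

```javascript
// Hypothetical state shape: pending URLs plus a record of delivered ones.
function createState() {
  return { pendingToolMediaUrls: [], deliveredToolMediaUrls: new Set() };
}

// Called wherever a tool result payload is handed to the dispatcher.
function recordToolResultDelivery(state, mediaUrls) {
  for (const url of mediaUrls) state.deliveredToolMediaUrls.add(url);
}

function queuePending(state, mediaUrls) {
  state.pendingToolMediaUrls.push(...mediaUrls);
}

// Merges pending media into a reply, skipping anything already delivered.
function consumePendingIntoReply(state, payload) {
  const fresh = [...new Set(state.pendingToolMediaUrls)].filter(
    (url) => !state.deliveredToolMediaUrls.has(url),
  );
  state.pendingToolMediaUrls = [];
  for (const url of fresh) state.deliveredToolMediaUrls.add(url);
  return { ...payload, mediaUrls: [...(payload.mediaUrls ?? []), ...fresh] };
}

const state = createState();
recordToolResultDelivery(state, ["/path/audio.mp3"]);
queuePending(state, ["/path/audio.mp3", "/path/cover.png"]);
const reply = consumePendingIntoReply(state, { text: "Here is your voice memo." });
console.log(reply.mediaUrls); // [ '/path/cover.png' ]
```

Because the check happens at merge time, it would also cover a URL that appears in several tool results, at the cost of keeping one more piece of per-run state.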

Files Affected

pi-embedded-runner-DN0VbqlW.js — line ~1968 (the pendingMediaUrls calculation)
Possibly attempt.tool-run-context-CgVg2Nu2.js — extractToolResultMediaArtifact and filterToolResultMediaUrls

Notes

The consumePendingToolMediaIntoReply function does dedup via Set, but only within a single payload. It doesn't know that the URL was already delivered in a previous payload (the tool result).
The block reply pipeline's createBlockReplyPayloadKey dedup also can't catch this because the tool result payload and the final reply payload have different text content — they produce different keys.
This bug is specific to the combination of: tool-generated media + audioAsVoice: true + the tool output text containing MEDIA: lines.
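The first two notes can be illustrated with a toy example. The payloadKey function below is a hypothetical stand-in that assumes the key is derived from both text and media. The real createBlockReplyPayloadKey may compute its key differently.

```javascript
// Hypothetical key function: assumed to combine text and media URLs.
function payloadKey(payload) {
  return JSON.stringify({
    text: payload.text,
    mediaUrls: [...payload.mediaUrls].sort(),
  });
}

const toolResultPayload = {
  text: "MEDIA:/path/audio.mp3",
  mediaUrls: ["/path/audio.mp3"],
};
const finalReplyPayload = {
  text: "Here is your voice memo.",
  mediaUrls: ["/path/audio.mp3"],
};

// Key-based dedup: same media but different text gives a different key.
const seenKeys = new Set([payloadKey(toolResultPayload)]);
const finalIsDuplicate = seenKeys.has(payloadKey(finalReplyPayload));
console.log(finalIsDuplicate); // false

// Set-based dedup only collapses repeats inside one payload.
const mergedOnce = [...new Set(["/path/audio.mp3", "/path/audio.mp3"])];
console.log(mergedOnce.length); // 1
```

Neither mechanism compares the media of one payload against the media of an earlier one, which is the gap the proposed fix closes.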

extent analysis

TL;DR

The most likely fix for the double delivery issue is to filter against emittedToolOutputMediaUrls even when audioAsVoice is true, ensuring that media URLs already delivered in the tool output text are not queued again as pending.

Guidance

Review the handleToolResultEnd function in pi-embedded-runner-DN0VbqlW.js to understand how tool results are processed and how media URLs are handled.
Implement the proposed fix by modifying the pendingMediaUrls calculation to filter out media URLs that have already been emitted in the tool output text, regardless of the audioAsVoice flag.
Consider implementing a more robust solution by tracking delivered media URLs in the state object and using this information to prevent duplicate deliveries.
Verify the fix by testing the scenario with manual voice memos and TTS disabled, ensuring that only one voice message is delivered.

Example

The proposed fix can be implemented as follows:

```
const dedupedMediaUrls = emittedToolOutputMediaUrls.length > 0
    ? mediaUrls.filter((url) => !emittedToolOutputMediaUrls.includes(url))
    : mediaUrls;
const pendingMediaUrls = mediaReply.audioAsVoice
    ? dedupedMediaUrls
    : dedupedMediaUrls.length > 0
        ? dedupedMediaUrls
        : [];
```

This code ensures that media URLs are deduplicated even when audioAsVoice is true.

Notes

The fix assumes that the emittedToolOutputMediaUrls array is correctly populated with media URLs that have already been delivered in the tool output text. If this array is not correctly populated, the fix may not work as intended.

Recommendation

Apply the proposed fix to pi-embedded-runner-DN0VbqlW.js to prevent double delivery of voice messages. This fix is a minimal change that addresses the specific issue described in the bug report.
