openclaw - ✅(Solved) Fix [Bug]: MEDIA directive delivers attachments twice on Telegram (non-streaming) [2 pull requests, 2 comments, 3 participants]

milklion · 2026-05-06T08:42:53Z


GitHub stats

openclaw/openclaw#78372•Fetched 2026-05-07 03:37:40


Timeline (top): commented ×2, cross-referenced ×2, labeled ×1, mentioned ×1

Root Cause

#70085 — same root cause pattern on WhatsApp: "duplicate delivery from separate reply-media normalizer instances/caches between block delivery and final reply delivery"
#68475 — Discord MEDIA duplicate delivery

Fix Action

Fixed

Fixed by PR (merged): fix(agents): deliver agent TTS audio when block streaming is off (https://github.com/openclaw/openclaw/pull/78355)
Fix proposed in PR (open, not merged when fetched): fix(telegram): deduplicate MEDIA attachments in non-streaming mode (https://github.com/openclaw/openclaw/pull/78420)

PR fix notes

PR #78355: fix(agents): deliver agent TTS audio when block streaming is off

Repository: openclaw/openclaw
Author: Conan-Scott
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/78355

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.

Problem: Agent-generated tts tool audio can be generated successfully but never delivered on non-streaming block-reply channels when the block reply has both text and media. In Telegram this shows up as a “bare TTS” request producing no received voice/media message, even though /tts audio ... works and the speech provider produced an .opus file.
Why it matters: This makes the agent tts tool look flaky or provider/channel-specific, but the failing path is actually OpenClaw reply delivery fallback. Users can waste time debugging Fish Audio or Telegram even though media generation and direct Telegram media delivery are healthy.
What changed: When block streaming is disabled, send any media-bearing block reply directly, not only media-only block replies. Text-only block replies still accumulate into the final assistant text as before.
What did NOT change (scope boundary): This does not change TTS generation, speech providers, Telegram upload logic, /tts command behavior, block streaming behavior, or text-only fallback behavior.

Change Type (select all)

Selected: Bug fix

Scope (select all touched areas)

Selected: Skills / tool execution; Integrations; API / contracts

Linked Issue/PR

Closes #
Related #
This PR fixes a bug or regression

Real behavior proof

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count.

Behavior or issue addressed: Agent tts tool audio was generated but not delivered as Telegram voice/media for a bare TTS-style agent response when Telegram block streaming was off. /tts audio ... worked because the slash command returns a direct media reply through the command path. Mixed text + TTS scenarios were not the observed failure; the problematic case was the agent/tool block fallback for media-bearing block replies that were not media-only.
Real environment tested: OpenClaw 2026.5.5-beta.2 running in a container/pod, Telegram integration, Fish Audio speech provider, non-streaming block reply delivery. Runtime was patched with the same one-line delivery condition change via boot-time loader hook because /app was immutable in the pod.
Exact steps or command run after this patch:
1. Apply equivalent runtime patch changing the non-streaming fallback from blockHasMedia && !blockPayload.text to blockHasMedia.
2. Restart OpenClaw with the boot-time loader hook active.
3. Trigger an agent-sent TTS-only response to Telegram.
4. Confirm Telegram receives the generated voice/media message.
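
The PR does not show the boot-time loader hook itself, so the following is only a minimal sketch of the text substitution that step 1 describes. The helper name `patchFallbackPredicate` and the `PatchResult` shape are hypothetical; the two predicate strings are the ones quoted in the steps. The sketch assumes the built bundle still contains the predicate verbatim, which may not hold for a minified build.

```typescript
// Hypothetical helper; the real loader hook is not shown in the PR.
// The two strings below are the predicates quoted in the PR steps.
const PATCH_TARGET = "blockHasMedia && !blockPayload.text";
const PATCH_REPLACEMENT = "blockHasMedia";

interface PatchResult {
  source: string; // module source after the attempted patch
  applied: boolean; // false when the target text was not found
}

// Replace the first occurrence of the old predicate and report whether
// the target was present, so the caller can log "applied" or "not found".
function patchFallbackPredicate(source: string): PatchResult {
  const index = source.indexOf(PATCH_TARGET);
  if (index === -1) {
    return { source, applied: false };
  }
  const patched =
    source.slice(0, index) +
    PATCH_REPLACEMENT +
    source.slice(index + PATCH_TARGET.length);
  return { source: patched, applied: true };
}
```

Returning an `applied` flag rather than throwing matches the two outcomes visible in the runtime log: one file where the target was not found, and one chunk where the patch was applied.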

Evidence after fix: redacted runtime log and copied live user confirmation from the patched real Telegram setup:

2026-05-06T17:29:11.453+10:00 [discord] client initialized as 1492070782170431548; awaiting gateway readiness
[tts-media-patch] WARNING: patch target not found in file:///app/dist/agent-runner.runtime.js
[tts-media-patch] ✅ text+media non-streaming block fallback patch applied to agent-runner.runtime-BsTUYqAI.js
2026-05-06T17:30:10.204+10:00 [ws] ⇄ res ✓ health 117ms cached...

Live user confirmation after the patched agent path loaded lazily:

a tts only message was received and I saw the patch applied dynamically in the log after it was requested

Observed result after fix: The TTS-only agent message was received by Telegram after the delivery fallback patch applied dynamically on first lazy import of the agent runner.
What was not tested: Other non-streaming channels besides Telegram; block-streaming-enabled channels; every speech provider. The fix is provider-independent because it only changes reply fallback delivery after media already exists.
Before evidence: Before the patch, /tts audio ... worked and Fish Audio produced an .opus file, but a bare agent tts response did not arrive as Telegram voice/media. The investigated generated audio path included /tmp/openclaw/tts-wvm6vt/voice-1778050523564.opus, confirming generation succeeded before delivery failed.

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

Root cause: createBlockReplyDeliveryHandler() only direct-sent non-streaming block replies when blockHasMedia && !blockPayload.text. That preserves media-only orphaned tool attachments, but drops the direct-send fallback for tool block replies that contain both text/caption metadata and media. Final assistant text can still be reconstructed later, but the media attachment cannot be reconstructed from final text, so the generated audio is effectively consumed before channel delivery.
Missing detection / guardrail: Existing tests covered media-only block fallback and expected captioned media-bearing blocks to remain buffered when block streaming was disabled. There was no regression test asserting that text+media or audio-as-voice block replies must still be delivered directly in the non-streaming fallback path.
Contributing context (if known): /tts audio ... takes a different command path and returns a direct media reply, so it continued to work. The failing agent tts path uses tool/media block reply delivery, where TTS-generated audio can carry audioAsVoice and a text/caption-bearing block payload.
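
The root cause above comes down to one boolean predicate. The sketch below contrasts the old and new conditions as standalone functions. It is an illustration, not the code in `createBlockReplyDeliveryHandler()`: the `BlockPayload` shape and the function names are assumptions, and only `blockHasMedia`, `text`, `mediaUrls`, and `audioAsVoice` are names taken from the PR text.

```typescript
// Assumed payload shape for illustration; the real ReplyPayload type may differ.
interface BlockPayload {
  text?: string;
  mediaUrls?: string[];
  audioAsVoice?: boolean;
}

function hasMedia(payload: BlockPayload): boolean {
  return (payload.mediaUrls?.length ?? 0) > 0;
}

// Before the fix: only media-only blocks were sent directly.
function shouldDirectSendBefore(blockPayload: BlockPayload): boolean {
  const blockHasMedia = hasMedia(blockPayload);
  return blockHasMedia && !blockPayload.text;
}

// After the fix: any media-bearing block is sent directly.
function shouldDirectSendAfter(blockPayload: BlockPayload): boolean {
  const blockHasMedia = hasMedia(blockPayload);
  return blockHasMedia;
}
```

Under these assumptions, a captioned voice payload such as `{ text: "caption", mediaUrls: ["/tmp/voice.opus"], audioAsVoice: true }` is skipped by the old predicate and sent by the new one, while media-only and text-only payloads behave the same under both.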

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/auto-reply/reply/reply-delivery.test.ts
Scenario the test should lock in: With block streaming disabled, captioned media-bearing block replies and captioned audioAsVoice replies are sent via onBlockReply and tracked in directlySentBlockKeys, while text-only blocks continue to accumulate into final text.
Why this is the smallest reliable guardrail: The regression is the fallback branch predicate in createBlockReplyDeliveryHandler(). A focused unit test directly exercises that branch without needing a Telegram or speech-provider fixture.
Existing test that already covers this (if any): Media-only non-streaming block replies were already covered; captioned media-bearing replies were covered with the opposite expectation and are updated here.
If no new test is added, why not: N/A; new coverage is added.
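
To make the locked-in scenario concrete, here is a small self-contained model of the non-streaming fallback branch. It is a sketch under stated assumptions, not the project's handler: `createFallbackHandler`, `FallbackState`, and `blockKey` are hypothetical names, the callback is synchronous for simplicity, and only `onBlockReply` and `directlySentBlockKeys` come from the PR description.

```typescript
// Assumed payload shape; the real ReplyPayload type may carry more fields.
interface ReplyPayload {
  text?: string;
  mediaUrls?: string[];
  audioAsVoice?: boolean;
}

interface FallbackState {
  directlySentBlockKeys: Set<string>; // keys of blocks already sent directly
  accumulatedText: string[]; // text-only blocks kept for the final reply
}

// Hypothetical key function: any stable identity for a block would do.
function blockKey(payload: ReplyPayload): string {
  return JSON.stringify([payload.text ?? "", payload.mediaUrls ?? []]);
}

// Model of the fallback when block streaming is disabled:
// media-bearing blocks go out immediately, text-only blocks accumulate.
function createFallbackHandler(
  onBlockReply: (payload: ReplyPayload) => void,
  state: FallbackState,
): (payload: ReplyPayload) => void {
  return (payload: ReplyPayload): void => {
    const blockHasMedia = (payload.mediaUrls?.length ?? 0) > 0;
    if (blockHasMedia) {
      onBlockReply(payload);
      state.directlySentBlockKeys.add(blockKey(payload));
      return;
    }
    if (payload.text) {
      state.accumulatedText.push(payload.text);
    }
  };
}
```

A test against a handler like this would send one captioned `audioAsVoice` block and one text-only block, then assert that `onBlockReply` fired once, that `directlySentBlockKeys` has one entry, and that the text-only block is the only accumulated text.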

User-visible / Behavior Changes

Agent/tool replies that include both text and media now deliver their media on channels where block streaming is disabled. In practice, agent-sent TTS audio can arrive as Telegram voice/media instead of silently disappearing after successful generation.

Diagram (if applicable)

Before:
agent tts tool -> text+audio block reply -> block streaming disabled
  -> predicate requires media AND no text
  -> direct media fallback skipped
  -> final text may remain, generated audio is not delivered

After:
agent tts tool -> text+audio block reply -> block streaming disabled
  -> predicate requires media
  -> direct block reply sends text+audio payload
  -> Telegram receives generated voice/media

Security Impact (required)

New permissions/capabilities? (Yes/No): No
Secrets/tokens handling changed? (Yes/No): No
New/changed network calls? (Yes/No): No
Command/tool execution surface changed? (Yes/No): No
Data access scope changed? (Yes/No): No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux container/pod
Runtime/container: OpenClaw 2026.5.5-beta.2
Model/provider: Agent runtime with Fish Audio TTS provider
Integration/channel (if any): Telegram; non-streaming block reply delivery
Relevant config (redacted): Telegram block streaming disabled/default; TTS enabled; secrets redacted

Steps

Confirm /tts audio hello sends a Telegram voice/media message.
Ask the agent to send a TTS-only/bare spoken response using the tts tool.
Observe that speech generation succeeds and produces a local audio file, but the Telegram voice/media message is not delivered before the fix.
Apply this patch or equivalent runtime monkey patch.
Repeat the TTS-only agent request.

Expected

Generated agent tts audio is delivered to Telegram as voice/media when media generation succeeds.

Actual

Before fix: /tts audio ... worked, but bare agent tts audio was generated and not delivered as Telegram voice/media.
After fix: bare agent tts audio is received by Telegram once the patched agent runner path is loaded.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Targeted local test after source patch:

✓ auto-reply-reply src/auto-reply/reply/reply-delivery.test.ts (11 tests) 17ms
Test Files  1 passed (1)
Tests       11 passed (11)

Runtime patch proof/log is included above. A Telegram screenshot can be added before submission if desired.

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
- /tts audio ... worked before this change, proving the speech provider and direct media delivery path were healthy.
- Runtime-equivalent patch applied dynamically to the lazily loaded agent runner module.
- A TTS-only agent message was received by Telegram after the patch.
- Unit tests pass for the updated fallback behavior.
Edge cases checked:
- Text-only non-streaming blocks remain accumulated into final text.
- Media-only non-streaming block replies remain direct-sent.
- Captioned audioAsVoice media replies are now direct-sent and dedupe-tracked.
What you did not verify:
- Non-Telegram non-streaming integrations.
- Block-streaming-enabled channels.
- Every speech provider.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No): Yes
Config/env changes? (Yes/No): No
Migration needed? (Yes/No): No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: A text+media block reply could be sent directly and also represented in final assistant text, creating a duplicate text caption on some channels.
- Mitigation: Direct sends are tracked with directlySentBlockKeys, matching existing media-only fallback behavior. Text-only blocks still use the existing final-text accumulation path.
Risk: Some channels may handle captioned audio/media differently than media-only payloads.
- Mitigation: The patch preserves the existing ReplyPayload shape and only extends the same direct fallback already used for media-only payloads to media-bearing payloads.

Changed files

CHANGELOG.md (modified, +1/-0)
src/auto-reply/reply/agent-runner.media-paths.test.ts (modified, +2/-2)
src/auto-reply/reply/reply-delivery.test.ts (modified, +49/-3)
src/auto-reply/reply/reply-delivery.ts (modified, +1/-3)

PR #78420: fix(telegram): deduplicate MEDIA attachments in non-streaming mode

Repository: openclaw/openclaw
Author: rogerdigital
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/78420

Description (problem / solution / changelog)

Summary

Non-streaming Telegram delivers each MEDIA: attachment twice — once from the media-only block reply and once from the final reply
Track media URLs sent via block replies in a Set, then filter duplicates from final reply payloads
Extract deduplicateBlockSentMedia into a standalone pure function for testability

Closes #78372

Root cause

When streaming: "off", the deliver callback in bot-message-dispatch.ts receives both:

A block reply with mediaUrls populated (sent directly via sendDirectBlockReply in reply-delivery.ts:160, which sends media-only blocks even when streaming is off)
A final reply with the same mediaUrls (the MEDIA: directive text persists in the agent's complete output)

Unlike WebChat (which has appendedWebchatAgentMedia guard at chat.ts:2512) or the streaming path (which deduplicates via blockReplyPipeline), Telegram's non-streaming deliver callback had no mechanism to detect that the same media was already delivered.
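
The duplication described above can be reproduced with a toy model of a deliver callback that keeps no record of what it has already sent. This is only an illustration of the failure mode: `createNaiveDeliver`, `DeliverPayload`, and `DeliveryKind` are hypothetical names, not the types used in bot-message-dispatch.ts.

```typescript
type DeliveryKind = "block" | "final";

// Assumed payload shape for illustration.
interface DeliverPayload {
  text?: string;
  mediaUrls?: string[];
}

// Naive deliver callback: it sends every media URL it is handed and
// records a per-URL send count, with no memory across block and final.
function createNaiveDeliver(
  sendCounts: Map<string, number>,
): (payload: DeliverPayload, kind: DeliveryKind) => void {
  return (payload: DeliverPayload, _kind: DeliveryKind): void => {
    for (const url of payload.mediaUrls ?? []) {
      sendCounts.set(url, (sendCounts.get(url) ?? 0) + 1);
    }
  };
}
```

Calling this once with a block payload and once with a final payload that both list `/path/to/voice.ogg` leaves the count for that URL at 2, which is the double delivery reported in the issue.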

Changes

bot-message-dispatch.ts: Add sentBlockMediaUrls tracking set inside runDispatch. In the deliver callback, record block-reply media URLs and call deduplicateBlockSentMedia for final replies. All downstream usage switches from payload to effectivePayload.
bot-message-dispatch.media-dedup.ts (new): Pure function deduplicateBlockSentMedia — returns deduplicated payload, or undefined to skip entirely.
bot-message-dispatch.media-dedup.test.ts (new): 7 test cases covering no-media, no-overlap, partial overlap, full overlap with/without text.
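
The PR text gives only the name and contract of `deduplicateBlockSentMedia` (return the deduplicated payload, or undefined to skip entirely). The following is one possible shape for such a pure function, written from that description alone. The `FinalPayload` type, the parameter names, and the handling of a full overlap with remaining text are assumptions and may differ from the actual 22-line implementation.

```typescript
// Assumed payload shape; the real type in the Telegram extension may differ.
interface FinalPayload {
  text?: string;
  mediaUrls?: string[];
}

// Sketch: drop media URLs that a block reply already delivered.
// Returns the original payload when nothing overlaps, a trimmed copy on
// partial overlap, and undefined when nothing is left to deliver.
function deduplicateBlockSentMedia(
  payload: FinalPayload,
  sentBlockMediaUrls: ReadonlySet<string>,
): FinalPayload | undefined {
  const mediaUrls = payload.mediaUrls ?? [];
  if (mediaUrls.length === 0 || sentBlockMediaUrls.size === 0) {
    return payload; // no media, or nothing was sent via block replies
  }
  const remaining = mediaUrls.filter((url) => !sentBlockMediaUrls.has(url));
  if (remaining.length === mediaUrls.length) {
    return payload; // no overlap
  }
  if (remaining.length === 0 && !payload.text?.trim()) {
    return undefined; // full overlap and no text: skip the final reply
  }
  return { ...payload, mediaUrls: remaining };
}
```

Keeping this logic free of Telegram or dispatcher state is what makes the listed cases (no media, no overlap, partial overlap, full overlap with and without text) checkable without a bot fixture.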

Test plan

Unit tests pass: pnpm test extensions/telegram/src/bot-message-dispatch.media-dedup.test.ts — 7/7
Format check passes: pnpm exec oxfmt --check on all 3 files
Existing Telegram tests: pnpm test extensions/telegram (requires full test environment)
Manual: configure Telegram bot with streaming: "off", trigger media generation, verify each attachment delivered once
Manual: verify streaming mode is unaffected (uses separate blockReplyPipeline dedup path)

Changed files

extensions/telegram/src/bot-message-dispatch.media-dedup.test.ts (added, +49/-0)
extensions/telegram/src/bot-message-dispatch.media-dedup.ts (added, +22/-0)
extensions/telegram/src/bot-message-dispatch.ts (modified, +40/-11)

Issue #78372: [Bug]: MEDIA directive delivers attachments twice on Telegram (non-streaming)

Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Describe the bug

When an agent outputs MEDIA:<path> tags for generated images/voice messages, Telegram receives each attachment twice, while the Web UI correctly shows only one delivery.

This affects non-streaming Telegram bot accounts (streaming mode is off for the affected agent).

To Reproduce

Configure an agent with streaming: "off" on Telegram channel
Use a plugin tool that generates media (e.g., TTS audio, generated images)
Agent outputs MEDIA:<audioPath> and/or MEDIA:<imagePath>
Observe Telegram delivers each attachment twice

Expected behavior

Each MEDIA: tag delivers the attachment exactly once, matching the Web UI behavior.

Diagnostic evidence

Trajectory confirms model outputs MEDIA once: assistantTexts: ["MEDIA:/path/to/voice.ogg\nMEDIA:/path/to/image.jpg"]
didSendViaMessagingTool: false, messagingToolSentTexts: [] — no duplicate from message tool
Plugin's built-in delivery has RUNTIME_CAPABILITY_MISSING, falls back to MEDIA tag
Web UI shows single delivery; only Telegram shows duplicates
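
The first diagnostic point can be checked mechanically by counting how often each path appears in the assistant text. The sketch below assumes a simple line-based reading of the `MEDIA:` directive, one per line, as in the quoted `assistantTexts` entry. OpenClaw's real directive parser is not shown in the issue, and `extractMediaPaths` is a hypothetical helper.

```typescript
// Hypothetical helper: collect the path from every line that starts with
// "MEDIA:". This assumes one directive per line, as in the quoted output.
function extractMediaPaths(assistantText: string): string[] {
  const prefix = "MEDIA:";
  const paths: string[] = [];
  for (const line of assistantText.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith(prefix)) {
      const path = trimmed.slice(prefix.length).trim();
      if (path.length > 0) {
        paths.push(path);
      }
    }
  }
  return paths;
}
```

Applied to the string quoted in the trajectory, this yields two distinct paths with no repeats, which is consistent with the claim that the model emitted each directive once and the duplication happened later in delivery.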

Environment

OpenClaw: 2026.5.4 (325df3e)
OS: macOS 26.2 (arm64)
Node: v22.22.1
Channel: Telegram (bot token, polling mode)
Streaming: off (per-agent override)

Version regression

This issue was not present in v2026.4.27. It was likely introduced in v2026.4.29 (major agent reply runtime restructuring), potentially worsened by v2026.5.2 ("tighten Telegram delivery/recovery behavior").

v2026.5.4's fix "Agents/media: avoid sending generated attachments twice when streamed reply text arrives before final MEDIA directive" only covers streaming scenarios and does not resolve this non-streaming case.

Related issues

#70085 — same root cause pattern on WhatsApp: "duplicate delivery from separate reply-media normalizer instances/caches between block delivery and final reply delivery"
#68475 — Discord MEDIA duplicate delivery

Steps to reproduce

Configure an agent with streaming: "off" on Telegram channel
Use a plugin tool that generates media (e.g., TTS audio, generated images)
Agent outputs MEDIA:<audioPath> and/or MEDIA:<imagePath>
Observe Telegram delivers each attachment twice

Expected behavior

Each MEDIA: tag should result in exactly one attachment delivered to the Telegram chat:

• MEDIA:/path/to/image.jpg → Telegram receives the image once
• MEDIA:/path/to/voice.ogg → Telegram receives the voice once
• No duplicate file deliveries in chat history
• Behavior identical to Web UI, which delivers attachments only once

Actual behavior

Each MEDIA: tag results in attachments being delivered twice on Telegram:

• MEDIA:/path/to/image.jpg → Telegram receives the image twice (two photo messages appear in chat)
• MEDIA:/path/to/voice.ogg → Telegram receives the voice twice (two voice messages appear in chat)
• This happens for both image and audio media types, not limited to one format
• The issue is consistent across multiple requests: every media generation triggers double delivery
• Plugin's built-in delivery mechanism shows RUNTIME_CAPABILITY_MISSING and correctly falls back to MEDIA: tag, ruling out the plugin itself as the source of duplication
• Web UI shows only one delivery per attachment, confirming the problem is Telegram-channel-specific

OpenClaw version

2026.5.4 (325df3e)

Operating system

macOS 26.2

Install method

No response

Model

DeepSeek V4 Flash

Provider / routing chain

openclaw -> DeepSeek

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response
