openclaw - ✅(Solved) Fix [Bug]: MEDIA directive delivers attachments twice on Telegram (non-streaming) [2 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#78372 (fetched 2026-05-07 03:37:40)
Comments: 2 · Participants: 3 · Timeline events: 7 · Reactions: 3
Timeline (top): commented ×2, cross-referenced ×2, labeled ×1, mentioned ×1

Root Cause

  • #70085 — same root cause pattern on WhatsApp: "duplicate delivery from separate reply-media normalizer instances/caches between block delivery and final reply delivery"
  • #68475 — Discord MEDIA duplicate delivery

Fix Action

Fixed

PR fix notes

PR #78355: fix(agents): deliver agent TTS audio when block streaming is off

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.

  • Problem: Agent-generated tts tool audio can be generated successfully but never delivered on non-streaming block-reply channels when the block reply has both text and media. In Telegram this shows up as a “bare TTS” request producing no received voice/media message, even though /tts audio ... works and the speech provider produced an .opus file.
  • Why it matters: This makes the agent tts tool look flaky or provider/channel-specific, but the failing path is actually OpenClaw reply delivery fallback. Users can waste time debugging Fish Audio or Telegram even though media generation and direct Telegram media delivery are healthy.
  • What changed: When block streaming is disabled, send any media-bearing block reply directly, not only media-only block replies. Text-only block replies still accumulate into the final assistant text as before.
  • What did NOT change (scope boundary): This does not change TTS generation, speech providers, Telegram upload logic, /tts command behavior, block streaming behavior, or text-only fallback behavior.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #
  • Related #
  • This PR fixes a bug or regression

Real behavior proof

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count.

  • Behavior or issue addressed: Agent tts tool audio was generated but not delivered as Telegram voice/media for a bare TTS-style agent response when Telegram block streaming was off. /tts audio ... worked because the slash command returns a direct media reply through the command path. Mixed text + TTS scenarios were not the observed failure; the problematic case was the agent/tool block fallback for media-bearing block replies that were not media-only.

  • Real environment tested: OpenClaw 2026.5.5-beta.2 running in a container/pod, Telegram integration, Fish Audio speech provider, non-streaming block reply delivery. Runtime was patched with the same one-line delivery condition change via boot-time loader hook because /app was immutable in the pod.

  • Exact steps or command run after this patch:

    1. Apply equivalent runtime patch changing the non-streaming fallback from blockHasMedia && !blockPayload.text to blockHasMedia.
    2. Restart OpenClaw with the boot-time loader hook active.
    3. Trigger an agent-sent TTS-only response to Telegram.
    4. Confirm Telegram receives the generated voice/media message.
  • Evidence after fix: redacted runtime log and copied live user confirmation from the patched real Telegram setup:

    2026-05-06T17:29:11.453+10:00 [discord] client initialized as 1492070782170431548; awaiting gateway readiness
    [tts-media-patch] WARNING: patch target not found in file:///app/dist/agent-runner.runtime.js
    [tts-media-patch] ✅ text+media non-streaming block fallback patch applied to agent-runner.runtime-BsTUYqAI.js
    2026-05-06T17:30:10.204+10:00 [ws] ⇄ res ✓ health 117ms cached...

    Live user confirmation after the patched agent path loaded lazily:

    a tts only message was received and I saw the patch applied dynamically in the log after it was requested
  • Observed result after fix: The TTS-only agent message was received by Telegram after the delivery fallback patch applied dynamically on first lazy import of the agent runner.

  • What was not tested: Other non-streaming channels besides Telegram; block-streaming-enabled channels; every speech provider. The fix is provider-independent because it only changes reply fallback delivery after media already exists.

  • Before evidence: Before the patch, /tts audio ... worked and Fish Audio produced an .opus file, but a bare agent tts response did not arrive as Telegram voice/media. The investigated generated audio path included /tmp/openclaw/tts-wvm6vt/voice-1778050523564.opus, confirming generation succeeded before delivery failed.
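The runtime patch in step 1 above amounts to a text substitution on the module source as it is loaded. A minimal sketch of that substitution, assuming the compiled bundle still contains the predicate verbatim (minified output may differ). `patchFallbackPredicate` and its return shape are hypothetical names for illustration, not part of OpenClaw or of the hook the author actually used:

```typescript
// Hypothetical helper: rewrites the non-streaming fallback predicate in module source text.
// Assumes the target string appears verbatim in the bundle, which may not hold after minification.
const TARGET = "blockHasMedia && !blockPayload.text";
const REPLACEMENT = "blockHasMedia";

interface PatchResult {
  patched: boolean; // false when the target string was not found
  source: string; // unchanged source when not patched
}

function patchFallbackPredicate(source: string): PatchResult {
  if (!source.includes(TARGET)) {
    return { patched: false, source };
  }
  // split/join replaces every occurrence without regex escaping concerns.
  return { patched: true, source: source.split(TARGET).join(REPLACEMENT) };
}
```

A loader hook built on a helper like this could log a warning whenever `patched` is false, which would resemble the "patch target not found" line in the log above for modules that do not contain the predicate.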

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

  • Root cause: createBlockReplyDeliveryHandler() only direct-sent non-streaming block replies when blockHasMedia && !blockPayload.text. That preserves media-only orphaned tool attachments, but drops the direct-send fallback for tool block replies that contain both text/caption metadata and media. Final assistant text can still be reconstructed later, but the media attachment cannot be reconstructed from final text, so the generated audio is effectively consumed before channel delivery.
  • Missing detection / guardrail: Existing tests covered media-only block fallback and expected captioned media-bearing blocks to remain buffered when block streaming was disabled. There was no regression test asserting that text+media or audio-as-voice block replies must still be delivered directly in the non-streaming fallback path.
  • Contributing context (if known): /tts audio ... takes a different command path and returns a direct media reply, so it continued to work. The failing agent tts path uses tool/media block reply delivery, where TTS-generated audio can carry audioAsVoice and a text/caption-bearing block payload.
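The predicate change described above can be illustrated in isolation. This is a sketch only: the `BlockPayload` shape and the function names are assumptions for illustration, and the real `ReplyPayload` type and `createBlockReplyDeliveryHandler()` in OpenClaw carry more context than shown here.

```typescript
// Illustrative payload shape; the real ReplyPayload type may have more fields.
interface BlockPayload {
  text?: string;
  mediaUrls?: string[];
  audioAsVoice?: boolean;
}

function blockHasMedia(p: BlockPayload): boolean {
  return (p.mediaUrls?.length ?? 0) > 0;
}

// Before the fix: only media-only blocks were sent directly.
function shouldDirectSendBefore(p: BlockPayload): boolean {
  return blockHasMedia(p) && !p.text;
}

// After the fix: any media-bearing block is sent directly.
function shouldDirectSendAfter(p: BlockPayload): boolean {
  return blockHasMedia(p);
}

// A TTS-style block carrying both a caption and audio (path is illustrative).
const ttsBlock: BlockPayload = {
  text: "Here is your audio",
  mediaUrls: ["/tmp/voice.opus"],
  audioAsVoice: true,
};
```

With the old predicate `ttsBlock` is not direct-sent, so its audio has no other route to the channel. With the new predicate it is sent, while text-only blocks are rejected by both versions and keep accumulating into the final text.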

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/auto-reply/reply/reply-delivery.test.ts
  • Scenario the test should lock in: With block streaming disabled, captioned media-bearing block replies and captioned audioAsVoice replies are sent via onBlockReply and tracked in directlySentBlockKeys, while text-only blocks continue to accumulate into final text.
  • Why this is the smallest reliable guardrail: The regression is the fallback branch predicate in createBlockReplyDeliveryHandler(). A focused unit test directly exercises that branch without needing a Telegram or speech-provider fixture.
  • Existing test that already covers this (if any): Media-only non-streaming block replies were already covered; captioned media-bearing replies were covered with the opposite expectation and are updated here.
  • If no new test is added, why not: N/A; new coverage is added.
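The scenario the test should lock in can be sketched against a simplified stand-in for the handler. Everything below is an assumption for illustration: `createNonStreamingHandler` and `blockKey` are hypothetical, and only the names `onBlockReply` and `directlySentBlockKeys` come from the PR text.

```typescript
interface BlockPayload {
  text?: string;
  mediaUrls?: string[];
  audioAsVoice?: boolean;
}

// Hypothetical key derivation; the real handler may key blocks differently.
function blockKey(p: BlockPayload): string {
  return JSON.stringify([p.text ?? "", p.mediaUrls ?? []]);
}

// Simplified stand-in for the non-streaming branch of the block reply handler.
function createNonStreamingHandler(onBlockReply: (p: BlockPayload) => void) {
  const directlySentBlockKeys = new Set<string>();
  const accumulatedText: string[] = [];

  function handle(p: BlockPayload): void {
    const hasMedia = (p.mediaUrls?.length ?? 0) > 0;
    if (hasMedia) {
      // Media-bearing blocks (captioned or not) are sent immediately and tracked.
      onBlockReply(p);
      directlySentBlockKeys.add(blockKey(p));
      return;
    }
    // Text-only blocks are buffered for the final assistant text.
    if (p.text) accumulatedText.push(p.text);
  }

  return { handle, directlySentBlockKeys, accumulatedText };
}
```

A focused unit test would feed this handler a captioned media block, a captioned `audioAsVoice` block, and a text-only block, then assert that the first two reached `onBlockReply` and the third did not.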

User-visible / Behavior Changes

Agent/tool replies that include both text and media now deliver their media on channels where block streaming is disabled. In practice, agent-sent TTS audio can arrive as Telegram voice/media instead of silently disappearing after successful generation.

Diagram (if applicable)

Before:
agent tts tool -> text+audio block reply -> block streaming disabled
  -> predicate requires media AND no text
  -> direct media fallback skipped
  -> final text may remain, generated audio is not delivered

After:
agent tts tool -> text+audio block reply -> block streaming disabled
  -> predicate requires media
  -> direct block reply sends text+audio payload
  -> Telegram receives generated voice/media

Security Impact (required)

  • New permissions/capabilities? (Yes/No): No
  • Secrets/tokens handling changed? (Yes/No): No
  • New/changed network calls? (Yes/No): No
  • Command/tool execution surface changed? (Yes/No): No
  • Data access scope changed? (Yes/No): No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux container/pod
  • Runtime/container: OpenClaw 2026.5.5-beta.2
  • Model/provider: Agent runtime with Fish Audio TTS provider
  • Integration/channel (if any): Telegram; non-streaming block reply delivery
  • Relevant config (redacted): Telegram block streaming disabled/default; TTS enabled; secrets redacted

Steps

  1. Confirm /tts audio hello sends a Telegram voice/media message.
  2. Ask the agent to send a TTS-only/bare spoken response using the tts tool.
  3. Observe that speech generation succeeds and produces a local audio file, but the Telegram voice/media message is not delivered before the fix.
  4. Apply this patch or equivalent runtime monkey patch.
  5. Repeat the TTS-only agent request.

Expected

  • Generated agent tts audio is delivered to Telegram as voice/media when media generation succeeds.

Actual

  • Before fix: /tts audio ... worked, but bare agent tts audio was generated and not delivered as Telegram voice/media.
  • After fix: bare agent tts audio is received by Telegram once the patched agent runner path is loaded.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Targeted local test after source patch:

✓ auto-reply-reply src/auto-reply/reply/reply-delivery.test.ts (11 tests) 17ms
Test Files  1 passed (1)
Tests       11 passed (11)

Runtime patch proof/log is included above. A Telegram screenshot can be added before submission if desired.

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • /tts audio ... worked before this change, proving the speech provider and direct media delivery path were healthy.
    • Runtime-equivalent patch applied dynamically to the lazily loaded agent runner module.
    • A TTS-only agent message was received by Telegram after the patch.
    • Unit tests pass for the updated fallback behavior.
  • Edge cases checked:
    • Text-only non-streaming blocks remain accumulated into final text.
    • Media-only non-streaming block replies remain direct-sent.
    • Captioned audioAsVoice media replies are now direct-sent and dedupe-tracked.
  • What you did not verify:
    • Non-Telegram non-streaming integrations.
    • Block-streaming-enabled channels.
    • Every speech provider.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No): Yes
  • Config/env changes? (Yes/No): No
  • Migration needed? (Yes/No): No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: A text+media block reply could be sent directly and also represented in final assistant text, creating a duplicate text caption on some channels.
    • Mitigation: Direct sends are tracked with directlySentBlockKeys, matching existing media-only fallback behavior. Text-only blocks still use the existing final-text accumulation path.
  • Risk: Some channels may handle captioned audio/media differently than media-only payloads.
    • Mitigation: The patch preserves the existing ReplyPayload shape and only extends the same direct fallback already used for media-only payloads to media-bearing payloads.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/auto-reply/reply/agent-runner.media-paths.test.ts (modified, +2/-2)
  • src/auto-reply/reply/reply-delivery.test.ts (modified, +49/-3)
  • src/auto-reply/reply/reply-delivery.ts (modified, +1/-3)

PR #78420: fix(telegram): deduplicate MEDIA attachments in non-streaming mode

Description (problem / solution / changelog)

Summary

  • Non-streaming Telegram delivers each MEDIA: attachment twice — once from the media-only block reply and once from the final reply
  • Track media URLs sent via block replies in a Set, then filter duplicates from final reply payloads
  • Extract deduplicateBlockSentMedia into a standalone pure function for testability

Closes #78372

Root cause

When streaming: "off", the deliver callback in bot-message-dispatch.ts receives both:

  1. A block reply with mediaUrls populated (sent directly via sendDirectBlockReply in reply-delivery.ts:160, which sends media-only blocks even when streaming is off)
  2. A final reply with the same mediaUrls (the MEDIA: directive text persists in the agent's complete output)

Unlike WebChat (which has an appendedWebchatAgentMedia guard at chat.ts:2512) or the streaming path (which deduplicates via blockReplyPipeline), Telegram's non-streaming deliver callback had no mechanism to detect that the same media was already delivered.
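The double delivery can be reproduced with a deliberately naive stand-in for the deliver callback. This is a sketch under stated assumptions: `createNaiveDeliver`, the `ReplyKind` type, and the payload shape are hypothetical and only mirror the description above, not the actual code in bot-message-dispatch.ts.

```typescript
type ReplyKind = "block" | "final";

interface ReplyPayload {
  text?: string;
  mediaUrls?: string[];
}

// Simplified deliver callback with no dedup: it sends every media URL it is given.
function createNaiveDeliver(sendMedia: (entry: string) => void) {
  return (payload: ReplyPayload, kind: ReplyKind): void => {
    for (const url of payload.mediaUrls ?? []) {
      sendMedia(`${kind}:${url}`);
    }
  };
}

const sends: string[] = [];
const deliver = createNaiveDeliver((entry) => sends.push(entry));

// 1. The media-only block reply is sent directly, even with streaming off.
deliver({ mediaUrls: ["/path/to/voice.ogg"] }, "block");
// 2. The final reply still carries the same media from the MEDIA: directive.
deliver({ text: "MEDIA:/path/to/voice.ogg", mediaUrls: ["/path/to/voice.ogg"] }, "final");

// sends now holds two entries for the same URL, one per reply kind.
```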

Changes

  • bot-message-dispatch.ts: Add sentBlockMediaUrls tracking set inside runDispatch. In the deliver callback, record block-reply media URLs and call deduplicateBlockSentMedia for final replies. All downstream usage switches from payload to effectivePayload.
  • bot-message-dispatch.media-dedup.ts (new): Pure function deduplicateBlockSentMedia — returns deduplicated payload, or undefined to skip entirely.
  • bot-message-dispatch.media-dedup.test.ts (new): 7 test cases covering no-media, no-overlap, partial overlap, full overlap with/without text.
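A minimal sketch of what a pure dedup function with this contract might look like. The name `deduplicateBlockSentMedia` and the "return `undefined` to skip" behavior come from the PR description; the signature, the payload shape, and the handling of text are assumptions reconstructed from the listed test cases, not the PR's actual code.

```typescript
interface ReplyPayload {
  text?: string;
  mediaUrls?: string[];
}

// Assumed signature: takes the final payload and the set of URLs already sent via block replies.
function deduplicateBlockSentMedia(
  payload: ReplyPayload,
  sentBlockMediaUrls: ReadonlySet<string>,
): ReplyPayload | undefined {
  const urls = payload.mediaUrls ?? [];
  // No media, or nothing sent yet: pass the payload through untouched.
  if (urls.length === 0 || sentBlockMediaUrls.size === 0) return payload;

  const remaining = urls.filter((url) => !sentBlockMediaUrls.has(url));
  // No overlap: nothing to strip.
  if (remaining.length === urls.length) return payload;
  // Full overlap and no text left to deliver: skip the final reply entirely.
  if (remaining.length === 0 && !payload.text?.trim()) return undefined;
  // Partial overlap, or full overlap with text: keep text and any unsent media.
  return { ...payload, mediaUrls: remaining };
}
```

Keeping the function pure (no access to the dispatch closure) is what makes the no-media, no-overlap, partial-overlap, and full-overlap cases testable without a Telegram fixture.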

Test plan

  • Unit tests pass: pnpm test extensions/telegram/src/bot-message-dispatch.media-dedup.test.ts — 7/7
  • Format check passes: pnpm exec oxfmt --check on all 3 files
  • Existing Telegram tests: pnpm test extensions/telegram (requires full test environment)
  • Manual: configure Telegram bot with streaming: "off", trigger media generation, verify each attachment delivered once
  • Manual: verify streaming mode is unaffected (uses separate blockReplyPipeline dedup path)

Changed files

  • extensions/telegram/src/bot-message-dispatch.media-dedup.test.ts (added, +49/-0)
  • extensions/telegram/src/bot-message-dispatch.media-dedup.ts (added, +22/-0)
  • extensions/telegram/src/bot-message-dispatch.ts (modified, +40/-11)

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Describe the bug

When an agent outputs MEDIA:<path> tags for generated images/voice messages, Telegram receives each attachment twice, while the Web UI correctly shows only one delivery.

This affects non-streaming Telegram bot accounts (streaming mode is off for the affected agent).

To Reproduce

  1. Configure an agent with streaming: "off" on the Telegram channel
  2. Use a plugin tool that generates media (e.g., TTS audio, generated images)
  3. Agent outputs MEDIA:<audioPath> and/or MEDIA:<imagePath>
  4. Observe Telegram delivers each attachment twice

Expected behavior

Each MEDIA: tag delivers the attachment exactly once, matching the Web UI behavior.

Diagnostic evidence

  • Trajectory confirms model outputs MEDIA once: assistantTexts: ["MEDIA:/path/to/voice.ogg\nMEDIA:/path/to/image.jpg"]
  • didSendViaMessagingTool: false, messagingToolSentTexts: [] — no duplicate from message tool
  • Plugin's built-in delivery has RUNTIME_CAPABILITY_MISSING, falls back to MEDIA tag
  • Web UI shows single delivery; only Telegram shows duplicates
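To make the trajectory evidence concrete, here is a sketch of how MEDIA: directives in assistant text could be split into media paths and remaining text. It assumes one directive per line in the form `MEDIA:<path>`, as in the `assistantTexts` sample above; `extractMediaDirectives` is a hypothetical helper, and OpenClaw's real parser may accept other forms.

```typescript
// Hypothetical helper; assumes one "MEDIA:<path>" directive per line.
function extractMediaDirectives(text: string): { mediaPaths: string[]; remainingText: string } {
  const prefix = "MEDIA:";
  const mediaPaths: string[] = [];
  const keptLines: string[] = [];

  for (const line of text.split("\n")) {
    const trimmed = line.trim();
    if (trimmed.startsWith(prefix) && trimmed.length > prefix.length) {
      // Directive line: record the path and drop the line from the visible text.
      mediaPaths.push(trimmed.slice(prefix.length).trim());
    } else {
      keptLines.push(line);
    }
  }

  return { mediaPaths, remainingText: keptLines.join("\n").trim() };
}
```

Applied to the sample trajectory text, this yields each path exactly once, which is consistent with the report that the model output is not the source of the duplication.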

Environment

  • OpenClaw: 2026.5.4 (325df3e)
  • OS: macOS 26.2 (arm64)
  • Node: v22.22.1
  • Channel: Telegram (bot token, polling mode)
  • Streaming: off (per-agent override)

Version regression

This issue was not present in v2026.4.27. It was likely introduced in v2026.4.29 (major agent reply runtime restructuring), potentially worsened by v2026.5.2 ("tighten Telegram delivery/recovery behavior").

v2026.5.4's fix "Agents/media: avoid sending generated attachments twice when streamed reply text arrives before final MEDIA directive" only covers streaming scenarios and does not resolve this non-streaming case.

Related issues

  • #70085 — same root cause pattern on WhatsApp: "duplicate delivery from separate reply-media normalizer instances/caches between block delivery and final reply delivery"
  • #68475 — Discord MEDIA duplicate delivery

Steps to reproduce

  1. Configure an agent with streaming: "off" on the Telegram channel
  2. Use a plugin tool that generates media (e.g., TTS audio, generated images)
  3. Agent outputs MEDIA:<audioPath> and/or MEDIA:<imagePath>
  4. Observe Telegram delivers each attachment twice

Expected behavior

Each MEDIA: tag should result in exactly one attachment delivered to the Telegram chat:

  • MEDIA:/path/to/image.jpg → Telegram receives the image once
  • MEDIA:/path/to/voice.ogg → Telegram receives the voice once
  • No duplicate file deliveries in chat history
  • Behavior identical to Web UI, which delivers attachments only once

Actual behavior

Each MEDIA: tag results in attachments being delivered twice on Telegram:

  • MEDIA:/path/to/image.jpg → Telegram receives the image twice (two photo messages appear in chat)
  • MEDIA:/path/to/voice.ogg → Telegram receives the voice twice (two voice messages appear in chat)
  • This happens for both image and audio media types, not limited to one format
  • The issue is consistent across multiple requests: every media generation triggers double delivery
  • Plugin's built-in delivery mechanism shows RUNTIME_CAPABILITY_MISSING and correctly falls back to MEDIA: tag, ruling out the plugin itself as the source of duplication
  • Web UI shows only one delivery per attachment, confirming the problem is Telegram-channel-specific

OpenClaw version

2026.5.4 (325df3e)

Operating system

macOS 26.2

Install method

No response

Model

DeepSeek V4 Flash

Provider / routing chain

openclaw -> DeepSeek

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response
