openclaw - ✅(Solved) Fix [Bug]: Agent-sent TTS audio not delivered via webchat — block vs final payload mismatch [2 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#63033 · Fetched 2026-04-09 07:59:17
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): cross-referenced ×2, labeled ×1, referenced ×1

Agent-initiated TTS audio (via the tts tool or [[tts]] tags) is never delivered to the client on webchat. The /tts audio slash command works correctly. The issue is in the reply delivery pipeline, not the TTS provider or audio generation.

Root Cause Analysis

// server.impl — webchat dispatch post-processing
const finalPayloads = deliveredReplies
    .filter((entry) => entry.kind === "final")  // ← blocks excluded
    .map((entry) => entry.payload);
const audioBlocks = buildWebchatAudioContentBlocksFromReplyPayloads(finalPayloads);

There is a synthetic fallback in dispatch that attempts to TTS the accumulated block text after the run completes:

// dispatch — after embedded run
if (resolveConfiguredTtsMode(cfg) === "final" && replies.length === 0 
    && blockCount > 0 && accumulatedBlockText.trim()) {
    const ttsSyntheticReply = await maybeApplyTtsToReplyPayload({
        payload: { text: accumulatedBlockText },
        cfg, channel: ttsChannel, kind: "final", ...
    });

PR fix notes

PR #63064: fix(webchat): deliver agent TTS audio from block payloads #63033

Description (problem / solution / changelog)

Summary

  • Problem: Agent-generated TTS audio could be produced on disk but never delivered to webchat when it arrived via block payloads (pending tool media flush / streaming paths).
  • Why it matters: This fully breaks agent-initiated TTS delivery on webchat, while /tts audio continues to work.
  • What changed:
    • Webchat post-processing now extracts embeddable local audio from all delivered reply payloads (block + final), while still aggregating text from final-only payloads.
    • The final-mode “accumulated block text → synthetic TTS” fallback now triggers based on whether a final reply was actually delivered, not whether the embedded run produced any final replies.
  • What did NOT change (scope boundary):
    • No changes to TTS provider behavior or audio synthesis.
    • No change to text aggregation semantics (still final-only).
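
The split described above (audio from block and final payloads, text from final payloads only) can be sketched as follows. This is a minimal illustration, not the code from the PR: the ReplyPayload and DeliveredReply shapes and both function names are assumptions made for the example.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types may differ.
type ReplyKind = "block" | "final";

interface ReplyPayload {
  text?: string;
  mediaUrl?: string;
  mediaUrls?: string[];
}

interface DeliveredReply {
  kind: ReplyKind;
  payload: ReplyPayload;
}

// Text aggregation stays final-only, as the PR scope states.
function collectFinalText(replies: DeliveredReply[]): string {
  const parts: string[] = [];
  for (const entry of replies) {
    if (entry.kind === "final" && entry.payload.text) {
      parts.push(entry.payload.text);
    }
  }
  return parts.join("\n");
}

// Media candidates are gathered from every delivered payload (block + final).
function collectMediaCandidates(replies: DeliveredReply[]): string[] {
  const candidates: string[] = [];
  for (const entry of replies) {
    if (entry.payload.mediaUrl) candidates.push(entry.payload.mediaUrl);
    for (const url of entry.payload.mediaUrls ?? []) candidates.push(url);
  }
  return candidates;
}
```

With a block payload carrying only a media path and a final payload carrying only text, the first function returns the final text and the second returns the block's media path, which is the case the original final-only filter missed.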

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Fixes #63033

User-visible / Behavior Changes

  • Webchat now delivers agent-generated TTS audio even when the audio reference is emitted via streaming/tool block payloads.
  • Final-mode TTS fallback after streaming is more reliable when no final reply was actually delivered.
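
To make the change in the fallback trigger concrete, the sketch below contrasts the two conditions. It is an assumption-laden illustration: the FallbackInputs shape and all field and function names are hypothetical, and only the "final" mode value and the general shape of the condition come from the issue text.

```typescript
// Hypothetical inputs; names are illustrative, not taken from the codebase.
interface FallbackInputs {
  ttsMode: string;              // configured TTS mode, e.g. "final"
  runFinalReplyCount: number;   // final replies returned by the embedded run
  finalReplyDelivered: boolean; // whether a final reply actually reached the client
  blockCount: number;
  accumulatedBlockText: string;
}

// Shared precondition: final mode, and streamed block text exists to speak.
function hasBlockTextToSpeak(s: FallbackInputs): boolean {
  return (
    s.ttsMode === "final" &&
    s.blockCount > 0 &&
    s.accumulatedBlockText.trim().length > 0
  );
}

// Before: the fallback fired only when the run returned no final replies.
function shouldFallbackBefore(s: FallbackInputs): boolean {
  return hasBlockTextToSpeak(s) && s.runFinalReplyCount === 0;
}

// After (as described in the PR summary): it depends on actual delivery.
function shouldFallbackAfter(s: FallbackInputs): boolean {
  return hasBlockTextToSpeak(s) && !s.finalReplyDelivered;
}
```

The two predicates differ exactly when the run produced final replies but none of them was delivered, which is the scenario the new unit test in this PR is said to cover.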

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Debian/Linux
  • Runtime/container: local repo test run
  • Integration/channel: webchat (gateway post-processing)

Steps

  1. Have an embedded run emit TTS audio as a pending tool media flush (block-kind payload).
  2. Observe webchat post-processing previously only extracted audio from final-kind payloads.
  3. Confirm audio is now embedded/delivered in the final webchat message.

Expected

  • Audio is delivered to webchat consistently for agent-initiated TTS.

Actual (before)

  • Audio file exists, but webchat never embeds/delivers it.

Evidence

  • Added/updated unit tests covering:
    • mediaUrls array audio embedding for webchat audio blocks
    • Final-mode synthetic TTS triggered after streaming when finals exist but none were delivered

Human Verification (required)

  • Verified scenarios:
    • Targeted tests passing:
      • pnpm test src/auto-reply/reply/dispatch-from-config.test.ts src/gateway/server-methods/chat-webchat-media.test.ts
  • Edge cases checked:
    • Keep text aggregation final-only while extracting audio from block+final payloads
  • What I did not verify:
    • End-to-end web UI playback (relies on existing transcript/audio rendering)

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)

Risks and Mitigations

  • Risk: Duplicated audio embedding if both block and final include the same path.
    • Mitigation: buildWebchatAudioContentBlocksFromReplyPayloads already dedupes resolved paths.
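
The deduplication this mitigation relies on can be pictured with a small sketch. It does not reproduce buildWebchatAudioContentBlocksFromReplyPayloads; normalizePath and dedupeAudioPaths are hypothetical helpers, and the normalization shown (dropping "." and empty segments, resolving "..") is an assumption about what "resolved paths" means here.

```typescript
// Hypothetical normalizer: collapses "//" and "./", resolves "..".
function normalizePath(p: string): string {
  const absolute = p.startsWith("/");
  const segments: string[] = [];
  for (const segment of p.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") {
      segments.pop();
      continue;
    }
    segments.push(segment);
  }
  return (absolute ? "/" : "") + segments.join("/");
}

// Keeps the first occurrence of each normalized path, preserving order.
function dedupeAudioPaths(paths: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const p of paths) {
    const key = normalizePath(p);
    if (seen.has(key)) continue;
    seen.add(key);
    unique.push(key);
  }
  return unique;
}
```

If a block payload and a final payload both reference the same file, only one entry survives, so extracting audio from both kinds should not embed the clip twice.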

Changed files

  • src/auto-reply/reply/dispatch-from-config.test.ts (modified, +29/-0)
  • src/auto-reply/reply/dispatch-from-config.ts (modified, +2/-7)
  • src/gateway/server-methods/chat-webchat-media.test.ts (modified, +11/-0)
  • src/gateway/server-methods/chat.ts (modified, +5/-1)

PR #63514: fix: Fix webchat TTS tool audio delivery

Description (problem / solution / changelog)

Summary

  • Problem: The tts tool generates audio successfully, but webchat never delivers it to the client. The /tts slash command works fine.
  • Why it matters: Agent-driven TTS is completely broken in webchat.
  • What changed: Added an isMediaBearingPayload helper and updated the webchat deliver callback to accept tool results that carry media by promoting them to "final" kind.
  • What did NOT change: Non-media tool results are still dropped. Block and final payloads behave the same. Other channels are not touched.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #63033
  • This PR fixes a bug or regression

Root Cause

  • Root cause: The webchat deliver callback drops all "tool" kind payloads. TTS tool audio arrives as "tool" kind, so it gets silently discarded. The downstream audio extraction only looks at "final" kind, which makes this a double gate.
  • Missing detection / guardrail: No test covers media-bearing tool result delivery in webchat.
  • Contributing context: The /tts slash command bypasses this because it enters the pipeline as "final" kind directly.

Regression Test Plan

  • Coverage level that should have caught this:
    • Seam / integration test
  • Target test or file: src/gateway/server-methods/chat.ts
  • Scenario the test should lock in: Tool result with mediaUrl appears in delivered replies as "final".
  • If no new test is added, why not: Keeping this PR minimal. Test can follow separately.

User-visible / Behavior Changes

TTS tool audio now reaches webchat clients. Previously it was silently dropped.

Diagram

Before:
TTS tool -> deliver(payload, kind:"tool") -> dropped

After:
TTS tool -> deliver(payload, kind:"tool") -> has media? -> promote to "final" -> audio delivered
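
The before/after flow in the diagram can be sketched as a deliver callback. The helper name isMediaBearingPayload comes from the PR description, but its body, the payload shape, and the deliver signature below are assumptions for illustration only.

```typescript
// Hypothetical shapes; the real types in chat.ts may differ.
type ReplyKind = "tool" | "block" | "final";

interface ReplyPayload {
  text?: string;
  mediaUrl?: string;
  mediaUrls?: string[];
}

interface DeliveredReply {
  kind: ReplyKind;
  payload: ReplyPayload;
}

// Assumed check: true when at least one non-empty media reference is present.
function isMediaBearingPayload(payload: ReplyPayload): boolean {
  if (typeof payload.mediaUrl === "string" && payload.mediaUrl.trim() !== "") {
    return true;
  }
  return (payload.mediaUrls ?? []).some((url) => url.trim() !== "");
}

// Sketch of the deliver callback: media-bearing tool results are promoted
// to "final"; other tool results are still dropped.
function deliver(
  delivered: DeliveredReply[],
  payload: ReplyPayload,
  kind: ReplyKind,
): void {
  if (kind === "tool") {
    if (!isMediaBearingPayload(payload)) return;
    delivered.push({ kind: "final", payload });
    return;
  }
  delivered.push({ kind, payload });
}
```

Promoting to "final" rather than widening the downstream filter keeps the audio extraction step unchanged, which appears to be the design choice this PR makes.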

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: Linux
  • Runtime: Node 24
  • Channel: Webchat

Steps

  1. Configure TTS with a working provider
  2. Send a webchat message that makes the agent call the tts tool
  3. Check if audio appears in the response

Expected

Audio delivered to webchat client.

Actual (before fix)

No audio. MP3 exists on disk but never sent.

Evidence

  • Trace/log snippets

Code trace confirmed: deliver callback received kind: "tool" and returned early. After fix, media-bearing tool payloads enter the case "tool" branch and get promoted to "final".

Human Verification

  • Verified: Full dispatch path traced from tool execution to webchat audio extraction. Lint, format, and type checks all pass.
  • Edge cases: Empty mediaUrl, undefined mediaUrls, non-media tool results, BTW reply filtering, text assembly for media-only payloads.
  • Not verified: Live end-to-end with a running gateway.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: Non-audio media tool results also get promoted to "final".
    • Mitigation: The audio extraction only picks up audio files via isAudioFileName. Non-audio media is ignored. No side effect.
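
A minimal sketch of why promoted non-audio media has no side effect: the extraction step keeps only file names that look like audio. The real isAudioFileName is not shown in the source, so the function below is a stand-in, and the extension list is an assumption.

```typescript
// Assumed extension list; the actual set used by isAudioFileName is not given.
const ASSUMED_AUDIO_EXTENSIONS: string[] = [".mp3", ".wav", ".ogg", ".m4a"];

// Stand-in for isAudioFileName: case-insensitive extension match.
function isAudioFileNameSketch(name: string): boolean {
  const lower = name.toLowerCase();
  return ASSUMED_AUDIO_EXTENSIONS.some((ext) => lower.endsWith(ext));
}

// Only audio-looking entries are kept; images and other media are ignored.
function pickAudio(mediaUrls: string[]): string[] {
  return mediaUrls.filter((url) => isAudioFileNameSketch(url));
}
```

Under these assumptions, a promoted tool payload that references an image passes the deliver callback but yields no audio block.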

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +20/-0)
  • src/gateway/server-methods/chat.directive-tags.test.ts (modified, +86/-1)
  • src/gateway/server-methods/chat.ts (modified, +70/-3)


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Agent-initiated TTS audio (via the tts tool or [[tts]] tags) is never delivered to the client on webchat. The /tts audio slash command works correctly. The issue is in the reply delivery pipeline, not the TTS provider or audio generation.

Steps to reproduce

Path 1: tts tool

  1. Agent calls the tts tool with text
  2. textToSpeech() succeeds, audio file is created on disk (confirmed valid MP3)
  3. Agent replies with NO_REPLY (as instructed by tool description)
  4. No audio is delivered to the webchat client

Path 2: [[tts]] tags

  1. Agent includes [[tts]] or [[tts:text]]...[[/tts:text]] tags in a reply
  2. During streaming, raw tags are visible in the UI (not processed)
  3. After the run completes, no audio is delivered
  4. Tags may remain visible as raw text in the final message
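
For readers unfamiliar with the tag forms in Path 2, here is a rough sketch of what processing them could look like. The tag syntax is taken from the steps above; everything else (the function name, the result shape, and the decision to strip tags from the visible text) is an assumption and not OpenClaw's actual parser.

```typescript
interface TtsTagResult {
  visibleText: string;  // text with tags removed
  spokenText: string[]; // contents of [[tts:text]]...[[/tts:text]] spans
  hasBareTag: boolean;  // whether a bare [[tts]] marker was present
}

// Hypothetical extractor for the two tag forms named in the issue.
function extractTtsTags(input: string): TtsTagResult {
  const spokenText: string[] = [];
  const withoutSpans = input.replace(
    /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/g,
    (_match: string, inner: string) => {
      spokenText.push(inner.trim());
      return "";
    },
  );
  const hasBareTag = /\[\[tts\]\]/.test(withoutSpans);
  const visibleText = withoutSpans
    .replace(/\[\[tts\]\]/g, "")
    .replace(/[ \t]{2,}/g, " ")
    .trim();
  return { visibleText, spokenText, hasBareTag };
}
```

In the failing path described here, no step like this runs on streamed block replies, which is why the raw tags stay visible in the UI.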

Working path: /tts audio

  1. User runs /tts audio Hello world
  2. Audio is generated and delivered correctly on both webchat and Telegram

Root Cause Analysis

tts tool path

The tts tool returns audio via details.media.mediaUrl. This is picked up by extractToolResultMediaArtifact() and queued in state.pendingToolMediaUrls via queuePendingToolMedia().

At end-of-run, the embedded handler (pi-embedded) flushes pending tool media as a block reply:

// pi-embedded — handleAgentEnd → flushPendingMediaAndChannel
const pendingToolMediaReply = consumePendingToolMediaReply(ctx.state);
if (pendingToolMediaReply && hasAssistantVisibleReply(pendingToolMediaReply))
    ctx.emitBlockReply(pendingToolMediaReply);  // ← emitted as "block" kind

However, the webchat delivery path only extracts audio content blocks from final-kind payloads:

// server.impl — webchat dispatch post-processing
const finalPayloads = deliveredReplies
    .filter((entry) => entry.kind === "final")  // ← blocks excluded
    .map((entry) => entry.payload);
const audioBlocks = buildWebchatAudioContentBlocksFromReplyPayloads(finalPayloads);

The block payload containing the audio file path is collected by the deliver callback but never processed for audio extraction. The audio file sits on disk, unused.

[[tts]] tag path

The auto-TTS mode defaults to "final", which skips block-kind replies:

// speech-core/runtime-api.js — maybeApplyTtsToPayload
if ((config.mode ?? "final") === "final" && params.kind && params.kind !== "final")
    return nextPayload;  // ← blocks skip TTS processing entirely

For streaming models, agent text is delivered incrementally as block replies — so [[tts]] tags pass through unprocessed during streaming.
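
The gate quoted above can be isolated into a small predicate to show which combinations skip TTS. This mirrors the condition in the snippet; the function name and the "all" mode value used in the usage note are hypothetical.

```typescript
type ReplyKind = "tool" | "block" | "final";

// Mirrors the quoted condition: in "final" mode (the default), any reply
// with a known kind other than "final" bypasses TTS processing.
function skipsTtsProcessing(
  mode: string | undefined,
  kind: ReplyKind | undefined,
): boolean {
  return (mode ?? "final") === "final" && kind !== undefined && kind !== "final";
}
```

With no mode configured, a "block" reply is skipped and a "final" reply is processed. A reply with no kind is also processed, and so is any reply under a non-"final" mode (for example a hypothetical "all").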

There is a synthetic fallback in dispatch that attempts to TTS the accumulated block text after the run completes:

// dispatch — after embedded run
if (resolveConfiguredTtsMode(cfg) === "final" && replies.length === 0 
    && blockCount > 0 && accumulatedBlockText.trim()) {
    const ttsSyntheticReply = await maybeApplyTtsToReplyPayload({
        payload: { text: accumulatedBlockText },
        cfg, channel: ttsChannel, kind: "final", ...
    });

But this only fires when replies.length === 0 (no final replies returned from the embedded run). Whether the embedded run returns final replies depends on the model/provider implementation, making this path unreliable.

Expected behavior

Audio generated by the agent (via tool or tags) should be delivered to the client, matching the behavior of /tts audio.

Actual behavior

No TTS message is delivered (tested channels: webchat, Telegram).

OpenClaw version

2026.3.7/2026.3.8

Operating system

"Debian GNU/Linux 12 (bookworm)"

Install method

docker

Model

claude-opus-4.6/gpt-codex-5.4

Provider / routing chain

openclaw -> telegram/web

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Affected: All TTS users
Severity: complete feature disablement
Frequency: always

Additional information

No response

Extent analysis

TL;DR

The issue can be fixed by modifying the webchat delivery path to extract audio content blocks from both "final" and "block" kind payloads.

Guidance

  • Identify the server.impl file and update the webchat dispatch post-processing function to include "block" kind payloads when extracting audio content blocks.
  • Review the speech-core/runtime-api.js file and consider updating the maybeApplyTtsToPayload function to process block-kind replies when the auto-TTS mode is set to "final".
  • Verify that the dispatch function's synthetic fallback for TTS processing is working correctly and firing when expected.
  • Test the updated code with both the tts tool and [[tts]] tags to ensure audio is delivered correctly to the client.

Example

// server.impl — webchat dispatch post-processing (updated)
const payloads = deliveredReplies
    .filter((entry) => entry.kind === "final" || entry.kind === "block")
    .map((entry) => entry.payload);
const audioBlocks = buildWebchatAudioContentBlocksFromReplyPayloads(payloads);

Notes

The snippets and analysis above locate the fault in how audio content blocks are extracted for webchat delivery. The two linked PRs take different routes: PR #63064 extracts audio from all delivered payloads (block and final) while keeping text aggregation final-only, and PR #63514 promotes media-bearing "tool" payloads to "final" in the webchat deliver callback.

Recommendation

Apply a workaround by updating the server.impl file to include "block" kind payloads when extracting audio content blocks, as this is the most direct way to address the issue. This change should allow audio generated by the agent to be delivered to the client, matching the behavior of /tts audio.
