openclaw - ✅(Solved) Fix [Bug]: Agent-sent TTS audio not delivered via webchat — block vs final payload mismatch [2 pull requests, 1 participants]

Conan-Scott · 2026-04-08T08:30:47Z




openclaw/openclaw#63033 • Fetched 2026-04-09 07:59:17


Agent-initiated TTS audio (via the tts tool or [[tts]] tags) is never delivered to the client on webchat. The /tts audio slash command works correctly. The issue is in the reply delivery pipeline, not the TTS provider or audio generation.

PR fix notes

PR #63064: fix(webchat): deliver agent TTS audio from block payloads #63033

Repository: openclaw/openclaw
Author: jepson-liu
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/63064

Description (problem / solution / changelog)

Summary

Problem: Agent-generated TTS audio could be produced on disk but never delivered to webchat when it arrived via block payloads (pending tool media flush / streaming paths).
Why it matters: This fully breaks agent-initiated TTS delivery on webchat, while /tts audio continues to work.
What changed:
- Webchat post-processing now extracts embeddable local audio from all delivered reply payloads (block + final), while still aggregating text from final-only payloads.
- The final-mode “accumulated block text → synthetic TTS” fallback now triggers based on whether a final reply was actually delivered, not whether the embedded run produced any final replies.
What did NOT change (scope boundary):
- No changes to TTS provider behavior or audio synthesis.
- No change to text aggregation semantics (still final-only).
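
The split described in "What changed" can be sketched roughly as below. This is an illustration under assumptions, not the PR's code: the `DeliveredReply` and `ReplyPayload` shapes and the `splitForWebchat` helper are hypothetical, and only the `block`/`final` kinds and the `mediaUrl`/`mediaUrls` field names are taken from the PR text.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types may differ.
type ReplyKind = "tool" | "block" | "final";

interface ReplyPayload {
  text?: string;
  mediaUrl?: string;
  mediaUrls?: string[];
}

interface DeliveredReply {
  kind: ReplyKind;
  payload: ReplyPayload;
}

// Hypothetical helper: audio candidates come from block + final payloads,
// while text is still aggregated from final payloads only.
function splitForWebchat(delivered: DeliveredReply[]): {
  audioCandidates: ReplyPayload[];
  finalText: string;
} {
  const audioCandidates = delivered
    .filter((entry) => entry.kind === "block" || entry.kind === "final")
    .map((entry) => entry.payload);

  const finalText = delivered
    .filter((entry) => entry.kind === "final")
    .map((entry) => entry.payload.text ?? "")
    .filter((text) => text.trim().length > 0)
    .join("\n\n");

  return { audioCandidates, finalText };
}

const splitResult = splitForWebchat([
  { kind: "block", payload: { text: "partial", mediaUrl: "/tmp/tts/reply.mp3" } },
  { kind: "final", payload: { text: "Done." } },
]);
console.log(splitResult.audioCandidates.length, splitResult.finalText);
```

Keeping the two filters separate means the block payload can contribute its audio reference without its partial text leaking into the final message.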

Change Type (select all)

Bug fix

Scope (select all touched areas)

Gateway / orchestration
Skills / tool execution

Linked Issue/PR

Fixes #63033

User-visible / Behavior Changes

Webchat now delivers agent-generated TTS audio even when the audio reference is emitted via streaming/tool block payloads.
Final-mode TTS fallback after streaming is more reliable when no final reply was actually delivered.

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)

Repro + Verification

Environment

OS: Debian/Linux
Runtime/container: local repo test run
Integration/channel: webchat (gateway post-processing)

Steps

1. Have an embedded run emit TTS audio as a pending tool media flush (block-kind payload).
2. Observe webchat post-processing previously only extracted audio from final-kind payloads.
3. Confirm audio is now embedded/delivered in the final webchat message.

Expected

Audio is delivered to webchat consistently for agent-initiated TTS.

Actual (before)

Audio file exists, but webchat never embeds/delivers it.

Evidence

Added/updated unit tests covering:
- mediaUrls array audio embedding for webchat audio blocks
- Final-mode synthetic TTS triggered after streaming when finals exist but none were delivered

Human Verification (required)

Verified scenarios:
- Targeted tests passing:
  - pnpm test src/auto-reply/reply/dispatch-from-config.test.ts src/gateway/server-methods/chat-webchat-media.test.ts
Edge cases checked:
- Keep text aggregation final-only while extracting audio from block+final payloads
What I did not verify:
- End-to-end web UI playback (relies on existing transcript/audio rendering)

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)

Risks and Mitigations

Risk: Duplicated audio embedding if both block and final include the same path.
- Mitigation: buildWebchatAudioContentBlocksFromReplyPayloads already dedupes resolved paths.
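
The PR does not show how `buildWebchatAudioContentBlocksFromReplyPayloads` dedupes, so the following is only a sketch of the general idea. The `collectUniqueAudioPaths` helper and the `MediaPayload` shape are hypothetical, and the sketch keys on the trimmed string rather than on a resolved filesystem path.

```typescript
// Hypothetical payload shape; only the field names come from the PR text.
interface MediaPayload {
  mediaUrl?: string;
  mediaUrls?: string[];
}

// Hypothetical helper: collect each media path once, in first-seen order.
// Assumption: the real helper keys on resolved paths; this one keys on the
// trimmed string to stay self-contained.
function collectUniqueAudioPaths(payloads: MediaPayload[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const payload of payloads) {
    const candidates = [payload.mediaUrl, ...(payload.mediaUrls ?? [])];
    for (const candidate of candidates) {
      if (!candidate) continue;
      const key = candidate.trim();
      if (key.length === 0 || seen.has(key)) continue;
      seen.add(key);
      unique.push(key);
    }
  }
  return unique;
}

// The same file referenced by a block payload and a final payload is kept once.
const uniquePaths = collectUniqueAudioPaths([
  { mediaUrl: "/tmp/tts/a.mp3" },
  { mediaUrls: ["/tmp/tts/a.mp3", "/tmp/tts/b.mp3"] },
]);
console.log(uniquePaths);
```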

Changed files

src/auto-reply/reply/dispatch-from-config.test.ts (modified, +29/-0)
src/auto-reply/reply/dispatch-from-config.ts (modified, +2/-7)
src/gateway/server-methods/chat-webchat-media.test.ts (modified, +11/-0)
src/gateway/server-methods/chat.ts (modified, +5/-1)

PR #63514: fix: Fix webchat TTS tool audio delivery

Repository: openclaw/openclaw
Author: bittoby
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/63514

Description (problem / solution / changelog)

Summary

Problem: The tts tool generates audio successfully, but webchat never delivers it to the client. The /tts slash command works fine.
Why it matters: Agent-driven TTS is completely broken in webchat.
What changed: Added an isMediaBearingPayload helper and updated the webchat deliver callback to accept tool results that carry media by promoting them to "final" kind.
What did NOT change: Non-media tool results are still dropped. Block and final payloads behave the same. Other channels are not touched.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #63033
This PR fixes a bug or regression

Root Cause

Root cause: The webchat deliver callback drops all "tool" kind payloads. TTS tool audio arrives as "tool" kind, so it gets silently discarded. The downstream audio extraction only looks at "final" kind, which makes this a double gate.
Missing detection / guardrail: No test covers media-bearing tool result delivery in webchat.
Contributing context: The /tts slash command bypasses this because it enters the pipeline as "final" kind directly.

Regression Test Plan

Coverage level that should have caught this:
- Seam / integration test
Target test or file: src/gateway/server-methods/chat.ts
Scenario the test should lock in: Tool result with mediaUrl appears in delivered replies as "final".
If no new test is added, why not: Keeping this PR minimal. Test can follow separately.

User-visible / Behavior Changes

TTS tool audio now reaches webchat clients. Previously it was silently dropped.

Diagram

Before:
TTS tool -> deliver(payload, kind:"tool") -> dropped

After:
TTS tool -> deliver(payload, kind:"tool") -> has media? -> promote to "final" -> audio delivered
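
The "After" flow might look roughly like the sketch below. The PR names `isMediaBearingPayload` but its body is not shown here, so this implementation is an assumption, as are the `ToolPayload` and `DeliveredEntry` shapes and the `createWebchatDeliver` factory.

```typescript
// Hypothetical shapes for illustration only.
type DeliverKind = "tool" | "block" | "final";

interface ToolPayload {
  text?: string;
  mediaUrl?: string;
  mediaUrls?: string[];
}

interface DeliveredEntry {
  kind: DeliverKind;
  payload: ToolPayload;
}

// Assumed implementation: a payload carries media if it has a non-empty
// mediaUrl or at least one non-empty entry in mediaUrls.
function isMediaBearingPayload(payload: ToolPayload): boolean {
  if (typeof payload.mediaUrl === "string" && payload.mediaUrl.trim() !== "") {
    return true;
  }
  return (payload.mediaUrls ?? []).some((url) => url.trim() !== "");
}

// Hypothetical factory for a deliver callback that records what it accepts.
function createWebchatDeliver(sink: DeliveredEntry[]) {
  return (payload: ToolPayload, kind: DeliverKind): void => {
    switch (kind) {
      case "tool":
        // Non-media tool results are still dropped.
        if (!isMediaBearingPayload(payload)) return;
        // Media-bearing tool results are promoted to "final".
        sink.push({ kind: "final", payload });
        return;
      default:
        sink.push({ kind, payload });
    }
  };
}

const deliveredEntries: DeliveredEntry[] = [];
const deliver = createWebchatDeliver(deliveredEntries);
deliver({ text: "search results" }, "tool"); // dropped
deliver({ mediaUrl: "/tmp/tts/voice.mp3" }, "tool"); // promoted to "final"
deliver({ mediaUrl: "" }, "tool"); // dropped: empty mediaUrl
deliver({ text: "Hello" }, "block"); // unchanged
console.log(deliveredEntries.map((entry) => entry.kind));
```

Promoting at the deliver callback leaves the downstream final-only audio extraction untouched, which matches the PR's stated scope boundary.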

Security Impact

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: Linux
Runtime: Node 24
Channel: Webchat

Steps

1. Configure TTS with a working provider
2. Send a webchat message that makes the agent call the tts tool
3. Check if audio appears in the response

Expected

Audio delivered to webchat client.

Actual (before fix)

No audio. MP3 exists on disk but never sent.

Evidence

Trace/log snippets

Code trace confirmed: deliver callback received kind: "tool" and returned early. After fix, media-bearing tool payloads enter the case "tool" branch and get promoted to "final".

Human Verification

Verified: Full dispatch path traced from tool execution to webchat audio extraction. Lint, format, and type checks all pass.
Edge cases: Empty mediaUrl, undefined mediaUrls, non-media tool results, BTW reply filtering, text assembly for media-only payloads.
Not verified: Live end-to-end with a running gateway.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Risks and Mitigations

Risk: Non-audio media tool results also get promoted to "final".
- Mitigation: The audio extraction only picks up audio files via isAudioFileName. Non-audio media is ignored. No side effect.
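
To illustrate why promoted non-audio media has no effect, here is a hedged sketch of an extension-based audio check. It is not the project's `isAudioFileName`: the function name `isAudioFileNameSketch`, the `pickAudioFiles` helper, and the extension list are all assumptions.

```typescript
// Assumed extension list; the real isAudioFileName may accept a different set.
const ASSUMED_AUDIO_EXTENSIONS = new Set([".mp3", ".wav", ".ogg", ".m4a", ".opus"]);

// Hypothetical stand-in for isAudioFileName: compares the lowercased
// extension after the last dot against the assumed list.
function isAudioFileNameSketch(fileName: string): boolean {
  const dot = fileName.lastIndexOf(".");
  if (dot < 0) return false;
  return ASSUMED_AUDIO_EXTENSIONS.has(fileName.slice(dot).toLowerCase());
}

// Hypothetical helper: only audio files survive extraction, so a promoted
// payload that carries an image contributes nothing.
function pickAudioFiles(mediaUrls: string[]): string[] {
  return mediaUrls.filter((url) => isAudioFileNameSketch(url));
}

const pickedAudio = pickAudioFiles(["/tmp/out/chart.png", "/tmp/tts/voice.MP3"]);
console.log(pickedAudio);
```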

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +20/-0)
src/gateway/server-methods/chat.directive-tags.test.ts (modified, +86/-1)
src/gateway/server-methods/chat.ts (modified, +70/-3)


Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Steps to reproduce

Path 1: `tts` tool

Agent calls the tts tool with text
textToSpeech() succeeds, audio file is created on disk (confirmed valid MP3)
Agent replies with NO_REPLY (as instructed by tool description)
No audio is delivered to the webchat client

Path 2: `[[tts]]` tags

Agent includes [[tts]] or [[tts:text]]...[[/tts:text]] tags in a reply
During streaming, raw tags are visible in the UI (not processed)
After the run completes, no audio is delivered
Tags may remain visible as raw text in the final message

Working path: `/tts audio`

User runs /tts audio Hello world
Audio is generated and delivered correctly on both webchat and Telegram

Root Cause Analysis

`tts` tool path

The tts tool returns audio via details.media.mediaUrl. This is picked up by extractToolResultMediaArtifact() and queued in state.pendingToolMediaUrls via queuePendingToolMedia().

At end-of-run, the embedded handler (pi-embedded) flushes pending tool media as a block reply:

// pi-embedded — handleAgentEnd → flushPendingMediaAndChannel
const pendingToolMediaReply = consumePendingToolMediaReply(ctx.state);
if (pendingToolMediaReply && hasAssistantVisibleReply(pendingToolMediaReply))
    ctx.emitBlockReply(pendingToolMediaReply);  // ← emitted as "block" kind
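
The queue-then-flush sequence described above could be modelled as in the sketch below. The function bodies are assumptions written for illustration (hence the `Sketch` suffix); only the names `queuePendingToolMedia`, `consumePendingToolMediaReply`, and `pendingToolMediaUrls` appear in the issue, and the state and reply shapes are hypothetical.

```typescript
// Hypothetical state and reply shapes; the real ones may carry more fields.
interface EmbeddedRunState {
  pendingToolMediaUrls: string[];
}

interface PendingMediaReply {
  mediaUrls: string[];
}

// Assumed behavior: queue a non-empty media URL once.
function queuePendingToolMediaSketch(state: EmbeddedRunState, mediaUrl: string): void {
  const trimmed = mediaUrl.trim();
  if (trimmed !== "" && !state.pendingToolMediaUrls.includes(trimmed)) {
    state.pendingToolMediaUrls.push(trimmed);
  }
}

// Assumed behavior: drain the queue into a single reply, or return undefined.
function consumePendingToolMediaReplySketch(
  state: EmbeddedRunState,
): PendingMediaReply | undefined {
  if (state.pendingToolMediaUrls.length === 0) return undefined;
  // splice(0) empties the queue and returns what was in it.
  return { mediaUrls: state.pendingToolMediaUrls.splice(0) };
}

// End-of-run flush: the pending media leaves as a "block" kind reply,
// which is the kind the final-only webchat extraction does not look at.
const emittedReplies: Array<{ kind: "block"; payload: PendingMediaReply }> = [];
const runState: EmbeddedRunState = { pendingToolMediaUrls: [] };
queuePendingToolMediaSketch(runState, "/tmp/tts/voice.mp3");
const pendingReply = consumePendingToolMediaReplySketch(runState);
if (pendingReply) emittedReplies.push({ kind: "block", payload: pendingReply });
console.log(emittedReplies.length, runState.pendingToolMediaUrls.length);
```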

However, the webchat delivery path only extracts audio content blocks from final-kind payloads:

// server.impl — webchat dispatch post-processing
const finalPayloads = deliveredReplies
    .filter((entry) => entry.kind === "final")  // ← blocks excluded
    .map((entry) => entry.payload);
const audioBlocks = buildWebchatAudioContentBlocksFromReplyPayloads(finalPayloads);

The block payload containing the audio file path is collected by the deliver callback but never processed for audio extraction. The audio file sits on disk, unused.

`[[tts]]` tag path

The auto-TTS mode defaults to "final", which skips block-kind replies:

// speech-core/runtime-api.js — maybeApplyTtsToPayload
if ((config.mode ?? "final") === "final" && params.kind && params.kind !== "final")
    return nextPayload;  // ← blocks skip TTS processing entirely

For streaming models, agent text is delivered incrementally as block replies — so [[tts]] tags pass through unprocessed during streaming.

There is a synthetic fallback in dispatch that attempts to TTS the accumulated block text after the run completes:

// dispatch — after embedded run
if (resolveConfiguredTtsMode(cfg) === "final" && replies.length === 0 
    && blockCount > 0 && accumulatedBlockText.trim()) {
    const ttsSyntheticReply = await maybeApplyTtsToReplyPayload({
        payload: { text: accumulatedBlockText },
        cfg, channel: ttsChannel, kind: "final", ...
    });

But this only fires when replies.length === 0 (no final replies returned from the embedded run). Whether the embedded run returns final replies depends on the model/provider implementation, making this path unreliable.
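
The difference between the current guard and the one PR #63064 describes (trigger on whether a final reply was actually delivered) can be sketched as two predicates. The `FallbackInputs` shape, both function names, and the `deliveredFinalCount` field are hypothetical; only the conditions themselves are taken from the snippet and the PR text.

```typescript
// Hypothetical input bundle for comparing the two guards.
interface FallbackInputs {
  ttsMode: string; // e.g. "final", as in the dispatch snippet
  producedFinalCount: number; // replies.length from the embedded run
  deliveredFinalCount: number; // assumption: finals that reached the channel
  blockCount: number;
  accumulatedBlockText: string;
}

// Guard as quoted in the issue: depends on finals produced by the run.
function shouldSynthesizeByProduced(inputs: FallbackInputs): boolean {
  return (
    inputs.ttsMode === "final" &&
    inputs.producedFinalCount === 0 &&
    inputs.blockCount > 0 &&
    inputs.accumulatedBlockText.trim().length > 0
  );
}

// Guard as described in PR #63064: depends on finals actually delivered.
function shouldSynthesizeByDelivered(inputs: FallbackInputs): boolean {
  return (
    inputs.ttsMode === "final" &&
    inputs.deliveredFinalCount === 0 &&
    inputs.blockCount > 0 &&
    inputs.accumulatedBlockText.trim().length > 0
  );
}

// The case named in the PR's tests: finals exist but none were delivered.
const streamedRun: FallbackInputs = {
  ttsMode: "final",
  producedFinalCount: 1,
  deliveredFinalCount: 0,
  blockCount: 3,
  accumulatedBlockText: "Hello from the agent",
};
console.log(shouldSynthesizeByProduced(streamedRun), shouldSynthesizeByDelivered(streamedRun));
```

With the produced-count guard the synthetic TTS reply is skipped for this run, while the delivered-count guard lets it fire.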

Expected behavior

Audio generated by the agent (via tool or tags) should be delivered to the client, matching the behavior of /tts audio.

Actual behavior

No TTS message is delivered (tested channels: web, telegram)

OpenClaw version

2026.3.7/2026.3.8

Operating system

"Debian GNU/Linux 12 (bookworm)"

Install method

docker

Model

claude-opus-4.6/gpt-codex-5.4

Provider / routing chain

openclaw -> telegram/web

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Affected: All TTS users
Severity: complete feature disablement
Frequency: always

Additional information

No response

extent analysis

TL;DR

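Two fixes were proposed. The unmerged PR #63064 extends the webchat delivery path to extract audio content blocks from both "final" and "block" kind payloads. The merged PR #63514 instead promotes media-bearing "tool" kind payloads to "final" in the webchat deliver callback, so that the final-only audio extraction can pick them up.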

Guidance

Identify the server.impl file and update the webchat dispatch post-processing function to include "block" kind payloads when extracting audio content blocks.
Review the speech-core/runtime-api.js file and consider updating the maybeApplyTtsToPayload function to process block-kind replies when the auto-TTS mode is set to "final".
Verify that the dispatch function's synthetic fallback for TTS processing is working correctly and firing when expected.
Test the updated code with both the tts tool and [[tts]] tags to ensure audio is delivered correctly to the client.

Example

// server.impl — webchat dispatch post-processing (updated)
const payloads = deliveredReplies
    .filter((entry) => entry.kind === "final" || entry.kind === "block")
    .map((entry) => entry.payload);
const audioBlocks = buildWebchatAudioContentBlocksFromReplyPayloads(payloads);

Notes

Extending the extraction filter covers audio that arrives as a "block" payload. According to PR #63514, the webchat deliver callback also drops every "tool" kind payload before extraction runs, so audio that arrives as "tool" kind has to pass that gate as well.

Recommendation

Apply a workaround by updating the server.impl file to include "block" kind payloads when extracting audio content blocks, as this is the most direct way to address the issue. This change should allow audio generated by the agent to be delivered to the client, matching the behavior of /tts audio.
