openclaw - [Bug]: TTS directives (audio-as-voice tag + [[tts:text]]) silently dropped on claude-cli backend; no voice delivered [1 pull request, 1 comment, 2 participants]

openclaw · 2026-04-28 19:13:26

GitHub stats

openclaw/openclaw#73758 • Fetched 2026-04-29 06:15:29

Author

asidko

Participants

asidko

clawsweeper[bot]

Timeline (top)

labeled ×2, commented ×1, cross-referenced ×1

On openclaw 2026.4.26 with the claude-cli backend, model output containing the audio-as-voice tag and [[tts:text]]…[[/tts:text]] is parsed but never synthesized, so Telegram receives no voice message and the reply is dropped.

Error Message

No user-visible error is raised; the voice channel fails silently. The only trace is an internal log line: "telegram reply has audioAsVoice without media/text; skipping".

Root Cause

Last known good route: openclaw -> openai-codex. Earlier sessions on 2026-04-26 that were routed through the ACP TTS dispatch path also produced ElevenLabs voice notes from the same [[tts:text]] and audio-as-voice markup.
First known bad route: openclaw -> claude-cli (claude-cli/claude-opus-4-7) on build 2026.4.26 (gitSha 3e94bd9).
Last known good build for claude-cli specifically: NOT_ENOUGH_INFO. The TTS path on this backend has not been observed working in the available evidence, so no prior working build can be identified.
Workaround: switch the active model away from claude-cli/* (e.g. openclaw models set openai-codex/gpt-5.5) before relying on inline TTS markup. Direct ElevenLabs synthesis via openclaw tts convert ... (or curl) is unaffected.
Likely root cause hypothesis (not confirmed): the [[tts:text]] auto-synthesis pipeline is wired through dist/dispatch-acp-tts.runtime.js and is reachable from the ACP / qqbot dispatchers, but the claude-cli adapter does not invoke it before delivery. As a result, audioAsVoice arrives at the Telegram sender with no media attached and the reply is dropped at delivery-BmuKO0Rm.js:577.

Fix Action

Fix / Workaround

Earlier sessions in this same workspace (2026-04-26, when traffic was routed through the ACP TTS dispatch path) delivered voice messages from identical [[tts:text]] and audio-as-voice markup. The same markup should produce a synthesized ElevenLabs voice note when emitted on the claude-cli backend.

PR fix notes

PR #73911: fix(tts): honor short explicit tagged speech text

Repository: openclaw/openclaw
Author: yfge
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/73911

Description (problem / solution / changelog)

Summary

honor explicitly tagged [[tts:text]]...[[/tts:text]] content even when it is short
keep the existing short-text guard for untagged auto-TTS replies
add regression coverage for both tagged and untagged short text
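
The behavior listed in this summary can be pictured with a minimal sketch. The names `MIN_AUTO_TTS_CHARS`, `SpeechCandidate`, and `shouldSynthesize`, as well as the cutoff value, are assumptions made for illustration; they are not taken from extensions/speech-core/src/tts.ts.

```typescript
// Hypothetical sketch of the guard described in the PR summary.
// Names and the threshold are illustrative assumptions, not the real tts.ts code.
const MIN_AUTO_TTS_CHARS = 10; // assumed cutoff for untagged auto-TTS replies

interface SpeechCandidate {
  text: string;
  // True when the text came from an explicit [[tts:text]]...[[/tts:text]] block.
  explicitlyTagged: boolean;
}

function shouldSynthesize(candidate: SpeechCandidate): boolean {
  const trimmed = candidate.text.trim();
  if (trimmed.length === 0) return false; // nothing to speak in either case
  if (candidate.explicitlyTagged) return true; // honor short tagged text
  return trimmed.length >= MIN_AUTO_TTS_CHARS; // keep the guard for untagged replies
}
```

Under these assumptions a tagged "hello" is synthesized, while an untagged "hello" is still filtered out by the length guard.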

Closes #73758

Testing

pnpm test extensions/speech-core/src/tts.test.ts
pnpm check:changed

Changed files

extensions/speech-core/src/tts.test.ts (modified, +63/-0)
extensions/speech-core/src/tts.ts (modified, +7/-3)

Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Steps to reproduce

1. Configure messages.tts in ~/.openclaw/openclaw.json with provider=elevenlabs, auto="tagged", a valid ElevenLabs apiKey + voiceId, modelId="eleven_multilingual_v2".
2. Set the active model to claude-cli/claude-opus-4-7 (openclaw models set claude-cli/claude-opus-4-7).
3. From a Telegram chat, send any prompt that causes the assistant to reply with the audio-as-voice tag plus: [[tts:text]]hello[[/tts:text]]
4. Observe that no voice message is delivered.
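
To make the reply markup in these steps concrete, here is a minimal sketch of how a [[tts:text]] block might be separated from the visible reply. `extractTtsText` and `ExtractedSpeech` are hypothetical names; the real parser lives in the bundled directives module and may behave differently.

```typescript
// Illustrative only: a regex-based stand-in for the real directive parser.
interface ExtractedSpeech {
  ttsText: string | undefined; // inner content of the first [[tts:text]] block
  visibleText: string;         // reply with every [[tts:text]] block removed
}

function extractTtsText(reply: string): ExtractedSpeech {
  // Non-greedy match so each opener pairs with the nearest closer.
  const first = /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/.exec(reply);
  const ttsText = first?.[1]?.trim();
  const visibleText = reply
    .replace(/\[\[tts:text\]\][\s\S]*?\[\[\/tts:text\]\]/g, "")
    .trim();
  return { ttsText, visibleText };
}
```

For the reply used in the reproduction, this yields `ttsText` of "hello" and an empty `visibleText`, which is the situation in which delivery has neither media nor text to send unless synthesis runs first.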

Expected behavior

Actual behavior

Directive parser (dist/parse-DbkqxPau.js, dist/directives-CUHtjAI4.js) recognizes both the audio-as-voice tag (sets audioAsVoice=true) and [[tts:text]]…[[/tts:text]] (captured into overrides.ttsText).
No tts.* trace events appear in the session trajectory (.openclaw/agents/main/sessions/<sid>.trajectory.jsonl) — TTS synthesis is never invoked.
Telegram delivery logs the skip path from dist/extensions/telegram/delivery-BmuKO0Rm.js:577 "telegram reply has audioAsVoice without media/text; skipping".
Direct curl to ElevenLabs with the same API key + voiceId returns HTTP 200 and a valid MP3, so the credentials and provider path are healthy.
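
The trajectory observation above can be checked mechanically. The sketch below assumes each line of the .trajectory.jsonl file is a JSON object with a string `type` field; the actual schema is not shown in the report, so that field name and the `scanTrajectory` helper are assumptions.

```typescript
// Hypothetical diagnostic: counts TTS-related trace events and tagged outputs.
interface TrajectoryScan {
  ttsEvents: number;     // events whose type contains "tts"
  taggedOutputs: number; // raw occurrences of the [[tts:text]] opener
}

function scanTrajectory(jsonl: string): TrajectoryScan {
  let ttsEvents = 0;
  let taggedOutputs = 0;
  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;
    // Count openers on the raw line so nested fields are covered too.
    taggedOutputs += line.split("[[tts:text]]").length - 1;
    try {
      const event: unknown = JSON.parse(line);
      if (typeof event === "object" && event !== null) {
        const type = (event as { type?: unknown }).type;
        if (typeof type === "string" && type.includes("tts")) ttsEvents += 1;
      }
    } catch {
      // Skip lines that are not valid JSON rather than aborting the scan.
    }
  }
  return { ttsEvents, taggedOutputs };
}
```

A result with `taggedOutputs` above zero and `ttsEvents` equal to zero would match the symptom reported here: the markup reached the output, but synthesis was never invoked.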

OpenClaw version

2026.4.26 (gitSha 3e94bd9)

Operating system

Ubuntu 24.04

Install method

npm global

Model

claude-opus-4-7

Provider / routing chain

claude-cli/claude-opus-4-7

Additional provider/model setup details

Default model is openai-codex/gpt-5.5; this session was switched to claude-cli/claude-opus-4-7 via the runtime model switcher.
claude-cli backend uses /home/user/.local/bin/claude with --output-format json and existing --session-id passthrough.
TTS config in ~/.openclaw/openclaw.json under messages.tts: provider=elevenlabs, auto="tagged", mode="final", modelId="eleven_multilingual_v2", voiceId=<redacted>, apiKey=<redacted>.
A dist/dispatch-acp-tts.runtime.js exists, suggesting TTS auto-synthesis on the [[tts:text]] tag is wired through the ACP dispatcher; no equivalent hookup appears reachable from the claude-cli adapter, which seems to be the root of the dropped synthesis.
No issue with the elevenlabs extension itself: direct synthesis (curl to api.elevenlabs.io/v1/text-to-speech/<voiceId>) succeeds.

Logs, screenshots, and evidence

Citations from the installed runtime (paths under
plugin-runtime-deps/openclaw-2026.4.26-4eca5026e977/dist/):
- parse-DbkqxPau.js: parseAudioTag() recognizes the audio-as-voice tag,
 returns { audioAsVoice, hadTag }.
- directives-CUHtjAI4.js: recognizes [[tts:text]] / [[/tts:text]] as
 hidden-open / hidden-close, captures inner content into overrides.ttsText.
- extensions/telegram/delivery-BmuKO0Rm.js:374: wantsVoice =
 reply.audioAsVoice === true.
- extensions/telegram/delivery-BmuKO0Rm.js:576-577: when audioAsVoice is
 set without media/text it logs:
 "telegram reply has audioAsVoice without media/text; skipping"
 and the reply is dropped.
- Session trajectory for run ad97ce77-575f-42cb-a9e3-ec36f7e9b2c5 contains
 4 occurrences of [[tts:text]] / the audio-as-voice tag in model output but no
 trace events with type containing "tts".
External confirmation (TTS provider is healthy):

$ curl -sS -X POST \
 "https://api.elevenlabs.io/v1/text-to-speech/<voiceId>?output_format=mp3_44100_64" \
 -H "xi-api-key: <redacted>" -H "Content-Type: application/json" \
 -d '{"text":"тест","model_id":"eleven_multilingual_v2"}' \
 -o /tmp/tts-test.mp3 -w "HTTP %{http_code}\n"
HTTP 200
$ file /tmp/tts-test.mp3
/tmp/tts-test.mp3: Audio file with ID3 v2.4.0, MPEG ADTS layer III v1, 64 kbps, 44.1 kHz, mono
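
The same provider check can be expressed in TypeScript. This sketch only builds the request that the curl command sends and does not issue it; `buildTtsRequest` and `TtsRequest` are hypothetical names, while the URL, headers, and body fields mirror the curl invocation above.

```typescript
// Builds the ElevenLabs request from the curl example without sending it.
interface TtsRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildTtsRequest(voiceId: string, apiKey: string, text: string): TtsRequest {
  const url =
    `https://api.elevenlabs.io/v1/text-to-speech/${encodeURIComponent(voiceId)}` +
    "?output_format=mp3_44100_64";
  return {
    url,
    init: {
      method: "POST",
      headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
      // Same JSON payload shape as the -d argument in the curl command.
      body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
    },
  };
}
```

Passing `url` and `init` to `fetch` and checking for a 200 status with an audio body would correspond to the curl result shown here.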

Impact and severity

Affects: any workspace that routes assistant output through the claude-cli backend and relies on inline TTS markup.
Severity: medium — text replies still work, but the voice channel is silently broken with no user-visible error (only an internal "skipping" log line). Users see a plain-text reply where a voice note was expected.
Frequency: 100% reproducible while the active model is claude-cli/*.
Practical consequence: the auto="tagged" TTS flow is unusable on the claude-cli backend, forcing users to switch to openai-codex / ACP routes to get voice output.

Additional information

extent analysis

TL;DR

The most likely fix is to modify the claude-cli backend to invoke the TTS synthesis pipeline before delivery, ensuring that audioAsVoice is accompanied by the synthesized media.

Guidance

Investigate the dist/dispatch-acp-tts.runtime.js file to understand how the TTS auto-synthesis pipeline is wired through the ACP dispatcher.
Check the claude-cli adapter code to see why it does not invoke the TTS synthesis pipeline before delivery.
Consider adding a temporary workaround to switch the active model away from claude-cli/* when relying on inline TTS markup.
Verify that the elevenlabs extension is properly configured and that direct synthesis via curl or openclaw tts convert works as expected.

Example

No code snippet is provided as the issue requires investigation of the existing codebase.
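
That said, the shape of the hypothesized fix can still be sketched without the adapter source. Everything below (`Reply`, `ensureVoiceMedia`, `wouldBeSkipped`, the `mediaUrl` field, and the `synthesize` callback) is hypothetical and only mirrors the field names cited in the evidence (audioAsVoice, ttsText); it is not the claude-cli adapter's or the Telegram extension's actual code.

```typescript
// Hypothetical reply shape; only audioAsVoice and ttsText are named in the report.
interface Reply {
  text?: string;
  audioAsVoice?: boolean;
  ttsText?: string;  // captured [[tts:text]] content
  mediaUrl?: string; // assumed field for attached audio
}

// Assumed synthesis callback: resolves to a media URL or file path.
type Synthesize = (text: string) => Promise<string>;

// Pre-delivery step the report suggests is missing on the claude-cli route.
async function ensureVoiceMedia(reply: Reply, synthesize: Synthesize): Promise<Reply> {
  if (reply.mediaUrl !== undefined) return reply; // media already attached
  const speech = (reply.ttsText ?? "").trim();
  if (speech === "") return reply; // nothing tagged for speech
  const mediaUrl = await synthesize(speech);
  return { ...reply, mediaUrl };
}

// Mirrors the skip condition quoted from the Telegram delivery log message.
function wouldBeSkipped(reply: Reply): boolean {
  const hasText = (reply.text ?? "").trim() !== "";
  return reply.audioAsVoice === true && reply.mediaUrl === undefined && !hasText;
}
```

In this sketch, a reply with audioAsVoice set and only ttsText present is skipped by delivery unless `ensureVoiceMedia` runs first, which is the gap the hypothesis points at.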

Notes

The root cause of the issue is likely related to the claude-cli adapter not invoking the TTS synthesis pipeline, but further investigation is needed to confirm this hypothesis.

Recommendation

Apply a workaround by switching the active model away from claude-cli/* when relying on inline TTS markup, as this has been confirmed to work in the past.
