openclaw - ✅ (Solved) Fix [Bug]: TTS directives (audio-as-voice tag + [[tts:text]]) silently dropped on claude-cli backend; no voice delivered [1 pull request, 1 comment, 2 participants]

GitHub stats

openclaw/openclaw#73758 (fetched 2026-04-29 06:15:29)

  • Comments: 1
  • Participants: 2
  • Timeline events: 4 (labeled ×2, commented ×1, cross-referenced ×1)
  • Reactions: 0

On openclaw 2026.4.26 with the claude-cli backend, model output containing the audio-as-voice tag and [[tts:text]]…[[/tts:text]] is parsed but never synthesized, so Telegram receives no voice message and the reply is dropped.

Error Message

No user-visible error is shown. The only trace is an internal log line: "telegram reply has audioAsVoice without media/text; skipping".

Root Cause

Likely root cause (hypothesis, not confirmed): the [[tts:text]] auto-synthesis pipeline is wired through dist/dispatch-acp-tts.runtime.js and is reachable from the ACP / qqbot dispatchers, but the claude-cli adapter does not invoke it before delivery. As a result, audioAsVoice arrives at the Telegram sender with no media attached and the reply is dropped at delivery-BmuKO0Rm.js:577.

  • Last known good route: openclaw -> openai-codex. Earlier sessions on 2026-04-26, routed through the ACP TTS dispatch path, produced ElevenLabs voice notes from the same markup ([[tts:text]] plus the audio-as-voice tag).
  • First known bad route: openclaw -> claude-cli (claude-cli/claude-opus-4-7) on build 2026.4.26 (gitSha 3e94bd9).
  • Last known good build for claude-cli specifically: NOT_ENOUGH_INFO. The TTS path on this backend has not been observed working in the available evidence, so no prior working build can be identified.

Fix Action

Fix / Workaround

  • Workaround: switch the active model away from claude-cli/* (e.g. openclaw models set openai-codex/gpt-5.5) before relying on inline TTS markup.
  • Direct ElevenLabs synthesis via openclaw tts convert ... (or curl) is unaffected.
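The hypothesis above amounts to a missing synthesis step between directive parsing and delivery. A minimal TypeScript sketch of such a step follows. The names `ParsedReply`, `Synthesize`, and `attachVoiceMedia` are hypothetical and are not taken from the openclaw codebase; only `audioAsVoice` and the idea of captured `ttsText` come from the report.

```typescript
// Hypothetical reply shape for illustration; not openclaw's real type.
interface ParsedReply {
  text?: string;
  audioAsVoice?: boolean;
  ttsText?: string; // captured from [[tts:text]]...[[/tts:text]]
  mediaPath?: string; // path of synthesized audio, once available
}

// Stand-in for a TTS provider call. Synchronous here for brevity;
// a real provider request would be asynchronous.
type Synthesize = (text: string) => string;

// Attach synthesized audio before delivery when tagged text is present.
function attachVoiceMedia(reply: ParsedReply, synthesize: Synthesize): ParsedReply {
  if (!reply.ttsText || reply.mediaPath) {
    return reply; // nothing to synthesize, or media is already attached
  }
  return { ...reply, mediaPath: synthesize(reply.ttsText) };
}
```

If an adapter ran a step like this before handing the reply to the sender, a reply flagged with audioAsVoice would carry media by the time it reached delivery. Whether the claude-cli adapter actually lacks such a step is the unconfirmed part.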

PR fix notes

PR #73911: fix(tts): honor short explicit tagged speech text

Description (problem / solution / changelog)

Summary

  • honor explicitly tagged [[tts:text]]...[[/tts:text]] content even when it is short
  • keep the existing short-text guard for untagged auto-TTS replies
  • add regression coverage for both tagged and untagged short text
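The behavior described in the PR summary can be sketched as a small guard function. This is an illustration under assumptions, not the code in extensions/speech-core/src/tts.ts: the function name `shouldSynthesize`, the `TtsCandidate` shape, and the threshold value are all hypothetical.

```typescript
// Hypothetical threshold; the real value is not shown in this report.
const MIN_AUTO_TTS_CHARS = 10;

// Hypothetical input shape for the sketch.
interface TtsCandidate {
  text: string;
  explicitlyTagged: boolean; // true when text came from [[tts:text]]...[[/tts:text]]
}

function shouldSynthesize(candidate: TtsCandidate): boolean {
  const trimmed = candidate.text.trim();
  if (trimmed.length === 0) {
    return false; // nothing to speak
  }
  if (candidate.explicitlyTagged) {
    return true; // explicitly tagged text bypasses the short-text guard
  }
  return trimmed.length >= MIN_AUTO_TTS_CHARS; // untagged auto-TTS keeps the guard
}
```

Under this sketch, a tagged "hello" is synthesized while an untagged "hello" is still filtered out, which matches the two regression cases the PR summary lists.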

Closes #73758

Testing

  • pnpm test extensions/speech-core/src/tts.test.ts
  • pnpm check:changed

Changed files

  • extensions/speech-core/src/tts.test.ts (modified, +63/-0)
  • extensions/speech-core/src/tts.ts (modified, +7/-3)


Bug type

Regression (worked before, now fails)

Beta release blocker

No


Steps to reproduce

  1. Configure messages.tts in ~/.openclaw/openclaw.json with provider=elevenlabs, auto="tagged", a valid ElevenLabs apiKey + voiceId, modelId="eleven_multilingual_v2".
  2. Set the active model to claude-cli/claude-opus-4-7 (openclaw models set claude-cli/claude-opus-4-7).
  3. From a Telegram chat, send any prompt that causes the assistant to reply with:

     [[tts:text]]hello[[/tts:text]]

  4. Observe that no voice message is delivered.
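To make the markup in step 3 concrete, here is a minimal TypeScript sketch of how a [[tts:text]] block could be separated from the visible reply. It is not the parser in directives-CUHtjAI4.js; the function name `parseTtsDirective` and the `ParsedDirectives` shape are hypothetical, and only the first tagged block is handled.

```typescript
// Hypothetical result shape for the sketch.
interface ParsedDirectives {
  visibleText: string; // reply text with the tagged span removed
  ttsText?: string; // inner content of the first [[tts:text]] block, if any
}

// Non-greedy match so the first closing tag ends the block.
const TTS_BLOCK = /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/;

function parseTtsDirective(reply: string): ParsedDirectives {
  const match = TTS_BLOCK.exec(reply);
  if (!match) {
    return { visibleText: reply.trim() }; // no complete tagged block
  }
  const before = reply.slice(0, match.index);
  const after = reply.slice(match.index + match[0].length);
  return { visibleText: (before + after).trim(), ttsText: match[1].trim() };
}
```

For the reply in step 3 this yields an empty visible text and a ttsText of "hello", so delivery has nothing to send unless synthesis attaches audio.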

Expected behavior

Earlier sessions in this same workspace (2026-04-26, when traffic was routed through the ACP TTS dispatch path) delivered voice messages from identical markup ([[tts:text]] plus the audio-as-voice tag). The same markup should produce a synthesized ElevenLabs voice note when emitted on the claude-cli backend.

Actual behavior

  • Directive parser (dist/parse-DbkqxPau.js, dist/directives-CUHtjAI4.js) recognizes both the audio-as-voice tag (sets audioAsVoice=true) and [[tts:text]]…[[/tts:text]] (captured into overrides.ttsText).
  • No tts.* trace events appear in the session trajectory (.openclaw/agents/main/sessions/<sid>.trajectory.jsonl) — TTS synthesis is never invoked.
  • Telegram delivery logs the skip path from dist/extensions/telegram/delivery-BmuKO0Rm.js:577 "telegram reply has audioAsVoice without media/text; skipping".
  • Direct curl to ElevenLabs with the same API key + voiceId returns HTTP 200 and a valid MP3, so the credentials and provider path are healthy.
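The trajectory check in the second bullet can be reproduced with a small scan over the JSONL content. The sketch below assumes each line is a JSON object with a string `type` field, which the report implies but does not show; the function name `countTtsEvents` is hypothetical.

```typescript
// Count trace events whose type mentions "tts" in JSONL text.
// Assumes one JSON object per line with a string `type` field.
function countTtsEvents(jsonl: string): number {
  let count = 0;
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") {
      continue; // skip blank lines
    }
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof event === "object" && event !== null) {
      const type = (event as { type?: unknown }).type;
      if (typeof type === "string" && type.toLowerCase().includes("tts")) {
        count += 1;
      }
    }
  }
  return count;
}
```

A result of 0 for a session whose model output contains [[tts:text]] markup matches the symptom described above: the directive was emitted but synthesis never ran.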

OpenClaw version

2026.4.26 (gitSha 3e94bd9)

Operating system

Ubuntu 24.04

Install method

npm global

Model

claude-opus-4-7

Provider / routing chain

claude-cli/claude-opus-4-7

Additional provider/model setup details

  • Default model is openai-codex/gpt-5.5; this session was switched to claude-cli/claude-opus-4-7 via the runtime model switcher.
  • claude-cli backend uses /home/user/.local/bin/claude with --output-format json and existing --session-id passthrough.
  • TTS config in ~/.openclaw/openclaw.json under messages.tts: provider=elevenlabs, auto="tagged", mode="final", modelId="eleven_multilingual_v2", voiceId=<redacted>, apiKey=<redacted>.
  • A dist/dispatch-acp-tts.runtime.js exists, suggesting TTS auto-synthesis on the [[tts:text]] tag is wired through the ACP dispatcher; no equivalent hookup appears reachable from the claude-cli adapter, which seems to be the root of the dropped synthesis.
  • No issue with the elevenlabs extension itself: direct synthesis (curl to api.elevenlabs.io/v1/text-to-speech/<voiceId>) succeeds.
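For reference, the messages.tts settings listed above can be written out as a typed object with a basic completeness check. This is a sketch only: the nesting follows the "messages.tts" path from the report, while the `TtsSettings` interface, the `missingTtsKeys` helper, and the choice of which keys count as required are assumptions.

```typescript
// Field names follow the messages.tts settings listed in the report.
interface TtsSettings {
  provider: string;
  auto: string;
  mode: string;
  modelId: string;
  voiceId: string;
  apiKey: string;
}

// Assumption: these keys must be non-empty for tagged auto-TTS to work.
const REQUIRED_KEYS: (keyof TtsSettings)[] = ["provider", "auto", "modelId", "voiceId", "apiKey"];

// Mirrors the reported setup; secrets stay redacted as in the report.
const exampleConfig = {
  messages: {
    tts: {
      provider: "elevenlabs",
      auto: "tagged",
      mode: "final",
      modelId: "eleven_multilingual_v2",
      voiceId: "<redacted>",
      apiKey: "<redacted>",
    },
  },
};

// Return the required keys that are missing or empty.
function missingTtsKeys(tts: Partial<TtsSettings>): string[] {
  return REQUIRED_KEYS.filter((key) => !tts[key]);
}
```

A check like this helps rule out configuration as the cause before looking at the dispatch path, which is the same reasoning the report applies with its direct curl test.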

Logs, screenshots, and evidence

Citations from the installed runtime (paths under
plugin-runtime-deps/openclaw-2026.4.26-4eca5026e977/dist/):
- parse-DbkqxPau.js - parseAudioTag() recognizes the audio-as-voice tag and
 returns { audioAsVoice, hadTag }.
- directives-CUHtjAI4.js - recognizes [[tts:text]] / [[/tts:text]] as
 hidden-open / hidden-close, captures inner content into overrides.ttsText.
- extensions/telegram/delivery-BmuKO0Rm.js:374 - wantsVoice =
 reply.audioAsVoice === true.
- extensions/telegram/delivery-BmuKO0Rm.js:576-577 - when audioAsVoice is
 set without media/text it logs:
 "telegram reply has audioAsVoice without media/text; skipping"
 and the reply is dropped.
- Session trajectory for run ad97ce77-575f-42cb-a9e3-ec36f7e9b2c5 contains
 4 occurrences of [[tts:text]] / the audio-as-voice tag in model output but no
 trace events with type containing "tts".

External confirmation (TTS provider is healthy):

$ curl -sS -X POST \
 "https://api.elevenlabs.io/v1/text-to-speech/<voiceId>?output_format=mp3_44100_64" \
 -H "xi-api-key: <redacted>" -H "Content-Type: application/json" \
 -d '{"text":"тест","model_id":"eleven_multilingual_v2"}' \
 -o /tmp/tts-test.mp3 -w "HTTP %{http_code}\n"
HTTP 200
$ file /tmp/tts-test.mp3
/tmp/tts-test.mp3: Audio file with ID3 v2.4.0, MPEG ADTS layer III v1, 64 kbps, 44.1 kHz, mono
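The delivery behavior cited from delivery-BmuKO0Rm.js can be summarized as a decision function. The sketch below is a reconstruction from the two cited facts (the wantsVoice check and the skip log message), not the actual source; `TelegramReply`, `DeliveryDecision`, `chooseDelivery`, and the "empty reply" reason are hypothetical.

```typescript
// Hypothetical reply shape for the sketch.
interface TelegramReply {
  text?: string;
  mediaPath?: string;
  audioAsVoice?: boolean;
}

type DeliveryDecision =
  | { kind: "voice" }
  | { kind: "media" }
  | { kind: "text" }
  | { kind: "skip"; reason: string };

function chooseDelivery(reply: TelegramReply): DeliveryDecision {
  const wantsVoice = reply.audioAsVoice === true; // as cited at line 374
  const hasMedia = typeof reply.mediaPath === "string" && reply.mediaPath.length > 0;
  const hasText = typeof reply.text === "string" && reply.text.trim().length > 0;

  if (!hasMedia && !hasText) {
    // The voice flag alone is not deliverable; this is the reported skip path.
    const reason = wantsVoice
      ? "telegram reply has audioAsVoice without media/text; skipping"
      : "empty reply";
    return { kind: "skip", reason };
  }
  if (hasMedia) {
    return wantsVoice ? { kind: "voice" } : { kind: "media" };
  }
  return { kind: "text" };
}
```

In the reported failure the reply reaches this point with audioAsVoice set and nothing else, so the skip branch is taken and no message is sent.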

Impact and severity

  • Affects: any workspace that routes assistant output through the claude-cli backend and relies on inline TTS markup.
  • Severity: medium — text replies still work, but the voice channel is silently broken with no user-visible error (only an internal "skipping" log line). Users see a plain-text reply where a voice note was expected.
  • Frequency: 100% reproducible while the active model is claude-cli/*.
  • Practical consequence: the auto="tagged" TTS flow is unusable on the claude-cli backend, forcing users to switch to openai-codex / ACP routes to get voice output.


extent analysis

TL;DR

The most likely fix is to modify the claude-cli backend to invoke the TTS synthesis pipeline before delivery, ensuring that audioAsVoice is accompanied by the synthesized media.

Guidance

  • Investigate the dist/dispatch-acp-tts.runtime.js file to understand how the TTS auto-synthesis pipeline is wired through the ACP dispatcher.
  • Check the claude-cli adapter code to see why it does not invoke the TTS synthesis pipeline before delivery.
  • Consider adding a temporary workaround to switch the active model away from claude-cli/* when relying on inline TTS markup.
  • Verify that the elevenlabs extension is properly configured and that direct synthesis via curl or openclaw tts convert works as expected.

Example

No code snippet is provided as the issue requires investigation of the existing codebase.

Notes

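The claude-cli adapter hypothesis is unconfirmed and needs further investigation. PR #73911, which is marked as closing this issue, changes extensions/speech-core/src/tts.ts so that explicitly tagged [[tts:text]] content is honored even when it is short; that is a different mechanism from the adapter hypothesis.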

Recommendation

Until a fix is confirmed, switch the active model away from claude-cli/* when relying on inline TTS markup; the openai-codex / ACP route produced voice notes in earlier sessions.
