openclaw - 💡(How to fix) Fix Auto-TTS: add voice-only delivery mode + improve system prompt for natural speech output [1 participants]

openclaw · 2026-04-28 03:24:34


openclaw/openclaw#73210 • Fetched 2026-04-29 06:22:10


Author

altierac

Participants

altierac

Timeline (top)

renamed ×1

Root Cause

When the model uses the tts tool explicitly, it naturally writes conversationally because it knows it's producing speech. But in auto-TTS mode, the model thinks it's writing text and OpenClaw silently converts it — producing robotic, unnatural voice messages.


Feature Request

Problem 1: No voice-only delivery mode

When TTS is enabled (messages.tts.auto: "always", or /tts on), OpenClaw sends both a voice message and the text reply on channels that support voice notes (Telegram, WhatsApp, Matrix, Feishu). There is no way to suppress the text and deliver voice only.

Use cases for voice-only:

Privacy — when using voice messages in public or around other people, visible text on the chat screen exposes the conversation content to bystanders. Voice-only lets you listen with earbuds without people nearby seeing what you're doing.
Cleaner UX — double delivery (voice + text) clutters the chat. Many users who enable TTS prefer voice as the primary medium, not as an add-on to text.
Parity with human behavior — when humans send voice messages on Telegram/WhatsApp, there is no accompanying text transcript. Agent replies should be able to match that pattern.

Note: the explicit tts tool path already handles this correctly since #70092 — audioAsVoice: true suppresses duplicate text. This request is specifically about the auto-TTS path (/tts on).

Problem 2: System prompt hint is too weak for natural speech

When auto-TTS is active, buildTtsSystemPromptHint() injects this into the system prompt:

Voice (TTS) is enabled.
Keep spoken text ≤1500 chars to avoid auto-summary (summary on).
Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.

This tells the model about char limits and directives, but nothing about writing for speech. The model has no idea its text output will be spoken aloud, so it produces text-optimized replies: bullet points, code blocks, cron expressions, URLs, markdown formatting. These sound terrible when read by TTS ("cron thirty five star star star at Europe slash Amsterdam").

Suggested improvement: The system prompt hint should include guidance like:

Voice (TTS) is enabled — your replies will be spoken aloud as voice messages.
Write conversationally for speech: use natural sentences instead of bullet points,
spell out numbers and times ("five thirty" not "5:30"), avoid code blocks, URLs,
and markdown formatting. If technical details are needed, use the message tool
to send them as a separate text follow-up.

This is essentially what issue #22028 ("Spoken rewrite pass for TTS") proposed before it went stale.
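
A minimal sketch of how the stronger hint could be assembled. This is not OpenClaw's actual buildTtsSystemPromptHint(): the function name buildSpokenHintSketch, the TtsHintOptions shape, and the choice to drop the character-budget line when summary is off are all assumptions made for illustration.

```typescript
// Hypothetical sketch; not OpenClaw's actual buildTtsSystemPromptHint().
interface TtsHintOptions {
  autoTts: boolean;        // replies are converted to speech automatically
  summaryEnabled: boolean; // auto-summary is on
  maxChars: number;        // spoken-text budget before auto-summary applies
}

function buildSpokenHintSketch(opts: TtsHintOptions): string {
  const lines: string[] = [];

  if (opts.autoTts) {
    // Tell the model its output will be heard, not read.
    lines.push(
      "Voice (TTS) is enabled: your replies will be spoken aloud as voice messages.",
      "Write conversationally for speech: use natural sentences instead of bullet points,",
      'spell out numbers and times ("five thirty" not "5:30"), avoid code blocks, URLs,',
      "and markdown formatting. If technical details are needed, use the message tool",
      "to send them as a separate text follow-up.",
    );
  } else {
    lines.push("Voice (TTS) is enabled.");
  }

  // Assumption: the budget line is only relevant while auto-summary is on.
  if (opts.summaryEnabled) {
    lines.push(`Keep spoken text ≤${opts.maxChars} chars to avoid auto-summary (summary on).`);
  }
  lines.push("Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.");

  return lines.join("\n");
}

console.log(buildSpokenHintSketch({ autoTts: true, summaryEnabled: true, maxChars: 1500 }));
```

Keeping the existing budget and directive lines intact means the speech guidance is purely additive, so the explicit tts tool path would see the same hint as before.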

Proposed Solution for Problem 1

Add a config option under messages.tts to control text delivery when TTS audio is successfully generated:

{
  messages: {
    tts: {
      auto: "always",
      textDelivery: "suppress",  // "normal" (default) | "suppress" | "caption"
    },
  },
}

"normal" (default): current behavior — text is sent alongside voice (as caption or separate message)
"suppress": when TTS audio succeeds, do not deliver the text reply. If TTS fails, fall back to text.
"caption": send text only as a voice message caption (Telegram supports up to 1024 chars), skip the separate text message for overflow

This could also be exposed via /tts text off as a local preference, similar to /tts summary off.
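
The three modes can be sketched as a small planning function. Everything here is illustrative: planDelivery, DeliveryPlan, and CAPTION_LIMIT are hypothetical names, and the "normal" branch (caption plus overflow as a separate message) is one reading of the current behavior described above, not confirmed OpenClaw logic.

```typescript
// Hypothetical sketch of the proposed textDelivery semantics.
type TextDelivery = "normal" | "suppress" | "caption";

interface DeliveryPlan {
  sendVoice: boolean;
  caption?: string;      // text attached to the voice message
  separateText?: string; // text sent as its own message
}

// Telegram caption limit cited in the proposal (counted here in UTF-16 code units).
const CAPTION_LIMIT = 1024;

function planDelivery(mode: TextDelivery, text: string, ttsSucceeded: boolean): DeliveryPlan {
  // If TTS fails, every mode falls back to plain text.
  if (!ttsSucceeded) {
    return { sendVoice: false, separateText: text };
  }

  switch (mode) {
    case "suppress":
      // Voice only: no caption, no separate text.
      return { sendVoice: true };
    case "caption":
      // Caption only: overflow beyond the limit is not sent separately.
      return { sendVoice: true, caption: text.slice(0, CAPTION_LIMIT) };
    case "normal": {
      // Assumed current behavior: caption first, overflow as a separate message.
      const plan: DeliveryPlan = { sendVoice: true, caption: text.slice(0, CAPTION_LIMIT) };
      const overflow = text.slice(CAPTION_LIMIT);
      if (overflow.length > 0) {
        plan.separateText = overflow;
      }
      return plan;
    }
  }
}

console.log(planDelivery("suppress", "Your meeting is at five thirty.", true));
```

Handling the TTS failure case before the mode switch keeps the fallback guarantee in one place, so no mode can end up delivering nothing.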

Channel Considerations

Telegram: voice messages support captions (up to 1024 chars). "suppress" would skip the caption entirely; "caption" would keep it but not send overflow as a separate message.
WhatsApp: already sends text separately from PTT audio (clients don't render captions on voice notes), so "suppress" would skip that separate text send.
Other channels: where voice notes are not supported and audio goes as file attachment, "suppress" should still send text as context.
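
The channel rule for "suppress" can be sketched as a capability check. The ChannelCaps shape, the capability values, and the shouldSendText helper are assumptions for illustration; fileAttachmentChannel is a hypothetical stand-in for any channel where audio goes out as a file attachment. The TextDelivery type is declared locally so the snippet stands alone.

```typescript
// Hypothetical sketch: which channels still get text under "suppress".
type TextDelivery = "normal" | "suppress" | "caption";

interface ChannelCaps {
  supportsVoiceNotes: boolean;   // audio is delivered as a native voice note
  rendersVoiceCaptions: boolean; // clients show a caption on the voice note
}

// Capability values follow the channel notes above; they are not read from OpenClaw.
const telegram: ChannelCaps = { supportsVoiceNotes: true, rendersVoiceCaptions: true };
const whatsapp: ChannelCaps = { supportsVoiceNotes: true, rendersVoiceCaptions: false };
const fileAttachmentChannel: ChannelCaps = { supportsVoiceNotes: false, rendersVoiceCaptions: false };

// Returns true when some form of text (caption or separate message) should still go out.
function shouldSendText(mode: TextDelivery, caps: ChannelCaps): boolean {
  if (mode !== "suppress") {
    return true; // "normal" and "caption" both deliver text in some form
  }
  // Without voice notes the audio is just an attachment, so text stays as context.
  return !caps.supportsVoiceNotes;
}

console.log(shouldSendText("suppress", telegram));              // false
console.log(shouldSendText("suppress", whatsapp));              // false
console.log(shouldSendText("suppress", fileAttachmentChannel)); // true
```

How "caption" should behave on channels that do not render voice-note captions is left open here, since the proposal does not specify it.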

extent analysis

TL;DR

To get voice-only delivery when TTS is enabled, add a textDelivery config option under messages.tts that controls whether text is delivered once TTS audio has been generated successfully.

Guidance

Add a textDelivery option to the messages.tts config with possible values: "normal", "suppress", or "caption".
Set textDelivery to "suppress" to prevent text from being sent alongside voice messages when TTS audio is successfully generated.
Consider exposing this option via a /tts text off command as a local preference.
Update the system prompt hint to guide the model to write conversationally for speech when auto-TTS is active.

Example

{
  messages: {
    tts: {
      auto: "always",
      textDelivery: "suppress"
    },
  },
}

Notes

The textDelivery option addresses only voice-only delivery (Problem 1); the system prompt hint (Problem 2) needs a separate change. Implementing textDelivery requires handling channel-specific behavior, such as Telegram's support for voice message captions.

Recommendation

Add the textDelivery config option to control text delivery when TTS audio is successfully generated. It gives a single, flexible switch for voice-only delivery across channels.
