openclaw - 💡(How to fix) Fix Auto-TTS: add voice-only delivery mode + improve system prompt for natural speech output [1 participants]

GitHub stats
openclaw/openclaw#73210 · Fetched 2026-04-29 06:22:10
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): renamed ×1

Root Cause

When the model uses the tts tool explicitly, it naturally writes conversationally because it knows it's producing speech. But in auto-TTS mode, the model thinks it's writing text and OpenClaw silently converts it — producing robotic, unnatural voice messages.

Code Example

Current system prompt hint (auto-TTS):

Voice (TTS) is enabled.
Keep spoken text ≤1500 chars to avoid auto-summary (summary on).
Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.

Suggested system prompt hint:

Voice (TTS) is enabled — your replies will be spoken aloud as voice messages.
Write conversationally for speech: use natural sentences instead of bullet points,
spell out numbers and times ("five thirty" not "5:30"), avoid code blocks, URLs,
and markdown formatting. If technical details are needed, use the message tool
to send them as a separate text follow-up.

Proposed config for voice-only delivery:

{
  messages: {
    tts: {
      auto: "always",
      textDelivery: "suppress",  // "normal" (default) | "suppress" | "caption"
    },
  },
}

Feature Request

Problem 1: No voice-only delivery mode

When TTS is enabled (messages.tts.auto: "always" or /tts on), OpenClaw sends both a voice message and the text reply on channels that support voice notes (Telegram, WhatsApp, Matrix, Feishu). There is currently no way to suppress the text and deliver voice only.

Use cases for voice-only:

  1. Privacy: when listening to voice messages in public or around other people, the visible text on the chat screen exposes the conversation to bystanders. Voice-only delivery lets you listen with earbuds without anyone nearby being able to read the reply.
  2. Cleaner UX: double delivery (voice + text) clutters the chat. Many users who enable TTS want voice as the primary medium, not as an add-on to text.
  3. Parity with human behavior: when humans send voice messages on Telegram/WhatsApp, there is no accompanying text transcript. Agent replies should be able to match that pattern.

Note: the explicit tts tool path already handles this correctly since #70092 — audioAsVoice: true suppresses duplicate text. This request is specifically about the auto-TTS path (/tts on).

Problem 2: System prompt hint is too weak for natural speech

When auto-TTS is active, buildTtsSystemPromptHint() injects this into the system prompt:

Voice (TTS) is enabled.
Keep spoken text ≤1500 chars to avoid auto-summary (summary on).
Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.

This tells the model about char limits and directives, but nothing about writing for speech. The model has no idea its text output will be spoken aloud, so it produces text-optimized replies: bullet points, code blocks, cron expressions, URLs, markdown formatting. These sound terrible when read by TTS ("cron thirty five star star star at Europe slash Amsterdam").
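To make the list of problem constructs concrete, here is a minimal sketch of a heuristic that flags text-optimized output before it is handed to TTS. The function name findSpeechUnfriendly and its patterns are hypothetical and are not part of OpenClaw; they cover only the cases named in this issue and would miss plenty of others (cron expressions, for example).

```typescript
// Hypothetical heuristic, not OpenClaw code: flags constructs that read badly aloud.
const SPEECH_CHECKS: ReadonlyArray<[string, RegExp]> = [
  ["code block", /`{3}/],                               // fenced code
  ["bullet list", /^\s*[-*•]\s+/m],                     // lines starting with a bullet marker
  ["url", /https?:\/\/\S+/],                            // raw links
  ["markdown formatting", /^#{1,6}\s|\*\*[^*]+\*\*/m],  // headings or bold
  ["digit time", /\b\d{1,2}:\d{2}\b/],                  // "5:30" instead of "five thirty"
];

// Returns the labels of every check that matches the reply text.
function findSpeechUnfriendly(text: string): string[] {
  return SPEECH_CHECKS
    .filter(([, pattern]) => pattern.test(text))
    .map(([label]) => label);
}
```

A check like this could be used for logging or evaluation only; it does not rewrite the reply, which is why the prompt-level guidance proposed in this issue is still needed.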

When the model uses the tts tool explicitly, it naturally writes conversationally because it knows it's producing speech. But in auto-TTS mode, the model thinks it's writing text and OpenClaw silently converts it — producing robotic, unnatural voice messages.

Suggested improvement: The system prompt hint should include guidance like:

Voice (TTS) is enabled — your replies will be spoken aloud as voice messages.
Write conversationally for speech: use natural sentences instead of bullet points,
spell out numbers and times ("five thirty" not "5:30"), avoid code blocks, URLs,
and markdown formatting. If technical details are needed, use the message tool
to send them as a separate text follow-up.

This is essentially what issue #22028 ("Spoken rewrite pass for TTS") proposed before it went stale.
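One way the extended hint could be assembled is sketched below. This is not the actual buildTtsSystemPromptHint() implementation, whose signature is not shown in this issue; buildSpokenTtsHint and TtsHintOptions are hypothetical names. The sketch assumes that the existing character-limit line is only emitted when auto-summary is on, and that the directive line is kept unchanged. The guidance wording is lightly adapted from the suggestion above.

```typescript
// Hypothetical options; the real hint builder's inputs are not shown in the issue.
interface TtsHintOptions {
  maxChars: number;        // spoken-text budget before auto-summary applies
  summaryEnabled: boolean; // whether auto-summary is on
}

// Sketch: prepend speech-writing guidance to the existing hint lines.
function buildSpokenTtsHint(opts: TtsHintOptions): string {
  const lines: string[] = [
    "Voice (TTS) is enabled: your replies will be spoken aloud as voice messages.",
    "Write conversationally for speech: use natural sentences instead of bullet points,",
    'spell out numbers and times ("five thirty" not "5:30"), avoid code blocks, URLs,',
    "and markdown formatting. If technical details are needed, use the message tool",
    "to send them as a separate text follow-up.",
  ];
  if (opts.summaryEnabled) {
    // ASSUMPTION: this line is tied to auto-summary being on.
    lines.push(`Keep spoken text ≤${opts.maxChars} chars to avoid auto-summary (summary on).`);
  }
  lines.push("Use [[tts:...]] and optional [[tts:text]]...[[/tts:text]] to control voice/expressiveness.");
  return lines.join("\n");
}
```

Keeping the guidance first means the model learns that its output will be spoken before it reads the length limit and directive syntax.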

Proposed Solution for Problem 1

Add a config option under messages.tts to control text delivery when TTS audio is successfully generated:

{
  messages: {
    tts: {
      auto: "always",
      textDelivery: "suppress",  // "normal" (default) | "suppress" | "caption"
    },
  },
}

  • "normal" (default): current behavior; text is sent alongside voice (as a caption or a separate message).
  • "suppress": when TTS audio succeeds, do not deliver the text reply. If TTS fails, fall back to text.
  • "caption": send text only as a voice message caption (Telegram supports up to 1024 chars); text that overflows the caption is not sent as a separate message.

This could also be exposed via /tts text off as a local preference, similar to /tts summary off.
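The three modes and the failure fallback can be sketched as a single decision function. This is an illustration only: planDelivery and DeliveryPlan are hypothetical names, the 1024-character default is the Telegram caption limit quoted above, and "normal" is modeled as a separate text message even though the current behavior may attach the text as a caption instead.

```typescript
type TextDelivery = "normal" | "suppress" | "caption";

// Hypothetical result shape describing what gets sent for one reply.
interface DeliveryPlan {
  sendAudio: boolean;
  caption: string | null;      // text attached to the voice message
  separateText: string | null; // text sent as its own message
}

function planDelivery(
  mode: TextDelivery,
  text: string,
  ttsSucceeded: boolean,
  captionLimit: number = 1024,
): DeliveryPlan {
  // If TTS fails, every mode falls back to plain text.
  if (!ttsSucceeded) {
    return { sendAudio: false, caption: null, separateText: text };
  }
  if (mode === "suppress") {
    return { sendAudio: true, caption: null, separateText: null };
  }
  if (mode === "caption") {
    // Overflow beyond the caption limit is dropped, not sent separately.
    // Note: slice() counts UTF-16 code units, which is only an approximation.
    return { sendAudio: true, caption: text.slice(0, captionLimit), separateText: null };
  }
  // "normal": simplified here to a separate text message (ASSUMPTION, see lead-in).
  return { sendAudio: true, caption: null, separateText: text };
}
```

Handling the TTS failure case first guarantees that no mode can leave the user with neither audio nor text.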

Channel Considerations

  • Telegram: voice messages support captions (up to 1024 chars). "suppress" would skip the caption entirely; "caption" would keep it but not send overflow as a separate message.
  • WhatsApp: already sends text separately from PTT audio (clients don't render captions on voice notes), so "suppress" would skip that separate text send.
  • Other channels: where voice notes are not supported and audio is sent as a file attachment, "suppress" should still send the text as context.
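The channel notes above could be expressed as a small capability check that adjusts the requested mode per channel. The ChannelCaps shape, the constants, and effectiveTextDelivery are hypothetical. Matrix and Feishu are left out because the issue does not describe their caption behavior. How "caption" should behave where captions are not rendered is not specified in the issue; the sketch assumes it falls back to "normal".

```typescript
type TextDelivery = "normal" | "suppress" | "caption";

// Hypothetical per-channel capabilities, derived from the notes in this issue.
interface ChannelCaps {
  voiceNotes: boolean;       // audio can be delivered as a native voice note
  voiceCaptionLimit: number; // 0 when captions on voice notes are not rendered
}

const TELEGRAM: ChannelCaps = { voiceNotes: true, voiceCaptionLimit: 1024 };
const WHATSAPP: ChannelCaps = { voiceNotes: true, voiceCaptionLimit: 0 };
const FILE_ATTACHMENT_ONLY: ChannelCaps = { voiceNotes: false, voiceCaptionLimit: 0 };

// Maps the configured mode to the mode that is actually safe on this channel.
function effectiveTextDelivery(requested: TextDelivery, caps: ChannelCaps): TextDelivery {
  // Audio sent as a plain file attachment still needs the text as context.
  if (!caps.voiceNotes) return "normal";
  // ASSUMPTION: without rendered captions, "caption" degrades to "normal".
  if (requested === "caption" && caps.voiceCaptionLimit === 0) return "normal";
  return requested;
}
```

For example, effectiveTextDelivery("suppress", WHATSAPP) stays "suppress", which corresponds to skipping the separate text send, while the same request on a file-attachment channel resolves to "normal".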

extent analysis

TL;DR

Auto-TTS has no voice-only delivery mode. The proposed fix is a textDelivery config option under messages.tts that controls whether the text reply is still sent once TTS audio has been generated successfully.

Guidance

  • Add a textDelivery option to the messages.tts config with possible values: "normal", "suppress", or "caption".
  • Set textDelivery to "suppress" to prevent text from being sent alongside voice messages when TTS audio is successfully generated.
  • Consider exposing this option via a /tts text off command as a local preference.
  • Update the system prompt hint to guide the model to write conversationally for speech when auto-TTS is active.
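The first three guidance items could fit together roughly as follows. All names here are hypothetical, and mapping /tts text on back to "normal" is an assumption, since the issue only mentions /tts text off. The sketch validates the config value, parses the command into a local preference, and lets the local preference take priority.

```typescript
type TextDelivery = "normal" | "suppress" | "caption";

// Validate a raw config value, falling back to the default "normal".
function normalizeTextDelivery(raw: unknown): TextDelivery {
  if (raw === "normal" || raw === "suppress" || raw === "caption") return raw;
  return "normal";
}

// Parse "/tts text on|off" into a local preference; null for any other input.
function parseTtsTextCommand(input: string): TextDelivery | null {
  const match = /^\/tts\s+text\s+(on|off)$/i.exec(input.trim());
  if (!match) return null;
  // ASSUMPTION: "off" maps to "suppress" and "on" restores "normal".
  return match[1]?.toLowerCase() === "off" ? "suppress" : "normal";
}

// A local preference, when set, overrides the configured value.
function resolveTextDelivery(configValue: unknown, localPref: TextDelivery | null): TextDelivery {
  return localPref ?? normalizeTextDelivery(configValue);
}
```

Treating unknown config values as "normal" keeps existing deployments on the current behavior if the option is mistyped or absent.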

Example

{
  messages: {
    tts: {
      auto: "always",
      textDelivery: "suppress"
    },
  },
}

Notes

The textDelivery option addresses only voice-only delivery; the improved system prompt hint is a separate change and is not covered by it. Implementing textDelivery requires handling channel-specific behavior, such as Telegram's support for voice message captions.

Recommendation

Add the textDelivery config option to control text delivery when TTS audio is successfully generated. It offers a flexible way to support voice-only delivery across different channels.
