openclaw - 💡(How to fix) Fix [Bug]: OpenAI TTS provider doesn't trigger sendVoice in Telegram channel on 2026.5.20 — Google provider works

openclaw · 2026-05-25 13:27:58

With messages.tts.provider = "openai", the tts tool generates a valid Ogg Opus file marked voiceCompatible: true in /tmp/openclaw/tts-*/, but the Telegram plugin never emits sendVoice or sendAudio — only sendMessage text. Switching the same install to provider = "google" (Gemini TTS) resolves the end-to-end flow on the next turn with no other config change.


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

1. Fresh openclaw@2026.5.20 install, plugins enabled: anthropic, browser, codex, google, openai, openclaw-mem0, telegram.
2. Set:
   - messages.tts.auto = "tagged"
   - messages.tts.provider = "openai"
   - messages.tts.providers.openai = { voice: "shimmer", model: "tts-1", responseFormat: "opus" }
3. In a Telegram DM to the bot, send: "responde em áudio: tudo certo?" (Portuguese for "reply in audio: all good?").
4. The Codex agent inlines [[tts:text]]Tudo certo, Alex.[[/tts:text]] in its reply (confirmed in the session jsonl).
5. Observe in the journal: only outbound send ok operation=sendMessage deliveryKind=text, with no sendVoice or sendAudio call.
6. Change one config key only: messages.tts.provider = "google", with providers.google = { voice: "Aoede", model: "gemini-2.5-flash-preview-tts" } (and GEMINI_API_KEY in env).
7. Send the same prompt. The journal logs outbound send ok operation=sendVoice deliveryKind=voice, and the voice note arrives in Telegram.
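For reference, the two settings from the steps can be written out as nested structures. This is only a sketch: the report gives the keys in dotted form and does not say how the config file is laid out on disk, so the `expand` helper and the nested shape it produces are assumptions for illustration.

```python
import json

def expand(dotted: dict) -> dict:
    """Expand dotted keys such as 'messages.tts.provider' into a nested dict."""
    root: dict = {}
    for key, value in dotted.items():
        node = root
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return root

# Failing setup, values copied from the steps to reproduce.
failing = expand({
    "messages.tts.auto": "tagged",
    "messages.tts.provider": "openai",
    "messages.tts.providers.openai": {
        "voice": "shimmer", "model": "tts-1", "responseFormat": "opus",
    },
})

# Working setup: same install, provider switched to google.
working = expand({
    "messages.tts.auto": "tagged",
    "messages.tts.provider": "google",
    "messages.tts.providers.google": {
        "voice": "Aoede", "model": "gemini-2.5-flash-preview-tts",
    },
})

print(json.dumps(failing, indent=2))
print(json.dumps(working, indent=2))
```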

Expected behavior

With the openai provider, `openclaw infer tts convert --gateway --channel telegram --text "..."` already returns `format: opus, voiceCompatible: true` and writes a valid Ogg Opus 24 kHz mono file. The Telegram channel registers `capabilities.tts.voice.synthesisTarget = "voice-note"`. Per the bundled speech-provider/payload normalization code, the transport layer should propagate `audioAsVoice: true` from the tool result to the outbound payload and trigger `api.sendVoice(chatId, file)`, as it does with the Google provider in the working scenario above.
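The expected routing can be sketched as a small decision function. This is not OpenClaw's code: `ToolMedia`, `choose_operation`, and the fallback to `sendAudio` are hypothetical names and assumptions used to illustrate the expectation that a voice-flagged result on a "voice-note" channel ends in `sendVoice`.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ToolMedia:
    """Hypothetical stand-in for the media part of a tts tool result."""
    media_url: Optional[str] = None
    audio_as_voice: bool = False

def choose_operation(media: ToolMedia, synthesis_target: str) -> str:
    """Pick the Telegram operation the reporter expects for a tool result."""
    if media.media_url is None:
        # No audio attached: only text can be sent.
        return "sendMessage"
    if media.audio_as_voice and synthesis_target == "voice-note":
        # Voice-flagged audio on a voice-note channel becomes a voice note.
        return "sendVoice"
    # Assumption: unflagged audio would go out as a regular audio file.
    return "sendAudio"

voice = ToolMedia("/tmp/openclaw/tts-XXX/voice-NNNN.opus", audio_as_voice=True)
print(choose_operation(voice, "voice-note"))
```

Note that the failure in this report does not match any audio branch of the sketch: the journal shows neither `sendVoice` nor `sendAudio`, which suggests the media is dropped before a decision like this is reached.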

Actual behavior

- The session jsonl logs the (spoken) ... toolResult correctly, with details.media.mediaUrl pointing to /tmp/openclaw/tts-XXX/voice-NNNN.opus. The file exists and is valid Ogg Opus, 24 kHz mono, ~12 KB.
- The journal shows only outbound send ok operation=sendMessage deliveryKind=text chunkCount=1, with zero sendVoice or sendAudio lines.
- The agent's subsequent text reply ("Áudio enviado, Alex.", Portuguese for "Audio sent, Alex.") is delivered as a plain sendMessage. The user receives text only, no voice note.
- Reproducible: 5/5 attempts across 24h with the openai provider fail the same way. Switching to the google provider (same install, same chat, same agent) works on the first try.
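To confirm the symptom on another install, the journal can be scanned for outbound operations. The sketch below assumes the log line format quoted in this report (`outbound send ok ... operation=... deliveryKind=...`); the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches the Telegram outbound lines as they appear in this report's journal.
OUTBOUND = re.compile(
    r"\[telegram\] outbound send ok .*?operation=(\w+) deliveryKind=(\w+)"
)

def count_operations(lines):
    """Count outbound Telegram operations (sendMessage, sendVoice, ...)."""
    counts = Counter()
    for line in lines:
        match = OUTBOUND.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    "May 24 19:36:34 srv openclaw[401900]: [telegram] Inbound message direct, 33 chars",
    "May 24 19:36:37 srv openclaw[401900]: [telegram] outbound send ok "
    "accountId=default messageId=619 operation=sendMessage deliveryKind=text chunkCount=1",
    "May 25 10:02:03 srv openclaw[538712]: [telegram] outbound send ok "
    "accountId=default messageId=640 operation=sendVoice deliveryKind=voice",
]

counts = count_operations(sample)
print(dict(counts))
```

On a failing run the count for `sendVoice` and `sendAudio` stays at zero even though the tts tool executed.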

OpenClaw version

2026.5.20 (e510042)

Operating system

Ubuntu 25.10 (kernel 6.17)

Install method

npm global via nvm node v22.22.3

Model

openai-codex/gpt-5.5 (Codex harness via OAuth ChatGPT Plus subscription)

Provider / routing chain

Telegram channel → Codex agent (gpt-5.5) → tool tts → speech-provider openai → ❌ outbound emits only sendMessage text (never sendVoice/sendAudio). Working comparison: same flow with speech-provider google → sendVoice fires correctly.

Additional provider/model setup details

- LLM stack unchanged between failing and working tests (Codex gpt-5.5 primary, anthropic/claude-opus-4-6 fallback).
- All plugins enabled: anthropic, browser, codex, google, openai, openclaw-mem0, telegram. Mem0 in platform mode.
- OPENAI_API_KEY valid (mem0, image gen, and infer tts convert all work; the convert command returns Opus correctly even in isolation).
- GEMINI_API_KEY valid (the working scenario uses it).
- Applied a local patch to dist/channel.setup-*.js adding transcodesAudio: true to the Telegram capabilities.tts.voice (matching Feishu's declaration in the same file). The patch is required for BOTH providers to even attempt voice delivery; without it, the Google provider also fails. With it, Google works and OpenAI still doesn't. This suggests the gap is not in the channel capability but in the OpenAI speech-provider's outbound payload propagation specifically.

Logs, screenshots, and evidence

# OpenAI provider — gateway log (FAILING)
  May 24 19:36:34 srv openclaw[401900]: [telegram] Inbound message direct, 33 chars
  ... (mem0 recall, codex thread, tool tts execution) ...
  May 24 19:36:37 srv openclaw[401900]: [telegram] outbound send ok accountId=default messageId=619 operation=sendMessage deliveryKind=text chunkCount=1
  # zero sendVoice/sendAudio lines. Session jsonl shows toolCall=tts → toolResult "(spoken) Tudo certo..." → assistant text "Áudio enviado, Alex." but the audio was never attached to outbound.
  
  # Google provider — same install, same chat, same agent (WORKING)
  May 25 10:02:00 srv openclaw[538712]: [telegram] Inbound message direct, 36 chars
  May 25 10:02:03 srv openclaw[538712]: [telegram] outbound send ok accountId=default messageId=640 operation=sendVoice deliveryKind=voice
  
  # Isolated TTS convert with OpenAI provider (the pipeline up to file generation IS correct)
  $ openclaw infer tts convert --gateway --channel telegram --text "test" --voice shimmer --json
  { 
    "ok": true,
    "provider": "openai",
    "transport": "gateway",
    "outputs": [ 
      { "path": "/tmp/openclaw/tts-XXX/voice-NNNN.opus", "format": "opus", "voiceCompatible": true }
    ] 
  } 
  $ file /tmp/openclaw/tts-XXX/voice-NNNN.opus
  Ogg data, Opus audio, version 0.1, mono, 24000 Hz
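The `file` check above can also be done programmatically, which may help when instrumenting the pipeline. The sketch below reads the first Ogg page and the OpusHead identification header, following the standard Ogg page layout and the Opus-in-Ogg header fields; the function name is illustrative and it assumes the OpusHead packet sits at the start of the first page.

```python
import struct
import sys

def parse_opus_head(data: bytes) -> dict:
    """Read channel count and input sample rate from the first Ogg page."""
    if len(data) < 27 or data[:4] != b"OggS":
        raise ValueError("not an Ogg stream")
    n_segments = data[26]              # number of entries in the segment table
    start = 27 + n_segments            # payload begins after the segment table
    payload = data[start:start + 19]   # OpusHead is at least 19 bytes long
    if len(payload) < 19 or payload[:8] != b"OpusHead":
        raise ValueError("first Ogg packet is not OpusHead")
    # Little-endian: version (u8), channels (u8), pre-skip (u16), sample rate (u32)
    version, channels, pre_skip, sample_rate = struct.unpack_from("<BBHI", payload, 8)
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        "input_sample_rate": sample_rate,
    }

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_opus.py /path/to/voice.opus
    with open(sys.argv[1], "rb") as handle:
        print(parse_opus_head(handle.read(512)))
```

For the file in this report, the expectation is one channel and an input sample rate of 24000, matching the `file` output.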

Impact and severity

- Affected: users running 2026.5.20 with the Telegram channel and the OpenAI TTS provider.
- Severity: blocks "send voice note via OpenAI TTS" entirely. A workaround exists (switch the provider to google).
- Frequency: 5/5 attempts in 24h, deterministic.
- Consequence: the agent's "Áudio enviado" ("Audio sent") reply becomes a false positive, since the user gets no audio. This makes for confusing UX and wastes API spend on TTS synthesis that is never delivered.

Additional information

- Likely related: closed #65443 (older series, marked duplicate without a published fix) and #81598 (landed in 5.x, but does not appear to cover this OpenAI-specific outbound path).
- Hypothesis from local code reading: in the bundled speech-provider runtime, the OpenAI synthesize call returns voiceCompatible: true correctly, but may not propagate details.media.audioAsVoice: true (or an equivalent) into the outbound payload normalization. Google's speech-provider does propagate it (verified by inspecting the runtime files side by side). Comparing the two providers' synthesize return shapes will likely reveal the gap.
- Happy to attach full session jsonl excerpts or run targeted instrumentation if useful.
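One way to run the comparison suggested in the hypothesis is to diff the key paths of the two providers' results. The dictionaries below are hypothetical shapes built from field names mentioned in this report, not captured output; real results would need to be dumped from the runtime first.

```python
def key_paths(value, prefix=""):
    """Return the set of dotted key paths present in a nested dict."""
    paths = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    return paths

# Hypothetical result shapes; only the field names come from the report.
openai_result = {
    "voiceCompatible": True,
    "details": {"media": {"mediaUrl": "/tmp/openclaw/tts-XXX/voice-NNNN.opus"}},
}
google_result = {
    "voiceCompatible": True,
    "details": {
        "media": {
            "mediaUrl": "/tmp/openclaw/tts-XXX/voice-NNNN.opus",
            "audioAsVoice": True,
        }
    },
}

missing_in_openai = sorted(key_paths(google_result) - key_paths(openai_result))
print(missing_in_openai)
```

If the hypothesis holds, the difference would include a path such as `details.media.audioAsVoice`; an empty difference would point the investigation elsewhere in the outbound path.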
