openclaw - [Bug]: OpenAI TTS provider doesn't trigger sendVoice in Telegram channel on 2026.5.20 (Google provider works)



Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

With messages.tts.provider = "openai", the tts tool generates a valid Ogg Opus file marked voiceCompatible: true in /tmp/openclaw/tts-*/, but the Telegram plugin never emits sendVoice or sendAudio — only sendMessage text. Switching the same install to provider = "google" (Gemini TTS) resolves the end-to-end flow on the next turn with no other config change.

Steps to reproduce

  1. Fresh openclaw@2026.5.20 install, plugins enabled: anthropic, browser, codex, google, openai, openclaw-mem0, telegram.
  2. Set:
    • messages.tts.auto = "tagged"
    • messages.tts.provider = "openai"
    • messages.tts.providers.openai = { voice: "shimmer", model: "tts-1", responseFormat: "opus" }
  3. In a Telegram DM to the bot, send: "responde em áudio: tudo certo?"
  4. Codex agent inlines [[tts:text]]Tudo certo, Alex.[[/tts:text]] in its reply (confirmed in session jsonl).
  5. Observe in journal: only outbound send ok operation=sendMessage deliveryKind=text — no sendVoice or sendAudio call.
  6. Change only the provider selection: set messages.tts.provider = "google" and add messages.tts.providers.google = { voice: "Aoede", model: "gemini-2.5-flash-preview-tts" } (with GEMINI_API_KEY in env).
  7. Same prompt → journal logs outbound send ok operation=sendVoice deliveryKind=voice. Voice note arrives in Telegram.
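Step 4 depends on the agent actually emitting the tagged directive. The Python sketch below extracts tagged segments from a reply string, so you can check the assistant text in the session jsonl for them. It assumes the [[tts:text]]...[[/tts:text]] form shown in step 4 is the only syntax that matters. It is not taken from openclaw's own parser, which this report does not show.

```python
# Minimal sketch: find [[tts:text]]...[[/tts:text]] segments in an agent reply.
# Assumption: this tag form (from step 4) is the only one of interest; this is
# not openclaw's own parser.
import re

TTS_TAG = re.compile(r"\[\[tts:text\]\](.*?)\[\[/tts:text\]\]", re.DOTALL)


def tagged_tts_segments(reply: str) -> list[str]:
    """Return the stripped text of every tagged TTS segment, in order."""
    return [segment.strip() for segment in TTS_TAG.findall(reply)]


if __name__ == "__main__":
    reply = "[[tts:text]]Tudo certo, Alex.[[/tts:text]]"
    print(tagged_tts_segments(reply))  # ['Tudo certo, Alex.']
```

The report confirms the tag is present in the failing runs. An empty result here would therefore point to a different problem than the one described.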

Expected behavior

With the openai provider, openclaw infer tts convert --gateway --channel telegram --text "..." already returns format: opus, voiceCompatible: true and writes a valid Ogg Opus 24 kHz mono file. The Telegram channel registers capabilities.tts.voice.synthesisTarget = "voice-note". Per the bundled speech-provider/payload normalization code, the transport layer should propagate audioAsVoice: true from the tool result into the outbound payload and call api.sendVoice(chatId, file), as it does for the Google provider in the working scenario above.
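The expected chain can be modeled as a small decision function. This is a hypothetical Python sketch. ToolMedia, normalize and choose_operation are illustrative names, not openclaw code. Only the Telegram Bot API method names (sendMessage, sendVoice, sendAudio) are real. The sketch encodes the report's expectation: a voice-compatible file on a channel whose synthesisTarget is "voice-note" should go out via sendVoice.

```python
# Hypothetical model of the expected delivery decision. Class and function names
# are illustrative; only the Telegram Bot API method names are real.
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ToolMedia:
    media_url: str                 # e.g. details.media.mediaUrl from the tts toolResult
    voice_compatible: bool         # reported by the speech provider
    audio_as_voice: bool = False   # flag the payload normalizer is expected to set


def normalize(media: ToolMedia, synthesis_target: str) -> ToolMedia:
    """Mark voice-compatible audio as a voice note when the channel asks for one."""
    if media.voice_compatible and synthesis_target == "voice-note":
        return replace(media, audio_as_voice=True)
    return media


def choose_operation(media: Optional[ToolMedia]) -> str:
    """Telegram method the outbound transport would use for this reply."""
    if media is None:
        return "sendMessage"   # no media attached: matches the failing OpenAI log
    if media.audio_as_voice:
        return "sendVoice"     # matches the working Google log
    return "sendAudio"


if __name__ == "__main__":
    opus = ToolMedia("/tmp/openclaw/tts-XXX/voice-NNNN.opus", voice_compatible=True)
    print(choose_operation(normalize(opus, "voice-note")))  # sendVoice
    print(choose_operation(None))                            # sendMessage
```

Under this simplified model, the failing log (sendMessage only, with no sendAudio either) fits media that never reached the outbound, rather than media that arrived without the voice flag. The real normalizer might instead drop media that lacks the flag. In that case both explanations would produce the same log.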

Actual behavior

  • Session jsonl records the "(spoken) ..." toolResult correctly, with details.media.mediaUrl pointing to /tmp/openclaw/tts-XXX/voice-NNNN.opus. The file exists and is valid Ogg Opus (24 kHz, mono, ~12 KB).
  • Journal shows only outbound send ok operation=sendMessage deliveryKind=text chunkCount=1 — zero sendVoice or sendAudio lines.
  • Agent's subsequent text reply ("Áudio enviado, Alex.") is delivered as plain sendMessage. User receives text only, no voice note.
  • Reproducible: all 5 attempts over 24h with the openai provider failed the same way. Switching to the google provider (same install, same chat, same agent) works on the first try.
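Checking the journal for sendVoice/sendAudio lines by hand is easy to get wrong over many attempts. Below is a minimal Python sketch that counts outbound operations. It assumes only the "[telegram] outbound send ok key=value ..." line format visible in the gateway logs of this report.

```python
# Minimal sketch: count Telegram outbound operations in gateway log lines.
# Assumes the "[telegram] outbound send ok key=value ..." format from this report.
import re
from collections import Counter
from typing import Iterable

OUTBOUND = re.compile(r"\[telegram\] outbound send ok (?P<fields>.*)$")
FIELD = re.compile(r"(\w+)=(\S+)")


def outbound_operations(lines: Iterable[str]) -> Counter:
    """Map each operation= value (sendMessage, sendVoice, ...) to its count."""
    ops: Counter = Counter()
    for line in lines:
        match = OUTBOUND.search(line.rstrip("\n"))
        if match:
            fields = dict(FIELD.findall(match.group("fields")))
            ops[fields.get("operation", "unknown")] += 1
    return ops


def audio_was_sent(lines: Iterable[str]) -> bool:
    """True if any outbound line used sendVoice or sendAudio."""
    ops = outbound_operations(lines)
    return ops["sendVoice"] + ops["sendAudio"] > 0


if __name__ == "__main__":
    failing = [
        "May 24 19:36:37 srv openclaw[401900]: [telegram] outbound send ok accountId=default "
        "messageId=619 operation=sendMessage deliveryKind=text chunkCount=1",
    ]
    working = [
        "May 25 10:02:03 srv openclaw[538712]: [telegram] outbound send ok accountId=default "
        "messageId=640 operation=sendVoice deliveryKind=voice",
    ]
    print(audio_was_sent(failing), audio_was_sent(working))  # False True
```

The openclaw[...] prefixes suggest the lines could be pulled with journalctl -t openclaw. That syslog identifier is an assumption, not something the report confirms.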

OpenClaw version

2026.5.20 (e510042)

Operating system

Ubuntu 25.10 (kernel 6.17)

Install method

npm global via nvm node v22.22.3

Model

openai-codex/gpt-5.5 (Codex harness via OAuth ChatGPT Plus subscription)

Provider / routing chain

Telegram channel → Codex agent (gpt-5.5) → tool tts → speech-provider openai → ❌ outbound emits only sendMessage text (never sendVoice/sendAudio). Working comparison: same flow with speech-provider google → sendVoice fires correctly.

Additional provider/model setup details

  • LLM stack unchanged between failing and working tests (Codex gpt-5.5 primary, anthropic/claude-opus-4-6 fallback).
  • All plugins enabled: anthropic, browser, codex, google, openai, openclaw-mem0, telegram. Mem0 in platform mode.
  • OPENAI_API_KEY valid (mem0, image generation and infer tts convert all work; the convert command returns Opus correctly even when run in isolation).
  • GEMINI_API_KEY valid (working scenario uses it).
  • Applied a local patch to dist/channel.setup-*.js adding transcodesAudio: true to the Telegram capabilities.tts.voice (matching Feishu's declaration in the same file). The patch is required for BOTH providers to even attempt voice delivery; without it, the Google provider also fails. With it, Google works but OpenAI still doesn't. This suggests the gap is not in the channel capability but specifically in how the OpenAI speech-provider propagates its result into the outbound payload.

Logs, screenshots, and evidence

# OpenAI provider — gateway log (FAILING)
  May 24 19:36:34 srv openclaw[401900]: [telegram] Inbound message direct, 33 chars
  ... (mem0 recall, codex thread, tool tts execution) ...
  May 24 19:36:37 srv openclaw[401900]: [telegram] outbound send ok accountId=default messageId=619 operation=sendMessage deliveryKind=text chunkCount=1
  # zero sendVoice/sendAudio lines. Session jsonl shows toolCall=tts → toolResult "(spoken) Tudo certo..." → assistant text "Áudio enviado, Alex." but the audio was never attached to outbound.
  
  # Google provider — same install, same chat, same agent (WORKING)
  May 25 10:02:00 srv openclaw[538712]: [telegram] Inbound message direct, 36 chars
  May 25 10:02:03 srv openclaw[538712]: [telegram] outbound send ok accountId=default messageId=640 operation=sendVoice deliveryKind=voice
  
  # Isolated TTS convert with OpenAI provider (the pipeline up to file generation IS correct)
  $ openclaw infer tts convert --gateway --channel telegram --text "test" --voice shimmer --json
  { 
    "ok": true,
    "provider": "openai",
    "transport": "gateway",
    "outputs": [ 
      { "path": "/tmp/openclaw/tts-XXX/voice-NNNN.opus", "format": "opus", "voiceCompatible": true }
    ] 
  } 
  $ file /tmp/openclaw/tts-XXX/voice-NNNN.opus
  Ogg data, Opus audio, version 0.1, mono, 24000 Hz
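The file check above can be reproduced without the file utility. Below is a minimal Python sketch based on the Ogg page layout (RFC 3533) and the OpusHead identification header (RFC 7845). It reads the first page of a TTS output and returns the channel count and input sample rate. For the file logged above it should report one channel at 24000 Hz. It also lists voice-compatible paths from the --json output, relying only on the fields shown in that output.

```python
# Minimal sketch: verify TTS outputs are Ogg Opus, using the Ogg page header
# (RFC 3533) and the OpusHead identification header (RFC 7845).
import json
import struct
from pathlib import Path


def ogg_opus_info(path: str) -> tuple[int, int]:
    """Return (channels, input_sample_rate) from the first Ogg page, or raise ValueError."""
    data = Path(path).read_bytes()
    if len(data) < 27 or data[:4] != b"OggS":
        raise ValueError(f"{path}: not an Ogg stream")
    n_segments = data[26]                 # page_segments field of the page header
    start = 27 + n_segments               # packet data follows the segment table
    head = data[start:start + 19]         # OpusHead is 19 bytes for mapping family 0
    if len(head) < 19 or head[:8] != b"OpusHead":
        raise ValueError(f"{path}: first packet is not OpusHead")
    channels = head[9]
    (input_rate,) = struct.unpack_from("<I", head, 12)
    return channels, input_rate


def voice_output_paths(convert_json: str) -> list[str]:
    """Paths that the convert --json output marks as opus and voiceCompatible."""
    result = json.loads(convert_json)
    return [
        out["path"]
        for out in result.get("outputs", [])
        if out.get("format") == "opus" and out.get("voiceCompatible")
    ]
```

Running ogg_opus_info on each path from voice_output_paths separates "the file is wrong" from "the file is fine but never sent", which is the distinction this report draws.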

Impact and severity

  • Affected: users running 2026.5.20 with Telegram channel + OpenAI TTS provider.
  • Severity: blocks "send voice note via OpenAI TTS" entirely. Workaround viable (switch provider to google).
  • Frequency: 5/5 attempts in 24h, deterministic.
  • Consequence: agent's "Áudio enviado" reply becomes a false positive (user gets no audio). Confusing UX. Wasted API spend on TTS synthesis without delivery.

Additional information

  • Likely related: closed #65443 (older series, marked as a duplicate without a published fix) and #81598 (landed in 5.x, but does not appear to cover this OpenAI-specific outbound path).
  • Hypothesis from local code reading: in the bundled speech-provider runtime, OpenAI's synthesize returns voiceCompatible: true correctly but may not propagate details.media.audioAsVoice: true (or an equivalent flag) into the outbound payload normalization. Google's speech-provider does propagate it (verified by inspecting both runtime files side by side), so comparing the two providers' synthesize return shapes will likely reveal the gap.
  • Happy to attach full session jsonl excerpts or run targeted instrumentation if useful.
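Once both providers' synthesize results are captured (for example from the session jsonl details), the comparison of their return shapes can be partly mechanized. This hypothetical Python sketch lists key paths present in one result but absent from the other. The sample dicts in it are illustrative only. They encode the hypothesis (audioAsVoice present for Google, missing for OpenAI) and are not captured runtime output.

```python
# Hypothetical sketch: list nested key paths present in one provider's result
# but missing from another's. Sample data below is illustrative, not captured.
from typing import Any


def key_paths(obj: Any, prefix: str = "") -> set[str]:
    """Collect dotted paths (e.g. "details.media.mediaUrl") for all dict keys."""
    paths: set[str] = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else str(key)
            paths.add(path)
            paths |= key_paths(value, path)
    return paths


def missing_keys(reference: dict, candidate: dict) -> list[str]:
    """Key paths the reference result has and the candidate lacks, sorted."""
    return sorted(key_paths(reference) - key_paths(candidate))


if __name__ == "__main__":
    # Illustrative shapes encoding the hypothesis only; not real runtime output.
    google_like = {"details": {"media": {"mediaUrl": "/tmp/openclaw/tts-XXX/voice-NNNN.opus", "audioAsVoice": True}}}
    openai_like = {"details": {"media": {"mediaUrl": "/tmp/openclaw/tts-XXX/voice-NNNN.opus"}}}
    print(missing_keys(google_like, openai_like))  # ['details.media.audioAsVoice']
```

Comparing key paths rather than values keeps the check independent of file paths and timestamps, which differ between runs anyway.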
