openclaw - [Bug]: Telegram voice delivery is unstable across model runtimes because voice generation depends on model-generated media tags

Telegram voice delivery is unstable across model runtimes: GPT/Codex can resend a stale local voice file, and MiniMax can surface "No response generated. Please try again." after Telegram delivery errors, while DeepSeek works reliably in the same Telegram voice setup.

Error Message

Evidence from observed session behavior:

Runtime: OpenAI Codex
Voice: tagged · provider=tts-local-cli
Queue: collect

Observed stale media reuse:

voice_5323.ogg

The same old voice file was sent twice after a delayed GPT/Codex voice generation turn: once for the previous message and once for the next message.

Observed Codex runtime/tooling error:

Native hook relay unavailable

Observed MiniMax/Telegram delivery error evidence:

sendMessage failed
OutboundDeliveryError
send_attempt_started
No response generated. Please try again.

Local TTS context:

Voice provider: tts-local-cli
Local TTS backend: mlx-indextts / IndexTTS
Expected output: .ogg / Opus voice file
Delivery target: Telegram native voice/sendAudio

Root Cause

• DeepSeek appears stable because its path is shorter: model output -> OpenClaw media handling -> Telegram delivery.
• GPT/Codex has extra layers such as Codex runtime, message tool delivery, local hook relay, and tagged media handling.
• MiniMax appears to have different fallback behavior after Telegram delivery failures.
• In tagged voice mode, the model/runtime appears to participate in producing MEDIA: / voice tags or otherwise managing the media delivery flow, making voice delivery behavior model-runtime-dependent.


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Telegram voice delivery is unstable across model runtimes: GPT/Codex can resend a stale local voice file, and MiniMax can surface "No response generated. Please try again." after Telegram delivery errors, while DeepSeek works reliably in the same Telegram voice setup.

Steps to reproduce

  1. Configure Telegram replies to use voice delivery with Voice: tagged · provider=tts-local-cli.
  2. Use local TTS voice generation via tts-local-cli, backed by local mlx-indextts / IndexTTS, producing .ogg / Opus voice files for Telegram delivery.
  3. Send Telegram messages while switching between model runtimes: GPT/Codex, MiniMax, and DeepSeek.
  4. In the GPT/Codex runtime, trigger or observe a slow voice generation turn. One observed GPT voice generation took about 76 seconds.
  5. Send another Telegram message before the previous voice task has fully completed.
  6. Observe whether the generated voice file is unique to the current message or whether an old generated file is reused.
  7. In the MiniMax runtime, observe Telegram delivery behavior when delivery errors occur.

Expected behavior

For Telegram chats configured as voice-only, all model runtimes should produce the same delivery behavior:

  1. The model returns plain text only.
  2. OpenClaw runs local TTS after model output using tts-local-cli / local mlx-indextts.
  3. OpenClaw generates a fresh .ogg / Opus voice file uniquely bound to the current inbound message or delivery id.
  4. OpenClaw sends that file through one unified Telegram native voice/sendAudio delivery path.
  5. OpenClaw never reuses a previous voice file for a different message unless it is explicitly verified to belong to that message.
  6. Telegram delivery failures should retry or report a clear delivery/TTS error for the current message, not swallow the real reply or replace it with a generic fallback.

Known-good reference from observation: DeepSeek replies in the same Telegram voice setup produced fresh voice replies and delivered reliably.
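Items 3 and 5 of the expected behavior (a fresh file bound to the current message, and no reuse across messages) can be sketched as a small ownership check. This is a minimal illustration, not OpenClaw code: `VoiceArtifact`, `voice_path_for`, and `select_voice_for_delivery` are hypothetical names, and the `voice_<message_id>.ogg` scheme is only the naming idea suggested in this report.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class VoiceArtifact:
    """A generated voice file plus the inbound message it was made for."""
    message_id: int
    path: Path


def voice_path_for(message_id: int, out_dir: Path) -> Path:
    # Hypothetical naming scheme: one output file per inbound message id.
    return out_dir / f"voice_{message_id}.ogg"


def select_voice_for_delivery(current_message_id: int, artifact: VoiceArtifact) -> Path:
    """Return the file to send, refusing anything bound to another message."""
    if artifact.message_id != current_message_id:
        raise ValueError(
            f"stale voice file {artifact.path.name}: bound to message "
            f"{artifact.message_id}, not {current_message_id}"
        )
    return artifact.path
```

With a check like this, a file generated for message 5323 would be rejected if the delivery step tried to attach it to the following message, rather than being sent a second time.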

Actual behavior

Observed behavior differs by model runtime:

GPT / Codex runtime

  • One GPT voice generation took about 76 seconds.
  • A second Telegram message arrived before the first voice task fully finished.
  • After recovery, the same old local voice file, for example voice_5323.ogg, was sent twice: once for the previous message and once for the next message.
  • This indicates cross-turn voice task overlap plus stale media reuse.
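One way to rule out cross-turn overlap is to serialize voice jobs per chat, so a slow turn finishes (with its own file) before the next turn starts. The sketch below is an assumption about how that could look, not a description of OpenClaw's queue: `ChatVoiceQueue` is a hypothetical class and `_synthesize` is a stand-in for the local TTS call.

```python
import asyncio


class ChatVoiceQueue:
    """Serializes voice jobs per chat so turns cannot overlap."""

    def __init__(self) -> None:
        self._locks: dict[int, asyncio.Lock] = {}
        self.sent: list[tuple[int, str]] = []

    async def _synthesize(self, message_id: int, delay: float) -> str:
        # Stand-in for local TTS; returns the name of the file it would write.
        await asyncio.sleep(delay)
        return f"voice_{message_id}.ogg"

    async def handle(self, chat_id: int, message_id: int, delay: float) -> None:
        lock = self._locks.setdefault(chat_id, asyncio.Lock())
        async with lock:
            filename = await self._synthesize(message_id, delay)
            # Record the (message, file) pairing so it can be checked later.
            self.sent.append((message_id, filename))


async def demo() -> list[tuple[int, str]]:
    queue = ChatVoiceQueue()
    # Message 1 is slow; message 2 arrives while it is still being generated.
    await asyncio.gather(
        queue.handle(42, 1, 0.05),
        queue.handle(42, 2, 0.0),
    )
    return queue.sent
```

Running `asyncio.run(demo())` yields the two messages in arrival order, each paired with its own file, even though the second job would otherwise have finished first.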

MiniMax runtime

  • User-visible result can be "No response generated. Please try again."
  • Logs showed Telegram delivery errors such as sendMessage failed, OutboundDeliveryError, and delivery stuck around send_attempt_started.
  • The model may have produced a valid response, but Telegram delivery failure/fallback appears to swallow the real reply and surface the generic no-response message.
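The distinction the report asks for (an empty model result versus a failed delivery of a valid reply) can be sketched as follows. This is illustrative only: `DeliveryError`, `deliver_reply`, and the injected `send` callable are hypothetical, and the retry count is an arbitrary assumption.

```python
from typing import Callable, Optional


class DeliveryError(Exception):
    """Raised by the (hypothetical) send function when delivery fails."""


def deliver_reply(
    model_text: Optional[str],
    send: Callable[[str], None],
    max_attempts: int = 3,
) -> str:
    """Return a status string describing what the user should be told."""
    if not model_text or not model_text.strip():
        # Only a genuinely empty model result maps to the generic fallback.
        return "No response generated. Please try again."
    last_error: Optional[DeliveryError] = None
    for _ in range(max_attempts):
        try:
            send(model_text)
            return "delivered"
        except DeliveryError as exc:
            last_error = exc
    # The reply exists, so report the delivery failure instead of hiding it.
    return f"delivery failed after {max_attempts} attempts: {last_error}"
```

The design choice here is that the generic fallback is reachable only from the empty-text branch, so a delivery failure can never be reported as a missing model response.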

DeepSeek runtime

  • Replies were converted to voice normally.
  • Each reply appeared to produce a fresh voice file.
  • Telegram delivery succeeded consistently in the observed tests.

Codex runtime additional evidence

Local commands were repeatedly blocked by:

Native hook relay unavailable

This affects GPT/Codex voice generation/debug paths and may leave the runtime in a state where it reuses stale media instead of generating/sending the current reply.

### OpenClaw version

OpenClaw 2026.5.26 (10ad3aa)

### Operating system

macOS 26.5

### Install method

Launched via local OpenClaw Gateway on macOS. The active LaunchAgent is ai.openclaw.gateway.plist.

### Model

Observed across multiple model runtimes:

• GPT / OpenAI Codex / gpt5.5 runtime
• MiniMax/M2.7 runtime
• DeepSeek/deepseek V4 flash runtime

The failure is runtime-dependent rather than model-text-dependent.

### Provider / routing chain

Observed session status:

```text
Runtime: OpenAI Codex
Voice: tagged · provider=tts-local-cli
Queue: collect
```

Telegram voice replies are generated locally, not by the model provider. The voice provider is tts-local-cli, backed by local mlx-indextts / IndexTTS. The generated output is an .ogg / Opus voice file, then delivered through Telegram native voice/sendAudio.

### Additional provider/model setup details

The issue is not about TTS model quality. The local TTS stack can generate voice files. The problem appears to be how generated media files are bound to model turns and delivered across different runtimes.

Current hypothesis:

• DeepSeek appears stable because its path is shorter: model output -> OpenClaw media handling -> Telegram delivery.
• GPT/Codex has extra layers such as Codex runtime, message tool delivery, local hook relay, and tagged media handling.
• MiniMax appears to have different fallback behavior after Telegram delivery failures.
• In tagged voice mode, the model/runtime appears to participate in producing MEDIA: / voice tags or otherwise managing the media delivery flow, making voice delivery behavior model-runtime-dependent.

Suggested fix:

Add or support a runtime-owned voice mode, for example:

voice.mode = post_model | auto_voice | force_voice
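As a rough illustration of how such a setting might be validated, the sketch below parses the proposed values and rejects anything else. The report does not define what each mode means, so the only semantic assumption made here is that every proposed mode is runtime-owned while the existing `tagged` mode is not; `VoiceMode`, `parse_voice_mode`, and `runtime_owns_voice` are hypothetical names.

```python
from enum import Enum


class VoiceMode(Enum):
    TAGGED = "tagged"            # current mode shown in the session status
    POST_MODEL = "post_model"    # proposed values from this report
    AUTO_VOICE = "auto_voice"
    FORCE_VOICE = "force_voice"


def parse_voice_mode(raw: str) -> VoiceMode:
    """Parse a voice.mode value, failing loudly on unknown input."""
    try:
        return VoiceMode(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(mode.value for mode in VoiceMode)
        raise ValueError(
            f"unknown voice.mode {raw!r}; expected one of: {allowed}"
        ) from None


def runtime_owns_voice(mode: VoiceMode) -> bool:
    # Assumption: every proposed mode moves TTS out of the model's hands.
    return mode is not VoiceMode.TAGGED
```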

Suggested behavior:

1. The model always returns plain text only.
2. OpenClaw checks channel/session policy, for example Telegram chat has voice_only=true.
3. OpenClaw runs tts-local-cli / local mlx-indextts after final model text is produced.
4. The generated audio filename is uniquely bound to the current inbound message or delivery id, for example voice_<message_id>.ogg.
5. OpenClaw sends the audio through one unified Telegram native voice/sendAudio queue, independent of model runtime.
6. On failure, OpenClaw retries or reports a clear delivery/TTS error for the current message.
7. OpenClaw must never reuse a previous voice file for a new message unless it is explicitly verified to belong to that message id.
8. Generic fallback like "No response generated. Please try again." should not replace a valid model response when the actual failure is Telegram delivery.
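The suggested behavior can be sketched end to end as a single function that runs after the model has produced its final text. Everything here is an assumption for illustration: `handle_final_text` is a hypothetical name, and `synthesize`, `send_voice`, and `send_text` are injected callables standing in for the local TTS invocation and the Telegram delivery path, whose real interfaces are not shown in this report.

```python
from pathlib import Path
from typing import Callable


def handle_final_text(
    message_id: int,
    text: str,
    voice_only: bool,
    out_dir: Path,
    synthesize: Callable[[str, Path], None],
    send_voice: Callable[[int, Path], None],
    send_text: Callable[[int, str], None],
) -> str:
    """Deliver plain model text as voice or text, independent of the runtime."""
    if not voice_only:
        send_text(message_id, text)
        return "text"
    out_path = out_dir / f"voice_{message_id}.ogg"
    # Remove any leftover file so a failed synthesis cannot resend old audio.
    out_path.unlink(missing_ok=True)
    synthesize(text, out_path)
    if not out_path.is_file() or out_path.stat().st_size == 0:
        raise RuntimeError(f"TTS produced no audio for message {message_id}")
    send_voice(message_id, out_path)
    return "voice"
```

Because the model only ever supplies `text`, the same function would serve every runtime, and a TTS failure surfaces as an error for the current message rather than as a resend of an earlier file.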

### Logs, screenshots, and evidence

```shell
Evidence from observed session behavior:


Runtime: OpenAI Codex
Voice: tagged · provider=tts-local-cli
Queue: collect

Observed stale media reuse:

voice_5323.ogg

The same old voice file was sent twice after a delayed GPT/Codex voice generation turn: once for the previous message and once for the next message.

Observed Codex runtime/tooling error:

Native hook relay unavailable

Observed MiniMax/Telegram delivery error evidence:

sendMessage failed
OutboundDeliveryError
send_attempt_started
No response generated. Please try again.

Local TTS context:

Voice provider: tts-local-cli
Local TTS backend: mlx-indextts / IndexTTS
Expected output: .ogg / Opus voice file
Delivery target: Telegram native voice/sendAudio
```

Impact and severity

Affected users/systems/channels: Telegram users relying on voice-only replies, especially when switching between GPT/Codex, MiniMax, and DeepSeek runtimes.

Severity: Blocks reliable Telegram voice workflow. The assistant can resend an old voice message for a new user message, or fail with a generic no-response fallback.

Frequency: Intermittent, but reproducible under observed cross-runtime Telegram voice testing. GPT/Codex failure was observed when one voice generation turn took about 76 seconds and another message arrived before completion. MiniMax failure was observed when Telegram delivery errors occurred.

Consequence:

• Wrong voice reply can be sent for the current message.
• Old/stale generated media can be reused.
• Valid model replies may be swallowed by delivery fallback.
• User sees "No response generated. Please try again." instead of the actual response.
• Voice-only Telegram workflows become unreliable across model runtimes.

Additional information

This is not a request for GPT/MiniMax to imitate DeepSeek.

The request is to move Telegram voice delivery out of model-generated tagged output and into a runtime-level post-model voice transform pipeline:

model final text -> OpenClaw voice transform -> tts-local-cli / local mlx-indextts -> unique .ogg / Opus file bound to inbound message_id / delivery_id -> Telegram native voice/sendAudio delivery

Regression information:

Last known good version: NOT_ENOUGH_INFO
First known bad version: OpenClaw 2026.5.26 (10ad3aa)
