openclaw - 💡(How to fix) Fix [Bug]: Telegram auto TTS not triggered when ASR transcript replaces <media:audio> — inboundAudio detection fails (2026.4.15) [1 participants]

openclaw 2026-04-21 09:05:24


GitHub stats

openclaw/openclaw#69653 • Fetched 2026-04-22 07:49:41


Author: Bowerai

Participants: Bowerai



Description

Auto TTS (messages.tts.auto: "inbound") is not triggered on Telegram when ASR transcription is enabled. The agent receives the transcribed text correctly, but the TTS pipeline never fires.

This appears to be the same root cause as #65951, but I can confirm it is still present in version 2026.4.15.
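For context, the auto: "inbound" behaviour described above acts as a gate on the reply: speak the reply only when the user's message arrived as audio. The following is a minimal sketch under stated assumptions. The shouldAutoTts() helper and the boolean inboundAudio argument are hypothetical. Only the config keys (enabled, auto, mode, provider) and the mode name "inbound" come from this report. Other auto modes are deliberately not modeled.

```javascript
// Hypothetical sketch of the auto: "inbound" gate. Only the mode name
// "inbound" and the config keys come from the report; the helper is illustrative.
function shouldAutoTts(ttsConfig, inboundAudio) {
  if (!ttsConfig || ttsConfig.enabled !== true) return false;
  // "inbound": speak the reply only when the user's message arrived as audio
  if (ttsConfig.auto === 'inbound') return inboundAudio === true;
  // Other auto modes are out of scope for this sketch
  return false;
}

const tts = { enabled: true, auto: 'inbound', mode: 'final', provider: 'minimax' };

console.log(shouldAutoTts(tts, true));  // true: voice message in, voice reply out
console.log(shouldAutoTts(tts, false)); // false: the reported symptom when the flag is lost
```

If inboundAudio is computed as false after ASR, the gate stays closed. That matches the reported zero TTS trigger events even though the config is valid.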

Steps to Reproduce

1. Configure TTS:

{
  "messages": {
    "tts": {
      "enabled": true,
      "auto": "inbound",
      "mode": "final",
      "provider": "minimax",
      "providers": {
        "minimax": {
          "baseUrl": "https://api.minimaxi.com",
          "model": "speech-2.8-hd",
          "voiceId": "Chinese (Mandarin)_Warm_Girl",
          "apiKey": "sk-xxx"
        }
      }
    }
  }
}

2. Configure ASR (whisper-transcribe.sh works correctly).
3. Send a voice message on Telegram.
4. ASR transcribes the message successfully, and the agent replies with text.

Expected: The agent's text reply is automatically converted to TTS audio.
Actual: No TTS generation event occurs. Gateway logs show zero TTS trigger events.

Environment

OpenClaw version: 2026.4.15 (041266a)
Channel: Telegram (polling surface)
ASR: whisper-transcribe.sh (working)
TTS provider: MiniMax (API verified working manually)
Runtime: macOS Darwin 25.4.0 (arm64), Node v25.8.1

Additional Context

- Manual TTS generation via the MiniMax API plus the message tool with asVoice: true works correctly.
- The auto: "inbound" mode is recognized (validated against the TTS_AUTO_MODES set).
- Gateway dynamic reload confirms that messages.tts.enabled and messages.tts.mode are applied.
- Gateway logs contain zero TTS generation events for the Telegram session.
- ASR replaces the <media:audio> placeholder with the transcript text, after which the inboundAudio context flag becomes false.
- This matches the diagnosis in #65951: isInboundAudioContext() only checks MediaType/MediaTypes from the original media metadata, but after transcript replacement these fields may be lost.
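The diagnosis above can be reproduced in isolation. This is a hypothetical sketch, not OpenClaw code:

- naiveIsInboundAudio() mirrors the reported MIME-only check.
- replaceWithTranscript() is an assumed ASR step that rebuilds the context without the media fields.
- The context shape (Body, MediaType, MediaTypes) is modeled on the field names mentioned in the report.

```javascript
// Hypothetical reproduction of the diagnosis in #65951. The context shape
// (Body, MediaType, MediaTypes) is an assumption based on the report's field names.
function naiveIsInboundAudio(ctx) {
  // Mirrors the reported behaviour: only MIME metadata is consulted
  const types = [ctx.MediaType, ...(ctx.MediaTypes ?? [])];
  return types.some((t) => typeof t === 'string' && t.startsWith('audio'));
}

// Hypothetical ASR step that rebuilds the context around the transcript
// and, as the report suspects, drops the original media fields.
function replaceWithTranscript(ctx, transcript) {
  return { Body: ctx.Body.replace('<media:audio>', transcript) };
}

const inbound = { Body: '<media:audio>', MediaType: 'audio/ogg', MediaTypes: ['audio/ogg'] };
const afterAsr = replaceWithTranscript(inbound, 'What is the weather today?');

console.log(naiveIsInboundAudio(inbound));  // true
console.log(naiveIsInboundAudio(afterAsr)); // false: auto TTS never fires
console.log(afterAsr.Body);                 // What is the weather today?
```

The agent still receives the transcript text correctly, but the audio signal is gone. This is the combination the report observes.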

Related Issues

- #65951: original report of this bug (2026.4.11)
- #66553: race condition between async STT and the agent turn (may be related)

Suggested Fix

Harden isInboundAudioContext() to detect audio context from:

- media paths/URLs (not just MIME type placeholders)
- the presence of a transcript that replaced <media:audio>
- any media:audio markers in the message metadata

This way, even after ASR replaces the audio placeholder with text, the system still knows the inbound message was originally audio and should trigger TTS on the reply.
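Besides widening detection, the same goal can be approached where ASR rewrites the message: keep the original media fields and record the transcript explicitly. This is a minimal sketch under assumptions. The names applyTranscript(), Transcript and the context fields are hypothetical. The sketch does not reflect how OpenClaw actually stores transcripts.

```javascript
// Hypothetical sketch: carry the audio origin through transcript replacement
// so detection no longer depends on MIME fields surviving. Field names
// (Body, MediaType, MediaTypes, Transcript) are illustrative assumptions.
const AUDIO_PLACEHOLDER = '<media:audio>';

function applyTranscript(ctx, transcript) {
  return {
    ...ctx, // keep MediaType/MediaTypes and any media paths intact
    Body: ctx.Body.replace(AUDIO_PLACEHOLDER, transcript),
    Transcript: transcript, // record that ASR ran on an audio placeholder
  };
}

function isInboundAudioContext(ctx) {
  const types = [ctx.MediaType, ...(ctx.MediaTypes ?? [])];
  if (types.some((t) => typeof t === 'string' && t.startsWith('audio'))) return true;
  if (typeof ctx.Transcript === 'string' && ctx.Transcript.length > 0) return true;
  // Before ASR runs, the placeholder itself marks the message as audio
  return typeof ctx.Body === 'string' && ctx.Body.includes(AUDIO_PLACEHOLDER);
}

// Even if a later step strips the MIME fields, the transcript still marks the turn as audio
const afterAsr = applyTranscript({ Body: AUDIO_PLACEHOLDER, MediaType: 'audio/ogg' }, 'Hello');
const stripped = { Body: afterAsr.Body, Transcript: afterAsr.Transcript };
console.log(isInboundAudioContext(afterAsr)); // true
console.log(isInboundAudioContext(stripped)); // true
```

Recording the transcript at replacement time makes the check independent of whether later steps preserve MediaType/MediaTypes.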

extent analysis

TL;DR

The issue can likely be resolved by hardening isInboundAudioContext() so that it also detects audio context from:

- media paths/URLs
- the presence of a transcript that replaced <media:audio>
- media:audio markers in the message metadata

Guidance

1. Review the isInboundAudioContext() function to understand its current implementation and how it checks for audio context.
2. Modify the function to include checks for media paths/URLs, the presence of a transcript that replaced <media:audio>, and media:audio markers in the message metadata.
3. Verify that the updated function correctly identifies audio context after ASR transcription.
4. Test the changes with the provided Telegram configuration and ASR setup to ensure TTS is triggered correctly.

Example

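// Sketch only: the field names used here (MediaType, MediaTypes, media.path,
// transcript, metadata['media:audio']) are assumptions about the inbound
// context shape, not OpenClaw's confirmed API.
const AUDIO_EXTENSIONS = ['.ogg', '.oga', '.opus', '.mp3', '.m4a', '.wav'];

function isAudioMime(type) {
  return typeof type === 'string' && type.toLowerCase().startsWith('audio');
}

function isInboundAudioContext(message) {
  // Existing checks for MediaType/MediaTypes from the original media metadata
  if (isAudioMime(message.MediaType)) return true;
  if (Array.isArray(message.MediaTypes) && message.MediaTypes.some(isAudioMime)) {
    return true;
  }

  // Media paths/URLs: drop any query string or fragment, then check the extension.
  // Telegram voice notes are typically OGG/Opus, so .ogg/.oga must be covered too.
  const path = message.media?.path;
  if (typeof path === 'string') {
    const clean = path.split(/[?#]/)[0].toLowerCase();
    if (AUDIO_EXTENSIONS.some((ext) => clean.endsWith(ext))) {
      return true;
    }
  }

  // Once ASR has replaced <media:audio>, the placeholder is gone from the text,
  // so a non-empty transcript is itself the audio signal.
  if (typeof message.transcript === 'string' && message.transcript.length > 0) {
    return true;
  }

  // Explicit media:audio marker in the message metadata
  if (message.metadata?.['media:audio']) {
    return true;
  }

  return false;
}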

Notes

The suggested fix assumes that isInboundAudioContext() is the root cause of the issue. However, the related issue #66553 mentions a potential race condition between async STT and the agent turn, which may also need to be addressed.
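To see why #66553 may matter even with a hardened check, consider a detector that relies on the transcript. None of the names in this hypothetical sketch come from OpenClaw:

- transcribe() is a stand-in for async ASR with an artificial delay.
- racyTurn() reads the context before STT finishes.
- orderedTurn() awaits STT first.

```javascript
// Hypothetical sketch of the race suspected in #66553.
function transcribe(ctx) {
  // Stand-in for async ASR; resolves with a new context carrying the transcript
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve({ ...ctx, Body: 'Hello', Transcript: 'Hello' });
    }, 10);
  });
}

// Simplified detector that relies on the transcript, as a hardened check would
function hasTranscript(ctx) {
  return typeof ctx.Transcript === 'string' && ctx.Transcript.length > 0;
}

async function racyTurn(ctx) {
  const pending = transcribe(ctx);     // STT started but not awaited yet
  const decision = hasTranscript(ctx); // agent turn reads the stale context
  await pending;
  return decision;
}

async function orderedTurn(ctx) {
  const transcribed = await transcribe(ctx); // wait for STT before the turn
  return hasTranscript(transcribed);
}

async function main() {
  const inbound = { Body: '<media:audio>' };
  console.log(await racyTurn(inbound));    // false
  console.log(await orderedTurn(inbound)); // true
}

main();
```

If the real pipeline behaves like racyTurn(), detection changes alone would not be enough. The TTS decision would also need to run after transcription completes.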

Recommendation

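Apply the suggested hardening to isInboundAudioContext(), since it targets the diagnosed failure: the inbound audio signal is lost once the transcript replaces <media:audio>. If auto TTS still does not fire afterwards, check whether the async STT race described in #66553 is involved.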


