openclaw [Bug]: Telegram auto TTS not triggered when ASR transcript replaces <media:audio> (inboundAudio detection fails, 2026.4.15)

openclaw/openclaw#69653 (fetched 2026-04-22 07:49:41)

Description

Auto TTS (messages.tts.auto: "inbound") is not triggered on Telegram when ASR transcription is enabled. The agent receives the transcribed text correctly, but the TTS pipeline never fires.

This appears to be the same root cause as #65951, but I can confirm it is still present in version 2026.4.15.

Steps to Reproduce

  1. Configure TTS:

    {
      "messages": {
        "tts": {
          "enabled": true,
          "auto": "inbound",
          "mode": "final",
          "provider": "minimax",
          "providers": {
            "minimax": {
              "baseUrl": "https://api.minimaxi.com",
              "model": "speech-2.8-hd",
              "voiceId": "Chinese (Mandarin)_Warm_Girl",
              "apiKey": "sk-xxx"
            }
          }
        }
      }
    }
  2. Configure ASR (whisper-transcribe.sh works correctly)

  3. Send a voice message on Telegram

  4. ASR transcribes successfully → agent replies with text

  5. Expected: Agent text reply is automatically converted to TTS audio

  6. Actual: No TTS generation event occurs. Gateway logs show zero TTS trigger events.

Environment

  • OpenClaw version: 2026.4.15 (041266a)
  • Channel: Telegram (polling surface)
  • ASR: whisper-transcribe.sh (working)
  • TTS provider: MiniMax (API verified working manually)
  • Runtime: macOS Darwin 25.4.0 (arm64), Node v25.8.1

Additional Context

  • Manual TTS generation via MiniMax API + message tool with asVoice: true works correctly
  • The auto: "inbound" mode is recognized (validated against TTS_AUTO_MODES set)
  • Gateway dynamic reload confirms messages.tts.enabled and messages.tts.mode are applied
  • Gateway logs contain zero TTS generation events for the Telegram session
  • ASR replaces <media:audio> placeholder with transcript text, after which inboundAudio context flag becomes false
  • This matches the diagnosis in #65951: isInboundAudioContext() only checks MediaType/MediaTypes from the original media metadata, but after transcript replacement these fields may be lost
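
The failure mode in the last two bullets can be reproduced in isolation. The sketch below is a minimal model, not OpenClaw's code: the context shape (`Body`, `MediaType`, `MediaTypes`), the helper `applyTranscript`, and the simplified detector `isInboundAudioContextNaive` are assumptions for illustration, based only on the field names mentioned in this issue and in #65951.

```javascript
// Hypothetical inbound context, modeled on the field names cited in the issue.
// The real OpenClaw structures may differ.
function makeVoiceContext() {
  return {
    Body: '<media:audio>',
    MediaType: 'audio/ogg',
    MediaTypes: ['audio/ogg'],
  };
}

// Assumed stand-in for the current detector: it looks only at MIME metadata.
function isInboundAudioContextNaive(ctx) {
  const types = [ctx.MediaType, ...(ctx.MediaTypes ?? [])];
  return types.some((t) => typeof t === 'string' && t.startsWith('audio/'));
}

// Assumed stand-in for the ASR step: the transcript replaces the placeholder
// and the media metadata is not carried over to the new context.
function applyTranscript(ctx, transcript) {
  return { Body: ctx.Body.replace('<media:audio>', () => transcript) };
}

const before = makeVoiceContext();
const after = applyTranscript(before, 'turn on the lights');

console.log(isInboundAudioContextNaive(before)); // true
console.log(isInboundAudioContextNaive(after));  // false
```

If the real pipeline behaves like this model, the reply step sees an ordinary text message and the "inbound" auto mode has nothing to react to.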

Related Issues

  • #65951 — original report of this bug (2026.4.11)
  • #66553 — race condition between async STT and agent turn (may be related)

Suggested Fix

Harden isInboundAudioContext() to detect audio context from:

  • Media paths/URLs (not just MIME type placeholders)
  • The presence of a transcript that replaced <media:audio>
  • Any media:audio markers in the message metadata

This way, even after ASR replaces the audio placeholder with text, the system still knows the inbound message was originally audio and should trigger TTS on the reply.
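
One possible way to realize this, sketched under assumptions, is to record the audio origin at the moment the transcript is substituted, so later checks do not depend on the MIME fields surviving. The field name `wasInboundAudio` and both helpers are hypothetical and do not come from the OpenClaw source.

```javascript
const AUDIO_PLACEHOLDER = '<media:audio>';

// Hypothetical ASR hook: replace the placeholder and remember the origin.
function replaceAudioWithTranscript(ctx, transcript) {
  const hadAudio =
    typeof ctx.Body === 'string' && ctx.Body.includes(AUDIO_PLACEHOLDER);
  return {
    ...ctx,
    // A replacer function avoids "$&"-style patterns in the transcript
    // being interpreted by String.prototype.replace.
    Body: hadAudio ? ctx.Body.replace(AUDIO_PLACEHOLDER, () => transcript) : ctx.Body,
    // Sticky flag (assumed name): once true it stays true.
    wasInboundAudio: Boolean(ctx.wasInboundAudio) || hadAudio,
  };
}

// Hypothetical detector: trusts the sticky flag first, then MIME metadata.
function isInboundAudio(ctx) {
  if (ctx.wasInboundAudio) return true;
  const types = [ctx.MediaType, ...(ctx.MediaTypes ?? [])];
  return types.some((t) => typeof t === 'string' && t.startsWith('audio/'));
}

const voice = replaceAudioWithTranscript({ Body: '<media:audio>' }, 'hello');
const text = replaceAudioWithTranscript({ Body: 'hello' }, 'unused');

console.log(voice.Body, isInboundAudio(voice)); // hello true
console.log(text.Body, isInboundAudio(text));   // hello false
```

Setting the flag where the placeholder is consumed keeps the decision independent of which metadata fields happen to be preserved downstream.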

Extent Analysis

TL;DR

The issue can likely be resolved by extending isInboundAudioContext() so that it detects audio context from media paths/URLs, from the presence of a transcript that replaced <media:audio>, and from media:audio markers in the message metadata.

Guidance

  • Review the isInboundAudioContext() function to understand its current implementation and how it checks for audio context.
  • Modify the function to include checks for media paths/URLs, the presence of a transcript that replaced <media:audio>, and media:audio markers in the message metadata.
  • Verify that the updated function correctly identifies audio context after ASR transcription.
  • Test the changes with the provided Telegram configuration and ASR setup to ensure TTS is triggered correctly.
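
The verification step can be sketched as a small table-driven check. Everything here is illustrative: the message shapes, `runCases`, and the stand-in detector `detect` are assumptions. In practice the real isInboundAudioContext() would be passed in instead of `detect`, with cases adjusted to the actual context object.

```javascript
// Hypothetical message shapes; adjust to the real context object.
const cases = [
  { name: 'voice note, MIME intact', msg: { MediaType: 'audio/ogg' }, want: true },
  { name: 'voice note after ASR', msg: { transcript: 'hello', media: { path: '/tmp/v.oga' } }, want: true },
  { name: 'plain text message', msg: { text: 'hello' }, want: false },
  { name: 'photo', msg: { MediaType: 'image/jpeg', media: { path: '/tmp/p.jpg' } }, want: false },
];

// Returns the names of the cases where the detector disagrees with `want`.
function runCases(detector, table) {
  return table.filter((c) => detector(c.msg) !== c.want).map((c) => c.name);
}

// Stand-in detector for demonstration only.
function detect(msg) {
  if (typeof msg.MediaType === 'string' && msg.MediaType.startsWith('audio/')) return true;
  const path = (msg.media?.path ?? '').toLowerCase();
  return ['.ogg', '.oga', '.opus', '.mp3', '.m4a', '.wav'].some((ext) => path.endsWith(ext));
}

const failures = runCases(detect, cases);
console.log(failures.length === 0 ? 'all cases pass' : failures);
```

Including negative cases (plain text, photo) guards against a fix that makes every message look like audio.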

Example

// Sketch only: the message shape (media.path, transcript, metadata) is assumed,
// not taken from the OpenClaw source.
const AUDIO_EXTENSIONS = ['.ogg', '.oga', '.opus', '.mp3', '.m4a', '.wav'];

function isInboundAudioContext(message) {
  // Existing checks for MediaType/MediaTypes
  // ...

  // Additional check: media path or URL with an audio extension.
  // Telegram voice notes are OGG/Opus, so .wav/.mp3 alone would miss them.
  const path = (message.media?.path ?? '').toLowerCase();
  if (AUDIO_EXTENSIONS.some((ext) => path.endsWith(ext))) {
    return true;
  }

  // A transcript that replaced <media:audio>: after replacement the text no
  // longer contains the placeholder, so test for a non-empty transcript instead.
  if (typeof message.transcript === 'string' && message.transcript.trim() !== '') {
    return true;
  }

  // media:audio marker in the message metadata
  if (message.metadata?.['media:audio']) {
    return true;
  }

  return false;
}

Notes

The suggested fix assumes that the isInboundAudioContext() function is the root cause of the issue. However, the related issue #66553 mentions a potential race condition between async STT and agent turn, which may also need to be addressed.

Recommendation

Apply the suggested fix to the isInboundAudioContext() function first, since it targets the diagnosed cause directly, then confirm whether the race condition described in #66553 still prevents TTS from firing.
