openclaw - 💡(How to fix) Fix [Bug]: Telegram voice notes not auto-transcribed despite unified inbound handling fix (#20591) [1 comments, 2 participants]

openclaw · 2026-04-29 14:56:43

GitHub stats

openclaw/openclaw#74421 • Fetched 2026-04-30 06:24:05

Author: jasontoff

Participants: clawsweeper[bot], jasontoff

Timeline (top): labeled ×2, commented ×1

Problem

Telegram voice notes are not being automatically transcribed to text, even though:

• tools.media.audio is correctly configured with OpenAI Whisper
• The API key works (tested manually with curl)
• PR #20591 (merged Feb 19, 2026) unified Telegram message/channel_post handling to share one processing pipeline
• Running OpenClaw 2026.4.26

Expected behavior

When a voice note is sent to the bot via Telegram, OpenClaw should:

Receive the audio file (confirmed — file saves to ~/.openclaw/media/inbound/)
Call applyMediaUnderstanding to transcribe it
Include the transcript as machine-generated text in the agent context

Per Telegram docs: "inbound voice-note transcripts are framed as machine-generated, untrusted text in the agent context"

Actual behavior

Voice notes arrive but are NOT transcribed. The agent receives only the audio attachment, not the transcript text.

Config (verified working)

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          { "provider": "openai", "model": "whisper-1" }
        ]
      }
    }
  },
  "channels": {
    "telegram": {
      "enabled": true,
      "stt": { "provider": "openai", "model": "whisper-1" }
    }
  }
}
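To double-check that the config really contains both settings at the paths described, a small script along these lines can help. This is only a sketch: the helper name audio_transcription_settings is hypothetical and not part of OpenClaw, and it only inspects the JSON shape quoted in this issue. It does not tell you which of the two settings OpenClaw actually reads.

```python
import json

# The config exactly as quoted in the issue.
CONFIG_TEXT = """
{
  "tools": {"media": {"audio": {"enabled": true,
    "models": [{"provider": "openai", "model": "whisper-1"}]}}},
  "channels": {"telegram": {"enabled": true,
    "stt": {"provider": "openai", "model": "whisper-1"}}}
}
"""


def audio_transcription_settings(config: dict) -> dict:
    """Hypothetical helper: summarize the two transcription settings from the issue."""
    audio = config.get("tools", {}).get("media", {}).get("audio", {})
    telegram = config.get("channels", {}).get("telegram", {})
    stt = telegram.get("stt")
    return {
        "global_audio_enabled": audio.get("enabled") is True,
        "global_audio_models": [
            (m.get("provider"), m.get("model")) for m in audio.get("models", [])
        ],
        "telegram_enabled": telegram.get("enabled") is True,
        "telegram_stt": (stt.get("provider"), stt.get("model")) if stt else None,
    }


config = json.loads(CONFIG_TEXT)  # raises json.JSONDecodeError if the JSON is invalid
summary = audio_transcription_settings(config)
print(summary)
```

If both entries show up as expected, the problem is unlikely to be a typo in the config paths, which is consistent with the diagnostics in this report.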

(Tried both global tools.media.audio and channel-level channels.telegram.stt; neither works.)

Diagnostics

• OpenAI Whisper API: ✅ Works (tested manually: curl ... /v1/audio/transcriptions)
• Config JSON: ✅ Valid
• Gateway logs: ❌ No "applyMediaUnderstanding", "stt", or "transcription" messages (handler not being called)
• Manual transcription workaround: ✅ Works with ffmpeg + curl
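The manual ffmpeg + curl workaround could look roughly like the sketch below. The issue does not give the exact commands used, so everything here is an assumption: the ffmpeg flags (convert to mono 16 kHz WAV), the helper names, and the example file name. The curl call follows the publicly documented multipart form of OpenAI's /v1/audio/transcriptions endpoint.

```python
import json
import subprocess
from pathlib import Path

TRANSCRIPTION_URL = "https://api.openai.com/v1/audio/transcriptions"


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg call that converts a voice note to mono 16 kHz audio."""
    # -y overwrites dst if it exists; -ac 1 downmixes to mono; -ar sets the sample rate.
    return ["ffmpeg", "-y", "-i", str(src), "-ac", "1", "-ar", "16000", str(dst)]


def curl_command(audio: Path, api_key: str, model: str = "whisper-1") -> list[str]:
    """Build a curl call that uploads the audio file as multipart form data."""
    return [
        "curl", "-sS", TRANSCRIPTION_URL,
        "-H", f"Authorization: Bearer {api_key}",
        "-F", f"file=@{audio}",
        "-F", f"model={model}",
    ]


def transcribe(src: Path, api_key: str) -> str:
    """Convert then transcribe; requires ffmpeg and curl on PATH plus network access."""
    wav = src.with_suffix(".wav")
    subprocess.run(ffmpeg_command(src, wav), check=True)
    result = subprocess.run(
        curl_command(wav, api_key), check=True, capture_output=True, text=True
    )
    # The default JSON response carries the transcript in the "text" field.
    return json.loads(result.stdout)["text"]
```

A call would then be something like transcribe(Path.home() / ".openclaw/media/inbound/voice.oga", api_key), where voice.oga is a placeholder file name. Running the transcription outside OpenClaw this way only confirms that the audio file and the API key are fine; it says nothing about why the gateway never invokes transcription itself.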

• PR #20591 fixes #19062
• Similar issues: Reddit thread on STT not triggering
• Possibly incomplete fix: PR #20591 may not have fully enabled applyMediaUnderstanding for all Telegram inbound paths

Bug type

Regression (worked before, now fails)

Beta release blocker

Steps to reproduce

1. Configure tools.media.audio with OpenAI Whisper
2. Send a voice note to a Telegram bot running OpenClaw 2026.4.26
3. Observe: no transcript in the agent's message context
4. Check logs: no audio transcription events
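While reproducing, it can help to confirm that the voice note actually reached disk before looking at transcription. The sketch below finds the most recently modified file in the inbound media directory mentioned in this report. The helper name newest_file is hypothetical, and the naming and format of files inside that directory are not described in the issue, so the script makes no assumptions about them.

```python
from pathlib import Path
from typing import Optional

# Directory taken from the issue text; adjust if your install differs.
INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def newest_file(directory: Path) -> Optional[Path]:
    """Return the most recently modified regular file, or None if there is none."""
    if not directory.is_dir():
        return None
    files = [p for p in directory.iterdir() if p.is_file()]
    return max(files, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    latest = newest_file(INBOUND_DIR)
    print(latest if latest else f"no files found in {INBOUND_DIR}")
```

If a fresh file appears right after step 2 while the logs in step 4 stay silent, the download path works and the gap lies between saving the file and invoking transcription.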

Expected behavior

Audio should get transcribed

Actual behavior

Audio is not transcribed

OpenClaw version

2026.4.26

Operating system

macOS (arm64)

Install method

No response

Model

haiku

Provider / routing chain

openclaw

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

The most likely fix is to review and adjust the implementation of PR #20591 to ensure applyMediaUnderstanding is correctly called for all Telegram inbound voice notes.

Guidance

Verify that the applyMediaUnderstanding function is being called for Telegram voice notes by checking the gateway logs for any related errors or warnings.
Review the code changes made in PR #20591 to ensure that the unified message/channel_post handling pipeline correctly triggers audio transcription for voice notes.
Test the transcription functionality with different types of audio files and Telegram configurations to isolate the issue.
Consider adding debug logging to the applyMediaUnderstanding function to confirm whether it is being called and to identify any potential issues.
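The log check suggested in the guidance above can be sketched as a small filter over gateway log lines. The three keywords come from the Diagnostics section of this issue. The location and format of OpenClaw's gateway log are not stated here, so the script takes the log path as an argument rather than guessing it, and the function name is hypothetical.

```python
import re
import sys
from typing import Iterable

# Keywords from the issue; "stt" is matched as a whole word to avoid hits
# inside unrelated identifiers.
PATTERN = re.compile(r"applyMediaUnderstanding|\bstt\b|transcription", re.IGNORECASE)


def matching_lines(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line_number, text) for every line mentioning a transcription keyword."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if PATTERN.search(line)
    ]


if __name__ == "__main__":
    # Usage: python scan_log.py /path/to/gateway.log  (the path is yours to supply)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        hits = matching_lines(handle)
    for number, text in hits:
        print(f"{number}: {text}")
    if not hits:
        print("no transcription-related log lines found")
```

An empty result after sending a voice note would match what the reporter observed, namely that the handler is apparently never reached.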

Example

No code snippet is provided due to the lack of specific implementation details in the issue.

Notes

The issue may be related to an incomplete fix in PR #20591, and resolving this will likely require reviewing and adjusting the code changes made in that pull request.

Recommendation

Review and adjust the implementation of PR #20591 so that audio transcription is triggered for Telegram voice notes; the configuration and setup appear to be correct, but transcription is never invoked. Until that is fixed, the manual ffmpeg + curl transcription listed under Diagnostics works as a workaround.

#api #request error #file not found #serialization error #model compatibility
