openclaw - 💡(How to fix) Fix [Bug]: Telegram voice notes not auto-transcribed despite unified inbound handling fix (#20591) [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#74421 · Fetched 2026-04-30 06:24:05
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 2
Timeline (top): labeled ×2, commented ×1

Problem

Telegram voice notes are not being automatically transcribed to text, even though:

  • tools.media.audio is correctly configured with OpenAI Whisper
  • The API key works (tested manually with curl)
  • PR #20591 (merged Feb 19, 2026) unified Telegram message/channel_post handling to share one processing pipeline
  • Running OpenClaw 2026.4.26

Expected behavior

When a voice note is sent to the bot via Telegram, OpenClaw should:

  1. Receive the audio file (confirmed — file saves to ~/.openclaw/media/inbound/)
  2. Call applyMediaUnderstanding to transcribe it
  3. Include the transcript as machine-generated text in the agent context

Per Telegram docs: "inbound voice-note transcripts are framed as machine-generated, untrusted text in the agent context"
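
Step 1 above (the file landing in ~/.openclaw/media/inbound/) can be re-checked with a short script before digging into the transcription step. This is a minimal sketch: the directory is the one quoted in the issue, while the helper name newest_files is made up for illustration and is not part of OpenClaw.

```python
from datetime import datetime
from pathlib import Path

# Directory quoted in the issue; adjust if your install stores media elsewhere.
INBOUND_DIR = Path.home() / ".openclaw" / "media" / "inbound"


def newest_files(directory: Path, limit: int = 5) -> list[Path]:
    """Return up to `limit` regular files in `directory`, newest first."""
    if not directory.is_dir():
        return []
    files = [p for p in directory.iterdir() if p.is_file()]
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


if __name__ == "__main__":
    # Print modification time, size, and name of the most recent inbound files.
    for path in newest_files(INBOUND_DIR):
        info = path.stat()
        stamp = datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds")
        print(f"{stamp}  {info.st_size:>9} bytes  {path.name}")
```

If a new file appears right after sending a voice note, the download path works and the problem sits between the saved file and the transcription call.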

Actual behavior

Voice notes arrive but are NOT transcribed. The agent receives only the audio attachment, not the transcript text.

Config (verified working)

    {
      "tools": {
        "media": {
          "audio": {
            "enabled": true,
            "models": [
              { "provider": "openai", "model": "whisper-1" }
            ]
          }
        }
      },
      "channels": {
        "telegram": {
          "enabled": true,
          "stt": { "provider": "openai", "model": "whisper-1" }
        }
      }
    }

Tried both the global tools.media.audio setting and the channel-level channels.telegram.stt setting; neither works.
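
To rule out a typo or a misplaced key, the config can be parsed and the relevant settings printed. The sketch below only confirms that the JSON is valid and that the keys quoted in the issue are present; it does not show whether OpenClaw actually reads them (in particular, the issue does not confirm that channels.telegram.stt is a supported key). The function name audio_settings is hypothetical.

```python
import json


def audio_settings(config: dict) -> dict:
    """Collect the audio/STT settings quoted in the issue from a parsed config."""
    audio = config.get("tools", {}).get("media", {}).get("audio", {})
    telegram = config.get("channels", {}).get("telegram", {})
    return {
        "tools.media.audio.enabled": audio.get("enabled"),
        "tools.media.audio.models": audio.get("models"),
        "channels.telegram.enabled": telegram.get("enabled"),
        "channels.telegram.stt": telegram.get("stt"),
    }


# The config from the issue, used here as sample input.
SAMPLE = """
{
  "tools": {"media": {"audio": {"enabled": true,
    "models": [{"provider": "openai", "model": "whisper-1"}]}}},
  "channels": {"telegram": {"enabled": true,
    "stt": {"provider": "openai", "model": "whisper-1"}}}
}
"""

if __name__ == "__main__":
    # json.loads raises json.JSONDecodeError if the text is not valid JSON.
    for key, value in audio_settings(json.loads(SAMPLE)).items():
        print(f"{key}: {value!r}")
```

A value of None for any key means that key is missing or nested under the wrong parent.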

Diagnostics

  • OpenAI Whisper API: ✅ Works (tested manually: curl ... /v1/audio/transcriptions)
  • Config JSON: ✅ Valid
  • Gateway logs: ❌ No "applyMediaUnderstanding", "stt", or "transcription" messages; the handler is not being called
  • Manual transcription workaround: ✅ Works with ffmpeg + curl

Related

  • Fixes #19062 (PR #20591)
  • Similar issues: Reddit thread on STT not triggering
  • Possibly incomplete fix: PR #20591 may not have fully enabled applyMediaUnderstanding for all Telegram inbound paths

Fix Action

Fix / Workaround

No fix is confirmed in the issue. The only workaround the reporter found to work is manual transcription with ffmpeg + curl, outside OpenClaw. For reference, the OpenAI Whisper API and the config JSON both check out, while the gateway logs contain no "applyMediaUnderstanding", "stt", or "transcription" messages, so the handler is not being called.
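
The manual ffmpeg + curl workaround mentioned above can be scripted. The issue does not give the exact commands, so the following is a sketch under assumptions: the voice note is re-encoded to MP3 first, the OpenAI endpoint is the documented /v1/audio/transcriptions with model whisper-1, and the helper names are invented for illustration. Both ffmpeg and curl must be on PATH.

```python
import json
import subprocess
from pathlib import Path

OPENAI_TRANSCRIBE_URL = "https://api.openai.com/v1/audio/transcriptions"


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg call that re-encodes `src` into the format implied by `dst`."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def curl_command(audio: Path, api_key: str, model: str = "whisper-1") -> list[str]:
    """Build a curl call that uploads `audio` as multipart form data."""
    return [
        "curl", "--silent", "--show-error", "--fail",
        OPENAI_TRANSCRIBE_URL,
        "-H", f"Authorization: Bearer {api_key}",
        "-F", f"file=@{audio}",
        "-F", f"model={model}",
    ]


def transcribe(src: Path, api_key: str) -> str:
    """Convert `src` to MP3, upload it, and return the transcript text."""
    dst = src.with_name(src.stem + ".converted.mp3")  # never overwrite the input
    subprocess.run(ffmpeg_command(src, dst), check=True, capture_output=True)
    result = subprocess.run(
        curl_command(dst, api_key), check=True, capture_output=True, text=True
    )
    # The default JSON response carries the transcript in its "text" field.
    return json.loads(result.stdout)["text"]
```

Call it as transcribe(Path("voice.oga"), os.environ["OPENAI_API_KEY"]) after importing os, where voice.oga stands in for a file from the inbound directory. Note that passing the key on the curl command line makes it visible in the process list, so this is suitable for a quick local check only.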


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Steps to reproduce

  1. Configure tools.media.audio with OpenAI Whisper
  2. Send a voice note to a Telegram bot running OpenClaw 2026.4.26
  3. Observe: no transcript in the agent's message context
  4. Check logs: no audio transcription events

Expected behavior

Audio should get transcribed

Actual behavior

Audio is not transcribed

OpenClaw version

2026.4.26

Operating system

macOS (arm64)

Install method

No response

Model

haiku

Provider / routing chain

openclaw

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

The most likely fix is to review and adjust the implementation of PR #20591 to ensure applyMediaUnderstanding is correctly called for all Telegram inbound voice notes.

Guidance

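  • Confirm whether applyMediaUnderstanding is called for Telegram voice notes. The reporter's gateway logs contain no "applyMediaUnderstanding", "stt", or "transcription" messages, which suggests the handler is never reached, so also look for errors or warnings earlier in the inbound path.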
  • Review the code changes made in PR #20591 to ensure that the unified message/channel_post handling pipeline correctly triggers audio transcription for voice notes.
  • Test the transcription functionality with different types of audio files and Telegram configurations to isolate the issue.
  • Consider adding debug logging to the applyMediaUnderstanding function to confirm whether it is being called and to identify any potential issues.
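
The log check in the guidance above can be automated with a small scanner for the three keywords named in the issue. This is a sketch, not an OpenClaw tool: the issue does not say where the gateway log lives, so the path is left to the caller, and the function names are hypothetical. "stt" is matched as a whole word to avoid hits inside unrelated identifiers.

```python
import re
from pathlib import Path

# Keywords the reporter searched for; matching is case-insensitive.
PATTERN = re.compile(r"applyMediaUnderstanding|\bstt\b|transcription", re.IGNORECASE)


def matching_lines(lines):
    """Yield (line_number, text) for every line that mentions a keyword."""
    for number, line in enumerate(lines, start=1):
        if PATTERN.search(line):
            yield number, line.rstrip("\n")


def scan_log(path: Path) -> list[tuple[int, str]]:
    """Scan a log file and return all matching lines with their line numbers."""
    with path.open(encoding="utf-8", errors="replace") as handle:
        return list(matching_lines(handle))
```

Running scan_log on the gateway log right after sending a voice note and getting an empty list back is consistent with the reporter's finding that the transcription handler is never invoked.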

Example

No code snippet is provided due to the lack of specific implementation details in the issue.

Notes

The issue may be related to an incomplete fix in PR #20591, and resolving this will likely require reviewing and adjusting the code changes made in that pull request.

Recommendation

Apply workaround: Review and adjust the implementation of PR #20591 to ensure correct audio transcription handling for Telegram voice notes, as the current configuration and setup seem to be correct but the transcription is not being triggered.
