openclaw - 💡 (How to fix) Telegram outbound audio replies lack a way to force sendAudio/sendVoice from normal assistant MEDIA replies [1 participant]

openclaw, 2026-03-24 12:36:29


openclaw/openclaw#53673 (fetched 2026-04-08 01:25:00)

Author: 5toCode

OpenClaw clearly supports Telegram audio sending internally, including sendVoice, asVoice, audioAsVoice, and the [[audio_as_voice]] reply tag. However, from a normal assistant reply path using MEDIA: / generic media reply payloads, there does not appear to be a reliable way to force Telegram to send media as sendAudio or sendVoice.

In practice, this means an assistant can generate an MP3 successfully, but the normal reply surface may only emit a text/file reference instead of a playable Telegram audio attachment, and there is no exposed way in standard assistant replies to specify media kind.

Root Cause

For Telegram, there is an important distinction between:

sendAudio → regular playable audio attachment / music player UI
sendVoice → round voice-note bubble UI
sendDocument → generic file card

OpenClaw already knows about this distinction internally, but the generic outbound reply payload appears too lossy to preserve it.
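As a rough sketch of the distinction above, the three Telegram Bot API methods can be modeled as a lookup keyed by send intent. The intent names and the TELEGRAM_METHOD_BY_INTENT table are hypothetical illustrations, not OpenClaw's internal types.

```javascript
// Hypothetical send intents, each mapped to the Telegram Bot API method that renders it:
//   sendAudio    -> music-player style attachment (Telegram expects MP3 or M4A)
//   sendVoice    -> round voice-note bubble
//   sendDocument -> generic file card (any file type)
const TELEGRAM_METHOD_BY_INTENT = Object.freeze({
  audio: "sendAudio",
  voice: "sendVoice",
  document: "sendDocument",
});

// Unknown or missing intents fall back to the generic file card, which accepts any file type.
// hasOwnProperty guards against inherited keys such as "constructor".
function telegramMethodFor(intent) {
  return Object.prototype.hasOwnProperty.call(TELEGRAM_METHOD_BY_INTENT, intent)
    ? TELEGRAM_METHOD_BY_INTENT[intent]
    : "sendDocument";
}

console.log(telegramMethodFor("audio")); // sendAudio
console.log(telegramMethodFor(undefined)); // sendDocument
```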


What I verified locally

The installed OpenClaw docs and code (version 2026.03.x) show:

Telegram docs explicitly mention:
- default audio file behavior
- [[audio_as_voice]] to force voice-note send
- message action example using action: "send", channel: "telegram", media, asVoice: true
Built code includes:
- replyWithAudio
- replyWithDocument
- replyWithVoice
- asVoice
- audioAsVoice
- [[audio_as_voice]]
Built code also includes attachment-kind detection helpers such as:
- resolveAttachmentKind
- isAudioAttachment
- isAudioFileName

So the capability exists in pieces.

The gap

The shared outbound reply payload used by ordinary assistant replies appears to preserve only something like:

text
mediaUrl / mediaUrls
replyToId

But not:

media kind
MIME hint
asVoice
audioAsVoice (at least not in a form reachable from ordinary MEDIA: replies for regular audio files)

As a result, the Telegram sender cannot reliably distinguish:

audio attachment
voice note
generic document

for ordinary assistant-delivered media replies.
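To make the gap concrete, here is a sketch of how a whitelist-style normalization step loses send intent, assuming (as the issue suspects) that only the four fields listed above survive. pick, normalizeLossy, normalizePreserving and the hint field names are hypothetical, not OpenClaw code.

```javascript
// Fields the issue observes surviving on the shared outbound reply payload.
const BASE_FIELDS = ["text", "mediaUrl", "mediaUrls", "replyToId"];
// Hypothetical extra fields a richer payload would need to carry through.
const MEDIA_HINT_FIELDS = ["mediaKind", "mimeType", "audioAsVoice"];

// Copies only the listed fields that are actually present on the input.
function pick(source, fields) {
  const out = {};
  for (const field of fields) {
    if (source[field] !== undefined) out[field] = source[field];
  }
  return out;
}

// Lossy: anything outside the base whitelist (including send intent) is dropped.
function normalizeLossy(reply) {
  return pick(reply, BASE_FIELDS);
}

// Preserving: media hints survive, so the Telegram sender can still see them.
function normalizePreserving(reply) {
  return pick(reply, [...BASE_FIELDS, ...MEDIA_HINT_FIELDS]);
}

const reply = { mediaUrl: "./morning-briefing-2026-03-24.mp3", mediaKind: "audio", mimeType: "audio/mpeg" };
console.log(JSON.stringify(normalizeLossy(reply)));
// {"mediaUrl":"./morning-briefing-2026-03-24.mp3"}
console.log(JSON.stringify(normalizePreserving(reply)));
// {"mediaUrl":"./morning-briefing-2026-03-24.mp3","mediaKind":"audio","mimeType":"audio/mpeg"}
```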

Reproduction

Generate an MP3 file locally in an assistant session.
Reply with a normal assistant media reference such as:
- MEDIA:./morning-briefing-2026-03-24.mp3
Expectation:
- Telegram should receive a playable audio attachment, ideally via sendAudio for MP3.
Actual result:
- Depending on surface/runtime, the user may get only a text/path reference or a non-playable/generic file-style result.
- There is no obvious first-class way from the standard assistant reply surface to force sendAudio.
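The reproduction hinges on what a bare MEDIA: line carries. This hypothetical parser (extractMediaReferences, with an assumed one-reference-per-line format) shows that once the line is parsed, only a path remains: nothing in it says sendAudio versus sendVoice, so the sender is left with the file extension at best.

```javascript
// Hypothetical parser; OpenClaw's real MEDIA: handling may differ.
// Assumes each media reference sits on its own line as MEDIA:<path-or-url> (no spaces).
const MEDIA_LINE = /^\s*MEDIA:\s*(\S+)\s*$/;

function extractMediaReferences(replyText) {
  const mediaUrls = [];
  const textLines = [];
  for (const line of replyText.split("\n")) {
    const match = MEDIA_LINE.exec(line);
    if (match) {
      mediaUrls.push(match[1]); // only the path survives; no kind or MIME hint
    } else {
      textLines.push(line);
    }
  }
  return { text: textLines.join("\n").trim(), mediaUrls };
}

const parsed = extractMediaReferences(
  "Here is today's briefing.\nMEDIA:./morning-briefing-2026-03-24.mp3"
);
console.log(parsed.mediaUrls); // [ './morning-briefing-2026-03-24.mp3' ]
```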

Expected behavior

One of these should work from normal assistant replies:

Option A: richer reply payload

Allow assistant replies to preserve media metadata, e.g.

mediaKind: "audio" | "voice" | "document" | "image" | "video"
MIME/content type
audioAsVoice: true
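A minimal sketch of Option A, assuming the richer payload uses the field names proposed here (mediaKind, a mimeType field for the MIME hint, audioAsVoice). validateRichMediaPayload is a hypothetical helper; the point is that the hints become explicit, checkable data rather than something the sender has to guess.

```javascript
// Allowed values for the proposed mediaKind field (Option A).
const MEDIA_KINDS = new Set(["audio", "voice", "document", "image", "video"]);
const NON_AUDIO_KINDS = ["document", "image", "video"];

// Hypothetical validator: returns a list of problems (empty when the payload is acceptable).
function validateRichMediaPayload(payload) {
  const problems = [];
  if (typeof payload.mediaUrl !== "string" || payload.mediaUrl === "") {
    problems.push("mediaUrl must be a non-empty string");
  }
  if (payload.mediaKind !== undefined && !MEDIA_KINDS.has(payload.mediaKind)) {
    problems.push(`unknown mediaKind: ${payload.mediaKind}`);
  }
  // Loose type/subtype shape check, e.g. "audio/mpeg".
  if (payload.mimeType !== undefined && !/^[\w.+-]+\/[\w.+-]+$/.test(payload.mimeType)) {
    problems.push(`malformed mimeType: ${payload.mimeType}`);
  }
  // audioAsVoice only makes sense when the attachment is audio in the first place.
  if (payload.audioAsVoice === true && NON_AUDIO_KINDS.includes(payload.mediaKind)) {
    problems.push("audioAsVoice requires an audio or voice mediaKind");
  }
  return problems;
}

console.log(validateRichMediaPayload({ mediaUrl: "./briefing.mp3", mediaKind: "audio", mimeType: "audio/mpeg" })); // []
```

Returning a list instead of throwing lets a sender log every problem and still fall back to sendDocument.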

Option B: reliable Telegram inference

When the outbound media path sees:

.mp3 / .m4a → use sendAudio
.ogg / .opus with voice intent → use sendVoice
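Option B could look roughly like the following. inferTelegramMethod and its voiceIntent option are hypothetical names, and this is not how OpenClaw's resolveAttachmentKind works; it only encodes the two rules above and falls back to sendDocument otherwise (the Bot API documents sendAudio for MP3/M4A, so an OGG without voice intent is not a good sendAudio candidate).

```javascript
// Hypothetical extension-based inference for Option B.
function inferTelegramMethod(fileName, { voiceIntent = false } = {}) {
  // Look only at the last path segment so dots in directory names are ignored.
  const baseName = fileName.slice(fileName.lastIndexOf("/") + 1);
  const dot = baseName.lastIndexOf(".");
  const ext = dot === -1 ? "" : baseName.slice(dot + 1).toLowerCase();
  if (ext === "mp3" || ext === "m4a") return "sendAudio";
  if ((ext === "ogg" || ext === "opus") && voiceIntent) return "sendVoice";
  // Everything else, including .ogg without voice intent, stays a generic file.
  return "sendDocument";
}

console.log(inferTelegramMethod("./morning-briefing-2026-03-24.mp3")); // sendAudio
console.log(inferTelegramMethod("note.opus", { voiceIntent: true })); // sendVoice
```

Inference alone cannot express "send this MP3 as a voice note"; that still needs an explicit hint such as the directive in Option C.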

Option C: explicit directive support for audio file vs voice note

Support reply-level directives that survive normalization, for example:

[[audio_as_voice]] for voice-note send
perhaps a parallel explicit directive for audio-file send if needed
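For Option C, the important step is turning the directive into a structured field before normalization can discard it. In this sketch [[audio_as_voice]] is the existing tag, while [[audio_as_file]] is the hypothetical counterpart; extractAudioDirectives and the sendIntent values are illustrative assumptions.

```javascript
// [[audio_as_voice]] exists in OpenClaw; [[audio_as_file]] is hypothetical.
const DIRECTIVE_PATTERN = /\[\[(audio_as_voice|audio_as_file)\]\]/g;

// Strips audio directives from the reply text and returns them as a structured intent.
function extractAudioDirectives(text) {
  const found = new Set();
  const cleaned = text.replace(DIRECTIVE_PATTERN, (_match, name) => {
    found.add(name);
    return "";
  });
  if (found.has("audio_as_voice") && found.has("audio_as_file")) {
    throw new Error("conflicting audio directives");
  }
  let sendIntent; // stays undefined when no directive is present
  if (found.has("audio_as_voice")) sendIntent = "voice";
  else if (found.has("audio_as_file")) sendIntent = "audio";
  // Collapse the double spaces left behind by removed tags.
  return { text: cleaned.replace(/[ \t]{2,}/g, " ").trim(), sendIntent };
}

console.log(extractAudioDirectives("Briefing [[audio_as_voice]]")); // { text: 'Briefing', sendIntent: 'voice' }
```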

Suggested fix

At minimum, Telegram outbound media delivery should be able to distinguish between:

audio file attachment
voice note
generic document

for ordinary assistant-produced media replies.

The cleanest fix is probably to extend the shared outbound payload model so it carries media type / send intent explicitly instead of collapsing everything to generic mediaUrl.

Nice-to-have

If the message action path already supports this correctly (action: "send", media, asVoice: true), it would be great if normal assistant reply paths had an equivalent way to express the same thing without requiring access to a separate message-action tool surface.
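One way to close that gap, sketched under assumptions: map the reply's media and intent onto the message-action fields the issue quotes from the Telegram docs (action, channel, media, asVoice). The function name and the sendIntent input are hypothetical; regular audio sets no flag and relies on the default audio file behavior the docs mention.

```javascript
// Hypothetical bridge from an ordinary reply to the documented message-action shape.
// Only action, channel, media and asVoice come from the quoted docs example.
function toTelegramSendAction(mediaUrl, sendIntent) {
  const action = { action: "send", channel: "telegram", media: mediaUrl };
  // Voice notes need the explicit flag; audio files use the default behavior.
  if (sendIntent === "voice") action.asVoice = true;
  return action;
}

console.log(toTelegramSendAction("./note.ogg", "voice"));
```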

extent analysis

Fix Plan

To fix the issue, we need to extend the shared outbound payload model to carry media type/send intent explicitly. Here are the steps:

Extend the MEDIA: reply payload to include media kind and MIME type:
- Add mediaKind field to the payload (e.g., "audio", "voice", "document")
- Add mimeType field to the payload (e.g., "audio/mpeg", "audio/ogg")
Update the Telegram sender to use the mediaKind and mimeType fields to determine the send intent:
- Use sendAudio for audio files (e.g., MP3, M4A)
- Use sendVoice for voice notes (e.g., OGG, OPUS)
- Use sendDocument for generic documents
Add support for explicit directives in the reply payload:
- [[audio_as_voice]] for voice-note send
- [[audio_as_file]] for audio-file send (if needed)

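Example code (a sketch; the payload fields follow the plan above, and telegramSender is a stand-in so the snippet runs on its own):

```javascript
// Extended MEDIA: reply payload (field names are hypothetical, per the plan above)
const mediaPayload = {
  mediaUrl: './morning-briefing-2026-03-24.mp3',
  mediaKind: 'audio',
  mimeType: 'audio/mpeg',
  directives: [],
};

// Stand-in for the real Telegram sender; each method just logs what it would send.
const telegramSender = {
  sendAudio: (url) => console.log(`sendAudio ${url}`),
  sendVoice: (url) => console.log(`sendVoice ${url}`),
  sendDocument: (url) => console.log(`sendDocument ${url}`),
};

// MIME types this sketch accepts per method (MP3/M4A for audio, OGG/OPUS for voice).
const AUDIO_MIME_TYPES = new Set(['audio/mpeg', 'audio/mp4', 'audio/x-m4a']);
const VOICE_MIME_TYPES = new Set(['audio/ogg', 'audio/opus']);

// Decide once, so a payload can never trigger two sends
// (the earlier version ran the directive check after already sending).
function pickSendMethod({ mediaKind, mimeType, directives = [] }) {
  // Explicit directives win over the inferred kind.
  if (directives.includes('[[audio_as_voice]]')) return 'sendVoice';
  if (directives.includes('[[audio_as_file]]')) return 'sendAudio';
  // A missing mimeType is accepted; a known-incompatible one falls through to a document.
  if (mediaKind === 'audio' && (!mimeType || AUDIO_MIME_TYPES.has(mimeType))) return 'sendAudio';
  if (mediaKind === 'voice' && (!mimeType || VOICE_MIME_TYPES.has(mimeType))) return 'sendVoice';
  return 'sendDocument';
}

// Updated Telegram sender: exactly one call per payload.
telegramSender[pickSendMethod(mediaPayload)](mediaPayload.mediaUrl);
```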

Verification

To verify the fix, test the following scenarios:

Send an MP3 file using the MEDIA: reply payload with mediaKind set to "audio" and mimeType set to "audio/mpeg". Verify that the file is sent as a playable audio attachment using sendAudio.
Send an OGG file using the MEDIA: reply payload with mediaKind set to "voice" and mimeType set to "audio/ogg". Verify that the file is sent as a voice note using sendVoice.
Send a document using the MEDIA: reply payload with mediaKind set to "document". Verify that the file is sent as a generic document using sendDocument.
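The three scenarios can be checked at the dispatch level with a small table-driven harness. resolveSendMethod is a hypothetical stand-in for the patched sender's decision logic; it confirms which method would be called, but the on-screen result (player, voice bubble, file card) still has to be confirmed in a real Telegram chat.

```javascript
// Hypothetical dispatch logic under test: directives first, then mediaKind.
function resolveSendMethod(payload) {
  const directives = payload.directives ?? [];
  if (directives.includes("[[audio_as_voice]]")) return "sendVoice";
  if (directives.includes("[[audio_as_file]]")) return "sendAudio";
  if (payload.mediaKind === "audio") return "sendAudio";
  if (payload.mediaKind === "voice") return "sendVoice";
  return "sendDocument";
}

// One row per verification scenario listed above.
const scenarios = [
  { payload: { mediaUrl: "./briefing.mp3", mediaKind: "audio", mimeType: "audio/mpeg" }, expected: "sendAudio" },
  { payload: { mediaUrl: "./note.ogg", mediaKind: "voice", mimeType: "audio/ogg" }, expected: "sendVoice" },
  { payload: { mediaUrl: "./report.pdf", mediaKind: "document" }, expected: "sendDocument" },
];

const failures = scenarios.filter(({ payload, expected }) => resolveSendMethod(payload) !== expected);
console.log(failures.length === 0 ? "all scenarios pass" : failures); // all scenarios pass
```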
