openclaw: Telegram outbound audio replies lack a way to force sendAudio/sendVoice from normal assistant MEDIA replies

GitHub stats
openclaw/openclaw#53673 · Fetched 2026-04-08 01:25:00
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Summary

OpenClaw clearly supports Telegram audio sending internally, including sendVoice, asVoice, audioAsVoice, and the [[audio_as_voice]] reply tag. However, from a normal assistant reply path using MEDIA: / generic media reply payloads, there does not appear to be a reliable way to force Telegram to send media as sendAudio or sendVoice.

In practice, this means an assistant can generate an MP3 successfully, but the normal reply surface may only emit a text/file reference instead of a playable Telegram audio attachment, and there is no exposed way in standard assistant replies to specify media kind.

Why this matters

For Telegram, there is an important distinction between:

  • sendAudio → regular playable audio attachment / music player UI
  • sendVoice → round voice-note bubble UI
  • sendDocument → generic file card

OpenClaw already knows about this distinction internally, but the generic outbound reply payload appears too lossy to preserve it.
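
To make the distinction concrete, here is a minimal sketch of how the three send intents map onto Telegram Bot API methods. The method names and their file parameter names (audio, voice, document) come from the Bot API; the helper buildSendRequest and the kind labels are hypothetical and do not describe OpenClaw's internals.

```javascript
// Bot API method and file-parameter name for each send intent.
const TELEGRAM_SEND_METHODS = new Map([
  ['audio', { method: 'sendAudio', fileParam: 'audio' }],
  ['voice', { method: 'sendVoice', fileParam: 'voice' }],
  ['document', { method: 'sendDocument', fileParam: 'document' }],
]);

// Hypothetical helper: returns the Bot API method name and parameters
// for one outbound file. Unknown kinds fall back to sendDocument.
function buildSendRequest(kind, chatId, file) {
  const target =
    TELEGRAM_SEND_METHODS.get(kind) || TELEGRAM_SEND_METHODS.get('document');
  return {
    method: target.method,
    params: { chat_id: chatId, [target.fileParam]: file },
  };
}
```

For example, buildSendRequest('audio', 12345, 'https://example.com/briefing.mp3') yields method 'sendAudio' with the file under the audio parameter, while the same file with kind 'document' would be sent under document and shown as a generic file card.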

What I verified locally

The installed OpenClaw docs and built code (2026.03.x) show:

  • Telegram docs explicitly mention:
    • default audio file behavior
    • [[audio_as_voice]] to force voice-note send
    • message action example using action: "send", channel: "telegram", media, asVoice: true
  • Built code includes:
    • replyWithAudio
    • replyWithDocument
    • replyWithVoice
    • asVoice
    • audioAsVoice
    • [[audio_as_voice]]
  • Built code also includes attachment-kind detection helpers such as:
    • resolveAttachmentKind
    • isAudioAttachment
    • isAudioFileName

So the capability exists in pieces.

The gap

The shared outbound reply payload used by ordinary assistant replies appears to only preserve something like:

  • text
  • mediaUrl / mediaUrls
  • replyToId

But not:

  • media kind
  • MIME hint
  • asVoice
  • audioAsVoice in a way that is accessible from ordinary MEDIA: reply usage for regular audio files

As a result, the Telegram sender cannot reliably distinguish:

  • audio attachment
  • voice note
  • generic document

for ordinary assistant-delivered media replies.
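
The effect of such a lossy payload can be illustrated with a small sketch. The function normalizeReply is hypothetical; it is not OpenClaw's actual normalizer and only mimics a payload that keeps text, mediaUrls, and replyToId.

```javascript
// Hypothetical normalizer that keeps only the three fields listed above.
function normalizeReply(reply) {
  return {
    text: reply.text,
    mediaUrls: reply.mediaUrls || (reply.mediaUrl ? [reply.mediaUrl] : []),
    replyToId: reply.replyToId,
  };
}

// Two replies that differ only in send intent.
const voiceReply = { text: '', mediaUrl: './note.ogg', audioAsVoice: true };
const fileReply = { text: '', mediaUrl: './note.ogg', audioAsVoice: false };

// After normalization the two are indistinguishable, so a sender that
// only sees the normalized payload cannot pick sendVoice over sendAudio.
const normalizedVoice = JSON.stringify(normalizeReply(voiceReply));
const normalizedFile = JSON.stringify(normalizeReply(fileReply));
console.log(normalizedVoice === normalizedFile); // true
```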

Reproduction

  1. Generate an MP3 file locally in an assistant session.
  2. Reply with a normal assistant media reference such as:
    • MEDIA:./morning-briefing-2026-03-24.mp3
  3. Expectation:
    • Telegram should receive a playable audio attachment, ideally via sendAudio for MP3.
  4. Actual result:
    • Depending on surface/runtime, the user may get only a text/path reference or a non-playable/generic file-style result.
    • There is no obvious first-class way from the standard assistant reply surface to force sendAudio.

Expected behavior

One of these should work from normal assistant replies:

Option A: richer reply payload

Allow assistant replies to preserve media metadata, e.g.

  • mediaKind: "audio" | "voice" | "document" | "image" | "video"
  • MIME/content type
  • audioAsVoice: true
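
A minimal sketch of what such a payload could look like, assuming the field names proposed in Option A. The constructor createMediaReply is hypothetical and is not part of OpenClaw.

```javascript
// Kinds proposed in Option A.
const MEDIA_KINDS = ['audio', 'voice', 'document', 'image', 'video'];

// Hypothetical constructor for a richer reply payload. It validates the
// proposed fields and returns a frozen object so later stages cannot
// mutate the send intent in place.
function createMediaReply({ text = '', mediaUrl, mediaKind, mimeType, audioAsVoice = false }) {
  if (typeof mediaUrl !== 'string' || mediaUrl === '') {
    throw new TypeError('mediaUrl must be a non-empty string');
  }
  if (mediaKind !== undefined && !MEDIA_KINDS.includes(mediaKind)) {
    throw new RangeError(`unknown mediaKind: ${mediaKind}`);
  }
  return Object.freeze({
    text,
    mediaUrl,
    mediaKind,
    mimeType,
    audioAsVoice: Boolean(audioAsVoice),
  });
}

const reply = createMediaReply({
  mediaUrl: './morning-briefing-2026-03-24.mp3',
  mediaKind: 'audio',
  mimeType: 'audio/mpeg',
});
```

Leaving mediaKind optional keeps existing replies valid; a sender could still fall back to inference when the field is absent.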

Option B: reliable Telegram inference

When the outbound media path sees:

  • .mp3 / .m4a → use sendAudio
  • .ogg / .opus with voice intent → use sendVoice
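
The inference in Option B could be sketched as follows. The function name inferTelegramMethod and the voiceIntent flag are hypothetical. Falling back to sendDocument for everything else, including .ogg / .opus without voice intent, is an assumption made here; the issue does not specify that case.

```javascript
// Hypothetical inference from file extension plus an optional voice hint.
function inferTelegramMethod(mediaUrl, { voiceIntent = false } = {}) {
  // Drop any query string or fragment before looking at the extension.
  const clean = mediaUrl.split(/[?#]/)[0];
  const dot = clean.lastIndexOf('.');
  const ext = dot === -1 ? '' : clean.slice(dot + 1).toLowerCase();

  if ((ext === 'ogg' || ext === 'opus') && voiceIntent) {
    return 'sendVoice';
  }
  if (ext === 'mp3' || ext === 'm4a') {
    return 'sendAudio';
  }
  // Assumption: anything else is delivered as a generic file.
  return 'sendDocument';
}
```

With this sketch, inferTelegramMethod('./morning-briefing-2026-03-24.mp3') returns 'sendAudio', and inferTelegramMethod('./note.ogg', { voiceIntent: true }) returns 'sendVoice'.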

Option C: explicit directive support for audio file vs voice note

Support reply-level directives that survive normalization, for example:

  • [[audio_as_voice]] for voice-note send
  • perhaps a parallel explicit directive for audio-file send if needed
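
One way a directive could survive normalization is to parse it out of the reply text early and carry it as a structured flag. The sketch below assumes this approach; extractDirectives is a hypothetical helper, and [[audio_as_file]] is a hypothetical name for the parallel directive, not an existing OpenClaw tag.

```javascript
// Hypothetical parser: strips known directives from the reply text and
// returns them as boolean flags that can travel with the payload.
function extractDirectives(text) {
  const directivePattern = /\[\[(audio_as_voice|audio_as_file)\]\]/g;
  const found = new Set();

  const cleaned = text
    .replace(directivePattern, (_match, name) => {
      found.add(name);
      return '';
    })
    .replace(/[ \t]{2,}/g, ' ') // collapse gaps left by removed tags
    .trim();

  return {
    text: cleaned,
    audioAsVoice: found.has('audio_as_voice'),
    audioAsFile: found.has('audio_as_file'), // hypothetical directive
  };
}
```

Carrying the result as flags rather than leaving the tag in the text means later stages do not need to re-parse the reply to recover the send intent.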

Suggested fix

At minimum, Telegram outbound media delivery should be able to distinguish between:

  • audio file attachment
  • voice note
  • generic document

for ordinary assistant-produced media replies.

The cleanest fix is probably to extend the shared outbound payload model so it carries media type / send intent explicitly instead of collapsing everything to generic mediaUrl.

Nice-to-have

If the message action path already supports this correctly (action: "send", media, asVoice: true), it would be great if normal assistant reply paths had an equivalent way to express the same thing without requiring access to a separate message-action tool surface.

Extended analysis

Fix Plan

To fix the issue, we need to extend the shared outbound payload model to carry media type/send intent explicitly. Here are the steps:

  • Extend the MEDIA: reply payload to include media kind and MIME type:
    • Add mediaKind field to the payload (e.g., "audio", "voice", "document")
    • Add mimeType field to the payload (e.g., "audio/mpeg", "audio/ogg")
  • Update the Telegram sender to use the mediaKind and mimeType fields to determine the send intent:
    • Use sendAudio for audio files (e.g., MP3, M4A)
    • Use sendVoice for voice notes (e.g., OGG files encoded with Opus)
    • Use sendDocument for generic documents
  • Add support for explicit directives in the reply payload:
    • [[audio_as_voice]] for voice-note send
    • [[audio_as_file]] for audio-file send (if needed)

Example code:

// Extended MEDIA: reply payload. mediaKind, mimeType, and directives are
// the proposed fields; they do not exist in the current payload.
const mediaPayload = {
  mediaUrl: './morning-briefing-2026-03-24.mp3',
  mediaKind: 'audio',
  mimeType: 'audio/mpeg',
  directives: []
};

// Updated Telegram sender. `telegramSender` is a placeholder for whatever
// object wraps the Telegram send calls; it is not an existing OpenClaw API.
function sendMedia(telegramSender, payload) {
  const directives = payload.directives || [];

  // Explicit directives take precedence over mediaKind, and each branch
  // returns, so the file is sent exactly once.
  if (directives.includes('[[audio_as_voice]]')) {
    return telegramSender.sendVoice(payload.mediaUrl);
  }
  if (directives.includes('[[audio_as_file]]')) {
    return telegramSender.sendAudio(payload.mediaUrl);
  }

  // Otherwise route on the declared media kind. mimeType is carried as a
  // hint but is not required to match a single value, so M4A audio is
  // still sent with sendAudio.
  switch (payload.mediaKind) {
    case 'audio':
      return telegramSender.sendAudio(payload.mediaUrl);
    case 'voice':
      return telegramSender.sendVoice(payload.mediaUrl);
    default:
      return telegramSender.sendDocument(payload.mediaUrl);
  }
}

Verification

To verify the fix, test the following scenarios:

  • Send an MP3 file using the MEDIA: reply payload with mediaKind set to "audio" and mimeType set to "audio/mpeg". Verify that the file is sent as a playable audio attachment using sendAudio.
  • Send an OGG file using the MEDIA: reply payload with mediaKind set to "voice" and mimeType set to "audio/ogg". Verify that the file is sent as a voice note using sendVoice.
  • Send a document using the MEDIA: reply payload with mediaKind set to "document". Verify that the file is sent as a generic document using
