openclaw - ✅(Solved) Fix [Bug] Telegram DM voice-note transcription silently fails in 4.5: allMedia[n].path is undefined, normalizeAttachments filters out all audio attachments [2 pull requests, 1 participant]

GitHub stats
openclaw/openclaw#62496 · Fetched 2026-04-08 03:03:30
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

In OpenClaw 2026.4.5, Telegram DM voice messages arrive as raw <media:audio> placeholders with no transcription attempt, despite tools.media.audio being correctly configured and the fix from #61008 being present in the dist.

PR fix notes

PR #63278: chore(telegram): diagnostic logging for voice DM transcription pipeline (#62496)

Description (problem / solution / changelog)

Summary

Adds diagnostic logging (info/warn level, NOT verbose-only) at every step of the Telegram voice DM transcription pipeline to identify why voice transcription silently fails (#62496).

This is NOT a fix PR. Root cause is not yet identified. This PR adds structured logging to trace the exact failure point.

What we know

  • The .ogg file IS saved to ~/.openclaw/media/inbound/ every time (download succeeds)
  • whisper-cli works manually on the saved file
  • The audio transcription pipeline is NEVER triggered (zero logs even in debug mode)
  • The agent receives <media:audio> placeholder with no transcription

Diagnostic logging added

bot-message-context.ts (buildTelegramMessageContext — pre-body guards)

  • diag:voice: buildTelegramMessageContext entry — logs chatId, isGroup, allMediaLength at function entry
  • diag:voice: baseAccess blocked — logs which guard (group-disabled, topic-disabled, or allowFrom override) rejected the message
  • diag:voice: requireTopic blocked DM — logs when requireTopic=true blocks a DM without a topic
  • diag:voice: enforceTelegramDmAccess blocked — logs when DM access enforcement (pairing/allowlist/disabled) rejects
  • diag:voice: all guards passed, calling resolveTelegramInboundBody — confirms all 4 pre-body guards passed

bot-handlers.runtime.ts (processInboundMessage)

  • diag:voice: resolveMedia threw — logged if resolveMedia throws (catch path)
  • diag:voice: resolveMedia result — logs path, contentType, whether media is null
  • diag:voice: allMedia constructed — logs array length and content types

bot-message-context.body.ts (resolveTelegramInboundBody)

  • diag:voice: hasAudio computed — logs hasAudio flag and all media content types
  • diag:voice: needsPreflightTranscription evaluated — logs the flag AND every condition feeding into it (hasAudio, hasUserText, isGroup, requireMention, mentionRegexCount, disableAudioPreflight, senderAllowedForAudioPreflight)
  • diag:voice: calling transcribeFirstAudio — logs media paths/types before the call
  • diag:voice: transcribeFirstAudio completed — logs whether transcript was produced and its length
  • diag:voice: transcribeFirstAudio failed — logs error if transcription throws

Cross-validation of pre-body guards

Review found 4 return-null paths before resolveTelegramInboundBody with zero diagnostic logs. Analysis confirms:

  • For a standard DM from an allowlisted sender, all 4 guards pass (no silent drop)
  • The guards only block in edge cases: explicitly disabled chat/topic, per-DM allowFrom mismatch, requireTopic=true without topic, or dmPolicy=disabled
  • However, the lack of diag:voice: logs at these points was a real diagnostic gap — now covered

Cleanup

All diagnostic logs are tagged with [DIAG #62496] comments and prefixed with diag:voice: for easy grep removal after root cause is identified.
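
The prefixed log lines described in this PR could be produced by a small helper along the following lines. This is a minimal TypeScript sketch, not the PR's code: `formatDiag`, `diagVoice`, and the `sink` parameter are hypothetical names, and only the `diag:voice:` prefix and the example step name come from the description above.

```typescript
type DiagFields = Record<string, unknown>;

const DIAG_PREFIX = "diag:voice:";

// Formats one diagnostic line: prefix, step name, then JSON-encoded fields.
function formatDiag(step: string, fields: DiagFields = {}): string {
  return `${DIAG_PREFIX} ${step} ${JSON.stringify(fields)}`;
}

// Emits the line through a caller-supplied sink (console.info by default),
// so the output does not depend on a verbose-only logger.
function diagVoice(
  step: string,
  fields: DiagFields = {},
  sink: (line: string) => void = console.info,
): void {
  sink(formatDiag(step, fields));
}

// Hypothetical media entry whose path was never set.
const entry: { path?: string | undefined; contentType: string } = {
  contentType: "audio/ogg; codecs=opus",
};

// JSON.stringify omits undefined values, so an unset path is mapped to null
// to keep it visible in the log line.
diagVoice("resolveMedia result", {
  path: entry.path ?? null,
  contentType: entry.contentType,
});
```

One design point worth noting: because `JSON.stringify` drops keys whose value is `undefined`, a missing `path` would vanish from the log unless it is converted to `null` first, which matters for this particular bug.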

Closes: N/A (diagnostic only)
Related: #62496

Changed files

  • extensions/telegram/src/bot-handlers.runtime.ts (modified, +18/-0)
  • extensions/telegram/src/bot-message-context.body.ts (modified, +38/-0)
  • extensions/telegram/src/bot-message-context.ts (modified, +26/-0)

PR #66556: fix(telegram): filter undefined paths to prevent voice transcription failure

Description (problem / solution / changelog)

Fixes #62496

When a Telegram voice note's entry in the allMedia array has an undefined path, the transcription preflight fails silently: normalizeAttachments filters out entries where both path and url are undefined, so no attachment reaches the transcriber.

Root cause: allMedia.map((m) => m.path) creates an array containing undefined values when the voice note's path is not set, and normalizeAttachments then drops all attachments.

Changes

  • Filter undefined paths with .filter(Boolean) before creating MediaPaths array
  • Apply fix to both bot-message-context.body.ts and bot-message-context.session.ts
  • Add test case for handling voice messages with undefined paths gracefully
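
The first change in the list above can be illustrated in isolation. The following TypeScript sketch is an assumption-laden illustration, not the PR diff: `MediaEntry` and `collectMediaPaths` are hypothetical names, and a type-guard predicate stands in for `.filter(Boolean)` so that the compiler narrows the result to `string[]`.

```typescript
// Hypothetical shape for illustration; the real OpenClaw type may differ.
interface MediaEntry {
  path?: string | undefined;
  contentType?: string | undefined;
}

// Mirrors the MediaPaths expression from the issue, with undefined and
// empty paths filtered out before the array is handed on.
function collectMediaPaths(allMedia: MediaEntry[]): string[] | undefined {
  return allMedia.length > 0
    ? allMedia.map((m) => m.path).filter((p): p is string => Boolean(p))
    : undefined;
}

const mixed: MediaEntry[] = [
  { path: "/tmp/voice.ogg", contentType: "audio/ogg; codecs=opus" },
  { contentType: "audio/ogg; codecs=opus" }, // path never set
];

console.log(collectMediaPaths(mixed)); // only the entry with a path survives
```

Under this sketch an entry whose path is undefined contributes nothing to the result, so a mixed array keeps its valid paths while an array with no valid paths becomes empty.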

Impact

Telegram voice transcription now works even when some media paths are undefined

Testing

  • ✅ All existing tests pass
  • ✅ New test case validates the fix

Changed files

  • extensions/telegram/src/bot-message-context.body.ts (modified, +1/-1)
  • extensions/telegram/src/bot-message-context.session.ts (modified, +2/-2)


Summary

In OpenClaw 2026.4.5, Telegram DM voice messages arrive as raw <media:audio> placeholders with no transcription attempt, despite tools.media.audio being correctly configured and the fix from #61008 being present in the dist.

Environment

  • OpenClaw: 2026.4.5 (3e72c03)
  • OS: macOS (Darwin 25.4.0, arm64)
  • Channel: Telegram DM
  • STT Provider: Groq (whisper-large-v3) — loaded and verified working via direct API call
  • Gateway mode: local

Root cause (traced in dist)

The fix from #61008 correctly widened the preflight condition in bot-message-context-ID6QScNo.js line 121:

if (hasAudio && !hasUserText && (!isGroup || requireMention && ...)) {
  preflightTranscript = await transcribeFirstAudio({ ctx: {
    MediaPaths: allMedia.length > 0 ? allMedia.map((m) => m.path) : void 0,
    MediaTypes: allMedia.length > 0 ? allMedia.map((m) => m.contentType).filter(Boolean) : void 0
  }, cfg, agentDir: void 0 });
}

However, allMedia[n].path is undefined for Telegram voice notes. This is confirmed in bot-message-context-ID6QScNo.js around lines 340–390, where contextMedia objects are built with:

MediaPath: contextMedia.length > 0 ? contextMedia[0]?.path : void 0,
MediaPaths: contextMedia.length > 0 ? contextMedia.map((m) => m.path) : void 0,

When contextMedia[n].path is undefined, normalizeAttachments() in runner-Bo7fJw79.js filters out the entry:

.filter((entry) => Boolean(entry.path?.trim() || entry.url?.trim()))

So transcribeFirstAudio receives an empty attachment list and returns immediately — no transcription, no error, no log (errors are caught and only emitted via logVerbose, which is off by default).
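
The silent drop can be reproduced in isolation. The following is a minimal TypeScript sketch, not the OpenClaw source: `MediaEntry`, `Attachment`, `toAttachments`, and `keepUsable` are hypothetical names, and only the filter predicate is taken from the normalizeAttachments() line quoted above.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw types may differ.
interface MediaEntry {
  path?: string | undefined;
  contentType?: string | undefined;
}

interface Attachment {
  path?: string | undefined;
  url?: string | undefined;
}

// One attachment per media entry, keeping `path` even when it is undefined.
function toAttachments(allMedia: MediaEntry[]): Attachment[] {
  return allMedia.map((m) => ({ path: m.path }));
}

// Same predicate as the filter quoted from normalizeAttachments().
function keepUsable(attachments: Attachment[]): Attachment[] {
  return attachments.filter((entry) =>
    Boolean(entry.path?.trim() || entry.url?.trim()),
  );
}

// A voice note as described in the issue: content type set, path missing.
const voiceNote: MediaEntry = { contentType: "audio/ogg; codecs=opus" };
const usable = keepUsable(toAttachments([voiceNote]));

console.log(usable.length); // 0: nothing is left to transcribe
```

With no attachments left, a caller that returns early on an empty list produces neither a transcript nor an error, which matches the behaviour reported here.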

Why path is undefined

For Telegram voice/audio messages, the media object in allMedia has contentType set (e.g. audio/ogg; codecs=opus) but path is void 0. The file is saved to media/inbound/ correctly (confirmed by checking the filesystem), but the local path is not being written back into the allMedia entry's path field before buildTelegramMessageContext passes it to the preflight step.
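
A check for this state could flag audio entries that carry a content type but no local path before they reach the preflight step. The sketch below is illustrative TypeScript only: `MediaEntry` and `audioEntriesMissingPath` are hypothetical names, and the `audio/` prefix test is an assumption about how audio content types are recognised.

```typescript
// Hypothetical shape for illustration; the real OpenClaw type may differ.
interface MediaEntry {
  path?: string | undefined;
  contentType?: string | undefined;
}

// Returns the indexes of entries that look like audio but have no usable path.
function audioEntriesMissingPath(allMedia: MediaEntry[]): number[] {
  const missing: number[] = [];
  allMedia.forEach((m, i) => {
    const isAudio = m.contentType?.toLowerCase().startsWith("audio/") ?? false;
    if (isAudio && !m.path?.trim()) {
      missing.push(i);
    }
  });
  return missing;
}

const sample: MediaEntry[] = [
  { contentType: "audio/ogg; codecs=opus" }, // the failing case from the issue
  { contentType: "image/png" },
  { contentType: "audio/mpeg", path: "/tmp/song.mp3" },
];

console.log(audioEntriesMissingPath(sample)); // [ 0 ]
```

A non-empty result at this point would indicate that the download succeeded but the saved path was not written back, rather than a problem in the transcription provider.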

Symptoms

  • .ogg voice file saved to ~/.openclaw/media/inbound/ ✅
  • Agent receives <media:audio> tag ✅
  • Zero audio/transcription logs in gateway output ❌
  • No errors (silently swallowed by catch + logVerbose) ❌
  • Direct Groq API call with the same .ogg file succeeds ✅
  • Gateway restart: no effect ❌

Relationship to existing issues

  • #61008 fixed the preflight trigger condition (DM now enters the transcription path) ✅
  • #56010 identified MediaPath / MediaPaths not being populated — this is the same remaining gap
  • The two bugs are independent; #61008 is fixed in 4.5, the attachment path mapping is not

Expected behavior

When a voice note is received in a Telegram DM, the local saved path should be available in allMedia[n].path so transcribeFirstAudio can locate and transcribe the file.

extent analysis

TL;DR

The issue can be fixed by updating the buildTelegramMessageContext function to populate the path field in the allMedia entry before passing it to the preflight step.

Guidance

  • Verify that the voice file is saved under media/inbound/ and that the saved local path is written back into the corresponding allMedia entry.
  • Make sure the path field of each allMedia entry is populated with the local path of the saved voice file before buildTelegramMessageContext passes it to the preflight step.
  • Check the normalizeAttachments function in runner-Bo7fJw79.js to confirm whether it drops the attachment because both path and url are empty.
  • Test the updated code by sending a voice note in a Telegram DM. A direct Groq API call already succeeds on the saved file and does not exercise the gateway pipeline.

Example

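// Hypothetical sketch: write the saved local path back into each media entry
// before buildTelegramMessageContext hands the entries to the preflight step.
// mediaDir and fileName are assumed names, not confirmed OpenClaw fields.
const withLocalPaths = (contextMedia, mediaDir) =>
  contextMedia.map((m) =>
    // Leave entries that already have a path, or that lack a file name.
    m.path || !m.fileName ? m : { ...m, path: `${mediaDir}/inbound/${m.fileName}` }
  );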

Notes

The issue is reported for Telegram voice messages and is traced to the path field not being populated in the allMedia entry. PR #66556 changes both bot-message-context.body.ts and bot-message-context.session.ts, so any fix should be retested against both code paths.

Recommendation

Apply a workaround by updating the buildTelegramMessageContext function to populate the path field in the allMedia entry. This will allow the transcription to work correctly for Telegram voice messages.
