openclaw - ✅(Solved) Fix [Bug] Telegram voice messages: media understanding audio transcription pipeline never triggered [3 pull requests, 1 participant]

GitHub stats
openclaw/openclaw#55052, fetched 2026-04-08 01:33:11
Comments: 0 · Participants: 1 · Timeline: 4 (cross-referenced ×4) · Reactions: 0

Telegram voice messages are received and downloaded successfully, but the applyMediaUnderstanding audio transcription pipeline is never invoked. The agent receives <media:audio> with the raw .ogg file attached but no automatic transcription occurs.

This affects all channels (confirmed on both Telegram forum topics and WhatsApp groups), not just Telegram.

Fix Action

Workaround

Agent manually calls OpenAI transcription API via curl for each voice message.

PR fix notes

PR #55323: fix: add audio capability to openai-codex media understanding provider

Description (problem / solution / changelog)

The openai-codex provider (OAuth) was missing the audio capability and transcribeAudio handler, so Pro plan users could not use audio transcription. Add both to match the regular openai provider, reusing the same Whisper API function.

Closes #55237 Related #55052

Summary

  • Problem: openai-codex media understanding provider only registered ["image"]
    capability, missing "audio" and transcribeAudio handler
  • Why it matters: OpenAI Pro plan (OAuth) users cannot transcribe audio — forced to use
    API key provider or skip transcription entirely
  • What changed: Added "audio" to capabilities and transcribeAudio: transcribeOpenAiAudio to the codex provider; updated test assertion
  • What did NOT change: No new functions — reuses the existing transcribeOpenAiAudio that calls /v1/audio/transcriptions
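The change summarized above can be pictured roughly as follows. This is a minimal sketch, not the actual OpenClaw source: the interface shapes, the stubbed transcribeOpenAiAudio body, and the supportsAudio helper are assumptions made for illustration. Only the provider id, the capability names, and the transcribeAudio field come from the PR description.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw plugin SDK types may differ.
type MediaCapability = "image" | "audio";

interface AudioTranscriptionRequest {
  fileName: string;
  buffer: Uint8Array;
}

interface MediaUnderstandingProvider {
  id: string;
  capabilities: MediaCapability[];
  transcribeAudio?: (req: AudioTranscriptionRequest) => Promise<{ text: string }>;
}

// Stand-in for the existing handler that calls /v1/audio/transcriptions.
async function transcribeOpenAiAudio(
  req: AudioTranscriptionRequest,
): Promise<{ text: string }> {
  return { text: `(stub transcript for ${req.fileName})` };
}

// Before the fix: image only, no audio handler.
const codexProviderBefore: MediaUnderstandingProvider = {
  id: "openai-codex",
  capabilities: ["image"],
};

// After the fix: audio is registered and wired to the shared handler.
const codexProviderAfter: MediaUnderstandingProvider = {
  id: "openai-codex",
  capabilities: ["image", "audio"],
  transcribeAudio: transcribeOpenAiAudio,
};

// A provider can serve audio only if it both advertises and implements it.
function supportsAudio(provider: MediaUnderstandingProvider): boolean {
  return (
    provider.capabilities.includes("audio") &&
    typeof provider.transcribeAudio === "function"
  );
}
```

Checking both the capability list and the handler in one place is a design choice of this sketch; it makes the "registered but not wired" case fail loudly instead of being skipped silently.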

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Integrations

Linked Issue/PR

  • Closes #55237
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: openaiCodexMediaUnderstandingProvider was defined with capabilities: ["image"] only and no transcribeAudio handler
  • Missing detection / guardrail: No test asserted audio capability for codex provider
  • Prior context: The codex media provider was likely added with image-only initially and
    audio was never wired up
  • Why this regressed now: Not a regression — audio was never supported for openai-codex

Regression Test Plan (if applicable)

  • Existing coverage already sufficient
  • Target test or file: extensions/openai/index.test.ts
  • Scenario the test should lock in: Codex media provider registers ["image", "audio"]
    capabilities
  • Existing test that already covers this: Updated existing assertion at line 196

User-visible / Behavior Changes

  • openai-codex provider now supports audio transcription via Whisper API

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No — same Whisper endpoint, just now reachable via codex provider
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS 15.x (arm64)
  • Runtime/container: Node 24
  • Model/provider: openai-codex

Steps

  1. Configure openai-codex as provider with audio enabled
  2. Send audio attachment through any channel

Expected

  • Audio is transcribed via Whisper API

Actual (before fix)

  • Audio transcription skipped — codex provider had no audio capability

Evidence

  • Failing test/log before + passing after

Human Verification (required)

  • Verified scenarios: Unit tests pass, format check passes
  • Edge cases checked: Provider normalization preserves openai-codex ID correctly
  • What you did not verify: Live audio transcription with real OAuth credentials

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

None

Changed files

  • .agents/skills/openclaw-parallels-smoke/SKILL.md (modified, +6/-0)
  • .github/workflows/ci-bun.yml (modified, +3/-1)
  • .github/workflows/docker-release.yml (modified, +0/-2)
  • AGENTS.md (modified, +8/-4)
  • CHANGELOG.md (modified, +39/-0)
  • apps/android/app/build.gradle.kts (modified, +2/-2)
  • apps/ios/Config/Version.xcconfig (modified, +3/-3)
  • apps/ios/README.md (modified, +3/-3)
  • apps/macos/Sources/OpenClaw/Resources/Info.plist (modified, +2/-2)
  • apps/macos/Sources/OpenClaw/TalkModeRuntime.swift (modified, +30/-10)
  • apps/macos/Tests/OpenClawIPCTests/TalkModeRuntimeSpeechTests.swift (modified, +9/-0)
  • apps/shared/OpenClawKit/Sources/OpenClawKit/TalkSystemSpeechSynthesizer.swift (modified, +34/-3)
  • apps/shared/OpenClawKit/Tests/OpenClawKitTests/TalkSystemSpeechSynthesizerTests.swift (added, +44/-0)
  • docs/.generated/config-baseline.json (modified, +470/-1471)
  • docs/.generated/config-baseline.jsonl (modified, +71/-184)
  • docs/.generated/plugin-sdk-api-baseline.json (modified, +330/-141)
  • docs/.generated/plugin-sdk-api-baseline.jsonl (modified, +136/-115)
  • docs/channels/bluebubbles.md (modified, +2/-1)
  • docs/channels/googlechat.md (modified, +1/-0)
  • docs/channels/msteams.md (modified, +2/-1)
  • docs/cli/gateway.md (modified, +2/-1)
  • docs/cli/index.md (modified, +9/-1)
  • docs/cli/models.md (modified, +9/-0)
  • docs/concepts/memory.md (modified, +7/-3)
  • docs/concepts/oauth.md (modified, +32/-3)
  • docs/gateway/authentication.md (modified, +20/-0)
  • docs/gateway/cli-backends.md (modified, +32/-7)
  • docs/help/faq.md (modified, +8/-3)
  • docs/help/testing.md (modified, +34/-9)
  • docs/install/development-channels.md (modified, +3/-3)
  • docs/plugins/architecture.md (modified, +9/-8)
  • docs/plugins/building-plugins.md (modified, +14/-13)
  • docs/plugins/manifest.md (modified, +39/-2)
  • docs/plugins/sdk-overview.md (modified, +31/-0)
  • docs/providers/anthropic.md (modified, +119/-3)
  • docs/reference/memory-config.md (modified, +14/-7)
  • docs/reference/secretref-credential-surface.md (modified, +3/-6)
  • docs/reference/secretref-user-supplied-credentials-matrix.json (modified, +6/-27)
  • docs/reference/test.md (modified, +1/-1)
  • docs/reference/wizard.md (modified, +1/-1)
  • docs/start/wizard-cli-reference.md (modified, +4/-1)
  • docs/tools/acp-agents.md (modified, +29/-5)
  • docs/tools/apply-patch.md (modified, +3/-2)
  • docs/tools/browser.md (modified, +38/-0)
  • docs/tools/exec.md (modified, +6/-4)
  • docs/tools/plugin.md (modified, +1/-0)
  • docs/tools/tts.md (modified, +56/-52)
  • docs/tts.md (modified, +56/-52)
  • extensions/acpx/openclaw.plugin.json (modified, +7/-3)
  • extensions/acpx/package.json (modified, +1/-1)
  • extensions/acpx/skills/acp-router/SKILL.md (modified, +38/-13)
  • extensions/acpx/src/config.test.ts (modified, +8/-0)
  • extensions/acpx/src/config.ts (modified, +64/-208)
  • extensions/acpx/src/runtime-internals/events.ts (modified, +18/-26)
  • extensions/acpx/src/runtime-internals/mcp-agent-command.test.ts (modified, +42/-0)
  • extensions/acpx/src/runtime-internals/mcp-agent-command.ts (modified, +16/-6)
  • extensions/acpx/src/runtime.test.ts (modified, +25/-0)
  • extensions/acpx/src/runtime.ts (modified, +3/-0)
  • extensions/amazon-bedrock/package.json (modified, +1/-1)
  • extensions/anthropic/cli-backend.ts (added, +59/-0)
  • extensions/anthropic/cli-migration.test.ts (added, +82/-0)
  • extensions/anthropic/cli-migration.ts (added, +131/-0)
  • extensions/anthropic/cli-shared.ts (added, +84/-0)
  • extensions/anthropic/index.ts (modified, +85/-2)
  • extensions/anthropic/openclaw.plugin.json (modified, +16/-2)
  • extensions/anthropic/package.json (modified, +1/-1)
  • extensions/bluebubbles/channel-config-api.ts (added, +1/-0)
  • extensions/bluebubbles/package.json (modified, +3/-6)
  • extensions/bluebubbles/src/actions.test.ts (modified, +33/-1)
  • extensions/bluebubbles/src/actions.ts (modified, +12/-6)
  • extensions/bluebubbles/src/channel-shared.ts (modified, +2/-3)
  • extensions/bluebubbles/src/config-schema.ts (modified, +7/-1)
  • extensions/bluebubbles/src/config-ui-hints.ts (added, +12/-0)
  • extensions/bluebubbles/src/monitor-processing.ts (modified, +13/-0)
  • extensions/bluebubbles/src/monitor.test.ts (modified, +47/-0)
  • extensions/bluebubbles/src/send.test.ts (modified, +4/-23)
  • extensions/bluebubbles/src/setup-core.ts (modified, +15/-12)
  • extensions/bluebubbles/src/test-harness.ts (modified, +21/-13)
  • extensions/brave/openclaw.plugin.json (modified, +3/-0)
  • extensions/brave/package.json (modified, +1/-1)
  • extensions/brave/web-search-provider.ts (added, +1/-0)
  • extensions/browser/index.test.ts (added, +90/-0)
  • extensions/browser/index.ts (added, +28/-0)
  • extensions/browser/openclaw.plugin.json (added, +9/-0)
  • extensions/browser/package.json (added, +12/-0)
  • extensions/browser/runtime-api.ts (added, +10/-0)
  • extensions/browser/src/browser-runtime.ts (added, +87/-0)
  • extensions/browser/src/browser-tool.actions.ts (renamed, +14/-8)
  • extensions/browser/src/browser-tool.schema.ts (renamed, +1/-1)
  • extensions/browser/src/browser-tool.test.ts (renamed, +15/-11)
  • extensions/browser/src/browser-tool.ts (renamed, +27/-27)
  • extensions/browser/src/browser/bridge-auth-registry.ts (renamed, +0/-0)
  • extensions/browser/src/browser/bridge-server.auth.test.ts (renamed, +0/-0)
  • extensions/browser/src/browser/bridge-server.ts (renamed, +0/-0)
  • extensions/browser/src/browser/browser-utils.test.ts (renamed, +0/-0)
  • extensions/browser/src/browser/cdp-proxy-bypass.test.ts (renamed, +0/-0)
  • extensions/browser/src/browser/cdp-proxy-bypass.ts (renamed, +0/-0)
  • extensions/browser/src/browser/cdp-timeouts.test.ts (renamed, +0/-0)
  • extensions/browser/src/browser/cdp-timeouts.ts (renamed, +0/-0)
  • extensions/browser/src/browser/cdp.helpers.ts (renamed, +0/-0)

PR #55788: Fix/OpenAI codex audio media understanding

Description (problem / solution / changelog)

Identical to the description of PR #55323 above: adds the audio capability and transcribeAudio handler to the openai-codex provider (Closes #55237, Related #55052).

Changed files

  • extensions/openai/index.test.ts (modified, +2/-1)
  • extensions/openai/media-understanding-provider.ts (modified, +9/-1)
  • src/media-understanding/defaults.ts (modified, +1/-0)

PR #61143: fix(signal): resolve contentType for voice notes when signal-cli omits it

Description (problem / solution / changelog)

Summary

  • Problem: Signal voice notes are saved to disk but the transcription pipeline never runs. signal-cli on Linux omits contentType on voice note attachments, leaving saveMediaBuffer unable to classify the audio (fileTypeFromBuffer fails on ADTS AAC, no filePath fallback). The file is saved without extension, MediaTypes falls back to application/octet-stream, isAudioAttachment() returns false, and selectAttachments exits with outcome: no-attachment — silently.
  • Why it matters: Completely blocks voice memo transcription on Signal for Linux deployments. tools.media.audio is configured but never triggers.
  • What changed: In fetchAttachment, run detectMime({ buffer, filePath: filename }) before calling saveMediaBuffer so the extension-based MIME lookup resolves audio/aac from the attachment filename. Also forward attachment.filename as originalFilename so saved files preserve the original extension on disk.
  • What did NOT change: No core media pipeline changes. No other channel adapters touched. No config schema changes.
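The failure mode and the fix described in this summary can be sketched as below. This is a simplified illustration under stated assumptions: the lookup order (header MIME, then buffer sniffing, then file extension), the two-entry MIME_BY_EXT table, and the toy sniffMime function are stand-ins, not the real detectMime or fileTypeFromBuffer implementations.

```typescript
// Illustrative extension table; the real MIME_BY_EXT is larger.
const MIME_BY_EXT: Record<string, string> = {
  ".aac": "audio/aac",
  ".ogg": "audio/ogg",
};

// Works on bare filenames ("voice.aac") as well as full paths.
function getFileExtension(filePath?: string): string | undefined {
  if (!filePath) return undefined;
  const base = filePath.slice(filePath.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot).toLowerCase() : undefined;
}

// Toy sniffer: recognizes the Ogg magic bytes but, like the behavior
// described in the PR, does not recognize ADTS-framed AAC.
function sniffMime(buffer: Uint8Array): string | undefined {
  const isOgg =
    buffer.length >= 4 &&
    buffer[0] === 0x4f && buffer[1] === 0x67 &&
    buffer[2] === 0x67 && buffer[3] === 0x53;
  return isOgg ? "audio/ogg" : undefined;
}

// Assumed order: explicit header, then content sniffing, then extension.
function detectMime(opts: {
  buffer: Uint8Array;
  headerMime?: string;
  filePath?: string;
}): string | undefined {
  const ext = getFileExtension(opts.filePath);
  return (
    opts.headerMime ??
    sniffMime(opts.buffer) ??
    (ext ? MIME_BY_EXT[ext] : undefined)
  );
}

function isAudioAttachment(mime?: string): boolean {
  return (mime ?? "application/octet-stream").startsWith("audio/");
}
```

With an ADTS-style buffer and no contentType, detectMime returns undefined unless a filename is supplied, which is the gap the PR closes by passing attachment.filename as filePath.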

AI-assisted: This fix was developed with Claude Code (Opus 4.6). Fully tested — see evidence below.

Dependency: This fix is necessary but not sufficient for end-to-end Signal voice transcription. OpenAI rejects .aac file extensions (returns 400 "Unsupported file format aac"), which is addressed by #61094. Both PRs must land for transcription to work. We verified the full chain locally with both fixes applied — see evidence.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #48614
  • Related #61094 (.aac → .m4a remap, required for OpenAI provider acceptance)
  • Related #60421 (transcription errors silently swallowed at default log level)
  • Related #56010, #55052 (similar symptoms on Telegram)
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: fetchAttachment in extensions/signal/src/monitor.ts passes attachment.contentType ?? undefined to saveMediaBuffer. When signal-cli omits contentType (observed on Linux with signal-cli 0.14.1), saveMediaBuffer calls detectMime({ buffer, headerMime: undefined }) — no filePath, so the extension-based MIME fallback is impossible. fileTypeFromBuffer cannot detect ADTS-format AAC. Result: mime = undefined, file saved as bare UUID without extension, ctx.MediaTypes = ["application/octet-stream"], isAudioAttachment() returns false.
  • Missing detection / guardrail: No fallback to attachment.filename for MIME resolution. Matrix got this fix in v2026.3.28 (#55692 — forwarding originalFilename to saveMediaBuffer), but Signal was missed.
  • Contributing context: signal-cli on macOS consistently provides contentType: "audio/aac" on voice note attachments; Linux signal-cli 0.14.1 sometimes omits it. The SignalAttachment type already includes filename?: string but it was never used.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/signal/src/monitor/event-handler.inbound-context.test.ts
  • Scenario the test should lock in:
    1. Event handler threads resolved audio/aac contentType into MsgContext when fetchAttachment returns it (integration wiring test)
    2. detectMime({ buffer, filePath: "voice.aac" }) returns "audio/aac" when buffer sniffing fails (core mechanism test — proves bare filename is sufficient for getFileExtension/MIME_BY_EXT lookup)
  • Why this is the smallest reliable guardrail: fetchAttachment is private, so we test the two halves separately — the event handler's contentType threading (existing test pattern) and the detectMime filename-based resolution (new direct test).
  • Existing test that already covers this (if any): The existing "forwards all fetched attachments via MediaPaths/MediaTypes" test at line 255 covers the happy path but explicitly expects "application/octet-stream" for attachments without contentType — confirming the bug was baked into test expectations.

Evidence

Unit tests:

pnpm test:extension signal — 19 files, 169/169 passed
pnpm check — clean (lint, format, typecheck)
pnpm build — clean

Live end-to-end verification (with #61094 applied on test branch):

  • Built v2026.4.2 + this fix + #61094's .aac → .m4a remap
  • Started gateway from fork against production workspace (~/.openclaw/)
  • Sent Signal voice note from phone
  • OpenClaw transcribed: "Signal audio transcription test with Ben Z's fix. Time is 10:15 p.m."
  • echoTranscript delivered transcript back to Signal chat

curl verification of the .aac rejection (why #61094 is needed):

$ curl https://api.openai.com/v1/audio/transcriptions -F file="@voice.aac" -F model="gpt-4o-mini-transcribe"
→ 400: "Unsupported file format aac"

$ curl https://api.openai.com/v1/audio/transcriptions -F "file=@voice.aac;filename=voice.m4a" -F model="gpt-4o-mini-transcribe"
→ 200: {"text": "Signal audio transcription test..."}
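The two curl calls suggest that only the upload filename needs to change, not the audio bytes. A minimal sketch of such a remap is shown here; the function name remapAacFileName is hypothetical and is not taken from #61094, whose actual implementation is not shown on this page.

```typescript
// Hypothetical helper: present .aac uploads under an .m4a filename so the
// transcription endpoint accepts them. All other names pass through unchanged.
function remapAacFileName(fileName: string): string {
  return fileName.replace(/\.aac$/i, ".m4a");
}
```

For example, remapAacFileName("voice.aac") yields "voice.m4a", while "voice.ogg" is returned as-is.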

Human Verification (required)

  • Verified scenarios: Full Signal voice note → transcription → echo reply chain on Linux (Ubuntu, signal-cli 0.14.1, OpenAI gpt-4o-mini-transcribe). Sent 3 voice notes across test runs.
  • Edge cases checked: Attachment with contentType: undefined + filename: "voice.aac" (primary fix path). Verified detectMime resolves correctly with bare filename (no full path).
  • What I did not verify: Voice notes where both contentType AND filename are missing (falls back to pre-fix behavior — application/octet-stream). Other channels (Telegram, WhatsApp, Discord).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: detectMime called with bare filename instead of full path — getFileExtension uses path.extname which handles bare filenames correctly, verified by unit test.
    • Mitigation: Direct unit test for detectMime({ buffer, filePath: "voice.aac" }) → "audio/aac".
  • Risk: When both contentType and filename are missing, behavior is unchanged (falls through to undefined). No regression, but also no improvement for that edge case.
    • Mitigation: Documented in PR; would require adding voiceNote?: boolean to SignalAttachment type as a future enhancement.

Changed files

  • extensions/signal/src/monitor.ts (modified, +12/-2)
  • extensions/signal/src/monitor/event-handler.inbound-context.test.ts (modified, +46/-0)

Raw issue body

Summary

Telegram voice messages are received and downloaded successfully, but the applyMediaUnderstanding audio transcription pipeline is never invoked. The agent receives <media:audio> with the raw .ogg file attached but no automatic transcription occurs.

This affects all channels (confirmed on both Telegram forum topics and WhatsApp groups), not just Telegram.

Environment

  • OpenClaw: 2026.3.24 (also confirmed on 2026.3.23-2)
  • OS: macOS 15.3 (arm64), Mac mini
  • Install: pnpm (global)
  • Node: v24.14.0
  • Telegram: forum supergroup (topic 1 / General)

Config (correct per docs)

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "provider": "openai",
            "model": "gpt-4o-mini-transcribe",
            "language": "yue"
          }
        ]
      }
    }
  }
}
  • OpenAI API key is available to the gateway service (confirmed in launchd plist + openclaw models status)
  • OpenAI provider has audio capability registered (verified in dist bundle)
  • Manual curl to OpenAI transcription API with the same key + same .ogg file succeeds perfectly
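To make the shape of this config explicit, here is a small typed sketch. The interface names and the resolveAudioModels helper are assumptions for illustration; only the key path tools.media.audio and the field names enabled, models, provider, model, and language come from the config above.

```typescript
// Hypothetical types mirroring only the keys shown in the issue's config.
interface AudioModelConfig {
  provider: string;
  model: string;
  language?: string;
}

interface ConfigSketch {
  tools?: {
    media?: {
      audio?: { enabled?: boolean; models?: AudioModelConfig[] };
    };
  };
}

// Returns the models that should be tried for transcription, or an empty
// list when audio understanding is disabled or not configured.
function resolveAudioModels(config: ConfigSketch): AudioModelConfig[] {
  const audio = config.tools?.media?.audio;
  if (!audio?.enabled) return [];
  return audio.models ?? [];
}

const issueConfig: ConfigSketch = {
  tools: {
    media: {
      audio: {
        enabled: true,
        models: [
          { provider: "openai", model: "gpt-4o-mini-transcribe", language: "yue" },
        ],
      },
    },
  },
};
```

Under this sketch the issue's config resolves to exactly one model, so a missing transcription would point at the invocation path rather than at the configuration.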

Steps to Reproduce

  1. Send a voice message to a Telegram forum topic (or WhatsApp group)
  2. Voice file downloads successfully to ~/.openclaw/media/inbound/
  3. Agent receives <media:audio> as body text
  4. No transcription occurs — no {{Transcript}} set, no [Audio] block replacement

Expected Behavior

applyMediaUnderstandingIfNeeded() should detect MediaPath is set, invoke runCapability("audio"), and transcribe using the configured OpenAI model.

Actual Behavior

  • Gateway log shows zero audio/transcription/media-understanding entries around voice message receipt
  • Even with --verbose flag and OPENCLAW_DEBUG_TELEGRAM_INGRESS=1, no Telegram inbound voice processing log appears
  • WhatsApp shows inbound audio log ([whatsapp] Inbound message ... audio/ogg; codecs=opus) but also no transcription log
  • The applyMediaUnderstanding function is simply never called

Code Path Analysis

Traced through the minified dist bundle:

  1. Telegram handler: resolveMediaFileRef(msg) correctly includes msg.voice → file downloads OK
  2. Context building: buildTelegramInboundContextPayload sets MediaPath, MediaType, MediaPaths from allMedia
  3. Dispatch: dispatchTelegramMessage → dispatchReplyWithBufferedBlockDispatcher → should reach getReplyFromConfig
  4. Media understanding gate: getReplyFromConfig calls applyMediaUnderstandingIfNeeded() which checks hasInboundMedia(ctx) — this should return true since MediaPath is set
  5. But: the audio transcription pipeline never executes. No log output at all, even in verbose mode.

Possibly Related

  • 2026.3.22 changelog: "Agents/inbound: lazy-load media and link understanding for plain-text turns" — this optimization may incorrectly classify voice-only messages (no text body) as "plain-text turns" and skip media understanding
  • GitHub issue #7899 (similar report from 2026-02-03)
  • GitHub issue #14374 (feature request, 2026-02-12)
  • Discord reports from 2026-02-15 and 2026-03-05
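The first hypothesis in this list, that a lazy-load gate treats voice-only messages as plain-text turns, can be illustrated with a toy example. Everything here is hypothetical: the function names, the link regex, and the idea that the gate inspects only the body are assumptions used to show how such a misclassification could arise, not a description of the actual OpenClaw code.

```typescript
// Field names follow the issue text; the shape itself is an assumption.
interface InboundCtx {
  Body?: string;
  MediaPath?: string;
  MediaPaths?: string[];
  MediaType?: string;
}

const LINK_RE = /https?:\/\//;

// Suspect gate (hypothetical): looks only at the text body, so a voice
// message whose body is the "<media:audio>" placeholder counts as plain text.
function isPlainTextTurnBodyOnly(ctx: InboundCtx): boolean {
  return !LINK_RE.test(ctx.Body ?? "");
}

// Safer gate (hypothetical): a turn with any attached media is never plain text.
function isPlainTextTurn(ctx: InboundCtx): boolean {
  const hasMedia =
    Boolean(ctx.MediaPath) || (ctx.MediaPaths?.length ?? 0) > 0;
  return !hasMedia && !LINK_RE.test(ctx.Body ?? "");
}

const voiceOnly: InboundCtx = {
  Body: "<media:audio>",
  MediaPath: "/tmp/voice.ogg",
  MediaType: "audio/ogg; codecs=opus",
};
```

For the voiceOnly context, the body-only gate reports a plain-text turn (media understanding would be skipped), while the media-aware gate does not.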

Workaround

Agent manually calls OpenAI transcription API via curl for each voice message.
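For reference, the same manual call can be expressed in TypeScript. This is a minimal sketch assuming a runtime with global fetch, FormData, and Blob (for example recent Node versions); the function names are made up for this example, and the model name is the one from the issue's config.

```typescript
// Builds the multipart form for POST /v1/audio/transcriptions.
function buildTranscriptionForm(
  audio: Blob,
  fileName: string,
  model: string,
  language?: string,
): FormData {
  const form = new FormData();
  form.append("file", audio, fileName);
  form.append("model", model);
  if (language) form.append("language", language);
  return form;
}

// Sends the audio to the OpenAI transcription endpoint and returns the text.
async function transcribeVoiceMessage(
  audio: Blob,
  fileName: string,
  apiKey: string,
): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/audio/transcriptions", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio, fileName, "gpt-4o-mini-transcribe"),
  });
  if (!res.ok) {
    throw new Error(`Transcription failed: ${res.status} ${await res.text()}`);
  }
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Keeping form construction separate from the network call makes the request shape easy to inspect without contacting the API.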

extent analysis

Fix Plan

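To address the applyMediaUnderstanding audio transcription pipeline not being invoked, one plausible change is to make applyMediaUnderstandingIfNeeded handle voice messages that have no text body. This plan is derived from the code-path analysis above and has not been confirmed against the OpenClaw source.

Here are the steps:

  • Make hasInboundMedia return true whenever MediaPath (or MediaPaths) is set, regardless of whether the message has a text body.
  • Have applyMediaUnderstandingIfNeeded call runCapability("audio") when hasInboundMedia returns true and the media is audio. MediaType carries a MIME type such as audio/ogg; codecs=opus, so match on the audio/ prefix.
  • Add logging to verify that the audio transcription pipeline is being invoked.

Code Changes

// Sketch only: names follow the issue's code-path trace; real signatures may differ.

// Update hasInboundMedia function
function hasInboundMedia(ctx) {
  // True for any attached media, even when the text body is empty.
  return Boolean(ctx.MediaPath) || (ctx.MediaPaths?.length ?? 0) > 0;
}

// Modify applyMediaUnderstandingIfNeeded function
// runCapability is passed in so the sketch stays self-contained.
async function applyMediaUnderstandingIfNeeded(ctx, runCapability) {
  if (!hasInboundMedia(ctx)) return;
  // MediaType is a MIME type (e.g. "audio/ogg; codecs=opus"), so test the prefix.
  const types = ctx.MediaTypes ?? [ctx.MediaType];
  if (types.some((t) => t?.startsWith("audio/"))) {
    console.log("Invoking audio transcription pipeline...");
    await runCapability("audio", ctx);
  }
}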

Verification

To verify that the fix worked, send a voice message to a Telegram forum topic or WhatsApp group and check the gateway log for audio/transcription/media-understanding entries. The log should now show that the audio transcription pipeline is being invoked.
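The log check described here can be automated with a small filter. This is a rough sketch: the keyword list is taken from the sentence above, the function name is hypothetical, and no particular OpenClaw log format is assumed beyond one entry per line.

```typescript
// Keywords the issue looked for in the gateway log.
const TRANSCRIPTION_KEYWORDS = ["audio", "transcription", "media-understanding"];

// Returns the log lines that mention any transcription-related keyword.
function findTranscriptionLogLines(logText: string): string[] {
  return logText
    .split(/\r?\n/)
    .filter((line) => {
      const lower = line.toLowerCase();
      return TRANSCRIPTION_KEYWORDS.some((keyword) => lower.includes(keyword));
    });
}
```

An empty result after sending a voice message would match the behavior reported in the issue; any matching lines indicate that at least part of the pipeline ran.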

Extra Tips

  • Make sure to update the openclaw version to the latest release to ensure that the fix is included.
  • If issues persist, try enabling verbose logging to get more detailed output.
  • Consider adding additional logging to the applyMediaUnderstandingIfNeeded function to verify that it is being called correctly.
