openclaw - ✅ (Solved) Fix [Bug]: WhatsApp Audio Transcription Not Working in 2026.4.1 (Regression from 2026.3.7) [2 pull requests, 6 comments, 1 participant]

GitHub stats
openclaw/openclaw#59437 · Fetched 2026-04-08 02:25:11
Comments: 6 · Participants: 1 · Timeline: 11 · Reactions: 0
Timeline (top): commented ×6, cross-referenced ×3, closed ×1, locked ×1

WhatsApp voice messages are received but not automatically transcribed in OpenClaw 2026.4.1, despite correct configuration. This is a regression: audio transcription worked in version 2026.3.7 (around March 7, 2026).

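Root Cause

The reporter initially suspected a change in the media processing pipeline or in the audio detection/routing logic between 2026.3.7 and 2026.4.1. The linked pull requests narrow this down to how bundled media providers such as Groq are resolved:

  • After the plugin activation refactor (f911bbc353), providers referenced only from tools.media are not auto-enabled and fall through to "bundled, disabled by default" (PR #60392).
  • resolvePluginCapabilityProviders returns early whenever the active registry holds any capability provider (for example OpenAI for image understanding), so the bundled-compat load that would inject Groq never runs (PR #59926).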

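Fix Action

Workaround

Transcribe voice notes manually by calling the Groq API with curl, or downgrade to 2026.3.7. Explicitly enabling the bundled plugin with plugins.entries.groq.enabled: true also works around the auto-enable gap; PR #60392 notes that this workaround is no longer needed after its fix.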
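The manual request the reporter makes with curl can be sketched in TypeScript for Node 22 (global fetch, FormData and Blob). This is a minimal sketch, assuming Groq's OpenAI-compatible /audio/transcriptions route under the baseUrl from the reporter's config; buildTranscriptionForm and transcribeFile are hypothetical helper names, not part of OpenClaw.

```typescript
import { readFile } from "node:fs/promises";
import { basename } from "node:path";

// Base URL from the reporter's config; /audio/transcriptions is the
// OpenAI-compatible transcription route that Groq exposes.
const GROQ_BASE_URL = "https://api.groq.com/openai/v1";

// Builds the multipart body for one transcription request (no network I/O).
function buildTranscriptionForm(
  audio: Uint8Array,
  fileName: string,
  model = "whisper-large-v3-turbo",
  language?: string,
): FormData {
  const form = new FormData();
  // WhatsApp voice notes arrive as Ogg/Opus; copy into a fresh Uint8Array so it is a plain BlobPart.
  form.append("file", new Blob([new Uint8Array(audio)], { type: "audio/ogg" }), fileName);
  form.append("model", model);
  if (language) form.append("language", language); // e.g. "pt", as in the reporter's config
  return form;
}

// Sends a local audio file to Groq and returns the transcript text.
async function transcribeFile(path: string, language?: string): Promise<string> {
  const apiKey = process.env.GROQ_API_KEY;
  if (!apiKey) throw new Error("GROQ_API_KEY is not set");
  const audio = await readFile(path);
  const res = await fetch(`${GROQ_BASE_URL}/audio/transcriptions`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: buildTranscriptionForm(audio, basename(path), undefined, language),
  });
  if (!res.ok) throw new Error(`Groq returned ${res.status}: ${await res.text()}`);
  const data = (await res.json()) as { text: string };
  return data.text;
}
```

Keeping form construction separate from the network call makes it easy to check the request shape (model, language, file) without a live GROQ_API_KEY; transcribeFile("voice.ogg", "pt") then performs the actual call and throws with Groq's status and error body on failure.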

PR fix notes

PR #59926: fix(media): always resolve bundled capability providers via compat config when cfg is provided

Description (problem / solution / changelog)

Summary

  • Problem: Groq (and potentially other bundled audio providers like Deepgram) are silently unavailable even when correctly configured via tools.media.audio.models, producing "Media provider not available: groq" on every transcription attempt.
  • Why it matters: Groq Whisper STT is completely broken for any user whose active gateway registry already contains other media understanding providers (e.g. OpenAI for image understanding). The early-return in resolvePluginCapabilityProviders caused the bundled-compat load path to be skipped entirely.
  • Root cause: resolvePluginCapabilityProviders checked activeProviders.length > 0 and returned immediately — but activeProviders is the full active registry across all capability types, not just audio. So if OpenAI was loaded for image, groq was never loaded for audio.
  • What changed: When a caller config is provided, always resolve via the compat-config path (which injects bundled providers like groq/deepgram via withBundledPluginEnablementCompat). Fall back to the active registry only when no cfg is provided or compat load yields nothing.
  • What did NOT change: No config schema changes. No behavior change when no cfg is passed.
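The difference between the old early return and the fixed resolution order can be sketched with a toy registry. Everything here is a hypothetical, simplified model: Config, CATALOG, loadCompatRegistry, resolveBefore and resolveAfter are illustrative names, and loadCompatRegistry only stands in for the compat-config load performed via withBundledPluginEnablementCompat. It is not OpenClaw's implementation.

```typescript
// Toy model only: these types and functions are hypothetical, not OpenClaw's.
type Capability = "audio" | "image";

interface CapabilityProvider {
  id: string;
  capabilities: Capability[];
}

interface Config {
  enabledPlugins: string[]; // plugins loaded into the active gateway registry at startup
  audioProviders: string[]; // provider ids referenced from tools.media.audio.models
}

const CATALOG: Record<string, CapabilityProvider> = {
  openai: { id: "openai", capabilities: ["image"] },
  groq: { id: "groq", capabilities: ["audio"] },
  deepgram: { id: "deepgram", capabilities: ["audio"] },
};

// Stand-in for the compat-config load: enabled plugins plus any bundled
// providers that the config references.
function loadCompatRegistry(cfg: Config): CapabilityProvider[] {
  const ids = new Set([...cfg.enabledPlugins, ...cfg.audioProviders]);
  const out: CapabilityProvider[] = [];
  Array.from(ids).forEach((id) => {
    const provider = CATALOG[id];
    if (provider) out.push(provider);
  });
  return out;
}

// Before: any non-empty active registry short-circuits, whatever capability is needed.
function resolveBefore(active: CapabilityProvider[], cfg?: Config): CapabilityProvider[] {
  if (active.length > 0) return active; // compat path never runs
  return cfg ? loadCompatRegistry(cfg) : [];
}

// After: with a cfg, take the compat path; fall back to the active registry
// only when there is no cfg or the compat load yields nothing.
function resolveAfter(active: CapabilityProvider[], cfg?: Config): CapabilityProvider[] {
  if (cfg) {
    const compat = loadCompatRegistry(cfg);
    if (compat.length > 0) return compat;
  }
  return active;
}

function hasProvider(list: CapabilityProvider[], id: string, cap: Capability): boolean {
  return list.some((p) => p.id === id && p.capabilities.includes(cap));
}
```

With OpenAI already active for image understanding and Groq referenced for audio, resolveBefore returns only the active registry, so a lookup for groq fails (the "Media provider not available: groq" case), while resolveAfter returns both providers. This mirrors the scenario described in the Regression Test Plan.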

Change Type

  • Bug fix

Scope

  • Integrations

Linked Issue/PR

  • Closes #59875
  • Related #59502, #59437
  • This PR fixes a bug or regression

Root Cause / Regression History

  • Root cause: resolvePluginCapabilityProviders line 72 if (activeProviders.length > 0) return — early return skips the compat config load that injects bundled capability providers.
  • Missing detection / guardrail: No test covered the case where the active registry has providers for capability A (image) but the request is for capability B (audio) with a bundled provider not in the startup registry.
  • Prior context: The early return was introduced to prefer the active gateway registry when already populated; this optimization broke the fallback for capability-specific providers.
  • Why this regressed now: Commit 3dbd81e610 restored bundled compat loading in resolveCapabilityProviderConfig but left the early-return guard intact, so the compat path was only reached when the active registry was completely empty.

Regression Test Plan

  • Coverage level: Unit test
  • Target test: src/plugins/capability-provider-runtime.test.ts (if it exists) or the src/media-understanding/runner.ts test suite
  • Scenario: active registry contains openai (image), user requests groq (audio) — groq must be returned
  • Existing test: extensions/groq/plugin-registration.contract.test.ts covers registration; gap is in resolvePluginCapabilityProviders multi-provider scenario

User-visible / Behavior Changes

Groq Whisper audio transcription now works when tools.media.audio.models: [{"provider": "groq", "model": "whisper-large-v3-turbo"}] is configured and GROQ_API_KEY is set, even when other media providers (OpenAI, Google) are already loaded in the gateway registry.

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No (Groq was already called when the active registry was empty)
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OpenClaw 2026.4.1, Amazon Linux 2023
  • GROQ_API_KEY set, tools.media.audio.enabled: true

Steps

  1. Set tools.media.audio.models: [{"provider": "groq", "model": "whisper-large-v3-turbo"}]
  2. Enable OpenAI for image understanding (or any other media provider)
  3. Send voice note via WhatsApp/Telegram
  4. Before: audio understanding failed: Error: Media provider not available: groq
  5. After: Groq transcribes successfully

Expected

Groq is loaded and audio is transcribed

Actual (before fix)

Media provider not available: groq — provider silently missing from registry

Evidence

  • Issue #59875 reports exact error: "audio understanding failed: Error: Media provider not available: groq"
  • Code trace: resolvePluginCapabilityProviders → activeProviders.length > 0 early return → compat load skipped → groq not in registry → line 521 in runner.entries.ts throws

Human Verification

  • Traced the code path manually through resolvePluginCapabilityProviders → resolveRuntimePluginRegistry → resolveCapabilityProviderConfig → withBundledPluginEnablementCompat
  • Confirmed the early-return bypasses compat loading when activeProviders.length > 0
  • Did NOT verify in a live environment with GROQ_API_KEY (no live Groq key available)

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.

Compatibility / Migration

  • Backward compatible? Yes — behavior only changes when cfg is provided and compat load succeeds
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: compat load is slightly heavier than the active registry path
    • Mitigation: resolveRuntimePluginRegistry uses getCompatibleActivePluginRegistry cache internally; most calls will hit cache and be cheap

AI-assisted: Prepared with Claude via OpenClaw. Root cause identified by code trace through capability-provider-runtime.ts, bundled-compat.ts, and runner.entries.ts. Verified by contributor.

Changed files

  • docs/channels/slack.md (modified, +34/-1)
  • src/plugins/capability-provider-runtime.ts (modified, +20/-7)

PR #60392: fix(plugins): auto-enable media provider plugins referenced from tools.media

Description (problem / solution / changelog)

Problem

After the plugin activation refactor (f911bbc353), bundled media provider plugins like Groq silently stop working even when:

  • GROQ_API_KEY is set in env
  • tools.media.audio.models explicitly references provider: groq

The auto-enable logic handles channel plugins and web search/fetch providers, but never checks tools.media config for media understanding providers. Groq (and likely Deepgram) fall through to "bundled, disabled by default."

Multiple users affected since 2026.3.31 / 2026.4.x.

Fixes #59875, fixes #59502, fixes #54695, fixes #59437

Solution

  • Add collectConfiguredMediaProviderIds() to scan tools.media.models and tools.media.{audio,image,video}.models entries for provider references
  • Add resolveMediaUnderstandingProviderPluginIds() to map provider IDs → plugin IDs (bundled snapshots + third-party manifests via mediaUnderstandingProviders contract)
  • Wire both into resolveConfiguredPlugins() so referenced media providers get auto-enabled
  • Update configMayNeedPluginAutoEnable() and configMayNeedPluginManifestRegistry() guards
  • Add "media-provider-configured" kind to PluginAutoEnableCandidate union type
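The auto-enable scan can be sketched as follows. This is a hypothetical version of collectConfiguredMediaProviderIds() written from the PR description, not the merged code: the config interfaces cover only the fields named in this issue, treating tools.media.models plus tools.media.{audio,image,video}.models as the scanned lists is an assumption, and the providerToPlugin map stands in for resolveMediaUnderstandingProviderPluginIds().

```typescript
// Hypothetical config slice: only the tools.media fields named in this issue.
interface MediaModelEntry {
  provider?: string;
  model?: string;
}

interface MediaSection {
  enabled?: boolean;
  models?: MediaModelEntry[];
}

interface OpenClawConfig {
  tools?: {
    media?: {
      models?: MediaModelEntry[];
      audio?: MediaSection;
      image?: MediaSection;
      video?: MediaSection;
    };
  };
}

interface MediaProviderCandidate {
  kind: "media-provider-configured";
  pluginId: string;
  providerId: string;
}

// Collects every provider id referenced from tools.media (shared and per-capability lists).
function collectConfiguredMediaProviderIds(cfg: OpenClawConfig): Set<string> {
  const ids = new Set<string>();
  const media = cfg.tools?.media;
  if (!media) return ids;
  const lists = [media.models, media.audio?.models, media.image?.models, media.video?.models];
  for (const list of lists) {
    for (const entry of list ?? []) {
      const id = entry.provider?.trim();
      if (id) ids.add(id);
    }
  }
  return ids;
}

// Maps referenced provider ids to plugins that should be auto-enabled.
function mediaAutoEnableCandidates(
  cfg: OpenClawConfig,
  providerToPlugin: Map<string, string>,
): MediaProviderCandidate[] {
  const out: MediaProviderCandidate[] = [];
  collectConfiguredMediaProviderIds(cfg).forEach((providerId) => {
    const pluginId = providerToPlugin.get(providerId);
    if (pluginId) out.push({ kind: "media-provider-configured", pluginId, providerId });
  });
  return out;
}
```

In this sketch, the reporter's config (Groq under tools.media.audio.models) yields a single "media-provider-configured" candidate for the groq plugin, which is the signal the old auto-enable logic never produced.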

Review feedback addressed (v2)

  • P1 (greptile): Normalize bundled provider IDs via normalizeMediaProviderId() before map insertion to avoid alias mismatches (e.g. "gemini" → "google")
  • P1 (codex): Move media provider check outside the channels-only early return in configMayNeedPluginManifestRegistry so third-party media plugins can resolve without channel config
  • P2 (greptile): Skip registry loading for bundled-only media providers via isBundledMediaProviderId() gate
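The two greptile fixes (alias normalization before map insertion and the bundled-only gate) can be sketched like this. The alias table holds only the "gemini" → "google" example from the review, and the bundled provider set is an illustrative assumption; neither reflects OpenClaw's actual lists, and buildBundledProviderMap and needsManifestRegistry are hypothetical names.

```typescript
// Illustrative alias table: only the "gemini" -> "google" example from the review.
const PROVIDER_ALIASES: Record<string, string> = { gemini: "google" };

// Illustrative set of bundled media provider ids (an assumption, not OpenClaw's list).
const BUNDLED_MEDIA_PROVIDER_IDS = new Set<string>(["groq", "deepgram", "google"]);

// Canonicalizes a provider id so aliases and casing map to one key.
function normalizeMediaProviderId(id: string): string {
  const key = id.trim().toLowerCase();
  return PROVIDER_ALIASES[key] ?? key;
}

function isBundledMediaProviderId(id: string): boolean {
  return BUNDLED_MEDIA_PROVIDER_IDS.has(normalizeMediaProviderId(id));
}

// Normalizes keys before insertion so "gemini" and "google" share one entry.
function buildBundledProviderMap(entries: Array<[string, string]>): Map<string, string> {
  const map = new Map<string, string>();
  for (const [providerId, pluginId] of entries) {
    map.set(normalizeMediaProviderId(providerId), pluginId);
  }
  return map;
}

// Only third-party (non-bundled) providers require loading the manifest registry.
function needsManifestRegistry(providerIds: string[]): boolean {
  return providerIds.some((id) => !isBundledMediaProviderId(id));
}
```

Normalizing before insertion means a config that says "gemini" and a bundled snapshot keyed by "google" resolve to the same map entry, and a config that references only bundled providers never triggers the heavier manifest-registry load.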

Testing

  • Added tests for bundled (Groq) and third-party media provider plugin auto-enable
  • Manually verified: Groq audio transcription works with only tools.media.audio.models: [{ provider: "groq" }] — no plugins.entries.groq.enabled: true workaround needed
  • Existing tests pass: vitest run src/config/plugin-auto-enable.test.ts && vitest run src/plugins/providers.test.ts

AI-assisted

This PR was authored with AI assistance (Claude Opus 4.6). Fully tested locally (unit tests + manual Groq transcription verification on a dev OpenClaw instance). We understand what the code does — it extends the existing plugin auto-enable pattern to cover media understanding providers the same way it already covers channels, browser, and web search/fetch providers.

Changed files

  • src/config/plugin-auto-enable.test.ts (modified, +64/-1)
  • src/config/plugin-auto-enable.ts (modified, +95/-9)

Original Issue


Environment

  • OpenClaw version: 2026.4.1 (da64a97)
  • Channel: WhatsApp (Baileys)
  • OS: Linux (Oracle VPS)
  • Node: v22.22.0

Configuration

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "maxBytes": 20971520,
        "models": [
          {
            "provider": "groq",
            "model": "whisper-large-v3-turbo",
            "capabilities": ["audio"],
            "type": "provider",
            "maxBytes": 20971520,
            "timeoutSeconds": 60,
            "language": "pt",
            "baseUrl": "https://api.groq.com/openai/v1"
          }
        ]
      }
    }
  }
}

Groq API key is configured and working (manual transcription via curl succeeds).

Steps to Reproduce

  1. Configure tools.media.audio with Groq Whisper
  2. Restart gateway
  3. Send a voice message via WhatsApp
  4. Observe the logs: no transcription attempt is logged
  5. Audio appears as [media:audio] placeholder

Expected Behavior

  • Audio file should be transcribed automatically
  • Transcript should replace or annotate the [media:audio] placeholder
  • Logs should show transcription attempt and result

Actual Behavior

  • Audio message received: [whatsapp] Inbound message ... (direct, audio/ogg; codecs=opus, 69 chars)
  • No transcription attempt logged
  • Agent receives [media:audio] placeholder without transcript

Regression Timeline

  • 2026.3.2: Added MIME normalization for WhatsApp voice notes
  • 2026.3.7: User reports transcription was working
  • 2026.4.1: Transcription no longer working

Related Issues

  • #13924 - WhatsApp voice messages MIME type issues
  • #14374 - Feature request for automatic transcription
  • #17101 - Telegram voice messages not transcribed
  • #7573 - Audio models ignored, fall back to active model


extent analysis

TL;DR

Downgrade to OpenClaw version 2026.3.7 as a temporary workaround to restore automatic transcription of WhatsApp voice messages.

Guidance

  • Review the media processing pipeline and audio detection/routing logic changes between versions 2026.3.7 and 2026.4.1 to identify potential causes of the regression.
  • Verify that the Groq API key and configuration are correct and functional by testing manual transcription via curl.
  • Check the logs for any errors or warnings related to audio processing or transcription to gather more information about the issue.
  • Consider testing with a different audio model or provider to isolate the issue.

Example

No code snippet is provided as the issue is related to configuration and versioning.

Notes

The root cause of the issue is unclear, but it appears to be related to changes between OpenClaw versions 2026.3.7 and 2026.4.1. Downgrading to version 2026.3.7 may restore functionality, but it is not a permanent solution.

Recommendation

Apply the workaround by downgrading to OpenClaw version 2026.3.7, as it is the last known version where automatic transcription was working. This will allow for temporary restoration of functionality while the root cause is investigated and a permanent fix is developed.
