openclaw - ✅(Solved) Fix v2026.4.5: Telegram voice messages no longer transcribed (STT regression) [1 pull request, 1 comment, 2 participants]

openclaw/openclaw#62205 · Fetched 2026-04-08 03:07:40
Comments: 1 · Participants: 2 · Timeline events: 8 (referenced ×5, closed ×1, commented ×1, cross-referenced ×1) · Reactions: 0

Voice message transcription (STT) stopped working after upgrading from v2026.4.2 → v2026.4.5. Telegram voice messages are no longer transcribed by the configured audio provider; the model receives the raw audio file instead.

Fix Action

Workaround

Add "openai" to plugins.allow in openclaw.json:

"plugins": {
  "allow": ["...", "openai"]
}

This allows the openai bundled plugin to load, which registers openaiMediaUnderstandingProvider with transcribeAudio. The configured tools.media.audio.models[0].baseUrl correctly overrides the default api.openai.com URL, so a local OpenAI-compatible endpoint (e.g. Parakeet) works as expected.
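
The base URL override described above can be pictured with a minimal sketch. The AudioModelEntry type, the resolveTranscriptionUrl helper, and the default base URL constant are assumptions made for illustration, not OpenClaw's actual code. The /audio/transcriptions path is the one the OpenAI API uses for speech-to-text, which OpenAI-compatible servers typically mirror.

```typescript
// Hypothetical shape of one tools.media.audio.models entry.
interface AudioModelEntry {
  provider: string;
  baseUrl?: string;
}

// Assumed default; the issue only says the default points at api.openai.com.
const DEFAULT_OPENAI_BASE_URL = "https://api.openai.com/v1";

// Hypothetical helper: a configured baseUrl wins over the default.
function resolveTranscriptionUrl(entry: AudioModelEntry): string {
  // Strip trailing slashes so the joined path has exactly one separator.
  const base = (entry.baseUrl ?? DEFAULT_OPENAI_BASE_URL).replace(/\/+$/, "");
  return base + "/audio/transcriptions";
}

// The localhost address is a placeholder for a local Parakeet server.
const localUrl = resolveTranscriptionUrl({
  provider: "openai",
  baseUrl: "http://localhost:8000/v1/",
});
const defaultUrl = resolveTranscriptionUrl({ provider: "openai" });
console.log(localUrl, defaultUrl);
```

The point of the sketch is only that the provider id selects the client implementation while the entry's baseUrl selects the server, which is why a plugin named "openai" can serve a local endpoint.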

Also required: a models.providers.openai entry with apiKey so resolveUsableCustomProviderApiKey can return a key for requireApiKey:

"models": {
  "providers": {
    "openai": {
      "apiKey": "local",
      "baseUrl": "<your local STT endpoint>/v1",
      "models": []
    }
  }
}

PR fix notes

PR #62234: Plugins: allowlist compat for capability provider fallback (#62205)

Description (problem / solution / changelog)

Summary

  • Problem: With a non-empty plugins.allow list that omitted bundled capability plugins (for example openai), capability fallback only added plugins.entries, so the loader still treated those plugins as blocked and media-understanding providers (including OpenAI-compatible STT) never registered.
  • Why it matters: Voice and other STT flows that use tools.media.audio with provider: "openai" could pass raw audio to the model instead of a transcript when an allowlist was used to scope plugins.
  • What changed: resolveCapabilityProviderConfig now applies withBundledPluginAllowlistCompat before withBundledPluginEnablementCompat, matching the order in applyPluginCompatibilityOverrides (allowlist, then enablement, then vitest compat).
  • What did NOT change: plugins.deny remains authoritative; denylisted plugins are not enabled by this path.
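
A minimal sketch of the ordering described in the summary, assuming a simplified config shape. Only the idea of "allowlist compat first, then enablement compat" comes from the PR notes; the function bodies, the "Sketch" names, and the "telegram" plugin id are hypothetical.

```typescript
// Hypothetical, reduced view of the plugins config.
interface PluginsSketch {
  allow: string[];
  entries: Record<string, { enabled: boolean }>;
}

// Stand-in for withBundledPluginAllowlistCompat: extend a non-empty allowlist.
function withAllowlistCompatSketch(config: PluginsSketch, pluginIds: string[]): PluginsSketch {
  // An empty allowlist means "no allowlist", so there is nothing to extend.
  if (config.allow.length === 0) return config;
  const missing = pluginIds.filter((id) => !config.allow.some((a) => a === id));
  return { ...config, allow: config.allow.concat(missing) };
}

// Stand-in for withBundledPluginEnablementCompat: add enabled entries.
function withEnablementCompatSketch(config: PluginsSketch, pluginIds: string[]): PluginsSketch {
  const entries = { ...config.entries };
  for (const id of pluginIds) {
    // Assumption: an explicit user entry is left untouched.
    if (!(id in entries)) entries[id] = { enabled: true };
  }
  return { ...config, entries };
}

// Allowlist first, then enablement: the order the PR describes.
function resolveCapabilityConfigSketch(config: PluginsSketch, pluginIds: string[]): PluginsSketch {
  return withEnablementCompatSketch(withAllowlistCompatSketch(config, pluginIds), pluginIds);
}

const scopedConfig: PluginsSketch = { allow: ["telegram"], entries: {} };
const fixedConfig = resolveCapabilityConfigSketch(scopedConfig, ["openai"]);
console.log(fixedConfig.allow, Object.keys(fixedConfig.entries));
```

Before the fix, only the enablement step ran on this path, so a config like scopedConfig gained an entries.openai record while allow still omitted "openai".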

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #62205
  • Related #62205
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Enablement-only compat did not merge bundled capability plugin ids into plugins.allow, so resolvePluginActivationState returned not-in-allowlist and the bundled plugin did not load.
  • Missing detection / guardrail: Allowlist fallback tests mocked enablement without exercising the allowlist layer.
  • Contributing context: Same layering as withBundledPluginAllowlistCompat + withBundledPluginEnablementCompat in provider resolution tests.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this: Unit test
  • Target test or file: src/plugins/capability-provider-runtime.test.ts, src/media-understanding/provider-registry.allowlist.test.ts, src/image-generation/provider-registry.allowlist.test.ts
  • Scenario the test should lock in: Enablement compat receives config after allowlist compat merges capability plugin ids into plugins.allow when plugins.allow is non-empty.
  • Why this is the smallest reliable guardrail: Asserts the compat chain order and expected config shape without requiring a full Telegram or gateway run.
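
The scenario in the test plan can be sketched without a test framework by recording what the enablement step receives. Everything below is a hypothetical stand-in (composeCompat, the spy, the "telegram" plugin id); the actual tests in the listed files may be structured differently.

```typescript
// Hypothetical config shape; only the compat ordering comes from the PR notes.
type PluginsShape = { allow: string[]; entries: Record<string, { enabled: boolean }> };
type CompatStep = (config: PluginsShape) => PluginsShape;

// Applies compat steps left to right.
function composeCompat(...steps: CompatStep[]): CompatStep {
  return (config) => steps.reduce((acc, step) => step(acc), config);
}

// Stand-in for allowlist compat: extends a non-empty allowlist only.
const allowlistCompat: CompatStep = (config) =>
  config.allow.length > 0 ? { ...config, allow: config.allow.concat("openai") } : config;

// Spy standing in for enablement compat: records the allowlist it receives.
let allowSeenByEnablement: string[] = [];
const enablementSpy: CompatStep = (config) => {
  allowSeenByEnablement = config.allow.slice();
  return { ...config, entries: { ...config.entries, openai: { enabled: true } } };
};

const resolved = composeCompat(allowlistCompat, enablementSpy)({
  allow: ["telegram"],
  entries: {},
});

// The guardrail: enablement must see "openai" already merged into allow.
if (allowSeenByEnablement.indexOf("openai") === -1) {
  throw new Error("enablement compat ran before allowlist compat");
}
console.log(resolved.allow, Object.keys(resolved.entries));
```

Swapping the two arguments to composeCompat makes the check throw, which is the kind of ordering regression the plan aims to catch.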

User-visible / Behavior Changes

  • With plugins.allow set, bundled capability plugins needed for capability fallback (media understanding, image generation, speech, and other keys using the same path) are merged into the allowlist so providers such as STT load without manually listing every capability plugin id.

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? No
  • Trust boundary changes? No
  • Notes: plugins.deny still blocks plugins before allowlist evaluation.

Verification

  • pnpm test (scoped): src/plugins/capability-provider-runtime.test.ts, src/media-understanding/provider-registry.allowlist.test.ts, src/image-generation/provider-registry.allowlist.test.ts
  • pnpm check / pnpm build: local pnpm check fails on unrelated acpx extension TypeScript resolution (acpx/dist/runtime.js); scoped lint on touched files is clean.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/image-generation/provider-registry.allowlist.test.ts (modified, +10/-6)
  • src/media-understanding/provider-registry.allowlist.test.ts (modified, +10/-6)
  • src/plugins/capability-provider-runtime.test.ts (modified, +25/-40)
  • src/plugins/capability-provider-runtime.ts (modified, +6/-1)

Original issue report

Environment

  • OpenClaw version: v2026.4.5
  • Channel: Telegram (DM)
  • STT provider: OpenAI-compatible endpoint (Parakeet / NVIDIA NeMo at a local URL)
  • Config: tools.media.audio.models[0].provider = "openai", tools.media.audio.models[0].baseUrl = <local Parakeet URL>

Behavior

v2026.4.2: Voice messages were transcribed correctly; echoTranscript fired (🎤 echo sent to user before agent reply).

v2026.4.5: No echo sent; model receives raw audio file; agent responds generically instead of to the voice content.

Root Cause (traced through compiled source)

In v2026.4.5, Telegram voice handling routes through applyMediaUnderstanding (apply-CU6lCv5P.js), which builds a media understanding provider registry via buildMediaUnderstandingRegistry (provider-registry-Qp9sisqM.js).

The openaiMediaUnderstandingProvider (in media-understanding-provider-BUTtadCJ.js) provides transcribeAudio and is the correct handler for provider: "openai" audio entries. However, it only gets registered if the "openai" bundled plugin is active.

The activation check in manifest-registry-Cqdpf6fh.js (resolvePluginActivationState) contains:

if (params.config.allow.length > 0 && !explicitlyAllowed) return {
    enabled: false,
    reason: "not in allowlist"
};

If a user has a non-empty plugins.allow list that doesn't include "openai", the openai bundled plugin is blocked — even though it's enabledByDefault: true and the user has a valid tools.media.audio config using the openai-compatible API.
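
To make the gate concrete, here is a simplified model of the check. It is a sketch under assumptions: the real resolvePluginActivationState takes more inputs, and the function name, the "denied" reason string, and the "telegram" plugin id are hypothetical. The deny-first ordering follows the PR notes, which describe plugins.deny as authoritative.

```typescript
// Hypothetical, reduced view of the plugins config.
interface PluginGateConfig {
  allow: string[];
  deny: string[];
}

interface ActivationState {
  enabled: boolean;
  reason?: string;
}

// Simplified stand-in for the activation check quoted above.
function resolveActivationSketch(
  pluginId: string,
  enabledByDefault: boolean,
  config: PluginGateConfig,
): ActivationState {
  // Assumption: the denylist is evaluated before the allowlist.
  if (config.deny.some((id) => id === pluginId)) {
    return { enabled: false, reason: "denied" };
  }
  const explicitlyAllowed = config.allow.some((id) => id === pluginId);
  // A non-empty allowlist blocks everything not listed, bundled defaults included.
  if (config.allow.length > 0 && !explicitlyAllowed) {
    return { enabled: false, reason: "not in allowlist" };
  }
  return { enabled: explicitlyAllowed || enabledByDefault };
}

const blocked = resolveActivationSketch("openai", true, { allow: ["telegram"], deny: [] });
const unscoped = resolveActivationSketch("openai", true, { allow: [], deny: [] });
console.log(blocked, unscoped);
```

In this model enabledByDefault is consulted only after the allowlist gate has passed, which matches the behavior reported in the issue.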

The compat fallback in capability-provider-runtime-CMlMeixn.js (resolvePluginCapabilityProviders → withBundledPluginEnablementCompat) is supposed to handle this, but it only adds the plugin to plugins.entries, not to plugins.allow, so the allowlist check still blocks activation.

The result: getMediaUnderstandingProvider("openai", providerRegistry) returns undefined, causing runProviderEntry to throw "Media provider not available: openai" (silently caught in runAttachmentEntries). No audio.transcription outputs are produced, echoTranscript never fires, and the raw audio file is passed to the model.
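
The silent failure can be illustrated with a small synchronous sketch. The function names mirror those in the trace, but the bodies, the "Sketch" suffixes, the Transcriber type, and the file path are hypothetical; the real functions are presumably asynchronous and handle more than audio.

```typescript
// Hypothetical provider signature: takes a file path, returns a transcript.
type Transcriber = (audioPath: string) => string;
type ProviderRegistry = Record<string, Transcriber | undefined>;

// Stand-in for runProviderEntry: throws when the provider was never registered.
function runProviderEntrySketch(
  registry: ProviderRegistry,
  providerId: string,
  audioPath: string,
): string {
  const provider = registry[providerId];
  if (!provider) throw new Error("Media provider not available: " + providerId);
  return provider(audioPath);
}

// Stand-in for runAttachmentEntries: errors are swallowed, so callers only
// ever see an empty output list.
function runAttachmentEntriesSketch(
  registry: ProviderRegistry,
  providerIds: string[],
  audioPath: string,
): string[] {
  const outputs: string[] = [];
  for (const id of providerIds) {
    try {
      outputs.push(runProviderEntrySketch(registry, id, audioPath));
    } catch {
      // Swallowed: nothing is logged and nothing reaches the caller.
    }
  }
  return outputs;
}

// With the openai plugin blocked, the registry has no "openai" provider.
const emptyOutputs = runAttachmentEntriesSketch({}, ["openai"], "/tmp/voice.ogg");
console.log(emptyOutputs.length);
```

An empty output list is indistinguishable from "no audio configured", which would explain why no transcript echo appears and the raw file is forwarded to the model.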

Expected Behavior

Users with a configured tools.media.audio using provider: "openai" should not need to manually add "openai" to plugins.allow. The audio pipeline should work regardless of whether the openai provider plugin is explicitly allowed, since the user has an explicit audio model config that references it.

The withBundledPluginEnablementCompat path should either add the plugin to plugins.allow as well, or the allowlist check should be skipped for bundled plugins that an explicit capability config entry requires.

extent analysis

TL;DR

Add "openai" to plugins.allow in openclaw.json to enable the openai bundled plugin and restore voice message transcription.

Guidance

  • Verify that the tools.media.audio config is correctly set to use the openai-compatible provider by checking tools.media.audio.models[0].provider and tools.media.audio.models[0].baseUrl.
  • Add "openai" to plugins.allow in openclaw.json to allow the openai bundled plugin to load and register the openaiMediaUnderstandingProvider.
  • Ensure a models.providers.openai entry with apiKey is present in the config to enable the resolveUsableCustomProviderApiKey function to return a key for requireApiKey.
  • Test voice message transcription after applying the workaround to verify that echoTranscript fires and the model responds to the voice content.
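
The first three checks above can be scripted against a parsed openclaw.json. This is a minimal sketch: the ConfigSketch type covers only the keys named in this issue, and checkSttWorkaround and the "telegram" plugin id are hypothetical, not part of OpenClaw.

```typescript
// Hypothetical, partial view of openclaw.json limited to keys named in the issue.
interface ConfigSketch {
  plugins?: { allow?: string[] };
  models?: { providers?: Record<string, { apiKey?: string; baseUrl?: string } | undefined> };
  tools?: { media?: { audio?: { models?: { provider: string; baseUrl?: string }[] } } };
}

// Returns a list of human-readable problems; an empty list means the
// workaround preconditions described above are met.
function checkSttWorkaround(config: ConfigSketch): string[] {
  const problems: string[] = [];

  const audioModels = config.tools?.media?.audio?.models ?? [];
  const first = audioModels[0];
  if (!first || first.provider !== "openai") {
    problems.push('tools.media.audio.models[0].provider is not "openai"');
  }

  // Only a non-empty allowlist can block the bundled plugin.
  const allow = config.plugins?.allow ?? [];
  if (allow.length > 0 && !allow.some((id) => id === "openai")) {
    problems.push('plugins.allow is non-empty but omits "openai"');
  }

  if (!config.models?.providers?.["openai"]?.apiKey) {
    problems.push("models.providers.openai.apiKey is missing");
  }
  return problems;
}

const affected: ConfigSketch = {
  plugins: { allow: ["telegram"] },
  tools: { media: { audio: { models: [{ provider: "openai", baseUrl: "http://localhost:8000/v1" }] } } },
};
console.log(checkSttWorkaround(affected));
```

The last guidance step, confirming that the transcript echo fires, still needs a real voice message; a static check like this cannot cover it.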

Example

{
  "plugins": {
    "allow": ["...", "openai"]
  },
  "models": {
    "providers": {
      "openai": {
        "apiKey": "local",
        "baseUrl": "<your local STT endpoint>/v1",
        "models": []
      }
    }
  }
}

Notes

This workaround assumes a valid tools.media.audio config that uses the OpenAI-compatible API. The longer-term fix, described in PR #62234, applies withBundledPluginAllowlistCompat before withBundledPluginEnablementCompat so that bundled capability plugins required for capability fallback are merged into plugins.allow automatically.

Recommendation

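On v2026.4.5, apply the workaround: add "openai" to plugins.allow in openclaw.json and provide a models.providers.openai entry with an apiKey. This lets the openai bundled plugin load and register openaiMediaUnderstandingProvider, which restores voice message transcription. According to the notes for PR #62234, builds that include that change should no longer need the manual allowlist entry.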
