openclaw - ✅(Solved) Fix [Bug]: `infer audio transcribe` and `infer image describe` emit "No transcript/description returned for ... <path>" when no provider is configured, blaming the input instead of the missing config [2 pull requests, 1 comments, 2 participants]

openclaw/openclaw#73569 · fetched 2026-04-29 06:18:05 · 1 comment · 2 participants · timeline: cross-referenced ×2, referenced ×2, commented ×1, labeled ×1

When no audio (resp. image-understanding) provider is configured, pnpm openclaw infer audio transcribe --file <path> and pnpm openclaw infer image describe --file <path> exit 1 with Error: No transcript returned for audio: <path> and Error: No description returned for image: <path>. The error blames the input file. Reproduces with a nonexistent file, a directory, an empty file, or a wrong-format text file pretending to be audio — every input shape gives the same path-blaming error. Sibling subcommand infer image generate correctly says Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow"... and points the operator at the missing config. The audio/image-describe paths should follow the same shape.

Error Message

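Error: No transcript returned for audio: <path>
Error: No description returned for image: <path>

For comparison, infer image generate reports the same "no provider configured" condition as: Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first ...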

Root Cause

The CLI treated an empty media-understanding result the same way whether no provider was configured or a configured provider returned empty text, so both cases produced the path-blaming "No transcript/description returned for ...: <path>" message. Existing tests covered only the configured-provider empty-output case, not the no-provider-configured branch.

Fix Action

Fix / Workaround

Addressed by PR #73576 and PR #73593 (both list "Closes #73569"): when no audio or image-understanding provider is configured, infer audio transcribe and infer image describe emit a config-focused error instead of blaming the input path. The "No transcript/description returned for ...: <path>" wording is reserved for the case where a configured provider was actually called and returned an empty result, which is the case closed issues #65394, #66506, and #65076 patched. Workaround on affected builds: configure a provider for the capability; infer audio providers and infer image providers show which providers report configured: false.


PR fix notes

PR #73576: fix: clarify infer audio/image errors when no provider configured

Description (problem / solution / changelog)

Summary

  • Detect the "no configured provider" case for infer image describe and infer audio transcribe when provider output text is empty.
  • Return actionable config-focused errors for that case, while preserving the existing "No transcript/description returned" message when a configured provider returns an empty response.
  • Add CLI regression tests for both branches (configured-vs-unconfigured provider behavior).
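A minimal sketch of the two-branch selection this summary describes, written in Python for illustration only (the real change is TypeScript in src/cli/capability-cli.ts). Like the PR summary, it decides after the provider output comes back empty. `ProviderStatus`, `select_media_error`, and the config-focused wording are hypothetical; only the two "No ... returned" strings are quoted from the issue.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class ProviderStatus:
    # Hypothetical shape mirroring the id / configured fields that
    # `infer audio providers --json` prints in the issue's logs.
    id: str
    configured: bool


# Existing empty-result wording, quoted from the issue.
EMPTY_RESULT = {
    "audio": "No transcript returned for audio: {path}",
    "image": "No description returned for image: {path}",
}

# Hypothetical config-focused wording; the merged messages may differ.
NO_PROVIDER = {
    "audio": "No audio transcription provider configured.",
    "image": "No image-understanding provider configured.",
}


def select_media_error(kind: str, path: str,
                       providers: list[ProviderStatus],
                       text: Optional[str]) -> Optional[str]:
    """Return the error message to print, or None when output is usable."""
    if text and text.strip():
        return None  # A provider produced real output.
    if not any(p.configured for p in providers):
        # Nothing could have been called, so do not blame the input path.
        return NO_PROVIDER[kind]
    # A configured provider ran and returned empty text: keep the old message.
    return EMPTY_RESULT[kind].format(path=path)
```

With every provider unconfigured, `select_media_error("audio", "/tmp/fake.mp3", providers, None)` returns the config-focused message; with at least one configured provider and empty output, it returns the original path-bearing message.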

Test plan

  • pnpm test src/cli/capability-cli.test.ts

Closes #73569

Changed files

  • src/cli/capability-cli.test.ts (modified, +70/-0)
  • src/cli/capability-cli.ts (modified, +29/-0)

PR #73593: fix(cli): clarify infer media provider config errors

Description (problem / solution / changelog)

Summary

  • Problem: infer audio transcribe and infer image describe reported No transcript/description returned for ... <path> when no provider was configured.
  • Why it matters: The old message made users debug the input file even though the real fix was provider configuration.
  • What changed: The CLI now emits capability-specific missing-provider configuration errors for audio transcription and image understanding.
  • What did NOT change (scope boundary): The existing No transcript/description returned... errors still apply when a configured provider is called but returns empty text.


Linked Issue/PR

  • Closes #73569
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: The CLI treated an empty media-understanding result the same whether no provider was configured or a configured provider returned empty text.
  • Missing detection / guardrail: Tests covered empty provider output but not the no-provider-configured branch.
  • Contributing context (if known): infer image generate already had clearer no-provider wording; audio/image describe lacked parity.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this: unit test
  • Target test or file: src/cli/capability-cli.test.ts
  • Scenario the test should lock in: no configured audio/image-understanding provider emits a config-focused error, while configured empty output preserves the existing empty-result error.
  • Why this is the smallest reliable guardrail: The bug is in CLI error selection, so command-level unit tests directly cover the branch.
  • Existing test that already covers this (if any): Existing tests covered configured-provider empty output only.
  • If no new test is added, why not: N/A
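The PR's tests are TypeScript in src/cli/capability-cli.test.ts; the Python unittest pair below only sketches the two scenarios the plan locks in. `transcribe`, `MissingProviderError`, and `EmptyResultError` are hypothetical. Unlike the PR #73576 summary, this variant checks configuration before any provider call, as the issue suggests, so the unconfigured test can also assert that no call was attempted.

```python
import unittest
from unittest import mock


class MissingProviderError(RuntimeError):
    """Hypothetical: no provider is configured for the capability."""


class EmptyResultError(RuntimeError):
    """Hypothetical: a configured provider returned empty text."""


def transcribe(path, providers, call_provider):
    """Hypothetical handler: check configuration before calling any provider.

    providers maps provider id -> configured flag.
    """
    configured = [pid for pid, ok in providers.items() if ok]
    if not configured:
        raise MissingProviderError("No audio transcription provider configured.")
    text = call_provider(configured[0], path)
    if not text or not text.strip():
        # Wording quoted from the issue; reserved for the empty-result case.
        raise EmptyResultError(f"No transcript returned for audio: {path}")
    return text


class TranscribeErrorTests(unittest.TestCase):
    def test_unconfigured_names_config_not_path(self):
        call = mock.Mock()
        with self.assertRaises(MissingProviderError) as ctx:
            transcribe("/tmp/fake.mp3", {"deepgram": False, "openai": False}, call)
        self.assertNotIn("/tmp/fake.mp3", str(ctx.exception))
        call.assert_not_called()  # No provider call was attempted.

    def test_configured_empty_keeps_existing_message(self):
        call = mock.Mock(return_value="")
        with self.assertRaises(EmptyResultError) as ctx:
            transcribe("/tmp/fake.mp3", {"deepgram": False, "openai": True}, call)
        self.assertEqual(str(ctx.exception),
                         "No transcript returned for audio: /tmp/fake.mp3")
        call.assert_called_once_with("openai", "/tmp/fake.mp3")
```

Saved to a file, the pair runs with `python -m unittest <file>`.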

User-visible / Behavior Changes

infer audio transcribe and infer image describe now report missing provider configuration when no suitable provider is configured, instead of blaming the input path.

Changed files

  • src/cli/capability-cli.test.ts (modified, +76/-0)
  • src/cli/capability-cli.ts (modified, +42/-0)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No


Steps to reproduce

  1. Fresh checkout of openclaw at v2026.4.26 (commit afd0304); pnpm install && pnpm build on Node 22.22.2; gateway running locally.
  2. Confirm no audio/image-understanding provider is configured: pnpm openclaw infer audio providers --json | head -3 shows the first entries as "configured": false, and the full list in the logs below shows all eight the same way (none have credentials in this default install).
  3. Run pnpm openclaw infer audio transcribe --file /tmp/totally-not-real.mp3. Observe Error: No transcript returned for audio: /tmp/totally-not-real.mp3 and exit 1.
  4. Repeat with --file /tmp (a directory), --file /tmp/empty.mp3 (an empty file created with touch), and --file /tmp/fake.mp3 (a text file). The error is identical each time; only the path in the message changes.
  5. Run pnpm openclaw infer image describe --file /tmp/totally-not-real.png. Observe Error: No description returned for image: /tmp/totally-not-real.png and exit 1. Same shape.
  6. For comparison: pnpm openclaw infer image generate --prompt "test". Observe Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first (comfy: COMFY_API_KEY / COMFY_CLOUD_API_KEY; fal: FAL_KEY / FAL_API_KEY; google: GEMINI_API_KEY / GOOGLE_API_KEY). — clearly identifies the missing config and points at the fix.

Expected behavior

When no audio (resp. image-understanding) provider is configured, the CLI should emit a clear error that names the missing config and the next step, not a path-blaming message. The closest in-tree precedent is infer image generate, which on the same "no provider configured" condition emits:

Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first ...

infer audio transcribe and infer image describe should follow that pattern — detect "no providers configured" upfront, emit a clear "no audio/image-understanding provider configured" error pointing at the relevant agents.defaults.* knob, and exit 1.

The current "No transcript/description returned for ...: <path>" wording should be reserved for the case where a provider was actually called and returned an empty result — which is the case the closed issues #65394, #66506, #65076 patched. Today the same error string is overloaded across both failure modes; operators cannot tell from the message whether to debug their input file, their provider config, or the provider's response.
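The image generate message quoted above follows a reusable shape: what is missing, which config key to set with an example value, and which env vars each provider accepts. The sketch below reconstructs that shape from the observed string; it is not openclaw's implementation, and `format_missing_provider_error` is a hypothetical name.

```python
from __future__ import annotations


def format_missing_provider_error(what: str, config_key: str, example: str,
                                  provider_env: dict[str, list[str]]) -> str:
    """Build an image-generate-style "no provider configured" message.

    provider_env maps a provider id to the env var names it accepts,
    listed in the order they should appear in the hint.
    """
    hints = "; ".join(f"{pid}: {' / '.join(names)}"
                      for pid, names in provider_env.items())
    return (
        f"No {what} configured. Set {config_key} to a provider/model like "
        f'"{example}". If you want a specific provider, also configure that '
        f"provider's auth/API key first ({hints})."
    )


# Reproduces the observed infer image generate message (without "Error: ").
message = format_missing_provider_error(
    "image-generation model",
    "agents.defaults.imageGenerationModel.primary",
    "comfy/workflow",
    {
        "comfy": ["COMFY_API_KEY", "COMFY_CLOUD_API_KEY"],
        "fal": ["FAL_KEY", "FAL_API_KEY"],
        "google": ["GEMINI_API_KEY", "GOOGLE_API_KEY"],
    },
)
```

For audio transcription and image description, the config key and per-provider env vars would have to come from openclaw's own config schema; the issue only refers to "the relevant agents.defaults.* knob", so no values are filled in here.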

Actual behavior

$ pnpm openclaw infer audio transcribe --file /tmp/totally-not-real.mp3
Error: No transcript returned for audio: /tmp/totally-not-real.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp                       # directory
Error: No transcript returned for audio: /tmp
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp/empty.mp3             # empty
Error: No transcript returned for audio: /tmp/empty.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp/fake.mp3              # text file
Error: No transcript returned for audio: /tmp/fake.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer image describe --file /tmp/totally-not-real.png
Error: No description returned for image: /tmp/totally-not-real.png
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer image generate --prompt "test"
Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first (comfy: COMFY_API_KEY / COMFY_CLOUD_API_KEY; fal: FAL_KEY / FAL_API_KEY; google: GEMINI_API_KEY / GOOGLE_API_KEY).
 ELIFECYCLE  Command failed with exit code 1.

OpenClaw version

2026.4.26

Operating system

Ubuntu 24.04.4 LTS (Linux 6.8.0-110-generic)

Install method

pnpm dev

Model

anthropic/claude-opus-4-7

Provider / routing chain

N/A

Additional provider/model setup details

  • Default model: anthropic/claude-opus-4-7 (text inference works fine via Claude CLI OAuth).
  • Audio providers: all 8 listed by infer audio providers are configured: false (deepgram, elevenlabs, google, groq, mistral, openai, senseaudio, xai). No credentials in env, no ~/.openclaw/credentials/ entries for any of them.
  • Image-understanding providers: same — all listed providers are configured: false.
  • No per-agent overrides; default agent main; gateway mode=local.

Logs, screenshots, and evidence

=== Audio transcribe — error blames input file across every input shape ===

$ pnpm openclaw infer audio transcribe --file /tmp/totally-not-real.mp3
Error: No transcript returned for audio: /tmp/totally-not-real.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp                       # directory
Error: No transcript returned for audio: /tmp

$ touch /tmp/empty.mp3 && pnpm openclaw infer audio transcribe --file /tmp/empty.mp3
Error: No transcript returned for audio: /tmp/empty.mp3

$ echo "hello" > /tmp/fake.mp3 && pnpm openclaw infer audio transcribe --file /tmp/fake.mp3
Error: No transcript returned for audio: /tmp/fake.mp3

=== Confirmation that no provider is configured ===

$ pnpm openclaw infer audio providers --json | python3 -c "
import json, sys
for line in sys.stdin:
    line=line.strip()
    if not line.startswith('{'): continue
    d = json.loads(line)
    print(f\"{d['id']}: configured={d['configured']}\")
"
deepgram: configured=False
elevenlabs: configured=False
google: configured=False
groq: configured=False
mistral: configured=False
openai: configured=False
senseaudio: configured=False
xai: configured=False

=== Image describe — same shape ===

$ pnpm openclaw infer image describe --file /tmp/totally-not-real.png
Error: No description returned for image: /tmp/totally-not-real.png
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer image describe --file /tmp
Error: No description returned for image: /tmp
 ELIFECYCLE  Command failed with exit code 1.

=== In-tree counter-example: image generate gets it right ===

$ pnpm openclaw infer image generate --prompt "test"
Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first (comfy: COMFY_API_KEY / COMFY_CLOUD_API_KEY; fal: FAL_KEY / FAL_API_KEY; google: GEMINI_API_KEY / GOOGLE_API_KEY).
 ELIFECYCLE  Command failed with exit code 1.

=== Cross-capability error-wording matrix (same "no provider configured" condition) ===

CAPABILITY                 ERROR WORDING                                                                       VERDICT
infer image generate       "No image-generation model configured. Set agents.defaults.imageGenerationModel..." Clear, actionable
infer web search           "missing_brave_api_key" / "needs a Brave Search API key. Run openclaw configure..." Clear, actionable
infer web fetch            "web.fetch is disabled or no provider is available."                                Clear, actionable
infer tts convert          Per-provider list: "openai: not configured; elevenlabs: not configured; ..."        Verbose but clear
infer audio transcribe     "No transcript returned for audio: <path>"                                          Misleading
infer image describe       "No description returned for image: <path>"                                         Misleading
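The inline one-liner used above to confirm that no provider is configured can be kept as a small reusable function. Like the one-liner, this sketch assumes `--json` prints one JSON object per line with `id` and `configured` keys and skips other lines (such as pnpm's script banner); that output format is inferred from the logs, not documented here.

```python
import json


def provider_status(lines):
    """Map provider id -> configured flag from `infer ... providers --json` output.

    Non-JSON lines are skipped, matching the one-liner in the logs.
    """
    status = {}
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue
        entry = json.loads(line)
        status[entry["id"]] = bool(entry["configured"])
    return status


def nothing_configured(status):
    """True when every listed provider reports configured=false (or none are listed)."""
    return not any(status.values())
```

Calling `provider_status(sys.stdin)` in a script fed by `pnpm openclaw infer audio providers --json` reproduces the check above; for the eight providers in this report, `nothing_configured` would be true.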

Impact and severity

Affected users/systems/channels:

  • Every operator running infer audio transcribe or infer image describe without a configured provider for the relevant capability. Directly observed on Linux (Ubuntu 24.04 / Node 22.22.2 / pnpm 10.33.0); the code path is platform-agnostic, so macOS and Windows are expected to reproduce, but only Linux was verified.
  • New users following the docs to try out audio/image-understanding capabilities. The "No transcript returned" message sends them down the wrong debugging path (they check their audio file, re-encode, try a different format) instead of pointing at the missing provider config.

Severity:

  • Misleading error wording. Not a crash, not a security issue, no data loss. The underlying behavior (refusing to call non-existent providers) is correct; only the error message is wrong.
  • Trust-eroding: the operator sees an error that names their file and exits 1, naturally assumes the file is bad, and wastes time before discovering the real cause.
  • Inconsistent with sibling capability subcommands. image generate, web search, web fetch, tts convert all emit clear "no provider configured" errors. Two outliers (audio transcribe, image describe) misattribute to input.

Frequency:

  • Always, deterministic. 100% reproduction across nonexistent files, directories, empty files, and wrong-format files when no audio/image-understanding provider is configured. Independent of model/provider/transport — the error happens before any provider call.

Consequence:

  • Operators waste time debugging input files when the real fix is "configure a provider."
  • Inconsistency across infer subcommands erodes operator trust in error messages generally.
  • Closed issues #65394, #66506, #65076 patched the "No transcript returned" string for genuine provider-failure cases. The same string remaining for the "no provider configured" case means those fixes are partial — the error string is still overloaded across two failure modes.
  • No grounded evidence of missed messages, failed onboarding, or extra cost.

Additional information

  • Regression status: not classified as a Regression. Last-known-good not directly observed; no bisect performed.

  • Likely fix locus: the action handlers for infer audio transcribe and infer image describe (commander definitions under src/cli/cli-infer/ or wherever the parser lives in this build), or the upstream capability dispatcher they share. Detect "no provider configured for this capability" before attempting the call (the providers list is already resolved — infer audio providers and infer image providers correctly enumerate configured: false), and emit an image generate-shaped error that names the missing config and the env vars to set. The discovery logic exists; only the wording at the dispatch-failure boundary needs to change.

  • Suggested regression test: a unit test pair on infer audio transcribe and infer image describe actions that mock the providers list to return all-configured: false, invoke the action with any --file value, and assert the error message names "no provider configured" (not "no transcript/description returned for ..."). Pair with a parity test that confirms the existing "no transcript returned" message still appears when the provider is configured but returns an empty body — that's the original failure mode patched in #65394 / #66506 / #65076.

  • Related findings from the same session (linkable but not bundled):

    • The systemic theme of "openclaw forwards user input to the provider before validating it, then surfaces a downstream error that misattributes the cause" also drives #73185 (the empty --prompt case for infer model run). Same error class, different surface.
    • infer tts convert --text "" shows yet another instance: openclaw doesn't validate the empty text upfront; the configured Microsoft TTS provider rejects with "Microsoft TTS text cannot be empty"; non-Microsoft providers are silently ineligible. Per-provider, not per-CLI, validation. Could be filed as part of the systemic series.
  • Dedupe checked against the openclaw issue corpus on 2026-04-28: no existing open or closed issue matches this specific failure mode. Closed issues #65394, #66506, #65076 all addressed the "No transcript returned" error appearing on valid audio inputs with a configured provider — different failure mode. #63700 (open) is a feature request to add --file multimodal support to infer model run — unrelated.

  • Not exercised in this repro: behavior with at least one audio/image-understanding provider configured (we have no Deepgram/Groq/OpenAI/Google audio creds in this environment); behavior under --gateway transport (the bug is in the error-wording path which is shared, but not directly verified); concurrent invocations.
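The related findings above describe a systemic theme: input is forwarded to a provider before the CLI validates it. A minimal sketch of CLI-side validation for the input shapes used in this repro follows, in Python for illustration; `validate_media_path` and its messages are hypothetical, not part of openclaw. It only inspects the filesystem, so /tmp/fake.mp3 (text posing as audio) still passes; detecting a wrong format would need a separate check or the provider's response. It complements, rather than replaces, the missing-provider error.

```python
from pathlib import Path


def validate_media_path(path):
    """Return an input-specific error message, or None if the path looks usable.

    Hypothetical check: it does not verify that the bytes are really
    audio or an image, only that a non-empty regular file exists.
    """
    p = Path(path)
    if not p.exists():
        return f"Input file not found: {path}"
    if p.is_dir():
        return f"Input path is a directory, not a file: {path}"
    if p.stat().st_size == 0:
        return f"Input file is empty: {path}"
    return None
```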

Extent analysis

TL;DR

The error message for infer audio transcribe and infer image describe when no provider is configured should be changed to clearly indicate that no provider is configured, similar to the error message for infer image generate.

Guidance

  • Check the code for infer audio transcribe and infer image describe to see where the error message is being generated and modify it to include a clear indication that no provider is configured.
  • Verify that the providers list is being correctly resolved and that the configured status is being checked before attempting to call the provider.
  • Consider adding a unit test to ensure that the error message is correct when no provider is configured.
  • Review the code for other instances of error messages that may be misattributing the cause of the error.

Example

The fix changes how the CLI selects an error message rather than how providers are called; the code changes are in src/cli/capability-cli.ts, with tests in src/cli/capability-cli.test.ts (PR #73576, PR #73593).

Notes

The issue is not classified as a regression (no last-known-good version was identified). The fix most likely belongs in the action handlers for infer audio transcribe and infer image describe, whose error should state that no provider is configured, as infer image generate already does.

Recommendation

Workaround: configure an audio or image-understanding provider for the capability. Fix: change the error for infer audio transcribe and infer image describe so it names the missing provider configuration, which lets operators identify the problem immediately instead of debugging their input file.


