openclaw - ✅(Solved) Fix [Bug]: `infer audio transcribe` and `infer image describe` emit "No transcript/description returned for ... <path>" when no provider is configured, blaming the input instead of the missing config [2 pull requests, 1 comment, 2 participants]

openclaw, 2026-04-28 13:02:33


openclaw/openclaw#73569 (fetched 2026-04-29 06:18:05)


Author: bittoby

Participants: bittoby, clawsweeper[bot]

Timeline (top): cross-referenced ×2, referenced ×2, commented ×1, labeled ×1

When no audio (or image-understanding) provider is configured, pnpm openclaw infer audio transcribe --file <path> and pnpm openclaw infer image describe --file <path> exit 1 with Error: No transcript returned for audio: <path> and Error: No description returned for image: <path>, respectively. The error blames the input file. It reproduces with a nonexistent file, a directory, an empty file, or a text file with an audio extension: every input shape gives the same path-blaming error. The sibling subcommand infer image generate correctly says Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow"... and points the operator at the missing config. The audio transcribe and image describe paths should follow the same shape.


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

1. Fresh checkout of openclaw at v2026.4.26 (commit afd0304); pnpm install && pnpm build on Node 22.22.2; gateway running locally.
2. Confirm no audio/image-understanding provider is configured: pnpm openclaw infer audio providers --json | head -3 shows every entry as "configured": false (none have credentials in this default install).
3. Run pnpm openclaw infer audio transcribe --file /tmp/totally-not-real.mp3. Observe Error: No transcript returned for audio: /tmp/totally-not-real.mp3 and exit 1.
4. Repeat with: --file /tmp (a directory), --file /tmp/empty.mp3 (touch'd empty file), --file /tmp/fake.mp3 (a text file). Same error every time; only the path in the message changes.
5. Run pnpm openclaw infer image describe --file /tmp/totally-not-real.png. Observe Error: No description returned for image: /tmp/totally-not-real.png and exit 1. Same shape.
6. For comparison: pnpm openclaw infer image generate --prompt "test". Observe Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first (comfy: COMFY_API_KEY / COMFY_CLOUD_API_KEY; fal: FAL_KEY / FAL_API_KEY; google: GEMINI_API_KEY / GOOGLE_API_KEY). This message clearly identifies the missing config and points at the fix.

Expected behavior

When no audio (resp. image-understanding) provider is configured, the CLI should emit a clear error that names the missing config and the next step, not a path-blaming message. The closest in-tree precedent is infer image generate, which on the same "no provider configured" condition emits:

Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first ...

infer audio transcribe and infer image describe should follow that pattern — detect "no providers configured" upfront, emit a clear "no audio/image-understanding provider configured" error pointing at the relevant agents.defaults.* knob, and exit 1.

The current "No transcript/description returned for ...: <path>" wording should be reserved for the case where a provider was actually called and returned an empty result — which is the case the closed issues #65394, #66506, #65076 patched. Today the same error string is overloaded across both failure modes; operators cannot tell from the message whether to debug their input file, their provider config, or the provider's response.
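The split between the two failure modes described above can be sketched as follows. This is a minimal illustration in Python, not the openclaw implementation: the names Provider, NoProviderConfiguredError, EmptyResultError, transcribe, and call_provider are all hypothetical, and the error wording for the missing-config case is a placeholder because the issue does not name the relevant agents.defaults.* key.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    # Hypothetical shape, mirroring the id/configured fields in the evidence.
    id: str
    configured: bool


class NoProviderConfiguredError(Exception):
    """Raised before any provider call when nothing is configured."""


class EmptyResultError(Exception):
    """Raised when a configured provider was called but returned nothing."""


def transcribe(path, providers, call_provider):
    # Pre-flight check: report missing config before looking at the input.
    usable = [p for p in providers if p.configured]
    if not usable:
        known = ", ".join(p.id for p in providers)
        raise NoProviderConfiguredError(
            f"No audio provider configured (known providers: {known}). "
            "Configure a provider and its credentials first."
        )
    # Only reached when a provider exists, so the path-naming wording is
    # reserved for a real empty response.
    text = call_provider(usable[0], path)
    if not text:
        raise EmptyResultError(f"No transcript returned for audio: {path}")
    return text
```

With this ordering, the missing-config message never mentions the file path, so an operator is not sent to debug the input.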

Actual behavior

$ pnpm openclaw infer audio transcribe --file /tmp/totally-not-real.mp3
Error: No transcript returned for audio: /tmp/totally-not-real.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp                       # directory
Error: No transcript returned for audio: /tmp
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp/empty.mp3             # empty
Error: No transcript returned for audio: /tmp/empty.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp/fake.mp3              # text file
Error: No transcript returned for audio: /tmp/fake.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer image describe --file /tmp/totally-not-real.png
Error: No description returned for image: /tmp/totally-not-real.png
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer image generate --prompt "test"
Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first (comfy: COMFY_API_KEY / COMFY_CLOUD_API_KEY; fal: FAL_KEY / FAL_API_KEY; google: GEMINI_API_KEY / GOOGLE_API_KEY).
 ELIFECYCLE  Command failed with exit code 1.

OpenClaw version

2026.4.26

Operating system

Ubuntu 24.04.4 LTS (Linux 6.8.0-110-generic)

Install method

pnpm dev

Model

anthropic/claude-opus-4-7

Provider / routing chain

N/A

Additional provider/model setup details

Default model: anthropic/claude-opus-4-7 (text inference works fine via Claude CLI OAuth).
Audio providers: all 8 listed by infer audio providers are configured: false (deepgram, elevenlabs, google, groq, mistral, openai, senseaudio, xai). No credentials in env, no ~/.openclaw/credentials/ entries for any of them.
Image-understanding providers: same — all listed providers are configured: false.
No per-agent overrides; default agent main; gateway mode=local.

Logs, screenshots, and evidence

=== Audio transcribe — error blames input file across every input shape ===

$ pnpm openclaw infer audio transcribe --file /tmp/totally-not-real.mp3
Error: No transcript returned for audio: /tmp/totally-not-real.mp3
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer audio transcribe --file /tmp                       # directory
Error: No transcript returned for audio: /tmp

$ touch /tmp/empty.mp3 && pnpm openclaw infer audio transcribe --file /tmp/empty.mp3
Error: No transcript returned for audio: /tmp/empty.mp3

$ echo "hello" > /tmp/fake.mp3 && pnpm openclaw infer audio transcribe --file /tmp/fake.mp3
Error: No transcript returned for audio: /tmp/fake.mp3

=== Confirmation that no provider is configured ===

$ pnpm openclaw infer audio providers --json | python3 -c "
import json, sys
for line in sys.stdin:
    line=line.strip()
    if not line.startswith('{'): continue
    d = json.loads(line)
    print(f\"{d['id']}: configured={d['configured']}\")
"
deepgram: configured=False
elevenlabs: configured=False
google: configured=False
groq: configured=False
mistral: configured=False
openai: configured=False
senseaudio: configured=False
xai: configured=False

=== Image describe — same shape ===

$ pnpm openclaw infer image describe --file /tmp/totally-not-real.png
Error: No description returned for image: /tmp/totally-not-real.png
 ELIFECYCLE  Command failed with exit code 1.

$ pnpm openclaw infer image describe --file /tmp
Error: No description returned for image: /tmp
 ELIFECYCLE  Command failed with exit code 1.

=== In-tree counter-example: image generate gets it right ===

$ pnpm openclaw infer image generate --prompt "test"
Error: No image-generation model configured. Set agents.defaults.imageGenerationModel.primary to a provider/model like "comfy/workflow". If you want a specific provider, also configure that provider's auth/API key first (comfy: COMFY_API_KEY / COMFY_CLOUD_API_KEY; fal: FAL_KEY / FAL_API_KEY; google: GEMINI_API_KEY / GOOGLE_API_KEY).
 ELIFECYCLE  Command failed with exit code 1.

=== Cross-capability error-wording matrix (same "no provider configured" condition) ===

CAPABILITY                 ERROR WORDING                                                                       VERDICT
infer image generate       "No image-generation model configured. Set agents.defaults.imageGenerationModel..." Clear, actionable
infer web search           "missing_brave_api_key" / "needs a Brave Search API key. Run openclaw configure..." Clear, actionable
infer web fetch            "web.fetch is disabled or no provider is available."                                Clear, actionable
infer tts convert          Per-provider list: "openai: not configured; elevenlabs: not configured; ..."        Verbose but clear
infer audio transcribe     "No transcript returned for audio: <path>"                                          Misleading
infer image describe       "No description returned for image: <path>"                                         Misleading
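The inline Python filter used in the evidence above can also be written as a small standalone helper. This is a sketch under the same assumption the filter makes: the --json output contains one JSON object per line with id and configured fields, and other lines can be skipped. The function names and the sample input are illustrative; the sample is not captured output.

```python
import json


def parse_provider_status(lines):
    """Map provider id -> configured flag from JSON-lines output."""
    status = {}
    for line in lines:
        line = line.strip()
        # Skip banners or log lines that are not JSON objects.
        if not line.startswith("{"):
            continue
        entry = json.loads(line)
        status[entry["id"]] = bool(entry["configured"])
    return status


def any_configured(status):
    """True if at least one provider reports configured."""
    return any(status.values())


# Illustrative input shaped like the evidence, not captured output.
sample = [
    "> openclaw infer audio providers --json",
    '{"id": "deepgram", "configured": false}',
    '{"id": "openai", "configured": false}',
]
status = parse_provider_status(sample)
print(status)
print(any_configured(status))
```

A check like any_configured is the condition the issue asks the CLI to evaluate before dispatching to a provider.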

Impact and severity

Affected users/systems/channels:

Every operator running infer audio transcribe or infer image describe without a configured provider for the relevant capability. Observed directly on Linux only (Ubuntu 24.04 / Node 22.22.2 / pnpm 10.33.0). The code path is platform-agnostic, so macOS and Windows are expected to reproduce, but neither was verified.
New users following the docs to try out audio/image-understanding capabilities. The "No transcript returned" message sends them down the wrong debugging path (they check their audio file, re-encode, try a different format) instead of pointing at the missing provider config.

Severity:

Misleading error wording. Not a crash, not a security issue, no data loss. The underlying behavior (refusing to call non-existent providers) is correct; only the error message is wrong.
Trust-eroding: the operator sees an error that names their file and exits 1, naturally assumes the file is bad, and wastes time before discovering the real cause.
Inconsistent with sibling capability subcommands. image generate, web search, web fetch, and tts convert all emit clear "no provider configured" errors. The two outliers (audio transcribe, image describe) misattribute the failure to the input.

Frequency:

Always, deterministic. 100% reproduction across nonexistent files, directories, empty files, and wrong-format files when no audio/image-understanding provider is configured. Independent of model/provider/transport — the error happens before any provider call.

Consequence:

Operators waste time debugging input files when the real fix is "configure a provider."
Inconsistency across infer subcommands erodes operator trust in error messages generally.
Closed issues #65394, #66506, #65076 patched the "No transcript returned" string for genuine provider-failure cases. The same string remaining for the "no provider configured" case means those fixes are partial — the error string is still overloaded across two failure modes.
No grounded evidence of missed messages, failed onboarding, or extra cost.

Additional information

Regression status: not classified as a Regression. Last-known-good not directly observed; no bisect performed.
Likely fix locus: the action handlers for infer audio transcribe and infer image describe (commander definitions under src/cli/cli-infer/ or wherever the parser lives in this build), or the upstream capability dispatcher they share. Detect "no provider configured for this capability" before attempting the call (the providers list is already resolved — infer audio providers and infer image providers correctly enumerate configured: false), and emit an image generate-shaped error that names the missing config and the env vars to set. The discovery logic exists; only the wording at the dispatch-failure boundary needs to change.
Suggested regression test: a unit test pair on infer audio transcribe and infer image describe actions that mock the providers list to return all-configured: false, invoke the action with any --file value, and assert the error message names "no provider configured" (not "no transcript/description returned for ..."). Pair with a parity test that confirms the existing "no transcript returned" message still appears when the provider is configured but returns an empty body — that's the original failure mode patched in #65394 / #66506 / #65076.
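The suggested test pair could look roughly like the sketch below. It uses Python's unittest against a hypothetical stand-in handler, describe_image, because the real action handlers are not shown in the issue. The stand-in, its parameters, and its missing-config wording are assumptions for illustration only.

```python
import unittest


def describe_image(path, providers, call_provider):
    """Hypothetical stand-in for the real action handler."""
    usable = [p for p in providers if p["configured"]]
    if not usable:
        raise RuntimeError("No image-understanding provider configured.")
    description = call_provider(usable[0], path)
    if not description:
        raise RuntimeError(f"No description returned for image: {path}")
    return description


class InferErrorWordingTest(unittest.TestCase):
    def test_no_provider_configured_names_the_config(self):
        # All providers report configured: false; the --file value is arbitrary.
        providers = [{"id": "openai", "configured": False}]
        with self.assertRaises(RuntimeError) as ctx:
            describe_image("/tmp/any.png", providers, lambda p, f: "unused")
        message = str(ctx.exception)
        self.assertIn("provider configured", message)
        self.assertNotIn("No description returned", message)

    def test_empty_provider_response_keeps_old_wording(self):
        # Parity case: a configured provider that returns an empty body.
        providers = [{"id": "openai", "configured": True}]
        with self.assertRaises(RuntimeError) as ctx:
            describe_image("/tmp/any.png", providers, lambda p, f: "")
        self.assertIn(
            "No description returned for image: /tmp/any.png",
            str(ctx.exception),
        )


suite = unittest.defaultTestLoader.loadTestsFromTestCase(InferErrorWordingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The first test pins the new wording for the missing-config case. The second guards the original empty-response message so the two failure modes stay distinguishable.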
Related findings from the same session (linkable but not bundled):
- The systemic theme of "openclaw forwards user input to the provider before validating it, then surfaces a downstream error that misattributes the cause" also drives #73185 (the empty --prompt case for infer model run). Same error class, different surface.
- infer tts convert --text "" shows yet another instance: openclaw doesn't validate the empty text upfront; the configured Microsoft TTS provider rejects with "Microsoft TTS text cannot be empty"; non-Microsoft providers are silently ineligible. Per-provider, not per-CLI, validation. Could be filed as part of the systemic series.
Dedupe checked against the openclaw issue corpus on 2026-04-28: no existing open or closed issue matches this specific failure mode. Closed issues #65394, #66506, #65076 all addressed the "No transcript returned" error appearing on valid audio inputs with a configured provider — different failure mode. #63700 (open) is a feature request to add --file multimodal support to infer model run — unrelated.
Not exercised in this repro: behavior with at least one audio/image-understanding provider configured (we have no Deepgram/Groq/OpenAI/Google audio creds in this environment); behavior under --gateway transport (the bug is in the error-wording path which is shared, but not directly verified); concurrent invocations.

Extent analysis

TL;DR

The error message for infer audio transcribe and infer image describe when no provider is configured should be changed to clearly indicate that no provider is configured, similar to the error message for infer image generate.

Guidance

Check the code for infer audio transcribe and infer image describe to see where the error message is being generated and modify it to include a clear indication that no provider is configured.
Verify that the providers list is being correctly resolved and that the configured status is being checked before attempting to call the provider.
Consider adding a unit test to ensure that the error message is correct when no provider is configured.
Review the code for other instances of error messages that may be misattributing the cause of the error.

Example

The issue itself provides no code example, because the problem is the wording of the error message rather than a specific code snippet.

Notes

The issue is not classified as a regression. The fix most likely belongs in the action handlers for infer audio transcribe and infer image describe, or in the capability dispatcher they share.

Recommendation

Change the error message for infer audio transcribe and infer image describe so that it states clearly that no provider is configured. Operators can then identify the problem quickly and configure a provider.


PR fix notes

PR #73576: fix: clarify infer audio/image errors when no provider configured


PR #73593: fix(cli): clarify infer media provider config errors
