openclaw - 💡(How to fix) Fix Bug: CLI audio transcription can use progress stdout when transcript file is empty [1 pull requests]

openclaw, 2026-05-27 18:53:12

CLI-based audio transcription entries can fall back to wrapper/progress stdout when the expected transcript file is empty or missing. As a result, non-transcript status text, such as "Transcribing with Whisper..." or other CLI banners, can be injected as the audio transcript.

This is easy to hit with a local Whisper wrapper that writes the actual transcript to {{OutputDir}} and prints progress/status to stdout, but the underlying issue is in the generic media CLI runner contract.

Fix Action

Fixed

Fixed by PR: fix(media): suppress local whisper progress transcripts (https://github.com/openclaw/openclaw/pull/87393)

Summary

Affected area

src/media-understanding/runner.entries.ts
CLI audio entries configured under tools.media.audio.models
Known file-output STT integrations such as whisper, whisper-cli, parakeet-mlx, and custom/local wrappers that use {{OutputDir}} or an equivalent output file

Expected behavior

When OpenClaw can infer that a CLI STT entry is supposed to produce a transcript file, that file should be treated as the transcript source of truth.

If the expected transcript file exists but is empty, or if the expected file is missing, OpenClaw should treat the transcription as empty/failed and skip transcript injection rather than falling back to generic stdout.
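A minimal sketch of that contract, assuming hypothetical names (TranscriptOutcome, classifyTranscriptFile, textToInject) that are not taken from the OpenClaw source. The file content is passed in as a string, or null when the file is missing or unreadable, so the example stays independent of the filesystem.

```typescript
// Hypothetical types for illustration; not OpenClaw's actual API.
type TranscriptOutcome =
  | { status: "ok"; text: string }
  | { status: "empty" }   // file exists but holds only whitespace
  | { status: "failed" }; // file missing or unreadable

// fileContent is null when the expected transcript file could not be read.
function classifyTranscriptFile(fileContent: string | null): TranscriptOutcome {
  if (fileContent === null) return { status: "failed" };
  const text = fileContent.trim();
  return text.length > 0 ? { status: "ok", text } : { status: "empty" };
}

// Only an "ok" outcome yields text to inject; stdout is never consulted.
function textToInject(outcome: TranscriptOutcome): string | undefined {
  return outcome.status === "ok" ? outcome.text : undefined;
}
```

Keeping "empty" and "failed" as separate states lets a caller log them differently while still skipping injection in both cases.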

Actual behavior

The runner reads the expected transcript file only when content.trim() is non-empty. If the file is empty, missing, or unreadable, it falls through to stdout.trim().

For wrappers that print progress or metadata to stdout while writing the transcript to a file, that progress text can become the transcript seen by the agent.
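The fall-through can be sketched roughly as follows. This is a simplified reconstruction from the description above, not the code in runner.entries.ts; selectTranscriptCurrent and its parameters are hypothetical.

```typescript
// Simplified reconstruction of the described behavior; names are hypothetical.
// fileContent is null when the expected file is missing or unreadable.
function selectTranscriptCurrent(fileContent: string | null, stdout: string): string {
  if (fileContent !== null) {
    const content = fileContent.trim();
    if (content.length > 0) return content; // file wins only when non-empty
  }
  // Empty, missing, or unreadable file: falls through to stdout.
  return stdout.trim();
}

// Silence produced an empty transcript file, but the wrapper printed progress.
const leaked = selectTranscriptCurrent("", "Transcribing with Whisper...\n");
console.log(leaked); // the progress text is returned as if it were the transcript
```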

Reproduction shape

Configure an audio CLI entry that writes the final transcript to {{OutputDir}} and prints progress to stdout, for example:

{
  "type": "cli",
  "command": "node",
  "args": [
    "./skills/local-whisper/transcribe.js",
    "{{MediaPath}}",
    "--model",
    "small",
    "--language",
    "auto",
    "--output-dir",
    "{{OutputDir}}"
  ]
}

Have the CLI exit 0 while producing an empty transcript file, for example for silence/noise/failed recognition.
Have the CLI write progress/status text to stdout.
Observe that the progress/status stdout is returned as the audio transcription instead of being treated as no transcript.

Why this matters beyond one custom skill

This is not specific to one wrapper implementation. The generic failure mode is: file-output STT command + non-transcript stdout + empty/missing transcript file.

Bundled/manual openai-whisper usage commonly relies on Whisper writing .txt files, and whisper-cli/parakeet-mlx integrations also have inferred output-file paths. Even if some commands currently suppress verbose stdout, the runner contract still allows logs or metadata to become transcript text when the expected output file is empty.

Related issues

#84660 is similar in spirit, but it is about empty Moonshine JSON transcripts reaching the LLM in Discord voice mode. This issue is specifically about CLI transcript-file fallback to stdout in the media runner.
#62680 / #62590 were about Telegram voice notes not reaching transcription because wrapper/meta text affected audio preflight routing. This issue is after transcription runs: the runner can select the wrong output source.
#80633 added local whisper.cpp support; this issue is about hardening the output contract for file-producing CLI STT providers.

Suggested fix

Treat inferred transcript-file outputs as authoritative, or add an explicit config/output policy such as outputMode: "file" | "stdout" | "auto".

At minimum, for known file-output STT commands, do not fall back to stdout when the expected transcript file exists but is empty. Consider also treating a missing expected file as failed transcription for commands whose output path can be deterministically inferred.

Regression coverage should include:

non-empty inferred .txt file returns transcript text
empty inferred .txt file plus progress stdout returns no transcript
missing inferred .txt file plus progress stdout returns no transcript, at least for commands configured as file-authoritative
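One way the proposed policy and the three regression cases could fit together is sketched below. Only outputMode and its three values come from the suggestion above; CliRunResult, expectsFile, fileContent, and selectTranscript are hypothetical names used for illustration. The sketch takes the stricter option, where a missing inferred file also yields no transcript.

```typescript
// Hypothetical policy sketch; only the outputMode values come from the proposal.
type OutputMode = "file" | "stdout" | "auto";

interface CliRunResult {
  stdout: string;
  expectsFile: boolean;       // true when an output file path could be inferred
  fileContent: string | null; // null when the file is missing or unreadable
}

// Returns the transcript text, or undefined when nothing should be injected.
function selectTranscript(mode: OutputMode, run: CliRunResult): string | undefined {
  const fileAuthoritative = mode === "file" || (mode === "auto" && run.expectsFile);
  // For file-authoritative entries stdout is never used,
  // even when the file is empty or missing.
  const source = fileAuthoritative ? run.fileContent : run.stdout;
  const text = (source ?? "").trim();
  return text.length > 0 ? text : undefined;
}

// The three regression cases listed above, as table-driven checks.
const progress = "Transcribing with Whisper...\n";
const cases: Array<[string, CliRunResult, string | undefined]> = [
  ["non-empty file", { stdout: progress, expectsFile: true, fileContent: "hello\n" }, "hello"],
  ["empty file", { stdout: progress, expectsFile: true, fileContent: "" }, undefined],
  ["missing file", { stdout: progress, expectsFile: true, fileContent: null }, undefined],
];
for (const [name, run, expected] of cases) {
  const actual = selectTranscript("auto", run);
  console.log(name, actual === expected ? "pass" : "FAIL");
}
```

In this sketch, entries that genuinely print their transcript to stdout keep working under "stdout", or under "auto" when no output file can be inferred.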
