openclaw - Bug: CLI audio transcription can use progress stdout when transcript file is empty [1 pull request]



Fix Action

Fixed


Summary

CLI-based audio transcription entries can fall back to wrapper/progress stdout when the expected transcript file is empty or missing. That can cause non-transcript status text such as Transcribing with Whisper... or other CLI banners to be injected as the audio transcript.

This is easy to hit with a local Whisper wrapper that writes the actual transcript to {{OutputDir}} and prints progress/status to stdout, but the underlying issue is in the generic media CLI runner contract.

Affected area

  • src/media-understanding/runner.entries.ts
  • CLI audio entries configured under tools.media.audio.models
  • Known file-output STT integrations such as whisper, whisper-cli, parakeet-mlx, and custom/local wrappers that use {{OutputDir}} or an equivalent output file

Expected behavior

When OpenClaw can infer that a CLI STT entry is supposed to produce a transcript file, that file should be treated as the transcript source of truth.

If the expected transcript file exists but is empty, or if the expected file is missing, OpenClaw should treat the transcription as empty/failed and skip transcript injection rather than falling back to generic stdout.

Actual behavior

The runner uses the expected transcript file only when its content.trim() is non-empty. If the file is empty, missing, or unreadable, the runner falls through to stdout.trim().

For wrappers that print progress or metadata to stdout while writing the transcript to a file, that progress text can become the transcript seen by the agent.
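The fall-through described above can be illustrated with a minimal sketch. The function name and signature are hypothetical, and the real code in src/media-understanding/runner.entries.ts may be structured differently. Only the selection order (file content first, then stdout) is taken from this report.

```typescript
// Hypothetical simplification of the current selection logic; not the
// actual runner code. fileContent is undefined when the expected transcript
// file is missing or unreadable.
function pickTranscriptCurrent(
  fileContent: string | undefined,
  stdout: string,
): string {
  const fromFile = fileContent?.trim();
  if (fromFile) {
    // Non-empty transcript file: this is the intended path.
    return fromFile;
  }
  // Empty, missing, or unreadable file: fall through to whatever the
  // command printed, even if that is only progress or banner text.
  return stdout.trim();
}

// An empty transcript file plus progress output yields the progress text.
console.log(pickTranscriptCurrent("", "Transcribing with Whisper...\n"));
// prints: Transcribing with Whisper...
```

In this sketch the empty-file case is indistinguishable from a command that legitimately writes its transcript to stdout, which is why the status line ends up as the transcript.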

Reproduction shape

  1. Configure an audio CLI entry that writes the final transcript to {{OutputDir}} and prints progress to stdout, for example:
{
  "type": "cli",
  "command": "node",
  "args": [
    "./skills/local-whisper/transcribe.js",
    "{{MediaPath}}",
    "--model",
    "small",
    "--language",
    "auto",
    "--output-dir",
    "{{OutputDir}}"
  ]
}
  2. Have the CLI exit 0 while producing an empty transcript file, for example for silence/noise/failed recognition.
  3. Have the CLI write progress/status text to stdout.
  4. Observe that the progress/status stdout is returned as the audio transcription instead of being treated as no transcript.
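For anyone without a real Whisper wrapper at hand, the conditions in the steps above could be simulated with a small stand-in. This is only a sketch: the function name, the transcript.txt file name, and the returned shape are assumptions made for illustration and do not describe the actual transcribe.js skill.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical stand-in for a file-output STT wrapper that "succeeds" on
// silence: it writes an empty transcript file and reports only progress text.
function fakeSilentTranscribe(outputDir: string): {
  stdout: string;
  transcriptPath: string;
} {
  // Assumed file name; a real wrapper may name the file after the media input.
  const transcriptPath = join(outputDir, "transcript.txt");
  writeFileSync(transcriptPath, ""); // recognition produced no text
  return { stdout: "Transcribing with Whisper...\n", transcriptPath };
}

// Simulate one run against a temporary output directory.
const outputDir = mkdtempSync(join(tmpdir(), "stt-repro-"));
const run = fakeSilentTranscribe(outputDir);
const fileText = readFileSync(run.transcriptPath, "utf8");

console.log({
  fileIsEmpty: fileText.trim() === "",
  stdout: run.stdout.trim(),
});
```

A runner that falls back to stdout would report the progress line as the transcript for this run, even though the transcript file is present and empty.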

Why this matters beyond one custom skill

This is not specific to one wrapper implementation. The generic failure mode is: file-output STT command + non-transcript stdout + empty/missing transcript file.

Bundled/manual openai-whisper usage commonly relies on Whisper writing .txt files, and whisper-cli/parakeet-mlx integrations also have inferred output-file paths. Even if some commands currently suppress verbose stdout, the runner contract still allows logs or metadata to become transcript text when the expected output file is empty.

Related issues

  • #84660 is similar in spirit, but it is about empty Moonshine JSON transcripts reaching the LLM in Discord voice mode. This issue is specifically about CLI transcript-file fallback to stdout in the media runner.
  • #62680 / #62590 were about Telegram voice notes not reaching transcription because wrapper/meta text affected audio preflight routing. This issue is after transcription runs: the runner can select the wrong output source.
  • #80633 added local whisper.cpp support; this issue is about hardening the output contract for file-producing CLI STT providers.

Suggested fix

Treat inferred transcript-file outputs as authoritative, or add an explicit config/output policy such as outputMode: "file" | "stdout" | "auto".

At minimum, for known file-output STT commands, do not fall back to stdout when the expected transcript file exists but is empty. Consider also treating a missing expected file as failed transcription for commands whose output path can be deterministically inferred.
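One possible shape for such a policy is sketched below. The type and function names are hypothetical, and the sketch takes the stricter option of treating a missing inferred file as no transcript. It is an illustration of the idea under those assumptions, not the merged fix.

```typescript
// Hypothetical policy type mirroring the suggested outputMode values.
type OutputMode = "file" | "stdout" | "auto";

// Hypothetical summary of one CLI run.
interface CliRunResult {
  stdout: string;
  // Left undefined when no transcript file path can be inferred.
  expectedFile?: { exists: boolean; content: string };
}

// Returns the transcript text, or null when injection should be skipped.
function resolveTranscript(mode: OutputMode, run: CliRunResult): string | null {
  const fromStdout = run.stdout.trim() || null;
  if (mode === "stdout") {
    return fromStdout;
  }
  if (run.expectedFile === undefined) {
    // No inferable file: "auto" may use stdout, "file" has nothing to read.
    return mode === "auto" ? fromStdout : null;
  }
  if (!run.expectedFile.exists) {
    // Stricter choice (assumption): a missing expected file means failure.
    return null;
  }
  // The file is authoritative: empty content means no transcript, and
  // stdout is never consulted.
  return run.expectedFile.content.trim() || null;
}

console.log(
  resolveTranscript("auto", {
    stdout: "Transcribing with Whisper...",
    expectedFile: { exists: true, content: "" },
  }),
); // prints: null
```

Returning null instead of an empty string lets the caller tell "no transcript" apart from text and skip injection explicitly.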

Regression coverage should include:

  • non-empty inferred .txt file returns transcript text
  • empty inferred .txt file plus progress stdout returns no transcript
  • missing inferred .txt file plus progress stdout returns no transcript, at least for commands configured as file-authoritative
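The three cases above could be expressed as a table-driven check. The selector below is a hypothetical stand-in for the runner logic, written only so that the cases have something to run against. The project's real tests would exercise the actual runner through its own test framework.

```typescript
// Hypothetical file-authoritative selector standing in for the runner.
// fileContent is undefined when the inferred .txt file is missing.
// _stdout is accepted but deliberately ignored.
function selectFileAuthoritative(
  fileContent: string | undefined,
  _stdout: string,
): string | null {
  if (fileContent === undefined) {
    return null;
  }
  const text = fileContent.trim();
  return text === "" ? null : text;
}

interface RegressionCase {
  name: string;
  fileContent: string | undefined;
  stdout: string;
  expected: string | null;
}

const progress = "Transcribing with Whisper...\n";

const cases: RegressionCase[] = [
  {
    name: "non-empty file returns transcript text",
    fileContent: "hello world\n",
    stdout: progress,
    expected: "hello world",
  },
  {
    name: "empty file plus progress stdout returns no transcript",
    fileContent: "",
    stdout: progress,
    expected: null,
  },
  {
    name: "missing file plus progress stdout returns no transcript",
    fileContent: undefined,
    stdout: progress,
    expected: null,
  },
];

for (const c of cases) {
  const actual = selectFileAuthoritative(c.fileContent, c.stdout);
  if (actual !== c.expected) {
    throw new Error(`${c.name}: expected ${c.expected}, got ${actual}`);
  }
}
console.log(`${cases.length} regression cases passed`);
```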
