openclaw - ✅(Solved) Fix Silent/empty voice notes should inject a clear message instead of appearing as an API error [3 pull requests, 2 comments, 3 participants]

DonShelly · 2026-03-17T10:58:27Z


GitHub stats

openclaw/openclaw#48944 • Fetched 2026-04-08 00:50:42

Timeline (top): cross-referenced ×3, commented ×2, referenced ×1

Error Message

No literal error string is reported. When transcription is skipped, the agent receives unclear context and responds with a misleading API error message, typically a claim about an API quota issue.

Fix Action

Fixed

Fixed by PR: fix: add placeholder transcript for silent voice notes (https://github.com/openclaw/openclaw/pull/49131)
Fixed by PR: Media: surface empty voice-note placeholders (https://github.com/openclaw/openclaw/pull/54991)
Fixed by PR: fix: inject synthetic transcript for tiny audio instead of silent skip (https://github.com/openclaw/openclaw/pull/55111)

PR fix notes

PR #49131: fix: add placeholder transcript for silent voice notes

Repository: openclaw/openclaw
Author: eulicesl
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/49131

Description (problem / solution / changelog)

Summary

- inject a synthetic placeholder transcript when tiny audio is skipped as tooSmall
- preserve the existing transcript/body formatting flow by emitting a synthetic audio.transcription output
- add coverage for URL-only tiny audio so the agent gets deterministic context instead of a blank transcript gap

Problem

Tiny or silent voice notes below MIN_AUDIO_FILE_BYTES are intentionally skipped before provider transcription. That leaves the downstream agent context without a transcript signal, which can lead to misleading explanations instead of a simple “that voice note was silent/empty” response.

What changed

When audio is skipped with the tooSmall reason, applyMediaUnderstanding() now backfills a synthetic transcript:

[Voice note was empty or contained only silence — no speech detected]

This keeps the existing transcript formatting and body update behavior intact without calling the provider for tiny files.
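
The backfill described above can be pictured with a small Python sketch. It is not the code from the PR, which changes src/media-understanding/apply.ts. The `Output` and `SkippedAttachment` shapes and the `backfill_tiny_audio` helper are hypothetical names chosen for illustration, and the placeholder uses the shorter wording proposed in the issue rather than the longer string quoted above.

```python
from dataclasses import dataclass
from typing import List

# Shorter placeholder wording taken from the issue; the PR uses a longer variant.
PLACEHOLDER = "[Voice note was empty or contained only silence]"


@dataclass
class SkippedAttachment:
    """Hypothetical record of an attachment that was skipped, with its reason."""
    attachment_index: int
    reason: str  # for example "tooSmall"


@dataclass
class Output:
    """Hypothetical shape of one media-understanding output."""
    kind: str  # for example "audio.transcription"
    attachment_index: int
    text: str
    synthetic: bool = False


def backfill_tiny_audio(outputs: List[Output],
                        skipped: List[SkippedAttachment]) -> List[Output]:
    """Return the outputs plus one synthetic transcription per tooSmall skip."""
    # Attachments that already have a transcription are left alone.
    transcribed = {o.attachment_index for o in outputs
                   if o.kind == "audio.transcription"}
    result = list(outputs)
    for item in skipped:
        if item.reason == "tooSmall" and item.attachment_index not in transcribed:
            result.append(Output("audio.transcription", item.attachment_index,
                                 PLACEHOLDER, synthetic=True))
    return result
```

Because the synthetic entry has the same kind as a real transcription, any downstream formatting that already handles transcripts would pick it up unchanged, which is the property the PR description emphasizes.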

Validation

pnpm build ✅
pnpm check ✅
Targeted tests:
- src/media-understanding/runner.skip-tiny-audio.test.ts ✅
- src/media-understanding/apply.test.ts ✅
- src/media-understanding/format.test.ts ✅

Notes

I could not run codex review --base origin/main in this environment because the codex CLI is not installed here.
pnpm test is not fully green on clean upstream main in this environment. I isolated the existing failures to:
- src/media-understanding/apply.echo-transcript.test.ts
- src/memory/embeddings-gemini.test.ts

These failures reproduced unchanged against a clean upstream/main worktree, independent of this patch.

Closes #48944

Changed files

src/media-understanding/apply.test.ts (modified, +61/-3)
src/media-understanding/apply.ts (modified, +87/-1)
src/media-understanding/format.test.ts (modified, +25/-0)

Issue description

Problem

When a user sends a silent or near-empty voice note, the audio file is under 1024 bytes and gets skipped by the transcription pipeline (as documented: "Tiny/empty audio files below 1024 bytes are skipped before provider/CLI transcription").

However, the agent doesn't receive a clear indication that the voice note was empty/silent. Instead, the missing or broken transcript context causes the agent to hallucinate an explanation — typically telling the user there's an API quota issue, which is inaccurate and confusing.

Expected behavior

When a voice note is skipped due to being under the 1024-byte threshold, OpenClaw should inject a clear message to the agent, e.g.:

- [Voice note was empty or contained only silence]
- Or set {{Transcript}} to a descriptive placeholder

This way the agent can respond appropriately ("Looks like the voice note was silent — want to try again?") instead of fabricating an error explanation.
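
The second option, setting {{Transcript}} to a descriptive placeholder, might look like the following Python sketch. How OpenClaw actually renders {{Transcript}} is not shown on this page, so the `render_body` helper and the template string in the usage line are assumptions made for illustration.

```python
from typing import Optional

# Placeholder wording proposed in the issue.
PLACEHOLDER = "[Voice note was empty or contained only silence]"


def render_body(template: str, transcript: Optional[str]) -> str:
    """Fill {{Transcript}}, falling back to the placeholder when it is missing."""
    # Treat None, empty, and whitespace-only transcripts the same way.
    text = transcript.strip() if transcript else ""
    return template.replace("{{Transcript}}", text or PLACEHOLDER)


# Hypothetical template: the agent sees an explicit marker instead of a gap.
print(render_body("Voice note transcript: {{Transcript}}", None))
```

With a fallback like this the agent never sees an empty transcript slot, so it has no reason to invent an error explanation.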

Steps to reproduce

1. Send a voice note with no speech (just silence or a very short accidental tap)
2. The audio file will be under 1024 bytes
3. Transcription is skipped
4. Agent receives unclear context and responds with a misleading API error message
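
For a repeatable reproduction, a test file under the threshold can be generated rather than recorded. The sketch below writes a short silent WAV with Python's standard wave module. WAV is used only as a convenient stand-in, since real voice notes often arrive in other containers, and the `write_silent_wav` helper is a hypothetical name.

```python
import os
import tempfile
import wave


def write_silent_wav(path: str, frames: int = 400, sample_rate: int = 8000) -> int:
    """Write a mono 16-bit PCM WAV of pure silence and return its size in bytes."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample (16-bit)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * frames)  # all-zero samples are silence
    return os.path.getsize(path)


with tempfile.TemporaryDirectory() as tmp:
    size = write_silent_wav(os.path.join(tmp, "silent.wav"))
    # 400 frames of 16-bit audio plus the WAV header stays below 1024 bytes.
    print(size, size < 1024)
```

Sending a file like this through the same channel as a normal voice note should exercise the skip path described in step 3.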

Environment

OpenClaw on Raspberry Pi 5
Local Whisper model for transcription
Affects all agent configurations

extent analysis

Fix Plan

To address this issue, we need to modify the transcription pipeline to inject a clear message when a voice note is skipped due to being under the 1024-byte threshold.

Step-by-Step Solution

1. Check audio file size: Before sending the audio file to the transcription pipeline, check its size.
2. Inject clear message: If the audio file size is under 1024 bytes, inject a clear message to the agent, e.g., [Voice note was empty or contained only silence].
3. Update agent response: Let the agent respond to that message directly, for example by asking the user to try again, instead of guessing at a cause.

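Example Code

Illustrative Python only. PR #49131 changes src/media-understanding/apply.ts, so this is not the code from the fix.

import os
import tempfile

# Threshold from the issue: files below this size are skipped before transcription.
MIN_AUDIO_FILE_BYTES = 1024
SILENT_PLACEHOLDER = "[Voice note was empty or contained only silence]"

def check_audio_file_size(audio_file_path):
    """Return the placeholder if the file is under the threshold, else None."""
    if os.path.getsize(audio_file_path) < MIN_AUDIO_FILE_BYTES:
        return SILENT_PLACEHOLDER
    return None

def inject_clear_message(transcript_context, audio_file_path):
    """Set Transcript to the placeholder when the audio is too small to transcribe."""
    clear_message = check_audio_file_size(audio_file_path)
    if clear_message is not None:
        transcript_context["Transcript"] = clear_message
    return transcript_context

# Example usage: write a 100-byte stand-in file so the snippet runs as-is.
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp:
    tmp.write(b"\x00" * 100)
    audio_file_path = tmp.name
try:
    print(inject_clear_message({}, audio_file_path))
finally:
    os.remove(audio_file_path)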

Verification

To verify that the fix worked, send a voice note with no speech and check the agent's response. The agent should respond with a message indicating that the voice note was silent, instead of fabricating an error explanation.
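
The manual check can be complemented by an automated one. The sketch below uses Python's unittest to pin down the boundary: a 1023-byte file gets the placeholder, while a file of exactly 1024 bytes does not, since the issue describes files "below 1024 bytes" as skipped. The `transcript_or_placeholder` helper is hypothetical and stands in for whatever function performs the size check.

```python
import os
import tempfile
import unittest

MIN_AUDIO_FILE_BYTES = 1024  # threshold named in the issue and the PR
PLACEHOLDER = "[Voice note was empty or contained only silence]"


def transcript_or_placeholder(path):
    """Hypothetical helper: placeholder for files below the threshold, else None."""
    if os.path.getsize(path) < MIN_AUDIO_FILE_BYTES:
        return PLACEHOLDER
    return None


class TinyAudioTest(unittest.TestCase):
    def _file_of_size(self, size):
        # Create a zero-filled file of the requested size; removed after the test.
        fd, path = tempfile.mkstemp(suffix=".ogg")
        with os.fdopen(fd, "wb") as handle:
            handle.write(b"\x00" * size)
        self.addCleanup(os.remove, path)
        return path

    def test_below_threshold_gets_placeholder(self):
        path = self._file_of_size(MIN_AUDIO_FILE_BYTES - 1)
        self.assertEqual(transcript_or_placeholder(path), PLACEHOLDER)

    def test_at_threshold_is_not_replaced(self):
        path = self._file_of_size(MIN_AUDIO_FILE_BYTES)
        self.assertIsNone(transcript_or_placeholder(path))
```

Saved as a module, the tests can be run with python -m unittest.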

Extra Tips

- Make sure to update the agent's response logic to handle the new clear message.
- Consider adding additional logging or monitoring to track instances where the clear message is injected.
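
One way to follow the logging tip is to count injections per skip reason and emit a log line each time. This is a minimal sketch using Python's standard logging module. The logger name, the `skip_counts` counter, and the `record_placeholder_injection` helper are all hypothetical and do not correspond to anything in OpenClaw.

```python
import logging
from collections import Counter

logger = logging.getLogger("voice_notes")  # hypothetical logger name
skip_counts = Counter()                    # in-memory tally per skip reason


def record_placeholder_injection(reason: str, size_bytes: int) -> None:
    """Count the injection and log it so skipped voice notes stay visible."""
    skip_counts[reason] += 1
    logger.info("Injected placeholder transcript: reason=%s size=%d bytes",
                reason, size_bytes)


logging.basicConfig(level=logging.INFO)
record_placeholder_injection("tooSmall", 844)
print(skip_counts["tooSmall"])
```

A sudden rise in the tooSmall count could point to a recording problem on the client side rather than users sending silence.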
