openclaw - ✅(Solved) Fix media-understanding: `MediaFetchError` on Discord audio fetch is given up after 1 attempt — no retry [1 pull requests, 2 comments, 2 participants]

gabrielexito-stack · 2026-04-29T11:51:59Z



GitHub stats: openclaw/openclaw#74316 (fetched 2026-04-30 06:25:35)

Author: gabrielexito-stack

Participants: clawsweeper[bot], gabrielexito-stack

Timeline (top): commented ×2, cross-referenced ×1, mentioned ×1, subscribed ×1

When the media-understanding subsystem fails to fetch a Discord audio attachment (MediaFetchError), it gives up after a single attempt with no retry. The transcription pipeline (Groq Whisper API → MLX-whisper local fallback) never gets a chance — the audio file was never downloaded.

User impact: voice notes silently disappear. The transcript may eventually surface in a later prompt window via a separate path, but the model handling the immediately-following turn never sees the voice note content. Compounds badly with degraded sessions, because the agent then confabulates an explanation for "messages that didn't arrive."
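The ordering that turns one failed download into a lost voice note can be modelled in a few lines. This is a minimal Python sketch, not OpenClaw's implementation: `transcribe_attachment`, the `fetch` callable, and the provider list are hypothetical stand-ins for the attachment download and the Groq Whisper → MLX-whisper chain, and `MediaFetchError` here is a local placeholder class.

```python
from typing import Callable, Optional, Sequence


class MediaFetchError(Exception):
    """Local placeholder for the fetch failure named in the log line."""


def transcribe_attachment(
    url: str,
    fetch: Callable[[str], bytes],
    providers: Sequence[Callable[[bytes], str]],
) -> Optional[str]:
    """Single-shot flow described in the issue: fetch once, then try STT providers in order."""
    try:
        audio = fetch(url)  # one attempt, no retry
    except MediaFetchError:
        return None  # no provider is ever called; the voice note is dropped
    for provider in providers:
        try:
            return provider(audio)  # e.g. a cloud Whisper API first, a local fallback second
        except Exception:
            continue  # fall through to the next provider in the chain
    return None
```

Because the provider fallback chain sits after the download, adding more STT providers cannot help here; only retrying the fetch itself can.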

Error Message

2026-04-28T21:35:52.233+07:00 [media-understanding] audio: failed (0/1) reason=MediaFetchError

Fix Action

Fixed

Fixed by PR: fix(media): retry transient remote media fetches (https://github.com/openclaw/openclaw/pull/74553)

PR fix notes

PR #74553: fix(media): retry transient remote media fetches

Repository: openclaw/openclaw
Author: vyctorbrzezowski
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/74553

Description (problem / solution / changelog)

Summary

Problem: media-understanding remote attachment downloads were single-shot for normal URLs, so a transient CDN/network failure could drop a Discord voice note before transcription.
Why it matters: audio preflight and the normal media-understanding path both depend on the shared attachment cache before any STT provider can run.
What changed: fetchRemoteMedia now supports opt-in bounded retry for transient fetch/read failures and 5xx responses; MediaAttachmentCache enables three attempts with bounded backoff/jitter.
What did NOT change (scope boundary): 4xx responses, SSRF/policy blocks, caller aborts/timeouts, and maxBytes failures still fail fast; no Discord API behavior or real Discord tests were added.
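The retry/fail-fast boundary in this summary amounts to a classification step. A minimal Python sketch, assuming a hypothetical `FetchFailure` record with made-up `kind` labels; the PR's actual TypeScript in `src/media/fetch.ts` is not reproduced here and may classify errors differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FetchFailure:
    """Hypothetical summary of a failed remote media fetch."""
    kind: str  # "network", "body_read", "http", "ssrf_blocked", "aborted", "max_bytes"
    status: Optional[int] = None  # HTTP status when kind == "http"


def is_retryable(failure: FetchFailure) -> bool:
    """Retry only transient failures, mirroring the scope described in the PR summary."""
    if failure.kind in ("network", "body_read"):
        return True  # transient fetch or response-body read failure
    if failure.kind == "http" and failure.status is not None:
        return 500 <= failure.status <= 599  # 5xx is retried; 4xx fails fast
    return False  # SSRF/policy blocks, caller aborts/timeouts, maxBytes: fail fast
```

Keeping the decision in one pure function makes the fail-fast carve-outs easy to unit-test in isolation.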

Change Type (select all)

- [x] Bug fix

Scope (select all touched areas)

- [x] Integrations

Linked Issue/PR

Closes #74316
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: the shared media-understanding attachment cache called fetchRemoteMedia once, and fetchRemoteMedia only used dispatcherAttempts for transport fallback, not general transient network/5xx retry.
Missing detection / guardrail: no unit coverage locked in transient remote media fetch retry or the fail-fast carve-outs.
Contributing context (if known): Discord audio preflight routes attachment URLs through the same media-understanding cache, so a fetch failure prevented transcription from starting.
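To illustrate the root cause and the shape of the fix, here is a minimal Python sketch in which the fetch helper stays single-shot by default and only the cache opts in to three attempts. All names (`fetch_remote`, `RetryPolicy`, `AttachmentCache`, `TransientFetchError`) are hypothetical; this does not reproduce `fetchRemoteMedia`, `dispatcherAttempts`, or `MediaAttachmentCache`, and the 500 ms base delay with doubling is an assumption, not the PR's values.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


class TransientFetchError(Exception):
    """Hypothetical marker for retryable failures (network/read errors, 5xx)."""


@dataclass(frozen=True)
class RetryPolicy:
    attempts: int = 3          # total attempts, including the first
    base_delay_s: float = 0.5  # assumed; doubled after each failed attempt
    jitter: float = 0.2        # +/-20% around each delay


def fetch_remote(
    url: str,
    fetch: Callable[[str], bytes],
    retry: Optional[RetryPolicy] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Single attempt by default; bounded retry only when the caller passes a policy."""
    attempts = retry.attempts if retry else 1
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except TransientFetchError:
            if retry is None or attempt == attempts:
                raise  # no retry requested, or budget exhausted
            delay = retry.base_delay_s * 2 ** (attempt - 1)
            sleep(delay * (1 + random.uniform(-retry.jitter, retry.jitter)))
    raise RuntimeError("retry policy allows zero attempts")


class AttachmentCache:
    """Hypothetical stand-in for the shared cache: the one caller that opts in."""

    def __init__(self, fetch: Callable[[str], bytes], sleep: Callable[[float], None] = time.sleep):
        self._fetch = fetch
        self._sleep = sleep
        self._store: Dict[str, bytes] = {}

    def get(self, url: str) -> bytes:
        if url not in self._store:
            self._store[url] = fetch_remote(url, self._fetch, RetryPolicy(), self._sleep)
        return self._store[url]
```

Injecting `sleep` keeps tests fast, and leaving the default single-shot preserves existing behavior for callers that never pass a policy.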

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- [x] Unit test
- [x] Seam / integration test
- [ ] End-to-end test
- [ ] Existing coverage already sufficient
Target test or file: src/media/fetch.test.ts, src/media-understanding/media-understanding-url-fallback.test.ts
Scenario the test should lock in: retry transient fetch failures, transient body read failures, and 5xx responses when retry is enabled; do not retry 4xx, abort, SSRF block, or maxBytes.
Why this is the smallest reliable guardrail: it exercises the shared fetch/cache boundary used by Discord preflight and normal media-understanding without relying on Discord or provider services.
Existing test that already covers this (if any): none for general remote media retry.
If no new test is added, why not: N/A
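The scenarios in this test plan lend themselves to a scripted fake fetcher that replays a fixed sequence of outcomes and counts calls. The sketch below uses Python's `unittest` with a hypothetical `fetch_with_retry` as the function under test (backoff omitted for speed); the real tests in `src/media/fetch.test.ts` are not reproduced here.

```python
import unittest
from typing import List, Union


class HttpStatusError(Exception):
    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


Outcome = Union[bytes, Exception]


class ScriptedFetcher:
    """Fake fetcher that replays a fixed script of outcomes and counts calls."""

    def __init__(self, script: List[Outcome]):
        self.script, self.calls = list(script), 0

    def __call__(self, url: str) -> bytes:
        outcome = self.script[self.calls]
        self.calls += 1
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


def fetch_with_retry(fetch, url: str, attempts: int = 3) -> bytes:
    """Hypothetical function under test: retry resets and 5xx, fail fast otherwise."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except (ConnectionResetError, HttpStatusError) as err:
            transient = not isinstance(err, HttpStatusError) or err.status >= 500
            if not transient or attempt == attempts:
                raise


class RetryBehaviourTest(unittest.TestCase):
    def test_retries_transient_then_succeeds(self):
        fetcher = ScriptedFetcher([ConnectionResetError(), HttpStatusError(503), b"ogg"])
        self.assertEqual(fetch_with_retry(fetcher, "u"), b"ogg")
        self.assertEqual(fetcher.calls, 3)

    def test_4xx_fails_fast(self):
        fetcher = ScriptedFetcher([HttpStatusError(404), b"ogg"])
        with self.assertRaises(HttpStatusError):
            fetch_with_retry(fetcher, "u")
        self.assertEqual(fetcher.calls, 1)

    def test_budget_is_bounded(self):
        fetcher = ScriptedFetcher([ConnectionResetError()] * 5)
        with self.assertRaises(ConnectionResetError):
            fetch_with_retry(fetcher, "u")
        self.assertEqual(fetcher.calls, 3)
```

Asserting on the call count, not just the final result, is what locks in both "retried" and "not retried" behavior.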

User-visible / Behavior Changes

Remote media attachments processed by media-understanding can recover from transient network/read failures or 5xx responses before audio or vision processing starts.

Diagram (if applicable)

Before:
remote attachment -> one fetch failure -> media-understanding fails

After:
remote attachment -> transient failure/5xx -> bounded retry -> media-understanding continues if retry succeeds

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) Yes
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: existing remote media fetches may be retried up to three attempts only for transient network/read failures and 5xx responses. 4xx, SSRF/policy block, caller abort, and maxBytes failures remain fail-fast.

Repro + Verification

Environment

OS: macOS
Runtime/container: Node 24 / pnpm
Model/provider: N/A
Integration/channel (if any): Discord audio preflight path, via shared media-understanding cache
Relevant config (redacted): N/A

Steps

1. Trigger media-understanding on a remote audio attachment URL.
2. Have the first remote media fetch fail with a transient network/read error or 5xx.
3. Retry the same fetch through the shared cache.

Expected

Transient failures and 5xx responses are retried within a bounded budget.
4xx, SSRF/policy block, caller abort, and maxBytes failures are not retried.

Actual

Before this change, the cache made one fetch attempt and rethrew the failure.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
- pnpm test src/media/fetch.test.ts src/media-understanding/media-understanding-url-fallback.test.ts src/media-understanding/audio-preflight.test.ts extensions/discord/src/monitor/preflight-audio.test.ts extensions/discord/src/monitor/message-utils.test.ts
- pnpm exec oxfmt --check --threads=1 src/media/fetch.ts src/media/fetch.test.ts src/media-understanding/attachments.cache.ts src/media-understanding/media-understanding-url-fallback.test.ts CHANGELOG.md
- git diff --check
- pnpm check:changed
- codex review --base origin/main
Edge cases checked: transient fetch error, transient response body read error, 5xx retry, 4xx fail-fast, caller abort fail-fast, SSRF block fail-fast, maxBytes fail-fast.
What you did not verify: real Discord CDN or live STT provider behavior.
AI-assisted: yes. I reviewed the diff and understand the code path.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: retrying existing media fetches can add latency for attachments that keep failing.
- Mitigation: retry is bounded to three attempts, only for transient network/read failures and 5xx, with policy/size/auth/not-found/abort failures left fail-fast.

Changed files

CHANGELOG.md (modified, +1/-0)
src/media-understanding/attachments.cache.ts (modified, +9/-1)
src/media-understanding/media-understanding-url-fallback.test.ts (modified, +10/-2)
src/media/fetch.test.ts (modified, +141/-0)
src/media/fetch.ts (modified, +49/-1)


Summary

Reproduction

Conditions: Discord channel/DM, OpenClaw with audio media enabled (media.audio.enabled: true, model chain groq/whisper-large-v3-turbo → MLX mlx_whisper), normal Claude-max provider routing.

1. User sends a Discord voice note.
2. Discord delivers the message webhook with attachment URL.
3. OpenClaw's media-understanding subsystem attempts to fetch the audio file.
4. Fetch fails (transient network blip, slow CDN propagation, or signed-token timing edge case).

Logs:

2026-04-28T21:35:52.233+07:00 [media-understanding] audio: failed (0/1) reason=MediaFetchError

The voice note never reaches the active model prompt; turn proceeds without it.

(0/1) = 0 successes out of 1 attempt. No retry, no backoff.
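For operators, the (successes/attempts) pair in this line is what distinguishes a single-shot failure from an exhausted retry budget. A minimal Python parser, assuming only the line shape quoted above (the `MediaOutcome` and `parse_outcome` names are hypothetical, and other fields OpenClaw may log are not handled):

```python
import re
from typing import NamedTuple, Optional


class MediaOutcome(NamedTuple):
    capability: str
    successes: int
    attempts: int
    reason: Optional[str]


# Matches lines like "[media-understanding] audio: failed (0/1) reason=MediaFetchError"
_OUTCOME = re.compile(
    r"\[media-understanding\]\s+(?P<cap>\w+):\s+failed\s+\((?P<ok>\d+)/(?P<n>\d+)\)"
    r"(?:\s+reason=(?P<reason>\S+))?"
)


def parse_outcome(line: str) -> Optional[MediaOutcome]:
    """Extract capability, successes, attempts and reason from a failure log line."""
    m = _OUTCOME.search(line)
    if m is None:
        return None
    return MediaOutcome(m["cap"], int(m["ok"]), int(m["n"]), m["reason"])


line = "2026-04-28T21:35:52.233+07:00 [media-understanding] audio: failed (0/1) reason=MediaFetchError"
outcome = parse_outcome(line)
retried = outcome is not None and outcome.attempts > 1  # False for this single-attempt line
```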

Why a retry is the right fix

Discord CDN is the primary source of MediaFetchErrors in real-world OpenClaw stacks. CDN flakes are transient by nature — the same URL retried after 1–3 seconds usually succeeds. Single-shot fetch + give-up converts a transient network blip into permanent voice-note loss.

Other capabilities in the same subsystem appear to be affected as well: both image: failed and audio: failed log lines use the same failed (0/1) shape, suggesting a shared no-retry contract worth widening.

Suggested implementation

Add a small retry-with-backoff to the media fetch step (before the transcription/vision model is even selected):

3 attempts max
Exponential backoff: ~500 ms, ~1500 ms, ~3000 ms
Jitter ±20%
Only retry on transient network errors (timeouts, 5xx, EOF, connection reset). 4xx (404, 403) should fail fast — those won't recover.
Surface the final retry count in the log line: failed (0/3) instead of (0/1), so operators can tell at a glance whether retry was attempted.
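Three attempts leave only two gaps, so under a 3-attempt cap only the ~500 ms and ~1500 ms delays are ever used. A small Python sketch (hypothetical helper names, using the issue's suggested delays rather than the PR's actual values) makes the jittered schedule and the worst-case added wait explicit:

```python
import random
from typing import List, Optional, Tuple

# Delays suggested in the issue (ms); with N attempts only the first N-1 are ever used.
SUGGESTED_DELAYS_MS = (500, 1500, 3000)


def delay_schedule(max_attempts: int, jitter: float = 0.2,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Jittered waits (ms) between successive attempts: one fewer than max_attempts."""
    rng = rng or random.Random()
    waits = SUGGESTED_DELAYS_MS[: max(max_attempts - 1, 0)]
    return [d * (1 + rng.uniform(-jitter, jitter)) for d in waits]


def added_wait_bounds(max_attempts: int, jitter: float = 0.2) -> Tuple[float, float]:
    """Min and max total sleep (ms) if every attempt fails, excluding request time."""
    total = sum(SUGGESTED_DELAYS_MS[: max(max_attempts - 1, 0)])
    return total * (1 - jitter), total * (1 + jitter)
```

With these numbers, a URL that keeps failing adds between 1.6 s and 2.4 s of sleep before the final failure is logged, on top of the three request durations.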

This is a small, well-scoped change with high user-facing benefit; failure mode today is "voice note disappears with no recovery path."

Severity

Medium. Cosmetically silent (a single warn log line) but high user-trust impact: from the user's side, a voice note simply vanished and the agent had no idea it was sent.

This issue surfaced while diagnosing a session-state cascade where the lost voice note then got attributed to a fabricated "text serialization bug" by the agent. The voice note loss itself was the only real bug — but it was hard to diagnose because no retry attempt is visible in the logs.

extent analysis

TL;DR

Implement a retry mechanism with exponential backoff in the media fetch step to handle transient network errors.

Guidance

Identify the specific error cases that should trigger a retry, such as timeouts, 5xx errors, EOF, and connection resets.
Implement a retry mechanism with a maximum of 3 attempts and exponential backoff (e.g., 500ms, 1500ms, 3000ms) with 20% jitter.
Modify the log line to surface the final retry count, e.g., failed (0/3), to provide visibility into retry attempts.
Ensure that 4xx errors (e.g., 404, 403) are not retried, as they are unlikely to recover.

Example

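import random
import time


class TransientNetworkError(Exception):
    """Hypothetical marker for retryable errors: timeouts, 5xx, EOF, connection reset."""


def fetch_media_with_retry(fetch, url, max_attempts=3, initial_backoff=0.5, jitter=0.2):
    # fetch is a hypothetical callable that downloads url; 4xx and other
    # non-transient errors it raises propagate immediately (fail fast).
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except TransientNetworkError:
            if attempt == max_attempts:
                break  # no sleep after the final attempt
            time.sleep(backoff * (1 + random.uniform(-jitter, jitter)))
            backoff *= 2  # 0.5 s, then 1 s, ...
    # Log failure with retry count, e.g. "failed (0/3)"
    print(f"failed (0/{max_attempts})")
    return None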

Notes

The suggested implementation should be adapted to the specific programming language and framework used in the OpenClaw project.

Recommendation

Implement the retry mechanism with exponential backoff described above: it is a well-scoped change with high user-facing benefit and can help prevent voice-note loss caused by transient network errors.
