openclaw - ✅ (Solved) media-understanding: `MediaFetchError` on Discord audio fetch gives up after 1 attempt, with no retry [1 pull request, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#74316 (fetched 2026-04-30 06:25:35)
Comments: 2 · Participants: 2 · Timeline events: 5 · Reactions: 2
Timeline (top): commented ×2, cross-referenced ×1, mentioned ×1, subscribed ×1

When the media-understanding subsystem fails to fetch a Discord audio attachment (MediaFetchError), it gives up after a single attempt with no retry. The transcription pipeline (Groq Whisper API → MLX-whisper local fallback) never gets a chance — the audio file was never downloaded.

User impact: voice notes silently disappear. The transcript may eventually surface in a later prompt window via a separate path, but the model handling the immediately-following turn never sees the voice note content. Compounds badly with degraded sessions, because the agent then confabulates an explanation for "messages that didn't arrive."

Fix Action

Fixed

PR fix notes

PR #74553: fix(media): retry transient remote media fetches

Description (problem / solution / changelog)

Summary

  • Problem: media-understanding remote attachment downloads were single-shot for normal URLs, so a transient CDN/network failure could drop a Discord voice note before transcription.
  • Why it matters: audio preflight and the normal media-understanding path both depend on the shared attachment cache before any STT provider can run.
  • What changed: fetchRemoteMedia now supports opt-in bounded retry for transient fetch/read failures and 5xx responses; MediaAttachmentCache enables three attempts with bounded backoff/jitter.
  • What did NOT change (scope boundary): 4xx responses, SSRF/policy blocks, caller aborts/timeouts, and maxBytes failures still fail fast; no Discord API behavior or real Discord tests were added.
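
The retry boundary in the bullets above comes down to one classification step. The following is a minimal Python sketch of that decision, not the code in src/media/fetch.ts (which is TypeScript). The names FailureKind, FetchFailure, and is_retryable are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FailureKind(Enum):
    """Hypothetical failure categories, named after the PR summary."""
    NETWORK = auto()       # fetch failed before a response arrived
    BODY_READ = auto()     # response arrived, but reading the body failed
    HTTP_STATUS = auto()   # a non-2xx response was returned
    POLICY_BLOCK = auto()  # SSRF or policy rejection
    CALLER_ABORT = auto()  # caller abort or timeout
    MAX_BYTES = auto()     # body exceeded the size limit


@dataclass(frozen=True)
class FetchFailure:
    kind: FailureKind
    status: Optional[int] = None  # only meaningful for HTTP_STATUS


def is_retryable(failure: FetchFailure) -> bool:
    """Return True only for the failures the PR describes as transient."""
    if failure.kind in (FailureKind.NETWORK, FailureKind.BODY_READ):
        return True
    if failure.kind is FailureKind.HTTP_STATUS:
        # 5xx is retried; 4xx and anything else fails fast.
        return failure.status is not None and 500 <= failure.status <= 599
    # Policy blocks, caller aborts, and size-limit failures never retry.
    return False
```

Keeping the decision in one pure function makes the fail-fast carve-outs easy to cover with unit tests, independent of any network code.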

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #74316
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the shared media-understanding attachment cache called fetchRemoteMedia once, and fetchRemoteMedia only used dispatcherAttempts for transport fallback, not general transient network/5xx retry.
  • Missing detection / guardrail: no unit coverage locked in transient remote media fetch retry or the fail-fast carve-outs.
  • Contributing context (if known): Discord audio preflight routes attachment URLs through the same media-understanding cache, so a fetch failure prevented transcription from starting.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/media/fetch.test.ts, src/media-understanding/media-understanding-url-fallback.test.ts
  • Scenario the test should lock in: retry transient fetch failures, transient body read failures, and 5xx responses when retry is enabled; do not retry 4xx, abort, SSRF block, or maxBytes.
  • Why this is the smallest reliable guardrail: it exercises the shared fetch/cache boundary used by Discord preflight and normal media-understanding without relying on Discord or provider services.
  • Existing test that already covers this (if any): none for general remote media retry.
  • If no new test is added, why not: N/A
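
The scenarios in this test plan can be exercised without Discord or an STT provider by scripting the fetcher's outcomes. The sketch below shows the idea in Python; the real tests live in the TypeScript files named above. ScriptedFetcher, TransientError, PermanentError, fetch_with_retry, and run are all hypothetical names, and backoff is left out so the example stays fast.

```python
from typing import Callable, List, Tuple, Union


class TransientError(Exception):
    """Stand-in for a network/read failure or 5xx response."""


class PermanentError(Exception):
    """Stand-in for 4xx, SSRF block, abort, or maxBytes failures."""


Outcome = Union[bytes, Exception]


class ScriptedFetcher:
    """Replays a fixed list of outcomes and counts how often it is called."""

    def __init__(self, outcomes: List[Outcome]) -> None:
        self._outcomes = list(outcomes)
        self.calls = 0

    def __call__(self, url: str) -> bytes:
        self.calls += 1
        outcome = self._outcomes.pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


def fetch_with_retry(fetch: Callable[[str], bytes], url: str,
                     max_attempts: int = 3) -> bytes:
    """Retry only TransientError, up to max_attempts total attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except TransientError:
            if attempt == max_attempts:
                raise  # budget exhausted: surface the last failure
    raise ValueError("max_attempts must be at least 1")


def run(outcomes: List[Outcome]) -> Tuple[object, int]:
    """Return (result or exception type name, number of fetch calls)."""
    fetcher = ScriptedFetcher(outcomes)
    try:
        result: object = fetch_with_retry(fetcher, "https://cdn.example.invalid/voice.ogg")
    except Exception as exc:
        result = type(exc).__name__
    return result, fetcher.calls
```

Asserting on the call count is what locks in the behavior: a recovered transient failure should show two calls, a fail-fast case exactly one, and an exhausted budget exactly three.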

User-visible / Behavior Changes

Remote media attachments processed by media-understanding can recover from transient network/read failures or 5xx responses before audio or vision processing starts.

Diagram (if applicable)

Before:
remote attachment -> one fetch failure -> media-understanding fails

After:
remote attachment -> transient failure/5xx -> bounded retry -> media-understanding continues if retry succeeds

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) Yes
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: existing remote media fetches may be retried up to three attempts only for transient network/read failures and 5xx responses. 4xx, SSRF/policy block, caller abort, and maxBytes failures remain fail-fast.

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 24 / pnpm
  • Model/provider: N/A
  • Integration/channel (if any): Discord audio preflight path, via shared media-understanding cache
  • Relevant config (redacted): N/A

Steps

  1. Trigger media-understanding on a remote audio attachment URL.
  2. Have the first remote media fetch fail with a transient network/read error or 5xx.
  3. Retry the same fetch through the shared cache.

Expected

  • Transient failures and 5xx responses are retried within a bounded budget.
  • 4xx, SSRF/policy block, caller abort, and maxBytes failures are not retried.

Actual

  • Before this change, the cache made one fetch attempt and rethrew the failure.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • pnpm test src/media/fetch.test.ts src/media-understanding/media-understanding-url-fallback.test.ts src/media-understanding/audio-preflight.test.ts extensions/discord/src/monitor/preflight-audio.test.ts extensions/discord/src/monitor/message-utils.test.ts
    • pnpm exec oxfmt --check --threads=1 src/media/fetch.ts src/media/fetch.test.ts src/media-understanding/attachments.cache.ts src/media-understanding/media-understanding-url-fallback.test.ts CHANGELOG.md
    • git diff --check
    • pnpm check:changed
    • codex review --base origin/main
  • Edge cases checked: transient fetch error, transient response body read error, 5xx retry, 4xx fail-fast, caller abort fail-fast, SSRF block fail-fast, maxBytes fail-fast.
  • What you did not verify: real Discord CDN or live STT provider behavior.
  • AI-assisted: yes. I reviewed the diff and understand the code path.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: retrying existing media fetches can add latency for attachments that keep failing.
    • Mitigation: retry is bounded to three attempts, only for transient network/read failures and 5xx, with policy/size/auth/not-found/abort failures left fail-fast.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/media-understanding/attachments.cache.ts (modified, +9/-1)
  • src/media-understanding/media-understanding-url-fallback.test.ts (modified, +10/-2)
  • src/media/fetch.test.ts (modified, +141/-0)
  • src/media/fetch.ts (modified, +49/-1)

Code Example

2026-04-28T21:35:52.233+07:00 [media-understanding] audio: failed (0/1) reason=MediaFetchError

Reproduction

Conditions: Discord channel/DM, OpenClaw with audio media enabled (media.audio.enabled: true, model chain groq/whisper-large-v3-turbo → MLX mlx_whisper), normal Claude-max provider routing.

  1. User sends a Discord voice note.
  2. Discord delivers the message webhook with attachment URL.
  3. OpenClaw's media-understanding subsystem attempts to fetch the audio file.
  4. Fetch fails (transient network blip, slow CDN propagation, or signed-token timing edge case).
  5. Logs:
    2026-04-28T21:35:52.233+07:00 [media-understanding] audio: failed (0/1) reason=MediaFetchError
  6. The voice note never reaches the active model prompt; turn proceeds without it.

(0/1) = 0 successes out of 1 attempt. No retry, no backoff.
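
Operators who want to count these failures can pull the success and attempt numbers out of the log line. The sketch below assumes every failure line has the same shape as the single sample above; the real log format may vary, and LOG_PATTERN, FailureRecord, and parse_failure are hypothetical names.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the one sample line in this issue (an assumption).
LOG_PATTERN = re.compile(
    r"\[media-understanding\] (?P<kind>\w+): failed "
    r"\((?P<successes>\d+)/(?P<attempts>\d+)\) reason=(?P<reason>\w+)"
)


class FailureRecord(NamedTuple):
    kind: str        # e.g. "audio" or "image"
    successes: int   # first number in the parentheses
    attempts: int    # second number in the parentheses
    reason: str      # e.g. "MediaFetchError"


def parse_failure(line: str) -> Optional[FailureRecord]:
    """Extract the failure fields, or return None if the line does not match."""
    match = LOG_PATTERN.search(line)
    if match is None:
        return None
    return FailureRecord(
        kind=match["kind"],
        successes=int(match["successes"]),
        attempts=int(match["attempts"]),
        reason=match["reason"],
    )
```

Applied to the sample line, this yields kind "audio", 0 successes, 1 attempt, and reason "MediaFetchError".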

Why a retry is the right fix

Discord CDN is the primary source of MediaFetchErrors in real-world OpenClaw stacks. CDN flakes are transient by nature — the same URL retried after 1–3 seconds usually succeeds. Single-shot fetch + give-up converts a transient network blip into permanent voice-note loss.

Adjacent providers in the same subsystem appear to handle fetch failures the same way: both the image: failed and audio: failed log lines use the same failed (0/1) shape, which suggests a shared no-retry contract worth widening.

Suggested implementation

Add a small retry-with-backoff to the media fetch step (before the transcription/vision model is even selected):

  • 3 attempts max
  • Exponential backoff: ~500 ms, ~1500 ms, ~3000 ms
  • Jitter ±20%
  • Only retry on transient network errors (timeouts, 5xx, EOF, connection reset). 4xx (404, 403) should fail fast — those won't recover.
  • Surface the final retry count in the log line: failed (0/3) instead of (0/1), so operators can tell at a glance whether retry was attempted.

This is a small, well-scoped change with high user-facing benefit; failure mode today is "voice note disappears with no recovery path."
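
The suggested delays and jitter can be turned into a small schedule to see how much waiting the retry budget adds. This is a sketch under one reading of the list above: a wait happens only between consecutive attempts, so three attempts use just the first two delays. BASE_DELAYS_MS, delay_bounds_ms, jittered_delay_ms, and waits_for are hypothetical names, and the values come from the suggestion, not from the merged PR.

```python
import random
from typing import List, Optional, Tuple

BASE_DELAYS_MS = (500, 1500, 3000)  # delays suggested in the issue
JITTER = 0.20                       # +/-20% jitter suggested in the issue


def delay_bounds_ms(base_ms: float, jitter: float = JITTER) -> Tuple[float, float]:
    """Smallest and largest delay that the jitter can produce."""
    return base_ms * (1 - jitter), base_ms * (1 + jitter)


def jittered_delay_ms(base_ms: float, jitter: float = JITTER,
                      rng: Optional[random.Random] = None) -> float:
    """Pick a delay uniformly within the jitter bounds."""
    rng = rng or random.Random()
    return base_ms * (1 + rng.uniform(-jitter, jitter))


def waits_for(max_attempts: int) -> List[int]:
    """Base delays actually used: one wait between consecutive attempts."""
    return list(BASE_DELAYS_MS[: max(0, max_attempts - 1)])


def worst_case_wait_ms(max_attempts: int) -> float:
    """Upper bound on total backoff time, excluding the fetches themselves."""
    return sum(delay_bounds_ms(base)[1] for base in waits_for(max_attempts))
```

Under these assumptions, three attempts add at most about 2.4 seconds of backoff (600 ms plus 1800 ms) on top of the time spent in the fetches themselves.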

Severity

Medium. Cosmetically silent (a single warn log line) but high user-trust impact: from the user's side, a voice note simply vanished and the agent had no idea it was sent.

Related

  • This issue surfaced while diagnosing a session-state cascade where the lost voice note then got attributed to a fabricated "text serialization bug" by the agent. The voice note loss itself was the only real bug — but it was hard to diagnose because no retry attempt is visible in the logs.

extent analysis

TL;DR

Implement a retry mechanism with exponential backoff in the media fetch step to handle transient network errors.

Guidance

  • Identify the specific error cases that should trigger a retry, such as timeouts, 5xx errors, EOF, and connection resets.
  • Implement a retry mechanism with a maximum of 3 attempts and exponential backoff (e.g., 500ms, 1500ms, 3000ms) with 20% jitter.
  • Modify the log line to surface the final retry count, e.g., failed (0/3), to provide visibility into retry attempts.
  • Ensure that 4xx errors (e.g., 404, 403) are not retried, as they are unlikely to recover.

Example

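import random
import time
import urllib.error
import urllib.request


def is_transient(exc):
    # Retry 5xx responses; 4xx (e.g. 403, 404) will not recover.
    if isinstance(exc, urllib.error.HTTPError):
        return exc.code >= 500
    # Timeouts, connection resets, and other network-level failures.
    return isinstance(exc, (urllib.error.URLError, TimeoutError, ConnectionError))


def fetch_media_with_retry(url, max_attempts=3, initial_backoff=0.5):
    backoff = initial_backoff
    for attempt in range(1, max_attempts + 1):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read()
        except Exception as exc:
            if not is_transient(exc) or attempt == max_attempts:
                # Log 0 successes out of the attempts actually made.
                print(f"failed (0/{attempt}) reason={type(exc).__name__}")
                return None
            # Wait with +/-20% jitter, then double the delay for the next retry.
            time.sleep(backoff * (1 + random.uniform(-0.2, 0.2)))
            backoff *= 2
    return None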

Notes

The suggested implementation should be adapted to the specific programming language and framework used in the OpenClaw project.

Recommendation

Apply the workaround by implementing a retry mechanism with exponential backoff, as it is a well-scoped change with high user-facing benefit and can help prevent voice note loss due to transient network errors.
