openclaw - ✅(Solved) Fix Bug: model-fallback marks candidate_succeeded even when fallback model returns 404 [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#62119 (fetched 2026-04-08 03:08:47)
Comments: 1 · Participants: 2 · Timeline events: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1

When the primary model (Claude Sonnet) times out, the fallback chain kicks in. However, if the fallback model also errors (404), OpenClaw logs candidate_succeeded instead of continuing to the next fallback. This means the error surfaces to the user instead of cascading through the remaining fallback candidates.

Error Message

embedded_run_agent_end: isError=true, error="404 status code (no body)", model="gemini-2.5-pro"

The fallback candidate is marked candidate_succeeded even though agent_end fired with isError=true and a 404. The second fallback (e.g. Qwen3) is never attempted, and the 404 error surfaces to the user. A 404 (or any non-200 error) from a fallback model should be treated as a failure, and the chain should continue to the next configured fallback.

Root Cause

classifyFailoverClassificationFromHttpStatus had no branch for HTTP 404, and the bare text "404 status code (no body)" did not match isModelNotFoundErrorMessage, so classifyFailoverReason returned null. The embedded runner therefore returned an error payload without throwing, and the outer model fallback layer recorded the attempt as candidate_succeeded and skipped the remaining fallback models.

Fix Action

Fixed

PR fix notes

PR #62244: Fix: HTTP 404 classification for model fallback chain (#62119)

Description (problem / solution / changelog)

Summary

  • Problem: Assistant errors shaped like 404 status code (no body) did not classify as a failover signal, so the embedded runner returned an error payload without throwing. The outer model fallback layer then treated the attempt as success (candidate_succeeded) and skipped remaining fallback models.
  • Why it matters: Users with a broken primary or first fallback (for example a 404 from the provider) never reached the next configured fallback.
  • What changed: In classifyFailoverClassificationFromHttpStatus, HTTP 404 now maps to model_not_found by default, while preserving message-based session, auth, and billing signals (same pattern as HTTP 410).
  • What did NOT change: Other status codes, the embedded runner loop structure, and runWithModelFallback behavior beyond what classification fixes.
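The interaction described in the summary can be illustrated with a minimal sketch. This is not OpenClaw's code: `runWithFallbackSketch`, `makeRunner`, `simulate`, and the `candidate_failed` label are hypothetical names chosen for illustration. The one behavior it assumes is what the PR states, namely that the outer layer moves to the next candidate only when an attempt throws.

```typescript
// Hypothetical sketch, not OpenClaw's implementation.
type AttemptResult = { model: string; isError: boolean; text: string };
type Candidate = { model: string; run: () => AttemptResult };
// "candidate_failed" is an illustrative label; only "candidate_succeeded" appears in the issue logs.
type Decision = { model: string; decision: "candidate_succeeded" | "candidate_failed" };
type Classifier = (message: string) => string | null;

// Outer layer: advances to the next candidate only when an attempt throws.
function runWithFallbackSketch(candidates: Candidate[]): { result?: AttemptResult; decisions: Decision[] } {
  const decisions: Decision[] = [];
  for (const candidate of candidates) {
    try {
      const result = candidate.run();
      // Any returned value counts as success, even one carrying an error payload.
      decisions.push({ model: candidate.model, decision: "candidate_succeeded" });
      return { result, decisions };
    } catch {
      decisions.push({ model: candidate.model, decision: "candidate_failed" });
    }
  }
  return { decisions };
}

// Inner runner: throws only when the error is classified as a failover signal.
function makeRunner(model: string, error: string | null, classify: Classifier): () => AttemptResult {
  return () => {
    if (error === null) return { model, isError: false, text: "ok" };
    if (classify(error) !== null) throw new Error(error);
    return { model, isError: true, text: error }; // unclassified: returned, not thrown
  };
}

// Broken first fallback followed by a working second fallback.
function simulate(classify: Classifier) {
  return runWithFallbackSketch([
    { model: "gemini-2.5-pro", run: makeRunner("gemini-2.5-pro", "404 status code (no body)", classify) },
    { model: "qwen3:8b", run: makeRunner("qwen3:8b", null, classify) },
  ]);
}
```

With a classifier that returns null for the bare 404 text, `simulate` stops at the first candidate and hands back its error payload. With a classifier that maps the 404 to `model_not_found`, the first candidate throws and the second one is reached.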

Change Type

  • Bug fix

Scope

  • Gateway / orchestration (agent model fallback path)

Linked Issue/PR

  • Closes #62119
  • This PR fixes a bug or regression

Root Cause

  • Root cause: classifyFailoverClassificationFromHttpStatus had no branch for HTTP 404, and bare 404 text without not found wording did not match isModelNotFoundErrorMessage, so classifyFailoverReason returned null.
  • Missing detection / guardrail: HTTP status 404 as a first-class failover signal alongside other status codes.
  • Contributing context: Reported in #62119 with logs showing embedded_run_agent_end with isError=true alongside candidate_succeeded.
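The root cause can be made concrete with a small classifier sketch. Everything here is an illustrative assumption rather than the contents of errors.ts: the function names carry a `Sketch` suffix to keep them distinct from the real helpers, the regular expressions are invented for the example, and the only reason label taken from the PR is `model_not_found`. The `treat404AsFailover` flag exists only to contrast the behavior before and after the fix.

```typescript
// Hypothetical sketch; regexes and the "auth" / "billing" / "session" labels are assumptions.
type FailoverReason = "model_not_found" | "auth" | "billing" | "session";

// Message-based signals are checked first so they still win when the status is 404.
function classifyByMessageSketch(message: string): FailoverReason | null {
  const text = message.toLowerCase();
  if (/session (expired|not found)/.test(text)) return "session";
  if (/unauthorized|invalid api key/.test(text)) return "auth";
  if (/billing|quota|payment required/.test(text)) return "billing";
  if (/model.*not found|not found.*model/.test(text)) return "model_not_found";
  return null;
}

// Extract a leading HTTP status such as 404 from "404 status code (no body)".
function leadingStatusSketch(message: string): number | null {
  const match = /^\s*(\d{3})\b/.exec(message);
  return match ? Number(match[1]) : null;
}

function classifyFailoverReasonSketch(message: string, treat404AsFailover: boolean): FailoverReason | null {
  const byMessage = classifyByMessageSketch(message);
  if (byMessage !== null) return byMessage;
  // Without this branch, a bare 404 with no "not found" wording falls through to null.
  if (treat404AsFailover && leadingStatusSketch(message) === 404) return "model_not_found";
  return null;
}
```

Calling `classifyFailoverReasonSketch("404 status code (no body)", false)` returns null, which mirrors the reported gap; passing `true` returns `model_not_found`, while a message such as "404 session not found" keeps its message-based classification.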

Regression Test Plan

  • Coverage: Unit test
  • Target test file: src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts
  • Scenario: classifyFailoverReason("404 status code (no body)") → model_not_found; 404 + session/auth/billing strings still follow message classification.

User-visible / Behavior Changes

  • When a fallback model returns a minimal HTTP 404 error, OpenClaw can continue to the next configured model in the fallback chain instead of stopping after the first fallback candidate.

Diagram

N/A

Security Impact

  • New permissions/capabilities? No

Testing

  • pnpm test src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (pass)
  • Full pnpm check was not green in this workspace due to missing acpx/dist/runtime.js in extensions/acpx (pre-existing / local env). Targeted oxfmt + oxlint on touched files passed.

Made with Cursor

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +19/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +15/-4)

Code Example

embedded_run_agent_end: isError=true, error="404 status code (no body)", model="gemini-2.5-pro"
model_fallback_decision: decision="candidate_succeeded", candidateModel="gemini-2.5-pro"

Steps to Reproduce

  1. Configure agents.defaults.model.primary as anthropic/claude-sonnet-4-6
  2. Add a broken model as first fallback (e.g. google/gemini-2.5-pro when it returns 404/503)
  3. Add a working model as second fallback (e.g. ollama/qwen3:8b)
  4. Trigger a request that causes the primary to time out
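The reproduction setup can be written down as a typed object to make the intended attempt order explicit. Only the `agents.defaults.model.primary` path comes from the issue; the `fallbacks` key name, the `ModelConfigSketch` type, and the `attemptOrder` helper are assumptions made for this sketch and may not match OpenClaw's actual configuration schema.

```typescript
// Hypothetical shape: only `agents.defaults.model.primary` appears in the issue.
// The `fallbacks` key name is an assumption for illustration.
interface ModelConfigSketch {
  agents: { defaults: { model: { primary: string; fallbacks: string[] } } };
}

const reproConfig: ModelConfigSketch = {
  agents: {
    defaults: {
      model: {
        primary: "anthropic/claude-sonnet-4-6",
        fallbacks: [
          "google/gemini-2.5-pro", // broken first fallback (returns 404 in the report)
          "ollama/qwen3:8b", // working second fallback that should be reached
        ],
      },
    },
  },
};

// Order in which candidates would be attempted once the primary times out.
function attemptOrder(config: ModelConfigSketch): string[] {
  const { primary, fallbacks } = config.agents.defaults.model;
  return [primary, ...fallbacks];
}
```

In the reported behavior the chain stops at the second entry of this order, so the third entry is never tried.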

Observed Behavior

From logs:

embedded_run_agent_end: isError=true, error="404 status code (no body)", model="gemini-2.5-pro"
model_fallback_decision: decision="candidate_succeeded", candidateModel="gemini-2.5-pro"

The fallback candidate is marked candidate_succeeded even though agent_end fired with isError=true and a 404. The second fallback (e.g. Qwen3) is never attempted. The 404 error surfaces to the user.
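The contradiction in these two log lines can be detected mechanically. The sketch below assumes the `event: key=value, key="quoted value"` layout seen in the excerpt above; `parseLogLine` and `findMisreportedSuccesses` are hypothetical helper names, not part of OpenClaw.

```typescript
type LogEvent = { event: string; fields: Record<string, string> };

// Parse one line of the form `event: key=value, key="quoted value"`.
function parseLogLine(line: string): LogEvent | null {
  const separator = line.indexOf(":");
  if (separator === -1) return null;
  const event = line.slice(0, separator).trim();
  const rest = line.slice(separator + 1);
  const fields: Record<string, string> = {};
  const pairPattern = /(\w+)=(?:"([^"]*)"|([^,\s]+))/g;
  let match: RegExpExecArray | null;
  while ((match = pairPattern.exec(rest)) !== null) {
    // Group 2 holds a quoted value, group 3 an unquoted one.
    fields[match[1]] = match[2] !== undefined ? match[2] : match[3];
  }
  return { event, fields };
}

// Return models whose run ended with isError=true but were later marked candidate_succeeded.
function findMisreportedSuccesses(lines: string[]): string[] {
  const errored: Record<string, boolean> = {};
  const flagged: string[] = [];
  for (const line of lines) {
    const parsed = parseLogLine(line);
    if (parsed === null) continue;
    const { event, fields } = parsed;
    const model = fields["model"];
    const candidate = fields["candidateModel"];
    if (event === "embedded_run_agent_end" && fields["isError"] === "true" && model !== undefined) {
      errored[model] = true;
    } else if (
      event === "model_fallback_decision" &&
      fields["decision"] === "candidate_succeeded" &&
      candidate !== undefined &&
      errored[candidate] === true
    ) {
      flagged.push(candidate);
    }
  }
  return flagged;
}
```

Running `findMisreportedSuccesses` over the two lines quoted above flags `gemini-2.5-pro`, which is the pairing that identifies this bug in a log file.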

Expected Behavior

A 404 (or any non-200 error) from a fallback model should be treated as a failure, and the chain should continue to the next configured fallback.

Environment

  • OpenClaw version: 2026.4.5
  • Primary: anthropic/claude-sonnet-4-6
  • Fallbacks: google/gemini-2.5-pro, google/gemini-2.5-flash, ollama/qwen3:8b
  • Observed on both google/gemini-2.5-pro (404) and openai/o4-mini (404) as first fallback

extent analysis

TL;DR

A bare HTTP 404 from a fallback model was not classified as a failover signal, so the fallback chain stopped at that candidate and reported candidate_succeeded. PR #62244 maps HTTP 404 to model_not_found so the chain continues to the next configured fallback.

Guidance

  • Check the logs for an embedded_run_agent_end entry with isError=true followed by a model_fallback_decision entry with decision="candidate_succeeded" for the same model; that pairing identifies this bug.
  • Confirm whether the running build includes PR #62244; the issue was reported against version 2026.4.5.
  • Neither the issue nor the PR describes a configuration setting that changes this behavior; the fix is in error classification (src/agents/pi-embedded-helpers/errors.ts).
  • Until the fix is available, a fallback that returns a bare 404 prevents every model listed after it from being tried, so the order of the fallback list matters.

Example

The log pair under Code Example shows the symptom: the agent run ends with isError=true and a 404, yet the fallback decision for the same model is candidate_succeeded.

Notes

The issue was reported against OpenClaw 2026.4.5 and observed with both google/gemini-2.5-pro and openai/o4-mini as the first fallback, each returning 404. PR #62244 changes only the classification of HTTP 404; other status codes keep their existing classification, and the embedded runner loop and runWithModelFallback are otherwise unchanged.

Recommendation

Update to a build that includes PR #62244. If that is not yet possible, a likely workaround is to remove the model that returns 404 from the fallback list, or to list a known-working model ahead of it, so that the chain does not stop at the broken candidate.
