openclaw - Model fallback cascade takes 30s per invalid candidate instead of failing fast (v2026.4.24–.25) [1 comment, 2 participants]

openclaw • 2026-05-02 17:01:01

openclaw/openclaw#76165•Fetched 2026-05-03 04:41:32

Author

vdruts

Participants

clawsweeper[bot]

vdruts

Timeline (top)

commented ×1, cross-referenced ×1, unsubscribed ×1

Split from #75687 (closed prematurely — only Bugs 1–2 were addressed on main). This is Bug 4 from that report.

Non-retryable 400 errors (e.g. invalid model ID, unsupported parameter) take ~30 seconds each before the fallback fires. With a 3-entry fallback chain, a single user message can stall for 90+ seconds before reaching a working model.

In v2026.4.23, the same fallback chain failed over in seconds. The 30s delay in .24/.25 suggests an internal retry or timeout is wrapping deterministic, non-retryable errors.

Example scenario

Chain: groq/model-a (400 invalid) → openrouter/model-b (400 invalid) → anthropic/claude-sonnet

v2026.4.23: ~2s total to reach claude-sonnet
v2026.4.24+: ~60-90s total to reach claude-sonnet (30s × 2 invalid candidates)
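
The arithmetic behind these two figures can be sketched on a virtual clock, with no real waiting. This is a minimal model rather than OpenClaw code: the `Candidate` shape, the `timeToFirstSuccess` helper, and the per-provider response times (200 ms and 1500 ms) are assumptions chosen only to land near the reported totals. The `extraDelayMs` parameter stands in for whatever wraps a failed attempt (a retry budget or timeout), which the issue has not identified.

```typescript
// Hypothetical model of a fallback chain; none of these names come from OpenClaw.
interface Candidate {
  id: string;
  status: number;     // HTTP status the provider is assumed to return
  responseMs: number; // assumed time the provider takes to answer
}

interface Outcome {
  model: string | null; // null when every candidate failed
  elapsedMs: number;
}

// extraDelayMs is the cost paid after each failed attempt before the fallback fires.
function timeToFirstSuccess(chain: Candidate[], extraDelayMs: number): Outcome {
  let elapsedMs = 0;
  for (const candidate of chain) {
    elapsedMs += candidate.responseMs;
    if (candidate.status >= 200 && candidate.status < 300) {
      return { model: candidate.id, elapsedMs };
    }
    elapsedMs += extraDelayMs;
  }
  return { model: null, elapsedMs };
}

const chain: Candidate[] = [
  { id: "groq/model-a", status: 400, responseMs: 200 },
  { id: "openrouter/model-b", status: 400, responseMs: 200 },
  { id: "anthropic/claude-sonnet", status: 200, responseMs: 1500 },
];

// Fail-fast behavior: 200 + 200 + 1500 = 1900 ms.
const failFast = timeToFirstSuccess(chain, 0);

// 30 s paid per invalid candidate: 200 + 30000 + 200 + 30000 + 1500 = 61900 ms.
const regressed = timeToFirstSuccess(chain, 30000);
```

Under these assumptions the chain reaches `anthropic/claude-sonnet` in both cases; only the time spent on the two invalid candidates differs.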

This is especially painful combined with event loop pressure — the 30s blocking compounds with other CPU-bound work.

Environment

openclaw: v2026.4.24, v2026.4.25 (regressed) — v2026.4.23 (fast failover)
node: 22.21.1
os: Windows 11 Pro N (10.0.26200)
platform: win32
bots: 8 Telegram accounts

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Steps to reproduce

Configure a model fallback chain where the first 1-2 entries return 400 (e.g. invalid model ID)
Send a message to any agent
Observe ~30s delay per invalid model before fallback fires
Compare with v2026.4.23 where the same chain fails over in seconds
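
When following the steps above, it can help to record when each attempt starts and ends and then flag the slow ones. The sketch below assumes you can extract such records from your own logs. The `AttemptRecord` shape, the 5-second threshold, and the sample timings are illustrative and not taken from OpenClaw or from real log output. Treating every 4xx except 408 and 429 as deterministic is also an assumption.

```typescript
// Hypothetical record of one attempt against one candidate in the chain.
interface AttemptRecord {
  model: string;
  status: number;
  startedAtMs: number;
  endedAtMs: number; // moment the fallback moved on (or the call succeeded)
}

// Assumption: 4xx responses other than 408 and 429 will fail the same way on retry.
function isDeterministicFailure(status: number): boolean {
  return status >= 400 && status < 500 && status !== 408 && status !== 429;
}

// Returns one line per deterministic failure that held the chain longer than thresholdMs.
function findSlowFailovers(attempts: AttemptRecord[], thresholdMs: number = 5000): string[] {
  return attempts
    .filter((a) => isDeterministicFailure(a.status) && a.endedAtMs - a.startedAtMs > thresholdMs)
    .map((a) => `${a.model}: HTTP ${a.status} took ${a.endedAtMs - a.startedAtMs} ms before fallback`);
}

// Made-up timings shaped like the report, for illustration only.
const observed: AttemptRecord[] = [
  { model: "groq/model-a", status: 400, startedAtMs: 0, endedAtMs: 30200 },
  { model: "openrouter/model-b", status: 400, startedAtMs: 30200, endedAtMs: 60400 },
  { model: "anthropic/claude-sonnet", status: 200, startedAtMs: 60400, endedAtMs: 61900 },
];

const slowFailovers = findSlowFailovers(observed);
```

With the sample data, `slowFailovers` holds two entries, one for each invalid candidate. The same check run against v2026.4.23 timings would be expected to return an empty list.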

Expected behavior

Non-retryable errors (HTTP 400, 404, invalid model) should fail over to the next candidate immediately — no retry, no timeout. These are deterministic failures.
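
The expected behavior can be sketched as a fallback loop that retries only transient statuses and moves to the next candidate as soon as it sees a deterministic one. This is a minimal sketch, not OpenClaw's implementation: `runWithFallback`, `isRetryable`, the injected `attempt` and `sleep` functions, and the retry defaults are all hypothetical. Treating 408, 429, and 5xx as the retryable set is an assumption as well.

```typescript
interface ProviderResponse {
  status: number;
  body: string;
}
type Attempt = (model: string) => Promise<ProviderResponse>;
type Sleep = (ms: number) => Promise<void>;

// Assumption: only timeouts, rate limits, and server errors are worth retrying.
function isRetryable(status: number): boolean {
  return status === 408 || status === 429 || status >= 500;
}

async function runWithFallback(
  chain: string[],
  attempt: Attempt,
  sleep: Sleep,
  maxRetries: number = 2,
  baseBackoffMs: number = 1000,
): Promise<{ model: string; body: string }> {
  for (const model of chain) {
    for (let tries = 0; tries <= maxRetries; tries++) {
      const res = await attempt(model);
      if (res.status >= 200 && res.status < 300) {
        return { model, body: res.body };
      }
      if (!isRetryable(res.status)) {
        break; // deterministic failure: go to the next candidate without waiting
      }
      if (tries < maxRetries) {
        await sleep(baseBackoffMs * Math.pow(2, tries)); // exponential backoff
      }
    }
  }
  throw new Error("all candidates in the fallback chain failed");
}

// Usage with stubbed providers: the first two return 400, the third succeeds.
const sleptMs: number[] = [];
const attempted: string[] = [];
const stubAttempt: Attempt = async (model) => {
  attempted.push(model);
  return model === "anthropic/claude-sonnet"
    ? { status: 200, body: "ok" }
    : { status: 400, body: "invalid model ID" };
};
const recordSleep: Sleep = async (ms) => {
  sleptMs.push(ms);
};

const pendingResult = runWithFallback(
  ["groq/model-a", "openrouter/model-b", "anthropic/claude-sonnet"],
  stubAttempt,
  recordSleep,
);
```

Injecting `sleep` keeps the sketch testable without real delays: in the stubbed run each candidate is attempted once and `recordSleep` is never called, which is the fail-fast property the report asks for.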

Actual behavior

Each invalid candidate blocks for ~30 seconds before the fallback fires. A 3-entry chain with 2 invalid models stalls for 60-90 seconds.

OpenClaw version

v2026.4.24, v2026.4.25

Operating system

Windows

Related issues

Parent: #75687 (closed — only startup fanout and Bonjour crash were fixed)
Related: #75707 (broader event loop saturation — still open)

extent analysis

TL;DR

The most likely fix is to stop retry or timeout logic from wrapping non-retryable errors, so that the fallback chain moves to the next candidate as soon as one returns a deterministic failure such as HTTP 400.

Guidance

Diff v2026.4.23 against v2026.4.24 to find the change that introduced the ~30-second delay on non-retryable errors.
Review the configuration of the model fallback chain to confirm that it is set up to fail over immediately on deterministic failures such as HTTP 400 or an invalid model ID.
If the default retry or timeout behavior cannot be configured, consider custom error handling that classifies errors and bypasses that behavior for non-retryable ones.
Test the fallback chain with different error scenarios to verify that the fix works as expected.
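
The last two points can be combined into a small decision function checked against a table of error scenarios. This is a sketch under stated assumptions: the `ProviderFailure` shape and the `decide` function are hypothetical, since the issue does not show OpenClaw's real error types. The rule that failures without a status (for example a dropped connection) are transient is also an assumption.

```typescript
// Hypothetical failure shape; OpenClaw's actual error objects are not shown in the issue.
interface ProviderFailure {
  message: string;
  status?: number; // absent for network-level failures
}

type Decision = "fallback-now" | "retry-same-candidate";

function decide(failure: ProviderFailure): Decision {
  if (failure.status === undefined) {
    // Assumption: no HTTP status means a network problem, which may be transient.
    return "retry-same-candidate";
  }
  const transient = failure.status === 408 || failure.status === 429 || failure.status >= 500;
  return transient ? "retry-same-candidate" : "fallback-now";
}

// Scenario table: each failure paired with the decision we expect for it.
const scenarios: Array<[ProviderFailure, Decision]> = [
  [{ message: "invalid model ID", status: 400 }, "fallback-now"],
  [{ message: "unsupported parameter", status: 400 }, "fallback-now"],
  [{ message: "model not found", status: 404 }, "fallback-now"],
  [{ message: "rate limited", status: 429 }, "retry-same-candidate"],
  [{ message: "upstream unavailable", status: 503 }, "retry-same-candidate"],
  [{ message: "socket hang up" }, "retry-same-candidate"],
];

// Any entry left here is a scenario where the decision differs from the expectation.
const mismatches = scenarios.filter(([failure, expected]) => decide(failure) !== expected);
```

Keeping the decision pure (no I/O, no timers) makes it cheap to extend the table whenever a provider returns a status that was not anticipated.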

Example

The issue does not include the specific code changes or configuration involved, so no verified snippet of the actual fix can be given.

Notes

The issue is reported against OpenClaw v2026.4.24 and v2026.4.25; v2026.4.23 failed over in seconds. That points to a change in error handling or retry logic between .23 and .24, but the root cause has not been confirmed and needs further investigation.

Recommendation

Until the root cause is identified, work around the delay by adjusting any timeout or retry settings that apply to the model fallback chain, if your configuration exposes them. Because the report states that v2026.4.23 fails over in seconds, staying on that version may be another option.
