openclaw: Model fallback retries primary too aggressively before moving to next fallback

GitHub issue openclaw/openclaw#57906 (fetched 2026-04-08 01:56:15) · 1 comment · 2 participants

Problem

When the primary model (e.g., Opus) returns overloaded_error, the gateway keeps retrying it for 30-60 seconds before attempting the first fallback (Sonnet). If Sonnet is also overloaded, GPT-5.4, the last model in the chain, is rarely reached at all.

Expected behavior

After 1-2 failed attempts on the primary (within ~5 seconds), move to the first fallback. If that also fails after 1-2 attempts, move to the next one. The full fallback chain should be exhausted within 10-15 seconds, not 60+.

Environment

  • OpenClaw 2026.3.28
  • Fallback config: anthropic/claude-opus-4-6 → anthropic/claude-sonnet-4-6 → openai-codex/gpt-5.4

Suggestion

A configurable model.fallbackTimeoutMs or model.maxRetriesBeforeFallback would help operators tune this.

Extended analysis

Fix Plan

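To address the issue, introduce two configurable options, model.fallbackTimeoutMs and model.maxRetriesBeforeFallback, that bound how long the gateway stays on one model before falling back. The option names come from the issue's suggestion; they are proposals, not existing OpenClaw settings.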

Configuration Changes

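  • Add the following configuration options (whichever limit is reached first triggers the fallback):
    • model.fallbackTimeoutMs: time budget, in milliseconds, for attempts on a single model; once it is used up, the gateway falls back to the next model
    • model.maxRetriesBeforeFallback: maximum number of failed attempts on a single model before falling back to the next model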
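To see how the two limits bound the total time, here is a rough worst-case estimate for a chain in which every model is overloaded. It is a sketch under stated assumptions: the per-attempt latency (1 s) and the backoff between retries (0.5 s) are hypothetical values, not measured OpenClaw behavior, and the helper names are illustrative only.

```python
def worst_case_ms_per_model(max_retries, budget_ms, attempt_ms, backoff_ms):
    """Time spent on one overloaded model before moving on.

    Mirrors the proposed rule: stop after max_retries failed attempts, or as
    soon as a failed attempt ends with the elapsed time at or past budget_ms.
    """
    elapsed = 0
    for attempt in range(1, max_retries + 1):
        elapsed += attempt_ms                 # the failed attempt itself
        if attempt == max_retries or elapsed >= budget_ms:
            return elapsed                    # fall back now
        elapsed += backoff_ms                 # wait before the next retry
    return elapsed


def worst_case_chain_ms(num_models, **limits):
    # Every model in the chain is overloaded, so each uses its full share.
    return num_models * worst_case_ms_per_model(**limits)


# Hypothetical latencies: 1 s per overloaded response, 0.5 s backoff.
print(worst_case_chain_ms(3, max_retries=2, budget_ms=5000,
                          attempt_ms=1000, backoff_ms=500))
```

With 2 retries per model, the three-model chain from the issue is exhausted in 7500 ms under these assumptions, inside the 10-15 second target. Because the budget is only checked after a failed attempt, a single model can overrun fallbackTimeoutMs by less than one attempt's latency; a per-request timeout would be needed to cap that as well.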

Code Changes

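Illustrative Python sketch of the intended logic (not OpenClaw's actual gateway code):

```python
import time

# Proposed options (hypothetical; not existing OpenClaw settings)
model_config = {
    'fallbackTimeoutMs': 5000,      # time budget per model: 5 seconds
    'maxRetriesBeforeFallback': 2,  # failed attempts per model before moving on
}

# Fallback chain from the issue, in order of preference
FALLBACK_CHAIN = [
    'anthropic/claude-opus-4-6',
    'anthropic/claude-sonnet-4-6',
    'openai-codex/gpt-5.4',
]


class OverloadedError(Exception):
    """Stand-in for the provider's overloaded_error."""


def should_fall_back(retry_count, elapsed_ms):
    # Fall back once either the retry limit or the time budget is reached
    return (retry_count >= model_config['maxRetriesBeforeFallback']
            or elapsed_ms >= model_config['fallbackTimeoutMs'])


def call_with_fallback(use_model, chain=FALLBACK_CHAIN):
    # use_model(model) performs one request and raises OverloadedError on overload
    for current_model in chain:
        retry_count = 0                  # reset the counters for every model
        started = time.monotonic()       # measure real elapsed time
        while True:
            try:
                return use_model(current_model)
            except OverloadedError:
                retry_count += 1
                elapsed_ms = (time.monotonic() - started) * 1000
                if should_fall_back(retry_count, elapsed_ms):
                    break                # try the next model in the chain
    raise OverloadedError('every model in the fallback chain is overloaded')
```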

Verification

To verify the fix, test the fallback behavior with the issue's chain (Opus → Sonnet → GPT-5.4) in these scenarios:

  • Opus overloaded: verify that Sonnet is attempted within ~5 seconds
  • Opus and Sonnet overloaded: verify that GPT-5.4 is attempted within ~10 seconds
  • All three overloaded: verify that the chain is exhausted and an error is returned within ~15 seconds
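These scenarios can be checked without real providers by driving the fallback loop with a simulated clock. The sketch below is self-contained and hypothetical: call_with_fallback, FakeClock, make_send and CHAIN are illustrative names, and each overloaded response is assumed to cost 1 simulated second (the same assumption the original loop made with its 1-second timeout step).

```python
import time

CHAIN = [
    "anthropic/claude-opus-4-6",
    "anthropic/claude-sonnet-4-6",
    "openai-codex/gpt-5.4",
]


class OverloadedError(Exception):
    """Stand-in for a provider's overloaded_error response."""


def call_with_fallback(send, chain, max_retries=2, budget_ms=5000,
                       clock=time.monotonic):
    """Hypothetical fallback loop implementing the proposed rule."""
    for model in chain:
        started, retries = clock(), 0
        while True:
            try:
                return send(model)
            except OverloadedError:
                retries += 1
                elapsed_ms = (clock() - started) * 1000
                if retries >= max_retries or elapsed_ms >= budget_ms:
                    break  # move on to the next model
    raise OverloadedError("fallback chain exhausted")


class FakeClock:
    """Simulated clock (in seconds) so the test runs instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


def make_send(clock, overloaded, attempt_s=1.0):
    """Fake provider call that costs attempt_s simulated seconds."""
    first_attempt = {}  # model -> simulated time of its first attempt

    def send(model):
        first_attempt.setdefault(model, clock.now)
        clock.now += attempt_s
        if model in overloaded:
            raise OverloadedError(model)
        return f"ok from {model}"

    return send, first_attempt


# Scenario 2: Opus and Sonnet overloaded, GPT-5.4 available.
clock = FakeClock()
send, first_attempt = make_send(clock, overloaded=set(CHAIN[:2]))
print(call_with_fallback(send, CHAIN, clock=clock))
print(first_attempt)
```

In this simulation Sonnet is first tried at t = 2 s and GPT-5.4 at t = 4 s, well inside the 5 s and 10 s targets. Injecting the clock keeps the check deterministic; a test against a real gateway would instead measure wall-clock time between the logged attempts.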

Extra Tips

  • Monitor fallback behavior in production and tune model.fallbackTimeoutMs and model.maxRetriesBeforeFallback: lower values reach the fallbacks sooner, while higher values give a briefly overloaded primary more chances to recover.
  • Log each fallback transition (source model, target model, attempt count, elapsed time) and expose the counts as metrics, so slow or frequent fallbacks are easy to spot.
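A minimal sketch of that logging and metrics idea, using only Python's standard logging module and a Counter. The record_fallback helper and the in-process fallback_counts metric are hypothetical; a real deployment would export the counts to its own metrics system.

```python
import logging
from collections import Counter

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("model-fallback")

# Hypothetical in-process metric: (from_model, to_model) -> number of fallbacks
fallback_counts = Counter()


def record_fallback(from_model, to_model, retries, elapsed_ms):
    """Log one fallback transition and count it per (from, to) pair."""
    fallback_counts[(from_model, to_model)] += 1
    logger.warning(
        "fallback %s -> %s after %d failed attempt(s) in %.0f ms",
        from_model, to_model, retries, elapsed_ms,
    )


record_fallback("anthropic/claude-opus-4-6", "anthropic/claude-sonnet-4-6",
                2, 2400.0)
```

The elapsed time per transition shows directly whether the ~5-second-per-model target is being met, and the per-pair counts show how often the primary is being skipped.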
