openclaw - Feature: Configurable overload retry count and circuit breaker for model failover

GitHub stats
openclaw/openclaw#59253 · Fetched 2026-04-08 02:26:52
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Problem

When a primary model (e.g., Anthropic Claude Opus) returns overloaded_error, OpenClaw retries 4+ times with exponential backoff (250ms → 500ms → 1000ms → 1500ms) before triggering failover to the next model in the fallback chain. This adds ~30-40 seconds of wasted latency per request even when the fallback model (e.g., GPT-5.4) is healthy and ready.

Worse, there is no cross-request circuit breaker: each new request independently rediscovers that the primary is down and pays the same retry tax every time. During a sustained outage (observed: 2+ hours on 2026-03-31), this makes the experience feel broken even though the fallback model works fine once it finally receives requests.

Observed behavior (from gateway logs)

  • 35 Opus overloaded errors over ~2 hours
  • Each request retried Opus 4 times (~37s) before falling back to GPT-5.4
  • GPT-5.4 succeeded 8/8 times when it actually received requests
  • The retry delay was the entire source of user-perceived failure

Root cause (from source)

The retry policy is hardcoded:

const OVERLOAD_FAILOVER_BACKOFF_POLICY = {
  initialMs: 250, maxMs: 1500, factor: 2, jitter: 0.2
};

The run loop retries up to 24 + (profiles × 8) iterations (min 32) before giving up. There is no configurable knob for overload-specific retry count.
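
To make the numbers above concrete, here is a minimal sketch that expands the hardcoded policy into its delay schedule. The policy values are copied from the issue; the helper names `baseDelayMs` and `maxRunLoopIterations` are hypothetical, jitter is ignored because the issue does not show how it is applied, and the iteration cap is only one reading of "24 + (profiles × 8) iterations (min 32)".

```typescript
// Policy values copied from the issue. Everything else here is illustrative.
const OVERLOAD_FAILOVER_BACKOFF_POLICY = {
  initialMs: 250, maxMs: 1500, factor: 2, jitter: 0.2,
};

// Base delay before retry number `retry` (1-based), ignoring jitter.
function baseDelayMs(retry: number): number {
  const p = OVERLOAD_FAILOVER_BACKOFF_POLICY;
  return Math.min(p.maxMs, p.initialMs * Math.pow(p.factor, retry - 1));
}

// Assumed reading of the run-loop cap: 24 + profiles * 8, but never below 32.
function maxRunLoopIterations(profiles: number): number {
  return Math.max(32, 24 + profiles * 8);
}

const delays = [1, 2, 3, 4].map(baseDelayMs);
const totalMs = delays.reduce((sum, d) => sum + d, 0);
console.log(delays, totalMs); // [ 250, 500, 1000, 1500 ] 3250
```

Under these assumptions the backoff waits for four retries add up to 3250 ms before jitter. The issue does not break down where the rest of the observed ~37 s per request is spent.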

Proposed solution

1. Configurable overload retry count (quick win)

Add a config option to control how many overload retries occur before triggering model failover:

{
  "agents": {
    "defaults": {
      "model": {
        "overloadMaxRetries": 2
      }
    }
  }
}

The default could stay at 4 for backward compatibility, but letting users set it to 1 or 2 would substantially reduce latency during outages.
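
One way the retry limit could gate failover is sketched below. This is not OpenClaw's actual run loop: the `OverloadAction` type and `nextOverloadAction` function are hypothetical, jitter is omitted, and the sketch assumes that `overloadMaxRetries: N` means at most N retries (N + 1 attempts) on the primary before failing over.

```typescript
// Hypothetical types and names; only `overloadMaxRetries` comes from the proposal.
type OverloadAction =
  | { kind: "retry"; delayMs: number }
  | { kind: "failover" };

// Backoff values copied from the issue's hardcoded policy (jitter omitted).
const BACKOFF = { initialMs: 250, maxMs: 1500, factor: 2 };

// `overloadFailures` is the number of overloaded_error responses seen so far
// for this request on the current model, including the one just received.
function nextOverloadAction(
  overloadFailures: number,
  overloadMaxRetries: number,
): OverloadAction {
  // Assumption: once the retries are used up, move to the next model.
  if (overloadFailures > overloadMaxRetries) {
    return { kind: "failover" };
  }
  const delayMs = Math.min(
    BACKOFF.maxMs,
    BACKOFF.initialMs * Math.pow(BACKOFF.factor, overloadFailures - 1),
  );
  return { kind: "retry", delayMs };
}

console.log(nextOverloadAction(1, 2)); // retry after 250 ms
console.log(nextOverloadAction(3, 2)); // failover
```

With `overloadMaxRetries` set to 0 in this sketch, the first overload error triggers failover immediately.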

2. Cross-request circuit breaker (bigger win)

After N overload failures within a time window, skip the primary model entirely and go straight to the fallback for a cooldown period:

{
  "agents": {
    "defaults": {
      "model": {
        "overloadCircuitBreakerFailures": 3,
        "overloadCircuitBreakerWindowMinutes": 5,
        "overloadCircuitBreakerCooldownMinutes": 10
      }
    }
  }
}

Meaning: "If 3 overload failures happen within 5 minutes, skip the primary for the next 10 minutes and go straight to fallback."

The auth.cooldowns config already exists for billing failures — this would be the overload equivalent.
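
The window and cooldown semantics described above can be sketched as a small in-memory breaker. The class `OverloadCircuitBreaker` and its methods are hypothetical and are not part of OpenClaw; only the three option names come from the proposal. Timestamps are passed in explicitly so the behavior is easy to check.

```typescript
// Option names come from the proposal; the class itself is a hypothetical sketch.
interface OverloadBreakerConfig {
  overloadCircuitBreakerFailures: number;
  overloadCircuitBreakerWindowMinutes: number;
  overloadCircuitBreakerCooldownMinutes: number;
}

const MINUTE_MS = 60000;

class OverloadCircuitBreaker {
  private readonly cfg: OverloadBreakerConfig;
  private failureTimes: number[] = [];
  private openUntilMs = 0;

  constructor(cfg: OverloadBreakerConfig) {
    this.cfg = cfg;
  }

  // Record one overload failure of the primary model at time `nowMs`.
  recordFailure(nowMs: number): void {
    const windowMs = this.cfg.overloadCircuitBreakerWindowMinutes * MINUTE_MS;
    // Keep only failures that still fall inside the sliding window.
    this.failureTimes = this.failureTimes.filter((t) => nowMs - t < windowMs);
    this.failureTimes.push(nowMs);
    if (this.failureTimes.length >= this.cfg.overloadCircuitBreakerFailures) {
      // Trip the breaker: skip the primary until the cooldown has elapsed.
      this.openUntilMs =
        nowMs + this.cfg.overloadCircuitBreakerCooldownMinutes * MINUTE_MS;
      this.failureTimes = [];
    }
  }

  // True while requests should go straight to the fallback model.
  shouldSkipPrimary(nowMs: number): boolean {
    return nowMs < this.openUntilMs;
  }
}

const breaker = new OverloadCircuitBreaker({
  overloadCircuitBreakerFailures: 3,
  overloadCircuitBreakerWindowMinutes: 5,
  overloadCircuitBreakerCooldownMinutes: 10,
});
breaker.recordFailure(0);
breaker.recordFailure(1 * MINUTE_MS);
console.log(breaker.shouldSkipPrimary(1 * MINUTE_MS)); // false (2 of 3 failures)
breaker.recordFailure(2 * MINUTE_MS);
console.log(breaker.shouldSkipPrimary(11 * MINUTE_MS)); // true (cooldown active)
console.log(breaker.shouldSkipPrimary(12 * MINUTE_MS)); // false (cooldown over)
```

Because the state lives outside any single request, every request arriving during the cooldown can skip the primary without paying the retry cost. A real implementation would also need to decide whether this state is kept per model, per provider, or per auth profile, which the issue leaves open.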

Impact

This would turn a 30-40 second latency penalty into a <1 second transparent failover during provider outages. For users with capable fallback models configured, outages would become nearly invisible.

extent analysis

TL;DR

Implement a configurable overload retry count and a cross-request circuit breaker to reduce latency during model outages.

Guidance

  • Introduce a configurable overloadMaxRetries option to control the number of retries before triggering model failover, allowing users to set it to 1-2 for reduced latency.
  • Implement a cross-request circuit breaker with configurable overloadCircuitBreakerFailures, overloadCircuitBreakerWindowMinutes, and overloadCircuitBreakerCooldownMinutes options to skip the primary model after a specified number of failures within a time window.
  • Consider defaulting overloadMaxRetries to 4 for backward compatibility while allowing users to adjust it.
  • Review the existing auth.cooldowns config for billing failures as a reference for implementing the overload circuit breaker.
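
The defaulting described in the guidance could look roughly like the sketch below. The interfaces and `resolveOverloadOptions` are hypothetical; the option names and the default of 4 come from the issue. Two assumptions are made here that the issue does not state: invalid values fall back to the default, and the circuit breaker stays disabled unless all three of its options are set to positive integers.

```typescript
// Hypothetical shapes for the proposed options under agents.defaults.model.
interface ModelOverloadOptions {
  overloadMaxRetries?: number;
  overloadCircuitBreakerFailures?: number;
  overloadCircuitBreakerWindowMinutes?: number;
  overloadCircuitBreakerCooldownMinutes?: number;
}

interface ResolvedOverloadOptions {
  overloadMaxRetries: number;
  circuitBreaker:
    | { failures: number; windowMinutes: number; cooldownMinutes: number }
    | null;
}

// Default taken from the issue's backward-compatibility suggestion.
const DEFAULT_OVERLOAD_MAX_RETRIES = 4;

function isPositiveInt(v: unknown): v is number {
  return typeof v === "number" && Number.isInteger(v) && v > 0;
}

function resolveOverloadOptions(raw: ModelOverloadOptions): ResolvedOverloadOptions {
  const retries = raw.overloadMaxRetries;
  // Assumption: 0 is allowed (fail over immediately); invalid values use the default.
  const overloadMaxRetries =
    typeof retries === "number" && Number.isInteger(retries) && retries >= 0
      ? retries
      : DEFAULT_OVERLOAD_MAX_RETRIES;

  const f = raw.overloadCircuitBreakerFailures;
  const w = raw.overloadCircuitBreakerWindowMinutes;
  const c = raw.overloadCircuitBreakerCooldownMinutes;
  // Assumption: the breaker is opt-in and needs all three options.
  const circuitBreaker =
    isPositiveInt(f) && isPositiveInt(w) && isPositiveInt(c)
      ? { failures: f, windowMinutes: w, cooldownMinutes: c }
      : null;

  return { overloadMaxRetries, circuitBreaker };
}

console.log(resolveOverloadOptions({}));
// { overloadMaxRetries: 4, circuitBreaker: null }
```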

Example

{
  "agents": {
    "defaults": {
      "model": {
        "overloadMaxRetries": 2,
        "overloadCircuitBreakerFailures": 3,
        "overloadCircuitBreakerWindowMinutes": 5,
        "overloadCircuitBreakerCooldownMinutes": 10
      }
    }
  }
}
