openclaw - ✅ (Solved) Fix: Failover ladder retries sibling Gemini models on the same billing account when the cap is account-scoped, multiplying RESOURCE_EXHAUSTED noise [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#80451 (fetched 2026-05-11 03:14:30)
Comments: 1 · Participants: 2 · Timeline: 2 (commented ×1, cross-referenced ×1) · Reactions: 2

When google/gemini-3.1-pro-preview returns 429 RESOURCE_EXHAUSTED due to the monthly billing-account spending cap, OpenClaw's failover ladder retries google/gemini-2.5-flash and google/gemini-2.5-pro even though all three Gemini models share the same Google billing account and therefore the same cap. Each sibling model's attempt produces additional RESOURCE_EXHAUSTED lines for a billing condition that is structurally guaranteed not to recover by switching siblings.

In a real-world cascade I observed across a 7-day window, this produced 15 RESOURCE_EXHAUSTED lines per failed run, repeated across 26 distinct runs over ~10 hours, for a total of ~390 cascade-amplified events from a single billing condition. The actual primary failover target — OpenAI — would have had a chance to handle the conversation if it weren't for a separate replay bug (filed separately).

Fix Action

Fixed

PR fix notes

PR #64127: feat: Provider circuit breaker for quota exhaustion

Description (problem / solution / changelog)

Resolves #64085

This PR introduces proper handling for daily/weekly/monthly quota exhaustion errors:

  1. Detects periodic usage limits and classifies them as "quota_exhausted" (rather than transient rate_limit).
  2. Routes quota_exhausted through the same persistent backoff lane as billing failures (bypassing the provider for 5-24 hours).
  3. Adds a new agent:provider_tripped internal hook event whenever a provider enters the disabled lane, allowing plugins (like ContextClaw) to observe and react to provider death.

Tested via local inspection; handles the Gemini 429 loops by correctly stepping back for the day.
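
As a rough illustration of the three steps in the PR description, the sketch below separates periodic quota exhaustion from transient rate limiting, parks the provider in a disabled lane, and notifies listeners. Only the `quota_exhausted` reason, the `agent:provider_tripped` event name, and the 5-24 hour range come from the description; `classifyFailure`, `ProviderBreaker`, the regular expression, and the 5-hour default are hypothetical names and values chosen for the example, not code from PR #64127.

```typescript
// Hypothetical sketch; not the implementation from PR #64127.
type FailoverReason = "rate_limit" | "quota_exhausted";

// Assumed heuristic: wording about a daily/weekly/monthly limit marks a periodic quota.
const PERIODIC_LIMIT = /\b(daily|weekly|monthly)\b.*\b(quota|limit|cap)\b/i;

function classifyFailure(message: string): FailoverReason {
  return PERIODIC_LIMIT.test(message) ? "quota_exhausted" : "rate_limit";
}

type TrippedEvent = { type: "agent:provider_tripped"; provider: string; until: number };
type TrippedListener = (event: TrippedEvent) => void;

class ProviderBreaker {
  private disabledUntil = new Map<string, number>();
  private listeners: TrippedListener[] = [];

  onTripped(listener: TrippedListener): void {
    this.listeners.push(listener);
  }

  // Only quota exhaustion trips the breaker; transient rate limits are ignored here.
  record(provider: string, reason: FailoverReason, now: number, backoffMs = 5 * 60 * 60 * 1000): void {
    if (reason !== "quota_exhausted") return;
    const until = now + backoffMs;
    this.disabledUntil.set(provider, until);
    for (const listener of this.listeners) {
      listener({ type: "agent:provider_tripped", provider, until });
    }
  }

  // True while the provider is still inside its backoff window.
  isDisabled(provider: string, now: number): boolean {
    const until = this.disabledUntil.get(provider);
    return until !== undefined && now < until;
  }
}
```

A caller would check `isDisabled(provider, Date.now())` before contacting a provider. Passing the clock in as a parameter keeps the sketch deterministic to test.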

Changed files

  • apps/macos/Sources/OpenClawProtocol/GatewayModels.swift (modified, +22/-0)
  • apps/shared/OpenClawKit/Sources/OpenClawProtocol/GatewayModels.swift (modified, +22/-0)
  • dist/protocol.schema.json (added, +12738/-0)
  • scripts/e2e/lib/doctor-install-switch/scenario.sh (modified, +1/-1)
  • src/agents/auth-profiles/state-observation.ts (modified, +18/-1)
  • src/agents/auth-profiles/types.ts (modified, +1/-0)
  • src/agents/auth-profiles/usage.test.ts (modified, +18/-0)
  • src/agents/auth-profiles/usage.ts (modified, +14/-2)
  • src/agents/failover-error.test.ts (modified, +9/-3)
  • src/agents/failover-error.ts (modified, +1/-0)
  • src/agents/failover-policy.ts (modified, +1/-0)
  • src/agents/model-fallback.probe.test.ts (modified, +55/-1)
  • src/agents/model-fallback.ts (modified, +13/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +7/-5)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +4/-1)
  • src/agents/pi-embedded-helpers/types.ts (modified, +1/-0)
  • src/agents/runtime-plan/types.ts (modified, +1/-0)
  • src/hooks/internal-hooks.ts (modified, +25/-0)


Reproduction

  1. Configure two or more Gemini models from the same billing account in the failover chain. Example:
    model_stack:
      - google/gemini-3.1-pro-preview
      - openai/gpt-5.4
      - google/gemini-2.5-flash
      - google/gemini-2.5-pro
  2. Trigger sustained traffic that pushes the Google billing account past its monthly spending cap.
  3. Observe failover ladder behavior in the gateway journal.

Observed

Embedded agent failed before reply: All models failed (4):
  google/gemini-3.1-pro-preview: 429 RESOURCE_EXHAUSTED [monthly spending cap] (rate_limit) |
  openai/gpt-5.4:                <separate format error, see related issue> |
  google/gemini-2.5-flash:       429 RESOURCE_EXHAUSTED [monthly spending cap] (rate_limit) |
  google/gemini-2.5-pro:         429 RESOURCE_EXHAUSTED [monthly spending cap] (rate_limit)

Failover decisions for all three Gemini siblings carry reason=rate_limit even though the underlying error is RESOURCE_EXHAUSTED with a "monthly spending cap" message. This misclassification is a separate issue worth tracking on its own: PR #74120 has in-flight billing classification work, but it targets LiteLLM-style budget errors and doesn't yet cover Google's native error shape.
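
To make the classification gap concrete, here is a minimal sketch of how a native Google cap error might be told apart from a transient rate limit. Everything in it is an assumption for illustration: the `ProviderError` shape, the `classifyGoogleError` name, and the marker phrases are hypothetical and are not OpenClaw's or PR #74120's actual logic. The only detail taken from the report is that the error is a 429 RESOURCE_EXHAUSTED mentioning a "monthly spending cap".

```typescript
// Hypothetical sketch; field names and marker phrases are assumptions.
type ErrorClass = "billing" | "rate_limit" | "unknown";

interface ProviderError {
  httpStatus: number;
  status?: string; // e.g. "RESOURCE_EXHAUSTED"
  message: string;
}

// Phrases assumed to indicate an account-level spending limit rather than a per-minute quota.
const BILLING_MARKERS = ["spending cap", "billing"];

function classifyGoogleError(err: ProviderError): ErrorClass {
  const exhausted = err.httpStatus === 429 || err.status === "RESOURCE_EXHAUSTED";
  if (!exhausted) return "unknown";
  const text = err.message.toLowerCase();
  // A 429 alone is ambiguous; the message decides between billing and rate limiting.
  return BILLING_MARKERS.some((marker) => text.includes(marker)) ? "billing" : "rate_limit";
}
```

Matching on message text is brittle, so a real implementation would probably prefer a structured error detail if the provider exposes one.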

Expected

The failover ladder should be aware of billing-account scope: when one model on a billing account hits a cap, sibling models on the same billing account should be considered "same failure class" and skipped without contacting the API. Possible shapes:

  • A billingAccount field on each provider entry (default = inferred from provider+account-id), with a "skip same-billing-account siblings on RESOURCE_EXHAUSTED / billing-class errors" policy.
  • A simpler heuristic: when a model returns a billing-class error (after PR #74120 ships native Google classification), set both a short-lived provider+account cooldown that suppresses sibling attempts for the rest of the run and a host-level cooldown of N minutes (overlaps PR #64127's circuit breaker work).
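
The first shape could look roughly like the sketch below, which walks the model stack and skips any entry whose billing account has already returned a billing-class error in the same run. This is an illustration under stated assumptions, not an OpenClaw API: `ModelEntry`, `billingAccountOf`, `runLadder`, and the outcome labels are hypothetical, and using the provider prefix as the default account is a simplification of "inferred from provider+account-id".

```typescript
// Hypothetical sketch of the proposed billingAccount policy; not an OpenClaw API.
interface ModelEntry {
  id: string;              // e.g. "google/gemini-2.5-flash"
  billingAccount?: string; // proposed field; inferred when omitted
}

type Outcome = "ok" | "billing" | "other_error";

interface Attempt {
  model: string;
  outcome: Outcome | "skipped_same_billing_account";
}

// Assumed default: the provider prefix stands in for the billing account.
function billingAccountOf(entry: ModelEntry): string {
  if (entry.billingAccount !== undefined) return entry.billingAccount;
  const slash = entry.id.indexOf("/");
  return slash === -1 ? entry.id : entry.id.slice(0, slash);
}

function runLadder(stack: ModelEntry[], call: (id: string) => Outcome): Attempt[] {
  const cappedAccounts = new Set<string>();
  const attempts: Attempt[] = [];
  for (const entry of stack) {
    const account = billingAccountOf(entry);
    if (cappedAccounts.has(account)) {
      // Same failure class as an earlier sibling: record the skip without calling the API.
      attempts.push({ model: entry.id, outcome: "skipped_same_billing_account" });
      continue;
    }
    const outcome = call(entry.id);
    attempts.push({ model: entry.id, outcome });
    if (outcome === "ok") break;
    if (outcome === "billing") cappedAccounts.add(account);
  }
  return attempts;
}
```

With the four-model stack from the reproduction, only the first Gemini model and the OpenAI model would be contacted; the two remaining Gemini siblings would be recorded as skipped. The second shape (a cooldown) differs mainly in that the capped-account set would outlive the run.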

Why filing

This is a defect-class bug: the failover ladder produces guaranteed-to-fail attempts whose only effects are journal noise and extra API hits (transient, until the cap kicks in). It's adjacent to but distinct from:

  • PR #64127 (provider circuit breaker for quota exhaustion) — host-level circuit breaker; would suppress runs but not the within-run sibling attempts.
  • PR #78086 (state-aware failover and lane suspension) — session-level suspension; same scope concern.
  • PR #74120 (classify budget-exceeded as billing) — fixes classification of LiteLLM proxy errors; doesn't address sibling scope.

None of these address the structural "siblings share billing account" insight, hence this issue.
