openclaw - 💡(How to fix) Fix Feature: Model retry with backoff before falling back to backup chain [1 participants]

GitHub stats
openclaw/openclaw#61691 (fetched 2026-04-08 02:55:49)
Comments: 0
Participants: 1
Timeline: 0
Reactions: 0


"Feature request: model-layer retry with backoff

Problem

When the primary model fails due to transient issues (network jitter, rate limiting, timeouts, server errors), OpenClaw immediately fails over to the next model in the fallback chain. This is abrupt and unnecessary for temporary blips that would recover after a short wait.

Desired behavior

Before switching to a fallback model, the system should automatically retry the primary model 1–2 times with a short backoff delay (e.g., 5–10 seconds). Only after all retries are exhausted should it proceed to the next model in the fallback chain.
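The behavior described above can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's implementation: `ModelError`, `call_with_retry`, and the `invoke` callback are hypothetical names, `max_attempts` is read as the number of retries after the first call, and the same policy is applied to every model in the chain (the request only mentions the primary model).

```python
import time
from typing import Callable, Optional, Sequence


class ModelError(Exception):
    """Hypothetical error type carrying a failure category such as 'timeout'."""

    def __init__(self, kind: str, message: str = "") -> None:
        super().__init__(message or kind)
        self.kind = kind


def call_with_retry(
    models: Sequence[str],
    invoke: Callable[[str], str],
    max_attempts: int = 2,
    backoff_ms: Sequence[int] = (5000, 10000),
    retry_on: Sequence[str] = ("network", "rate_limit", "timeout", "server_error"),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying transient failures before falling back."""
    last_error: Optional[Exception] = None
    for model in models:
        # One initial call plus up to max_attempts retries (assumed meaning).
        for attempt in range(max_attempts + 1):
            try:
                return invoke(model)
            except ModelError as err:
                last_error = err
                if err.kind not in retry_on or attempt == max_attempts:
                    break  # not transient, or retries exhausted: next model
                # Reuse the last delay if the list is shorter than max_attempts.
                delay = backoff_ms[min(attempt, len(backoff_ms) - 1)]
                sleep(delay / 1000)
    raise RuntimeError("all models in the chain failed") from last_error
```

Passing `sleep` as a parameter keeps the sketch testable without real delays. A non-retryable error (for example an authentication failure) skips the backoff and moves straight to the next model.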

Suggested config shape (conceptual)

defaults:
  model:
    primary: minimax-portal/MiniMax-M2.7-highspeed
    fallbacks:
      - gptclub-openai/gpt-5.4
    retry:
      maxAttempts: 2
      backoffMs: [5000, 10000]
      retryOn:
        - network
        - rate_limit
        - timeout
        - server_error

Trigger conditions (aligned with existing cron retry schema)

  • network
  • rate_limit
  • timeout
  • server_error
  • overloaded
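To decide whether a failure matches one of the trigger names above, the model layer needs some way to classify it. The sketch below shows one possible mapping. The function name `classify_failure` and the specific status-code choices (for example treating 503 as `overloaded`) are assumptions for illustration; the issue does not define them, and the cron retry schema it refers to may classify differently. Note that `overloaded` appears in the trigger list but not in the suggested `retryOn` config.

```python
from typing import Optional

# Trigger names taken from the list in the feature request.
RETRYABLE_KINDS = ("network", "rate_limit", "timeout", "server_error", "overloaded")


def classify_failure(
    error: Optional[BaseException] = None,
    status: Optional[int] = None,
) -> Optional[str]:
    """Map an HTTP status or exception to a trigger name, or None if not transient."""
    if status is not None:
        if status == 429:
            return "rate_limit"
        if status == 503:
            return "overloaded"  # assumption: 503 signals an overloaded provider
        if status in (408, 504):
            return "timeout"
        if 500 <= status <= 599:
            return "server_error"
        return None  # other statuses (e.g. 400, 401, 404) are not retried
    if isinstance(error, TimeoutError):
        return "timeout"
    if isinstance(error, ConnectionError):
        return "network"  # covers reset, refused, and aborted connections
    return None
```

Returning `None` for anything unrecognized keeps the default conservative: unknown failures fall through to the fallback chain immediately, as they do today.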

Why this matters

  • Reduces unnecessary fallback chain traversal for transient failures
  • Improves resilience on unreliable network conditions (e.g., mobile hotspot, remote VPN)
  • More graceful degradation under rate limiting
  • Aligned with the existing retry pattern already used in cron tasks

References

Existing retry patterns already exist in OpenClaw for cron tasks and channel plugins (Discord/Telegram). This request extends the same concept to the model layer."

extent analysis

TL;DR

Implement a retry mechanism with backoff in the model layer to handle transient issues before falling back to the next model.

Guidance

  • Introduce a retry configuration option in the model settings to specify the maximum number of attempts and backoff delays.
  • Define the trigger conditions for retry, such as network errors, rate limiting, timeouts, and server errors, and ensure they align with the existing cron retry schema.
  • Implement retry logic that waits for a short backoff delay before retrying the primary model, and proceeds to the next model in the fallback chain only after all retries are exhausted.
  • Consider reusing the existing retry patterns already implemented in OpenClaw for cron tasks and channel plugins as a reference.

Example

{
  "defaults": {
    "model": {
      "primary": "minimax-portal/MiniMax-M2.7-highspeed",
      "fallbacks": [
        "gptclub-openai/gpt-5.4"
      ],
      "retry": {
        "maxAttempts": 2,
        "backoffMs": [5000, 10000],
        "retryOn": [
          "network",
          "rate_limit",
          "timeout",
          "server_error"
        ]
      }
    }
  }
}
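As a rough illustration of how a config like the one above might be loaded and sanity-checked, the following sketch parses the `retry` block and computes the worst-case backoff time. `RetryPolicy`, `parse_retry_policy`, and `worst_case_wait_ms` are hypothetical names, and the validation rules are assumptions, since the issue describes the config shape only conceptually.

```python
import json
from dataclasses import dataclass
from typing import Tuple

KNOWN_TRIGGERS = {"network", "rate_limit", "timeout", "server_error", "overloaded"}

EXAMPLE_CONFIG = """
{"defaults": {"model": {
  "primary": "minimax-portal/MiniMax-M2.7-highspeed",
  "fallbacks": ["gptclub-openai/gpt-5.4"],
  "retry": {"maxAttempts": 2, "backoffMs": [5000, 10000],
            "retryOn": ["network", "rate_limit", "timeout", "server_error"]}
}}}
"""


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int
    backoff_ms: Tuple[int, ...]
    retry_on: Tuple[str, ...]


def parse_retry_policy(config_text: str) -> RetryPolicy:
    """Read defaults.model.retry; a missing block means no retries."""
    raw = json.loads(config_text)["defaults"]["model"].get("retry", {})
    policy = RetryPolicy(
        max_attempts=int(raw.get("maxAttempts", 0)),
        backoff_ms=tuple(raw.get("backoffMs", ())),
        retry_on=tuple(raw.get("retryOn", ())),
    )
    if policy.max_attempts < 0:
        raise ValueError("maxAttempts must not be negative")
    if policy.max_attempts > 0 and not policy.backoff_ms:
        raise ValueError("backoffMs is required when maxAttempts > 0")
    if any(delay < 0 for delay in policy.backoff_ms):
        raise ValueError("backoffMs entries must not be negative")
    unknown = set(policy.retry_on) - KNOWN_TRIGGERS
    if unknown:
        raise ValueError(f"unknown retryOn values: {sorted(unknown)}")
    return policy


def worst_case_wait_ms(policy: RetryPolicy) -> int:
    """Total backoff if every retry is used (excludes time spent in requests)."""
    delays = policy.backoff_ms
    return sum(delays[min(i, len(delays) - 1)] for i in range(policy.max_attempts))
```

For the example config, the worst-case added wait before falling back is 5000 + 10000 = 15000 ms, not counting the duration of the failed requests themselves.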

Notes

The implementation details of the retry mechanism are not specified, and the exact code changes required will depend on the existing architecture of OpenClaw.

Recommendation

The issue does not point to a fixed version to upgrade to. As a workaround, implement a custom retry mechanism in the model layer, using the existing retry patterns in OpenClaw (cron tasks and channel plugins) as a reference.
