openclaw - 💡 (How to fix) Feature request: configurable OVERLOAD_FAILOVER_BACKOFF_POLICY [1 comment, 2 participants]

openclaw/openclaw#49912 · Fetched 2026-04-08 01:01:19

Fix Action

Fix / Workaround

Current workaround

Manually patching the dist files:

// In dist/reply-*.js, dist/compact-*.js, dist/plugin-sdk/dispatch-*.js
const OVERLOAD_FAILOVER_BACKOFF_POLICY = {
  initialMs: 2000,   // was 250
  maxMs: 15000,      // was 1500
  factor: 2,
  jitter: 0.2
};

Problem

The hardcoded OVERLOAD_FAILOVER_BACKOFF_POLICY uses an initial backoff of 250ms and a max of 1500ms. This is too aggressive for agents with large system prompts (e.g. ~40K tokens), where API overload events are more likely and token processing time is longer.

What happens in practice

When Anthropic returns an overload response during normal operation, OpenClaw retries with 250ms → 500ms → 1000ms → 1500ms backoffs. For a large-prompt agent, the retry itself can also hit overload (the system hasn't recovered yet), causing a cascading failure loop — a restart death spiral where the agent continuously crashes and restarts, never successfully making a request.
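
To make the retry timing concrete, the sketch below computes the nominal delay sequence for a policy of this shape, ignoring jitter. The helper name backoffSchedule is hypothetical and is not part of OpenClaw. It assumes the delay is multiplied by factor on each attempt and capped at maxMs, which matches the 250ms → 500ms → 1000ms → 1500ms sequence described above.

```javascript
// Hypothetical helper, not OpenClaw's implementation: nominal delays without jitter.
function backoffSchedule(policy, attempts) {
  const delays = [];
  let delay = policy.initialMs;
  for (let i = 0; i < attempts; i++) {
    // Each delay is capped at maxMs before being recorded.
    delays.push(Math.min(delay, policy.maxMs));
    delay *= policy.factor;
  }
  return delays;
}

const hardcoded = { initialMs: 250, maxMs: 1500, factor: 2, jitter: 0.2 };
const patched = { initialMs: 2000, maxMs: 15000, factor: 2, jitter: 0.2 };

console.log(backoffSchedule(hardcoded, 4)); // [ 250, 500, 1000, 1500 ]
console.log(backoffSchedule(patched, 4));   // [ 2000, 4000, 8000, 15000 ]
```

Under these assumptions, the hardcoded policy spends about 3.25 seconds waiting across four retries, while the patched values spread the same four retries over about 29 seconds.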

Current workaround

Manually patching the dist files:

// In dist/reply-*.js, dist/compact-*.js, dist/plugin-sdk/dispatch-*.js
const OVERLOAD_FAILOVER_BACKOFF_POLICY = {
  initialMs: 2000,   // was 250
  maxMs: 15000,      // was 1500
  factor: 2,
  jitter: 0.2
};

This workaround is fragile: every npm update openclaw silently overwrites the patched dist files, so the patch has to be reapplied by hand after each update.

Feature Request

Expose overloadFailoverBackoffPolicy as a configurable option in agent config (e.g., clawdbot.json):

{
  "overloadFailoverBackoffPolicy": {
    "initialMs": 2000,
    "maxMs": 15000,
    "factor": 2,
    "jitter": 0.2
  }
}

Requirements

  • Keep 250ms as the default for backward compatibility — this change should be opt-in
  • Allow per-agent override so agents with large system prompts can use longer backoffs
  • Alternatively, auto-scale the backoff based on estimated prompt token count
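
One way these requirements could fit together is sketched below. Everything here is an assumption for illustration: the function names, the idea that the key may appear in both a global and a per-agent config object, and the 5000-token baseline used for scaling are not taken from OpenClaw.

```javascript
// Sketch only: names and config shape are assumptions, not OpenClaw's API.
const DEFAULT_POLICY = Object.freeze({ initialMs: 250, maxMs: 1500, factor: 2, jitter: 0.2 });

// Merge order: built-in defaults, then global config, then the per-agent override.
// Omitting the key everywhere yields the current 250ms default (opt-in behavior).
function resolveBackoffPolicy(globalConfig = {}, agentConfig = {}) {
  return {
    ...DEFAULT_POLICY,
    ...(globalConfig.overloadFailoverBackoffPolicy || {}),
    ...(agentConfig.overloadFailoverBackoffPolicy || {}),
  };
}

// Optional auto-scaling: stretch delays linearly with the estimated prompt size.
// The 5000-token baseline is an arbitrary illustrative value.
function scaleForPromptTokens(policy, promptTokens, baselineTokens = 5000) {
  const scale = Math.max(1, promptTokens / baselineTokens);
  return {
    ...policy,
    initialMs: Math.round(policy.initialMs * scale),
    maxMs: Math.round(policy.maxMs * scale),
  };
}

const lightweight = resolveBackoffPolicy({}, {});
const largePrompt = resolveBackoffPolicy({}, {
  overloadFailoverBackoffPolicy: { initialMs: 2000, maxMs: 15000 },
});
const scaled = scaleForPromptTokens(DEFAULT_POLICY, 40000);

console.log(lightweight, largePrompt, scaled);
```

With these illustrative numbers, a 40K-token prompt scales the defaults by a factor of 8, giving initialMs 2000 and maxMs 12000, which is close to the manually patched values.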

Impact

Agents with large system prompts (~40K+ tokens) are significantly more vulnerable to overload death spirals. The 250ms default was likely designed for lightweight agents and doesn't account for the variance in prompt complexity across different deployments.

Environment

  • OpenClaw version: latest (npm)
  • Affected agent sizes: ~40K token system prompts
  • Workaround patch applied to: dist/reply-*.js, dist/compact-*.js, dist/plugin-sdk/dispatch-*.js

extent analysis

Fix Plan

To address the issue, we will expose overloadFailoverBackoffPolicy as a configurable option in the agent config. Here are the steps:

  • Update the clawdbot.json config file to include the overloadFailoverBackoffPolicy option:
{
  "overloadFailoverBackoffPolicy": {
    "initialMs": 2000,
    "maxMs": 15000,
    "factor": 2,
    "jitter": 0.2
  }
}
  • In the OpenClaw code, add a check to load the overloadFailoverBackoffPolicy from the agent config:
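const config = require('./clawdbot.json');
// Built-in defaults match the current hardcoded policy.
const DEFAULT_OVERLOAD_FAILOVER_BACKOFF_POLICY = {
  initialMs: 250,
  maxMs: 1500,
  factor: 2,
  jitter: 0.2
};
// Merge so that a partial override (e.g. only initialMs) keeps the remaining defaults.
const overloadFailoverBackoffPolicy = {
  ...DEFAULT_OVERLOAD_FAILOVER_BACKOFF_POLICY,
  ...(config.overloadFailoverBackoffPolicy || {})
};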
  • Use the loaded overloadFailoverBackoffPolicy in the retry logic:
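// Assumes the `backoff` npm package. A policy with a multiplicative factor maps to
// its exponential strategy, whose options are initialDelay, maxDelay, factor, and
// randomisationFactor (the fibonacci strategy does not take a factor).
const backoff = require('backoff');
const retry = backoff.exponential({
  initialDelay: overloadFailoverBackoffPolicy.initialMs,
  maxDelay: overloadFailoverBackoffPolicy.maxMs,
  factor: overloadFailoverBackoffPolicy.factor,
  randomisationFactor: overloadFailoverBackoffPolicy.jitter
});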
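
The retry step can also be sketched without an external library. The following is a minimal sketch, not OpenClaw's actual retry code: computeDelay and retryOnOverload are hypothetical names, the jitter is assumed to be a symmetric fraction of the delay, and an HTTP status of 529 is assumed to be the overload signal.

```javascript
// Sketch only: isOverloaded and the error shape are assumptions for illustration.
function computeDelay(policy, attempt, random = Math.random) {
  const base = Math.min(policy.initialMs * Math.pow(policy.factor, attempt), policy.maxMs);
  // Symmetric jitter: shift the delay by up to +/- (jitter * base).
  const spread = base * policy.jitter * (random() * 2 - 1);
  return Math.max(0, Math.round(base + spread));
}

async function retryOnOverload(request, policy, options = {}) {
  const {
    maxAttempts = 5,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    isOverloaded = (err) => Boolean(err) && err.status === 529,
    random = Math.random,
  } = options;

  for (let attempt = 0; ; attempt++) {
    try {
      return await request();
    } catch (err) {
      // Give up on non-overload errors or once the attempt budget is spent.
      if (!isOverloaded(err) || attempt + 1 >= maxAttempts) throw err;
      await sleep(computeDelay(policy, attempt, random));
    }
  }
}
```

Passing sleep and random through options keeps the loop testable: a test can record the requested delays instead of actually waiting.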

Verification

To verify that the fix worked, you can test the agent with a large system prompt (~40K tokens) and check that it no longer enters a restart death spiral. You can also monitor the agent's logs to ensure that the retry backoff policy is being applied correctly.

Extra Tips

  • Make sure to update the clawdbot.json config file for each agent that requires a custom overloadFailoverBackoffPolicy.
  • Consider adding a warning or error message if the overloadFailoverBackoffPolicy is not configured correctly.
  • You can also explore auto-scaling the backoff based on estimated prompt token count to further improve the agent's resilience.
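
The second tip, warning on a misconfigured policy, could look roughly like the sketch below. The validation rules and the function name validateBackoffPolicy are illustrative assumptions, not existing OpenClaw behavior.

```javascript
// Sketch only: validation rules are illustrative assumptions, not OpenClaw behavior.
function validateBackoffPolicy(policy) {
  const problems = [];
  const isPositiveNumber = (v) => typeof v === 'number' && Number.isFinite(v) && v > 0;

  if (!isPositiveNumber(policy.initialMs)) problems.push('initialMs must be a positive number');
  if (!isPositiveNumber(policy.maxMs)) problems.push('maxMs must be a positive number');
  if (isPositiveNumber(policy.initialMs) && isPositiveNumber(policy.maxMs) &&
      policy.maxMs < policy.initialMs) {
    problems.push('maxMs must be greater than or equal to initialMs');
  }
  if (typeof policy.factor !== 'number' || !(policy.factor >= 1)) {
    problems.push('factor must be a number >= 1');
  }
  if (typeof policy.jitter !== 'number' || !(policy.jitter >= 0 && policy.jitter <= 1)) {
    problems.push('jitter must be a number between 0 and 1');
  }
  return problems;
}

// Example: maxMs is smaller than initialMs, so one problem is reported.
const issues = validateBackoffPolicy({ initialMs: 2000, maxMs: 1500, factor: 2, jitter: 0.2 });
if (issues.length > 0) {
  console.warn('Invalid overloadFailoverBackoffPolicy, using defaults: ' + issues.join('; '));
}
```

Returning a list of problems rather than throwing lets the caller decide whether to fall back to the defaults or abort startup.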
