openclaw - 💡(How to fix) Fix Persistent file-based provider cooldown blocks user for hours after billing recovery [1 participants]

GitHub stats: openclaw/openclaw#70903 (fetched 2026-04-24 10:38:04)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

When an Anthropic (or any) provider returns a 402 billing error, OpenClaw writes a disabledUntil timestamp to the agent auth-state file that persists across gateway restarts. On repeated failures the timestamp extends forward, which means even after the user tops up credit, they remain locked out for hours (or up to 24h) until either a human edits the JSON or the timestamp naturally expires.

This affects every user who runs a single-provider configuration (e.g. Opus-only) or whose fallback chain shares a single auth profile.

Fix Action

Workaround

A ~/bin/opus-reset script that:

  1. Scans all ~/.openclaw/agents/*/agent/auth-state.json files.
  2. Strips disabledUntil, disabledReason, cooldownUntil, pausedUntil, failureCounts, errorCount, lastFailureAt, quarantineUntil fields.
  3. Runs openclaw gateway restart.

This is the minimum needed to recover today. It should not be necessary.
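
The issue does not include the script itself. A minimal Python sketch of the three steps above might look like the following. It assumes each auth-state.json is plain JSON and that the listed fields can sit at any nesting depth, since the full file layout is not shown in the issue. The function names are illustrative only.

```python
#!/usr/bin/env python3
"""Sketch: clear cooldown fields from OpenClaw auth-state files, then restart."""
import glob
import json
import os
import subprocess

# Field names taken from the workaround description above.
COOLDOWN_FIELDS = {
    "disabledUntil", "disabledReason", "cooldownUntil", "pausedUntil",
    "failureCounts", "errorCount", "lastFailureAt", "quarantineUntil",
}


def strip_cooldown(node):
    """Return a copy of `node` with cooldown fields removed at every depth."""
    if isinstance(node, dict):
        return {k: strip_cooldown(v) for k, v in node.items()
                if k not in COOLDOWN_FIELDS}
    if isinstance(node, list):
        return [strip_cooldown(v) for v in node]
    return node


def reset_file(path):
    """Rewrite one auth-state file in place; return True if it changed."""
    with open(path, encoding="utf-8") as fh:
        state = json.load(fh)
    cleaned = strip_cooldown(state)
    if cleaned == state:
        return False
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(cleaned, fh, indent=2)
    os.replace(tmp, path)  # atomic swap: no half-written state file on crash
    return True


def main():
    pattern = os.path.expanduser("~/.openclaw/agents/*/agent/auth-state.json")
    for path in glob.glob(pattern):
        print(("reset " if reset_file(path) else "clean ") + path)
    # Restart command as given in step 3 of the workaround.
    subprocess.run(["openclaw", "gateway", "restart"], check=True)

# Call main() to apply the reset; it is deliberately not invoked here.
```

Writing to a temporary file and swapping it in is a design choice of this sketch, not something the issue specifies. It avoids leaving a truncated state file if the script is interrupted while the gateway is still running.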

Reproduction

  1. Run OpenClaw with a single-provider config (e.g. primary: anthropic/claude-opus-4-7, fallback: anthropic/claude-opus-4-6, both resolving to the same anthropic:default auth profile).
  2. Let the Anthropic account hit 402 "out of extra usage".
  3. Top up credit in the Anthropic console. Confirm a curl to GET https://api.anthropic.com/v1/models with the same key returns 200.
  4. Fire a normal message. OpenClaw still returns billing issue (skipping all models).
  5. Run openclaw gateway restart. Gateway restarts cleanly but provider remains disabled.
  6. Inspect ~/.openclaw/agents/main/agent/auth-state.json and observe:

       "anthropic:default": {
         "errorCount": 2,
         "lastFailureAt": 1776983458318,
         "failureCounts": { "billing": 2 },
         "disabledUntil": 1777001449315,
         "disabledReason": "billing"
       }

  7. disabledUntil is several hours or up to 24h in the future. Only manually editing this file (removing disabledUntil, disabledReason, and zeroing counters) AND restarting the gateway restores service.
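
To see how long a profile will stay locked without doing the arithmetic by hand, the entry can be summarised with a few lines of Python. This is a sketch only: it assumes the timestamps are Unix epoch milliseconds (the values above are consistent with that reading), and `cooldown_report` is a hypothetical helper, not part of OpenClaw.

```python
from datetime import datetime, timezone


def cooldown_report(profile, now_ms):
    """Summarise a profile's cooldown; timestamps are assumed to be epoch ms."""
    until = profile.get("disabledUntil")
    if until is None or until <= now_ms:
        return {"disabled": False, "remaining_s": 0.0}
    return {
        "disabled": True,
        "reason": profile.get("disabledReason"),
        "remaining_s": (until - now_ms) / 1000,
        "until_utc": datetime.fromtimestamp(
            until / 1000, tz=timezone.utc
        ).isoformat(),
    }


# Entry copied from the auth-state.json excerpt in the issue.
profile = {
    "errorCount": 2,
    "lastFailureAt": 1776983458318,
    "failureCounts": {"billing": 2},
    "disabledUntil": 1777001449315,
    "disabledReason": "billing",
}

# Measure the window from the moment of the last recorded failure.
report = cooldown_report(profile, now_ms=profile["lastFailureAt"])
print(report["reason"], round(report["remaining_s"] / 3600, 2), "hours")
```

For the values in the issue, disabledUntil lies 17,990,997 ms after lastFailureAt, just under five hours.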

Impact

  • User is silently locked out long after the underlying cause is resolved.
  • Remote users cannot recover without SSH / local shell access; slash commands are dead when the gateway thinks the provider is disabled.
  • Increasing lock on repeat failures: each new 402 during the disabled window extends disabledUntil further forward, compounding the problem.
  • No observability: nothing surfaces "you are in a persistent cooldown until X" to the user. They just see blank responses or cron failures labelled "billing issue".

Observed in the wild

  • 2026-04-23: Anthropic credit depleted ~08:00 AEST. Credit restored ~10:00 AEST. Provider stayed disabled until a manual auth-state.json edit + gateway restart some 60 minutes later.
  • 2026-04-24: Anthropic credit depleted evening 2026-04-23. Credit restored ~08:00 AEST the next morning. Provider stayed disabled until ~13:00 AEST (5+ hours after top-up), only resolved by manually clearing disabledUntil via Claude Code in the local terminal.

Proposed fix (any of these would resolve it)

  1. Don't persist disabledUntil to disk. Keep it in-memory only with a short TTL (e.g. 5 minutes). On restart, probe the provider fresh.
  2. Clear disabledUntil on successful probe. Add a periodic health check (every 2–5 minutes) that hits /v1/models (or equivalent) on disabled providers. First success clears the cooldown immediately.
  3. Cap disabledUntil at a sensible ceiling (e.g. 15 minutes maximum, regardless of failure count). Billing errors are usually user-recoverable on a minutes timescale, not hours.
  4. Expose a first-class CLI reset: openclaw auth reset --provider anthropic that clears the state cleanly. This at least gives remote users a documented escape hatch.
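
Option 3 amounts to a clamp on whatever value the backoff logic proposes. A minimal sketch, assuming epoch-millisecond timestamps and using hypothetical names (`MAX_COOLDOWN_MS`, `capped_disabled_until`) that do not come from the OpenClaw codebase:

```python
MAX_COOLDOWN_MS = 15 * 60 * 1000  # 15-minute ceiling suggested in option 3


def capped_disabled_until(now_ms, proposed_until_ms, ceiling_ms=MAX_COOLDOWN_MS):
    """Clamp a proposed disabledUntil to at most `ceiling_ms` past `now_ms`."""
    return min(proposed_until_ms, now_ms + ceiling_ms)


# Values from the issue: the stored disabledUntil is roughly five hours
# after the last failure, so the clamp shortens it to 15 minutes.
last_failure = 1776983458318
stored_until = 1777001449315
capped = capped_disabled_until(last_failure, stored_until)
print((capped - last_failure) // 60000, "minutes")
```

Because the ceiling is measured from the current failure rather than from the previous disabledUntil, repeated 402s can no longer push the lockout further than 15 minutes past the most recent one.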

Option 2 is probably the cleanest: a provider marked "disabled" should be opportunistically re-probed, and if it responds healthy, the cooldown should be cleared without any user action.
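
A rough sketch of that re-probe logic follows. The probe uses the same GET https://api.anthropic.com/v1/models call as the reproduction steps. `reprobe` and `probe_anthropic` are hypothetical names, and the set of fields cleared on success is an assumption based on the auth-state excerpt. The issue does not confirm that /v1/models fails while an account is out of credit, so a real implementation may need a different health signal.

```python
import urllib.request

MODELS_URL = "https://api.anthropic.com/v1/models"  # from the reproduction steps


def probe_anthropic(api_key, timeout=10):
    """Return True if GET /v1/models answers 200 for this key."""
    req = urllib.request.Request(
        MODELS_URL,
        headers={"x-api-key": api_key, "anthropic-version": "2023-06-01"},
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers HTTPError, URLError and socket timeouts
        return False


def reprobe(profile, now_ms, probe):
    """Clear the cooldown on `profile` in place if `probe()` succeeds.

    Returns True only when an active cooldown was cleared.
    """
    until = profile.get("disabledUntil")
    if until is None or until <= now_ms:
        return False  # not in cooldown, nothing to do
    if not probe():
        return False  # still unhealthy: leave the cooldown untouched
    profile.pop("disabledUntil", None)
    profile.pop("disabledReason", None)
    profile["errorCount"] = 0
    profile["failureCounts"] = {}
    return True
```

A scheduler would then call something like `reprobe(profile, now_ms, lambda: probe_anthropic(key))` for each disabled profile every 2 to 5 minutes. Passing the probe in as a callable keeps the state handling testable without network access.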

Environment

  • OpenClaw version: 2026.4.21 (commit f788c88)
  • OS: macOS 25.3.0 arm64
  • Auth mode: api_key for anthropic:default
  • Fallback chain: Opus-only (no cross-provider safety net, which makes this failure mode fully blocking)

Extent analysis

TL;DR

The most likely fix is to implement a periodic health check that clears the disabledUntil timestamp when a disabled provider becomes available again.

Guidance

  • Consider implementing option 2 from the proposed fixes: add a periodic health check (every 2-5 minutes) that hits /v1/models on disabled providers, and clear the cooldown immediately on the first success.
  • To verify the fix, test the scenario described in the reproduction steps and check if the provider is re-enabled after the credit is topped up.
  • To mitigate the issue temporarily, use the provided ~/bin/opus-reset script to reset the auth state and restart the gateway.
  • Review the proposed fixes and choose the one that best fits the requirements, considering factors such as simplicity, effectiveness, and potential impact on the system.

Notes

The chosen solution should be tested thoroughly to ensure it resolves the issue without introducing new problems. The periodic health check should be configured to run at a reasonable interval to avoid overwhelming the provider with requests.

Recommendation

Apply workaround: use the ~/bin/opus-reset script to reset the auth state and restart the gateway until a permanent fix is implemented. This provides a temporary solution to recover from the issue without requiring manual editing of the auth-state.json file.
