claude-code - 💡(How to fix) Fix Auto-retry with backoff on shared-capacity rate limits instead of failing the turn [2 comments, 3 participants]

GitHub stats
anthropics/claude-code#50841 · Fetched 2026-04-20 12:11:33
Comments: 2
Participants: 3
Timeline: 4 (commented ×2, labeled ×2)
Reactions: 0


Problem

When the API returns "Server is temporarily limiting requests (not your usage limit) · Rate limited", Claude Code terminates the turn. The user returns to a dead-end conversation with no progress and no automatic recovery, despite having usage quota remaining.

This is particularly disruptive during long autonomous runs (e.g. /loop, multi-agent orchestration, do-work skill) where losing a turn mid-task can lose significant state.

Expected behavior

Shared-capacity throttles are transient and not caused by the user exceeding their plan. The client should:

  1. Detect the shared-capacity error (distinct from quota exhaustion).
  2. Retry with exponential backoff for a bounded window (e.g. up to 60s) before surfacing the failure.
  3. Surface a visible "waiting on capacity" indicator during the retry window so users know it's recovering, not hung.
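
The three steps above could be sketched roughly as follows. This is a minimal illustration, not Claude Code's actual implementation: `SharedCapacityError`, `QuotaExceededError`, `call_with_capacity_retry`, and the `on_wait` callback are all hypothetical names, and the sleep and clock functions are injected so the loop can be exercised without real waiting.

```python
import time


class SharedCapacityError(Exception):
    """Hypothetical: transient server-side throttle, not the user's quota."""


class QuotaExceededError(Exception):
    """Hypothetical: the user's own usage limit was reached."""


def call_with_capacity_retry(call, window=60.0, initial_delay=0.25,
                             on_wait=None, sleep=time.sleep,
                             clock=time.monotonic):
    """Run call(); retry shared-capacity errors until `window` seconds elapse."""
    deadline = clock() + window
    delay = initial_delay
    while True:
        try:
            return call()
        except SharedCapacityError:  # step 1: only this error class is retried
            remaining = deadline - clock()
            if remaining <= 0:
                raise  # step 2: window exhausted, surface the failure
            wait = min(delay, remaining)
            if on_wait is not None:
                on_wait(wait)  # step 3: drive a "waiting on capacity" indicator
            sleep(wait)
            delay *= 2
        # QuotaExceededError is deliberately not caught: it propagates at once.
```

A caller might pass something like `on_wait=lambda s: print(f"waiting on capacity ({s:.2f}s)")` to keep the user informed while the loop is recovering.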

Why this matters

  • Users are effectively throttled twice: once by the API, once by losing the ability to continue a session they're paying for.
  • The lack of a reliable Retry-After header doesn't preclude client-side backoff: a schedule such as 250ms → 500ms → 1s → 2s → 4s, capped at 60s, would catch the vast majority of these transient throttles.
  • The "state goes stale" argument against retry doesn't hold for short windows — prompt cache TTL is 5 minutes.
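
To make the arithmetic behind that schedule concrete, the short sketch below lists the delays produced by doubling from 250ms and stops before the cumulative wait would exceed 60s. It reads "capped at 60s" as a cap on the total retry window, which is an assumption, and `backoff_schedule` is a hypothetical helper.

```python
def backoff_schedule(initial_delay=0.25, factor=2.0, window=60.0):
    """Return the delays (in seconds) that fit inside the total retry window."""
    delays = []
    elapsed = 0.0
    delay = initial_delay
    while elapsed + delay <= window:
        delays.append(delay)
        elapsed += delay
        delay *= factor
    return delays


schedule = backoff_schedule()
print(schedule)       # [0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0]
print(sum(schedule))  # 31.75
```

Under this reading, seven retries fit in the window and use about 32 seconds of waiting. An eighth delay of 32s would overshoot, so a client could either stop there or truncate the final wait to the time remaining. Either way the whole window sits well inside the 5-minute prompt cache TTL mentioned above.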

Suggested implementation notes

  • Only auto-retry for the shared-capacity / 529-class errors, not 429 quota errors.
  • Make max retry window configurable (env var or setting).
  • Skip auto-retry if the last tool call was destructive and its response was dropped (avoid double-fire on writes).
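
One possible shape for these rules is a small policy function consulted before each retry. Everything named here is an assumption for illustration: the `CLAUDE_CODE_MAX_RETRY_WINDOW_SECONDS` environment variable, the `LastToolCall` record, and the simplification of "529-class" to a status code of exactly 529 are not confirmed Claude Code settings or APIs.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Hypothetical env var name; the issue only asks that the window be configurable.
RETRY_WINDOW_ENV = "CLAUDE_CODE_MAX_RETRY_WINDOW_SECONDS"
DEFAULT_RETRY_WINDOW = 60.0


@dataclass
class LastToolCall:
    """Hypothetical record of the most recent tool call in the turn."""
    destructive: bool        # e.g. a write or delete
    response_received: bool  # False if the response was dropped


def max_retry_window(env=os.environ) -> float:
    """Read the retry window from the environment, falling back to 60s."""
    raw = env.get(RETRY_WINDOW_ENV)
    try:
        return max(0.0, float(raw)) if raw is not None else DEFAULT_RETRY_WINDOW
    except ValueError:
        return DEFAULT_RETRY_WINDOW


def should_auto_retry(status: int, elapsed: float,
                      last_tool_call: Optional[LastToolCall] = None,
                      env=os.environ) -> bool:
    """Decide whether the client may retry a failed request automatically."""
    if status != 529:  # 429 quota errors and everything else surface immediately
        return False
    if elapsed >= max_retry_window(env):
        return False
    if (last_tool_call is not None and last_tool_call.destructive
            and not last_tool_call.response_received):
        return False  # avoid double-firing a write whose outcome is unknown
    return True
```

Keeping the decision in one pure function makes each rule easy to test in isolation, and leaves the backoff loop itself free of policy details.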

extent analysis

TL;DR

Implement exponential backoff with retry for shared-capacity errors to handle transient API throttling without surfacing failure to the user immediately.

Guidance

  • Detect shared-capacity (529-class) errors and distinguish them from 429 quota errors, so that retry logic applies only to the transient case.
  • Retry with exponential backoff inside a bounded window (e.g., up to 60s): start with a short delay (e.g., 250ms) and double it after each failed attempt until the window is used up, then surface the failure.
  • Display a "waiting on capacity" indicator during the retry window so users can see the session is recovering, not hung.
  • Consider making the maximum retry window configurable via an environment variable or setting.

Example

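import time


class SharedCapacityError(Exception):
    """Placeholder for the transient shared-capacity (529-class) error."""


def retry_shared_capacity_error(call, max_retries=5, initial_delay=0.25, max_delay=60):
    """Run call(), retrying shared-capacity errors with exponential backoff."""
    delay = initial_delay
    # One initial attempt plus up to max_retries retries.
    for attempt in range(max_retries + 1):
        try:
            return call()  # the API call goes here
        except SharedCapacityError:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            print("Shared capacity error, retrying in {:.2f} seconds".format(delay))
            time.sleep(delay)
            # Double the delay, but never wait longer than max_delay per attempt.
            delay = min(delay * 2, max_delay)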

Notes

This approach assumes that the API's shared-capacity throttling is indeed transient and that retrying with exponential backoff will eventually succeed. It's crucial to monitor the effectiveness of this strategy and adjust parameters as needed to balance between retrying sufficiently and not overwhelming the API with repeated requests.
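
One common way to reduce the risk of many clients retrying in lockstep is to randomize each delay ("jitter"). The issue itself does not propose this, so treat the sketch below as an optional refinement rather than part of the request. `jittered_delay` is a hypothetical helper, and the defaults simply mirror the 250ms start and 60s cap used above.

```python
import random


def jittered_delay(attempt, initial_delay=0.25, max_delay=60.0, rng=random):
    """Return a random delay in [0, min(max_delay, initial_delay * 2**attempt)]."""
    ceiling = min(max_delay, initial_delay * (2 ** attempt))
    return rng.uniform(0.0, ceiling)


rng = random.Random(0)  # seeded only to make the example repeatable
for attempt in range(5):
    print(attempt, round(jittered_delay(attempt, rng=rng), 3))
```

The trade-off is that individual waits become less predictable, which matters if the "waiting on capacity" indicator shows a countdown.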

Recommendation

Apply the workaround: implement retry with exponential backoff for shared-capacity errors. This directly addresses transient throttling without requiring changes to the API itself.
