claude-code - 💡(How to fix) Fix Proposal: Client-Side API Failover and High-Availability Routing [4 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#48665Fetched 2026-04-16 06:54:18
View on GitHub
Comments
4
Participants
2
Timeline
8
Reactions
0
Author
Participants
Timeline (top)
commented ×4labeled ×3closed ×1

Code Example

Update - Claude.ai and Platform are down. Login for Claude Code does not work via Claude.ai.
Apr 15, 2026 - 15:40 UTC
Update - The Claude API has fully recovered as of 8:01 PT / 16:01 UTC. We are currently working on mitigating the ongoing errors for Claude AI. Claude Code users who are logged in are still able to use it, but logging in is still broken.
Apr 15, 2026 - 15:20 UTC
Identified - The issue has been identified and a fix is being implemented.
Apr 15, 2026 - 15:03 UTC
Update - We are continuing to investigate this issue.
Apr 15, 2026 - 14:55 UTC
Investigating - We are seeing increased errors on Claude.ai, API, and Claude Code
Apr 15, 2026 - 14:53 UTC
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing requests and this feature hasn't been requested yet
  • This is a single feature request (not multiple features)

Problem Statement

Problem Statement: Claude Code is increasingly used as a primary driver for professional software engineering. However, it currently has a single point of failure: the primary API gateway. During major outages or capacity events (as seen in March 2026), the tool becomes unusable, causing total productivity loss. While internal retries exist, they cannot solve a systemic provider outage.

Proposed Solution: An "Availability Suite" I propose a first-class "Failover" mechanism that allows the CLI to transparently route requests to a secondary configuration (e.g., a different billing account or a provider like OpenRouter) upon encountering specific terminal errors (5xx, 429).

This feature would act as the logical connective tissue between several existing community requests:

  1. Leverage Profile Support (#20131): Instead of a single config, the failover logic would switch between the profiles/buckets proposed in #20131.
  2. Automate the Retry Loop (#46959): Rather than requiring users to build their own "OnApiRetry" hooks to trigger external wrapper scripts, the harness should natively handle the provider-switch during the retry cycle.
  3. Implement the "Sidecar" Philosophy (#40183): This brings the "Sidecar" model of capacity protection to the client side, ensuring that the developer is shielded from infrastructure fragility.

Suggested Implementation: A failover_policy in settings.json: { "failover_policy": { "enabled": true, "primary_profile": "work", "secondary_profile": "openrouter-backup", "triggers": [500, 502, 503, 504, 429], "max_failover_attempts": 1 } }

Why this matters: As highlighted by the users in #40771, Claude Code is being used to build complex, self-healing systems. The tool itself should embody these same engineering principles—resilience, redundancy, and graceful degradation. By implementing client-side failover, Anthropic ensures that Claude Code remains a reliable professional tool even when the underlying infrastructure is under extreme pressure.

Proposed Solution

I'd like to be able to use a single config, or hierarchical configs, to define multiple logins and endpoints, and have hooks available to intercept http errors and redirect based on configuration settings.

Alternative Solutions

Writing my own, but the hooks for web failures (500, 401) aren't available in the api, so looking to implement my own proxy to intercept and redirect.

Re: Critical - Blocking my work below This is the 2nd time in 2 weeks that Anthropic has had an outage preventing the current AI work. And there are 3-4x/week that I need to switch between accounts due to usage.

Priority

Critical - Blocking my work

Feature Category

API and model interactions

Use Case Example

Working on a longer term interaction with claude, and start getting 500 errors back from Anthropic (2026-04-15@12:00ET)

 Update - Claude.ai and Platform are down. Login for Claude Code does not work via Claude.ai.
Apr 15, 2026 - 15:40 UTC
Update - The Claude API has fully recovered as of 8:01 PT / 16:01 UTC. We are currently working on mitigating the ongoing errors for Claude AI. Claude Code users who are logged in are still able to use it, but logging in is still broken.
Apr 15, 2026 - 15:20 UTC
Identified - The issue has been identified and a fix is being implemented.
Apr 15, 2026 - 15:03 UTC
Update - We are continuing to investigate this issue.
Apr 15, 2026 - 14:55 UTC
Investigating - We are seeing increased errors on Claude.ai, API, and Claude Code
Apr 15, 2026 - 14:53 UTC

Having the ability to fail over to a different endpoint or configuration would allow claudecode to continue working on solutions, albeit, likely slower. It would also allow claude to remain able to answer questions, perhaps on ways to diagnose the issue to ensure it's not a isolated problem.

For multiple accounts, currently using multiple .claude* directories each with different configurations to different models/endpoints. While this allows the ability to "easily" switch accounts, context is lost and has to be rebuilt before continuing is possible.

Additional Context

No response

extent analysis

TL;DR

Implementing a client-side failover mechanism with a secondary configuration can help mitigate the impact of primary API gateway outages on Claude Code productivity.

Guidance

  • Introduce a failover_policy in the settings.json file to define the failover behavior, including the primary and secondary profiles, error triggers, and maximum failover attempts.
  • Develop a mechanism to switch between profiles based on the defined failover policy, allowing the CLI to transparently route requests to the secondary configuration upon encountering specific terminal errors.
  • Leverage existing community requests, such as profile support and automated retry loops, to integrate the failover logic into the CLI.
  • Consider implementing a "Sidecar" philosophy to ensure capacity protection and shield developers from infrastructure fragility.

Example

{
  "failover_policy": {
    "enabled": true,
    "primary_profile": "work",
    "secondary_profile": "openrouter-backup",
    "triggers": [500, 502, 503, 504, 429],
    "max_failover_attempts": 1
  }
}

Notes

The proposed solution relies on the existence of a secondary configuration or endpoint, which may not always be available. Additionally, the implementation details of the failover mechanism are not fully specified and may require further development.

Recommendation

Apply a workaround by implementing a client-side failover mechanism, as the proposed solution addresses a critical need for resilience and redundancy in Claude Code. This will help mitigate the impact of primary API gateway outages and ensure continued productivity.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING