litellm - ✅(Solved) Fix Anthropic 400 "credit balance too low" error does not trigger fallback routing [2 pull requests, 1 comment, 2 participants]

GitHub stats
BerriAI/litellm#24320 (fetched 2026-04-08 01:13:19)
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): cross-referenced ×2, commented ×1, labeled ×1, mentioned ×1

When Anthropic returns a 400 Bad Request with error type invalid_request_error and message "Your credit balance is too low to access the Anthropic API", LiteLLM's router does not trigger fallback routing. The error is classified as a non-retryable client error, so the request fails without attempting any configured fallback models.

This is problematic because "credit balance too low" is a billing/quota issue, not an invalid request. The request itself is well-formed — it would succeed on any other provider serving the same model. Fallback should absolutely be attempted here.

Error Message

No fallback model group found for original model_group=anthropic/claude-opus-4-6
Error doing the fallback: litellm.InternalServerError: ...

Root Cause

LiteLLM mapped Anthropic's HTTP 400 "credit balance too low" response to BadRequestError, which the router treats as a non-retryable client error. As a result, the failed deployment was never placed on cooldown and no fallback model group was tried, even though the failure is a billing/quota issue rather than a malformed request.

PR fix notes

PR #24616: fix(router): 429 routing — cooldown bypass, providers.json mapping, Anthropic credit balance fallback

Description (problem / solution / changelog)

Summary

Three related fixes for 429/rate-limit routing failures. All three share the same root cause: rate-limit errors were not surfaced as RateLimitError, so the router could not cool down deployments or trigger fallback routing.


Fix 1 — cooldown_handlers.py: APIConnectionError wrapping 429 bypassed cooldown (closes #24366)

_is_cooldown_required() returned False for any exception containing 'APIConnectionError'. But providers from providers.json have their HTTP 429 responses wrapped as APIConnectionError by the catch-all mapper — so those deployments were never cooled down and all retries kept hitting the same rate-limited endpoint.

Fix: only skip cooldown for APIConnectionError when neither '429' nor 'rate limit' appears in the exception string.

# Before — skips cooldown for ANY APIConnectionError, even wrapped 429s
if 'APIConnectionError' in exception_str:
    return False

# After — only skips when no rate-limit signal is present
if 'APIConnectionError' in exception_str:
    if not ('429' in exception_str or 'rate limit' in exception_str.lower()):
        return False
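
The before/after change can be exercised in isolation. The following is a minimal sketch, not the code from cooldown_handlers.py: the function name `should_skip_cooldown` and its string-only signature are assumptions made for illustration.

```python
def should_skip_cooldown(exception_str: str) -> bool:
    """Return True when cooldown should be skipped for this exception text.

    Hypothetical helper mirroring the 'After' logic: a plain
    APIConnectionError skips cooldown, but one that wraps a rate-limit
    signal does not.
    """
    if "APIConnectionError" not in exception_str:
        return False  # other exception types are decided elsewhere
    has_rate_limit_signal = (
        "429" in exception_str or "rate limit" in exception_str.lower()
    )
    return not has_rate_limit_signal
```

Because this is plain substring matching, any exception text that happens to contain "429" (for example inside an identifier or port number) would also be treated as a rate-limit signal.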

Fix 2 — constants.py: 9 providers.json providers missing from openai_compatible_providers (closes #24366)

veniceai, scaleway, gmi, sarvam, xiaomi_mimo, abliteration, llamagate, assemblyai, charity_engine were not in openai_compatible_providers. Without this, their exceptions fell through to the catch-all mapper which raises APIConnectionError for any status code — preventing RateLimitError from being raised for 429 responses.


Fix 3 — exception_mapping_utils.py: Anthropic 400 'credit balance too low' → RateLimitError (closes #24320)

Anthropic returns HTTP 400 for billing exhaustion with the message "Your credit balance is too low to access the Anthropic API". This was mapped to BadRequestError, preventing router fallback.

The request itself is valid — the failure is a billing/quota issue that should be treated like a rate limit so the router tries alternative deployments.
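
A minimal sketch of the mapping described in Fix 3 follows. It does not reproduce exception_mapping_utils.py: the two exception classes are local stand-ins rather than LiteLLM's own, and `map_anthropic_error` and `CREDIT_BALANCE_MARKER` are hypothetical names.

```python
class BadRequestError(Exception):
    """Stand-in for a non-retryable client error (not LiteLLM's class)."""


class RateLimitError(Exception):
    """Stand-in for an error the router may cool down and fall back on."""


# Substring taken from the Anthropic message quoted in the issue.
CREDIT_BALANCE_MARKER = "credit balance is too low"


def map_anthropic_error(status_code: int, message: str) -> Exception:
    """Hypothetical mapper: pick an exception class for an Anthropic error."""
    if status_code == 429:
        return RateLimitError(message)
    if status_code == 400 and CREDIT_BALANCE_MARKER in message.lower():
        # Billing exhaustion: the request is valid, so treat it like a rate limit.
        return RateLimitError(message)
    if status_code == 400:
        return BadRequestError(message)
    return Exception(f"unmapped status {status_code}: {message}")
```

The three branches correspond to the cases the PR's tests name: credit balance to RateLimitError, a normal 400 to BadRequestError, and 429 to RateLimitError.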


Tests

21 new regression tests in tests/router_unit_tests/test_pr_429_routing_fixes.py:

  • TestCooldownAPIConnectionError429 (6 tests) — pure vs 429-wrapped APIConnectionError
  • TestProvidersJsonInOpenaiCompatible (10 tests) — all 9 new providers + existing ones unchanged
  • TestAnthropicCreditBalanceMapping (5 tests) — credit balance → RateLimitError, normal 400 → BadRequestError, 429 → RateLimitError

All 21 pass — 0 regressions.

Changed files

  • litellm/constants.py (modified, +9/-0)
  • litellm/litellm_core_utils/exception_mapping_utils.py (modified, +36/-0)
  • litellm/llms/anthropic/experimental_pass_through/messages/streaming_iterator.py (modified, +28/-4)
  • litellm/llms/xai/chat/transformation.py (modified, +5/-0)
  • litellm/model_prices_and_context_window_backup.json (modified, +37329/-37334)
  • litellm/router.py (modified, +11/-1)
  • litellm/router_utils/cooldown_handlers.py (modified, +12/-1)
  • model_prices_and_context_window.json (modified, +37329/-37334)
  • tests/router_unit_tests/test_pr_429_routing_fixes.py (added, +247/-0)
  • tests/router_unit_tests/test_pr_litellm_round2_fixes.py (added, +414/-0)

PR #107: refactor: extract business logic from routers to CQRS service layer

Description (problem / solution / changelog)

Summary

  • Extract business logic from 22+ routers into 20 new CQRS service classes following Clean Architecture
  • Register all services in DI container (20 Factory providers, 20 Dep types)
  • Create shared utilities: RateLimitService, IssuePriorityMapper, StateNameNormalizer
  • Consolidate duplicated _require_admin patterns and _get_admin_workspace across routers
  • Register previously unwired services (FeatureToggleService, ScimService) in DI
  • Migrate raw SQL in 3 services to repository pattern (NoteTemplateService, AdminDashboardService, DependencyGraphService)
  • Fix regression: restore inline admin checks for slug-aware routers (skill_templates, workspace_plugins, workspace_role_skills)

New Services Created (20)

Service | Source Router
RateLimitService | ghost_text, issues_ai_context (x2)
AdminDashboardService | admin.py
AIConfigurationService | ai_configuration.py
CreateExtractedIssuesService | ai_extraction.py, workspace_notes_ai.py
GovernanceRollbackService | ai_governance.py
BlockOwnershipService | block_ownership.py
DependencyGraphService | dependency_graph.py
NoteTemplateService | note_templates.py
RelatedIssuesSuggestionService | related_issues.py
McpServerService + McpOAuthService | workspace_mcp_servers.py
AttachmentManagementService | ai_attachments.py
PluginLifecycleService | workspace_plugins.py
OcrConfigurationService | workspace_ocr_settings.py
WorkspaceAISettingsService | workspace_ai_settings.py
SprintBoardService | pm_sprint_board.py
CapacityPlanService | pm_capacity.py
ActionButtonService | workspace_action_buttons.py
MCPToolExecutionService | mcp_tools.py
ProjectDetailService | projects.py

Stats

  • 58 files changed, +7,015 / -4,734 lines
  • Biggest router reductions: workspace_mcp_servers.py (960→350), ai_configuration.py (649→200), ai_governance.py (530→150)

Test plan

  • Verify skill-templates endpoint works with workspace slugs (was broken, now fixed)
  • Verify workspace_plugins endpoints work with workspace slugs
  • Verify workspace_role_skills endpoints work with workspace slugs
  • Run make quality-gates-backend (ruff, pyright, pytest)
  • Smoke test: workspace CRUD, issue CRUD, AI chat, plugin install/uninstall
  • Verify MCP server OAuth flow still works
  • Verify AI configuration provider key testing still works

Summary by CodeRabbit

  • Refactor

    • Many API routes simplified to thin HTTP handlers delegating logic to centralized services for more consistent behavior.
  • New Features

    • Admin dashboard, AI configuration & provider key testing, OCR settings, attachment signed URLs & ingestion, MCP server/tool/OAuth support, plugin lifecycle tools, sprint board & capacity planning, project detail summaries, dependency-graph view, related-issue suggestions, note templates, workspace action buttons, AI governance rollback, and Redis-backed rate limiting.
  • Tests

    • Unit tests updated to validate service-level behavior and router delegation.

Changed files

  • backend/src/pilot_space/ai/infrastructure/anthropic_client_pool.py (modified, +45/-21)
  • backend/src/pilot_space/ai/services/ghost_text.py (modified, +117/-20)
  • backend/src/pilot_space/api/v1/dependencies.py (modified, +362/-0)
  • backend/src/pilot_space/api/v1/routers/_chat_schemas.py (modified, +12/-82)
  • backend/src/pilot_space/api/v1/routers/_mcp_server_schemas.py (modified, +33/-546)
  • backend/src/pilot_space/api/v1/routers/admin.py (modified, +14/-293)
  • backend/src/pilot_space/api/v1/routers/ai.py (modified, +0/-24)
  • backend/src/pilot_space/api/v1/routers/ai_annotations.py (modified, +1/-19)
  • backend/src/pilot_space/api/v1/routers/ai_approvals.py (modified, +6/-52)
  • backend/src/pilot_space/api/v1/routers/ai_attachments.py (modified, +45/-237)
  • backend/src/pilot_space/api/v1/routers/ai_chat.py (modified, +2/-2)
  • backend/src/pilot_space/api/v1/routers/ai_chat_model_routing.py (modified, +4/-43)
  • backend/src/pilot_space/api/v1/routers/ai_configuration.py (modified, +35/-473)
  • backend/src/pilot_space/api/v1/routers/ai_extraction.py (modified, +65/-279)
  • backend/src/pilot_space/api/v1/routers/ai_governance.py (modified, +34/-431)
  • backend/src/pilot_space/api/v1/routers/ai_pr_review.py (modified, +1/-39)
  • backend/src/pilot_space/api/v1/routers/ai_sessions.py (modified, +7/-102)
  • backend/src/pilot_space/api/v1/routers/ai_tasks.py (modified, +1/-42)
  • backend/src/pilot_space/api/v1/routers/block_ownership.py (modified, +28/-188)
  • backend/src/pilot_space/api/v1/routers/dependency_graph.py (modified, +35/-189)
  • backend/src/pilot_space/api/v1/routers/ghost_text.py (modified, +4/-42)
  • backend/src/pilot_space/api/v1/routers/knowledge_graph.py (modified, +1/-13)
  • backend/src/pilot_space/api/v1/routers/mcp_tools.py (modified, +35/-151)
  • backend/src/pilot_space/api/v1/routers/note_templates.py (modified, +44/-159)
  • backend/src/pilot_space/api/v1/routers/notes_ai.py (modified, +2/-23)
  • backend/src/pilot_space/api/v1/routers/notifications.py (modified, +5/-45)
  • backend/src/pilot_space/api/v1/routers/pm_blocks.py (modified, +21/-95)
  • backend/src/pilot_space/api/v1/routers/pm_capacity.py (modified, +9/-88)
  • backend/src/pilot_space/api/v1/routers/pm_dependency_graph.py (modified, +5/-26)
  • backend/src/pilot_space/api/v1/routers/pm_release_notes.py (modified, +1/-20)
  • backend/src/pilot_space/api/v1/routers/pm_sprint_board.py (modified, +20/-134)
  • backend/src/pilot_space/api/v1/routers/projects.py (modified, +39/-223)
  • backend/src/pilot_space/api/v1/routers/related_issues.py (modified, +35/-153)
  • backend/src/pilot_space/api/v1/routers/skill_approvals.py (modified, +6/-62)
  • backend/src/pilot_space/api/v1/routers/skill_templates.py (modified, +8/-23)
  • backend/src/pilot_space/api/v1/routers/skills.py (modified, +1/-17)
  • backend/src/pilot_space/api/v1/routers/webhooks.py (modified, +3/-29)
  • backend/src/pilot_space/api/v1/routers/workspace_action_buttons.py (modified, +30/-132)
  • backend/src/pilot_space/api/v1/routers/workspace_ai_settings.py (modified, +7/-262)
  • backend/src/pilot_space/api/v1/routers/workspace_encryption.py (modified, +10/-62)
  • backend/src/pilot_space/api/v1/routers/workspace_feature_toggles.py (modified, +3/-3)
  • backend/src/pilot_space/api/v1/routers/workspace_issues.py (modified, +7/-37)
  • backend/src/pilot_space/api/v1/routers/workspace_mcp_servers.py (modified, +91/-606)
  • backend/src/pilot_space/api/v1/routers/workspace_note_issue_links.py (modified, +4/-32)
  • backend/src/pilot_space/api/v1/routers/workspace_note_links.py (modified, +6/-46)
  • backend/src/pilot_space/api/v1/routers/workspace_notes_ai.py (modified, +42/-110)
  • backend/src/pilot_space/api/v1/routers/workspace_ocr_settings.py (modified, +53/-194)
  • backend/src/pilot_space/api/v1/routers/workspace_plugins.py (modified, +51/-306)
  • backend/src/pilot_space/api/v1/routers/workspace_quota.py (modified, +5/-44)
  • backend/src/pilot_space/api/v1/routers/workspace_role_skills.py (modified, +7/-23)
  • backend/src/pilot_space/api/v1/schemas/ai.py (added, +36/-0)
  • backend/src/pilot_space/api/v1/schemas/ai_annotations.py (added, +31/-0)
  • backend/src/pilot_space/api/v1/schemas/ai_chat.py (added, +102/-0)
  • backend/src/pilot_space/api/v1/schemas/ai_chat_model_routing.py (added, +60/-0)
  • backend/src/pilot_space/api/v1/schemas/ai_extraction.py (added, +118/-0)
  • backend/src/pilot_space/api/v1/schemas/ai_governance.py (added, +34/-0)
  • backend/src/pilot_space/api/v1/schemas/ai_sessions.py (added, +125/-0)
  • backend/src/pilot_space/api/v1/schemas/ai_tasks.py (added, +57/-0)
  • backend/src/pilot_space/api/v1/schemas/attachments.py (modified, +5/-37)
  • backend/src/pilot_space/api/v1/schemas/dependency_graph.py (added, +46/-0)
  • backend/src/pilot_space/api/v1/schemas/ghost_text.py (added, +80/-0)
  • backend/src/pilot_space/api/v1/schemas/issue.py (modified, +7/-0)
  • backend/src/pilot_space/api/v1/schemas/knowledge_graph.py (modified, +8/-0)
  • backend/src/pilot_space/api/v1/schemas/mcp_server.py (added, +571/-0)
  • backend/src/pilot_space/api/v1/schemas/mcp_tools.py (added, +52/-0)
  • backend/src/pilot_space/api/v1/schemas/notifications.py (added, +57/-0)
  • backend/src/pilot_space/api/v1/schemas/pm_blocks.py (added, +47/-0)
  • backend/src/pilot_space/api/v1/schemas/pm_capacity.py (added, +40/-0)
  • backend/src/pilot_space/api/v1/schemas/pm_dependency_graph.py (added, +45/-0)
  • backend/src/pilot_space/api/v1/schemas/pm_release_notes.py (added, +36/-0)
  • backend/src/pilot_space/api/v1/schemas/pm_sprint_board.py (added, +70/-0)
  • backend/src/pilot_space/api/v1/schemas/pr_review.py (modified, +34/-0)
  • backend/src/pilot_space/api/v1/schemas/related_issues.py (added, +44/-0)
  • backend/src/pilot_space/api/v1/schemas/skill_approvals.py (added, +84/-0)
  • backend/src/pilot_space/api/v1/schemas/skills.py (added, +30/-0)
  • backend/src/pilot_space/api/v1/schemas/workspace_encryption.py (added, +77/-0)
  • backend/src/pilot_space/api/v1/schemas/workspace_note_issue_links.py (added, +43/-0)
  • backend/src/pilot_space/api/v1/schemas/workspace_note_links.py (added, +58/-0)
  • backend/src/pilot_space/api/v1/schemas/workspace_notes_ai.py (added, +39/-0)
  • backend/src/pilot_space/api/v1/schemas/workspace_ocr_settings.py (added, +51/-0)
  • backend/src/pilot_space/api/v1/schemas/workspace_quota.py (added, +55/-0)
  • backend/src/pilot_space/application/services/action_button.py (added, +174/-0)
  • backend/src/pilot_space/application/services/admin_dashboard.py (added, +198/-0)
  • backend/src/pilot_space/application/services/ai_configuration.py (added, +446/-0)
  • backend/src/pilot_space/application/services/ai_extraction.py (added, +234/-0)
  • backend/src/pilot_space/application/services/ai_governance.py (added, +347/-0)
  • backend/src/pilot_space/application/services/attachment_management.py (added, +317/-0)
  • backend/src/pilot_space/application/services/block_ownership.py (added, +239/-0)
  • backend/src/pilot_space/application/services/capacity_plan.py (added, +118/-0)
  • backend/src/pilot_space/application/services/dependency_graph.py (added, +241/-0)
  • backend/src/pilot_space/application/services/mcp_oauth.py (added, +292/-0)
  • backend/src/pilot_space/application/services/mcp_server.py (added, +554/-0)
  • backend/src/pilot_space/application/services/mcp_tool_execution.py (added, +156/-0)
  • backend/src/pilot_space/application/services/note_template.py (added, +203/-0)
  • backend/src/pilot_space/application/services/ocr_configuration.py (added, +175/-0)
  • backend/src/pilot_space/application/services/plugin_lifecycle.py (added, +357/-0)
  • backend/src/pilot_space/application/services/pm_block_insight_service.py (modified, +112/-0)
  • backend/src/pilot_space/application/services/project_detail.py (added, +187/-0)
  • backend/src/pilot_space/application/services/rate_limit.py (added, +74/-0)
  • backend/src/pilot_space/application/services/related_issues.py (added, +158/-0)


Bug Report

Description

When Anthropic returns a 400 Bad Request with error type invalid_request_error and message "Your credit balance is too low to access the Anthropic API", LiteLLM's router does not trigger fallback routing. The error is classified as a non-retryable client error, so the request fails without attempting any configured fallback models.

This is problematic because "credit balance too low" is a billing/quota issue, not an invalid request. The request itself is well-formed — it would succeed on any other provider serving the same model. Fallback should absolutely be attempted here.

Expected Behavior

When Anthropic returns a billing-related error (credit balance too low, quota exceeded, etc.), LiteLLM should classify it as a fallback-eligible error and route to the configured fallback model group, just as it would for a 500 or 429.

Actual Behavior

The router raises the error directly to the caller with:

No fallback model group found for original model_group=anthropic/claude-opus-4-6
Error doing the fallback: litellm.InternalServerError: ...

The fallback mechanism is entered but fails because the error is classified as non-retryable. CooldownDeployments=[] confirms the failed deployment is never placed on cooldown.

Broader Issue

More generally, there are several 400-class errors from providers that should trigger fallback because they are provider-specific issues, not request issues:

  • Anthropic: 400 with "credit balance too low" — billing issue
  • Anthropic: 400 with "overloaded" — capacity issue
  • OpenAI: 400 with "billing_hard_limit_reached" — billing issue
  • Any provider: Account-level blocks, region restrictions, or temporary suspensions returned as 400

These are all cases where the same request would succeed on a different provider/deployment. LiteLLM should either:

  1. Treat specific known billing/quota error messages as fallback-eligible regardless of HTTP status code
  2. Provide a configuration option (e.g., fallback_on_status_codes: [400, 500, 429]) to let users control which status codes trigger fallback routing
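
The two options above can be combined into a single policy check. This is a sketch under stated assumptions: `FallbackPolicy` is a hypothetical class, `fallback_on_status_codes` is the option name suggested in this issue rather than a confirmed LiteLLM setting, and the message markers are the examples quoted in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FallbackPolicy:
    """Hypothetical policy object; neither field is a confirmed LiteLLM option."""

    # Option 2: status codes that always trigger fallback.
    fallback_on_status_codes: frozenset = frozenset({429, 500})
    # Option 1: message fragments that trigger fallback regardless of
    # status code (examples quoted in the issue).
    message_markers: tuple = (
        "credit balance is too low",
        "billing_hard_limit_reached",
        "overloaded",
    )

    def is_fallback_eligible(self, status_code: int, message: str) -> bool:
        if status_code in self.fallback_on_status_codes:
            return True
        lowered = message.lower()
        return any(marker in lowered for marker in self.message_markers)
```

With the defaults, a 400 carrying the credit balance message is eligible while an ordinary validation 400 is not. Adding 400 to `fallback_on_status_codes` makes every 400 eligible, which would also retry genuinely malformed requests on every fallback.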

Our Use Case

We run a multi-provider setup with fallbacks configured across Anthropic, Databricks (Foundation Model API), Vertex AI, and AWS Bedrock — all serving the same Claude models. When one provider has an issue (billing, outage, rate limits), we expect LiteLLM to seamlessly route to the next provider. This worked perfectly in our mock_testing_fallbacks tests but failed in production because the real Anthropic error came back as a 400.

Reproduction

  1. Configure a model with fallbacks:
    anthropic/claude-opus-4-6 -> databricks/databricks-claude-opus-4-6, vertex_ai/claude-opus-4-6
  2. Ensure the Anthropic API key has zero credits
  3. Send a request to anthropic/claude-opus-4-6
  4. Observe: fallback is NOT triggered, error is returned directly
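
The mapping from step 1 can be written as plain Python data. The list-of-single-key-dicts shape below is modelled on LiteLLM's `fallbacks` setting as an assumption and should be checked against the LiteLLM documentation for the version in use. `fallback_chain` is a hypothetical helper, not a LiteLLM function.

```python
# Fallback mapping from step 1. The shape is an assumption modelled on
# LiteLLM's `fallbacks` setting.
fallbacks = [
    {
        "anthropic/claude-opus-4-6": [
            "databricks/databricks-claude-opus-4-6",
            "vertex_ai/claude-opus-4-6",
        ]
    }
]


def fallback_chain(model_group: str, fallbacks: list) -> list:
    """Hypothetical helper: ordered fallback groups for a model group."""
    for entry in fallbacks:
        if model_group in entry:
            return list(entry[model_group])
    return []
```

An empty result from `fallback_chain` corresponds to the "No fallback model group found" situation. In this issue the mapping was configured, so the failure came from error classification rather than from a missing entry.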

Anthropic Error Response

{
  "type": "error",
  "error": {
    "type": "invalid_request_error",
    "message": "Your credit balance is too low to access the Anthropic API. Please go to Plans & Billing to upgrade or purchase credits."
  }
}

HTTP status: 400 Bad Request

Environment

  • LiteLLM version: v1.81.0
  • Deployment: ECS Fargate
  • Python: 3.13

Note on Anthropic's Error Classification

Anthropic arguably should return 402 Payment Required or 429 Too Many Requests for billing issues rather than 400 Bad Request with invalid_request_error. However, LiteLLM shouldn't rely solely on HTTP status codes for fallback decisions — the error message content should also be considered.

extent analysis

Fix Plan

To address the issue, we need to modify the error handling logic in LiteLLM to treat specific known billing/quota error messages as fallback-eligible, regardless of the HTTP status code. We can achieve this by adding a configuration option to specify which error messages should trigger fallback routing.

Step 1: Add a configuration option

Add a new configuration option fallback_error_messages to specify the error messages that should trigger fallback routing.

# config.py
fallback_error_messages = [
    "Your credit balance is too low to access the Anthropic API",
    "billing_hard_limit_reached",
    # Add other known billing/quota error messages here
]

Step 2: Modify the error handling logic

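Modify the error handling logic to check whether the error message contains any entry from the fallback_error_messages list. A substring check is needed because the provider's full message carries extra text after the known phrase. If there is a match, trigger the fallback routing.

# error_handler.py
from config import fallback_error_messages
from fallback import trigger_fallback

def handle_error(error):
    # Fall back to str(error) for exceptions without a .message attribute
    message = str(getattr(error, "message", error))
    if any(known in message for known in fallback_error_messages):
        # Known billing/quota message: trigger fallback routing
        return trigger_fallback()
    # Anything else is raised directly to the caller
    raise error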

Step 3: Update the fallback mechanism

Update the fallback mechanism to use the new fallback_error_messages configuration option.

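# fallback.py
def trigger_fallback():
    # get_fallback_model_group and route_to_fallback_model_group are
    # placeholders for the router's own lookup and dispatch logic
    fallback_model_group = get_fallback_model_group()

    # Route to the fallback model group and return its response
    return route_to_fallback_model_group(fallback_model_group)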

Verification

To verify that the fix worked, you can test the following scenarios:

  • Send a request to anthropic/claude-opus-4-6 with an Anthropic API key that has zero credits. The fallback should be triggered, and the request should be routed to the next provider.
  • Send a request to anthropic/claude-opus-4-6 with an Anthropic API key that has sufficient credits. The request should succeed, and the fallback should not be triggered.
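
Both scenarios can also be simulated offline before testing against real providers. The sketch below uses fake provider callables and does not call LiteLLM. `FallbackEligibleError`, `call_with_fallbacks`, `anthropic_no_credits`, and `healthy` are all hypothetical names.

```python
class FallbackEligibleError(Exception):
    """Stand-in for an error class that allows fallback (hypothetical)."""


def call_with_fallbacks(providers, request):
    """Try each (name, callable) in order; move on only for eligible errors.

    Returns (provider_name, response). Re-raises the last eligible error
    if every provider fails.
    """
    last_error = None
    for name, call in providers:
        try:
            return name, call(request)
        except FallbackEligibleError as exc:
            last_error = exc  # eligible: try the next provider
    if last_error is None:
        raise ValueError("no providers configured")
    raise last_error


def anthropic_no_credits(request):
    # Simulates the zero-credit Anthropic key from the first scenario.
    raise FallbackEligibleError(
        "Your credit balance is too low to access the Anthropic API"
    )


def healthy(request):
    return f"ok: {request}"
```

Calling `call_with_fallbacks([("anthropic", anthropic_no_credits), ("vertex_ai", healthy)], "hi")` exercises the first scenario, and replacing the first callable with `healthy` exercises the second.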

Extra Tips

  • Make sure to update the fallback_error_messages list with known billing/quota error messages from all providers.
  • Consider adding a configuration option to let users control which status codes trigger fallback routing, in addition to the fallback_error_messages list.
  • Test the fallback mechanism thoroughly to ensure it works as expected in different scenarios.
