litellm - ✅(Solved) Fix OpenAI-like providers from providers.json: 429 errors bypass cooldown (wrapped as APIConnectionError) [1 pull requests, 1 comments, 2 participants]

BerriAI/litellm#24366 · Fetched 2026-04-08 01:18:11

When using an OpenAI-like provider registered via providers.json (e.g. any custom provider with a base_url and api_key_env), 429 rate limit errors are incorrectly wrapped as APIConnectionError instead of RateLimitError. This prevents the router from cooling down the failing deployment and routing to healthy alternatives in the same model group.

Error Message

litellm.exceptions.APIConnectionError: litellm.APIConnectionError: <Provider>Exception - Error code: 429 - {'error': {'message': 'Rate limit exceeded. Retry in 6s.', 'type': 'rate_limit'}}. Received Model Group=<model-group>
Available Model Group Fallbacks=None LiteLLM Retried: 3 times, LiteLLM Max Retries: 3

Root Cause

Two issues combine to cause this:

1. Providers from providers.json are not added to litellm._openai_like_providers

In exception_mapping_utils.py, the status-code-aware exception mapping for OpenAI-like providers only runs when custom_llm_provider in litellm._openai_like_providers (line ~857). Providers loaded from providers.json are not added to this list, so their exceptions fall through to the catch-all handler at line ~2406, which wraps everything as APIConnectionError regardless of HTTP status code.
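The gating described above can be illustrated with a toy model. This is a minimal sketch, not LiteLLM's actual code: `map_exception_name`, the provider names, and the returned strings are hypothetical stand-ins for the real mapping logic and exception classes.

```python
# Toy model of the membership gate; all names here are illustrative assumptions.
def map_exception_name(provider: str, status_code: int, openai_like: list) -> str:
    """Return the exception class name a provider's HTTP error would map to."""
    if provider in openai_like:
        # Status-code-aware path: only reached for registered providers.
        if status_code == 429:
            return "RateLimitError"
        if status_code == 401:
            return "AuthenticationError"
        return "APIError"
    # Catch-all path: the status code is never inspected.
    return "APIConnectionError"


registered = ["registered_provider"]  # stand-in for litellm._openai_like_providers
print(map_exception_name("registered_provider", 429, registered))
print(map_exception_name("json_provider", 429, registered))
```

The same 429 yields a different exception type depending only on whether the provider name is in the list, which is the behavior the issue reports.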

2. Cooldown handler blanket-ignores APIConnectionError

In router_utils/cooldown_handlers.py, _is_cooldown_required() returns False for any exception string containing "APIConnectionError":

ignored_strings = ["APIConnectionError"]
if exception_str is not None:
    for ignored_string in ignored_strings:
        if ignored_string in exception_str:
            return False  # ← never cools down, even for wrapped 429s

Fix Action

Fixed

PR fix notes

PR #24367: fix(router): cooldown 429 errors wrapped as APIConnectionError

Description (problem / solution / changelog)

Summary

Fixes #24366

OpenAI-like providers registered via providers.json have their HTTP 429 responses incorrectly mapped to APIConnectionError by the catch-all exception handler (because they are not added to litellm._openai_like_providers). The cooldown handler then blanket-ignores the error, preventing the router from cooling down the failing deployment and routing to healthy alternatives in the same model group.

Changes

litellm/router_utils/cooldown_handlers.py

  • Added _is_rate_limit_error() helper that detects 429/rate-limit indicators in exception strings and status codes
  • Modified _is_cooldown_required() to call this helper before skipping cooldown for APIConnectionError — if the wrapped error is actually a rate limit, cooldown proceeds normally
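The PR's actual diff is not reproduced on this page, so the following is only a sketch of what such a helper could look like. The marker strings, the function signatures, and the simplified `is_cooldown_required` (which omits every other check the real function performs) are assumptions.

```python
from typing import Optional

# Assumed indicators; the real helper may look for different strings.
RATE_LIMIT_MARKERS = ("error code: 429", "rate limit", "rate_limit")


def is_rate_limit_error(exception_str: Optional[str], status_code: Optional[int] = None) -> bool:
    """Detect a rate limit from the status code or the exception text."""
    if status_code == 429:
        return True
    if exception_str is None:
        return False
    lowered = exception_str.lower()
    return any(marker in lowered for marker in RATE_LIMIT_MARKERS)


def is_cooldown_required(exception_str: Optional[str], status_code: Optional[int] = None) -> bool:
    """Simplified stand-in: skip cooldown for APIConnectionError unless it wraps a rate limit."""
    if exception_str is not None and "APIConnectionError" in exception_str:
        return is_rate_limit_error(exception_str, status_code)
    return True  # assumption: every other error cools down in this toy version


wrapped_429 = "litellm.APIConnectionError: Error code: 429 - Rate limit exceeded. Retry in 6s."
refused = "litellm.APIConnectionError: Connection refused"
print(is_cooldown_required(wrapped_429), is_cooldown_required(refused))
```

Matching on text is inherently fuzzy, which is presumably why the PR describes this change as a safety net rather than the root-cause fix.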

tests/router_unit_tests/test_router_cooldown_utils.py

  • Added test: 429 wrapped as APIConnectionError → _is_cooldown_required returns True
  • Added test: genuine APIConnectionError (connection refused) → still returns False

Root Cause

Two issues combine:

  1. Providers from providers.json are not in _openai_like_providers, so their exceptions bypass the status-code-aware mapping in exception_mapping_utils.py and fall through to the catch-all handler which wraps everything as APIConnectionError

  2. _is_cooldown_required() blanket-ignores APIConnectionError (lines 57-63), so rate-limited deployments never enter cooldown

This PR fixes issue 2 (the safety net). Issue 1 (the root cause — proper provider registration) could be addressed separately.

Testing

pytest tests/router_unit_tests/test_router_cooldown_utils.py::test_is_cooldown_required_429_wrapped_as_apiconnectionerror -v  # PASS
pytest tests/router_unit_tests/test_router_cooldown_utils.py::test_is_cooldown_required_genuine_apiconnectionerror -v          # PASS

Changed files

  • litellm/router_utils/cooldown_handlers.py (modified, +33/-0)
  • tests/router_unit_tests/test_router_cooldown_utils.py (modified, +67/-0)


Bug Report

Environment

  • LiteLLM version: latest (main-latest Docker image)
  • Python: 3.13
  • Router strategy: usage-based-routing-v2

Description

When using an OpenAI-like provider registered via providers.json (e.g. any custom provider with a base_url and api_key_env), 429 rate limit errors are incorrectly wrapped as APIConnectionError instead of RateLimitError. This prevents the router from cooling down the failing deployment and routing to healthy alternatives in the same model group.

Root Cause

Two issues combine to cause this:

1. Providers from providers.json are not added to litellm._openai_like_providers

In exception_mapping_utils.py, the status-code-aware exception mapping for OpenAI-like providers only runs when custom_llm_provider in litellm._openai_like_providers (line ~857). Providers loaded from providers.json are not added to this list, so their exceptions fall through to the catch-all handler at line ~2406, which wraps everything as APIConnectionError regardless of HTTP status code.

2. Cooldown handler blanket-ignores APIConnectionError

In router_utils/cooldown_handlers.py, _is_cooldown_required() returns False for any exception string containing "APIConnectionError":

ignored_strings = ["APIConnectionError"]
if exception_str is not None:
    for ignored_string in ignored_strings:
        if ignored_string in exception_str:
            return False  # ← never cools down, even for wrapped 429s

Result

When a providers.json provider returns HTTP 429:

  1. Exception mapping falls through to catch-all → APIConnectionError
  2. Cooldown handler sees "APIConnectionError" in the string → skips cooldown
  3. Router retries the same deployment (up to num_retries times), all fail with 429
  4. Error surfaces to caller: Available Model Group Fallbacks=None
  5. Other healthy deployments in the same model group are never tried
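The sequence above can be reproduced with a small simulation. It is a sketch under simplifying assumptions: deployments are picked in list order rather than by usage-based-routing-v2, and `run_request` and its parameters are hypothetical, not part of LiteLLM.

```python
def run_request(deployments, rate_limited, cooldown_enabled, num_retries=3):
    """Simulate one request: an initial attempt plus up to num_retries retries."""
    cooled = set()
    attempts = []
    for _ in range(num_retries + 1):
        candidates = [d for d in deployments if d not in cooled]
        if not candidates:
            break
        target = candidates[0]  # simplified picker: first deployment not in cooldown
        attempts.append(target)
        if target not in rate_limited:
            return attempts, True  # request succeeded
        if cooldown_enabled:
            cooled.add(target)  # a 429 puts the deployment into cooldown
    return attempts, False


# Bug: cooldown is skipped, so every attempt hits the rate-limited deployment.
print(run_request(["dep-1", "dep-2"], {"dep-1"}, cooldown_enabled=False))
# Expected: the first 429 cools down dep-1 and the retry lands on dep-2.
print(run_request(["dep-1", "dep-2"], {"dep-1"}, cooldown_enabled=True))
```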

Reproduction

  1. Register a provider in providers.json with two deployments under the same model_name
  2. Hit rate limits on the first deployment (429 response)
  3. Observe: second deployment never receives traffic; all retries go to the rate-limited deployment
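A router configuration for this reproduction might look like the sketch below. The provider name `my_provider`, the model IDs, and the group name are hypothetical, and the issue does not show the reporter's actual configuration. The `Router` call is left commented out so the snippet runs without LiteLLM installed.

```python
# Two deployments that share one model_name, so the router treats them as
# interchangeable members of the same model group. All names are placeholders.
model_list = [
    {
        "model_name": "shared-group",
        "litellm_params": {"model": "my_provider/model-a"},
    },
    {
        "model_name": "shared-group",
        "litellm_params": {"model": "my_provider/model-b"},
    },
]

router_kwargs = {
    "model_list": model_list,
    "routing_strategy": "usage-based-routing-v2",  # strategy named in the report
    "num_retries": 3,
}

# from litellm import Router
# router = Router(**router_kwargs)
print(sorted({d["model_name"] for d in model_list}))
```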

Expected Behavior

429 from any provider should:

  • Be mapped to RateLimitError (not APIConnectionError)
  • Trigger deployment cooldown so the router picks a healthy alternative

Suggested Fix

Option A (comprehensive): Register providers.json providers in _openai_like_providers so they enter the status-code-aware exception mapping path.

Option B (targeted): In _is_cooldown_required(), do not skip cooldown for APIConnectionError when the exception string also indicates a rate limit (e.g. contains "429" or "Rate limit"):

ignored_strings = ["APIConnectionError"]
if exception_str is not None:
    for ignored_string in ignored_strings:
        if ignored_string in exception_str:
            is_rate_limit = "429" in exception_str or "Rate limit" in exception_str
            if not is_rate_limit:
                return False  # genuine connection error: skip cooldown
            # wrapped 429: fall through so cooldown still applies

Option A is the proper fix; Option B is a safety net.

Logs

litellm.exceptions.APIConnectionError: litellm.APIConnectionError: <Provider>Exception - Error code: 429 - {'error': {'message': 'Rate limit exceeded. Retry in 6s.', 'type': 'rate_limit'}}. Received Model Group=<model-group>
Available Model Group Fallbacks=None LiteLLM Retried: 3 times, LiteLLM Max Retries: 3

extent analysis

Fix Plan

PR #24367 implements only the safety net (Option B). The comprehensive fix remains Option A: register providers.json providers in _openai_like_providers so they enter the status-code-aware exception mapping path.

Here are the concrete steps:

  • At the point where providers.json is loaded (the issue does not name the exact location), add each provider's name to litellm._openai_like_providers, the list that exception_mapping_utils.py checks.
  • Add a check to ensure that only providers with a base_url and api_key_env are added to the list.

Example code:

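# Sketch only. Assumptions: providers.json is a JSON object mapping each
# provider name to its config, and this runs once at startup.
import json

import litellm


def register_json_providers(path="providers.json"):
    """Add providers.json provider names to litellm._openai_like_providers."""
    with open(path) as f:
        providers = json.load(f)
    for name, config in providers.items():
        # Only register providers that declare both required fields.
        if "base_url" in config and "api_key_env" in config:
            # Append the provider name (not the config dict), once.
            if name not in litellm._openai_like_providers:
                litellm._openai_like_providers.append(name)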

Verification

To verify that the fix worked:

  • Register a provider in providers.json with two deployments under the same model_name.
  • Hit rate limits on the first deployment (429 response).
  • Observe that the second deployment receives traffic while the first deployment is in cooldown.

Extra Tips

  • Make sure to handle any potential errors when loading the providers.json file.
  • Consider adding logging to track when providers are added to litellm._openai_like_providers.
  • Review the exception_mapping_utils.py file to ensure that the status-code-aware exception mapping path is correctly handling rate limit errors.
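The first two tips can be sketched as a defensive loader. This is an illustration, not LiteLLM code: the function names, the logger name, and the assumed file layout (a JSON object mapping provider names to configs) are hypothetical. The functions only return the names that qualify and leave the actual registration to the caller.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("providers_json_loader")  # hypothetical logger name

REQUIRED_FIELDS = ("base_url", "api_key_env")


def parse_provider_names(raw: str) -> list:
    """Return names of providers that declare every required field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        logger.warning("providers.json is not valid JSON: %s", exc)
        return []
    if not isinstance(data, dict):
        logger.warning("providers.json: expected a JSON object at the top level")
        return []
    names = []
    for name, config in data.items():
        if isinstance(config, dict) and all(field in config for field in REQUIRED_FIELDS):
            logger.info("registering OpenAI-like provider %r", name)
            names.append(name)
        else:
            logger.warning("skipping provider %r: missing base_url or api_key_env", name)
    return names


def load_provider_names(path: str) -> list:
    """Read the file at path; return an empty list if it cannot be read."""
    try:
        raw = Path(path).read_text(encoding="utf-8")
    except OSError as exc:
        logger.warning("could not read %s: %s", path, exc)
        return []
    return parse_provider_names(raw)
```

Returning an empty list on failure keeps a malformed or missing file from blocking startup, at the cost of silently falling back to the catch-all mapping; whether that trade-off is acceptable depends on the deployment.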
