litellm - ✅(Solved) Fix OpenAI-like providers from providers.json: 429 errors bypass cooldown (wrapped as APIConnectionError) [1 pull requests, 1 comments, 2 participants]

litellm2026-03-22 19:48:31


BerriAI/litellm#24366•Fetched 2026-04-08 01:18:11


Author

lazmo88

Participants

AhsanSheraz

lazmo88

Timeline (top)

referenced ×3, cross-referenced ×2, commented ×1, labeled ×1

When using an OpenAI-like provider registered via providers.json (e.g. any custom provider with a base_url and api_key_env), 429 rate limit errors are incorrectly wrapped as APIConnectionError instead of RateLimitError. This prevents the router from cooling down the failing deployment and routing to healthy alternatives in the same model group.

Error Message

litellm.exceptions.APIConnectionError: litellm.APIConnectionError: <Provider>Exception - Error code: 429 - {'error': {'message': 'Rate limit exceeded. Retry in 6s.', 'type': 'rate_limit'}}. Received Model Group=<model-group>
Available Model Group Fallbacks=None LiteLLM Retried: 3 times, LiteLLM Max Retries: 3

Root Cause

Two issues combine to cause this:

1. Providers from providers.json are not added to litellm._openai_like_providers

In exception_mapping_utils.py, the status-code-aware exception mapping for OpenAI-like providers only runs when custom_llm_provider in litellm._openai_like_providers (line ~857). Providers loaded from providers.json are not added to this list, so their exceptions fall through to the catch-all handler at line ~2406, which wraps everything as APIConnectionError regardless of HTTP status code.

2. Cooldown handler blanket-ignores APIConnectionError

In router_utils/cooldown_handlers.py, _is_cooldown_required() returns False for any exception string containing "APIConnectionError":

ignored_strings = ["APIConnectionError"]
if exception_str is not None:
    for ignored_string in ignored_strings:
        if ignored_string in exception_str:
            return False  # ← never cools down, even for wrapped 429s
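To see why this check swallows rate limits, the snippet below replays only the substring test against the error string from this report. `cooldown_required_before_fix` is a hypothetical stand-in written for illustration, not the LiteLLM function itself.

```python
from typing import Optional

# Error string copied from the report (placeholders left as-is).
WRAPPED_429 = (
    "litellm.APIConnectionError: <Provider>Exception - Error code: 429 - "
    "{'error': {'message': 'Rate limit exceeded. Retry in 6s.', 'type': 'rate_limit'}}"
)


def cooldown_required_before_fix(exception_str: Optional[str]) -> bool:
    """Hypothetical stand-in that mirrors only the ignored-strings check."""
    ignored_strings = ["APIConnectionError"]
    if exception_str is not None:
        for ignored_string in ignored_strings:
            if ignored_string in exception_str:
                # The 429 inside the message is never inspected.
                return False
    return True


print(cooldown_required_before_fix(WRAPPED_429))  # prints False
```

The class name alone decides the outcome, so the status code embedded in the message has no effect on cooldown.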

Fix Action

Proposed fix

Addressed by PR: fix(router): cooldown 429 errors wrapped as APIConnectionError (https://github.com/BerriAI/litellm/pull/24367). The PR notes list it as open and not merged as of the fetch date.

PR fix notes

PR #24367: fix(router): cooldown 429 errors wrapped as APIConnectionError

Repository: BerriAI/litellm
Author: lazmo88
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24367

Description (problem / solution / changelog)

Summary

Fixes #24366

OpenAI-like providers registered via providers.json have their HTTP 429 responses incorrectly mapped to APIConnectionError by the catch-all exception handler (because they are not added to litellm._openai_like_providers). The cooldown handler then blanket-ignores the error, preventing the router from cooling down the failing deployment and routing to healthy alternatives in the same model group.

Changes

litellm/router_utils/cooldown_handlers.py

Added _is_rate_limit_error() helper that detects 429/rate-limit indicators in exception strings and status codes
Modified _is_cooldown_required() to call this helper before skipping cooldown for APIConnectionError — if the wrapped error is actually a rate limit, cooldown proceeds normally
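The PR's helper is not shown on this page, so the following is only a sketch of what such a check could look like. The function names, the regular expression, and the `status_code` parameter are assumptions for illustration and may differ from the actual `_is_rate_limit_error()` in the PR.

```python
import re
from typing import Optional

# Assumed indicators; the PR's real list is not shown in this report.
_RATE_LIMIT_PATTERN = re.compile(
    r"error code: 429|rate limit|rate_limit", re.IGNORECASE
)


def is_rate_limit_error(
    exception_str: Optional[str], status_code: Optional[int] = None
) -> bool:
    """Return True when the status code or message points to a rate limit."""
    if status_code == 429:
        return True
    if not exception_str:
        return False
    return _RATE_LIMIT_PATTERN.search(exception_str) is not None


def is_cooldown_required(
    exception_str: Optional[str], status_code: Optional[int] = None
) -> bool:
    """Skip cooldown for APIConnectionError unless it wraps a rate limit."""
    if exception_str is not None and "APIConnectionError" in exception_str:
        return is_rate_limit_error(exception_str, status_code)
    return True
```

Matching on "Error code: 429" rather than a bare "429" is a deliberate choice in this sketch: a bare number could also appear in a port, an ID, or a token count inside an unrelated connection error.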

tests/router_unit_tests/test_router_cooldown_utils.py

Added test: 429 wrapped as APIConnectionError → _is_cooldown_required returns True
Added test: genuine APIConnectionError (connection refused) → still returns False

Root Cause

Two issues combine:

1. Providers from providers.json are not in _openai_like_providers, so their exceptions bypass the status-code-aware mapping in exception_mapping_utils.py and fall through to the catch-all handler, which wraps everything as APIConnectionError.
2. _is_cooldown_required() blanket-ignores APIConnectionError (lines 57-63), so rate-limited deployments never enter cooldown.

This PR fixes issue 2 (the safety net). Issue 1 (the root cause — proper provider registration) could be addressed separately.

Testing

pytest tests/router_unit_tests/test_router_cooldown_utils.py::test_is_cooldown_required_429_wrapped_as_apiconnectionerror -v  # PASS
pytest tests/router_unit_tests/test_router_cooldown_utils.py::test_is_cooldown_required_genuine_apiconnectionerror -v          # PASS

Changed files

litellm/router_utils/cooldown_handlers.py (modified, +33/-0)
tests/router_unit_tests/test_router_cooldown_utils.py (modified, +67/-0)



Bug Report

Environment

LiteLLM version: latest (main-latest Docker image)
Python: 3.13
Router strategy: usage-based-routing-v2

Description

Root Cause

Two issues combine to cause this:

1. Providers from providers.json are not added to litellm._openai_like_providers

2. Cooldown handler blanket-ignores APIConnectionError

In router_utils/cooldown_handlers.py, _is_cooldown_required() returns False for any exception string containing "APIConnectionError":

ignored_strings = ["APIConnectionError"]
if exception_str is not None:
    for ignored_string in ignored_strings:
        if ignored_string in exception_str:
            return False  # ← never cools down, even for wrapped 429s

Result

When a providers.json provider returns HTTP 429:

1. Exception mapping falls through to catch-all → APIConnectionError
2. Cooldown handler sees "APIConnectionError" in the string → skips cooldown
3. Router retries the same deployment (up to num_retries times), all fail with 429
4. Error surfaces to caller: Available Model Group Fallbacks=None
5. Other healthy deployments in the same model group are never tried
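The sequence above can be illustrated with a toy simulation. This is not LiteLLM's router: the selection rule (first deployment not in cooldown), the `simulate` function, and the deployment names are all assumptions chosen to keep the example small.

```python
from typing import Dict, List, Tuple


def simulate(
    deployments: Dict[str, bool], cooldown_enabled: bool, max_attempts: int = 4
) -> Tuple[List[str], bool]:
    """Toy retry loop.

    deployments maps a deployment id to True if it always answers 429.
    Returns the ids that were tried and whether any attempt succeeded.
    """
    cooled_down = set()
    attempts: List[str] = []
    for _ in range(max_attempts):
        candidates = [d for d in deployments if d not in cooled_down]
        if not candidates:
            break
        chosen = candidates[0]  # toy selection: first available deployment
        attempts.append(chosen)
        if not deployments[chosen]:
            return attempts, True
        if cooldown_enabled:
            cooled_down.add(chosen)
    return attempts, False


group = {"deploy-1": True, "deploy-2": False}  # deploy-1 is rate limited
print(simulate(group, cooldown_enabled=False))
print(simulate(group, cooldown_enabled=True))
```

Without cooldown, every attempt lands on the rate-limited deployment and the request fails. With cooldown, the second attempt reaches the healthy deployment.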

Reproduction

1. Register a provider in providers.json with two deployments under the same model_name
2. Hit rate limits on the first deployment (429 response)
3. Observe: second deployment never receives traffic; all retries go to the rate-limited deployment
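The reproduction setup can be sketched as a router model list. Everything named here is hypothetical: the provider prefix `my_provider`, the model ids, and the group name. Only the shape (two deployments sharing one model_name) comes from the steps above, and the commented Router call reuses the routing strategy and retry count mentioned in this report.

```python
# Two deployments in one model group, both served by a providers.json provider.
# "my_provider", "model-a", "model-b", and "shared-group" are hypothetical names.
model_list = [
    {"model_name": "shared-group", "litellm_params": {"model": "my_provider/model-a"}},
    {"model_name": "shared-group", "litellm_params": {"model": "my_provider/model-b"}},
]

# With LiteLLM installed, this list would be passed to the router, for example:
#   from litellm import Router
#   router = Router(model_list=model_list, num_retries=3,
#                   routing_strategy="usage-based-routing-v2")

# Sanity check: both deployments belong to the same model group.
groups = {entry["model_name"] for entry in model_list}
print(groups)
```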

Expected Behavior

429 from any provider should:

Be mapped to RateLimitError (not APIConnectionError)
Trigger deployment cooldown so the router picks a healthy alternative
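A minimal sketch of the first expectation, status-code-aware mapping, is shown below. The two exception classes are local stand-ins defined here for illustration, not the classes in litellm.exceptions, and `map_provider_error` is a hypothetical name.

```python
from typing import Optional


class APIConnectionError(Exception):
    """Local stand-in, not litellm.exceptions.APIConnectionError."""


class RateLimitError(Exception):
    """Local stand-in, not litellm.exceptions.RateLimitError."""


def map_provider_error(status_code: Optional[int], message: str) -> Exception:
    """Pick an exception type from the HTTP status before any catch-all."""
    if status_code == 429:
        return RateLimitError(message)
    # Only the 429 branch matters for this issue; everything else falls
    # through to the catch-all type to keep the sketch short.
    return APIConnectionError(message)


err = map_provider_error(429, "Rate limit exceeded. Retry in 6s.")
print(type(err).__name__)  # prints RateLimitError
```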

Suggested Fix

Option A (comprehensive): Register providers.json providers in _openai_like_providers so they enter the status-code-aware exception mapping path.

Option B (targeted): In _is_cooldown_required(), do not skip cooldown for APIConnectionError when the exception string also indicates a rate limit (e.g. contains "429" or "Rate limit"):

ignored_strings = ["APIConnectionError"]
if exception_str is not None:
    for ignored_string in ignored_strings:
        if ignored_string in exception_str:
            if "429" in exception_str or "Rate limit" in exception_str:
                pass  # still cooldown for rate limits
            else:
                return False

Option A is the proper fix; Option B is a safety net.

Logs

litellm.exceptions.APIConnectionError: litellm.APIConnectionError: <Provider>Exception - Error code: 429 - {'error': {'message': 'Rate limit exceeded. Retry in 6s.', 'type': 'rate_limit'}}. Received Model Group=<model-group>
Available Model Group Fallbacks=None LiteLLM Retried: 3 times, LiteLLM Max Retries: 3

extent analysis

Fix Plan

To fix the issue, we will implement Option A (comprehensive): Register providers.json providers in _openai_like_providers so they enter the status-code-aware exception mapping path.

Here are the concrete steps:

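When providers.json is loaded (the issue does not name the exact hook), add each provider's name to litellm._openai_like_providers, the list that exception_mapping_utils.py checks.
Only add providers that define both base_url and api_key_env, and skip names that are already in the list.

Example code:

# Sketch only: the right place to call this inside LiteLLM is an assumption.
import json

import litellm

def register_json_providers(path="providers.json"):
    """Add providers.json entries to litellm._openai_like_providers."""
    with open(path) as f:
        # Assumed layout: {"<provider-name>": {"base_url": ..., "api_key_env": ...}}
        providers = json.load(f)
    for name, config in providers.items():
        # Register the provider name (a string), not the whole config dict.
        if "base_url" in config and "api_key_env" in config:
            if name not in litellm._openai_like_providers:
                litellm._openai_like_providers.append(name)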

Verification

To verify that the fix worked:

Register a provider in providers.json with two deployments under the same model_name.
Hit rate limits on the first deployment (429 response).
Observe that the second deployment receives traffic after the first deployment cools down.

Extra Tips

Make sure to handle any potential errors when loading the providers.json file.
Consider adding logging to track when providers are added to litellm._openai_like_providers.
Review the exception_mapping_utils.py file to ensure that the status-code-aware exception mapping path is correctly handling rate limit errors.
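The first two tips can be sketched as a defensive loader. The function name, the logger name, and the assumed file layout (a JSON object keyed by provider name) are illustrative assumptions, not LiteLLM's actual loader. The function only returns names, so the caller decides whether to append them to litellm._openai_like_providers.

```python
import json
import logging
from typing import List

logger = logging.getLogger("providers_json_sketch")  # hypothetical logger name


def load_openai_like_provider_names(path: str) -> List[str]:
    """Return names of providers that define base_url and api_key_env.

    Assumes the file is a JSON object keyed by provider name.
    """
    try:
        with open(path, encoding="utf-8") as f:
            providers = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        # A missing or malformed file should not crash startup.
        logger.warning("Could not load %s: %s", path, exc)
        return []
    if not isinstance(providers, dict):
        logger.warning("Unexpected top-level type in %s", path)
        return []
    names: List[str] = []
    for name, config in providers.items():
        if isinstance(config, dict) and "base_url" in config and "api_key_env" in config:
            logger.info("Registering OpenAI-like provider %s", name)
            names.append(name)
    return names
```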
