litellm - ✅(Solved) Fix [Bug]: dynamic_rate_limiter_v3 do not trigger fallback [3 pull requests, 1 participants]

MaximeBOUDIER · 2026-03-16T14:13:28Z


GitHub stats

BerriAI/litellm#23749 • Fetched 2026-04-08 00:49:25

Author: MaximeBOUDIER
Participants: MaximeBOUDIER
Timeline (top): cross-referenced ×5, labeled ×3, referenced ×1

Root Cause

The issue is architectural. Looking at the code flow:

1. Pre-call hook executes first: litellm/proxy/common_request_processing.py#L681

2. Rate limiter raises 429 in pre_call_hook: litellm/proxy/hooks/dynamic_rate_limiter.py#L220-L231

3. Router with fallback logic is never reached (this line is never executed when pre_call_hook raises): litellm/proxy/common_request_processing.py#L859

4. Fallback logic lives in the router (never called because pre_call_hook already raised 429): litellm/router.py#L5271-L5320
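
The ordering problem in the four steps above can be reproduced in isolation. This is a minimal sketch, not LiteLLM code: `pre_call_hook`, `route_with_fallbacks`, `handle_request`, and `TooManyRequests` are hypothetical stand-ins for the proxy's pre-call hook, the router's fallback wrapper, the request handler, and the 429 error.

```python
import asyncio


class TooManyRequests(Exception):
    """Hypothetical stand-in for the 429 raised by the rate limiter."""


calls = []  # records which stages actually ran


async def pre_call_hook(saturated: bool) -> None:
    # Runs first, before any routing happens.
    calls.append("pre_call_hook")
    if saturated:
        raise TooManyRequests("429: rate limit exceeded")


async def route_with_fallbacks(model: str) -> str:
    # Stand-in for the router, where fallback handling would live.
    calls.append("router")
    return f"served by {model}"


async def handle_request(saturated: bool) -> str:
    await pre_call_hook(saturated)  # an exception here skips the next line
    return await route_with_fallbacks("primary")


def run(saturated: bool) -> str:
    calls.clear()
    try:
        return asyncio.run(handle_request(saturated))
    except TooManyRequests as exc:
        return str(exc)
```

Calling `run(True)` leaves only `"pre_call_hook"` in `calls`: the router stand-in never executes, so no fallback can happen there.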

Fix Action

Fixed

Fixed by PR: fix(dynamic_rate_limiter_v3): raise litellm.RateLimitError to trigger fallback (https://github.com/BerriAI/litellm/pull/23833)
Fixed by PR: fix(proxy): preserve pre-call 429 behavior without fallbacks (https://github.com/BerriAI/litellm/pull/23838)
Fixed by PR: fix(proxy): preserve pre-call 429 behavior without fallbacks (https://github.com/BerriAI/litellm/pull/23839)

PR fix notes

PR #23833: fix(dynamic_rate_limiter_v3): raise litellm.RateLimitError to trigger fallback

Repository: BerriAI/litellm
Author: BillionClaw
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/23833

Description (problem / solution / changelog)

Problem

The dynamic_rate_limiter_v3 raised HTTPException(429) when rate limits were exceeded. This bypassed the router's fallback logic because the router only catches litellm.RateLimitError for fallback handling.

Root Cause

The async_pre_call_hook in dynamic_rate_limiter_v3 raised HTTPException(429) when rate limits were exceeded. However, the router's fallback logic in async_function_with_fallbacks_common_utils only handles specific exception types like litellm.RateLimitError.

Solution

Changed the exception type from HTTPException to litellm.RateLimitError in:

Model-wide capacity limit exceeded
Priority-based rate limit exceeded (when saturated)
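
The exception-type mismatch this PR describes can be illustrated with a toy fallback wrapper. This is a sketch under stated assumptions: `HTTPError` and `RateLimitError` are simplified stand-ins for `HTTPException` and `litellm.RateLimitError`, and `call_with_fallback` is a hypothetical wrapper that, like the router logic the PR describes, reacts only to the rate-limit error type. It does not model where in the proxy the hook runs.

```python
class HTTPError(Exception):
    """Simplified stand-in for HTTPException (hypothetical)."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.status_code = status_code


class RateLimitError(Exception):
    """Simplified stand-in for litellm.RateLimitError (hypothetical)."""


def call_with_fallback(primary, fallback):
    """Toy fallback wrapper: only RateLimitError triggers the fallback."""
    try:
        return primary()
    except RateLimitError:
        return fallback()


def limiter_before_fix():
    # Before the change: a generic HTTP 429 that the wrapper does not catch.
    raise HTTPError(429)


def limiter_after_fix():
    # After the change: the error type the wrapper knows how to handle.
    raise RateLimitError("rate limit exceeded")


def fallback_model():
    return "fallback-ok"
```

With these stand-ins, `call_with_fallback(limiter_after_fix, fallback_model)` returns the fallback result, while `limiter_before_fix` lets the `HTTPError` propagate to the caller.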

Changes

Modified litellm/proxy/hooks/dynamic_rate_limiter_v3.py to raise litellm.RateLimitError instead of HTTPException
Added tests to verify the fix works correctly

Fixes #23749

Changed files

litellm/proxy/hooks/dynamic_rate_limiter_v3.py (modified, +17/-32)
tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3_fallback.py (added, +248/-0)

PR #23838: fix(proxy): preserve pre-call 429 behavior without fallbacks

Repository: BerriAI/litellm
Author: Sameerlite
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/23838

Description (problem / solution / changelog)

Fixes #23749

Summary

Make dynamic_rate_limiter_v3 fallback-aware: if fallback routes exist, set _litellm_rate_limit_error for router handling; if no fallback exists, keep fail-fast behavior by raising RateLimitError from pre-call.
Keep router-side handling for _litellm_rate_limit_error so configured fallbacks can redirect rate-limited calls.
Add test coverage in tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py for fallback-marker behavior with configured fallbacks.
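
The fallback-aware behavior summarized above might look roughly like the sketch below. Only the marker name `_litellm_rate_limit_error` comes from the PR summary; `pre_call`, `route`, the boolean marker value, and the shape of the `fallbacks` mapping are assumptions made for illustration, and `RateLimitError` is a simplified stand-in for `litellm.RateLimitError`.

```python
from typing import Any, Dict, List


class RateLimitError(Exception):
    """Simplified stand-in for litellm.RateLimitError (hypothetical)."""


MARKER = "_litellm_rate_limit_error"  # key name taken from the PR summary


def pre_call(
    data: Dict[str, Any], saturated: bool, fallbacks: Dict[str, List[str]]
) -> Dict[str, Any]:
    """Hypothetical fallback-aware pre-call check."""
    if not saturated:
        return data
    if fallbacks.get(data["model"]):
        # A fallback exists: mark the request and let the router decide.
        data[MARKER] = True  # assumed value; the real payload is not shown
        return data
    # No fallback configured: keep fail-fast behavior.
    raise RateLimitError(f"rate limit exceeded for {data['model']}")


def route(data: Dict[str, Any], fallbacks: Dict[str, List[str]]) -> str:
    """Hypothetical router step: honor the marker by using the first fallback."""
    if data.pop(MARKER, False):
        return fallbacks[data["model"]][0]
    return data["model"]
```

The design choice here is that the hook never decides the destination itself; it only records that the request was rate-limited, and fails fast when no alternative route exists.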

Before (screenshot): https://github.com/user-attachments/assets/c0fb4e53-a577-4376-a7c8-57a3fee63917

Now (screenshot): https://github.com/user-attachments/assets/0a7d26e6-7d60-4000-9dad-d1b5ad6303c3

Changed files

litellm/proxy/hooks/dynamic_rate_limiter_v3.py (modified, +102/-46)
litellm/router.py (modified, +9/-0)
tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py (modified, +234/-0)

PR #23839: fix(proxy): preserve pre-call 429 behavior without fallbacks

Repository: BerriAI/litellm
Author: Sameerlite
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/23839

Description (problem / solution / changelog)

Fixes #23749

Summary

Make dynamic_rate_limiter_v3 fallback-aware: if fallback routes exist, set _litellm_rate_limit_error for router handling; if no fallback exists, keep fail-fast behavior by raising RateLimitError from pre-call.
Keep router-side handling for _litellm_rate_limit_error so configured fallbacks can redirect rate-limited calls.
Add test coverage in tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py for fallback-marker behavior with configured fallbacks.

Config:

model_list:
  - model_name: ptu-primary
    litellm_params:
      model: gpt-3.5-turbo
      mock_response: "primary-ok"
      rpm: 1   # tiny capacity so we saturate immediately

  - model_name: paygo-fallback
    litellm_params:
      model: gpt-3.5-turbo
      mock_response: "fallback-ok"

litellm_settings:
  callbacks: ["dynamic_rate_limiter_v3"]
  priority_reservation:
    high: 0.7
    medium: 0.3
    low: 0.0
  fallbacks:
    - ptu-primary: ["paygo-fallback"]

Before (screenshot): https://github.com/user-attachments/assets/c0fb4e53-a577-4376-a7c8-57a3fee63917

Now (screenshot): https://github.com/user-attachments/assets/0a7d26e6-7d60-4000-9dad-d1b5ad6303c3

Changed files

.gitignore (modified, +2/-0)
litellm/proxy/hooks/dynamic_rate_limiter_v3.py (modified, +100/-23)
litellm/router.py (modified, +21/-0)
tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py (modified, +380/-0)



Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using dynamic_rate_limiter_v3 for priority-based rate limiting alongside configured fallbacks, rate-limited requests return 429 to the client instead of falling back to the configured fallback model. This is because the rate limiter raises HTTPException(429) in async_pre_call_hook, which executes before the router's fallback logic.

My goal is to:

Reserve capacity for high-priority requests (key tagged "priority":"high")
Automatically redirect low-priority requests to Fallback when model TPM is saturated.

This example setup is designed for a PTU reservation scenario with a Pay-As-You-Go fallback on Azure OpenAI or Vertex AI.

Current Setup

Models:

model_list:
  - model_name: gpt-4.1-ptu-20250414
    litellm_params:
      model: azure/gpt-4.1-ptu
      api_base: ...
      tpm: 40000

  - model_name: gpt-4.1-20250414
    litellm_params:
      model: azure/gpt-4.1
      api_base: ...

LiteLLM Settings:

litellm_settings:
  callbacks: ["dynamic_rate_limiter_v3", "prometheus"]
  priority_reservation:
    high: 0.7
    medium: 0.3
    low: 0.0
  fallbacks:
    - gpt-4.1-ptu-20250414: [gpt-4.1-20250414]
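
As a worked example of what this configuration implies, the sketch below computes per-priority budgets. It assumes each `priority_reservation` value is the fraction of the model's TPM reserved for that priority; this is an interpretation of the setting, not something the issue confirms, and `reserved_tpm` is a hypothetical helper.

```python
def reserved_tpm(tpm: int, reservation: dict) -> dict:
    """Split a model's TPM across priorities (assumed: share = fraction of TPM)."""
    return {priority: round(tpm * share) for priority, share in reservation.items()}


shares = reserved_tpm(40000, {"high": 0.7, "medium": 0.3, "low": 0.0})
print(shares)  # {'high': 28000, 'medium': 12000, 'low': 0}
```

Under that assumption, low-priority keys have no reserved share of the 40000 TPM, which is why they are the first to be rejected once the PTU deployment saturates.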

Current Behavior

High-priority request → Served by PTU Model ✅
Low-priority request when PTU saturated → Returns 429 ❌

Expected Behavior

High-priority request → Served by PTU model ✅
Low-priority request when PTU saturated → Falls back to PayGo model ✅

Root Cause Analysis

The issue is architectural. Looking at the code flow:

1. Pre-call hook executes first: litellm/proxy/common_request_processing.py#L681

2. Rate limiter raises 429 in pre_call_hook: litellm/proxy/hooks/dynamic_rate_limiter.py#L220-L231

3. Router with fallback logic is never reached (this line is never executed when pre_call_hook raises): litellm/proxy/common_request_processing.py#L859

4. Fallback logic lives in the router (never called because pre_call_hook already raised 429): litellm/router.py#L5271-L5320

Steps to Reproduce

Deploy a model and configure a fallback for it.
Use two API keys with different priority levels.
Send a burst of requests to the model until it reaches its maximum TPM.
Once the maximum TPM is reached, calls are not redirected to the fallback.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.12

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

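To fix the issue, dynamic_rate_limiter_v3 must stop returning a bare HTTP 429 from the pre-call hook when a low-priority request exceeds the rate limit, because that response ends the request before the router's fallback logic runs. The rate-limited request has to reach that fallback logic instead.

Here are the steps proposed in this plan:

Modify dynamic_rate_limiter_v3 to check the priority of the request and raise a 429 only for high-priority requests when the rate limit is exceeded.
For low-priority requests, do not raise from the hook, so the request continues to the router. The router falls back only when it sees an error it handles, so the request still has to be signalled as rate-limited: PR #23833 does this by raising litellm.RateLimitError, and PR #23838 and PR #23839 by setting _litellm_rate_limit_error for the router.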

Example code changes:

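# litellm/proxy/hooks/dynamic_rate_limiter.py
# Illustrative fragment: `request.priority` and `exceeded_limit` are placeholder
# names, not the hook's real signature or variables.
from fastapi import HTTPException

async def async_pre_call_hook(self, request):
    # ... (existing code that computes `exceeded_limit`)

    if exceeded_limit and request.priority == "high":
        raise HTTPException(status_code=429, detail="Rate limit exceeded")
    # For low-priority requests, do not raise an exception
    elif exceeded_limit and request.priority == "low":
        # Let the request continue to the router
        pass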

Additionally, common_request_processing.py needs to cover the case where the rate limiter still raises a 429 for a low-priority request, by handing the request to the router's fallback logic instead of returning the error.

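# litellm/proxy/common_request_processing.py
# Illustrative pseudocode: `self.rate_limiter`, `self.router.handle_request`, and
# `request.priority` are placeholder names, not LiteLLM APIs.
from fastapi import HTTPException

async def async_pre_call_hook(self, request):
    # ... (existing code)

    try:
        # Call the rate limiter
        await self.rate_limiter.async_pre_call_hook(request)
    except HTTPException as e:
        if e.status_code == 429 and request.priority == "low":
            # Proceed to the router's fallback logic
            return await self.router.handle_request(request)
        # Re-raise every other error so it is not silently swallowed
        raise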

Verification

To verify that the fix worked, you can test the following scenarios:

Send a high-priority request when the rate limit is exceeded and verify that a 429 exception is raised.
Send a low-priority request when the rate limit is exceeded and verify that the request is redirected to the fallback model.

Extra Tips

Make sure to test the changes thoroughly to ensure that the rate limiter is working correctly for both high-priority and low-priority requests.
Consider adding additional logging to monitor the rate limiter's behavior and ensure that it is working as expected.
