litellm - ✅(Solved) Fix [Bug]: dynamic_rate_limiter_v3 do not trigger fallback [3 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#23749Fetched 2026-04-08 00:49:25
View on GitHub
Comments
0
Participants
1
Timeline
9
Reactions
0
Participants
Timeline (top)
cross-referenced ×5labeled ×3referenced ×1

Root Cause

The issue is architectural. Looking at the code flow:

1. Pre-call hook executes first litellm/proxy/common_request_processing.py#L681

2. Rate limiter raises 429 in pre_call_hook litellm/proxy/hooks/dynamic_rate_limiter.py#L220-L231

3. Router with fallback logic is never reached (this line is never executed when pre_call_hook raises) litellm/proxy/common_request_processing.py#L859

4. Fallback logic lives in the router (never called because pre_call_hook already raised 429) litellm/router.py#L5271-L5320

Fix Action

Fixed

PR fix notes

PR #23833: fix(dynamic_rate_limiter_v3): raise litellm.RateLimitError to trigger fallback

Description (problem / solution / changelog)

Problem

The dynamic_rate_limiter_v3 raised HTTPException(429) when rate limits were exceeded. This bypassed the router's fallback logic because the router only catches litellm.RateLimitError for fallback handling.

Root Cause

The async_pre_call_hook in dynamic_rate_limiter_v3 raised HTTPException(429) when rate limits were exceeded. However, the router's fallback logic in async_function_with_fallbacks_common_utils only handles specific exception types like litellm.RateLimitError.

Solution

Changed the exception type from HTTPException to litellm.RateLimitError in:

  • Model-wide capacity limit exceeded
  • Priority-based rate limit exceeded (when saturated)

Changes

  • Modified litellm/proxy/hooks/dynamic_rate_limiter_v3.py to raise litellm.RateLimitError instead of HTTPException
  • Added tests to verify the fix works correctly

Fixes #23749

Changed files

  • litellm/proxy/hooks/dynamic_rate_limiter_v3.py (modified, +17/-32)
  • tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3_fallback.py (added, +248/-0)

PR #23838: fix(proxy): preserve pre-call 429 behavior without fallbacks

Description (problem / solution / changelog)

Fixes #23749

Summary

  • Make dynamic_rate_limiter_v3 fallback-aware: if fallback routes exist, set _litellm_rate_limit_error for router handling; if no fallback exists, keep fail-fast behavior by raising RateLimitError from pre-call.
  • Keep router-side handling for _litellm_rate_limit_error so configured fallbacks can redirect rate-limited calls.
  • Add test coverage in tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py for fallback-marker behavior with configured fallbacks.

Before <img width="1101" height="237" alt="image" src="https://github.com/user-attachments/assets/c0fb4e53-a577-4376-a7c8-57a3fee63917" />

Now <img width="1101" height="452" alt="image" src="https://github.com/user-attachments/assets/0a7d26e6-7d60-4000-9dad-d1b5ad6303c3" />

Changed files

  • litellm/proxy/hooks/dynamic_rate_limiter_v3.py (modified, +102/-46)
  • litellm/router.py (modified, +9/-0)
  • tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py (modified, +234/-0)

PR #23839: fix(proxy): preserve pre-call 429 behavior without fallbacks

Description (problem / solution / changelog)

Fixes #23749

Summary

  • Make dynamic_rate_limiter_v3 fallback-aware: if fallback routes exist, set _litellm_rate_limit_error for router handling; if no fallback exists, keep fail-fast behavior by raising RateLimitError from pre-call.
  • Keep router-side handling for _litellm_rate_limit_error so configured fallbacks can redirect rate-limited calls.
  • Add test coverage in tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py for fallback-marker behavior with configured fallbacks. Config:
model_list:
  - model_name: ptu-primary
    litellm_params:
      model: gpt-3.5-turbo
      mock_response: "primary-ok"
      rpm: 1   # tiny capacity so we saturate immediately

  - model_name: paygo-fallback
    litellm_params:
      model: gpt-3.5-turbo
      mock_response: "fallback-ok"

litellm_settings:
  callbacks: ["dynamic_rate_limiter_v3"]
  priority_reservation:
    high: 0.7
    medium: 0.3
    low: 0.0
  fallbacks:
    - ptu-primary: ["paygo-fallback"]

Before <img width="1101" height="237" alt="image" src="https://github.com/user-attachments/assets/c0fb4e53-a577-4376-a7c8-57a3fee63917" />

Now <img width="1101" height="452" alt="image" src="https://github.com/user-attachments/assets/0a7d26e6-7d60-4000-9dad-d1b5ad6303c3" />

Changed files

  • .gitignore (modified, +2/-0)
  • litellm/proxy/hooks/dynamic_rate_limiter_v3.py (modified, +100/-23)
  • litellm/router.py (modified, +21/-0)
  • tests/test_litellm/proxy/hooks/test_dynamic_rate_limiter_v3.py (modified, +380/-0)

Code Example

model_list:
  - model_name: gpt-4.1-ptu-20250414
    litellm_params:
      model: azure/gpt-4.1-ptu
      api_base: ...
      tpm: 40000

  - model_name: gpt-4.1-20250414
    litellm_params:
      model: azure/gpt-4.1
      api_base: ...

---

litellm_settings:
  callbacks: ["dynamic_rate_limiter_v3", "prometheus"]
  priority_reservation:
    high: 0.7
    medium: 0.3
    low: 0.0
  fallbacks:
    - gpt-4.1-ptu-20250414: [gpt-4.1-20250414]

---
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using dynamic_rate_limiter_v3 for priority-based rate limiting alongside configured fallbacks, rate-limited requests return 429 to the client instead of falling back to the configured fallback model. This is because the rate limiter raises HTTPException(429) in async_pre_call_hook, which executes before the router's fallback logic.

My goal is to:

  • Reserve capacity for high-priority requests (key tagged "priority":"high")
  • Automatically redirect low-priority requests to Fallback when model TPM is saturated.

This example setup is designed for a PTU reservation scenario with Pay-As-You-Go fallback on Azure OpenAI / vertex ai

Current Setup

Models:

model_list:
  - model_name: gpt-4.1-ptu-20250414
    litellm_params:
      model: azure/gpt-4.1-ptu
      api_base: ...
      tpm: 40000

  - model_name: gpt-4.1-20250414
    litellm_params:
      model: azure/gpt-4.1
      api_base: ...

LiteLLM Settings:

litellm_settings:
  callbacks: ["dynamic_rate_limiter_v3", "prometheus"]
  priority_reservation:
    high: 0.7
    medium: 0.3
    low: 0.0
  fallbacks:
    - gpt-4.1-ptu-20250414: [gpt-4.1-20250414]

Current Behavior

  1. High-priority request → Served by PTU Model ✅
  2. Low-priority request when PTU saturated → Returns 429

Expected Behavior

  1. High-priority request → Served by PTU model ✅
  2. Low-priority request when PTU saturated → Falls back to PayGo model

Root Cause Analysis

The issue is architectural. Looking at the code flow:

1. Pre-call hook executes first litellm/proxy/common_request_processing.py#L681

2. Rate limiter raises 429 in pre_call_hook litellm/proxy/hooks/dynamic_rate_limiter.py#L220-L231

3. Router with fallback logic is never reached (this line is never executed when pre_call_hook raises) litellm/proxy/common_request_processing.py#L859

4. Fallback logic lives in the router (never called because pre_call_hook already raised 429) litellm/router.py#L5271-L5320

Steps to Reproduce

  1. Deploy a model and configure a fallback option.

  2. Use two API keys with different priority levels.
  3. Bust the model to reach max TPM
  4. When the maximum TPM is reached, calls are not redirected to the fallback option.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.12

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To fix the issue, we need to modify the dynamic_rate_limiter_v3 to not raise a 429 exception when the rate limit is exceeded for low-priority requests. Instead, it should allow the request to proceed to the router's fallback logic.

Here are the steps to fix the issue:

  • Modify the dynamic_rate_limiter_v3 to check the priority of the request and only raise a 429 exception for high-priority requests when the rate limit is exceeded.
  • For low-priority requests, allow the request to proceed to the router's fallback logic by not raising an exception.

Example code changes:

# litellm/proxy/hooks/dynamic_rate_limiter.py

def async_pre_call_hook(self, request):
    # ... (existing code)

    if exceeded_limit and request.priority == "high":
        raise HTTPException(429)
    # For low-priority requests, do not raise an exception
    elif exceeded_limit and request.priority == "low":
        # Allow the request to proceed to the router's fallback logic
        pass

Additionally, we need to modify the common_request_processing.py to handle the case where the rate limiter does not raise an exception for low-priority requests.

# litellm/proxy/common_request_processing.py

def async_pre_call_hook(self, request):
    # ... (existing code)

    try:
        # Call the rate limiter
        await self.rate_limiter.async_pre_call_hook(request)
    except HTTPException as e:
        # Handle the exception
        if e.status_code == 429 and request.priority == "low":
            # Proceed to the router's fallback logic
            return await self.router.handle_request(request)

Verification

To verify that the fix worked, you can test the following scenarios:

  • Send a high-priority request when the rate limit is exceeded and verify that a 429 exception is raised.
  • Send a low-priority request when the rate limit is exceeded and verify that the request is redirected to the fallback model.

Extra Tips

  • Make sure to test the changes thoroughly to ensure that the rate limiter is working correctly for both high-priority and low-priority requests.
  • Consider adding additional logging to monitor the rate limiter's behavior and ensure that it is working as expected.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING