litellm - ✅ (Solved) Fix [Bug]: No Retry-After header on RouterRateLimitError (all deployments in cooldown) [2 pull requests, 1 comment, 2 participants]

Q: Expected behavior

```
HTTP/1.1 429 Too Many Requests
retry-after: 60
content-type: application/json
...

{"error": {"message": "No deployments available for selected model, Try again in 60 seconds. ...", ...}}
```

litellm · 2026-05-13 07:44:54

GitHub stats

BerriAI/litellm#27823 • Fetched 2026-05-14 03:30:26

Author

b4lduin

Participants

b4lduin

hhhfs9s7y9-code

Timeline (top)

cross-referenced ×2, commented ×1, labeled ×1

Error Message

The RouterRateLimitError already carries self.cooldown_time as a float attribute. This value should be exposed as a standard Retry-After HTTP header on the 429 response so that clients can respect it without parsing error message strings.

Current response body:

{"error": {"message": "No deployments available for selected model, Try again in 60 seconds. ...", "type": "None", "param": "None", "code": "429"}}

The response carries no retry-after header, so the timing is only available by parsing the error message body.

Fix Action

Fix

In _handle_llm_api_exception in litellm/proxy/common_request_processing.py, after headers are assembled and before the raise ProxyException(...) branches, check if the exception is a RouterRateLimitError and add the header:

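from litellm.types.router import RouterRateLimitError

# ...

# Router-level cooldown: LiteLLM itself knows how long the client should wait.
if isinstance(e, RouterRateLimitError):
    cooldown_time = getattr(e, "cooldown_time", None)
    if cooldown_time is not None:
        # Retry-After carries whole seconds; int() truncates any fraction.
        headers["retry-after"] = str(int(cooldown_time))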

This is distinct from:

#21553 / PR #21648 — forwarding upstream provider Retry-After header (gap #1)
#26070 — exposing retry_after attribute on litellm.RateLimitError from provider messages

This issue covers the case where LiteLLM itself is the rate limiter (router-level cooldown), not the upstream provider.
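One detail worth weighing in the snippet above: int() truncates, so a fractional cooldown such as 0.5 would be advertised as retry-after: 0. The following is a minimal standalone sketch, not the code from either PR, that rounds up instead. RouterRateLimitErrorStub and add_retry_after are hypothetical names standing in for the real exception class and the header-assembly step, so the example runs without LiteLLM installed.

```python
import math


class RouterRateLimitErrorStub(Exception):
    """Hypothetical stand-in for litellm's RouterRateLimitError."""

    def __init__(self, cooldown_time: float):
        super().__init__(f"Try again in {cooldown_time} seconds.")
        self.cooldown_time = cooldown_time


def add_retry_after(headers: dict, exc: Exception) -> dict:
    """Return a copy of headers, adding retry-after for router cooldown errors."""
    result = dict(headers)
    if isinstance(exc, RouterRateLimitErrorStub):
        cooldown_time = getattr(exc, "cooldown_time", None)
        if cooldown_time is not None:
            # Retry-After (delay-seconds form) is a non-negative whole number;
            # rounding up avoids telling clients to retry before the cooldown ends.
            result["retry-after"] = str(max(0, math.ceil(cooldown_time)))
    return result
```

Whether rounding up is preferable to truncation is a judgment call; the PR's own tests only cover whole-number cooldowns (60 and 0), for which both approaches give the same header.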

PR fix notes

PR #27825: fix(proxy): add Retry-After header on RouterRateLimitError

Repository: BerriAI/litellm
Author: b4lduin
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/27825

Description (problem / solution / changelog)

Problem

When all deployments for a model are in cooldown (e.g. after upstream 429 rate limits), LiteLLM raises RouterRateLimitError with a human-readable message like:

No deployments available for selected model, Try again in 120 seconds.

However, no Retry-After HTTP header is set on the response. This means downstream clients (OpenAI SDK, custom agents, API gateways) cannot programmatically determine when to retry — they must parse the error message string, which is fragile and non-standard.
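To make the fragility concrete, here is a rough client-side sketch of what callers have to do without the header. It assumes the message wording quoted above ("Try again in N seconds") and only handles the delay-seconds form of Retry-After, not the HTTP-date form; retry_delay_seconds is a hypothetical helper, not part of any SDK.

```python
import re
from typing import Mapping, Optional

# Matches the wording quoted in this issue; it breaks if the message changes.
_MESSAGE_PATTERN = re.compile(r"Try again in (\d+(?:\.\d+)?) seconds")


def retry_delay_seconds(headers: Mapping[str, str], body_message: str) -> Optional[float]:
    """Prefer the Retry-After header; fall back to scraping the error message."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get("retry-after", "").strip()
    if value.isdigit():
        return float(value)
    # Fallback: the fragile path this issue wants to make unnecessary.
    match = _MESSAGE_PATTERN.search(body_message)
    return float(match.group(1)) if match else None
```

With the header present, the regular expression is never consulted, which is the behaviour the fix is meant to enable.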

Fix

The cooldown_time is already available as a float attribute on RouterRateLimitError. This PR promotes it to a standard Retry-After HTTP header in _handle_llm_api_exception(), so clients get:

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Changes

litellm/proxy/common_request_processing.py: After headers.update(custom_headers), check if the exception is a RouterRateLimitError and add retry-after header from e.cooldown_time.
tests/test_litellm/proxy/test_common_request_processing.py: New test class TestHandleLLMApiExceptionRetryAfterHeader with 3 tests:
- RouterRateLimitError with cooldown_time=60 → header "60"
- RouterRateLimitError with cooldown_time=0 → header "0"
- Non-rate-limit error → no retry-after header

Testing

pytest tests/test_litellm/proxy/test_common_request_processing.py -k "TestHandleLLMApiExceptionRetryAfterHeader" -v

Fixes #27823

Changed files

litellm/llms/gemini/google_genai/transformation.py (modified, +53/-0)
litellm/llms/vertex_ai/google_genai/transformation.py (modified, +3/-0)
litellm/proxy/common_request_processing.py (modified, +7/-0)
tests/test_litellm/google_genai/test_google_genai_transformation.py (modified, +227/-0)
tests/test_litellm/proxy/test_common_request_processing.py (modified, +57/-0)

PR #27826: fix(proxy): add Retry-After header on RouterRateLimitError

Repository: BerriAI/litellm
Author: b4lduin
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/27826

Description (problem / solution / changelog)

Problem

When all deployments for a model are in cooldown (e.g. after upstream 429 rate limits), LiteLLM raises RouterRateLimitError with a human-readable message like:

No deployments available for selected model, Try again in 120 seconds.

Fix

The cooldown_time is already available as a float attribute on RouterRateLimitError. This PR promotes it to a standard Retry-After HTTP header in _handle_llm_api_exception(), so clients get:

HTTP/1.1 429 Too Many Requests
Retry-After: 60

Changes

litellm/proxy/common_request_processing.py: After headers.update(custom_headers), check if the exception is a RouterRateLimitError and add retry-after header from e.cooldown_time.
tests/test_litellm/proxy/test_common_request_processing.py: New test class TestHandleLLMApiExceptionRetryAfterHeader with 3 tests:
- RouterRateLimitError with cooldown_time=60 → header "60"
- RouterRateLimitError with cooldown_time=0 → header "0"
- Non-rate-limit error → no retry-after header

Testing

pytest tests/test_litellm/proxy/test_common_request_processing.py -k "TestHandleLLMApiExceptionRetryAfterHeader" -v

Fixes #27823

Note: The previous PR #27825 targeted main directly. It was closed, and this PR retargets litellm_oss_staging per the repo contribution policy.

Changed files

litellm/proxy/common_request_processing.py (modified, +7/-0)
tests/test_litellm/proxy/test_common_request_processing.py (modified, +57/-0)

Code Example

Current response (no retry-after header):

HTTP/1.1 429 Too Many Requests
content-type: application/json
x-litellm-call-id: ...
x-litellm-version: ...

{"error": {"message": "No deployments available for selected model, Try again in 60 seconds. ...", "type": "None", "param": "None", "code": "429"}}

---

Expected response:

HTTP/1.1 429 Too Many Requests
retry-after: 60
content-type: application/json
...

{"error": {"message": "No deployments available for selected model, Try again in 60 seconds. ...", ...}}

---

Proposed fix:

from litellm.types.router import RouterRateLimitError

# ...

if isinstance(e, RouterRateLimitError):
    cooldown_time = getattr(e, "cooldown_time", None)
    if cooldown_time is not None:
        headers["retry-after"] = str(int(cooldown_time))

Original issue text

What happened?

When all deployments for a model are in cooldown (e.g. after upstream 429s), LiteLLM raises a RouterRateLimitError with the message "No deployments available for selected model, Try again in X seconds". However, no Retry-After HTTP header is included in the response, so downstream clients (OpenAI SDK, custom clients, gateway agents) cannot programmatically determine when to retry.

What should happen?

Current behavior

HTTP/1.1 429 Too Many Requests
content-type: application/json
x-litellm-call-id: ...
x-litellm-version: ...

{"error": {"message": "No deployments available for selected model, Try again in 60 seconds. ...", "type": "None", "param": "None", "code": "429"}}

No retry-after header. The timing is only available by parsing the error message body.
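Without a usable header, a generic client typically falls back to guessing. The sketch below shows one common approach under stated assumptions: honour Retry-After when it is a whole number of seconds, otherwise use exponential backoff with full jitter. next_delay and its base and cap defaults are illustrative choices, not values taken from LiteLLM or any SDK.

```python
import random
from typing import Mapping, Optional


def next_delay(status: int, headers: Mapping[str, str], attempt: int,
               base: float = 1.0, cap: float = 60.0) -> Optional[float]:
    """Seconds to wait before the next attempt, or None if not retryable."""
    if status != 429:
        return None
    retry_after = {k.lower(): v for k, v in headers.items()}.get("retry-after")
    if retry_after is not None and retry_after.strip().isdigit():
        # The server stated exactly how long to wait.
        return float(retry_after)
    # No usable header: exponential backoff with full jitter (a guess).
    return random.uniform(0, min(cap, base * 2 ** attempt))
```

In the current behavior, every 429 from a router cooldown takes the backoff branch, so clients may retry well before or well after the deployments become available again.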

Expected behavior

HTTP/1.1 429 Too Many Requests
retry-after: 60
content-type: application/json
...

{"error": {"message": "No deployments available for selected model, Try again in 60 seconds. ...", ...}}

Fix

from litellm.types.router import RouterRateLimitError

# ...

if isinstance(e, RouterRateLimitError):
    cooldown_time = getattr(e, "cooldown_time", None)
    if cooldown_time is not None:
        headers["retry-after"] = str(int(cooldown_time))

This is distinct from:

#21553 / PR #21648 — forwarding upstream provider Retry-After header (gap #1)
#26070 — exposing retry_after attribute on litellm.RateLimitError from provider messages

This issue covers the case where LiteLLM itself is the rate limiter (router-level cooldown), not the upstream provider.
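As a way to picture why the proxy itself, rather than the provider, knows the wait time here, consider the following toy model of per-deployment cooldowns. It is only an illustration: CooldownTracker and its methods are hypothetical and do not reflect how LiteLLM's router is actually implemented.

```python
import time
from typing import Dict, Iterable, Optional


class CooldownTracker:
    """Toy model of per-deployment cooldowns; not LiteLLM's router."""

    def __init__(self) -> None:
        self._cooldown_until: Dict[str, float] = {}

    def put_in_cooldown(self, deployment_id: str, seconds: float,
                        now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self._cooldown_until[deployment_id] = now + seconds

    def seconds_until_available(self, deployment_ids: Iterable[str],
                                now: Optional[float] = None) -> float:
        """0.0 if any deployment is usable, else the shortest remaining cooldown."""
        now = time.monotonic() if now is None else now
        remaining = [
            max(0.0, self._cooldown_until.get(dep, 0.0) - now)
            for dep in deployment_ids
        ]
        return min(remaining) if remaining else 0.0
```

In this model, a non-zero result from seconds_until_available corresponds to the "all deployments in cooldown" case, and that number is the kind of value a Retry-After header would carry.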

Environment

LiteLLM version: 1.83.0
Deployment: LiteLLM Proxy
Upstream provider: z.ai (GLM-5.1)

