litellm - Cooldown TTL doesn't distinguish 429-rate-limit from 429-quota-exhausted (question / possible bug)

litellm, 2026-05-08 15:02:44



hey, posting this as a question first since i may have misread the code. happy to be told i'm wrong.

reading litellm/router.py at the current main, the Router class uses cooldown_time (a single TTL) and allowed_fails (a count). when a deployment exceeds allowed_fails, it goes into CooldownCache for cooldown_time. operators can configure both per deployment via cooldown_time kwarg or environment.
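to make the mechanism described above concrete, here is a toy model of a fail counter plus a single cooldown TTL. it is a sketch based only on the description in this post, not litellm's actual code: the class name `UniformCooldown` and its methods are hypothetical, and only `allowed_fails` and `cooldown_time` are names taken from the post.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class UniformCooldown:
    """Toy fail-count + single-TTL cooldown (hypothetical, not litellm's code)."""

    allowed_fails: int = 3        # failures tolerated before cooling down
    cooldown_time: float = 60.0   # one TTL in seconds, same for every failure
    _fails: Dict[str, int] = field(default_factory=dict)
    _cooldown_until: Dict[str, float] = field(default_factory=dict)

    def record_failure(self, deployment: str, now: Optional[float] = None) -> None:
        """Count a failure; start a cooldown once allowed_fails is exceeded."""
        now = time.time() if now is None else now
        self._fails[deployment] = self._fails.get(deployment, 0) + 1
        if self._fails[deployment] > self.allowed_fails:
            # The TTL does not depend on why the request failed.
            self._cooldown_until[deployment] = now + self.cooldown_time
            self._fails[deployment] = 0

    def is_available(self, deployment: str, now: Optional[float] = None) -> bool:
        """True when the deployment is not currently in cooldown."""
        now = time.time() if now is None else now
        return now >= self._cooldown_until.get(deployment, 0.0)
```

in this model a deployment that is out of monthly quota becomes available again after `cooldown_time` seconds, exactly like one that hit a short rate limit.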

the failure mode i hit, and that prompted this question: a free-tier provider (groq, in this case) returned a 429 whose body contained "you exceeded your monthly quota". my router (a homegrown thing modeled on this pattern) treated it the same as a transient 429-rate-limit and retried every 60s for hours.

reading the code i couldn't find a path where the body of a 429 is inspected to disambiguate "wait 60 seconds" from "wait until the period rolls over". cooldown_time is uniform.

questions:

1. is there a config knob i missed that does this distinction by body keyword?
2. if not, is "user should configure cooldown_time = 86400 per provider that has monthly quota" the intended workaround?
3. would you accept a PR that adds an optional body-keyword classifier (off by default, opt-in via config) for users who route across mixed free-tier providers?
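a minimal sketch of what such a body-keyword classifier could look like, to show the idea rather than propose an API. everything here is an assumption: the function names, the keyword list, and especially the guess that a monthly quota resets at the start of the next calendar month in UTC (providers may reset on a billing anniversary instead).

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical keyword list; real provider messages vary and would need tuning.
QUOTA_KEYWORDS = ("monthly quota", "quota exceeded", "exceeded your quota")


def classify_429(body: str) -> str:
    """Return 'quota_exhausted' if the 429 body mentions a quota, else 'rate_limited'."""
    text = body.lower()
    if any(keyword in text for keyword in QUOTA_KEYWORDS):
        return "quota_exhausted"
    return "rate_limited"


def seconds_until_next_month(now: datetime) -> float:
    """Seconds from `now` until 00:00 on the first day of the next month."""
    if now.month == 12:
        rollover = now.replace(year=now.year + 1, month=1, day=1,
                               hour=0, minute=0, second=0, microsecond=0)
    else:
        rollover = now.replace(month=now.month + 1, day=1,
                               hour=0, minute=0, second=0, microsecond=0)
    return (rollover - now).total_seconds()


def cooldown_for_429(body: str, default_ttl: float = 60.0,
                     now: Optional[datetime] = None) -> float:
    """Pick a cooldown TTL: the short default for rate limits, until rollover for quota."""
    if classify_429(body) == "quota_exhausted":
        # Assumption: the quota period is a calendar month in UTC.
        now = now or datetime.now(timezone.utc)
        return seconds_until_next_month(now)
    return default_ttl
```

with this, the groq body quoted above would map to a cooldown lasting until the assumed rollover, while a 429 without any of the keywords keeps the short default TTL.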

context: i ran into this hard enough to write a small library that does the disambiguation: https://github.com/eleata/resilient-llm-router

my version is alpha and not feature-equivalent to litellm. if option 3 above sounds useful upstream, i can extract the parser as a small PR rather than recommend my lib.

happy to be wrong about any of the above. let me know.
