litellm - Cooldown TTL doesn't distinguish 429-rate-limit from 429-quota-exhausted (question / possible bug)


hey, posting this as a question first since i may have misread the code. happy to be told i'm wrong.

reading litellm/router.py on current main: the Router class uses cooldown_time (a single TTL) and allowed_fails (a failure count). when a deployment exceeds allowed_fails, it goes into CooldownCache for cooldown_time seconds. as far as i can tell, operators can configure both per deployment, via the cooldown_time kwarg or the environment.
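to make the mechanism described above concrete, here is a minimal toy model of "fail count threshold plus one uniform TTL". it is a sketch of the pattern only, not litellm's actual implementation: the class name CooldownTracker, its methods, and the injectable clock are all hypothetical.

```python
import time
from typing import Callable, Dict


class CooldownTracker:
    """Toy model of a fail-count threshold plus a single cooldown TTL.

    Hypothetical illustration only; this is not litellm's code.
    """

    def __init__(self, allowed_fails: int, cooldown_time: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.allowed_fails = allowed_fails
        self.cooldown_time = cooldown_time  # seconds, same for every failure type
        self._clock = clock
        self._fails: Dict[str, int] = {}
        self._cooling_until: Dict[str, float] = {}

    def record_failure(self, deployment: str) -> None:
        self._fails[deployment] = self._fails.get(deployment, 0) + 1
        # Cool down only once the failure count exceeds the allowance.
        if self._fails[deployment] > self.allowed_fails:
            self._cooling_until[deployment] = self._clock() + self.cooldown_time
            self._fails[deployment] = 0

    def is_available(self, deployment: str) -> bool:
        until = self._cooling_until.get(deployment)
        if until is None:
            return True
        if self._clock() >= until:
            # TTL expired: the deployment is eligible for traffic again.
            del self._cooling_until[deployment]
            return True
        return False
```

note that nothing in this model looks at why the request failed, so a quota-exhausted deployment becomes eligible again after the same TTL as a briefly rate-limited one.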

the failure mode that prompted this question: a free-tier provider (groq, in this case) returned a 429 whose body contained "you exceeded your monthly quota". my router (a homegrown thing modeled on this pattern) treated it the same as a transient 429 rate limit and kept retrying every 60s for hours.

reading the code i couldn't find a path where the body of a 429 is inspected to disambiguate "wait 60 seconds" from "wait until the period rolls over". cooldown_time is uniform.
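a rough sketch of what such a body inspection could look like, for illustration only. everything here is hypothetical: the names classify_429 and Kind, the keyword pattern, and especially the assumption that a monthly quota resets at the start of the next calendar month in UTC (providers may use a rolling window or a billing date instead).

```python
import re
from datetime import datetime, timezone
from enum import Enum
from typing import Optional, Tuple


class Kind(Enum):
    RATE_LIMIT = "rate_limit"
    QUOTA_EXHAUSTED = "quota_exhausted"


# Hypothetical keyword list; real provider wording varies and needs tuning.
_QUOTA_PATTERN = re.compile(
    r"monthly quota|quota exceeded|exceeded your .*quota", re.IGNORECASE
)


def seconds_until_next_month(now: datetime) -> float:
    """Seconds from `now` (timezone-aware) to the first day of next month, UTC."""
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    rollover = datetime(year, month, 1, tzinfo=timezone.utc)
    return (rollover - now).total_seconds()


def classify_429(body: str, now: Optional[datetime] = None,
                 default_cooldown: float = 60.0) -> Tuple[Kind, float]:
    """Return (kind, suggested cooldown in seconds) for a 429 response body."""
    now = now or datetime.now(timezone.utc)
    if _QUOTA_PATTERN.search(body):
        # Assumption: the quota resets at the calendar-month boundary in UTC.
        return Kind.QUOTA_EXHAUSTED, seconds_until_next_month(now)
    # Anything else is treated as a transient rate limit.
    return Kind.RATE_LIMIT, default_cooldown
```

a caller would pass the returned cooldown to whatever sets the per-deployment TTL, falling back to the uniform cooldown_time when the body matches nothing. keyword matching is brittle by nature, which is one reason to keep it opt-in.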

questions:

  1. is there a config knob i missed that does this distinction by body keyword?
  2. if not, is "user should configure cooldown_time = 86400 per provider that has monthly quota" the intended workaround?
  3. would you accept a PR that adds an optional body-keyword classifier (off by default, opt-in via config) for users who route across mixed free-tier providers?

context: i ran into this hard enough to write a small library that does the disambiguation: https://github.com/eleata/resilient-llm-router

my version is alpha and not feature-equivalent to litellm. if option 3 above sounds useful upstream, i can extract the parser as a small PR rather than recommend my lib.

happy to be wrong about any of the above. let me know.
