litellm - 💡(How to fix) Fix [Bug]: AllowedFailsPolicy.InternalServerErrorAllowedFails is silently ignored in get_allowed_fails_from_policy [2 pull requests]


Fix Action

Fixed

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

AllowedFailsPolicy defines InternalServerErrorAllowedFails as a field in litellm/types/router.py, but get_allowed_fails_from_policy() in litellm/router.py (line ~11652) does not have a branch for InternalServerError.

The method handles 5 error types (AuthenticationError, Timeout, RateLimitError, ContentPolicyViolationError, BadRequestError) but is missing InternalServerError. As a result, the setting is accepted in config but silently ignored at runtime, falling back to the global allowed_fails value.

Expected behavior: When InternalServerErrorAllowedFails: 2 is configured, a deployment should be cooled down after 2 InternalServerError failures.

Actual behavior: The InternalServerErrorAllowedFails value is never read. get_allowed_fails_from_policy returns None for InternalServerError, so the router falls back to the global allowed_fails (e.g. 10000), effectively disabling cooldown for this error type.

Expected fix: Add the missing branch in get_allowed_fails_from_policy:

    if isinstance(exception, litellm.InternalServerError) and allowed_fails_policy.InternalServerErrorAllowedFails is not None:
        return allowed_fails_policy.InternalServerErrorAllowedFails
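
To make the proposed branch concrete, here is a minimal, self-contained sketch of the lookup and the fallback to the global value. It is not the actual litellm code: the exception classes and the `AllowedFailsPolicy` dataclass are simplified stand-ins, the function is written as a free function rather than a router method, and `effective_allowed_fails` is a hypothetical helper added only to show the fallback described above.

```python
from dataclasses import dataclass
from typing import Optional


# Stand-in exception types; the real ones live in the litellm package.
class RateLimitError(Exception):
    pass


class InternalServerError(Exception):
    pass


@dataclass
class AllowedFailsPolicy:
    # Simplified stand-in for the Pydantic model in litellm/types/router.py.
    RateLimitErrorAllowedFails: Optional[int] = None
    InternalServerErrorAllowedFails: Optional[int] = None


def get_allowed_fails_from_policy(
    exception: Exception, allowed_fails_policy: AllowedFailsPolicy
) -> Optional[int]:
    """Return the per-error threshold, or None if the policy has no entry."""
    if (
        isinstance(exception, RateLimitError)
        and allowed_fails_policy.RateLimitErrorAllowedFails is not None
    ):
        return allowed_fails_policy.RateLimitErrorAllowedFails
    # The branch the issue proposes adding:
    if (
        isinstance(exception, InternalServerError)
        and allowed_fails_policy.InternalServerErrorAllowedFails is not None
    ):
        return allowed_fails_policy.InternalServerErrorAllowedFails
    return None


def effective_allowed_fails(
    exception: Exception, policy: AllowedFailsPolicy, global_allowed_fails: int
) -> int:
    # Hypothetical helper: use the per-error value when present,
    # otherwise fall back to the global allowed_fails setting.
    per_error = get_allowed_fails_from_policy(exception, policy)
    return per_error if per_error is not None else global_allowed_fails
```

With the branch in place, a policy of `InternalServerErrorAllowedFails=2` yields 2 for an `InternalServerError` instead of the global 10000; error types without a policy entry still fall back to the global value.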

Related: PR #25644 fixed the same class of bug for RetryPolicy.InternalServerErrorRetries in get_num_retries_from_retry_policy(). The PR description stated "AllowedFailsPolicy has no corresponding InternalServerErrorAllowedFails field", but the field does exist in the Pydantic model — so the parallel fix for get_allowed_fails_from_policy was missed.

Steps to Reproduce

  1. Configure proxy with allowed_fails_policy:
       router_settings:
         allowed_fails: 10000
         allowed_fails_policy:
           InternalServerErrorAllowedFails: 2
         cooldown_time: 60
         fallbacks:
           - model-a:
               - model-b
  2. Send requests that trigger InternalServerError on model-a
  3. After 2 failures, expect cooldown + fallback to model-b
  4. Observe: cooldown does not trigger because the router uses allowed_fails: 10000 instead of InternalServerErrorAllowedFails: 2
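
The difference between steps 3 and 4 can be illustrated with a toy threshold check. This is only a sketch: `should_cooldown` is a hypothetical helper, and the `>=` comparison is an assumption (the router's real check may be strict or otherwise differ). The point is just that a `None` lookup result makes the global 10000 the effective threshold.

```python
from typing import Optional

GLOBAL_ALLOWED_FAILS = 10000  # router_settings.allowed_fails from the report


def should_cooldown(fails: int, per_error_threshold: Optional[int]) -> bool:
    # Fall back to the global value when the policy lookup yields None.
    threshold = (
        per_error_threshold
        if per_error_threshold is not None
        else GLOBAL_ALLOWED_FAILS
    )
    # Assumed comparison; the real router check may differ.
    return fails >= threshold


# Reported behaviour: the lookup returns None for InternalServerError.
buggy = [should_cooldown(n, None) for n in (1, 2, 3)]
# Expected behaviour: the lookup returns InternalServerErrorAllowedFails (2).
fixed = [should_cooldown(n, 2) for n in (1, 2, 3)]
```

Under these assumptions `buggy` never signals a cooldown within the first few failures, while `fixed` does once the count reaches the configured value.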

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.85.2

Twitter / LinkedIn details

No response
