litellm - ✅ (Solved) [Feature]: Expose --limit-max-requests-jitter flag of uvicorn with LiteLLM [2 pull requests, 1 participant]

GitHub stats
BerriAI/litellm#24401 (fetched 2026-04-08 01:17:57)
Comments: 0
Participants: 1
Timeline: 3
Reactions: 0
Timeline (top): labeled ×2, cross-referenced ×1

Fix Action

Fix / Workaround

Exposing Uvicorn’s --limit-max-requests-jitter would allow LiteLLM users to better control worker lifecycle behavior under load, leading to improved resilience and smoother traffic handling without requiring external workarounds.

PR fix notes

PR #24405: Add Support for Jitter

Description (problem / solution / changelog)

Relevant issues

Fixes #24401


Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable with minor issues.
  • 45-49 passing tests: acceptable but needs attention
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.
  • Branch creation CI run
    Link:

  • CI run for the last commit
    Link:

  • Merge / cherry-pick CI run
    Links:

Type


🆕 New Feature 🐛 Bug Fix 🧹 Refactoring 📖 Documentation 🚄 Infrastructure ✅ Test

Changes

Exposes the upstream uvicorn flag added in this PR: https://github.com/Kludex/uvicorn/pull/2707

Changed files

  • enterprise/litellm_enterprise/enterprise_callbacks/callback_controls.py (modified, +85/-54)
  • enterprise/litellm_enterprise/enterprise_callbacks/send_emails/base_email.py (modified, +3/-3)
  • enterprise/litellm_enterprise/enterprise_callbacks/send_emails/sendgrid_email.py (modified, +1/-1)
  • enterprise/litellm_enterprise/proxy/auth/__init__.py (modified, +1/-1)
  • enterprise/litellm_enterprise/proxy/auth/custom_sso_handler.py (modified, +22/-15)
  • enterprise/litellm_enterprise/proxy/common_utils/check_batch_cost.py (modified, +45/-20)
  • enterprise/litellm_enterprise/proxy/common_utils/check_responses_cost.py (modified, +29/-13)
  • enterprise/litellm_enterprise/proxy/hooks/managed_files.py (modified, +149/-102)
  • enterprise/litellm_enterprise/proxy/hooks/managed_vector_stores.py (modified, +44/-48)
  • enterprise/litellm_enterprise/proxy/management_endpoints/key_management_endpoints.py (modified, +0/-1)
  • enterprise/litellm_enterprise/proxy/vector_stores/endpoints.py (modified, +3/-3)
  • enterprise/litellm_enterprise/types/enterprise_callbacks/send_emails.py (modified, +11/-1)
  • litellm/proxy/management_helpers/audit_logs.py (modified, +9/-5)
  • litellm/proxy/proxy_cli.py (modified, +29/-3)
  • pyproject.toml (modified, +1/-1)
  • requirements.txt (modified, +1/-1)
  • tests/test_litellm/proxy/test_proxy_cli.py (modified, +149/-0)

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

LiteLLM currently supports restarting workers after a fixed number of requests. However, under high load, this can lead to multiple workers restarting at the same time, causing potential availability issues.

To mitigate this, it would be useful to expose Uvicorn’s --limit-max-requests-jitter configuration so that worker restarts can be staggered, reducing the risk of simultaneous restarts and improving overall system stability.

Motivation, pitch

In production environments, especially under sustained or bursty traffic, using a fixed max_requests limit can cause multiple workers to hit the threshold at nearly the same time. This leads to synchronized restarts, which temporarily reduces available capacity and can result in increased latency or request failures.

Introducing jitter to the restart threshold helps stagger worker restarts, ensuring that not all workers are recycled simultaneously. This pattern is commonly used in distributed systems to avoid coordinated behavior that can amplify instability.
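The staggering effect described above can be illustrated with a small simulation. This is a minimal sketch, not uvicorn's implementation: it assumes each worker adds a random integer between 0 and the jitter value to the base limit (the model gunicorn uses for its max_requests_jitter setting), and the function name restart_thresholds is hypothetical.

```python
import random


def restart_thresholds(num_workers, max_requests, jitter, seed=0):
    """Return one restart threshold per worker.

    Assumption: each worker draws an integer in [0, jitter] and adds it
    to the shared max_requests limit. A seeded generator keeps the
    illustration reproducible.
    """
    rng = random.Random(seed)
    return [max_requests + rng.randint(0, jitter) for _ in range(num_workers)]


# Without jitter every worker shares the same threshold, so under evenly
# distributed load they all recycle at about the same moment.
fixed = restart_thresholds(num_workers=4, max_requests=1000, jitter=0)

# With jitter each worker gets its own threshold in [1000, 1100],
# which spreads the restarts out over time.
staggered = restart_thresholds(num_workers=4, max_requests=1000, jitter=100)

print("fixed:    ", fixed)
print("staggered:", staggered)
```

With a jitter of 0 the thresholds are identical; with a non-zero jitter they fall somewhere in the range from max_requests to max_requests plus the jitter value.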

Exposing Uvicorn’s --limit-max-requests-jitter would allow LiteLLM users to better control worker lifecycle behavior under load, leading to improved resilience and smoother traffic handling without requiring external workarounds.

What part of LiteLLM is this about?

SDK (litellm Python package)

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

No

Twitter / LinkedIn details

https://linkedin.com/in/iamraghavawasthi

extent analysis

Fix Plan

To address the issue, we need to expose Uvicorn's --limit-max-requests-jitter configuration in LiteLLM. Here are the steps:

  • Update the LiteLLM configuration to include a max_requests_jitter parameter.
  • Pass this parameter to Uvicorn when starting the workers.

Example code:

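import uvicorn

# ...

def start_workers(max_requests, max_requests_jitter):
    """Start uvicorn with a request limit per worker plus jitter."""
    # ...
    uvicorn.run(
        # ... (app and other server options)
        limit_max_requests=max_requests,
        # Assumes a uvicorn release that includes the jitter option
        # from https://github.com/Kludex/uvicorn/pull/2707.
        limit_max_requests_jitter=max_requests_jitter,
    )

# ...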
  • Add a command-line argument or configuration option to allow users to set the max_requests_jitter value.

Example code:

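import argparse

# ...

parser = argparse.ArgumentParser()
# A default of 0 means no jitter, so existing deployments keep their current behavior.
parser.add_argument("--max-requests-jitter", type=int, default=0)
args = parser.parse_args()

# ...
# max_requests is hard-coded here for illustration only.
start_workers(max_requests=100, max_requests_jitter=args.max_requests_jitter)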
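One way to combine the steps above while staying compatible with older uvicorn releases is to forward the jitter option only when the user actually sets it. The sketch below is an assumption-laden illustration: build_uvicorn_kwargs is a hypothetical helper, not part of LiteLLM, and the keyword name limit_max_requests_jitter is taken from the example above rather than verified against a released uvicorn version.

```python
def build_uvicorn_kwargs(max_requests=None, max_requests_jitter=0):
    """Build the restart-related keyword arguments for uvicorn.run().

    Hypothetical helper. The jitter keyword is only included when it is
    non-zero, so a uvicorn version that does not know the option is never
    handed an unexpected argument by default.
    """
    if max_requests_jitter < 0:
        raise ValueError("max_requests_jitter must be >= 0")

    kwargs = {}
    if max_requests is None:
        # Jitter has no meaning without a base request limit.
        return kwargs

    kwargs["limit_max_requests"] = max_requests
    if max_requests_jitter:
        # Assumed keyword name, following the upstream uvicorn PR.
        kwargs["limit_max_requests_jitter"] = max_requests_jitter
    return kwargs


print(build_uvicorn_kwargs())
print(build_uvicorn_kwargs(max_requests=1000))
print(build_uvicorn_kwargs(max_requests=1000, max_requests_jitter=100))
```

The resulting dictionary could then be unpacked into the server call, for example uvicorn.run(app, **kwargs), where app stands in for the application object.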

Verification

To verify that the fix worked, you can:

  • Start multiple workers with the same max_requests value and a non-zero max_requests_jitter value.
  • Send a large number of requests to the workers and observe that the restarts are staggered.

Extra Tips

  • Make sure to document the new configuration option and its effects on worker behavior.
  • Consider adding a default value for max_requests_jitter to ensure that users get some level of protection against synchronized restarts.
