litellm - ✅(Solved) Fix [Bug]: There is a bug in least_busy.py that causes traffic to some interfaces to be suppressed to zero. [1 pull requests, 1 participants]

litellm · 2026-04-08 05:27:48


GitHub stats

BerriAI/litellm#25323 · Fetched 2026-04-09 07:52:50

Author: JTCT-chen

Participants: JTCT-chen

Timeline (top): cross-referenced ×2, labeled ×2, referenced ×2

Fix Action

Fixed

Fixed by PR: fix(router): clamp least_busy request counter to prevent negative drift (https://github.com/BerriAI/litellm/pull/25325)

PR fix notes

PR #25325: fix(router): clamp least_busy request counter to prevent negative drift

Repository: BerriAI/litellm
Author: rudra717
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/25325

Description (problem / solution / changelog)

Summary

Fixes the least_busy routing strategy silently suppressing traffic to certain deployments by clamping the per-deployment request counter to prevent negative values.

Root Cause

The request counter is incremented in log_pre_api_call and decremented in the success/failure callbacks. Under race conditions (a callback firing before the pre-call hook, or firing twice for the same request), the counter goes negative.
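A minimal sketch of the drift, assuming a single in-memory integer stands in for one deployment's counter (the real storage used by least_busy.py is not modeled). The event names and the replay function are hypothetical; replaying the two orderings described above shows the counter leaving zero:

```python
# Hypothetical event replay: "pre_call" stands for the increment in
# log_pre_api_call, "done" for a decrement in a success/failure callback.
def replay(events):
    count = 0
    history = []
    for event in events:
        if event == "pre_call":
            count += 1
        elif event == "done":
            count -= 1  # unguarded decrement, as before the fix
        else:
            raise ValueError(f"unknown event: {event}")
        history.append(count)
    return history


# Callback runs before the pre-call hook for the same request.
print(replay(["done", "pre_call"]))          # [-1, 0]
# Callback fires twice for one request.
print(replay(["pre_call", "done", "done"]))  # [1, 0, -1]
```

In this sketch the early callback only dips below zero temporarily, while each duplicate callback leaves a permanent offset of -1.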

A negative count is always less than the 0 assigned to fresh/unused deployments in _get_available_deployments, so the negative-count deployment wins every comparison and attracts ALL traffic, while others gradually starve to zero.
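To see how one negative counter turns into starvation, the sketch below simulates sequential requests against four deployments. It assumes each request completes before the next arrives and that every completion on deployment "a" triggers the decrement twice; both assumptions are illustrative and not taken from the report.

```python
import random
from collections import Counter


def pick_least_busy(counts, rng):
    # Lowest counter wins; ties are broken at random, mirroring the
    # min/tie-break logic in _get_available_deployments.
    lowest = min(counts.values())
    return rng.choice([k for k, v in counts.items() if v == lowest])


def simulate(n_requests=1000, seed=0):
    # Hypothetical simulation: four deployments, one request at a time.
    rng = random.Random(seed)
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    served = Counter()
    for _ in range(n_requests):
        chosen = pick_least_busy(counts, rng)
        served[chosen] += 1
        counts[chosen] += 1  # pre-call increment
        counts[chosen] -= 1  # completion callback
        if chosen == "a":
            counts[chosen] -= 1  # assumed duplicate callback on "a"
    return served, counts


served, counts = simulate()
print(served)  # "a" receives nearly every request
print(counts)  # counts["a"] is far below zero
```

Once "a" is chosen a single time its counter sits at -1, so every later comparison against the 0 held by the other deployments selects it again, and the offset keeps growing.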

Fix

Added a max(value - 1, 0) floor on all four decrement paths:

log_success_event (sync)
log_failure_event (sync)
async_log_success_event
async_log_failure_event

This ensures the counter never goes below 0, preventing any single deployment from monopolizing traffic.
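The guard itself is one line. A minimal sketch, assuming a plain dict of deployment IDs to in-flight counts in place of the mapping least_busy.py actually reads and writes; the helper name decrement_in_flight is hypothetical:

```python
def decrement_in_flight(counts, deployment_id):
    # Hypothetical helper: floor the counter at zero so a duplicate or
    # early callback cannot push it negative (the max(value - 1, 0) guard).
    counts[deployment_id] = max(counts.get(deployment_id, 0) - 1, 0)
    return counts[deployment_id]


counts = {"a": 1}
decrement_in_flight(counts, "a")  # normal completion -> 0
decrement_in_flight(counts, "a")  # duplicate callback -> stays 0
decrement_in_flight(counts, "b")  # callback before any pre-call -> 0
print(counts)  # {'a': 0, 'b': 0}
```

The floor removes the bias toward one deployment, but in this sketch it does not fully recover the early-callback case: the decrement is absorbed at zero, the later pre-call increment lifts the counter to 1, and that deployment is then counted as one request busier than it really is.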

Testing

The fix is a defensive floor guard on an integer counter. Existing routing tests validate the least_busy selection logic, and the PR adds tests/test_litellm/test_least_busy_counter_clamp.py.
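A sketch of the kind of regression check that covers this guard, written as a plain pytest-style function. It is hypothetical and self-contained: it re-implements the selection and the clamped decrement locally instead of importing LiteLLM, and it is not the contents of the PR's test file.

```python
import random
from collections import Counter


def pick_least_busy(counts, rng):
    # Local stand-in for the least-busy choice: minimum, random tie-break.
    lowest = min(counts.values())
    return rng.choice([k for k, v in counts.items() if v == lowest])


def test_duplicate_callbacks_do_not_starve_deployments():
    # Hypothetical regression test for the clamped decrement.
    rng = random.Random(42)
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    served = Counter()
    for _ in range(1000):
        chosen = pick_least_busy(counts, rng)
        served[chosen] += 1
        counts[chosen] += 1
        # Worst case: the completion callback fires twice on every request.
        for _attempt in range(2):
            counts[chosen] = max(counts[chosen] - 1, 0)
        assert min(counts.values()) >= 0
    # With the floor in place, every deployment keeps receiving traffic.
    assert all(served[k] > 150 for k in counts)
```

Under pytest the function is collected automatically; it can also be called directly.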

Disclaimer

AI agents (Claude Code) assisted with this contribution.

Fixes #25323

Changed files

litellm/router_strategy/least_busy.py (modified, +4/-4)
tests/test_litellm/test_least_busy_counter_clamp.py (added, +140/-0)


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

My LiteLLM instance routes to four model services, but traffic to some of them gradually drops until they receive almost no requests.

Steps to Reproduce

```python
import random


def _get_available_deployments(
    self,
    healthy_deployments: list,
    all_deployments: dict,
):
    """
    Helper to get deployments using least busy strategy
    """
    for d in healthy_deployments:
        ## if healthy deployment not yet used
        if d["model_info"]["id"] not in all_deployments:
            all_deployments[d["model_info"]["id"]] = 0
    # map deployment to id
    # pick least busy deployment, with random jitter on ties
    min_traffic = float("inf")
    for k, v in all_deployments.items():
        if v < min_traffic:
            min_traffic = v
    # collect all deployments tied at the minimum
    min_deployment_ids = [k for k, v in all_deployments.items() if v == min_traffic]
    if min_deployment_ids:
        chosen_id = random.choice(min_deployment_ids)
        for m in healthy_deployments:
            if m["model_info"]["id"] == chosen_id:
                return m
        # chosen_id not in healthy list, fall back
        return random.choice(healthy_deployments)
    else:
        return random.choice(healthy_deployments)
```
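A small harness for the helper above, assuming a non-empty list of deployments shaped like {"model_info": {"id": ...}}, as the method reads them. It condenses the same steps into a standalone function (select_least_busy is a hypothetical name) so it runs without a Router instance:

```python
import random


def select_least_busy(healthy_deployments, all_deployments, rng=random):
    # Hypothetical standalone condensation of _get_available_deployments:
    # register unseen IDs at 0, find the minimum counter, and pick
    # randomly among the IDs tied at that minimum.
    for d in healthy_deployments:
        all_deployments.setdefault(d["model_info"]["id"], 0)
    min_traffic = min(all_deployments.values())
    tied = [k for k, v in all_deployments.items() if v == min_traffic]
    chosen_id = rng.choice(tied)
    for m in healthy_deployments:
        if m["model_info"]["id"] == chosen_id:
            return m
    # chosen_id belongs to a deployment that is no longer healthy
    return rng.choice(healthy_deployments)


healthy = [{"model_info": {"id": f"dep-{i}"}} for i in range(1, 5)]
rng = random.Random(7)

# Fresh counters: all tied at 0, so picks spread across all four.
fresh = {f"dep-{i}": 0 for i in range(1, 5)}
print({select_least_busy(healthy, fresh, rng)["model_info"]["id"] for _ in range(200)})

# One drifted counter: dep-1 wins every single comparison.
drifted = {"dep-1": -1}
picks = {select_least_busy(healthy, drifted, rng)["model_info"]["id"] for _ in range(200)}
print(picks)  # {'dep-1'}
```

Selection alone never changes the counters, so a single -1 is enough to pin every pick to that deployment until something increments it.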

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

1.82.6

Twitter / LinkedIn details

No response

extent analysis

TL;DR

Traffic to some endpoints dropping to zero is likely tied to the least busy strategy in the _get_available_deployments method, which always routes to the deployment with the lowest request count and can therefore distribute requests unevenly.

Guidance

Review the _get_available_deployments method to ensure it's correctly implementing the least busy strategy, considering the potential for deployments to be tied at the minimum traffic level.
Verify that the all_deployments dictionary is being updated correctly: each increment should be matched by exactly one decrement, because a counter that drifts (for example below zero) skews every later comparison.
Check if the healthy_deployments list is being populated correctly, as the method relies on this list to determine the available deployments.
Consider adding logging or monitoring to track the traffic levels for each deployment, to better understand how the traffic is being distributed.
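One way to act on the monitoring suggestion, sketched with Python's standard logging module. The function name, the logger name, and the idea of auditing the counters on each selection are assumptions for illustration; the check flags any counter below zero, since such a counter ranks below every idle deployment at 0.

```python
import logging

logger = logging.getLogger("least_busy_audit")  # hypothetical logger name


def audit_request_counts(counts):
    """Hypothetical check: log all counters, return IDs with negative counts."""
    logger.info("least_busy counters: %s", dict(sorted(counts.items())))
    negative = sorted(k for k, v in counts.items() if v < 0)
    for deployment_id in negative:
        logger.warning(
            "deployment %s has negative in-flight count %d and will be "
            "preferred over every idle deployment",
            deployment_id,
            counts[deployment_id],
        )
    return negative


logging.basicConfig(level=logging.INFO)
print(audit_request_counts({"dep-1": -3, "dep-2": 0, "dep-3": 1}))  # ['dep-1']
```

Passing arguments to the logger instead of pre-formatting f-strings defers string building until a handler actually emits the record.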

Example

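```python
# Example of how to add logging to track traffic levels
import logging

logger = logging.getLogger(__name__)


def _get_available_deployments(
    self,
    healthy_deployments: list,
    all_deployments: dict,
):
    # Log the candidate IDs and the per-deployment counters before selection
    logger.info(
        "Healthy deployment IDs: %s",
        [d["model_info"]["id"] for d in healthy_deployments],
    )
    logger.info("All deployments: %s", all_deployments)
    # ... rest of the method ...
```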

Notes

The snippet comes from the least_busy.py routing strategy in the LiteLLM SDK, so the problem likely lies in how that strategy is implemented. Without more information about the traffic patterns and deployment configuration, a more specific diagnosis is difficult.

Recommendation

Apply workaround: Add logging and monitoring to _get_available_deployments to track the request count for each deployment and see how traffic is actually distributed. This should help isolate the root cause and show how the least busy strategy behaves in practice.
