litellm - ✅(Solved) Fix [Bug]: latency-based-routing degrades to random selection due to lost-update race condition in async_log_success_event [1 pull requests, 1 participants]

hassaan-sage · 2026-03-28T05:12:16Z



GitHub stats

BerriAI/litellm#24720 • Fetched 2026-04-08 01:42:01

Author: hassaan-sage
Participants: hassaan-sage
Timeline (top): cross-referenced ×3, labeled ×3, referenced ×2

Root Cause

Root cause: async_log_success_event() in litellm/router_strategy/lowest_latency.py performs a non-atomic read-modify-write on a shared cache key ({model_group}_map). When concurrent requests complete simultaneously, the last writer overwrites all previous updates, causing latency data to be constantly lost. Deployments with lost data fall back to latency: [0] (treated as fastest) and are randomly selected, producing a distribution proportional to deployment count.

Fix Action

Fixed

Fixed by PR: fix(router): add asyncio.Lock to prevent lost-update race in lowest-latency async logger (https://github.com/BerriAI/litellm/pull/24726)

PR fix notes

PR #24726: fix(router): add asyncio.Lock to prevent lost-update race in lowest-latency async logger

Repository: BerriAI/litellm
Author: NIK-TIGER-BILL
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24726

Description (problem / solution / changelog)

Problem

Fixes #24720

async_log_success_event() and async_log_failure_event() in LowestLatencyLoggingHandler perform a non-atomic read → modify → write on a shared cache key ({model_group}_map) with no concurrency control:

# READ — no lock
request_count_dict = await self.router_cache.async_get_cache(key=latency_key) or {}

# MODIFY in-memory
request_count_dict[id]["latency"].append(final_value)

# WRITE — overwrites entire dict
await self.router_cache.async_set_cache(key=latency_key, value=request_count_dict)

When concurrent requests complete simultaneously, the last writer wins and all intermediate updates are lost. Deployments with zeroed-out latency data fall back to latency: [0] (treated as "fastest") and are randomly selected — producing distribution proportional to deployment count rather than actual speed.
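
The lost update can be reproduced outside LiteLLM with a minimal sketch. Everything here is an assumption made for illustration: `SnapshotCache` is a hypothetical stand-in for the router cache (it returns a copy on each read to model the stale snapshots described above), and `log_latency_unlocked` mirrors only the read, modify, write shape of the handler, not its real signature.

```python
import asyncio
import copy


class SnapshotCache:
    """Hypothetical stand-in for the router cache (not LiteLLM's real class)."""

    def __init__(self):
        self._store = {}

    async def async_get_cache(self, key):
        await asyncio.sleep(0)  # yield to the event loop, as cache I/O might
        return copy.deepcopy(self._store.get(key))  # each caller gets a snapshot

    async def async_set_cache(self, key, value):
        await asyncio.sleep(0)
        self._store[key] = value  # replaces the whole dict


async def log_latency_unlocked(cache, key, deployment_id, latency):
    # READ: every concurrent caller may see the same stale snapshot
    data = await cache.async_get_cache(key) or {}
    # MODIFY: only this caller's copy changes
    data.setdefault(deployment_id, {}).setdefault("latency", []).append(latency)
    # WRITE: the last writer wins
    await cache.async_set_cache(key, data)


async def run_unlocked(n=20):
    cache = SnapshotCache()
    await asyncio.gather(
        *(log_latency_unlocked(cache, "medium_map", f"deploy_{i}", 0.5) for i in range(n))
    )
    return await cache.async_get_cache("medium_map")


surviving = asyncio.run(run_unlocked())
print(f"{len(surviving)} of 20 deployments kept their latency data")
```

Twenty deployments each report one latency, yet fewer than twenty entries survive, because each coroutine writes back a dict built from a snapshot that predates the other writes.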

Fix

Add a Dict[str, asyncio.Lock] keyed by model_group to LowestLatencyLoggingHandler. Each async event handler acquires the per-group lock before the read and releases it after the write, making the RMW atomic at the event-loop level.

async with self._get_async_lock(latency_key):
    request_count_dict = await self.router_cache.async_get_cache(...) or {}
    # ... modify ...
    await self.router_cache.async_set_cache(...)

The synchronous log_success_event path (used for non-async routers) already runs in a single-threaded event loop context and does not need locking.
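
A minimal sketch of the per-key lock follows. The name `_get_async_lock` appears in the PR snippet, but its body is not shown there, so the implementation below is an assumption. `SnapshotCache` and `LockedLatencyLogger` are hypothetical stand-ins, not LiteLLM classes.

```python
import asyncio
import copy
from typing import Dict


class SnapshotCache:
    """Hypothetical stand-in for the router cache (not LiteLLM's real class)."""

    def __init__(self):
        self._store = {}

    async def async_get_cache(self, key):
        await asyncio.sleep(0)  # yield to the event loop, as cache I/O might
        return copy.deepcopy(self._store.get(key))

    async def async_set_cache(self, key, value):
        await asyncio.sleep(0)
        self._store[key] = value


class LockedLatencyLogger:
    """Assumed shape of the fix: one asyncio.Lock per cache key."""

    def __init__(self, router_cache):
        self.router_cache = router_cache
        self._async_locks: Dict[str, asyncio.Lock] = {}

    def _get_async_lock(self, key: str) -> asyncio.Lock:
        # No await between lookup and insert, so two coroutines on the
        # same event loop cannot create two locks for one key.
        lock = self._async_locks.get(key)
        if lock is None:
            lock = self._async_locks[key] = asyncio.Lock()
        return lock

    async def async_log_success_event(self, model_group, deployment_id, latency):
        latency_key = f"{model_group}_map"
        # The lock is held across the read, the modification, and the write
        async with self._get_async_lock(latency_key):
            data = await self.router_cache.async_get_cache(latency_key) or {}
            data.setdefault(deployment_id, {}).setdefault("latency", []).append(latency)
            await self.router_cache.async_set_cache(latency_key, data)


async def run_locked(n=20):
    cache = SnapshotCache()
    logger = LockedLatencyLogger(cache)
    await asyncio.gather(
        *(logger.async_log_success_event("medium", f"deploy_{i}", 0.5) for i in range(n))
    )
    return await cache.async_get_cache("medium_map")


result = asyncio.run(run_locked())
print(f"{len(result)} of 20 deployments kept their latency data")
```

With the lock in place, all twenty concurrent updates survive. An `asyncio.Lock` only coordinates coroutines on one event loop in one process, so this sketch says nothing about updates made from other threads or processes.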

Testing

Existing tests pass
Added test_async_log_concurrent_no_data_loss to verify that 20 concurrent calls preserve all latency entries

Checklist

[x] My PR is focused on a single issue
[x] I have read and understood the LiteLLM contribution guidelines
[x] I have added a test for the fix
[x] DCO signed

Changed files

litellm/router_strategy/lowest_latency.py (modified, +79/-69)

Raw issue text

Check for existing issues

[x] I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

latency-based-routing degrades to random selection weighted by deployment count when handling concurrent requests. With 16 Anthropic keys + 4 OpenAI keys per tier, traffic distributes 80/20 matching the key ratio regardless of actual latency differences between providers (e.g., OpenAI GPT-4.1 at ~583ms vs Anthropic Claude Sonnet at ~1,588ms).

Expected: Router should favor the faster provider.
Actual: Traffic distributes proportionally to deployment count, identical to simple-shuffle.

The update in async_log_success_event() is a non-atomic read-modify-write on the shared cache key:

# READ — no lock
request_count_dict = await self.router_cache.async_get_cache(
    key=latency_key, local_only=True
) or {}

# MODIFY — in memory
request_count_dict[id].setdefault("latency", []).append(final_value)

# WRITE — overwrites entire dict, no lock
await self.router_cache.async_set_cache(
    key=latency_key, value=request_count_dict, ttl=self.routing_args.ttl
)

Two concurrent completions:

1. Request A reads medium_map → {deploy_1: {latency: [0.5]}}
2. Request B reads medium_map → same stale snapshot
3. Request A writes {deploy_1: {latency: [0.5, 0.6]}, deploy_5: {latency: [0.3]}}
4. Request B writes {deploy_1: {latency: [0.5, 0.4]}}, overwriting A's update; deploy_5 data is lost

Since _get_available_deployments() assigns latency: [0] to deployments without cached data and randomly shuffles zero-latency ties, the probability of selecting a provider equals its share of total deployments.
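
The effect of the zero-latency tie can be illustrated with a small simulation. This is a simplified model rather than the code in `_get_available_deployments()`: `pick_deployment` is a hypothetical helper that chooses uniformly among the deployments tied for the lowest average latency and ignores `lowest_latency_buffer`.

```python
import random
from collections import Counter


def pick_deployment(latencies, rng):
    """Choose uniformly among deployments tied for the lowest average latency."""
    averages = {dep: sum(vals) / len(vals) for dep, vals in latencies.items()}
    lowest = min(averages.values())
    tied = [dep for dep, avg in averages.items() if avg == lowest]
    return rng.choice(tied)


def simulate(n_requests=10_000, seed=0):
    rng = random.Random(seed)
    # 16 Anthropic + 4 OpenAI deployments whose latency data was lost,
    # so every one of them falls back to latency [0].
    latencies = {f"anthropic_{i}": [0] for i in range(16)}
    latencies.update({f"openai_{i}": [0] for i in range(4)})
    counts = Counter(
        pick_deployment(latencies, rng).split("_")[0] for _ in range(n_requests)
    )
    return {p: counts[p] / n_requests for p in ("anthropic", "openai")}


shares = simulate()
print(shares)
```

When every deployment is tied at zero, each provider's share converges to its share of deployments (16/20 and 4/20), which is the 80/20 split reported in the issue.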

Steps to Reproduce

1. Configure LiteLLM proxy with routing_strategy: "latency-based-routing" and routing_strategy_args: {ttl: 300, lowest_latency_buffer: 0.1}
2. Add 20 deployments per model group: 16 Anthropic API keys + 4 OpenAI API keys for the same tier (e.g., model_name: medium)
3. Send sustained traffic at ~3 req/sec (we used a CronJob sending 160 req/min)
4. Collect REQUEST_START logs and count which model was selected per request
5. Observe traffic distributes ~80% Anthropic / ~20% OpenAI, matching the 16:4 key ratio, not actual latency

# config.yaml
model_list:
  - model_name: medium
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY_0
  # ... repeat for 16 Anthropic keys
  - model_name: medium
    litellm_params:
      model: openai/gpt-4.1
      api_key: os.environ/OPENAI_API_KEY_0
  # ... repeat for 4 OpenAI keys

router_settings:
  routing_strategy: "latency-based-routing"
  routing_strategy_args:
    ttl: 300
    lowest_latency_buffer: 0.1

# Send test requests and check distribution
for i in $(seq 1 40); do
  curl -s -X POST "http://localhost:4000/v1/messages" \
    -H "Content-Type: application/json" \
    -H "x-api-key: $KEY" \
    -H "anthropic-version: 2023-06-01" \
    -d '{"model":"medium","max_tokens":5,"messages":[{"role":"user","content":"hi"}]}' &
done
wait
# Result: ~80% claude-sonnet-4-6, ~20% gpt-4.1

Relevant log output

Sent 40 concurrent medium-tier requests. Expected latency-based routing to favor OpenAI (583ms avg) over Anthropic (1,588ms avg).

Actual distribution:
  claude-sonnet-4-6: 32 (80.0%)
  gpt-4.1:            8 (20.0%)

Matches 16:4 Anthropic:OpenAI key ratio exactly — no latency differentiation.

Measured provider latencies from same cluster:
  medium/anthropic/claude-sonnet-4-6:  3483 samples, avg 1588ms, P50 1622ms, P95 2243ms
  medium/openai/gpt-4.1:               875 samples, avg  583ms, P50  514ms, P95  790ms

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To stop concurrent requests from overwriting each other's updates to the shared cache key, the read-modify-write must be made atomic.

The steps are:

1. Hold a lock across the read, the modification, and the write of the cache key.
2. Write back the merged dict so that earlier updates are preserved.

Code Changes

An asyncio.Lock serializes the coroutines that update the cache key. The sketch below uses a single lock for all keys, which is simpler than the per-model-group locks in PR #24726 but also serializes updates for unrelated model groups.

import asyncio

class LatencyRouter:
    def __init__(self, router_cache, ttl):
        self.router_cache = router_cache
        self.ttl = ttl  # cache TTL in seconds (routing_strategy_args.ttl)
        # One lock for all keys: serializes every update on this event loop
        self.lock = asyncio.Lock()

    async def async_log_success_event(self, latency_key, id, final_value):
        # Hold the lock across the whole read-modify-write
        async with self.lock:
            request_count_dict = await self.router_cache.async_get_cache(
                key=latency_key, local_only=True
            ) or {}

            # setdefault on both levels so a new deployment id does not raise KeyError
            request_count_dict.setdefault(id, {}).setdefault("latency", []).append(final_value)

            await self.router_cache.async_set_cache(
                key=latency_key, value=request_count_dict, ttl=self.ttl
            )

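Alternatively, each deployment's latencies can be kept in a Redis list and appended with RPUSH, which Redis executes atomically. This changes the storage layout, so the code that reads latencies for routing would have to change to match.

import redis.asyncio as redis  # async client, so the event loop is not blocked

class LatencyRouter:
    def __init__(self, router_cache):
        self.router_cache = router_cache
        self.redis_client = redis.Redis()  # assumes a Redis server on localhost:6379

    async def async_log_success_event(self, latency_key, id, final_value):
        # RPUSH appends on the server; there is no client-side read-modify-write
        await self.redis_client.rpush(f"{latency_key}:{id}:latency", final_value)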

Verification

To verify the fix, rerun the reproduction script and check the distribution of requests. Traffic should skew toward the lower-latency provider instead of following the 16:4 key ratio.

# Send test requests and check distribution
for i in $(seq 1 40); do
  curl -s -X POST "http://localhost:4000/v1/messages" \
    -H "Content-Type: application/json" \
    -H "x-api-key: $KEY" \
    -H "anthropic-version: 2023-06-01" \
    -d '{"model":"medium","max_tokens":5,"messages":[{"role":"user","content":"hi"}]}' &
done
wait
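# Expected after the fix: most requests go to the lower-latency provider (gpt-4.1 in the measurements above); the issue does not report a post-fix split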

Extra Tips

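Acquire the lock with async with so that it is released even if the cache read or write raises. A manual acquire() without a matching release() in a finally block can leave later updates blocked.
Consider a store with atomic append operations (such as the Redis list above) if the latency data must be shared across processes. An asyncio.Lock only coordinates coroutines on a single event loop.
Monitor request latency after the change, since the lock serializes latency updates that previously ran concurrently.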
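
The point about exceptions can be checked with a short sketch. `failing_update` is a hypothetical function that raises while holding the lock; nothing here is LiteLLM code.

```python
import asyncio


async def failing_update(lock):
    async with lock:
        # Simulate a cache write that fails while the lock is held
        raise RuntimeError("cache write failed")


async def check_release():
    lock = asyncio.Lock()
    try:
        await failing_update(lock)
    except RuntimeError:
        pass
    released = not lock.locked()  # async with released the lock on the way out
    async with lock:  # a later caller can still acquire it
        reacquired = True
    return released, reacquired


status = asyncio.run(check_release())
print(status)
```

Because `async with` releases the lock when the block exits for any reason, a failed update does not block the updates that follow it.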
