litellm - 💡(How to fix) Fix [Bug]: Cross-end-user budget leak: cached UserAPIKeyAuth retains per-request end_user_max_budget across requests sharing one virtual key [2 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When multiple end-users share a single virtual key (canonical OpenWebUI pattern: one proxy key + per-user user field on /chat/completions), the proxy's in-memory user_api_key_cache poisons subsequent requests with the first end-user's budget fields. This causes sporadic BudgetExceededError responses for end-users whose true budget is much larger than the cached value — the DB row is correct, but the cached UserAPIKeyAuth carries a stale end_user_max_budget that wins over the freshly-joined DB value during budget reservation.

Symptom: an end-user with LiteLLM_EndUserTable.budget_id pointing at a budget of 20.0 sporadically receives:

BudgetExceededError: Budget has been exceeded! Current cost: 4.08, Max budget: 2.0

…where 2.0 is the budget of a different end-user whose request happened to populate the cache entry earlier.


Error Message

BudgetExceededError: Budget has been exceeded! Current cost: 4.08, Max budget: 2.0

Root Cause

The pipeline does roughly:

  1. _user_api_key_auth_builder looks up the virtual key via get_key_object, which goes through user_api_key_cache.async_get_cache(...) and deserializes the cached payload via UserAPIKeyAuth.model_validate(cached).
  2. Whichever request first populates that cache entry calls _copy_user_api_key_auth_for_cache on a UserAPIKeyAuth that has already been merged with this request's end_user_params — including end_user_max_budget, end_user_tpm_limit, etc.
  3. The copy helper nulls budget_reservation, parent_otel_span, request_route, but not the end_user_* fields, so they get serialized into the cache and re-hydrated on every subsequent request that hits the same virtual key — even for completely different end-users.
  4. On a cache hit, update_valid_token_with_end_user_params writes end_user_id, end_user_tpm_limit, end_user_rpm_limit, allowed_model_region — but not end_user_max_budget or end_user_model_max_budget. So the cached, poisoned values survive.
  5. Later, _get_end_user_budget_counter prefers valid_token.end_user_max_budget over the freshly-joined end_user_object.litellm_budget_table.max_budget — so the stale cached value wins, and the request is rejected against the wrong budget.

The bug is sporadic because it depends on which end-user happens to populate the cache first. After cache TTL expires, the next "first request" reseeds the poisoned value. This matches the user-reported "after a few failing retries, requests finally go through again" pattern (the cache turns over).

Affects single-pod, no-Redis deployments — pure in-memory user_api_key_cache.

Fix Action

Fixed

Code Example

BudgetExceededError: Budget has been exceeded! Current cost: 4.08, Max budget: 2.0

---

def _copy_user_api_key_auth_for_cache(
    user_api_key_obj: UserAPIKeyAuth,
) -> UserAPIKeyAuth:
    copied_key_obj = user_api_key_obj.model_copy()
    copied_key_obj.budget_reservation = None
    copied_key_obj.parent_otel_span = None
    copied_key_obj.request_route = None
    # End-user-derived fields are PER-REQUEST and must NOT be cached against the
    # virtual key, otherwise they leak across end-users that share one key
    # (e.g. OpenWebUI: one proxy key + per-user `user` body field).
    copied_key_obj.end_user_id = None
    copied_key_obj.end_user_tpm_limit = None
    copied_key_obj.end_user_rpm_limit = None
    copied_key_obj.end_user_max_budget = None
    copied_key_obj.end_user_model_max_budget = None
    copied_key_obj.allowed_model_region = None
    return copied_key_obj

---

def update_valid_token_with_end_user_params(
    valid_token: UserAPIKeyAuth, end_user_params: dict
) -> UserAPIKeyAuth:
    # End-user-derived fields are PER-REQUEST. Always overwrite — never `or`-merge —
    # to avoid bleeding values from a previously-cached request that shared this key.
    valid_token.end_user_id = end_user_params.get("end_user_id")
    valid_token.end_user_tpm_limit = end_user_params.get("end_user_tpm_limit")
    valid_token.end_user_rpm_limit = end_user_params.get("end_user_rpm_limit")
    valid_token.allowed_model_region = end_user_params.get("allowed_model_region")
    valid_token.end_user_max_budget = end_user_params.get("end_user_max_budget")
    valid_token.end_user_model_max_budget = end_user_params.get("end_user_model_max_budget")
    return valid_token

---

INSERT INTO "LiteLLM_BudgetTable" (budget_id, max_budget) VALUES ('budget-small', 2.0);
   INSERT INTO "LiteLLM_BudgetTable" (budget_id, max_budget) VALUES ('budget-large', 20.0);
   INSERT INTO "LiteLLM_EndUserTable" (user_id, budget_id) VALUES ('alice', 'budget-small');
   INSERT INTO "LiteLLM_EndUserTable" (user_id, budget_id) VALUES ('bob', 'budget-large');

---

Probe added inside [`_get_end_user_budget_counter`](litellm/proxy/spend_tracking/budget_reservation.py:323) to print both candidate values for the effective max budget:


[DEBUG-eubud2] end_user_id=ffbb11f4-...
               token.end_user_max_budget=2.0
               end_user_object.budget_id=None
               end_user_object.litellm_budget_table.max_budget=20.0
               effective_max_budget=2.0
               end_user_object_present=True


Two surprising things:
- `end_user_object.litellm_budget_table.max_budget=20.0` (the correct, DB-joined value)
- but `token.end_user_max_budget=2.0` (a *different* end-user's value)
- and `effective_max_budget` chose the token value (`2.0`) — i.e., the cached value wins

A second probe in [`LiteLLM_VerificationTokenView.__init__`](litellm/proxy/_types.py:2631) with a stack trace pinned the source:


File "litellm/proxy/auth/user_api_key_auth.py", line 1119, in _user_api_key_auth_builder
    valid_token = await get_key_object(
File "litellm/proxy/auth/auth_checks.py", line 2378, in get_key_object
    user_api_key_auth = await user_api_key_cache.async_get_cache(
File "litellm/proxy/common_utils/user_api_key_cache.py", line 124, in async_get_cache
    decoded = CacheCodec.deserialize(cached, model_type=model_type)
File "litellm/proxy/common_utils/cache_pydantic_utils.py", line 85, in deserialize
    return model_type.model_validate(cached)


i.e., the `end_user_max_budget=2.0` was already present in the cached payload before any end-user lookup ran for this request.

SQL confirmed no DB anomaly: `LiteLLM_EndUserTable.user_id` is the PK; bob's row had the correct `budget_id` and `spend`.
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Summary

When multiple end-users share a single virtual key (canonical OpenWebUI pattern: one proxy key + per-user user field on /chat/completions), the proxy's in-memory user_api_key_cache poisons subsequent requests with the first end-user's budget fields. This causes sporadic BudgetExceededError responses for end-users whose true budget is much larger than the cached value — the DB row is correct, but the cached UserAPIKeyAuth carries a stale end_user_max_budget that wins over the freshly-joined DB value during budget reservation.

Symptom: an end-user with LiteLLM_EndUserTable.budget_id pointing at a budget of 20.0 sporadically receives:

BudgetExceededError: Budget has been exceeded! Current cost: 4.08, Max budget: 2.0

…where 2.0 is the budget of a different end-user whose request happened to populate the cache entry earlier.


Affected versions

  • Introduced in: v1.84.0, commit 6ff668c7aa — PR #27245 "[Infra] Promote internal staging to main"
  • Still present on: litellm_internal_staging (verified as of this report) and current main

The two buggy functions are unchanged from the introducing commit:


Root cause

The pipeline does roughly:

  1. _user_api_key_auth_builder looks up the virtual key via get_key_object, which goes through user_api_key_cache.async_get_cache(...) and deserializes the cached payload via UserAPIKeyAuth.model_validate(cached).
  2. Whichever request first populates that cache entry calls _copy_user_api_key_auth_for_cache on a UserAPIKeyAuth that has already been merged with this request's end_user_params — including end_user_max_budget, end_user_tpm_limit, etc.
  3. The copy helper nulls budget_reservation, parent_otel_span, request_route, but not the end_user_* fields, so they get serialized into the cache and re-hydrated on every subsequent request that hits the same virtual key — even for completely different end-users.
  4. On a cache hit, update_valid_token_with_end_user_params writes end_user_id, end_user_tpm_limit, end_user_rpm_limit, allowed_model_region — but not end_user_max_budget or end_user_model_max_budget. So the cached, poisoned values survive.
  5. Later, _get_end_user_budget_counter prefers valid_token.end_user_max_budget over the freshly-joined end_user_object.litellm_budget_table.max_budget — so the stale cached value wins, and the request is rejected against the wrong budget.

The bug is sporadic because it depends on which end-user happens to populate the cache first. After cache TTL expires, the next "first request" reseeds the poisoned value. This matches the user-reported "after a few failing retries, requests finally go through again" pattern (the cache turns over).

Affects single-pod, no-Redis deployments — pure in-memory user_api_key_cache.

Proposed fix

Two small changes:

1. Strip per-request end-user fields before caching the auth object

litellm/proxy/auth/auth_checks.py_copy_user_api_key_auth_for_cache:

def _copy_user_api_key_auth_for_cache(
    user_api_key_obj: UserAPIKeyAuth,
) -> UserAPIKeyAuth:
    copied_key_obj = user_api_key_obj.model_copy()
    copied_key_obj.budget_reservation = None
    copied_key_obj.parent_otel_span = None
    copied_key_obj.request_route = None
    # End-user-derived fields are PER-REQUEST and must NOT be cached against the
    # virtual key, otherwise they leak across end-users that share one key
    # (e.g. OpenWebUI: one proxy key + per-user `user` body field).
    copied_key_obj.end_user_id = None
    copied_key_obj.end_user_tpm_limit = None
    copied_key_obj.end_user_rpm_limit = None
    copied_key_obj.end_user_max_budget = None
    copied_key_obj.end_user_model_max_budget = None
    copied_key_obj.allowed_model_region = None
    return copied_key_obj

2. Write the full set of end-user fields back onto the per-request token on cache hits

litellm/proxy/auth/user_api_key_auth.pyupdate_valid_token_with_end_user_params:

def update_valid_token_with_end_user_params(
    valid_token: UserAPIKeyAuth, end_user_params: dict
) -> UserAPIKeyAuth:
    # End-user-derived fields are PER-REQUEST. Always overwrite — never `or`-merge —
    # to avoid bleeding values from a previously-cached request that shared this key.
    valid_token.end_user_id = end_user_params.get("end_user_id")
    valid_token.end_user_tpm_limit = end_user_params.get("end_user_tpm_limit")
    valid_token.end_user_rpm_limit = end_user_params.get("end_user_rpm_limit")
    valid_token.allowed_model_region = end_user_params.get("allowed_model_region")
    valid_token.end_user_max_budget = end_user_params.get("end_user_max_budget")
    valid_token.end_user_model_max_budget = end_user_params.get("end_user_model_max_budget")
    return valid_token

Both changes have been validated locally in a our fork; the production symptom disappears immediately after deploy.


Suggested regression test

A test that:

  1. Creates one virtual key, two end-users with different budget_ids (max budgets 2.0 and 20.0).
  2. Calls user_api_key_auth twice with the same key but different user body values, in order: first the 2.0-budget end-user (to seed the cache), then the 20.0-budget end-user.
  3. Asserts the second call's resolved UserAPIKeyAuth.end_user_max_budget == 20.0 (or that the budget reservation uses 20.0, not 2.0).

This would have caught the regression introduced in #27245.

Steps to Reproduce

Prereqs: single proxy pod (no Redis), one virtual key, two end-users with different per-end-user budgets.

  1. Create a virtual key with no end-user constraints.
  2. Create two budgets and two end-users:
    INSERT INTO "LiteLLM_BudgetTable" (budget_id, max_budget) VALUES ('budget-small', 2.0);
    INSERT INTO "LiteLLM_BudgetTable" (budget_id, max_budget) VALUES ('budget-large', 20.0);
    INSERT INTO "LiteLLM_EndUserTable" (user_id, budget_id) VALUES ('alice', 'budget-small');
    INSERT INTO "LiteLLM_EndUserTable" (user_id, budget_id) VALUES ('bob', 'budget-large');
  3. From a fresh proxy boot, send a /chat/completions request with the shared virtual key and "user": "alice". Drive alice's spend to ~1.9.
  4. Now send a request with the same key and "user": "bob".
  5. Observe: bob is checked against max_budget=2.0 (Alice's), not 20.0 (his own).

Equivalently, the UserAPIKeyAuth returned by get_key_object for bob's request will have end_user_id="bob" but end_user_max_budget=2.0.

Relevant log output

Probe added inside [`_get_end_user_budget_counter`](litellm/proxy/spend_tracking/budget_reservation.py:323) to print both candidate values for the effective max budget:


[DEBUG-eubud2] end_user_id=ffbb11f4-...
               token.end_user_max_budget=2.0
               end_user_object.budget_id=None
               end_user_object.litellm_budget_table.max_budget=20.0
               effective_max_budget=2.0
               end_user_object_present=True


Two surprising things:
- `end_user_object.litellm_budget_table.max_budget=20.0` (the correct, DB-joined value)
- but `token.end_user_max_budget=2.0` (a *different* end-user's value)
- and `effective_max_budget` chose the token value (`2.0`) — i.e., the cached value wins

A second probe in [`LiteLLM_VerificationTokenView.__init__`](litellm/proxy/_types.py:2631) with a stack trace pinned the source:


File "litellm/proxy/auth/user_api_key_auth.py", line 1119, in _user_api_key_auth_builder
    valid_token = await get_key_object(
File "litellm/proxy/auth/auth_checks.py", line 2378, in get_key_object
    user_api_key_auth = await user_api_key_cache.async_get_cache(
File "litellm/proxy/common_utils/user_api_key_cache.py", line 124, in async_get_cache
    decoded = CacheCodec.deserialize(cached, model_type=model_type)
File "litellm/proxy/common_utils/cache_pydantic_utils.py", line 85, in deserialize
    return model_type.model_validate(cached)


i.e., the `end_user_max_budget=2.0` was already present in the cached payload before any end-user lookup ran for this request.

SQL confirmed no DB anomaly: `LiteLLM_EndUserTable.user_id` is the PK; bob's row had the correct `budget_id` and `spend`.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.85.0

Twitter / LinkedIn details

No response

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING