litellm - ✅(Solved) Fix [Bug]: Redis user_api_key_cache deserialization always fails for team-scoped keys (LiteLLM_UserTable.user_id required) [1 pull requests, 1 participants]

aivong-openhands · 2026-05-13T22:18:12Z



GitHub stats

BerriAI/litellm#27874•Fetched 2026-05-14 03:30:02


Author

aivong-openhands

Participants

aivong-openhands

After upgrading to a build that includes #26202 (the CacheCodec token-verification query optimization), the Redis-backed user_api_key_cache emits a continuous stream of ERROR-level deserialization failures for team-scoped virtual keys. Functionally requests still succeed (the codec correctly treats a ValidationError as a cache miss and falls through to the DB), but every team-scoped lookup pays an extra DB round-trip and the logs are very noisy.


PR fix notes

PR #26202: Litellm token verification query optimization

Repository: BerriAI/litellm
Author: harish-berri
State: closed | merged: True
Link: https://github.com/BerriAI/litellm/pull/26202

Description (problem / solution / changelog)

Relevant issues

Problem

In Kubernetes or any multi-replica deployment, per-process memory cache is not shared. Requests can hit different pods, so in-memory-only user_api_key_cache does not deduplicate work across nodes and can yield inconsistent views of the same key.

Using Redis with DualCache shares cached key metadata across replicas and reduces redundant database lookups—if serialized values are JSON-safe on write and reads rehydrate to the types the proxy expects (e.g. UserAPIKeyAuth).

This is amplified when each pod runs multiple workers: each worker is a separate subprocess, so workers don't share memory, and the number of DB calls on this path scales as num_pods × num_workers_in_each_pod × cache_miss_ratio.
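One way to read the formula above is as a count of independent in-memory caches, each of which misses on its own. The sketch below is illustrative only: the function name and the pod/worker counts are assumptions, not values taken from the PR.

```python
def db_lookups_per_round(num_pods: int, workers_per_pod: int, cache_miss_ratio: float) -> float:
    """Expected DB lookups when every worker process serves one request for the same key."""
    if not 0.0 <= cache_miss_ratio <= 1.0:
        raise ValueError("cache_miss_ratio must be between 0 and 1")
    # Each worker is a separate process with a private in-memory cache,
    # so the number of independent caches is pods * workers.
    independent_caches = num_pods * workers_per_pod
    return independent_caches * cache_miss_ratio


# Hypothetical deployment: 4 pods with 4 workers each, all caches cold.
print(db_lookups_per_round(4, 4, 1.0))
# Same deployment with half of the lookups served from a warm cache.
print(db_lookups_per_round(4, 4, 0.5))
```

With a shared Redis layer, a key fetched by one worker can be served to the others, which is the effect the PR is after.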

Without a single encode/decode rule, Redis often returns a dict after JSON round-trip while memory may hold BaseModel instances. Code that assumes attributes (e.g. .spend) then fails with AttributeError, and spend/cache updates become fragile.
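A minimal sketch of the failure mode described above, assuming Pydantic v2. `CachedKey` is a hypothetical stand-in for `UserAPIKeyAuth`, and the JSON round trip stands in for a Redis write followed by a read.

```python
import json

from pydantic import BaseModel


class CachedKey(BaseModel):  # hypothetical stand-in for UserAPIKeyAuth
    token: str
    spend: float = 0.0


# An in-memory hit is still a model instance.
in_memory_hit = CachedKey(token="sk-example", spend=1.5)

# A JSON-backed cache hands back a plain dict after the round trip.
redis_hit = json.loads(json.dumps(in_memory_hit.model_dump(mode="json")))


def read_spend(cached):
    # Code written against the model type assumes attribute access.
    return cached.spend


print(read_spend(in_memory_hit))
try:
    read_spend(redis_hit)
except AttributeError as exc:
    print(f"AttributeError: {exc}")
```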

Approach

CacheCodec (litellm/proxy/common_utils/cache_pydantic_utils.py) centralizes the boundary:

serialize — validate + model_dump(mode="json", exclude_none=True) when writing (aligned with Redis / json.dumps).
deserialize — model_validate on dict hits; ValidationError → treat as miss (None) + warning log (no bad data served).

Scope note: CacheCodec is for Pydantic BaseModel + dict only; stdlib dataclasses are not supported—convert at the call site (e.g. dataclasses.asdict) or use a Pydantic model.
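The boundary described above can be sketched roughly as follows, assuming Pydantic v2. This is not the code in `cache_pydantic_utils.py`: the function names, the logger name, and the `KeyRecord` model are hypothetical, and only the `model_dump` / `model_validate` behavior is taken from the PR description.

```python
import json
import logging
from typing import Any, Optional, Type, TypeVar

from pydantic import BaseModel, ValidationError

logger = logging.getLogger("cache_codec_sketch")
ModelT = TypeVar("ModelT", bound=BaseModel)


def serialize_for_cache(value: Any, model_type: Type[ModelT]) -> dict:
    """Validate, then dump to a JSON-safe dict (None-valued fields are dropped)."""
    model = value if isinstance(value, model_type) else model_type.model_validate(value)
    return model.model_dump(mode="json", exclude_none=True)


def deserialize_from_cache(cached: Any, model_type: Type[ModelT]) -> Optional[ModelT]:
    """Rehydrate a cache hit; a payload that fails validation counts as a miss."""
    if cached is None:
        return None
    if isinstance(cached, model_type):
        return cached  # in-memory hit that is already a model instance
    try:
        return model_type.model_validate(cached)
    except ValidationError as exc:
        logger.warning("validation failed for %s: %s", model_type.__name__, exc)
        return None


class KeyRecord(BaseModel):  # hypothetical cached model
    token: str
    spend: float = 0.0
    team_id: Optional[str] = None


payload = serialize_for_cache(KeyRecord(token="sk-example", spend=2.0), KeyRecord)
restored = deserialize_from_cache(json.loads(json.dumps(payload)), KeyRecord)
print(payload)
print(restored.spend)
# A payload that cannot be validated is reported as a miss (None).
print(deserialize_from_cache({"spend": "not-a-number"}, KeyRecord))
```

Returning `None` on a validation failure keeps bad data from being served, at the cost of a fallback lookup each time the cached payload is rejected.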


Changes

Code changes (summary)

Area	Change
`cache_pydantic_utils.py`	`CacheCodec.serialize` / `deserialize`, validation-on-read behavior, dataclass disclaimer in docstrings
`proxy_server.py`	`_update_key_cache`: `deserialize` before attribute access; `serialize(..., UserAPIKeyAuth)` for pipeline writes
`auth_checks.py` (and related)	Use `CacheCodec` when reading/writing cached key/team/project objects
Caching / auth guards	Avoid `None` Redis keys and noisy errors where payloads are missing (e.g. team id / team object)
Tests	`tests/test_litellm/proxy/common_utils/test_cache_codec.py`

Testing

Status	Item
Done	Unit tests for `CacheCodec` in `tests/test_litellm/`
Done	End-to-end: Proxy with Redis-backed `user_api_key_cache`, multiple replicas (or workers), verify auth + spend paths, no `dict` vs model errors in logs
Done	One efficacy check: prove the Redis round-trip by writing a JSON-safe payload, reading it from another process / cold memory, and asserting that behavior matches the single-node baseline (e.g. key spend update or `get_key_object`)

Load Test (4 Pods)

Without Redis — combined view only (`SELECT v.* … FROM "LiteLLM_VerificationToken" AS v …`)

user_api_key_cache in-memory TTL: 60 seconds (UserAPIKeyCacheTTLEnum; no general_settings.user_api_key_cache_ttl override unless you set one).

Window	calls	total_ms	avg_ms	rows
5 min	1249	269.35	0.22	1249
10 min	1858	394.99	0.21	1858

With Redis — combined view only (same query)

With Redis enabled, the same default 60s TTL is applied to both in-memory and Redis when user_api_key_cache_ttl is not set (proxy_server mirrors TTL on both layers).

Window	calls	total_ms	avg_ms	rows
5 min	52	35.36	0.68	51
10 min	93	35.36	0.68	51
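As a quick reading of the two tables above, the relative drop in query count can be computed directly from the reported `calls` columns. The helper name is illustrative; the figures are copied from the tables.

```python
# (window, calls without Redis, calls with Redis), copied from the tables above
measurements = [("5 min", 1249, 52), ("10 min", 1858, 93)]


def reduction_pct(before: int, after: int) -> float:
    """Percentage of queries avoided relative to the baseline."""
    return 100.0 * (before - after) / before


for window, without_redis, with_redis in measurements:
    print(f"{window}: {reduction_pct(without_redis, with_redis):.1f}% fewer queries")
```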

Locust Results

15 different virtual keys
200 concurrent requests, with a 10-second ramp-up
Left: without the Redis cache; right: with the Redis cache
8 LiteLLM instances (pods) running

Changed files

.gitignore (modified, +2/-2)
litellm/caching/dual_cache.py (modified, +19/-0)
litellm/caching/redis_cache.py (modified, +9/-1)
litellm/integrations/prometheus.py (modified, +1/-0)
litellm/proxy/auth/auth_checks.py (modified, +165/-133)
litellm/proxy/auth/handle_jwt.py (modified, +16/-12)
litellm/proxy/auth/user_api_key_auth.py (modified, +35/-23)
litellm/proxy/common_utils/cache_coordinator.py (modified, +22/-3)
litellm/proxy/common_utils/cache_pydantic_utils.py (added, +93/-0)
litellm/proxy/common_utils/expired_ui_session_key_cleanup_manager.py (modified, +2/-2)
litellm/proxy/common_utils/user_api_key_cache.py (added, +162/-0)
litellm/proxy/management_endpoints/access_group_endpoints.py (modified, +9/-11)
litellm/proxy/management_endpoints/key_management_endpoints.py (modified, +10/-10)
litellm/proxy/management_endpoints/ui_sso.py (modified, +11/-10)
litellm/proxy/management_helpers/team_member_permission_checks.py (modified, +2/-2)
litellm/proxy/proxy_server.py (modified, +124/-81)
litellm/proxy/utils.py (modified, +3/-2)
tests/proxy_unit_tests/test_auth_checks.py (modified, +9/-2)
tests/proxy_unit_tests/test_user_api_key_auth.py (modified, +6/-1)
tests/test_litellm/caching/test_dual_cache.py (modified, +70/-0)
tests/test_litellm/proxy/auth/test_auth_checks.py (modified, +19/-17)
tests/test_litellm/proxy/auth/test_handle_jwt.py (modified, +6/-4)
tests/test_litellm/proxy/common_utils/test_cache_codec.py (added, +126/-0)
tests/test_litellm/proxy/common_utils/test_user_api_key_cache.py (added, +219/-0)
tests/test_litellm/proxy/management_endpoints/test_access_group_endpoints.py (modified, +43/-14)
tests/test_litellm/proxy/management_endpoints/test_team_endpoints.py (modified, +3/-0)
tests/test_litellm/proxy/test_redis_auth_cache_flag.py (added, +145/-0)



Version

litellm/litellm-database:1.84.0-rc.1

Summary

Symptom

Repeating pair of log lines, once per team-scoped request:

WARNING cache_pydantic_utils.py:87
  CacheCodec.deserialize: validation failed for LiteLLM_UserTable
  (1 validation error for LiteLLM_UserTable
   user_id
     Field required [type=missing, input_value={'team_alias': '<uuid>', ..., 'teams': []}, input_type=dict])

ERROR user_api_key_cache.py:126
  UserApiKeyCache.async_get_cache failed to deserialize cached value
  for key='<uuid>' model_type=LiteLLM_UserTable

The input_value snippet shows team_alias is set but user_id is absent. The same handful of UUIDs cycles in the logs — these are our team-scoped virtual keys (no associated user).

Root cause (likely)

Inconsistency introduced by CacheCodec (litellm/proxy/common_utils/cache_pydantic_utils.py):

Writer: CacheCodec.serialize uses model_dump(mode="json", exclude_none=True).
Reader: CacheCodec.deserialize uses model_validate, and LiteLLM_UserTable.user_id is declared required.

For any cached LiteLLM_UserTable instance where user_id is None — i.e. team-scoped API keys with no associated user — the writer strips the field via exclude_none=True, then the reader rejects the cached payload because user_id is required. The entry is guaranteed to fail deserialization on every read, for the lifetime of the key.
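The writer/reader mismatch described above can be reproduced in isolation, assuming Pydantic v2. `UserRow` is a hypothetical stand-in, not the real `LiteLLM_UserTable`; it declares `user_id` as required but nullable so that an instance with `user_id=None` can exist at all.

```python
from typing import List, Optional

from pydantic import BaseModel, ValidationError


class UserRow(BaseModel):  # hypothetical stand-in, not the real LiteLLM_UserTable
    user_id: Optional[str]  # no default: required, but None is an accepted value
    team_alias: Optional[str] = None
    teams: List[str] = []


row = UserRow(user_id=None, team_alias="team-a")

# Writer side: exclude_none=True drops the user_id key entirely.
payload = row.model_dump(mode="json", exclude_none=True)
print(payload)

# Reader side: the key is now missing, so the required field fails validation.
try:
    UserRow.model_validate(payload)
except ValidationError as exc:
    first_error = exc.errors()[0]
    print(first_error["type"], first_error["loc"])
```

The resulting error has type `missing` on `user_id`, the same shape as the `Field required [type=missing, ...]` line in the log excerpt.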

Reproduction

Deploy a LiteLLM proxy build at or after the one containing #26202, with Redis-backed user_api_key_cache (multi-replica).
Create a virtual key scoped to a team with no user (so the underlying LiteLLM_UserTable row has user_id = NULL).
Send requests using that key.
Observe the ERROR/WARNING pair above repeating on every cache lookup.

Impact

Functional: none. Requests succeed; the codec treats the validation failure as a miss.
Operational: every team-scoped auth check round-trips to Postgres instead of being served from Redis — defeats the optimization #26202 was designed to deliver for exactly this multi-replica scenario.
Observability: ERROR-level logs at high cardinality; pollutes log aggregation / alerting.

Suggested fixes (for discussion)

One of:

Make LiteLLM_UserTable.user_id Optional[str] = None for the cache model (it can legitimately be None for team-scoped keys).
Have CacheCodec.serialize drop exclude_none=True for models with required fields, so absent values round-trip as None rather than missing.
Use model_dump(mode="json") without exclude_none for LiteLLM_UserTable specifically, or have the reader tolerate missing optional-by-semantics fields.
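A hedged sketch of how two of the suggestions above behave on a round trip, assuming Pydantic v2. Both model classes are hypothetical and are not LiteLLM's definitions; which option fits the real codebase is left to the maintainers.

```python
from typing import Optional

from pydantic import BaseModel


class RequiredNullable(BaseModel):  # hypothetical: required field that accepts None
    user_id: Optional[str]
    team_alias: Optional[str] = None


class OptionalWithDefault(BaseModel):  # hypothetical: user_id defaults to None
    user_id: Optional[str] = None
    team_alias: Optional[str] = None


# Option A: the field has a default, so dropping None on write is harmless.
a_payload = OptionalWithDefault(team_alias="team-a").model_dump(mode="json", exclude_none=True)
a_restored = OptionalWithDefault.model_validate(a_payload)

# Option B: keep None values on write so the key is still present on read.
b_payload = RequiredNullable(user_id=None, team_alias="team-a").model_dump(mode="json")
b_restored = RequiredNullable.model_validate(b_payload)

print(a_payload, a_restored.user_id)
print(b_payload, b_restored.user_id)
```

Option A changes the model and keeps payloads small; option B changes the writer and stores explicit nulls in Redis.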

Configuration

general_settings:
  user_api_key_cache_ttl: 5

(Redis configured via REDIS_HOST / REDIS_PORT env vars; standard Bitnami redis chart deployed in-cluster.)
