litellm: Prisma Query Engine startup race condition causes spend data loss during rolling deployments


Summary

During Kubernetes rolling deployments, LiteLLM starts Uvicorn and immediately schedules background jobs (update_spend, load_credentials, add_deployment, budget cache population) before the embedded Prisma Query Engine subprocess is ready to accept connections. This causes a ~5–6 minute window per pod where all DB operations fail silently, resulting in lost spend data and degraded startup state.

Environment

  • LiteLLM version: 1.83.14
  • Deployment: Kubernetes rolling deploy (multiple pods)
  • Container base image: Wolfi-based (wolfi Linux distro)
  • Database: PostgreSQL via Prisma

Steps to Reproduce

  1. Deploy LiteLLM as a Kubernetes rolling deployment (multiple replicas)
  2. Trigger a rolling pod restart (e.g. version upgrade)
  3. Observe logs on newly started pods

What Happens

On pod startup, Uvicorn comes up and immediately starts background jobs while the embedded Prisma Query Engine (a Rust binary that runs as a local subprocess) is still initializing. Initialization is especially slow on Wolfi-based containers because Prisma does not recognize the wolfi distro and falls back to Debian engine binaries at runtime:

prisma:warn Prisma doesn't know which engines to download for the Linux distro "wolfi".
Falling back to Prisma engines built "debian".

This adds extra engine startup latency. Before the engine is ready, all background jobs fail with:

httpx.ConnectError: All connection attempts failed
LiteLLM Prisma Client Exception get_generic_data: All connection attempts failed
Error in spend logs queue monitor: All connection attempts failed
Budget lookup failed for user; cache will not be populated. Each request will hit the database.
litellm.proxy.proxy_server.py::ProxyConfig:add_deployment - All connection attempts failed
litellm.proxy_server.py::get_credentials() - Error getting credentials from DB - All connection attempts failed

The failure window lasts ~5–6 minutes per pod during a rolling deploy across a multi-pod fleet.

Impact

  1. Spend data is permanently lost: litellm_spendlogs.create_many() fails with ConnectError and the batched spend records are discarded. No retry mechanism exists:
Error in spend logs queue monitor: All connection attempts failed
  File "litellm/proxy/utils.py", line 3880, in update_spend_logs
    await prisma_client.db.litellm_spendlogs.create_many(data=batch_with_dates, skip_duplicates=True)
  2. Budget cache is not populated, so every request during the window hits the database directly:
Budget lookup failed for user; cache will not be populated. Each request will hit the database.
Error: All connection attempts failed.
  3. Model deployments and credentials are not loaded, so routing may be incomplete for the first several minutes after a pod starts.

Root Cause

The Prisma client in LiteLLM communicates with an embedded Query Engine subprocess over localhost HTTP (not directly to the database). The engine process starts asynchronously alongside the application. There is no readiness gate between "Uvicorn is accepting traffic" and "Prisma engine is accepting queries."

Reference: prisma/engine/http.py, prisma/engine/query.py; httpx.ConnectError is raised when the local engine socket is not yet bound.
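To make the missing readiness gate concrete, here is a minimal sketch of one built on asyncio.Event. ReadinessGate, open_when_ready, and run_when_ready are hypothetical names for illustration and are not part of LiteLLM or Prisma; the probe is whatever coroutine fails while the engine is down.

```python
import asyncio
from typing import Awaitable, Callable


class ReadinessGate:
    """Holds DB-dependent jobs back until a probe succeeds (hypothetical helper)."""

    def __init__(self) -> None:
        self._ready = asyncio.Event()

    async def open_when_ready(
        self,
        probe: Callable[[], Awaitable[object]],
        retries: int = 30,
        delay: float = 5.0,
    ) -> None:
        """Run `probe` until it stops raising, then release all waiting jobs."""
        for _ in range(retries):
            try:
                await probe()
            except Exception:
                # Engine not reachable yet: wait, then probe again.
                await asyncio.sleep(delay)
            else:
                self._ready.set()
                return
        raise RuntimeError("engine did not become ready in time")

    async def run_when_ready(self, job: Callable[[], Awaitable[object]]) -> object:
        """Wait for the gate to open, then run `job` once."""
        await self._ready.wait()
        return await job()
```

In this sketch each startup job would be wrapped in run_when_ready, so nothing touches the database until the probe has succeeded at least once. Where such a gate would hook into the proxy's startup code is left open here.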

Expected Behavior

Background jobs that depend on DB connectivity should either:

  • Wait for Prisma engine readiness before being scheduled (e.g. probe the local engine with a SELECT 1 equivalent)
  • Retry with exponential backoff on startup DB operations rather than failing immediately and dropping data
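The second option could look roughly like the helper below. It is a sketch only: retry_with_backoff and its parameters are hypothetical names, and the delays are placeholders rather than tuned values.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Call `op` until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await op()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Waits grow as base_delay * 1, 2, 4, ... and are capped at max_delay.
            await asyncio.sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

A startup operation would be passed in as a zero-argument coroutine function, for example a small wrapper around the credentials or deployment load. Because the last error is re-raised, a caller can still tell a slow engine apart from one that never came up.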

Suggested Fix

In proxy_server.py or the scheduler initialization, add a readiness check before starting DB-dependent background tasks:

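import asyncio

async def wait_for_prisma_engine(retries: int = 30, delay: float = 5.0) -> None:
    """Wait for the Prisma Query Engine subprocess to be ready."""
    # prisma_client is the proxy's existing Prisma client, defined elsewhere.
    # Note: 30 x 5 s = 150 s, shorter than the ~5-6 minute window reported above,
    # so retries would need to be raised to cover it.
    for _ in range(retries):
        try:
            # A trivial query serves as the probe.
            await prisma_client.db.execute_raw("SELECT 1")
            return
        except Exception:
            # Engine not reachable yet: wait, then probe again.
            await asyncio.sleep(delay)
    raise RuntimeError("Prisma engine did not become ready in time")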

Alternatively, _monitor_spend_logs_queue and other startup jobs should implement retry/backoff rather than immediately propagating ConnectError as a fatal failure.
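For the spend-log path specifically, the key property is that a failed write must not drop the batch. The sketch below illustrates that idea with a hypothetical flush_spend_batch function and an in-memory list as the queue; it is not the actual _monitor_spend_logs_queue implementation. Here write_batch stands in for a wrapper around the create_many call shown in the traceback.

```python
import asyncio
from typing import Awaitable, Callable, List


async def flush_spend_batch(
    queue: List[dict],
    write_batch: Callable[[List[dict]], Awaitable[object]],
) -> bool:
    """Try to persist everything in `queue`; keep the records if the write fails."""
    if not queue:
        return True
    batch = list(queue)  # snapshot, so records appended during the await are untouched
    try:
        await write_batch(batch)
    except Exception:
        # Leave the records queued so the next monitor tick can retry them.
        return False
    # Remove only what was written.
    del queue[: len(batch)]
    return True
```

The queue in this sketch lives in memory, so records are still lost if the pod is terminated before a retry succeeds. The skip_duplicates=True flag in the original call should make a repeated write of the same records harmless, assuming each record carries a unique key.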

Additional Context

This issue is exacerbated by the wolfi distro fallback — adding Wolfi to Prisma's supported distros list would reduce the engine startup latency that widens the race window. However, the core issue exists regardless of distro.

Related PRs that touch the startup path: #24682 (increment_spend_counters), #23019 (Redis transaction buffer startup guard).

cc @michelligabriele @yuneng-berri
