litellm - Prisma Query Engine startup race condition causes spend data loss during rolling deployments

Error Message

httpx.ConnectError: All connection attempts failed
LiteLLM Prisma Client Exception get_generic_data: All connection attempts failed
Error in spend logs queue monitor: All connection attempts failed
Budget lookup failed for user; cache will not be populated. Each request will hit the database.
litellm.proxy.proxy_server.py::ProxyConfig:add_deployment - All connection attempts failed
litellm.proxy_server.py::get_credentials() - Error getting credentials from DB - All connection attempts failed

Root Cause

The Prisma client in LiteLLM communicates with an embedded Query Engine subprocess over localhost HTTP (not directly to the database). The engine process starts asynchronously alongside the application. There is no readiness gate between "Uvicorn is accepting traffic" and "Prisma engine is accepting queries."

Reference: prisma/engine/http.py → prisma/engine/query.py → httpx.ConnectError when the local engine socket is not yet bound.
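
To make the "socket is not yet bound" condition concrete, here is a minimal sketch of a TCP probe. The function name is hypothetical, and the report does not say which port the engine listens on, so host and port are left as parameters rather than guessed.

```python
import socket


def engine_port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "connection refused" (nothing bound yet) and timeouts.
        return False
```

While a probe like this returns False, any HTTP request to the local engine fails at connect time, which is what surfaces as httpx.ConnectError in the logs above.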

Summary

During Kubernetes rolling deployments, LiteLLM starts Uvicorn and immediately schedules background jobs (update_spend, load_credentials, add_deployment, budget cache population) before the embedded Prisma Query Engine subprocess is ready to accept connections. This causes a ~5–6 minute window per pod where all DB operations fail silently, resulting in lost spend data and degraded startup state.

Environment

LiteLLM version: 1.83.14
Deployment: Kubernetes rolling deploy (multiple pods)
Container base image: Wolfi-based (wolfi Linux distro)
Database: PostgreSQL via Prisma

Steps to Reproduce

1. Deploy LiteLLM as a Kubernetes rolling deployment (multiple replicas).
2. Trigger a rolling pod restart (e.g. version upgrade).
3. Observe logs on newly started pods.

What Happens

On pod startup, Uvicorn comes up and immediately starts background jobs. The embedded Prisma Query Engine (a Rust binary that runs as a local subprocess) is still initializing — especially slow on Wolfi-based containers because Prisma does not recognize the wolfi distro and falls back to Debian engine binaries at runtime:

prisma:warn Prisma doesn't know which engines to download for the Linux distro "wolfi".
Falling back to Prisma engines built "debian".

This adds extra engine startup latency. Before the engine is ready, all background jobs fail with:

httpx.ConnectError: All connection attempts failed
LiteLLM Prisma Client Exception get_generic_data: All connection attempts failed
Error in spend logs queue monitor: All connection attempts failed
Budget lookup failed for user; cache will not be populated. Each request will hit the database.
litellm.proxy.proxy_server.py::ProxyConfig:add_deployment - All connection attempts failed
litellm.proxy_server.py::get_credentials() - Error getting credentials from DB - All connection attempts failed

The failure window lasts ~5–6 minutes per pod during a rolling deploy across a multi-pod fleet.

Impact

Spend data is permanently lost — litellm_spendlogs.create_many() fails with ConnectError and the batched spend records are discarded. No retry mechanism exists:

Error in spend logs queue monitor: All connection attempts failed
  File "litellm/proxy/utils.py", line 3880, in update_spend_logs
    await prisma_client.db.litellm_spendlogs.create_many(data=batch_with_dates, skip_duplicates=True)

Budget cache is not populated — every request during the window hits the database directly:

Budget lookup failed for user; cache will not be populated. Each request will hit the database.
Error: All connection attempts failed.

Model deployments and credentials not loaded — routing may be incomplete for the first several minutes after a pod starts.

Expected Behavior

Background jobs that depend on DB connectivity should either:

1. Wait for Prisma engine readiness before being scheduled (e.g. probe the local engine with a SELECT 1 equivalent), or
2. Retry with exponential backoff on startup DB operations rather than failing immediately and dropping data.
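
The retry option could look roughly like the sketch below. This is not existing LiteLLM code: retry_with_backoff and its parameters are hypothetical names, and the defaults are illustrative only.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    op: Callable[[], Awaitable[T]],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
) -> T:
    """Run op(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return await op()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Jitter keeps pods that restarted together from retrying in lockstep.
            await asyncio.sleep(delay * random.uniform(0.5, 1.0))
    raise RuntimeError("max_attempts must be at least 1")
```

A startup job would wrap its DB call in a zero-argument coroutine function and pass it as op. With the observed 5–6 minute window, the attempt count and delay cap would need to be set so the total wait covers it.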

Suggested Fix

In proxy_server.py or the scheduler initialization, add a readiness check before starting DB-dependent background tasks:

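import asyncio

async def wait_for_prisma_engine(retries=30, delay=5):
    """Wait for the Prisma Query Engine subprocess to be ready."""
    # prisma_client is assumed to be the module-level client in proxy_server.py.
    for _ in range(retries):
        try:
            # A trivial query succeeds only once the local engine accepts connections.
            await prisma_client.db.execute_raw("SELECT 1")
            return
        except Exception:
            # Engine not reachable yet; wait before probing again.
            await asyncio.sleep(delay)
    raise RuntimeError("Prisma engine did not become ready in time")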
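
For illustration, the gating itself might be wired up as in the following sketch. It is deliberately generic: start_db_jobs_when_ready, probe and jobs are hypothetical names, and the probe stands in for the SELECT 1 check so the example does not depend on LiteLLM internals.

```python
import asyncio
from typing import Awaitable, Callable, Iterable


async def start_db_jobs_when_ready(
    probe: Callable[[], Awaitable[None]],
    jobs: Iterable[Callable[[], Awaitable[None]]],
    retries: int = 30,
    delay: float = 5.0,
) -> list:
    """Poll probe() until it succeeds, then schedule each job as a task."""
    for _ in range(retries):
        try:
            await probe()
            break
        except Exception:
            await asyncio.sleep(delay)
    else:
        # The loop finished without a single successful probe.
        raise RuntimeError("database engine did not become ready in time")
    return [asyncio.create_task(job()) for job in jobs]
```

The point of the design is ordering: no DB-dependent task is created until one probe has succeeded, so jobs never observe the not-yet-bound engine.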

Alternatively, _monitor_spend_logs_queue and other startup jobs should implement retry/backoff rather than immediately propagating ConnectError as a fatal failure.
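
One way to avoid dropping a batch is to put it back on the queue when the write fails. The sketch below assumes an in-memory deque and a write_batch callable; both are hypothetical stand-ins and do not reflect LiteLLM's actual queue implementation.

```python
import asyncio
from collections import deque
from typing import Awaitable, Callable, Deque, List


async def flush_spend_batch(
    queue: Deque[dict],
    write_batch: Callable[[List[dict]], Awaitable[None]],
    batch_size: int = 100,
) -> bool:
    """Write up to batch_size queued records; put them back if the write fails."""
    batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
    if not batch:
        return True
    try:
        await write_batch(batch)
        return True
    except Exception:
        # Restore the records at the front, in their original order, so the
        # next flush retries them instead of losing them.
        queue.extendleft(reversed(batch))
        return False
```

Because the existing create_many call already passes skip_duplicates=True, re-sending a batch after an ambiguous failure should not create duplicate rows. A real implementation would also want a bound on queue growth while the engine stays down.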

Additional Context

This issue is exacerbated by the wolfi distro fallback — adding Wolfi to Prisma's supported distros list would reduce the engine startup latency that widens the race window. However, the core issue exists regardless of distro.

Related PRs that touch the startup path: #24682 (increment_spend_counters), #23019 (Redis transaction buffer startup guard).

cc @michelligabriele @yuneng-berri
