litellm - 💡(How to fix) Fix Proxy silently drops all models after extended runtime (config-only, no DB) [1 comments, 1 participants]

litellm · 2026-04-08 14:57:58

GitHub stats

BerriAI/litellm#25350 · Fetched 2026-04-09 07:52:37

Author: Erlend2k

Participants: Erlend2k

Timeline (top): closed ×1, commented ×1, labeled ×1

After extended runtime (8-14+ hours), the LiteLLM proxy silently drops all models. /v1/models and /model/info return {"data": []} (0 models), while /health continues to report healthy with DB connected.

All downstream consumers that route through the proxy start failing with model-not-found errors. There is no log output indicating the models were dropped.

Error Message

Models silently disappear from llm_router.model_list after extended runtime. No error is logged. The health endpoint masks the issue by reporting healthy.

Fix Action

Workaround

We run an external health check (systemd timer, every 10 minutes) that queries /v1/models, checks model count, and auto-restarts the proxy via systemctl restart if count drops to 0.

Bug Report

Environment

LiteLLM version: 1.83.4 (latest as of 2026-04-08)
OS: Ubuntu 24.04
Python: 3.12
Setup: Config-only (config.yaml), no Prisma/DB for model storage
Deployment: systemd user service, --num_workers 1

Description

All downstream consumers that route through the proxy start failing with model-not-found errors. There is no log output indicating the models were dropped.

Reproduction Steps

Start LiteLLM proxy with a config.yaml containing multiple model definitions (we have 26 models across Vertex AI and Anthropic providers)
Verify models are loaded: curl /v1/models returns all 26
Leave the proxy running for 8-14+ hours under normal load
Check again: curl /v1/models returns {"data": []} — 0 models
/health still reports healthy
Restarting the proxy immediately restores all 26 models

Expected Behavior

Models loaded from config.yaml should persist for the lifetime of the proxy process.

Actual Behavior

Models silently disappear from llm_router.model_list after extended runtime. No error is logged. The health endpoint masks the issue by reporting healthy.

Suspected Root Cause

Based on code review of proxy_server.py:

_schedule_regular_config_update() runs a background task that periodically reloads config. If this reload silently fails or only partially executes, it could wipe llm_router.model_list.
The model_cost_map_reload scheduled task (runs every 24h by default) may also contribute; see #24349 and #24308 for related issues where this task caused unexpected proxy state.
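The suspected failure mode can be illustrated with a toy router. This is not LiteLLM's code: ToyRouter, unsafe_reload, safe_reload, and load_models are hypothetical names. The unsafe version clears the list before rebuilding it and swallows errors, which reproduces the reported symptom (an empty list and no log line). The safer version builds the new list first, refuses to replace a non-empty list with an empty one, logs failures, and swaps the list in with a single assignment.

```python
"""Toy illustration of the suspected reload hazard. Not LiteLLM code; all names hypothetical."""
import logging
from typing import Callable, Iterable

log = logging.getLogger("reload_demo")

Loader = Callable[[], Iterable[dict]]


class ToyRouter:
    def __init__(self, model_list: list[dict]) -> None:
        self.model_list = list(model_list)


def unsafe_reload(router: ToyRouter, load_models: Loader) -> None:
    # Clears first, then rebuilds. If load_models() raises, the router is left
    # empty and, because the error is swallowed, nothing is logged.
    router.model_list.clear()
    try:
        router.model_list.extend(load_models())
    except Exception:
        pass


def safe_reload(router: ToyRouter, load_models: Loader) -> bool:
    # Build the replacement completely before touching the live list.
    try:
        new_list = list(load_models())
    except Exception:
        log.exception("reload failed; keeping %d models", len(router.model_list))
        return False
    if not new_list and router.model_list:
        log.error("reload produced 0 models; keeping %d", len(router.model_list))
        return False
    # One assignment: readers see either the old list or the complete new one.
    router.model_list = new_list
    return True
```

Whether the real reload path behaves like unsafe_reload is exactly what the issue suspects but has not confirmed; the sketch only shows why "build, validate, then swap" avoids the empty-list outcome.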

Workaround

We run an external health check (systemd timer, every 10 minutes) that queries /v1/models, checks model count, and auto-restarts the proxy via systemctl restart if count drops to 0.

Related Issues

#12711: config.yaml models disappear after a /model/update API call (different trigger, same symptom)
PR #22215: fixes a related architectural issue where proxy_server.py reads only from llm_router.model_list, so if the router drops its list, /model/info silently returns 0 models

Impact

This is a silent failure that affects all consumers. The health endpoint reporting "healthy" while serving 0 models makes it particularly dangerous for production use. We've hit this bug multiple times over the past week.

extent analysis

TL;DR

The most likely fix is to harden the periodic config reload and model cost map reload tasks in proxy_server.py so that a failed or partial reload cannot silently wipe llm_router.model_list.

Guidance

Check that _schedule_regular_config_update() cannot leave llm_router.model_list empty when a reload fails or only partially completes.
Review the model_cost_map_reload task to determine whether it contributes to the issue, considering the related issues #24349 and #24308.
Verify that llm_router.model_list is intact after each config reload and model cost map reload.
Consider adding logging that fires when llm_router.model_list is cleared or modified unexpectedly.

Example

The issue does not include code from proxy_server.py, so the exact reload logic cannot be quoted here.

Notes

The external health-check workaround may mitigate the issue, but it does not address the root cause. Related issue #12711 and PR #22215 may provide additional context for resolving the problem.

Recommendation

Deploy the external health check described above as a stopgap, and in parallel investigate the periodic config reload and model cost map reload tasks to find the root cause. This limits production impact while a permanent fix is developed.
