litellm: upsert_deployment does not clean up stale entries in team_pattern_routers / pattern_router on model update




What happened?

When a wildcard deployment's litellm_params are updated (e.g. correcting the model field via PATCH /model/{id}/update), upsert_deployment removes the old deployment from model_list but does not remove the stale entry from team_pattern_routers or pattern_router. The subsequent add_deployment call appends a new entry alongside the stale one. The router then round-robins between the correct and stale deployments, causing intermittent routing failures.

We discovered this in production after patching a team BYOK wildcard model (openai/*) that had a double-prefixed litellm_params.model (openai/openai/*). After the DB fix via PATCH /model/{id}/update, ~50% of requests still failed because stale deployment entries persisted in the in-memory pattern router across all replicas. Only a full pod restart resolved the issue.

Root Cause:

upsert_deployment (router.py:7910) handles updates by:

  1. Removing the old deployment from self.model_list (line 7945)
  2. Calling self.add_deployment(deployment) (line 7953) → _add_deployment

_add_deployment (router.py:7536) registers the deployment in team_pattern_routers:

self.team_pattern_routers[_team_id].add_pattern(
    _team_public_model_name, deployment.to_json(exclude_none=True)
)

PatternMatchRouter.add_pattern (pattern_match_deployments.py:64) only appends:

if regex not in self.patterns:
    self.patterns[regex] = []
self.patterns[regex].append(llm_deployment)

There is no removal of the previous deployment from the pattern dict. After upsert, the pattern openai/(.*) contains both the old (stale) and new (correct) deployment dicts. The same issue applies to self.pattern_router for non-team wildcards.
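The accumulation can be reproduced in isolation. The following is a minimal sketch, not LiteLLM's code: `ToyPatternRouter` is a hypothetical stand-in that mirrors only the append-only behavior quoted above, the pattern-to-regex conversion is simplified, and the id `abc` is made up. It assumes the updated deployment keeps the same `model_info.id` as the one it replaces.

```python
from typing import Dict, List


class ToyPatternRouter:
    """Hypothetical stand-in for an append-only pattern registry (not LiteLLM's class)."""

    def __init__(self) -> None:
        self.patterns: Dict[str, List[dict]] = {}

    def add_pattern(self, pattern: str, llm_deployment: dict) -> None:
        # Simplified conversion: "openai/*" becomes the regex "openai/(.*)".
        regex = pattern.replace("*", "(.*)")
        if regex not in self.patterns:
            self.patterns[regex] = []
        # Append only: nothing checks whether this deployment id is already registered.
        self.patterns[regex].append(llm_deployment)


router = ToyPatternRouter()
stale = {"litellm_params": {"model": "openai/openai/*"}, "model_info": {"id": "abc"}}
fixed = {"litellm_params": {"model": "openai/*"}, "model_info": {"id": "abc"}}

router.add_pattern("openai/*", stale)  # initial (wrong) registration
router.add_pattern("openai/*", fixed)  # re-registration after the update

entries = router.patterns["openai/(.*)"]
print([d["litellm_params"]["model"] for d in entries])
# → ['openai/openai/*', 'openai/*']
```

Both entries share one id, so the bucket holds two versions of the same deployment.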

Steps to Reproduce

  1. Create a team wildcard model with litellm_params.model = "openai/openai/*" (intentionally wrong)
  2. Send requests to openai/gpt-4o — all fail (expected, double prefix)
  3. PATCH the model to fix litellm_params.model = "openai/*"
  4. Wait >30s for the DB polling cycle to trigger upsert_deployment
  5. Send requests to openai/gpt-4o repeatedly — ~50% fail, ~50% succeed

Expected: 100% success after the fix is applied and replicas refresh from DB.

Actual: Intermittent failures persist until pods are restarted, because the stale deployment entry is never evicted from the pattern router.
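The roughly even split follows from a bucket that holds one stale and one corrected entry. A small simulation, under stated assumptions: the selection here is strict alternation via `itertools.cycle` (the real selection strategy may differ), and `resolve_model` is a hypothetical helper that substitutes the wildcard with the requested model's suffix.

```python
from itertools import cycle

# Assumed contents of one pattern bucket after the update.
bucket = [
    {"litellm_params": {"model": "openai/openai/*"}},  # stale, double prefix
    {"litellm_params": {"model": "openai/*"}},         # corrected
]


def resolve_model(deployment: dict, requested: str) -> str:
    """Hypothetical helper: replace the wildcard with the suffix of the requested model."""
    suffix = requested.split("/", 1)[1]
    return deployment["litellm_params"]["model"].replace("*", suffix)


picker = cycle(bucket)  # strict alternation, chosen for determinism
resolved = [resolve_model(next(picker), "openai/gpt-4o") for _ in range(10)]
failures = sum(1 for model in resolved if model.startswith("openai/openai/"))

print(resolved[:2])
# → ['openai/openai/gpt-4o', 'openai/gpt-4o']
print(f"{failures}/{len(resolved)} requests hit the stale entry")
# → 5/10 requests hit the stale entry
```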

Relevant log output

# After DB fix, alternating between correct and stale deployment:
# Success (correct deployment):
litellm.acompletion(model=openai/gpt-4o) 200 OK

# Failure (stale deployment with double prefix still in pattern router):
litellm.acompletion(model=openai/openai/gpt-4o) Exception
litellm.BadRequestError: OpenAIException - invalid model ID

Suggested Fix

In upsert_deployment, when removing an old deployment, also clean up pattern router entries. Either:

  • Remove the specific deployment from team_pattern_routers[team_id].patterns[regex] by matching on model_info.id
  • Or rebuild the PatternMatchRouter for the affected team entirely on upsert
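The first option could look roughly like the sketch below. `remove_deployment_by_id` is a hypothetical helper, not an existing LiteLLM function, and it operates on a plain dict shaped like the `patterns` mapping quoted earlier. The id `abc` is illustrative.

```python
from typing import Dict, List


def remove_deployment_by_id(patterns: Dict[str, List[dict]], deployment_id: str) -> int:
    """Hypothetical helper: drop every entry whose model_info.id matches; return the count removed."""
    removed = 0
    for regex in list(patterns):  # copy the keys so buckets can be deleted while iterating
        kept = [
            d for d in patterns[regex]
            if d.get("model_info", {}).get("id") != deployment_id
        ]
        removed += len(patterns[regex]) - len(kept)
        if kept:
            patterns[regex] = kept
        else:
            del patterns[regex]  # do not leave an empty bucket behind
    return removed


patterns = {
    "openai/(.*)": [
        {"litellm_params": {"model": "openai/openai/*"}, "model_info": {"id": "abc"}},
    ]
}
fixed = {"litellm_params": {"model": "openai/*"}, "model_info": {"id": "abc"}}

removed = remove_deployment_by_id(patterns, "abc")    # evict the stale entry first
patterns.setdefault("openai/(.*)", []).append(fixed)  # then register the corrected one

print(removed, [d["litellm_params"]["model"] for d in patterns["openai/(.*)"]])
# → 1 ['openai/*']
```

Scanning every bucket, rather than only the current regex, also covers the case where the update changes the public model name and the stale entry lives under a different pattern.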

The same cleanup should apply to self.pattern_router for non-team wildcard deployments.
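The rebuild option, covering team and non-team wildcards together, might be sketched as follows. Everything here is an assumption for illustration: `rebuild_wildcard_buckets` is a hypothetical function, the `team_id` and `team_public_model_name` fields under `model_info` are an assumed layout, and the pattern-to-regex conversion is simplified.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Buckets = Dict[str, List[dict]]


def rebuild_wildcard_buckets(model_list: List[dict]) -> Tuple[Buckets, Dict[str, Buckets]]:
    """Hypothetical rebuild: derive every wildcard bucket from model_list alone."""
    global_buckets: Buckets = defaultdict(list)
    team_buckets: Dict[str, Buckets] = defaultdict(lambda: defaultdict(list))
    for deployment in model_list:
        info = deployment.get("model_info", {})
        # Assumed layout: team deployments carry team_id and team_public_model_name.
        team_id = info.get("team_id")
        name = info.get("team_public_model_name") if team_id else deployment["model_name"]
        if not name or "*" not in name:
            continue  # only wildcard deployments belong in a pattern bucket
        regex = name.replace("*", "(.*)")  # simplified pattern-to-regex conversion
        if team_id:
            team_buckets[team_id][regex].append(deployment)
        else:
            global_buckets[regex].append(deployment)
    return global_buckets, team_buckets


# model_list after the update: it already holds only the corrected deployment.
model_list = [
    {"model_name": "internal-team-model", "litellm_params": {"model": "openai/*"},
     "model_info": {"id": "abc", "team_id": "team-1", "team_public_model_name": "openai/*"}},
    {"model_name": "anthropic/*", "litellm_params": {"model": "anthropic/*"},
     "model_info": {"id": "def"}},
]
global_buckets, team_buckets = rebuild_wildcard_buckets(model_list)
print(list(team_buckets["team-1"]), list(global_buckets))
# → ['openai/(.*)'] ['anthropic/(.*)']
```

Because `model_list` is already cleaned by the upsert, buckets derived from it cannot contain a stale entry. The trade-off is that a full rebuild does more work per update than a targeted removal.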

What part of LiteLLM is this about?

Router, Proxy

What LiteLLM version are you on?

v1.85.0

