litellm - 💡(How to fix) Fix Helm chart's migration job defaults to v1 prisma resolver, defeating the v2 fix from #26194 [1 participants]

BerriAI/litellm#26780 · Fetched 2026-04-30 06:20:02

Error Message

prisma.errors.DataError: The column `LiteLLM_ObjectPermissionTable.mcp_toolsets` does not exist in the current database.
prisma.errors.DataError: The column `LiteLLM_MCPServerTable.instructions` does not exist in the current database.
prisma.errors.DataError: The column `LiteLLM_TeamMembership.total_spend` does not exist in the current database.

Root Cause

Audit confirmed four entire tables (LiteLLM_MCPToolsetTable, LiteLLM_MemoryTable, LiteLLM_AdaptiveRouterState, LiteLLM_AdaptiveRouterSession) and three columns from migrations recorded as applied were missing from the live schema — exactly the v1 diff-and-force failure mode #22998 describes. Manually running the same image with --use_v2_migration_resolver reported "no pending migrations to apply" because v1 had already stamped the ledger; recovery required replaying the missing DDL by hand.


Problem

PR #26194 added --use_v2_migration_resolver to address the diff-and-force schema-thrashing bug tracked in #22998. The flag works as advertised. However, the bundled Helm chart (deploy/charts/litellm-helm) hasn't been updated to take advantage of it:

  • deploy/charts/litellm-helm/templates/migrations-job.yaml runs command: ["python", "litellm/proxy/prisma_migration.py"].
  • litellm/proxy/prisma_migration.py calls run_server(["--skip_server_startup"], standalone_mode=False) with no flag and no envvar fallback (the click option at proxy_cli.py ~line 581 has no envvar=).
  • There is no command:/args:/extraArgs: value on the chart's migration job to override this.

Net effect: any operator using the official chart silently gets the v1 resolver, even on the LiteLLM versions that ship the v2 fix. The bug from #22998 keeps reproducing under the chart's default rolling-deploy strategy.

Concrete repro (just lived this)

Bumped the chart's image.tag from main-v1.82.3 to v1.83.14.rc.1. The migration Helm-hook job ran and reported success. _prisma_migrations showed all 11 new migrations in the batch as finished_at = <deploy time> with no rolled_back_at. The proxy then crashed in a request loop with errors like:

prisma.errors.DataError: The column `LiteLLM_ObjectPermissionTable.mcp_toolsets` does not exist in the current database.
prisma.errors.DataError: The column `LiteLLM_MCPServerTable.instructions` does not exist in the current database.
prisma.errors.DataError: The column `LiteLLM_TeamMembership.total_spend` does not exist in the current database.

Audit confirmed four entire tables (LiteLLM_MCPToolsetTable, LiteLLM_MemoryTable, LiteLLM_AdaptiveRouterState, LiteLLM_AdaptiveRouterSession) and three columns from migrations recorded as applied were missing from the live schema — exactly the v1 diff-and-force failure mode #22998 describes. Manually running the same image with --use_v2_migration_resolver reported "no pending migrations to apply" because v1 had already stamped the ledger; recovery required replaying the missing DDL by hand.
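
The audit described above amounts to a set difference between the columns the applied migrations should have produced and the columns the live database actually has. Here is a minimal sketch, assuming both sets have already been collected (the live set, for example, from PostgreSQL's information_schema.columns). The function name find_missing_columns and the sample data are hypothetical and not part of LiteLLM:

```python
from typing import Iterable, List, Set, Tuple

Column = Tuple[str, str]  # (table_name, column_name)


def find_missing_columns(expected: Iterable[Column], live: Iterable[Column]) -> List[Column]:
    """Return expected columns absent from the live schema, sorted for stable output."""
    live_set: Set[Column] = set(live)
    return sorted(col for col in set(expected) if col not in live_set)


# Hypothetical inputs: `expected` would come from reading the migration SQL,
# `live` from a query such as
#   SELECT table_name, column_name FROM information_schema.columns
#   WHERE table_schema = 'public';
expected = [
    ("LiteLLM_ObjectPermissionTable", "mcp_toolsets"),
    ("LiteLLM_MCPServerTable", "instructions"),
    ("LiteLLM_TeamMembership", "total_spend"),
    ("LiteLLM_TeamMembership", "user_id"),  # sample column assumed to exist already
]
live = [("LiteLLM_TeamMembership", "user_id")]

missing = find_missing_columns(expected, live)
for table, column in missing:
    print(f"missing: {table}.{column}")
```

Any column this reports for a migration that _prisma_migrations marks as finished points at the stamped-but-not-applied state described here.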

The startup banner literally points at the fix:

Using default (v1) migration resolver. If your deployment has seen schema thrashing during rolling deploys, try --use_v2_migration_resolver (safer: avoids the diff-and-force recovery that caused the thrash).

But there's no chart-level path to take that advice without forking the chart or piggybacking on extraInitContainers, which is what we ended up doing.

Suggested fixes (any one would close this)

  1. Forward extra args in prisma_migration.py — append sys.argv[1:] to the args list passed to run_server. Then add a migrationJob.extraArgs value on the chart that gets spliced into the container's args:. Keeps v1 default; one-line opt-in for chart users.
  2. Add an envvar binding to the click option: envvar="USE_V2_MIGRATION_RESOLVER" on --use_v2_migration_resolver. I see PR #26194's discussion went the other way deliberately, but the chart use case argues for it: chart users configure via env, not argv. If you'd rather keep the flag explicit, option 1 is fine.
  3. Flip the chart's default to v2 — bake --use_v2_migration_resolver into prisma_migration.py or the chart command. The v1-stays-default rationale (operators with corrupted ledgers from prior thrashing) applies to direct CLI users, not fresh chart deploys; new chart installs are never in that state.

Happy to send a PR if you have a preference on which path. Most surgical is (1).
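
As a rough illustration of options 1 and 2 together, the argument list handed to run_server could be assembled from forwarded CLI arguments plus an environment opt-in. This is a sketch only: build_migration_args is a hypothetical helper, and reading USE_V2_MIGRATION_RESOLVER in the wrapper script (rather than through click's envvar=) is an assumption, not current LiteLLM behavior.

```python
import os
import sys
from typing import List, Mapping, Optional, Sequence

V2_FLAG = "--use_v2_migration_resolver"  # flag added by PR #26194
TRUTHY = {"1", "true", "yes", "on"}


def build_migration_args(
    argv: Sequence[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Build the args for run_server: base flag, forwarded argv, optional env opt-in."""
    env = os.environ if env is None else env
    # Option 1: forward everything after the script name.
    args = ["--skip_server_startup", *argv[1:]]
    # Option 2 (assumed env var name): opt in via environment, without duplicating the flag.
    wants_v2 = env.get("USE_V2_MIGRATION_RESOLVER", "").strip().lower() in TRUTHY
    if wants_v2 and V2_FLAG not in args:
        args.append(V2_FLAG)
    return args


if __name__ == "__main__":
    # In prisma_migration.py this list would be passed to
    # run_server(..., standalone_mode=False), as quoted in the issue text.
    print(build_migration_args(sys.argv))
```

With no extra arguments and no environment variable set, the helper returns only --skip_server_startup, so the v1 default is preserved unless an operator opts in.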

Related

  • #22998 (root bug)
  • #26194 (added v2 resolver)
  • #26712 (recent v2 polish — pooler URL handling)

Environment

  • Chart: litellm-helm 1.82.3 (current latest)
  • Image: ghcr.io/berriai/litellm-database:v1.83.14.rc.1
  • DB: PostgreSQL 16 (AWS RDS)

extent analysis

TL;DR

The Helm chart for LiteLLM needs to be updated to utilize the --use_v2_migration_resolver flag to prevent schema thrashing during rolling deploys.

Guidance

  • Update the prisma_migration.py script to forward extra args to run_server, allowing chart users to opt in to the v2 migration resolver via migrationJob.extraArgs.
  • Consider adding an envvar binding to the --use_v2_migration_resolver click option to enable configuration via environment variables.
  • Alternatively, flip the chart's default to use the v2 migration resolver, as new chart installs are not affected by prior thrashing issues.

Example

# prisma_migration.py
import sys

# ...

run_server(["--skip_server_startup"] + sys.argv[1:], standalone_mode=False)

Notes

The suggested fixes aim to address the issue without introducing significant changes to the existing codebase. However, the best approach may depend on the specific requirements and constraints of the project.

Recommendation

Forward extra args in prisma_migration.py (suggested fix 1). It is the least intrusive change: the v1 default stays in place, and chart users can opt in to the v2 migration resolver.
