litellm: Tag budgets never reset: ResetBudgetJob has no tag handler, tags blocked permanently after first overage

Summary

Tags with an attached budget that has budget_duration set (e.g. "1d", "30d") never have their spend column reset. The LiteLLM_BudgetTable.budget_reset_at field correctly advances each cycle, but LiteLLM_TagTable.spend is left untouched. As a result, once a tag's spend exceeds max_budget, every subsequent request bearing that tag returns HTTP 400 budget_exceeded indefinitely, across reset boundaries, for days or weeks. Recovery requires a manual DB update.

Reproduction

Tested on v1.83.14-stable.patch.2, fresh docker-compose with proxy_budget_rescheduler_min_time: 1 and proxy_budget_rescheduler_max_time: 5 (so reset polling fires every 1–5s instead of the default 600s).

  1. Create a budget: {"max_budget": 0.05, "budget_duration": "1m"} → returns budget_id=B, budget_reset_at=T0+60s.
  2. Create a tag bound to B.
  3. Drive body-tagged traffic to push tag spend past $0.05; observe HTTP 400 firing as expected.
  4. Wait through three full reset cycles (3+ minutes).
  5. Send another body-tagged request.

Observed state across cycles:

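| Time (UTC)               | BudgetTable.budget_reset_at | TagTable.spend     | TagTable.updated_at |
|--------------------------|-----------------------------|--------------------|---------------------|
| T0 (creation)            | T0+60s                      | 0.000              |                     |
| T0+58s (post-overage)    | T0+120s ✅ advanced         | 0.063              | T0+5s               |
| T0+142s (1 reset later)  | T0+180s ✅ advanced         | 0.063 ❌ unchanged | T0+5s               |
| T0+197s (2 resets later) | T0+240s ✅ advanced         | 0.063 ❌ unchanged | T0+5s               |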

Step 5 result: HTTP 400, same body verbatim:

{"error":{"message":"Budget has been exceeded! Tag=tenant:acme Current cost: 0.063, Max budget: 0.05","type":"budget_exceeded","param":null,"code":"400"}}

Three full budget_duration periods elapsed. The budget object dutifully advanced its budget_reset_at each cycle. The tag's spend never moved.
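
The observed behaviour can be mimicked with a small in-memory model. This is only a sketch: the `Budget` and `Tag` dataclasses, `reset_job_without_tag_handler`, and `tag_is_blocked` are hypothetical stand-ins, not LiteLLM code, and the `spend > max_budget` comparison is an assumption about how the gate decides. It replays the two later observation times from the table and shows budget_reset_at advancing while spend stays at 0.063.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Budget:
    budget_id: str
    max_budget: float
    duration_s: int
    budget_reset_at: int  # seconds after T0


@dataclass
class Tag:
    tag_name: str
    budget_id: str
    spend: float = 0.0


def reset_job_without_tag_handler(now: int, budgets: List[Budget]) -> None:
    """Advance budget_reset_at for elapsed budgets; tags are never touched."""
    for budget in budgets:
        while budget.budget_reset_at <= now:
            budget.budget_reset_at += budget.duration_s


def tag_is_blocked(tag: Tag, budget: Budget) -> bool:
    """Assumed stand-in for the spend gate: block while spend exceeds the cap."""
    return tag.spend > budget.max_budget


budget = Budget("B", max_budget=0.05, duration_s=60, budget_reset_at=60)
tag = Tag("tenant:acme", budget_id="B", spend=0.063)  # already over the cap

history = []
for now in (142, 197):  # observation times from the table, seconds after T0
    reset_job_without_tag_handler(now, [budget])
    history.append((now, budget.budget_reset_at, tag.spend, tag_is_blocked(tag, budget)))

for row in history:
    print(row)
```

Each printed row keeps spend at 0.063 and the blocked flag set, even though budget_reset_at moves forward.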

Root cause

litellm/proxy/common_utils/reset_budget_job.py contains zero occurrences of the word "tag" (case-insensitive grep). The job has handlers for end-users, team members, keys, and orgs:

# reset_budget_job.py:175-181
await self.reset_budget_for_litellm_team_members(budgets_to_reset=budgets_to_reset)
await self.reset_budget_for_keys_linked_to_budgets(budgets_to_reset=budgets_to_reset)
# (similar dispatch for endusers below)

But no reset_budget_for_tags_linked_to_budgets exists. _tag_max_budget_check reads tag_object.spend from LiteLLM_TagTable (litellm/proxy/auth/auth_checks.py:3530), so the gate keeps reading the stale over-cap value forever.

Suggested fix

Add a tag handler analogous to reset_budget_for_keys_linked_to_budgets. Sketch:

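async def reset_budget_for_tags_linked_to_budgets(
    self, budgets_to_reset: List[LiteLLM_BudgetTableFull]
) -> None:
    """Zero the spend of every tag linked to a budget that is being reset."""
    budget_ids = [b.budget_id for b in budgets_to_reset if b.budget_id is not None]
    if not budget_ids:
        return
    # Sketch assumption: get_data / update_data accept table_name="tag" and a
    # budget_id_list filter; verify against PrismaClient before relying on it.
    tags_to_reset = await self.prisma_client.get_data(
        table_name="tag",
        query_type="find_all",
        budget_id_list=budget_ids,
    )
    if not tags_to_reset:
        return
    # Reset spend in memory, then persist all rows in a single bulk update.
    updated_tags = []
    for tag in tags_to_reset:
        tag.spend = 0.0
        updated_tags.append(tag)
    await self.prisma_client.update_data(
        query_type="update_many",
        data_list=updated_tags,
        table_name="tag",
    )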

Then wire it into reset_budget_for_litellm_budget_table next to the existing dispatches:

# reset_budget_job.py around line 181
await self.reset_budget_for_keys_linked_to_budgets(budgets_to_reset=budgets_to_reset)
await self.reset_budget_for_tags_linked_to_budgets(budgets_to_reset=budgets_to_reset)  # NEW
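
To illustrate the intended effect of the new dispatch, here is a minimal in-memory sketch. `TagRow`, `reset_tags_linked_to_budgets`, and `tag_is_blocked` are hypothetical names used only for illustration, and the gate comparison is an assumption. It shows that only tags linked to the budgets being reset are zeroed, and that the gate reopens for them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TagRow:
    tag_name: str
    budget_id: str
    spend: float


def reset_tags_linked_to_budgets(tags: Dict[str, TagRow], budget_ids: List[str]) -> List[str]:
    """Zero spend for tags whose budget_id is being reset; return their names."""
    reset_names = []
    for tag in tags.values():
        if tag.budget_id in budget_ids:
            tag.spend = 0.0
            reset_names.append(tag.tag_name)
    return reset_names


def tag_is_blocked(tag: TagRow, max_budget: float) -> bool:
    """Assumed stand-in for the spend gate."""
    return tag.spend > max_budget


tags = {
    "tenant:acme": TagRow("tenant:acme", budget_id="B", spend=0.063),
    "tenant:other": TagRow("tenant:other", budget_id="C", spend=0.010),
}

blocked_before = tag_is_blocked(tags["tenant:acme"], max_budget=0.05)
reset_names = reset_tags_linked_to_budgets(tags, budget_ids=["B"])  # only budget B elapsed
blocked_after = tag_is_blocked(tags["tenant:acme"], max_budget=0.05)
```

The tag bound to budget C keeps its spend, since its budget has not reached its reset time.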

Cache invalidation note: _tag_max_budget_check reads through get_tag_objects_batch (auth_checks.py:1064), which uses DualCache. The reset handler should also bust the tag:<name> cache key after writing — same pattern the end-user reset uses.
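
The following sketch shows why the cache key matters. `FakeCache` is a hypothetical dict-backed stand-in, not DualCache, and `read_tag_spend` / `reset_tag` are illustrative names. A real cache entry would also expire on its own TTL, which this toy model ignores.

```python
from typing import Dict, Optional


class FakeCache:
    """Dict-backed stand-in for a read-through cache (not the real DualCache)."""

    def __init__(self) -> None:
        self._store: Dict[str, float] = {}

    def get(self, key: str) -> Optional[float]:
        return self._store.get(key)

    def set(self, key: str, value: float) -> None:
        self._store[key] = value

    def delete(self, key: str) -> None:
        self._store.pop(key, None)


db_spend: Dict[str, float] = {"tenant:acme": 0.063}  # plays the role of the tag table
cache = FakeCache()


def read_tag_spend(tag_name: str) -> float:
    """Read through the cache, falling back to the 'database' on a miss."""
    key = f"tag:{tag_name}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = db_spend[tag_name]
    cache.set(key, value)
    return value


def reset_tag(tag_name: str, bust_cache: bool) -> None:
    """Zero the stored spend and optionally drop the cached copy."""
    db_spend[tag_name] = 0.0
    if bust_cache:
        cache.delete(f"tag:{tag_name}")


read_tag_spend("tenant:acme")                 # warms the cache with 0.063
reset_tag("tenant:acme", bust_cache=False)
stale = read_tag_spend("tenant:acme")         # still served from the cache
reset_tag("tenant:acme", bust_cache=True)
fresh = read_tag_spend("tenant:acme")         # re-read from the 'database'
```

Without the delete, the gate would keep seeing the over-cap value until the cached entry is evicted.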

Test gap

tests/test_litellm/proxy/common_utils/test_reset_budget_job.py covers key/user/team/end-user/project resets (lines 138–244) but contains no tag reset coverage. A test asserting LiteLLM_TagTable.spend returns to 0 after budget_reset_at elapses should land alongside the fix.
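
A test along these lines might look like the sketch below. It does not import LiteLLM: `ResetBudgetJobSketch` is a hypothetical stand-in that copies the handler proposed above, and the prisma client is replaced with `unittest.mock.AsyncMock` objects whose return values are assumptions. A real test would exercise the actual ResetBudgetJob class instead.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock


class ResetBudgetJobSketch:
    """Hypothetical stand-in holding only the proposed tag handler."""

    def __init__(self, prisma_client) -> None:
        self.prisma_client = prisma_client

    async def reset_budget_for_tags_linked_to_budgets(self, budgets_to_reset) -> None:
        budget_ids = [b.budget_id for b in budgets_to_reset if b.budget_id is not None]
        if not budget_ids:
            return
        tags = await self.prisma_client.get_data(
            table_name="tag", query_type="find_all", budget_id_list=budget_ids
        )
        if not tags:
            return
        for tag in tags:
            tag.spend = 0.0
        await self.prisma_client.update_data(
            query_type="update_many", data_list=tags, table_name="tag"
        )


async def run_reset_scenario():
    # One over-cap tag linked to budget "B", returned by the mocked lookup.
    tag = SimpleNamespace(tag_name="tenant:acme", budget_id="B", spend=0.063)
    prisma = SimpleNamespace(
        get_data=AsyncMock(return_value=[tag]),
        update_data=AsyncMock(),
    )
    job = ResetBudgetJobSketch(prisma)
    await job.reset_budget_for_tags_linked_to_budgets([SimpleNamespace(budget_id="B")])
    return tag, prisma


tag_after, prisma_mock = asyncio.run(run_reset_scenario())

# The tag's spend is zeroed and the write is issued exactly once.
assert tag_after.spend == 0.0
prisma_mock.update_data.assert_awaited_once()
```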

Severity

High for any deployment that uses tag budgets for tenant/customer enforcement: a single overage permanently blocks a tenant until ops manually run UPDATE LiteLLM_TagTable SET spend=0 WHERE tag_name='...'. There is no alerting and no automatic recovery.
