litellm - 💡(How to fix) Fix [Bug]: DailySpend values silently failing to store in database

_update_daily_spend() only processes the first 100 transactions per flush cycle and silently discards the rest. Under any workload that produces more than 100 unique (entity_id, date, api_key, model, provider, endpoint) combinations between flush intervals (~10s), spend data is permanently lost from all Daily Spend tables.

This affects LiteLLM_DailyTagSpend, LiteLLM_DailyUserSpend, LiteLLM_DailyTeamSpend, LiteLLM_DailyOrgSpend, LiteLLM_DailyEndUserSpend, and LiteLLM_DailyAgentSpend.

Root Cause

In _update_daily_spend() (db_spend_update_writer.py), the retry loop processes only the first BATCH_SIZE=100 sorted entries, then breaks:

for i in range(n_retry_times + 1):          # retry loop
    transactions_to_process = dict(
        sorted(daily_spend_transactions.items(), key=...)[:BATCH_SIZE]  # takes first 100
    )
    # ... upsert to DB ...
    for key in transactions_to_process.keys():
        daily_spend_transactions.pop(key, None)  # remove processed entries
    break                                        # exits — remaining entries are never processed

After the function returns, the caller's local variable (daily_tag_spend_update_transactions, etc.) goes out of scope and unprocessed entries are garbage collected.
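The difference between writing one batch and draining the whole buffer can be sketched in isolation. This is a minimal, self-contained illustration and not LiteLLM's code: `take_batch`, `flush_first_batch_only`, `flush_all_batches`, `make_pending`, and the `upsert` callback are hypothetical names, and the database write is replaced by an in-memory dict. Retries and error handling are deliberately left out.

```python
from typing import Callable, Dict, Tuple

BATCH_SIZE = 100  # batch size cited in the issue

Key = Tuple[str, ...]
Pending = Dict[Key, float]


def take_batch(pending: Pending) -> Pending:
    """Return the first BATCH_SIZE entries in sorted key order."""
    return dict(sorted(pending.items(), key=lambda kv: kv[0])[:BATCH_SIZE])


def flush_first_batch_only(pending: Pending, upsert: Callable[[Pending], None]) -> None:
    """Mimics the reported behaviour: one batch is written, the rest is left behind."""
    batch = take_batch(pending)
    upsert(batch)
    for key in batch:
        pending.pop(key, None)


def flush_all_batches(pending: Pending, upsert: Callable[[Pending], None]) -> None:
    """Hypothetical alternative: keep taking batches until nothing is pending."""
    while pending:
        batch = take_batch(pending)
        upsert(batch)
        for key in batch:
            pending.pop(key, None)


def make_pending(n: int) -> Pending:
    """Build n made-up (entity_id, date) entries, each worth 1.0."""
    return {(f"tag-{i:03d}", "2025-01-01"): 1.0 for i in range(n)}


buggy_store: Pending = {}
leftover = make_pending(250)
flush_first_batch_only(leftover, buggy_store.update)

fixed_store: Pending = {}
remaining = make_pending(250)
flush_all_batches(remaining, fixed_store.update)

print(len(buggy_store), len(leftover))   # 100 150
print(len(fixed_store), len(remaining))  # 250 0
```

In the first case, 150 entries are still in the dict when the function returns; if the caller then drops that dict, they are gone. A real fix would also have to decide what happens to the remaining entries when an upsert fails partway through, which this sketch does not address.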


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

This issue was previously identified, and a fix was attempted in PR #23452. We are silently losing spend tracking data, which makes billing and usage metrics unreliable. The bug still exists; the original description follows.

Impact

The data loss is biased by sort order. Since transactions are sorted alphabetically by (date, entity_id, api_key, model, provider), entities that sort earlier always win the 100-slot batch. Entities that sort later are systematically under-reported.

In a production workload with ~200 API keys across multiple models and tags, we observed:

  • Entries with alphabetically early entity IDs: 100% capture rate (DailySpend matches SpendLogs exactly)
  • Entries with alphabetically later entity IDs: ~48% capture rate (more than half of spend data lost)
  • LiteLLM_SpendLogs and LiteLLM_VerificationToken.spend remain accurate — only the Daily Spend aggregation tables are affected

The UI's tag/team/org spend dashboards report significantly lower spend than reality because they read from these tables.
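The sort-order bias can be reproduced with a toy simulation. The sketch below is an assumption-laden model, not a replay of the production workload: it assumes every entity produces exactly one pending entry per flush cycle, that each cycle starts from a fresh buffer, and that only the first sorted batch is stored. The `simulate` function and the `key-NNN` identifiers are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List

BATCH_SIZE = 100  # batch size cited in the issue


def simulate(entity_ids: List[str], cycles: int) -> Dict[str, float]:
    """Store only the first sorted batch of each cycle and drop the rest."""
    stored: Dict[str, float] = defaultdict(float)
    for _ in range(cycles):
        # Fresh buffer per cycle: one entry of spend 1.0 per entity.
        pending = {("2025-01-01", entity_id): 1.0 for entity_id in entity_ids}
        first_batch = sorted(pending.items(), key=lambda kv: kv[0])[:BATCH_SIZE]
        for (_, entity_id), spend in first_batch:
            stored[entity_id] += spend
        # Entries beyond the first batch are discarded with the buffer.
    return stored


ids = [f"key-{i:03d}" for i in range(150)]
stored = simulate(ids, cycles=10)

early_total = stored.get("key-000", 0.0)  # sorts first, always in the batch
late_total = stored.get("key-149", 0.0)   # sorts last, never in the batch
print(early_total, late_total)
```

Under these assumptions the early entity is captured on every cycle and the late one never is. The partial capture rate observed in production is consistent with cycles that only sometimes exceed the batch size, which this simplified model does not attempt to reproduce.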

Steps to Reproduce

  1. Configure LiteLLM with multiple API keys (virtual keys) and multiple model deployments
  2. Send concurrent requests across >100 unique (tag/user/team, date, api_key, model, provider, endpoint) combinations within a single flush interval (default 10s)
  3. Compare LiteLLM_SpendLogs totals with LiteLLM_DailyTagSpend (or any Daily Spend table) totals
  4. Observe that DailySpend totals are lower, with alphabetically-later entity IDs most affected
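Steps 3 and 4 amount to a per-entity comparison of two sets of totals. A minimal sketch follows, assuming the totals have already been aggregated per entity from each table by whatever query fits your schema. The `capture_rates` helper and the sample numbers are hypothetical and not taken from the report.

```python
from typing import Dict


def capture_rates(expected: Dict[str, float], recorded: Dict[str, float]) -> Dict[str, float]:
    """Return recorded / expected spend per entity, skipping entities with no expected spend."""
    rates: Dict[str, float] = {}
    for entity_id, expected_spend in expected.items():
        if expected_spend <= 0:
            continue
        rates[entity_id] = recorded.get(entity_id, 0.0) / expected_spend
    return rates


# Made-up totals: expected from the per-request log, recorded from a daily table.
spend_logs_totals = {"alpha-team": 10.0, "zulu-team": 10.0}
daily_spend_totals = {"alpha-team": 10.0, "zulu-team": 4.8}

rates = capture_rates(spend_logs_totals, daily_spend_totals)

# Flag entities whose daily total falls short of the log total.
under_reported = sorted(e for e, rate in rates.items() if rate < 0.999)
print(under_reported)  # ['zulu-team']
```

Sorting the flagged entities alongside their IDs makes it easy to see whether the shortfall clusters among IDs that sort later, as the issue describes.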

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.87.0-rc.2

Twitter / LinkedIn details

http://linkedin.com/in/farukcankaya
