litellm - 💡(How to fix) Fix [Bug]: PR #25888 fallback path reads input/output cost from db model_info but skips cache_read_input_token_cost / cache_creation_input_token_cost [1 participant]

GitHub stats: BerriAI/litellm#26806 (fetched 2026-04-30 06:19:41). Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Check for existing issues

Related: #25839, #25950, #24774, #25204, #11364

What happened?

PR #25888 (merged 2026-04-25) fixed the fallback path in Router.get_model_group_info() so that when a model is not in LiteLLM's built-in model_prices_and_context_window.json, the fallback ModelMapInfo reads input_cost_per_token / output_cost_per_token from the user's db_model_info.

The same fallback does not pass through the cache pricing fields. As a result, for any model that isn't present in the built-in cost map, LiteLLM applies the user-configured input/output prices but silently drops cache_read_input_token_cost, cache_creation_input_token_cost, and their tier variants, even when those fields are set on model_info and supports_prompt_caching: true is set.

The downstream effect is the one reported in #25839, #25950, #24774, #11364: cached tokens are billed at $0 instead of the configured cache_read rate, on any custom-hosted or self-hosted model that isn't in the upstream pricing JSON.

Where in the code

litellm/router.py ~line 8088 (the block #25888 modified):

db_model_info = model.get("model_info", {})
mode = db_model_info.get("mode", "chat")
input_cost_per_token = db_model_info.get("input_cost_per_token")
output_cost_per_token = db_model_info.get("output_cost_per_token")
# ↑ cache fields not read; ModelMapInfo gets cache_read_input_token_cost=None even
#   when db_model_info has it set.

The fix should mirror #25888 for the additional fields:

input_cost_per_token = db_model_info.get("input_cost_per_token")
output_cost_per_token = db_model_info.get("output_cost_per_token")
cache_read_input_token_cost = db_model_info.get("cache_read_input_token_cost")
cache_creation_input_token_cost = db_model_info.get("cache_creation_input_token_cost")
supports_prompt_caching = db_model_info.get("supports_prompt_caching")
# tier variants too:
input_cost_per_token_above_128k_tokens = db_model_info.get("input_cost_per_token_above_128k_tokens")
input_cost_per_token_above_200k_tokens = db_model_info.get("input_cost_per_token_above_200k_tokens")
input_cost_per_token_above_272k_tokens = db_model_info.get("input_cost_per_token_above_272k_tokens")
output_cost_per_token_above_128k_tokens = db_model_info.get("output_cost_per_token_above_128k_tokens")
output_cost_per_token_above_200k_tokens = db_model_info.get("output_cost_per_token_above_200k_tokens")
output_cost_per_token_above_272k_tokens = db_model_info.get("output_cost_per_token_above_272k_tokens")
cache_read_input_token_cost_above_128k_tokens = db_model_info.get("cache_read_input_token_cost_above_128k_tokens")
cache_read_input_token_cost_above_200k_tokens = db_model_info.get("cache_read_input_token_cost_above_200k_tokens")
cache_read_input_token_cost_above_272k_tokens = db_model_info.get("cache_read_input_token_cost_above_272k_tokens")

…and pass each through to the ModelMapInfo(...) constructor.
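A more compact way to express the same change is to list the pricing keys once and copy whichever ones are present. The sketch below is a standalone approximation under stated assumptions: `PRICING_FIELDS` and `build_fallback_pricing` are hypothetical names, and it returns a plain dict instead of calling LiteLLM's real `ModelMapInfo(...)` constructor, whose full signature is not shown in this issue.

```python
from typing import Any, Dict, Optional

# Hypothetical: the pricing keys named in this issue, copied from db_model_info.
PRICING_FIELDS = (
    "input_cost_per_token",
    "output_cost_per_token",
    "cache_read_input_token_cost",
    "cache_creation_input_token_cost",
    "supports_prompt_caching",
    "input_cost_per_token_above_128k_tokens",
    "input_cost_per_token_above_200k_tokens",
    "input_cost_per_token_above_272k_tokens",
    "output_cost_per_token_above_128k_tokens",
    "output_cost_per_token_above_200k_tokens",
    "output_cost_per_token_above_272k_tokens",
    "cache_read_input_token_cost_above_128k_tokens",
    "cache_read_input_token_cost_above_200k_tokens",
    "cache_read_input_token_cost_above_272k_tokens",
)


def build_fallback_pricing(db_model_info: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: collect pricing values for the fallback model info.

    Keys absent from db_model_info map to None, which is what
    db_model_info.get(...) returns in the per-field version.
    """
    info = db_model_info or {}  # tolerate model_info being None
    pricing = {field: info.get(field) for field in PRICING_FIELDS}
    pricing["mode"] = info.get("mode", "chat")  # same default as the router excerpt
    return pricing
```

Assuming `ModelMapInfo` accepts these keys, the result could be unpacked into the constructor (`ModelMapInfo(..., **build_fallback_pricing(db_model_info))`), so a pricing field added later only has to be listed once.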

Expected behavior

When a model is not in the built-in cost map, ModelMapInfo returned by get_model_group_info() should reflect every pricing-related field the user set on model_info, not just input_cost_per_token and output_cost_per_token.
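One way to make this expectation checkable is to list every pricing key that is set on model_info but comes back as None. `dropped_pricing_fields` is a hypothetical helper for illustration, and the caller supplies which keys count as pricing-related.

```python
from typing import Any, Dict, Iterable, List


def dropped_pricing_fields(
    db_model_info: Dict[str, Any],
    model_map_value: Dict[str, Any],
    fields: Iterable[str],
) -> List[str]:
    """Hypothetical check: keys set in db_model_info but None or missing in the result."""
    return [
        field
        for field in fields
        if db_model_info.get(field) is not None and model_map_value.get(field) is None
    ]


# Usage: an empty list means every configured pricing field survived the fallback.
fields = ["input_cost_per_token", "cache_read_input_token_cost", "supports_prompt_caching"]
```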

Actual behavior

The model_map_value that reaches the cost calculator has cache_read_input_token_cost: None and supports_prompt_caching: None, even when both are populated in the DB and returned by /model/info. With no cache_read rate to apply, the cost calculator bills the cached portion at $0.
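A simplified cost model shows the size of the error. This is an assumption-laden approximation, not LiteLLM's actual cost calculator: `input_cost` is a hypothetical function, it ignores output tokens and tiered rates, it assumes cached tokens are a subset of prompt tokens, and, following the behavior reported here, it bills cached tokens at $0 when no cache_read rate is available.

```python
from typing import Optional


def input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    input_cost_per_token: float,
    cache_read_input_token_cost: Optional[float],
) -> float:
    """Simplified input-side cost; a hypothetical model, not LiteLLM's cost calculator."""
    uncached_tokens = prompt_tokens - cached_tokens
    # Mirrors the reported behavior: a missing cache_read rate bills cached tokens at $0.
    cache_rate = cache_read_input_token_cost if cache_read_input_token_cost is not None else 0.0
    return uncached_tokens * input_cost_per_token + cached_tokens * cache_rate


# Illustrative request: 10,000 prompt tokens, 8,000 of them served from cache.
expected = input_cost(10_000, 8_000, 1e-6, 1e-7)  # roughly $0.0028 (0.0020 uncached + 0.0008 cached)
billed = input_cost(10_000, 8_000, 1e-6, None)    # roughly $0.0020; the cached portion is dropped
```

With a high cache hit rate, the dropped cached portion can be a large share of the true input cost, which is why the undercount is easy to miss in aggregate spend.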

Reproduction

  1. Configure a custom-hosted model whose model_name is not present in model_prices_and_context_window.json (e.g. any vendor-org/Model-Name form not in the built-in JSON). Set model_info.input_cost_per_token, model_info.cache_read_input_token_cost, and model_info.supports_prompt_caching: true.
  2. Send a chat completion request that produces prompt_tokens_details.cached_tokens > 0.
  3. Inspect LiteLLM_SpendLogs.metadata.model_map_information.model_map_value. Observe cache_read_input_token_cost: null, supports_prompt_caching: null, despite the values being correctly persisted in /model/info.
  4. Inspect cost_breakdown.input_cost. Observe that the cached_tokens × cache_read_input_token_cost term is missing: the cached portion is billed at $0.
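Step 3 can be scripted once a spend-log row's metadata has been loaded as a Python dict. This sketch assumes the `model_map_information` → `model_map_value` nesting named in step 3; `null_cache_fields` and `CACHE_KEYS` are hypothetical names, and how the row is fetched from LiteLLM_SpendLogs is left out.

```python
from typing import Any, Dict, List

# Cache-related keys the issue reports as null in model_map_value.
CACHE_KEYS = ("cache_read_input_token_cost", "supports_prompt_caching")


def null_cache_fields(spend_log_metadata: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: return the cache keys that are null in model_map_value."""
    model_map_information = spend_log_metadata.get("model_map_information") or {}
    model_map_value = model_map_information.get("model_map_value") or {}
    return [key for key in CACHE_KEYS if model_map_value.get(key) is None]
```

On an affected deployment this should return both keys; after a fix it should return an empty list for a model whose model_info sets them.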

Severity

The same bug surfaces under several open issues (#25839, #25950, #24774, #25204, #11364) with different providers. A targeted fix at the router fallback path would fix all of them at once for self-hosted / custom-routed models, and is a small, mechanical extension of #25888.

Twitter / LinkedIn details

N/A

Extent analysis

TL;DR

The issue can be fixed by modifying the Router.get_model_group_info() function to pass through cache pricing fields from the user's db_model_info to the ModelMapInfo constructor.

Guidance

  • In litellm/router.py (around line 8088), read the cache pricing fields from db_model_info and pass them to the ModelMapInfo constructor.
  • Verify that the ModelMapInfo object returned by get_model_group_info() reflects the user-set pricing fields, including cache_read_input_token_cost and supports_prompt_caching.
  • Test the fix by following the reproduction steps and checking that the cache pricing fields in LiteLLM_SpendLogs.metadata.model_map_information.model_map_value are populated.
  • Review the related issues (#25839, #25950, #24774, #25204, #11364) to ensure that the fix resolves the problems reported in those issues.

Example

cache_read_input_token_cost = db_model_info.get("cache_read_input_token_cost")
cache_creation_input_token_cost = db_model_info.get("cache_creation_input_token_cost")
# ... (include all cache pricing fields)
ModelMapInfo(..., cache_read_input_token_cost=cache_read_input_token_cost, ...)

Notes

The fix is a mechanical extension of the changes made in PR #25888 and should resolve the issues reported in the related tickets. However, thorough testing is necessary to ensure that the fix does not introduce any new problems.

Recommendation

Apply the fix by changing Router.get_model_group_info() to pass the cache pricing fields through. This should resolve the related tickets for custom-hosted and self-hosted models and give the cost calculator the pricing the user actually configured.
