litellm - 💡(How to fix) Fix [Bug]: PR #25888 fallback path reads input/output cost from db model_info but skips cache_read_input_token_cost / cache_creation_input_token_cost

Check for existing issues

Related: #25839, #25950, #24774, #25204, #11364

What happened?

PR #25888 (merged 2026-04-25) fixed the fallback path in Router.get_model_group_info() so that when a model is not in LiteLLM's built-in model_prices_and_context_window.json, the fallback ModelMapInfo reads input_cost_per_token / output_cost_per_token from the user's db_model_info.

The same fallback does not pass through cache pricing fields. As a result, on any model that isn't present in the built-in cost map, LiteLLM applies the user-configured input/output prices but silently drops cache_read_input_token_cost, cache_creation_input_token_cost, and the tier variants — even when those fields are set on model_info and supports_prompt_caching: true is set.

The downstream effect is the one reported in #25839, #25950, #24774, #11364: cached tokens are billed at $0 instead of the configured cache_read rate, on any custom-hosted or self-hosted model that isn't in the upstream pricing JSON.

Where in the code

litellm/router.py ~line 8088 (the block #25888 modified):

db_model_info = model.get("model_info", {})
mode = db_model_info.get("mode", "chat")
input_cost_per_token = db_model_info.get("input_cost_per_token")
output_cost_per_token = db_model_info.get("output_cost_per_token")
# ↑ cache fields not read; ModelMapInfo gets cache_read_input_token_cost=None even
#   when db_model_info has it set.

The fix should mirror #25888 for the additional fields:

input_cost_per_token = db_model_info.get("input_cost_per_token")
output_cost_per_token = db_model_info.get("output_cost_per_token")
cache_read_input_token_cost = db_model_info.get("cache_read_input_token_cost")
cache_creation_input_token_cost = db_model_info.get("cache_creation_input_token_cost")
supports_prompt_caching = db_model_info.get("supports_prompt_caching")
# tier variants too:
input_cost_per_token_above_128k_tokens = db_model_info.get("input_cost_per_token_above_128k_tokens")
input_cost_per_token_above_200k_tokens = db_model_info.get("input_cost_per_token_above_200k_tokens")
input_cost_per_token_above_272k_tokens = db_model_info.get("input_cost_per_token_above_272k_tokens")
output_cost_per_token_above_128k_tokens = db_model_info.get("output_cost_per_token_above_128k_tokens")
output_cost_per_token_above_200k_tokens = db_model_info.get("output_cost_per_token_above_200k_tokens")
output_cost_per_token_above_272k_tokens = db_model_info.get("output_cost_per_token_above_272k_tokens")
cache_read_input_token_cost_above_128k_tokens = db_model_info.get("cache_read_input_token_cost_above_128k_tokens")
cache_read_input_token_cost_above_200k_tokens = db_model_info.get("cache_read_input_token_cost_above_200k_tokens")
cache_read_input_token_cost_above_272k_tokens = db_model_info.get("cache_read_input_token_cost_above_272k_tokens")

…and pass each through to the ModelMapInfo(...) constructor.
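Because the assignments above differ only in the key name, the pass-through could also be driven by a list of field names. The following is a minimal sketch, not the code in litellm/router.py: `PRICING_FIELDS` and `collect_pricing_fields` are hypothetical names, and it assumes the resulting dict can be unpacked as keyword arguments into the existing ModelMapInfo(...) call.

```python
from typing import Any, Dict, Optional

# Field names are taken from the list above; the tuple itself is a hypothetical helper.
_TIERS = ("128k", "200k", "272k")
PRICING_FIELDS = (
    "input_cost_per_token",
    "output_cost_per_token",
    "cache_read_input_token_cost",
    "cache_creation_input_token_cost",
    "supports_prompt_caching",
    *(
        f"{base}_above_{tier}_tokens"
        for base in (
            "input_cost_per_token",
            "output_cost_per_token",
            "cache_read_input_token_cost",
        )
        for tier in _TIERS
    ),
)


def collect_pricing_fields(db_model_info: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Return every pricing field, using None for fields the user did not set."""
    info = db_model_info or {}
    return {field: info.get(field) for field in PRICING_FIELDS}


if __name__ == "__main__":
    # Hypothetical prices, for illustration only.
    fields = collect_pricing_fields(
        {"input_cost_per_token": 1e-06, "cache_read_input_token_cost": 1e-07}
    )
    for name, value in fields.items():
        print(f"{name} = {value}")
```

Keeping the names in one tuple means a newly added tier or cache field only has to be listed once, rather than read in one place and passed through in another.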

Expected behavior

When a model is not in the built-in cost map, ModelMapInfo returned by get_model_group_info() should reflect every pricing-related field the user set on model_info, not just input_cost_per_token and output_cost_per_token.

Actual behavior

The model_map_value that reaches the cost calculator has cache_read_input_token_cost: None and supports_prompt_caching: None even when both are populated in the DB and returned by /model/info. With no cache_read rate to apply, the cost calculator bills the cached portion at $0.
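To make the effect concrete, here is a small worked example of the arithmetic. It is a simplified model of the behavior described in this report, not LiteLLM's actual cost calculator: the function name and the prices are made up, and tiered pricing and cache-creation cost are ignored.

```python
from typing import Optional


def input_cost(
    prompt_tokens: int,
    cached_tokens: int,
    input_cost_per_token: float,
    cache_read_input_token_cost: Optional[float],
) -> float:
    """Simplified input cost: uncached tokens at the input rate, cached tokens at the cache-read rate."""
    uncached_tokens = prompt_tokens - cached_tokens
    # A missing cache-read rate (None) contributes nothing, which mirrors the
    # "$0 for the cached portion" symptom described in this report.
    cache_rate = cache_read_input_token_cost or 0.0
    return uncached_tokens * input_cost_per_token + cached_tokens * cache_rate


if __name__ == "__main__":
    # Hypothetical prices: $1.00 per 1M input tokens, $0.10 per 1M cached tokens.
    expected = input_cost(10_000, 8_000, 1e-06, 1e-07)
    observed = input_cost(10_000, 8_000, 1e-06, None)
    print(f"with cache_read rate:    ${expected:.6f}")
    print(f"with rate dropped (bug): ${observed:.6f}")
```

With these made-up numbers, the 8,000 cached tokens account for the whole difference between the two results.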

Reproduction

1. Configure a custom-hosted model whose model_name is not present in model_prices_and_context_window.json (e.g. any vendor-org/Model-Name form not in the built-in JSON). Set model_info.input_cost_per_token, model_info.cache_read_input_token_cost, and model_info.supports_prompt_caching: true.
2. Send a chat completion request that produces prompt_tokens_details.cached_tokens > 0.
3. Inspect LiteLLM_SpendLogs.metadata.model_map_information.model_map_value. Observe cache_read_input_token_cost: null, supports_prompt_caching: null, despite the values being correctly persisted in /model/info.
4. Inspect cost_breakdown.input_cost. Observe that the cached_tokens × cache_read_input_token_cost term is missing: the cached portion is billed at $0.
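The comparison in the last two steps can be done mechanically by diffing the configured model_info against the logged model_map_value. The helper below is a hypothetical sketch: `find_dropped_fields` is not a LiteLLM function, and both arguments are assumed to be plain dicts already loaded from /model/info and from the spend log row's metadata.

```python
from typing import Any, Dict, List


def find_dropped_fields(
    model_info: Dict[str, Any], model_map_value: Dict[str, Any]
) -> List[str]:
    """List keys that are set in model_info but missing or None in model_map_value."""
    return sorted(
        key
        for key, value in model_info.items()
        if value is not None and model_map_value.get(key) is None
    )


if __name__ == "__main__":
    # Hypothetical values shaped like the reproduction above.
    configured = {
        "input_cost_per_token": 1e-06,
        "cache_read_input_token_cost": 1e-07,
        "supports_prompt_caching": True,
    }
    logged = {
        "input_cost_per_token": 1e-06,
        "cache_read_input_token_cost": None,
        "supports_prompt_caching": None,
    }
    print(find_dropped_fields(configured, logged))
```

In practice model_info may also carry non-pricing keys (mode, for example), so it is probably worth filtering both dicts down to the pricing fields before comparing.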

Severity

The same bug surfaces under several open issues (#25839, #25950, #24774, #25204, #11364) with different providers. A targeted fix at the router fallback path would fix all of them at once for self-hosted / custom-routed models, and is a small, mechanical extension of #25888.

Twitter / LinkedIn details

N/A

extent analysis

TL;DR

The issue can be fixed by modifying the Router.get_model_group_info() function to pass through cache pricing fields from the user's db_model_info to the ModelMapInfo constructor.

Guidance

Modify litellm/router.py around line 8088 (the block changed by #25888) to read the cache pricing fields from db_model_info and pass them to the ModelMapInfo constructor.
Verify that the ModelMapInfo object returned by get_model_group_info() reflects the user-set pricing fields, including cache_read_input_token_cost and supports_prompt_caching.
Test the fix by reproducing the issue using the steps provided and inspecting the LiteLLM_SpendLogs.metadata.model_map_information.model_map_value to ensure that the cache pricing fields are correctly populated.
Review the related issues (#25839, #25950, #24774, #25204, #11364) to ensure that the fix resolves the problems reported in those issues.

Example

cache_read_input_token_cost = db_model_info.get("cache_read_input_token_cost")
cache_creation_input_token_cost = db_model_info.get("cache_creation_input_token_cost")
# ... (include all cache pricing fields)
ModelMapInfo(..., cache_read_input_token_cost=cache_read_input_token_cost, ...)

Notes

The fix is a mechanical extension of the changes made in PR #25888 and should resolve the issues reported in the related tickets. However, thorough testing is necessary to ensure that the fix does not introduce any new problems.

Recommendation

Apply the fix by extending the fallback in Router.get_model_group_info() to pass through the cache pricing fields and their tier variants. This should resolve the issues reported in the related tickets and give custom-hosted and self-hosted models the correct pricing information.
