litellm - ✅(Solved) Fix [Bug]: Cached prompt tokens billed as regular input in custom pricing cost path [2 pull requests, 1 participants]

litellm · 2026-04-29 20:20:03

GitHub stats

BerriAI/litellm#26807 • Fetched 2026-04-30 06:19:39

Author

GabrielZirondi

Participants

GabrielZirondi

Timeline (top)

labeled ×3, cross-referenced ×2, subscribed ×1

Fix Action

Fixed

Fixed by PR: fix: price cached tokens in custom cost calculator (https://github.com/BerriAI/litellm/pull/26811)
Fixed by PR: Fix/cached token custom cost (https://github.com/BerriAI/litellm/pull/26816)

PR fix notes

PR #26811: fix: price cached tokens in custom cost calculator

Repository: BerriAI/litellm
Author: GabrielZirondi
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/26811

Description (problem / solution / changelog)

Relevant issues

Fixes #26807

Linear ticket

N/A

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem
I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Screenshots / Proof of Fix

Reproduced the bug with completion_cost(...) using custom_cost_per_token and prompt_tokens_details.cached_tokens.

Before the fix, cached prompt tokens were billed as regular input tokens:

prompt_tokens = 6074
cached_tokens = 3456
completion_tokens = 285

input_cost_per_token = 0.0000025
cache_read_input_token_cost = 0.00000025
output_cost_per_token = 0.000015

Actual before fix:
total_cost = 0.01946

[!NOTE] Medium Risk: changes core cost calculation paths and breakdown reporting for custom pricing, which can affect billing accuracy across cache/tier/threshold combinations if edge cases are missed.

Overview: Custom pricing passed via custom_cost_per_token now supports prompt caching. When cache-related keys are provided, the calculator routes through the same token-cost logic as the model cost map, so cached reads and cache creation are billed at their dedicated rates (including service tier suffixes like _flex/_priority and above-threshold pricing).

Cost breakdown logging is updated to compute cache_read_cost/cache_creation_cost consistently (including custom pricing) and to subtract those amounts from the reported input_cost so raw input processing vs cache costs are split correctly. Extensive new tests cover cached-token pricing with explicit custom pricing, service tiers, above-threshold pricing, cache-creation token details, and breakdown behavior for custom_openai/... model-map entries.
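The breakdown split described above can be illustrated with a minimal sketch. The helper name `split_prompt_cost` and the dictionary shape are hypothetical and are not taken from the PR; only the field names input_cost, cache_read_cost, and cache_creation_cost come from the description. The numbers reuse the token counts and rates quoted earlier in this PR description.

```python
def split_prompt_cost(prompt_cost, cache_read_cost, cache_creation_cost=0.0):
    """Hypothetical helper: report raw input cost separately from cache costs.

    prompt_cost is assumed to be the full prompt-side cost, i.e. it already
    includes the cache read and cache creation amounts.
    """
    return {
        "input_cost": prompt_cost - cache_read_cost - cache_creation_cost,
        "cache_read_cost": cache_read_cost,
        "cache_creation_cost": cache_creation_cost,
    }


# Values from the reproduction: 6074 prompt tokens, 3456 of them cached.
cache_read_cost = 3456 * 0.00000025
prompt_cost = (6074 - 3456) * 0.0000025 + cache_read_cost

breakdown = split_prompt_cost(prompt_cost, cache_read_cost)
for name, value in breakdown.items():
    print(name, round(value, 6))
```

Under these assumptions the uncached portion (2618 tokens at the regular input rate) is reported as input_cost, and the cached portion is reported only under cache_read_cost, so the two never overlap.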

Reviewed by Cursor Bugbot for commit fe9504d9835ac66dcd98db292ab80eb28e7ffa9b. Bugbot is set up for automated code reviews on this repo.

Changed files

litellm/cost_calculator.py (modified, +166/-30)
litellm/litellm_core_utils/llm_cost_calc/utils.py (modified, +103/-44)
tests/test_litellm/test_cost_calculator.py (modified, +389/-1)

PR #26816: Fix/cached token custom cost

Repository: BerriAI/litellm
Author: GabrielZirondi
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/26816

Description (problem / solution / changelog)

Relevant issues

Fixes #26807

Linear ticket

N/A

Screenshots / Proof of Fix

Reproduced the bug with completion_cost(...) using custom_cost_per_token and prompt_tokens_details.cached_tokens.

Before the fix, cached prompt tokens were billed as regular input tokens:

prompt_tokens = 6074
cached_tokens = 3456
completion_tokens = 285

input_cost_per_token = 0.0000025
cache_read_input_token_cost = 0.00000025
output_cost_per_token = 0.000015

Actual before fix:
total_cost = 0.01946

Changed files

litellm/cost_calculator.py (modified, +166/-30)
litellm/litellm_core_utils/llm_cost_calc/utils.py (modified, +103/-44)
tests/test_litellm/test_cost_calculator.py (modified, +389/-1)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When completion_cost receives custom token pricing that includes cache_read_input_token_cost, cached prompt tokens are still billed at input_cost_per_token.

On current main, the normal model-cost-map path works. The failing path is the custom pricing shortcut, which returns before the generic cache-aware calculator runs.

Older LiteLLM versions may also show this through dashboard/DB-created models if custom pricing registration drops cache pricing fields.

Steps to Reproduce

Run this minimal Python reproduction:

import litellm
from litellm.types.utils import ModelResponse, PromptTokensDetailsWrapper, Usage

usage = Usage(
    prompt_tokens=6074,
    completion_tokens=285,
    total_tokens=6359,
    prompt_tokens_details=PromptTokensDetailsWrapper(
        cached_tokens=3456,
        audio_tokens=0,
    ),
)

response = ModelResponse(
    id="test-id",
    created=1234567890,
    model="openai/gpt-5.4",
    object="chat.completion",
    choices=[],
    usage=usage,
)

cost = litellm.completion_cost(
    completion_response=response,
    model="openai/gpt-5.4",
    custom_llm_provider="openai",
    custom_cost_per_token={
        "input_cost_per_token": 0.0000025,
        "output_cost_per_token": 0.000015,
        "cache_read_input_token_cost": 0.00000025,
    },
)

print(cost)

The returned cost is:

0.01946

The expected cost is:

0.011684
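Both figures can be checked by hand. The sketch below uses only the token counts and rates from the reproduction and does not call LiteLLM; the function names `buggy_cost` and `expected_cost` are illustrative, not part of the library.

```python
# Token counts and per-token rates copied from the reproduction above.
PROMPT_TOKENS = 6074
CACHED_TOKENS = 3456
COMPLETION_TOKENS = 285

INPUT_RATE = 0.0000025
CACHE_READ_RATE = 0.00000025
OUTPUT_RATE = 0.000015


def buggy_cost() -> float:
    # Every prompt token, cached or not, is billed at the regular input rate.
    return PROMPT_TOKENS * INPUT_RATE + COMPLETION_TOKENS * OUTPUT_RATE


def expected_cost() -> float:
    # Cached tokens are billed at the cache-read rate; only the rest pay full price.
    uncached_tokens = PROMPT_TOKENS - CACHED_TOKENS
    return (
        uncached_tokens * INPUT_RATE
        + CACHED_TOKENS * CACHE_READ_RATE
        + COMPLETION_TOKENS * OUTPUT_RATE
    )


print(round(buggy_cost(), 6))     # 0.01946
print(round(expected_cost(), 6))  # 0.011684
```

The difference between the two results is 3456 cached tokens multiplied by the gap between the regular input rate and the cache-read rate.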

The same issue reproduces when using the custom OpenAI-prefixed model name:

cost = litellm.completion_cost(
    completion_response=response,
    model="custom_openai/openai/gpt-5.4",
    custom_llm_provider="custom_openai",
    custom_cost_per_token={
        "input_cost_per_token": 0.0000025,
        "output_cost_per_token": 0.000015,
        "cache_read_input_token_cost": 0.00000025,
    },
)

This reproduces only when pricing is passed through the custom pricing shortcut. The normal model cost map path on current main applies cached token pricing correctly.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.3

Twitter / LinkedIn details

https://www.linkedin.com/in/gabrielzirondi/

extent analysis

TL;DR

The issue can be fixed by modifying the custom pricing shortcut to apply the cache_read_input_token_cost to cached prompt tokens.

Guidance

Review the custom pricing shortcut logic to ensure it correctly applies the cache_read_input_token_cost to cached prompt tokens.
Verify that the cache_read_input_token_cost is being passed correctly to the pricing calculation function.
Check if the issue is specific to the custom_openai provider or if it affects other custom providers as well.
Consider updating the litellm.completion_cost function to handle custom pricing shortcuts and cache-aware calculations consistently.

Example

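# Hypothetical sketch of a cache-aware custom pricing calculation (not LiteLLM's actual code)
def custom_pricing_shortcut(completion_response, custom_cost_per_token):
    usage = completion_response.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached_tokens = getattr(details, "cached_tokens", 0) or 0
    input_rate = custom_cost_per_token["input_cost_per_token"]
    # Fall back to the regular input rate when no cache-read rate is supplied
    cache_rate = custom_cost_per_token.get("cache_read_input_token_cost", input_rate)
    cost = (usage.prompt_tokens - cached_tokens) * input_rate  # uncached prompt tokens
    cost += cached_tokens * cache_rate  # cached prompt tokens at the cache-read rate
    cost += usage.completion_tokens * custom_cost_per_token["output_cost_per_token"]
    return cost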

Notes

The issue seems to be specific to the custom pricing shortcut and does not affect the normal model cost map path. The provided reproduction code and expected cost calculation suggest that the issue is with the application of the cache_read_input_token_cost to cached prompt tokens.

Recommendation

Apply workaround: Modify the custom pricing shortcut to correctly apply the cache_read_input_token_cost to cached prompt tokens, as shown in the example code snippet. This should fix the issue without requiring an upgrade to a newer version of LiteLLM.
