litellm - ✅(Solved) Fix [Bug]: Cached prompt tokens billed as regular input in custom pricing cost path [2 pull requests, 1 participant]

GitHub stats: BerriAI/litellm#26807 (fetched 2026-04-30 06:19:39)
Comments: 0 · Participants: 1 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×2, subscribed ×1

Fix status: Fixed

PR fix notes

PR #26811: fix: price cached tokens in custom cost calculator

Description (problem / solution / changelog)

Relevant issues

Fixes #26807

Linear ticket

N/A

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory (adding at least 1 test is a hard requirement)
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review


Screenshots / Proof of Fix

Reproduced the bug with completion_cost(...) using custom_cost_per_token and prompt_tokens_details.cached_tokens.

Before the fix, cached prompt tokens were billed as regular input tokens:

prompt_tokens = 6074
cached_tokens = 3456
completion_tokens = 285

input_cost_per_token = 0.0000025
cache_read_input_token_cost = 0.00000025
output_cost_per_token = 0.000015

Actual before fix:
total_cost = 0.01946

Expected (cache-aware):
total_cost = 0.011684
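Both totals can be checked by hand. The worked arithmetic below assumes cost is simply tokens × per-token rate (no service tiers or above-threshold pricing) and that cached_tokens is a subset of prompt_tokens, as in OpenAI-style usage; the expected value 0.011684 is the one stated in the original issue.

```latex
\begin{aligned}
\text{before fix} &= 6074 \times 0.0000025 + 285 \times 0.000015 = 0.015185 + 0.004275 = 0.01946 \\
\text{expected}   &= (6074 - 3456) \times 0.0000025 + 3456 \times 0.00000025 + 285 \times 0.000015 \\
                  &= 0.006545 + 0.000864 + 0.004275 = 0.011684
\end{aligned}
```

The gap of 0.007776 is exactly 3456 × (0.0000025 − 0.00000025), i.e. the overcharge on the cached tokens when they are billed at the regular input rate.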


Note (Medium Risk): changes core cost calculation paths and breakdown reporting for custom pricing, which can affect billing accuracy across cache/tier/threshold combinations if edge cases are missed.

Overview: custom pricing passed via custom_cost_per_token now supports prompt caching. When cache-related keys are provided, the calculator routes through the same token-cost logic as the model cost map, so cached reads and cache creation are billed at their dedicated rates (including service tier suffixes like _flex/_priority and above-threshold pricing).
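The note mentions service-tier suffixes and above-threshold pricing. The sketch below shows one way a calculator might pick the rate for a given pricing key. It assumes a flat dict of prices, key variants named `<base>_<tier>` and `<base>_above_<N>k_tokens`, and silent fallback to the base key when a variant is missing. The helper `resolve_rate`, its precedence order, and the `_flex` price value are hypothetical, not LiteLLM's actual implementation.

```python
from typing import Mapping, Optional


def resolve_rate(
    prices: Mapping[str, float],
    base_key: str,
    service_tier: Optional[str] = None,
    prompt_tokens: int = 0,
    threshold_tokens: int = 200_000,
) -> float:
    """Pick a per-token rate for base_key (hypothetical helper).

    Assumed precedence: above-threshold variant, then service-tier
    variant, then the base key. Missing variants fall back to the base key.
    """
    if prompt_tokens > threshold_tokens:
        above_key = f"{base_key}_above_{threshold_tokens // 1000}k_tokens"
        if above_key in prices:
            return prices[above_key]
    if service_tier:
        tier_key = f"{base_key}_{service_tier}"
        if tier_key in prices:
            return prices[tier_key]
    return prices[base_key]


custom_pricing = {
    "input_cost_per_token": 0.0000025,
    "cache_read_input_token_cost": 0.00000025,
    "cache_read_input_token_cost_flex": 0.000000125,  # hypothetical tier price
}

# Uses the _flex variant because it exists in the dict.
print(resolve_rate(custom_pricing, "cache_read_input_token_cost", service_tier="flex"))
# No _priority variant is present, so the base cache-read rate is used.
print(resolve_rate(custom_pricing, "cache_read_input_token_cost", service_tier="priority"))
```

Falling back to the base key keeps partially specified custom pricing usable: a user who sets only the standard cache-read price still gets cache-aware billing on every tier.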

Cost breakdown logging is updated to compute cache_read_cost/cache_creation_cost consistently (including custom pricing) and to subtract those amounts from the reported input_cost so raw input processing vs cache costs are split correctly. Extensive new tests cover cached-token pricing with explicit custom pricing, service tiers, above-threshold pricing, cache-creation token details, and breakdown behavior for custom_openai/... model-map entries.
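The breakdown change can be sketched as follows. The sketch assumes the prompt cost passed in already includes the cache-read and cache-creation charges, and that the breakdown reports those separately by subtracting them from input_cost. `CostBreakdown` and `split_prompt_cost` are hypothetical names, not LiteLLM's logging types. The numbers reuse the reproduction above, which has no cache-creation tokens.

```python
from dataclasses import dataclass


@dataclass
class CostBreakdown:
    input_cost: float  # raw (uncached) prompt processing only
    cache_read_cost: float
    cache_creation_cost: float
    output_cost: float

    @property
    def total_cost(self) -> float:
        return self.input_cost + self.cache_read_cost + self.cache_creation_cost + self.output_cost


def split_prompt_cost(
    prompt_cost: float,
    cache_read_tokens: int,
    cache_creation_tokens: int,
    cache_read_rate: float,
    cache_creation_rate: float,
    output_cost: float,
) -> CostBreakdown:
    """Split an all-in prompt cost into raw input vs cache components (hypothetical)."""
    cache_read_cost = cache_read_tokens * cache_read_rate
    cache_creation_cost = cache_creation_tokens * cache_creation_rate
    return CostBreakdown(
        input_cost=prompt_cost - cache_read_cost - cache_creation_cost,
        cache_read_cost=cache_read_cost,
        cache_creation_cost=cache_creation_cost,
        output_cost=output_cost,
    )


# Cache-aware prompt cost from the reproduction: 2618 uncached + 3456 cached tokens.
prompt_cost = 2618 * 0.0000025 + 3456 * 0.00000025
breakdown = split_prompt_cost(
    prompt_cost,
    cache_read_tokens=3456,
    cache_creation_tokens=0,
    cache_read_rate=0.00000025,
    cache_creation_rate=0.0,  # no cache-creation price in this example
    output_cost=285 * 0.000015,
)
print(breakdown)
print(round(breakdown.total_cost, 6))  # 0.011684
```

Subtracting rather than recomputing keeps the components summing to the same total the calculator returned, so the breakdown cannot disagree with the billed amount.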

Reviewed by Cursor Bugbot for commit fe9504d9835ac66dcd98db292ab80eb28e7ffa9b.


Changed files

  • litellm/cost_calculator.py (modified, +166/-30)
  • litellm/litellm_core_utils/llm_cost_calc/utils.py (modified, +103/-44)
  • tests/test_litellm/test_cost_calculator.py (modified, +389/-1)

PR #26816: Fix/cached token custom cost

Description (problem / solution / changelog)

Relevant issues

Fixes #26807

Linear ticket

N/A


Changed files

  • litellm/cost_calculator.py (modified, +166/-30)
  • litellm/litellm_core_utils/llm_cost_calc/utils.py (modified, +103/-44)
  • tests/test_litellm/test_cost_calculator.py (modified, +389/-1)

Original issue report

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When completion_cost receives custom token pricing that includes cache_read_input_token_cost, cached prompt tokens are still billed at input_cost_per_token.

On current main, the normal model-cost-map path works. The failing path is the custom pricing shortcut, which returns before the generic cache-aware calculator runs.

Older LiteLLM versions may also exhibit this for models created through the dashboard or database, if custom pricing registration drops the cache pricing fields.
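The failure mode described above, a custom-pricing shortcut that returns before cache handling runs, can be illustrated with a simplified model. All functions here are hypothetical stand-ins, not LiteLLM's code: `shortcut_cost` mimics the pre-fix path, `cache_aware_cost` stands in for the generic calculator, and `routed_cost` mimics the fix of sending custom pricing that contains cache keys through the cache-aware path. The key name `cache_creation_input_token_cost` in `CACHE_KEYS` is an assumption.

```python
from typing import Dict

# Keys whose presence signals that custom pricing is cache-aware (assumed names).
CACHE_KEYS = ("cache_read_input_token_cost", "cache_creation_input_token_cost")


def cache_aware_cost(prices: Dict[str, float], prompt: int, cached: int, completion: int) -> float:
    """Generic path: cached prompt tokens are billed at the cache-read rate."""
    read_rate = prices.get("cache_read_input_token_cost", prices["input_cost_per_token"])
    return (
        (prompt - cached) * prices["input_cost_per_token"]
        + cached * read_rate
        + completion * prices["output_cost_per_token"]
    )


def shortcut_cost(prices: Dict[str, float], prompt: int, cached: int, completion: int) -> float:
    """Pre-fix shape: custom pricing returns early and ignores `cached`."""
    return prompt * prices["input_cost_per_token"] + completion * prices["output_cost_per_token"]


def routed_cost(prices: Dict[str, float], prompt: int, cached: int, completion: int) -> float:
    """Fixed shape: custom pricing with cache keys goes through the generic path."""
    if any(key in prices for key in CACHE_KEYS):
        return cache_aware_cost(prices, prompt, cached, completion)
    return shortcut_cost(prices, prompt, cached, completion)


prices = {
    "input_cost_per_token": 0.0000025,
    "output_cost_per_token": 0.000015,
    "cache_read_input_token_cost": 0.00000025,
}
print(round(shortcut_cost(prices, 6074, 3456, 285), 6))  # 0.01946
print(round(routed_cost(prices, 6074, 3456, 285), 6))  # 0.011684
```

Routing custom prices into the shared cache-aware logic, rather than teaching the shortcut about caching separately, follows the approach described in the PR summary and keeps the two pricing paths from drifting apart.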

Steps to Reproduce

  1. Run this minimal Python reproduction:
import litellm
from litellm.types.utils import ModelResponse, PromptTokensDetailsWrapper, Usage

usage = Usage(
    prompt_tokens=6074,
    completion_tokens=285,
    total_tokens=6359,
    prompt_tokens_details=PromptTokensDetailsWrapper(
        cached_tokens=3456,
        audio_tokens=0,
    ),
)

response = ModelResponse(
    id="test-id",
    created=1234567890,
    model="openai/gpt-5.4",
    object="chat.completion",
    choices=[],
    usage=usage,
)

cost = litellm.completion_cost(
    completion_response=response,
    model="openai/gpt-5.4",
    custom_llm_provider="openai",
    custom_cost_per_token={
        "input_cost_per_token": 0.0000025,
        "output_cost_per_token": 0.000015,
        "cache_read_input_token_cost": 0.00000025,
    },
)

print(cost)
  2. The returned cost is:
0.01946
  3. The expected cost is:
0.011684
  4. The same issue reproduces with the custom_openai-prefixed model name:
cost = litellm.completion_cost(
    completion_response=response,
    model="custom_openai/openai/gpt-5.4",
    custom_llm_provider="custom_openai",
    custom_cost_per_token={
        "input_cost_per_token": 0.0000025,
        "output_cost_per_token": 0.000015,
        "cache_read_input_token_cost": 0.00000025,
    },
)

This reproduces only when pricing is passed through the custom pricing shortcut. The normal model cost map path on current main applies cached token pricing correctly.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.81.3

Twitter / LinkedIn details

https://www.linkedin.com/in/gabrielzirondi/

extent analysis

TL;DR

The issue can be fixed by making the custom pricing shortcut bill cached prompt tokens at cache_read_input_token_cost and only the remaining (uncached) prompt tokens at input_cost_per_token.

Guidance

  • Review the custom pricing shortcut logic to ensure it correctly applies the cache_read_input_token_cost to cached prompt tokens.
  • Verify that the cache_read_input_token_cost is being passed correctly to the pricing calculation function.
  • The reproduction shows the bug with both the openai and custom_openai providers; check whether other providers that use custom pricing are affected as well.
  • Consider updating the litellm.completion_cost function to handle custom pricing shortcuts and cache-aware calculations consistently.

Example

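# Example: a cache-aware custom pricing shortcut (simplified sketch, not LiteLLM's actual code)
def custom_pricing_shortcut(completion_response, custom_cost_per_token):
    usage = completion_response.usage
    details = getattr(usage, "prompt_tokens_details", None)
    cached_tokens = getattr(details, "cached_tokens", None) or 0
    read_rate = custom_cost_per_token.get("cache_read_input_token_cost")
    if read_rate is None:
        cached_tokens = 0  # no cache-read rate: bill every prompt token as regular input
    # Bill only the uncached remainder at the input rate, so cached tokens are not billed twice
    cost = (usage.prompt_tokens - cached_tokens) * custom_cost_per_token["input_cost_per_token"]
    cost += cached_tokens * (read_rate or 0.0)
    cost += usage.completion_tokens * custom_cost_per_token["output_cost_per_token"]
    return cost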

Notes

The issue seems to be specific to the custom pricing shortcut and does not affect the normal model cost map path. The provided reproduction code and expected cost calculation suggest that the issue is with the application of the cache_read_input_token_cost to cached prompt tokens.

Recommendation

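Upgrade to a LiteLLM release that includes the fix for #26807 (see the linked PRs above). If upgrading is not possible, a local workaround is to patch the custom pricing shortcut so that cached prompt tokens are billed at cache_read_input_token_cost and only the uncached remainder at input_cost_per_token. Adding the cache-read charge without subtracting the cached tokens from the input-billed count would bill those tokens twice.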
