litellm - [Bug]: bedrock Anthropic: cache_creation_input_token_cost and cache_read_input_token_cost silently billed at $0

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

For models using the bedrock_converse provider with prompt caching enabled, both cache_creation_input_token_cost and cache_read_input_token_cost are completely ignored by the cost calculator — even when both fields are correctly present in the model map with non-zero values.

cost_breakdown.input_cost only reflects uncached text_tokens at the base input_cost_per_token rate. Cache creation tokens and cache read tokens each contribute $0.00, causing massive and silent under-billing.

Affected versions: 1.84.1, 1.85.1, 1.86.1, 1.86.2
Provider: bedrock_converse
Model: eu.anthropic.claude-* (Sonnet, Haiku, Opus 4.5, 4.6, 4.7)

Model config (as reported in model_map_information of spend logs)

"input_cost_per_token":            0.0000033,
"output_cost_per_token":           0.0000165,
"cache_read_input_token_cost":     3.3e-7,
"cache_creation_input_token_cost": 0.000004125

All four fields are present and correct. The calculator ignores the two cache fields entirely and applies the wrong rates for input_cost_per_token and output_cost_per_token.

Reproduction

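text_tokens:                      1
cache_creation_input_tokens:   1190  (ephemeral_5m)
cache_read_input_tokens:      32634
completion_tokens:              672

| Token type     | Tokens | Rate (USD per token) | Expected cost |
|----------------|--------|----------------------|---------------|
| Regular input  | 1      | $0.0000033           | $0.0000033    |
| Cache creation | 1,190  | $0.000004125         | $0.004909     |
| Cache read     | 32,634 | $0.00000033          | $0.010769     |
| Output         | 672    | $0.0000165           | $0.011088     |
| Expected total |        |                      | $0.026769     |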

Reported: input_cost: $0.000003 / output_cost: $0.010080 / total: $0.010083
Under-billed by ~$0.0167 (62%)
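
The expected figures can be reproduced with a few lines of arithmetic. The sketch below is not LiteLLM code; it only multiplies the token counts from the reproduction by the rates from the model map. The names `RATES`, `USAGE`, `REPORTED_TOTAL`, and `expected_cost` are illustrative.

```python
from decimal import Decimal

# Per-token rates (USD), copied from model_map_information in the spend logs.
RATES = {
    "input_cost_per_token": Decimal("0.0000033"),
    "output_cost_per_token": Decimal("0.0000165"),
    "cache_read_input_token_cost": Decimal("3.3e-7"),
    "cache_creation_input_token_cost": Decimal("0.000004125"),
}

# Token counts from the reproduction request.
USAGE = {
    "text_tokens": 1,
    "cache_creation_input_tokens": 1190,
    "cache_read_input_tokens": 32634,
    "completion_tokens": 672,
}

# Total that was reported for the same request.
REPORTED_TOTAL = Decimal("0.010083")


def expected_cost(usage, rates):
    """Price each token category at its own per-token rate."""
    return {
        "regular_input": usage["text_tokens"] * rates["input_cost_per_token"],
        "cache_creation": usage["cache_creation_input_tokens"]
        * rates["cache_creation_input_token_cost"],
        "cache_read": usage["cache_read_input_tokens"]
        * rates["cache_read_input_token_cost"],
        "output": usage["completion_tokens"] * rates["output_cost_per_token"],
    }


breakdown = expected_cost(USAGE, RATES)
expected_total = sum(breakdown.values())
shortfall = expected_total - REPORTED_TOTAL
shortfall_pct = shortfall / expected_total * 100

for name, cost in breakdown.items():
    print(f"{name:15s} ${cost:.6f}")
print(f"expected total  ${expected_total:.6f}")
print(f"under-billed    ${shortfall:.6f} ({shortfall_pct:.0f}%)")
```

Dividing the reported output_cost by the completion tokens ($0.010080 / 672) gives $0.000015 per token rather than the configured $0.0000165, which is consistent with the wrong output rate being applied. The reported input_cost is shown to only six decimals, so the same check is inconclusive for the input rate.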

Why adding cache costs to model_info in config.yaml does NOT fix this

The model_map_information in the spend logs already contains the correct cache cost fields with non-zero values. The issue is not missing configuration: the cost calculator simply does not read cache_creation_input_tokens and cache_read_input_tokens when computing input_cost for bedrock_converse models.
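
To make the missing step concrete, the sketch below contrasts an input-cost calculation that prices only uncached tokens with one that also prices cache creation and cache read tokens. All three functions are hypothetical illustrations with assumed names, not LiteLLM's actual implementation; `cache_cost_was_ignored` is a check an operator might run against values taken from the spend logs.

```python
# Rates and token counts from this report (USD per token).
RATES = {
    "input_cost_per_token": 0.0000033,
    "cache_read_input_token_cost": 3.3e-7,
    "cache_creation_input_token_cost": 0.000004125,
}
USAGE = {
    "text_tokens": 1,
    "cache_creation_input_tokens": 1190,
    "cache_read_input_tokens": 32634,
}


def uncached_only_input_cost(usage, rates):
    """Input cost when only text_tokens are priced (the behavior described in this report)."""
    return usage["text_tokens"] * rates["input_cost_per_token"]


def cache_aware_input_cost(usage, rates):
    """Input cost with cache creation and cache read tokens priced at their own rates."""
    return (
        usage["text_tokens"] * rates["input_cost_per_token"]
        + usage.get("cache_creation_input_tokens", 0)
        * rates.get("cache_creation_input_token_cost", 0.0)
        + usage.get("cache_read_input_tokens", 0)
        * rates.get("cache_read_input_token_cost", 0.0)
    )


def cache_cost_was_ignored(reported_input_cost, usage, rates, tolerance=1e-6):
    """Return True if cache tokens should have added cost but the reported
    input cost matches the uncached-only figure (within tolerance)."""
    uncached = uncached_only_input_cost(usage, rates)
    expected = cache_aware_input_cost(usage, rates)
    if abs(expected - uncached) <= tolerance:
        # No cache tokens, or cache rates are zero: nothing to detect.
        return False
    return abs(reported_input_cost - uncached) <= tolerance


# The reported input_cost from the reproduction was $0.000003.
print(cache_cost_was_ignored(0.000003, USAGE, RATES))
print(f"cache-aware input cost: ${cache_aware_input_cost(USAGE, RATES):.6f}")
```

The tolerance of 1e-6 is an assumption chosen because the reported costs are displayed to six decimal places.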

⚠️ Critical Billing Failure — Budget Limits and Spend Controls Are Ineffective

This is not a minor rounding error. Due to this bug, all cost tracking, budget enforcement, team spend limits, and usage-based access controls in LiteLLM are rendered ineffective for any bedrock_converse model that uses prompt caching.

Concrete impact:

  • Team budgets configured in LiteLLM are not enforced — teams can vastly exceed their allocated spend without LiteLLM intervening
  • Per-API-key spend limits are ineffective for cache-heavy workloads
  • Cost dashboards and spend reports show a fraction of real costs
  • Operators have no reliable visibility into actual AWS Bedrock expenditure through LiteLLM
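
The effect on budget enforcement can be estimated from the reproduction figures. The sketch below assumes a hypothetical $10.00 budget, assumes every request resembles the reproduction request, and assumes enforcement blocks once tracked spend reaches the budget; that last point is a simplification, not a description of LiteLLM's enforcement logic.

```python
import math
from decimal import Decimal

TRACKED_PER_REQUEST = Decimal("0.010083")   # total reported for the reproduction request
EXPECTED_PER_REQUEST = Decimal("0.026769")  # expected total from the table in this report
HYPOTHETICAL_BUDGET = Decimal("10.00")      # illustrative value, not from the report


def requests_until_budget_hit(budget, tracked_cost_per_request):
    """Number of requests needed before tracked spend reaches the budget."""
    return math.ceil(budget / tracked_cost_per_request)


requests_allowed = requests_until_budget_hit(HYPOTHETICAL_BUDGET, TRACKED_PER_REQUEST)
tracked_spend = requests_allowed * TRACKED_PER_REQUEST
expected_spend = requests_allowed * EXPECTED_PER_REQUEST

print(f"requests before the budget trips: {requests_allowed}")
print(f"tracked spend at that point:      ${tracked_spend:.2f}")
print(f"expected real spend at that point: ${expected_spend:.2f}")
```

Under these assumptions, the real spend when the budget finally trips is more than twice the configured limit, because each request is tracked at roughly 38% of its expected cost.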

Enterprise Support Request

We are a LiteLLM Enterprise customer and are urgently requesting prioritized attention on this issue.

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.86.2

Twitter / LinkedIn details

No response
