litellm: [Feature]: Add 1-hour cache write pricing tier for Vertex AI Anthropic models (vertex_ai/claude-sonnet-4-6, vertex_ai/claude-haiku-4-5)

litellm, 2026-05-12 22:29:02


Check for existing issues

I have searched the existing issues.

The Feature

Add cache_creation_input_token_cost_above_1hr to the Vertex AI Anthropic model entries in model_prices_and_context_window.json (and the _backup.json mirror). GCP/Vertex publishes two cache write tiers for Claude on Vertex; LiteLLM currently stores only the 5-minute tier, which causes systematic undercounting of spend for any request that uses cache_control: {"ttl": "1h"}.

Models confirmed affected:

vertex_ai/claude-sonnet-4-6 — currently has only cache_creation_input_token_cost: 3.75e-06; missing 1h tier (6e-06 per GCP)
vertex_ai/claude-haiku-4-5 — currently has only cache_creation_input_token_cost: 1.25e-06; missing 1h tier (2e-06 per GCP)
vertex_ai/claude-haiku-4-5@20251001 (pinned variant) — same gap

GCP source: https://cloud.google.com/gemini-enterprise-agent-platform/generative-ai/pricing#partner-models
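As an illustration of the requested registry change, here is a minimal Python sketch that adds the missing field to a trimmed stand-in for the price registry. The helper name add_one_hour_tier and the stand-in dict are hypothetical, and the 2e-06 value for the pinned Haiku variant is an assumption (the issue only says it has the same gap); the Sonnet and Haiku prices are the ones quoted above.

```python
import json

FIELD = "cache_creation_input_token_cost_above_1hr"

# Proposed 1h cache-write prices in USD per token, as quoted in this issue.
# The pinned Haiku value is an assumption: the issue only says "same gap".
ONE_HOUR_TIER = {
    "vertex_ai/claude-sonnet-4-6": 6e-06,
    "vertex_ai/claude-haiku-4-5": 2e-06,
    "vertex_ai/claude-haiku-4-5@20251001": 2e-06,
}


def add_one_hour_tier(registry: dict) -> dict:
    """Return a copy of the registry with the 1h tier added where it is missing."""
    patched = {name: dict(entry) for name, entry in registry.items()}
    for name, price in ONE_HOUR_TIER.items():
        if name in patched:
            # setdefault leaves any value that is already present untouched.
            patched[name].setdefault(FIELD, price)
    return patched


# Trimmed, hypothetical stand-in for model_prices_and_context_window.json.
registry = {
    "vertex_ai/claude-sonnet-4-6": {"cache_creation_input_token_cost": 3.75e-06},
    "vertex_ai/claude-haiku-4-5": {"cache_creation_input_token_cost": 1.25e-06},
}

patched = add_one_hour_tier(registry)
print(json.dumps(patched, indent=2))
```

The same patch would need to be applied to the _backup.json mirror so the two files stay in sync.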

Motivation, pitch

Customer context: A customer was validating LiteLLM-tracked spend against GCP Billing for a single GCP project on May 11, 2026. Their backend data (LiteLLM_DailyUserSpend SQL) showed $0.99 for the day vs GCP's $1.10 after savings, a ~10% delta. After ruling out the UI date-window bug (see companion issue #27780), the remaining delta is best explained by the missing 1-hour cache write tier: for Sonnet 4.6, GCP charges $6.00/M tokens for 1h cache writes vs $3.75/M for 5m, so any traffic with ttl: "1h" is undercounted by $2.25/M. The customer specifically noticed the missing fields when comparing the LiteLLM entry against the GCP pricing page.
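The undercount arithmetic can be checked with a short Python sketch. It uses only the Sonnet 4.6 prices quoted above; the function names are hypothetical, and the second function assumes the entire $0.11 delta comes from Sonnet 4.6 1h cache writes, which the issue presents as the best explanation rather than a confirmed breakdown.

```python
FIVE_MIN_PRICE = 3.75e-06  # USD per token, 5m cache write (Sonnet 4.6)
ONE_HOUR_PRICE = 6e-06     # USD per token, 1h cache write (Sonnet 4.6, per GCP)


def undercount_usd(one_hour_write_tokens: int) -> float:
    """Spend missed when 1h cache writes are billed at the 5m price."""
    return one_hour_write_tokens * (ONE_HOUR_PRICE - FIVE_MIN_PRICE)


def tokens_to_explain(delta_usd: float) -> float:
    """1h cache-write tokens needed to account for a given spend delta."""
    return delta_usd / (ONE_HOUR_PRICE - FIVE_MIN_PRICE)


# One million 1h cache-write tokens are undercounted by about $2.25.
print(round(undercount_usd(1_000_000), 2))

# Tokens needed to explain the observed $1.10 - $0.99 delta, under the
# assumption that all of it is Sonnet 4.6 1h cache-write traffic.
print(round(tokens_to_explain(1.10 - 0.99)))
```

Under that assumption, roughly 49 thousand 1h cache-write tokens in a day would be enough to produce the observed gap.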

Implementation notes (already verified)

The runtime side already supports this — only the price registry is missing data:

VertexAIAnthropicConfig extends AnthropicConfig, so the transformation that populates ephemeral_1h_input_tokens in the usage object (litellm/llms/anthropic/chat/transformation.py:1897) already runs for Vertex.
generic_cost_per_token → _calculate_cache_creation_cost (litellm/litellm_core_utils/llm_cost_calc/utils.py) already reads cache_creation_input_token_cost_above_1hr and applies it to those tokens.
ModelInfo typed dict and registry loader (litellm/utils.py:5777) already pass the field through.
Bedrock Anthropic models use this exact pattern today and are covered by tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py, which also enforces the 1.6× ratio between 5m and 1h tiers (matches GCP's $6.00 / $3.75 ratio for Sonnet 4.6).
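To make the effect of the missing field concrete, here is a simplified Python sketch of two-tier cache-write costing. It is not LiteLLM's actual _calculate_cache_creation_cost; the function name, the parameters, and the fallback to the 5m price when the 1h price is absent are assumptions chosen to match the undercount described in this issue.

```python
from typing import Optional


def cache_write_cost(
    tokens_5m: int,
    tokens_1h: int,
    price_5m: float,
    price_1h: Optional[float],
) -> float:
    """Cost of cache writes split by TTL tier (hypothetical helper).

    Assumption: when the registry has no 1h price, 1h tokens are billed
    at the 5m price, which is what produces the undercount.
    """
    effective_1h = price_1h if price_1h is not None else price_5m
    return tokens_5m * price_5m + tokens_1h * effective_1h


# Sonnet 4.6 prices from this issue; one million 1h cache-write tokens.
without_field = cache_write_cost(0, 1_000_000, 3.75e-06, None)
with_field = cache_write_cost(0, 1_000_000, 3.75e-06, 6e-06)
print(round(without_field, 2), round(with_field, 2))
```

Because the runtime path already reads the field, supplying the 1h price in the registry is the only input that changes between the two calls.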

So the change is:

Add cache_creation_input_token_cost_above_1hr to the Sonnet 4.6 and Haiku 4.5 entries (and any pinned @YYYYMMDD / regional variants) in both model_prices_and_context_window.json and model_prices_and_context_window_backup.json.
Optionally extend the existing 1h-cache-pricing test to also cover Vertex AI Anthropic models.
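The optional test extension could look roughly like the following Python sketch. It does not reproduce the existing Bedrock test; the helper find_bad_vertex_entries and the in-memory sample registry are hypothetical, and a real test would load model_prices_and_context_window.json instead.

```python
import math

BASE_FIELD = "cache_creation_input_token_cost"
ONE_HOUR_FIELD = "cache_creation_input_token_cost_above_1hr"


def find_bad_vertex_entries(registry: dict, expected_ratio: float = 1.6) -> list:
    """Return Vertex AI Claude entries whose 1h tier is missing or off-ratio."""
    bad = []
    for name, entry in registry.items():
        if not name.startswith("vertex_ai/claude-"):
            continue
        base = entry.get(BASE_FIELD)
        if base is None:
            continue  # no cache-write pricing at all, nothing to compare
        one_hour = entry.get(ONE_HOUR_FIELD)
        # isclose avoids false failures from floating-point rounding.
        if one_hour is None or not math.isclose(one_hour, base * expected_ratio, rel_tol=1e-9):
            bad.append(name)
    return bad


# Hypothetical sample: Sonnet already patched, Haiku still missing the 1h tier.
sample = {
    "vertex_ai/claude-sonnet-4-6": {BASE_FIELD: 3.75e-06, ONE_HOUR_FIELD: 6e-06},
    "vertex_ai/claude-haiku-4-5": {BASE_FIELD: 1.25e-06},
}
print(find_bad_vertex_entries(sample))
```

Filtering by the vertex_ai/claude- prefix is a design choice that would also pick up pinned @YYYYMMDD variants without listing them one by one.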

What part of LiteLLM is this about?

SDK (litellm Python package) — model pricing registry. Affects both SDK callers and proxy spend tracking.
