litellm - [Feature]: Add 1-hour cache write pricing tier for Vertex AI Anthropic models (vertex_ai/claude-sonnet-4-6, vertex_ai/claude-haiku-4-5)

Check for existing issues

  • I have searched the existing issues.

The Feature

Add cache_creation_input_token_cost_above_1hr to the Vertex AI Anthropic model entries in model_prices_and_context_window.json (and the _backup.json mirror). GCP/Vertex publishes two cache write tiers for Claude on Vertex; LiteLLM currently only stores the 5-minute tier, which causes systematic undercounting of spend for any request that uses cache_control: {"ttl": "1h"}.

Models confirmed affected:

  • vertex_ai/claude-sonnet-4-6 — currently has only cache_creation_input_token_cost: 3.75e-06; missing 1h tier (6e-06 per GCP)
  • vertex_ai/claude-haiku-4-5 — currently has only cache_creation_input_token_cost: 1.25e-06; missing 1h tier (2e-06 per GCP)
  • vertex_ai/claude-haiku-4-5@20251001 (pinned variant) — same gap

GCP source: https://cloud.google.com/gemini-enterprise-agent-platform/generative-ai/pricing#partner-models

Motivation, pitch

Customer context: While validating LiteLLM-tracked spend against GCP Billing for a single GCP project on May 11, 2026, a customer found that their backend data (LiteLLM_DailyUserSpend SQL) showed $0.99 for the day vs GCP's $1.10 after savings, a ~10% delta. After ruling out the UI date-window bug (see companion issue #27780), the remaining delta is best explained by the missing 1-hour cache write tier: for Sonnet 4.6, GCP charges $6.00/M tokens for 1h cache writes vs $3.75/M for 5m, so any traffic with ttl: "1h" is undercounted by $2.25/M tokens. The customer specifically noticed the missing fields when comparing the LiteLLM entry against the GCP pricing page.
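
To make the arithmetic concrete, here is a rough sketch that uses only the two Sonnet 4.6 rates quoted above. The function names are hypothetical, and it assumes the entire delta comes from Sonnet 1h cache writes billed at the 5m rate. That is an illustrative upper bound, not a measured breakdown of the customer's traffic, and the GCP figure is "after savings", so the two totals are not strictly like-for-like.

```python
# Per-token cache write rates for vertex_ai/claude-sonnet-4-6, as quoted in this issue.
SONNET_5M_RATE = 3.75e-06  # currently in the LiteLLM registry
SONNET_1H_RATE = 6e-06     # GCP-published 1h tier, missing from the registry

# Spend missed per token when a 1h write is priced at the 5m rate.
UNDERCOUNT_PER_TOKEN = SONNET_1H_RATE - SONNET_5M_RATE


def undercount_usd(one_hour_write_tokens: int) -> float:
    """Dollars missed for a given number of 1h cache write tokens."""
    return one_hour_write_tokens * UNDERCOUNT_PER_TOKEN


def tokens_to_explain(delta_usd: float) -> float:
    """How many 1h cache write tokens would account for a given delta."""
    return delta_usd / UNDERCOUNT_PER_TOKEN


if __name__ == "__main__":
    print(f"undercount per 1M tokens: ${undercount_usd(1_000_000):.2f}")
    print(f"tokens needed for a $0.11 delta: {tokens_to_explain(0.11):,.0f}")
```

Under these assumptions, a few tens of thousands of 1h cache write tokens in a day would be enough to produce a delta of this size.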

Implementation notes (already verified)

The runtime side already supports this — only the price registry is missing data:

  • VertexAIAnthropicConfig extends AnthropicConfig, so the transformation that populates ephemeral_1h_input_tokens in the usage object (litellm/llms/anthropic/chat/transformation.py:1897) already runs for Vertex.
  • generic_cost_per_token_calculate_cache_creation_cost (litellm/litellm_core_utils/llm_cost_calc/utils.py) already reads cache_creation_input_token_cost_above_1hr and applies it to those tokens.
  • ModelInfo typed dict and registry loader (litellm/utils.py:5777) already pass the field through.
  • Bedrock Anthropic models use this exact pattern today and are covered by tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py, which also enforces the 1.6× ratio between 5m and 1h tiers (matches GCP's $6.00 / $3.75 ratio for Sonnet 4.6).
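
The two-tier calculation described in the notes above can be sketched roughly as follows. This is not LiteLLM's actual implementation: the dataclass, the function name, the ephemeral_5m_input_tokens field, and the fallback to the 5m rate when the 1h key is absent are all assumptions made for illustration. Only the two registry keys and ephemeral_1h_input_tokens come from this issue.

```python
from dataclasses import dataclass


@dataclass
class CacheWriteUsage:
    """Hypothetical stand-in for the cache-write part of a usage object."""
    ephemeral_5m_input_tokens: int = 0  # assumed name, by analogy with the 1h field
    ephemeral_1h_input_tokens: int = 0


def cache_creation_cost(usage: CacheWriteUsage, model_info: dict) -> float:
    """Price 5m and 1h cache writes separately.

    Assumption: when the 1h key is missing, 1h tokens fall back to the
    5m rate, which is the undercounting this issue describes.
    """
    rate_5m = model_info["cache_creation_input_token_cost"]
    rate_1h = model_info.get("cache_creation_input_token_cost_above_1hr", rate_5m)
    return (
        usage.ephemeral_5m_input_tokens * rate_5m
        + usage.ephemeral_1h_input_tokens * rate_1h
    )


if __name__ == "__main__":
    usage = CacheWriteUsage(ephemeral_1h_input_tokens=1_000_000)
    current = {"cache_creation_input_token_cost": 3.75e-06}
    proposed = {**current, "cache_creation_input_token_cost_above_1hr": 6e-06}
    print(cache_creation_cost(usage, current))   # priced at the 5m rate
    print(cache_creation_cost(usage, proposed))  # priced at the 1h rate
```

With the 1h key present, the same usage is priced at the higher tier, so no runtime change should be needed once the registry data exists.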

So the change is:

  1. Add cache_creation_input_token_cost_above_1hr to the Sonnet 4.6 and Haiku 4.5 entries (and any pinned @YYYYMMDD / regional variants) in both model_prices_and_context_window.json and model_prices_and_context_window_backup.json.
  2. Optionally extend the existing 1h-cache-pricing test to also cover Vertex AI Anthropic models.
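
A minimal sketch of both steps, operating on an in-memory dict rather than the real JSON files. The helper names are hypothetical. The 1h rates for Sonnet 4.6 and Haiku 4.5 are the GCP values quoted in this issue; applying the Haiku rate to the pinned @20251001 variant is an assumption based on "same gap" and should be checked against the GCP pricing page.

```python
import math

FIVE_MIN_KEY = "cache_creation_input_token_cost"
ONE_HOUR_KEY = "cache_creation_input_token_cost_above_1hr"

# 1h cache write rates per token, as quoted in this issue.
GCP_1H_RATES = {
    "vertex_ai/claude-sonnet-4-6": 6e-06,
    "vertex_ai/claude-haiku-4-5": 2e-06,
    "vertex_ai/claude-haiku-4-5@20251001": 2e-06,  # assumed equal to the unpinned entry
}


def add_one_hour_tier(registry: dict) -> dict:
    """Add the 1h key to known entries without overwriting existing values."""
    for model, rate in GCP_1H_RATES.items():
        entry = registry.get(model)
        if entry is not None:
            entry.setdefault(ONE_HOUR_KEY, rate)
    return registry


def tier_ratio_violations(registry: dict, expected_ratio: float = 1.6) -> list:
    """Return models whose 1h/5m ratio differs from the expected 1.6x."""
    bad = []
    for model, entry in registry.items():
        if ONE_HOUR_KEY not in entry:
            continue
        ratio = entry[ONE_HOUR_KEY] / entry[FIVE_MIN_KEY]
        if not math.isclose(ratio, expected_ratio, rel_tol=1e-9):
            bad.append(model)
    return bad
```

In practice the same keys would be added to both JSON files by hand or by script, and a check like tier_ratio_violations could sit alongside the existing Bedrock test.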

What part of LiteLLM is this about?

SDK (litellm Python package) — model pricing registry. Affects both SDK callers and proxy spend tracking.
