litellm - [Bug]: Vertex AI Anthropic cost_per_token() returns negative cost when prompt caching is enabled



What happened?

cost_per_token() returns a negative total cost for Vertex AI Anthropic models (Claude Opus 4-6, 4-7, Sonnet 4-6) when cache_read_input_tokens exceeds input_tokens. The function subtracts cache_read_input_tokens from prompt_tokens before pricing, but Anthropic's input_tokens is already the uncached count, so the subtraction removes the cached tokens a second time and drives the input cost negative.

This was fixed for Bedrock in PR #15292 (issue #15263), but the Vertex AI code path still has the bug. Related: #11364 (open, general Anthropic cache cost), #19680 (closed, overcharging variant).

In a production LiteLLM proxy deployment, this caused usage.cost in streamed SSE chunks to be negative, which downstream billing logic interpreted as a credit.
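Until the calculator is fixed, a consumer of usage.cost could refuse to treat a negative value as a credit. The following is a minimal sketch of such a guard, assuming a hypothetical `sanitize_cost` helper that lives in the billing layer (it is not a LiteLLM API):

```python
import math


def sanitize_cost(cost: float) -> float:
    """Return cost unchanged if it is a valid charge, else raise ValueError.

    Hypothetical billing-side guard; the name and behavior are assumptions
    made for illustration and are not part of LiteLLM.
    """
    if math.isnan(cost) or math.isinf(cost):
        raise ValueError(f"non-finite cost: {cost!r}")
    if cost < 0:
        # A negative cost signals a calculator bug, never a refund.
        raise ValueError(f"negative cost rejected: {cost!r}")
    return cost
```

Raising rather than clamping to zero keeps the bad value visible in logs instead of silently giving the call away for free; which of the two is appropriate depends on the billing system.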

Minimal reproduction (no API call needed)

# Tested on litellm v1.82.3 — deterministic, no provider credentials required
from litellm import cost_per_token, Usage

# Usage block with high cache reads (common for long system prompts)
usage = Usage(
    input_tokens=1,                    # Anthropic's uncached input count
    output_tokens=82,
    cache_creation_input_tokens=225,
    cache_read_input_tokens=965_602,   # large cached system prompt
)

for model in [
    "anthropic/claude-opus-4-7",
    "vertex_ai/claude-opus-4-7",
    "anthropic/claude-opus-4-6",
    "anthropic/claude-sonnet-4-6",
]:
    prompt_cost, completion_cost = cost_per_token(model=model, usage_object=usage)
    total = prompt_cost + completion_cost
    print(f"{model:<40} prompt={prompt_cost:>12.6f}  completion={completion_cost:>12.6f}  total={total:>12.6f}")

Output:

anthropic/claude-opus-4-7                prompt=   -4.344928  completion=    0.000000  total=   -4.344928
vertex_ai/claude-opus-4-7               prompt=   -4.344928  completion=    0.000000  total=   -4.344928
anthropic/claude-opus-4-6                prompt=   -4.344928  completion=    0.000000  total=   -4.344928
anthropic/claude-sonnet-4-6              prompt=   -4.344928  completion=    0.000000  total=   -4.344928

Expected: all positive (~$0.49 per call based on Anthropic's published pricing).
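The expected figure can be checked by pricing each token bucket exactly once. This sketch uses assumed per-million-token prices (5.00 input, 25.00 output, 6.25 cache write, 0.50 cache read, in USD) chosen only because they land on the ~$0.49 estimate; they are not taken from LiteLLM's price map and should be checked against the provider's current price list.

```python
# Assumed USD-per-token prices, for illustration only.
INPUT_PRICE = 5.00 / 1_000_000
OUTPUT_PRICE = 25.00 / 1_000_000
CACHE_WRITE_PRICE = 6.25 / 1_000_000
CACHE_READ_PRICE = 0.50 / 1_000_000


def expected_cost(input_tokens, output_tokens, cache_creation, cache_read):
    """Price each bucket once; input_tokens is already the uncached count."""
    prompt = (
        input_tokens * INPUT_PRICE
        + cache_creation * CACHE_WRITE_PRICE
        + cache_read * CACHE_READ_PRICE
    )
    completion = output_tokens * OUTPUT_PRICE
    return prompt, completion


prompt, completion = expected_cost(
    input_tokens=1, output_tokens=82, cache_creation=225, cache_read=965_602
)
print(f"prompt={prompt:.6f} completion={completion:.6f} total={prompt + completion:.6f}")
```

Under these assumed prices almost the entire charge comes from the cache-read bucket, which is why subtracting those tokens from the input count has such a large effect.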

Affected models

Negative cost output is confirmed for the following Anthropic models with prompt caching, via both Vertex AI and the generic anthropic/ provider prefix:

  • anthropic/claude-opus-4-7
  • anthropic/claude-opus-4-6
  • anthropic/claude-sonnet-4-6
  • anthropic/claude-opus-4-5
  • vertex_ai/claude-opus-4-7
  • vertex_ai/claude-opus-4-6

OpenAI and Gemini models are unaffected.

Root cause

In the cost calculator, the prompt cost formula effectively does:

uncached = input_tokens - cache_creation_input_tokens - cache_read_input_tokens
prompt_cost = uncached * input_price + cache_creation * create_price + cache_read * read_price

But Anthropic's input_tokens already excludes cached tokens, so the subtraction removes them a second time. When cache_read_input_tokens (965,602) far exceeds input_tokens (1), the uncached term is 1 - 225 - 965,602 = -965,826, and prompt_cost goes deeply negative.
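The arithmetic can be reproduced without LiteLLM. This is a sketch of the formula as described in this report, not the library's actual source; the function names are hypothetical and the prices passed in are assumed values used only to show the sign of the result.

```python
def buggy_uncached_tokens(input_tokens, cache_creation, cache_read):
    """Subtract cached tokens from a count that never included them."""
    return input_tokens - cache_creation - cache_read


def prompt_cost(uncached, cache_creation, cache_read,
                input_price, create_price, read_price):
    """Second line of the formula above, with prices supplied by the caller."""
    return (
        uncached * input_price
        + cache_creation * create_price
        + cache_read * read_price
    )


uncached = buggy_uncached_tokens(1, 225, 965_602)
cost = prompt_cost(
    uncached, 225, 965_602,
    input_price=5.00e-6,    # assumed, USD per token
    create_price=6.25e-6,   # assumed, USD per token
    read_price=0.50e-6,     # assumed, USD per token
)
print(uncached, cost < 0)
```

Because the input price is higher than the cache-read price, the negative uncached term outweighs the positive cache-read term, so the sum stays negative whenever cache reads dominate.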

The Bedrock fix in PR #15292 corrected this for bedrock_converse by including cacheWriteInputTokens in prompt_tokens before the subtraction. The same fix needs to be applied to the Vertex AI / generic Anthropic path.
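One way to read that fix is as a normalization step: make prompt_tokens the total of all input buckets first, so that the existing subtraction recovers the true uncached count. The sketch below illustrates the idea with a hypothetical `AnthropicUsage` dataclass and helper names; it is an assumption about the shape of the fix, not the code from PR #15292.

```python
from dataclasses import dataclass


@dataclass
class AnthropicUsage:
    """Hypothetical stand-in for an Anthropic-style usage block."""
    input_tokens: int                     # uncached input only
    cache_creation_input_tokens: int = 0
    cache_read_input_tokens: int = 0


def normalized_prompt_tokens(u: AnthropicUsage) -> int:
    """Total prompt size: uncached input plus both cache buckets."""
    return (
        u.input_tokens
        + u.cache_creation_input_tokens
        + u.cache_read_input_tokens
    )


def uncached_tokens(u: AnthropicUsage) -> int:
    """Apply the calculator's subtraction to the normalized total."""
    prompt_tokens = normalized_prompt_tokens(u)
    return (
        prompt_tokens
        - u.cache_creation_input_tokens
        - u.cache_read_input_tokens
    )


usage = AnthropicUsage(
    input_tokens=1,
    cache_creation_input_tokens=225,
    cache_read_input_tokens=965_602,
)
print(normalized_prompt_tokens(usage), uncached_tokens(usage))
```

Normalizing at the provider boundary leaves the shared cost formula untouched, so providers that already report a cache-inclusive prompt count would not need to change.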

What LiteLLM version are you on?

v1.82.3 (pinned in production). Also confirmed the bug still exists in the cost calculator logic as of current main.

Are you a ML Ops Team?

No
