litellm - 💡(How to fix) Fix [Bug]: Vertex AI Anthropic cost_per_token() returns negative cost when prompt caching is enabled

Root Cause

In the cost calculator, the prompt cost formula effectively does:

uncached = input_tokens - cache_creation_input_tokens - cache_read_input_tokens
prompt_cost = uncached * input_price + cache_creation * create_price + cache_read * read_price

But Anthropic's input_tokens already excludes cached tokens, so the subtraction removes tokens that were never counted. When cache_read_input_tokens (965,602) far exceeds input_tokens (1), the uncached term becomes 1 - 225 - 965,602 = -965,826, and prompt_cost goes deeply negative.

The Bedrock fix in PR #15292 corrected this for bedrock_converse by including cacheWriteInputTokens in prompt_tokens before the subtraction. The same fix needs to be applied to the Vertex AI / generic Anthropic path.
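
The faulty subtraction, and the normalization idea behind the Bedrock fix, can be sketched in a few lines. This is a minimal illustration, not LiteLLM's actual code: the function names uncached_tokens_buggy and uncached_tokens_normalized are hypothetical, and the second one assumes that prompt_tokens should be the sum of uncached, cache-write, and cache-read tokens before the cached portions are subtracted.

```python
def uncached_tokens_buggy(input_tokens: int, cache_creation: int, cache_read: int) -> int:
    # Treats input_tokens as if it already included the cached tokens.
    return input_tokens - cache_creation - cache_read


def uncached_tokens_normalized(input_tokens: int, cache_creation: int, cache_read: int) -> int:
    # Assumed fix: build a total prompt count first, then subtract the cached parts.
    prompt_tokens = input_tokens + cache_creation + cache_read
    return prompt_tokens - cache_creation - cache_read


# Token counts taken from the reproduction in this report.
buggy = uncached_tokens_buggy(1, 225, 965_602)
normalized = uncached_tokens_normalized(1, 225, 965_602)
print(buggy)       # -965826
print(normalized)  # 1
```

With the normalized count, the uncached term equals Anthropic's reported input_tokens, so it can never drop below zero.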

What happened?

cost_per_token() returns a negative total cost for Vertex AI Anthropic models (Claude Opus 4-6, 4-7, Sonnet 4-6) when cache_read_input_tokens exceeds input_tokens. The function subtracts the cached token counts from prompt_tokens before pricing, but Anthropic's input_tokens is already the uncached count, so the subtraction removes tokens that were never included and drives the input cost negative.

This was fixed for Bedrock in PR #15292 (issue #15263), but the Vertex AI code path still has the bug. Related: #11364 (open, general Anthropic cache cost), #19680 (closed, overcharging variant).

In a production LiteLLM proxy deployment, this caused usage.cost in streamed SSE chunks to be negative, which downstream billing logic interpreted as a credit.
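
Until the calculation is fixed, a defensive check in front of billing can keep a negative value from being booked as a credit. The sketch below is one possible guard, not part of LiteLLM: validated_cost and InvalidCostError are hypothetical names, and it assumes the billing layer would rather fail loudly than accept an impossible cost.

```python
import math


class InvalidCostError(ValueError):
    """Raised when a computed cost cannot be a real charge."""


def validated_cost(cost: float) -> float:
    # A cost must be finite and non-negative; anything else points to an
    # upstream calculation bug and should not reach billing.
    if not math.isfinite(cost) or cost < 0:
        raise InvalidCostError(f"refusing to bill invalid cost: {cost!r}")
    return cost


print(validated_cost(0.486262))  # passes through unchanged
try:
    validated_cost(-4.344928)
except InvalidCostError as exc:
    print(f"rejected: {exc}")
```

Raising instead of clamping to zero is a deliberate choice here: a silent clamp would hide the bug and under-bill the request.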

Minimal reproduction (no API call needed)

# Tested on litellm v1.82.3 — deterministic, no provider credentials required
from litellm import cost_per_token, Usage

# Usage block with high cache reads (common for long system prompts)
usage = Usage(
    input_tokens=1,                    # Anthropic's uncached input count
    output_tokens=82,
    cache_creation_input_tokens=225,
    cache_read_input_tokens=965_602,   # large cached system prompt
)

for model in [
    "anthropic/claude-opus-4-7",
    "vertex_ai/claude-opus-4-7",
    "anthropic/claude-opus-4-6",
    "anthropic/claude-sonnet-4-6",
]:
    prompt_cost, completion_cost = cost_per_token(model=model, usage_object=usage)
    total = prompt_cost + completion_cost
    print(f"{model:<40} prompt={prompt_cost:>12.6f}  completion={completion_cost:>12.6f}  total={total:>12.6f}")

Output:

anthropic/claude-opus-4-7                prompt=   -4.344928  completion=    0.000000  total=   -4.344928
vertex_ai/claude-opus-4-7               prompt=   -4.344928  completion=    0.000000  total=   -4.344928
anthropic/claude-opus-4-6                prompt=   -4.344928  completion=    0.000000  total=   -4.344928
anthropic/claude-sonnet-4-6              prompt=   -4.344928  completion=    0.000000  total=   -4.344928

Expected: all positive (~$0.49 per call based on Anthropic's published pricing).
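
The ~$0.49 figure can be checked by hand. The sketch below assumes illustrative per-million-token prices of $5 input, $25 output, $6.25 cache write, and $0.50 cache read. These are assumptions made for the arithmetic, not values read from LiteLLM's price map, and real prices differ by model. The helper expected_cost is hypothetical.

```python
# Assumed prices in USD per token (illustrative only).
INPUT_PRICE = 5.00 / 1_000_000
OUTPUT_PRICE = 25.00 / 1_000_000
CACHE_WRITE_PRICE = 6.25 / 1_000_000
CACHE_READ_PRICE = 0.50 / 1_000_000


def expected_cost(input_tokens, output_tokens, cache_creation, cache_read):
    # input_tokens is billed as-is because it already excludes cached tokens.
    prompt_cost = (
        input_tokens * INPUT_PRICE
        + cache_creation * CACHE_WRITE_PRICE
        + cache_read * CACHE_READ_PRICE
    )
    completion_cost = output_tokens * OUTPUT_PRICE
    return prompt_cost, completion_cost


prompt_cost, completion_cost = expected_cost(1, 82, 225, 965_602)
total = prompt_cost + completion_cost
print(f"prompt={prompt_cost:.6f}  completion={completion_cost:.6f}  total={total:.6f}")
```

Under these assumed prices the total comes to roughly $0.486, nearly all of it from the cache reads.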

Affected models

Confirmed negative cost output for all Anthropic models with prompt caching via Vertex AI and the generic anthropic/ provider prefix:

anthropic/claude-opus-4-7
anthropic/claude-opus-4-6
anthropic/claude-sonnet-4-6
anthropic/claude-opus-4-5
vertex_ai/claude-opus-4-7
vertex_ai/claude-opus-4-6

OpenAI and Gemini models are unaffected.

What LiteLLM version are you on?

v1.82.3 (pinned in production). Also confirmed the bug still exists in the cost calculator logic as of current main.
