litellm - ✅ (Solved) Fix [Bug]: Fireworks AI cost calculator ignores cache token pricing [1 pull request, 1 participant]

GitHub stats
BerriAI/litellm#24774 (fetched 2026-04-08 01:54:08)
Comments: 0
Participants: 1
Timeline events: 6
Reactions: 0
Timeline (top): cross-referenced ×2, labeled ×2, referenced ×2

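Root Cause

fireworks_ai/cost_calculator.py:cost_per_token() prices every prompt token at input_cost_per_token and never reads cache_read_input_tokens or cache_creation_input_tokens from the Usage object.

Fix Action

Fixed by PR #24775.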

PR fix notes

PR #24775: fix(fireworks_ai): use generic_cost_per_token to handle cache token pricing

Description (problem / solution / changelog)

Summary

Fixes #24774.

The Fireworks AI cost calculator ignores cache_read_input_tokens and cache_creation_input_tokens when calculating costs. Cached tokens are billed at full input_cost_per_token instead of the discounted cache_read_input_token_cost.

Root Cause

fireworks_ai/cost_calculator.py:cost_per_token() manually computes:

prompt_cost = usage["prompt_tokens"] * model_info["input_cost_per_token"]

This ignores all cache token fields in the Usage object.
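To make the effect concrete, here is a small worked comparison. Only the cache-read rate of 1e-07 comes from the issue; the input rate is a placeholder, and it assumes (as in OpenAI-style usage) that prompt_tokens already includes the cached tokens.

```python
# Only cache_read_rate (1e-07) comes from the issue; input_rate is a placeholder.
input_rate = 6e-07       # hypothetical input_cost_per_token
cache_read_rate = 1e-07  # cache_read_input_token_cost for kimi-k2p5, per the issue

prompt_tokens = 2048  # assumed to already include the cached tokens
cached_tokens = 1024  # cache_read_input_tokens reported for the same request

# Pre-fix behaviour: every prompt token is billed at the full input rate.
billed_before_fix = prompt_tokens * input_rate

# Cache-aware behaviour: uncached tokens at the input rate, cached tokens at the cache-read rate.
billed_cache_aware = (prompt_tokens - cached_tokens) * input_rate + cached_tokens * cache_read_rate

# The overcharge is the rate difference applied to the cached tokens only.
overcharge = billed_before_fix - billed_cache_aware
print(f"before fix: ${billed_before_fix:.7f}, cache-aware: ${billed_cache_aware:.7f}, overcharge: ${overcharge:.7f}")
```

The overcharge grows linearly with the number of cached tokens, so long-context workloads with high cache hit rates are the ones most affected.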

Fix

Delegate to generic_cost_per_token() — the same approach used by DeepSeek's cost calculator. generic_cost_per_token() already handles:

  • cache_read_input_tokens at cache_read_input_token_cost
  • cache_creation_input_tokens at cache_creation_input_token_cost
  • Prompt token details parsing (audio, image, text tokens)
  • Service tier pricing
  • Above-128k pricing tiers
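For intuition about the first two bullets, here is a minimal sketch of cache-aware prompt pricing. It is a hypothetical, simplified stand-in, not litellm's generic_cost_per_token(): the function names are invented, audio/image, service-tier and above-128k handling are left out, and it assumes prompt_tokens already includes cache-read and cache-creation tokens.

```python
from typing import Mapping, Optional, Tuple


def _rate(model_info: Mapping[str, float], key: str, default: float) -> float:
    # Use the dedicated price when present (even if it is 0.0), else the default rate.
    value: Optional[float] = model_info.get(key)
    return default if value is None else value


def cache_aware_cost_per_token(
    usage: Mapping[str, int], model_info: Mapping[str, float]
) -> Tuple[float, float]:
    """Hypothetical, simplified cache-aware calculator (not litellm's generic_cost_per_token).

    Assumes usage["prompt_tokens"] already includes cache-read and cache-creation tokens.
    """
    input_rate = model_info["input_cost_per_token"]
    cache_read = usage.get("cache_read_input_tokens", 0) or 0
    cache_creation = usage.get("cache_creation_input_tokens", 0) or 0

    # Tokens that were neither read from nor written to the cache; clamped at 0 so
    # a usage block that leaves cached tokens out of prompt_tokens cannot go negative.
    uncached = max((usage.get("prompt_tokens", 0) or 0) - cache_read - cache_creation, 0)

    prompt_cost = (
        uncached * input_rate
        + cache_read * _rate(model_info, "cache_read_input_token_cost", input_rate)
        + cache_creation * _rate(model_info, "cache_creation_input_token_cost", input_rate)
    )
    completion_cost = (usage.get("completion_tokens", 0) or 0) * model_info["output_cost_per_token"]
    return prompt_cost, completion_cost


# Placeholder prices, except cache_read_input_token_cost=1e-07, which the issue quotes.
info = {
    "input_cost_per_token": 6e-07,
    "output_cost_per_token": 2.5e-06,
    "cache_read_input_token_cost": 1e-07,
}
print(cache_aware_cost_per_token({"prompt_tokens": 2048, "completion_tokens": 100, "cache_read_input_tokens": 1024}, info))
```

With the clamp, the issue's reproduction (prompt_tokens=0, cache_read_input_tokens=1024) prices out at 1024 * 1e-07, roughly the 0.000102 the reporter expected. Whether litellm's real Usage objects count cached tokens inside prompt_tokens is worth confirming before relying on this convention.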

The get_base_model_for_pricing() fallback for unmapped models is preserved.
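The preserved fallback can be pictured as "exact entry first, size bucket second". The sketch below is a hypothetical stand-in for get_base_model_for_pricing(): the pricing table, bucket names, prices, and the parameter-parsing regex are all assumptions for illustration, not litellm's actual tables or logic.

```python
import re
from typing import Dict, Optional

# Hypothetical pricing table with placeholder numbers; litellm reads real prices
# from its model cost map, and these bucket names are invented for illustration.
PRICES: Dict[str, Dict[str, float]] = {
    "kimi-k2p5": {"input_cost_per_token": 6e-07, "cache_read_input_token_cost": 1e-07},
    "bucket-up-to-16b": {"input_cost_per_token": 2e-07},
    "bucket-above-16b": {"input_cost_per_token": 9e-07},
    "bucket-default": {"input_cost_per_token": 9e-07},
}


def extract_params_b(model_name: str) -> Optional[float]:
    """Pull a size like '8b' or '72b' out of a model name, reading 'p' as a decimal point."""
    match = re.search(r"(\d+(?:p\d+)?)b(?:-|$)", model_name)
    return float(match.group(1).replace("p", ".")) if match else None


def pricing_for(model: str) -> Dict[str, float]:
    # Strip any provider/account prefix such as "fireworks_ai/accounts/fireworks/models/".
    name = model.rsplit("/", 1)[-1]
    if name in PRICES:  # mapped model: use its own entry, including any cache prices
        return PRICES[name]
    params = extract_params_b(name)  # unmapped model: price by parameter-size bucket
    if params is None:
        return PRICES["bucket-default"]
    return PRICES["bucket-up-to-16b" if params <= 16 else "bucket-above-16b"]


print(pricing_for("fireworks_ai/accounts/fireworks/models/llama-v3p1-8b-instruct"))
```

Keeping the exact lookup first matters here: a mapped model such as kimi-k2p5 keeps its own cache_read_input_token_cost, while bucket entries may carry only base input pricing.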

Changes

  • litellm/llms/fireworks_ai/cost_calculator.py: Replace manual cost calculation with generic_cost_per_token() call, with fallback to parameter-based pricing for unmapped models
  • Added tests for cache token pricing, basic pricing, fallback pricing, and model category extraction

Testing

  • 7 new test cases covering cache tokens, no cache tokens, unmapped model fallback, and model parameter extraction

Changed files

  • litellm/llms/fireworks_ai/cost_calculator.py (modified, +17/-17)
  • tests/test_litellm/llms/fireworks_ai/test_fireworks_cost_calculator.py (added, +123/-0)

Original issue report

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The Fireworks AI cost calculator in litellm/llms/fireworks_ai/cost_calculator.py doesn't account for cache_read_input_tokens or cache_creation_input_tokens when calculating costs. It only uses prompt_tokens * input_cost_per_token, even though:

  1. The model info already has cache_read_input_token_cost correctly set (e.g. 1e-07 for kimi-k2p5)
  2. Fireworks returns cached_tokens in the usage response
  3. cost_per_token receives the full Usage object which contains these fields
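Point 2 is what makes the bug visible: the discount information is already present in the response. The following minimal sketch shows how an OpenAI-style usage payload maps onto the field name the calculator should read. It assumes Fireworks reports the count as prompt_tokens_details.cached_tokens (the issue only says cached_tokens appears in the usage response), and the helper name is hypothetical. Per point 3, litellm already hands cost_per_token a Usage object carrying these fields, so this is illustrative rather than part of the fix.

```python
from typing import Any, Dict


def to_cache_usage(raw_usage: Dict[str, Any]) -> Dict[str, int]:
    """Hypothetical helper: copy the provider's cached-token count into cache_read_input_tokens.

    Assumes an OpenAI-compatible payload with prompt_tokens_details.cached_tokens.
    """
    # prompt_tokens_details may be missing or null; treat both as "no cached tokens".
    details = raw_usage.get("prompt_tokens_details") or {}
    return {
        "prompt_tokens": raw_usage.get("prompt_tokens", 0) or 0,
        "completion_tokens": raw_usage.get("completion_tokens", 0) or 0,
        "cache_read_input_tokens": details.get("cached_tokens", 0) or 0,
    }


raw = {"prompt_tokens": 2048, "completion_tokens": 12, "prompt_tokens_details": {"cached_tokens": 1024}}
print(to_cache_usage(raw))
```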

Root cause:

fireworks_ai/cost_calculator.py:cost_per_token only calculates:

prompt_cost = usage["prompt_tokens"] * model_info["input_cost_per_token"]
completion_cost = usage["completion_tokens"] * model_info["output_cost_per_token"]

It never reads cache_read_input_tokens or cache_creation_input_tokens from the usage block.

Expected behavior:

Cached tokens should be priced at cache_read_input_token_cost instead of input_cost_per_token. The difference should be applied:

cache_read = usage.get('cache_read_input_tokens', 0) or 0
if cache_read and (cache_rate := model_info.get('cache_read_input_token_cost')):
    prompt_cost += cache_read * (cache_rate - model_info['input_cost_per_token'])

Same for cache_creation_input_tokens / cache_creation_input_token_cost.
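Why the difference term is the right adjustment, assuming prompt_tokens already counts the cached tokens: write $P$ for prompt_tokens, $R$ for cache_read_input_tokens, $c_{\text{in}}$ for input_cost_per_token and $c_{\text{read}}$ for cache_read_input_token_cost. Then

$$
\text{prompt\_cost} = P\,c_{\text{in}} + R\,(c_{\text{read}} - c_{\text{in}}) = (P - R)\,c_{\text{in}} + R\,c_{\text{read}}
$$

so uncached tokens keep the full rate and the $R$ cached tokens are re-priced at the cache-read rate. If prompt_tokens does not include the cached tokens (for example prompt_tokens=0, as in the reproduction below), the same expression reduces to $R\,(c_{\text{read}} - c_{\text{in}})$, which is negative whenever caching is cheaper. The adjustment is therefore only correct under that counting convention.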

Steps to Reproduce

from litellm import cost_per_token

model = "fireworks_ai/accounts/fireworks/models/kimi-k2p5"

# Cache read cost returns 0 even though model_info has cache_read_input_token_cost=1e-07
inp, out = cost_per_token(model, prompt_tokens=0, completion_tokens=0, cache_read_input_tokens=1024)
print(inp, out)  # 0.0 0.0 — should be 0.000102

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on ?

1.82.6

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To fix the issue, we need to modify the cost_per_token function in fireworks_ai/cost_calculator.py to account for cache_read_input_tokens and cache_creation_input_tokens. Here are the steps:

  • Update the cost_per_token function to price cached tokens at their own rates. The version below uses a simplified signature that receives model_info directly, and it assumes prompt_tokens already includes any cached tokens:
from typing import Mapping, Tuple


def cost_per_token(usage: Mapping[str, int], model_info: Mapping[str, float]) -> Tuple[float, float]:
    input_rate = model_info["input_cost_per_token"]
    prompt_cost = usage["prompt_tokens"] * input_rate
    completion_cost = usage["completion_tokens"] * model_info["output_cost_per_token"]

    # Re-price cache reads: swap the full input rate for the cache-read rate
    cache_read = usage.get("cache_read_input_tokens", 0) or 0
    cache_read_rate = model_info.get("cache_read_input_token_cost")
    if cache_read and cache_read_rate is not None:
        prompt_cost += cache_read * (cache_read_rate - input_rate)

    # Re-price cache writes: swap the full input rate for the cache-creation rate
    cache_creation = usage.get("cache_creation_input_tokens", 0) or 0
    cache_creation_rate = model_info.get("cache_creation_input_token_cost")
    if cache_creation and cache_creation_rate is not None:
        prompt_cost += cache_creation * (cache_creation_rate - input_rate)

    return prompt_cost, completion_cost
  • Ensure that the model_info dictionary contains the correct values for cache_read_input_token_cost and cache_creation_input_token_cost.

Verification

To verify that the fix worked, you can use the following test case:

from litellm import cost_per_token

model = "fireworks_ai/accounts/fireworks/models/kimi-k2p5"

inp, out = cost_per_token(model, prompt_tokens=0, completion_tokens=0, cache_read_input_tokens=1024)
print(inp, out)  # Expect a non-zero input cost (about 0.000102, i.e. 1024 * 1e-07, per the issue) and 0.0 output cost

Extra Tips

  • Make sure each Fireworks model's pricing entry sets cache_read_input_token_cost and cache_creation_input_token_cost; litellm reads model_info from its model cost map (model_prices_and_context_window.json).
  • Consider logging a warning when a response reports cache tokens but the model has no matching cache price, so cached tokens are not silently billed at the full input rate.
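One way to act on the logging tip is to warn when a response reports cache tokens but the model's pricing has no matching cache price, because those tokens may then be priced at input_cost_per_token without any signal. The following minimal sketch uses a hypothetical function name and logger name, and a placeholder price in the usage line:

```python
import logging
from typing import List, Mapping

logger = logging.getLogger("fireworks_cost_check")  # hypothetical logger name

# Pairs of (usage field, price field) that should travel together.
CACHE_FIELDS = (
    ("cache_read_input_tokens", "cache_read_input_token_cost"),
    ("cache_creation_input_tokens", "cache_creation_input_token_cost"),
)


def missing_cache_prices(model: str, usage: Mapping[str, int], model_info: Mapping[str, float]) -> List[str]:
    """Return (and log) price fields that are absent even though usage reports matching tokens."""
    missing: List[str] = []
    for tokens_key, price_key in CACHE_FIELDS:
        tokens = usage.get(tokens_key, 0) or 0
        if tokens > 0 and model_info.get(price_key) is None:
            logger.warning(
                "%s: %d %s reported but %s is not set; these tokens may be priced at input_cost_per_token",
                model, tokens, tokens_key, price_key,
            )
            missing.append(price_key)
    return missing


logging.basicConfig(level=logging.WARNING)
# 6e-07 is a placeholder input price, not a real Fireworks rate.
print(missing_cache_prices("kimi-k2p5", {"cache_read_input_tokens": 1024}, {"input_cost_per_token": 6e-07}))
```

Returning the missing keys as well as logging them makes the check easy to assert on in a unit test.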
