litellm - ✅(Solved) Fix [Bug]: Fireworks AI cost calculator ignores cache token pricing [1 pull requests, 1 participants]

litellm • 2026-03-30 05:39:10

GitHub stats

BerriAI/litellm#24774 • Fetched 2026-04-08 01:54:08

Author

jph00

Participants

jph00

Timeline (top)

cross-referenced ×2, labeled ×2, referenced ×2

Fix Action

Fixed

Fixed by PR: fix(fireworks_ai): use generic_cost_per_token to handle cache token pricing (https://github.com/BerriAI/litellm/pull/24775)

PR fix notes

PR #24775: fix(fireworks_ai): use generic_cost_per_token to handle cache token pricing

Repository: BerriAI/litellm
Author: voidborne-d
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24775

Description (problem / solution / changelog)

Summary

Fixes #24774.

The Fireworks AI cost calculator ignores cache_read_input_tokens and cache_creation_input_tokens when calculating costs. As a result, cached tokens are billed at the full input_cost_per_token instead of the discounted cache_read_input_token_cost.

Root Cause

fireworks_ai/cost_calculator.py:cost_per_token() manually computes:

prompt_cost = usage["prompt_tokens"] * model_info["input_cost_per_token"]

This ignores all cache token fields in the Usage object.
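To make the effect concrete, here is a minimal sketch of the over-billing, assuming cached tokens are counted inside prompt_tokens. The input rate and token counts are hypothetical values chosen for illustration; only the 1e-07 cache read rate comes from the issue's kimi-k2p5 example. The function names are also invented for this sketch.

```python
# Hypothetical input rate for illustration; not real Fireworks pricing.
INPUT_COST_PER_TOKEN = 6e-07
# Cache read rate quoted in the issue for kimi-k2p5.
CACHE_READ_INPUT_TOKEN_COST = 1e-07


def manual_prompt_cost(prompt_tokens: int) -> float:
    """Pre-fix behavior: every prompt token is billed at the full input rate."""
    return prompt_tokens * INPUT_COST_PER_TOKEN


def expected_prompt_cost(prompt_tokens: int, cached_tokens: int) -> float:
    """Cached tokens at the cache read rate, the rest at the input rate.

    Assumes cached_tokens is a subset of prompt_tokens.
    """
    uncached = prompt_tokens - cached_tokens
    return uncached * INPUT_COST_PER_TOKEN + cached_tokens * CACHE_READ_INPUT_TOKEN_COST


billed = manual_prompt_cost(10_000)
expected = expected_prompt_cost(10_000, cached_tokens=8_000)
overcharge = billed - expected
print(f"billed={billed:.6f} expected={expected:.6f} overcharge={overcharge:.6f}")
```

With these assumed numbers the manual formula bills 0.006 where 0.002 is expected, so the larger the cached share of the prompt, the larger the overcharge.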

Fix

Delegate to generic_cost_per_token() — the same approach used by DeepSeek's cost calculator. generic_cost_per_token() already handles:

cache_read_input_tokens at cache_read_input_token_cost
cache_creation_input_tokens at cache_creation_input_token_cost
Prompt token details parsing (audio, image, text tokens)
Service tier pricing
Above-128k pricing tiers

The get_base_model_for_pricing() fallback for unmapped models is preserved.
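The delegation pattern described here can be sketched without litellm itself. In the sketch below, shared_cost_per_token is a hypothetical stand-in for a shared cache-aware calculator (it is not generic_cost_per_token, whose signature is not shown in this PR description), and the pricing tables, model names and rates are invented for illustration. It also assumes cached tokens are counted inside prompt_tokens.

```python
from typing import Dict, Tuple

Pricing = Dict[str, float]

# Hypothetical pricing data for illustration only.
MODEL_PRICING: Dict[str, Pricing] = {
    "example-mapped-model": {
        "input_cost_per_token": 6e-07,
        "output_cost_per_token": 3e-06,
        "cache_read_input_token_cost": 1e-07,
    },
}
# Hypothetical fallback used when a model has no pricing entry.
FALLBACK_PRICING: Pricing = {
    "input_cost_per_token": 2e-07,
    "output_cost_per_token": 2e-07,
}


def shared_cost_per_token(pricing: Pricing, usage: Dict[str, int]) -> Tuple[float, float]:
    """Stand-in for a shared, cache-aware calculator (not the litellm function)."""
    input_rate = pricing["input_cost_per_token"]
    cache_rate = pricing.get("cache_read_input_token_cost")
    if cache_rate is None:
        cache_rate = input_rate  # no discount configured for this model

    cached = usage.get("cache_read_input_tokens", 0) or 0
    # Assumption: cached tokens are counted inside prompt_tokens.
    uncached = max(usage["prompt_tokens"] - cached, 0)
    prompt_cost = uncached * input_rate + cached * cache_rate
    completion_cost = usage["completion_tokens"] * pricing["output_cost_per_token"]
    return prompt_cost, completion_cost


def provider_cost_per_token(model: str, usage: Dict[str, int]) -> Tuple[float, float]:
    """Provider entry point: pick pricing, then delegate instead of re-deriving costs."""
    pricing = MODEL_PRICING.get(model, FALLBACK_PRICING)
    return shared_cost_per_token(pricing, usage)


usage = {"prompt_tokens": 1000, "completion_tokens": 100, "cache_read_input_tokens": 400}
print(provider_cost_per_token("example-mapped-model", usage))
print(provider_cost_per_token("example-unmapped-model", usage))
```

The point of the design is that the provider module only decides which pricing applies; cache handling lives in one shared place, so a provider cannot silently drop it.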

Changes

litellm/llms/fireworks_ai/cost_calculator.py: Replace manual cost calculation with generic_cost_per_token() call, with fallback to parameter-based pricing for unmapped models
Added tests for cache token pricing, basic pricing, fallback pricing, and model category extraction

Testing

7 new test cases covering cache tokens, no cache tokens, unmapped model fallback, and model parameter extraction

Changed files

litellm/llms/fireworks_ai/cost_calculator.py (modified, +17/-17)
tests/test_litellm/llms/fireworks_ai/test_fireworks_cost_calculator.py (added, +123/-0)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

The Fireworks AI cost calculator in litellm/llms/fireworks_ai/cost_calculator.py doesn't account for cache_read_input_tokens or cache_creation_input_tokens when calculating costs. It only uses prompt_tokens * input_cost_per_token, even though:

The model info already has cache_read_input_token_cost correctly set (e.g. 1e-07 for kimi-k2p5)
Fireworks returns cached_tokens in the usage response
cost_per_token receives the full Usage object which contains these fields

Root cause:

fireworks_ai/cost_calculator.py:cost_per_token only calculates:

prompt_cost = usage["prompt_tokens"] * model_info["input_cost_per_token"]
completion_cost = usage["completion_tokens"] * model_info["output_cost_per_token"]

It never reads cache_read_input_tokens or cache_creation_input_tokens from the usage block.

Expected behavior:

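Cached tokens should be priced at cache_read_input_token_cost instead of input_cost_per_token. One way to do this is to add the rate difference for the cached tokens to prompt_cost (this assumes the cached tokens are already counted in prompt_tokens):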

cache_read = usage.get('cache_read_input_tokens', 0) or 0
if cache_read and (cache_rate := model_info.get('cache_read_input_token_cost')):
    prompt_cost += cache_read * (cache_rate - model_info['input_cost_per_token'])

The same adjustment applies to cache_creation_input_tokens, priced at cache_creation_input_token_cost.
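The delta form in the snippet above follows from splitting the prompt into cached and uncached tokens. The symbols here are introduced only for illustration: P is prompt_tokens, C is cache_read_input_tokens, r_in is input_cost_per_token and r_cache is cache_read_input_token_cost, under the assumption that the C cached tokens are counted inside P.

```latex
\begin{aligned}
\text{prompt\_cost}
  &= (P - C)\, r_{\text{in}} + C\, r_{\text{cache}} \\
  &= P\, r_{\text{in}} + C\,\bigl(r_{\text{cache}} - r_{\text{in}}\bigr)
\end{aligned}
```

The second line is the snippet's adjustment: start from the full-price cost and add C times the rate difference, which is negative when the cache rate is a discount. If C is not included in P, the identity no longer holds and the adjustment can push prompt_cost below zero.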

Steps to Reproduce

from litellm import cost_per_token

model = "fireworks_ai/accounts/fireworks/models/kimi-k2p5"

# Cache read cost returns 0 even though model_info has cache_read_input_token_cost=1e-07
inp, out = cost_per_token(model, prompt_tokens=0, completion_tokens=0, cache_read_input_tokens=1024)
print(inp, out)  # prints 0.0 0.0; expected input cost is about 0.000102 (1024 * 1e-07)

Relevant log output

What part of LiteLLM is this about?

SDK (litellm Python package)

What LiteLLM version are you on?

1.82.6

extent analysis

Fix Plan

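One way to fix the issue is to modify the cost_per_token function in fireworks_ai/cost_calculator.py so that it accounts for cache_read_input_tokens and cache_creation_input_tokens. PR #24775 instead delegates to generic_cost_per_token(); the manual version below illustrates the same pricing rule. The steps are: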

Update the cost_per_token function to calculate the cost of cached tokens:

def cost_per_token(usage, model_info):
    # Simplified sketch: usage and model_info are treated as plain dicts.
    # Assumes cached tokens are already counted in usage["prompt_tokens"].
    input_rate = model_info["input_cost_per_token"]
    prompt_cost = usage["prompt_tokens"] * input_rate
    completion_cost = usage["completion_tokens"] * model_info["output_cost_per_token"]

    # Re-price cache reads: swap the input rate for the cache read rate
    cache_read = usage.get("cache_read_input_tokens", 0) or 0
    if cache_read and (cache_rate := model_info.get("cache_read_input_token_cost")):
        prompt_cost += cache_read * (cache_rate - input_rate)

    # Re-price cache writes the same way, using the cache creation rate
    cache_creation = usage.get("cache_creation_input_tokens", 0) or 0
    if cache_creation and (cache_creation_rate := model_info.get("cache_creation_input_token_cost")):
        prompt_cost += cache_creation * (cache_creation_rate - input_rate)

    return prompt_cost, completion_cost

Ensure that the model_info dictionary contains the correct values for cache_read_input_token_cost and cache_creation_input_token_cost.
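The delta-based function above relies on cached tokens being part of prompt_tokens. With the reproduction inputs (prompt_tokens=0, cache_read_input_tokens=1024) it would return a negative prompt cost whenever the input rate exceeds the cache read rate. Below is a minimal sketch of a split-based variant that avoids this by clamping the uncached count at zero. The function name, the dict-based arguments and the 6e-07 and 3e-06 rates are assumptions for illustration, not litellm's API or real pricing.

```python
from typing import Dict, Tuple


def split_cost_per_token(usage: Dict[str, int], model_info: Dict[str, float]) -> Tuple[float, float]:
    """Price cached and uncached prompt tokens separately (illustrative sketch)."""
    input_rate = model_info["input_cost_per_token"]
    cache_read = usage.get("cache_read_input_tokens", 0) or 0
    cache_creation = usage.get("cache_creation_input_tokens", 0) or 0

    # Missing cache rates fall back to the full input rate.
    read_rate = model_info.get("cache_read_input_token_cost", input_rate)
    creation_rate = model_info.get("cache_creation_input_token_cost", input_rate)

    # Clamp so a usage block that reports cache tokens outside prompt_tokens
    # never yields a negative uncached count.
    uncached = max(usage.get("prompt_tokens", 0) - cache_read - cache_creation, 0)

    prompt_cost = uncached * input_rate + cache_read * read_rate + cache_creation * creation_rate
    completion_cost = usage.get("completion_tokens", 0) * model_info["output_cost_per_token"]
    return prompt_cost, completion_cost


model_info = {
    "input_cost_per_token": 6e-07,         # hypothetical
    "output_cost_per_token": 3e-06,        # hypothetical
    "cache_read_input_token_cost": 1e-07,  # value quoted in the issue
}
repro_usage = {"prompt_tokens": 0, "completion_tokens": 0, "cache_read_input_tokens": 1024}
prompt_cost, completion_cost = split_cost_per_token(repro_usage, model_info)
print(prompt_cost, completion_cost)
```

For the reproduction inputs this prices the 1024 cached tokens at the cache read rate alone, which matches the roughly 0.000102 the issue expects, and it gives the same result as the delta form whenever the cached tokens are included in prompt_tokens.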

Verification

To verify that the fix worked, you can use the following test case:

from litellm import cost_per_token

model = "fireworks_ai/accounts/fireworks/models/kimi-k2p5"

inp, out = cost_per_token(model, prompt_tokens=0, completion_tokens=0, cache_read_input_tokens=1024)
print(inp, out)  # input cost should be about 0.000102 (1024 * 1e-07), not 0.0

Extra Tips

Check that each model's model_info entry defines cache_read_input_token_cost and cache_creation_input_token_cost. When a rate is missing, the checks in cost_per_token skip the adjustment and those tokens stay at the full input rate.
Consider logging a warning when the usage block reports cache tokens but the model has no cache rate, so that full-price billing of cached tokens does not go unnoticed.
