litellm - ✅(Solved) Fix [Bug]: For preventing double count tokens for models like haiku [1 pull requests, 1 comments, 1 participants]

litellm • 2026-03-25 15:12:13

GitHub stats

BerriAI/litellm#24574•Fetched 2026-04-08 01:32:38

Author

Himanshupdt09

Participants

Himanshupdt09

Timeline (top)

labeled ×3, commented ×1, cross-referenced ×1, mentioned ×1

Fix Action

Fixed

Fixed by PR: Fix for preventing double count tokens for models like haiku (https://github.com/BerriAI/litellm/pull/24573)

PR fix notes

PR #24573: Fix for preventing double count tokens for models like haiku

Repository: BerriAI/litellm
Author: Himanshupdt09
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24573

Description (problem / solution / changelog)

Relevant issues #24574

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

[x] I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement - see details
[x] My PR passes all unit tests on make test-unit
[x] My PR's scope is as isolated as possible; it only solves 1 specific problem
[x] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

50-55 passing tests: main is stable with minor issues.

45-49 passing tests: acceptable but needs attention

<= 40 passing tests: unstable; be careful with your merges and assess the risk.

Branch creation CI run
Link:
CI run for the last commit
Link:
Merge / cherry-pick CI run
Links:

Type

Bug Fixes

Changes

Remove the redundant check for text_tokens == 0, which was preventing the existing de-duplication of reasoning, audio, or image tokens from running whenever text_tokens was non-zero. In the Haiku usage object shown under Code Example, completion_tokens is 459, completion_tokens_details.text_tokens is 459, and reasoning_tokens is 20. Because text_tokens is not 0, the double-counting check was skipped and text_tokens was counted as 459 when it should be 459 - 20 = 439. Without this change, the 20 reasoning tokens are charged at both reasoning_cost_per_token and output_cost_per_token.
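
The effect on cost can be sketched with plain arithmetic. This is a minimal illustration, not LiteLLM's implementation: completion_cost is a hypothetical helper, and the two prices are made-up integer units chosen only so the overcharge is easy to read.

```python
# Hypothetical per-token prices in arbitrary integer units (assumptions,
# not real model pricing).
OUTPUT_COST_PER_TOKEN = 2
REASONING_COST_PER_TOKEN = 3


def completion_cost(details, subtract_reasoning):
    """Price text and reasoning tokens from a completion_tokens_details dict."""
    text = details.get("text_tokens") or 0
    reasoning = details.get("reasoning_tokens") or 0
    if subtract_reasoning:
        # Reasoning tokens are embedded in text_tokens, so remove them first.
        text -= reasoning
    return text * OUTPUT_COST_PER_TOKEN + reasoning * REASONING_COST_PER_TOKEN


# Values taken from the usage object in this report.
details = {"text_tokens": 459, "reasoning_tokens": 20}

double_counted = completion_cost(details, subtract_reasoning=False)
corrected = completion_cost(details, subtract_reasoning=True)

# The difference is the 20 reasoning tokens billed a second time
# at the output price.
print(double_counted, corrected, double_counted - corrected)
```

With these assumed prices the overcharge equals reasoning_tokens multiplied by the output price, which is the double charge the PR description refers to.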

Changed files

litellm/litellm_core_utils/llm_cost_calc/utils.py (modified, +17/-15)
tests/test_litellm/litellm_core_utils/llm_cost_calc/test_llm_cost_calc_utils.py (modified, +1/-1)
tests/test_litellm/litellm_core_utils/llm_cost_calc/test_utils.py (added, +57/-0)

Code Example

"total_tokens": 46190,
"prompt_tokens": 45731,
"completion_tokens": 459,
"prompt_tokens_details": {
    "text_tokens": null,
    "audio_tokens": null,
    "image_tokens": null,
    "cached_tokens": 41305,
    "cache_creation_tokens": 0
},
"cache_read_input_tokens": 41305,
"completion_tokens_details": {
    "text_tokens": 459,
    "audio_tokens": null,
    "image_tokens": null,
    "reasoning_tokens": 20,
    "accepted_prediction_tokens": null,
    "rejected_prediction_tokens": null
},

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

For Anthropic models such as Haiku, the usage object reports reasoning tokens embedded in text tokens: completion_tokens_details.text_tokens is 459 (equal to completion_tokens) while reasoning_tokens is 20. The full usage object is listed under Relevant log output. The reasoning tokens are therefore counted twice, which makes the reported cost of output tokens inaccurate.

Steps to Reproduce

Using Haiku 4.5 with spend logs enabled, make a model call via the API or SDK. The usage reports reasoning tokens embedded in text tokens, i.e. text_tokens = completion_tokens.
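
One way to check a logged usage object for this pattern is sketched below. The helper name reasoning_embedded_in_text is hypothetical and operates on a plain dict shaped like the log output in this report; it is not part of LiteLLM.

```python
def reasoning_embedded_in_text(usage):
    """Return True when text_tokens equals completion_tokens while
    reasoning_tokens is non-zero (the pattern described in this report)."""
    details = usage.get("completion_tokens_details") or {}
    text = details.get("text_tokens") or 0
    reasoning = details.get("reasoning_tokens") or 0
    return reasoning > 0 and text == usage.get("completion_tokens")


# Fields copied from the reported usage object.
reported_usage = {
    "completion_tokens": 459,
    "completion_tokens_details": {"text_tokens": 459, "reasoning_tokens": 20},
}

# Same call, with text_tokens already reduced by the reasoning tokens.
separated_usage = {
    "completion_tokens": 459,
    "completion_tokens_details": {"text_tokens": 439, "reasoning_tokens": 20},
}

print(reasoning_embedded_in_text(reported_usage))
print(reasoning_embedded_in_text(separated_usage))
```

The first call flags the reported usage; the second does not, because text_tokens and reasoning_tokens already sum to completion_tokens.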

Relevant log output

"total_tokens": 46190,
"prompt_tokens": 45731,
"completion_tokens": 459,
"prompt_tokens_details": {
    "text_tokens": null,
    "audio_tokens": null,
    "image_tokens": null,
    "cached_tokens": 41305,
    "cache_creation_tokens": 0
},
"cache_read_input_tokens": 41305,
"completion_tokens_details": {
    "text_tokens": 459,
    "audio_tokens": null,
    "image_tokens": null,
    "reasoning_tokens": 20,
    "accepted_prediction_tokens": null,
    "rejected_prediction_tokens": null
},

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To fix the issue of double counting reasoning tokens embedded in text tokens, we need to modify the logic for calculating text_tokens and reasoning_tokens in the completion_tokens_details section.

Step-by-Step Solution

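Update the calculation logic: Subtract reasoning_tokens from text_tokens even when text_tokens is non-zero. Per the PR description, this means removing the text_tokens == 0 check; the PR modifies litellm/litellm_core_utils/llm_cost_calc/utils.py.
Check the resulting value: Ensure the corrected text_tokens value (459 - 20 = 439 in the reported example) is the one used when costing output tokens.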

Example Code Snippet

def calculate_text_tokens(completion_tokens_details):
    """Return text_tokens with embedded reasoning tokens removed."""
    # Either field may be reported as null (None); treat that as 0.
    text_tokens = completion_tokens_details.get("text_tokens") or 0
    reasoning_tokens = completion_tokens_details.get("reasoning_tokens") or 0
    return text_tokens - reasoning_tokens

# Example usage with the values from the report:
completion_tokens_details = {
    "text_tokens": 459,
    "audio_tokens": None,
    "image_tokens": None,
    "reasoning_tokens": 20,
    "accepted_prediction_tokens": None,
    "rejected_prediction_tokens": None
}

corrected_text_tokens = calculate_text_tokens(completion_tokens_details)
print(corrected_text_tokens)  # Output: 439

Verification

To verify the fix, make a model call via API or SDK and check the response. The text_tokens value should now be accurate, without double counting the reasoning tokens.

Extra Tips

Update the documentation to reflect the corrected calculation logic.
Add tests that verify the text_tokens calculation; the PR adds tests/test_litellm/litellm_core_utils/llm_cost_calc/test_utils.py for this.
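
A test along these lines could look like the sketch below. It does not reproduce the tests in the PR: subtract_reasoning is a hypothetical stand-in for the corrected calculation, defined here only so the example is self-contained.

```python
import unittest


def subtract_reasoning(details):
    """Hypothetical helper: text tokens with embedded reasoning tokens removed."""
    text = details.get("text_tokens") or 0
    reasoning = details.get("reasoning_tokens") or 0
    return text - reasoning


class SubtractReasoningTests(unittest.TestCase):
    def test_haiku_usage_from_report(self):
        # Values from the reported usage object: 459 - 20 = 439.
        details = {"text_tokens": 459, "reasoning_tokens": 20}
        self.assertEqual(subtract_reasoning(details), 439)

    def test_null_reasoning_tokens_leave_text_unchanged(self):
        # A null reasoning_tokens field is treated as 0.
        details = {"text_tokens": 459, "reasoning_tokens": None}
        self.assertEqual(subtract_reasoning(details), 459)


# Run the suite programmatically so the script does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(SubtractReasoningTests)
result = unittest.TextTestRunner().run(suite)
```

The first case pins the reported Haiku numbers; the second guards the null handling that the usage objects in this report rely on.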
