litellm - ✅(Solved) Fix [Bug]: For preventing double count tokens for models like haiku [1 pull request, 1 comment, 1 participant]



GitHub stats
BerriAI/litellm#24574 (fetched 2026-04-08 01:32:38)

  • Comments: 1
  • Participants: 1
  • Timeline events: 7
  • Reactions: 0
  • Timeline (top): labeled ×3, commented ×1, cross-referenced ×1, mentioned ×1

Fix Action

Fixed

PR fix notes

PR #24573: Fix for preventing double count tokens for models like haiku

Description (problem / solution / changelog)

Relevant issues: #24574


Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR.

  • [x] I have added testing in the tests/test_litellm/ directory. Adding at least 1 test is a hard requirement (see details).
  • [x] My PR passes all unit tests with make test-unit.
  • [x] My PR's scope is as isolated as possible; it only solves 1 specific problem.
  • [x] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review.

Delays in PR merge?

If you're seeing a delay in your PR being merged, ping the LiteLLM Team on Slack (#pr-review).

CI (LiteLLM team)

CI status guideline:

  • 50-55 passing tests: main is stable, with minor issues.
  • 45-49 passing tests: acceptable, but needs attention.
  • <= 40 passing tests: unstable; be careful with your merges and assess the risk.

CI runs:

  • Branch creation CI run. Link:
  • CI run for the last commit. Link:
  • Merge / cherry-pick CI run. Links:

Type

Bug Fixes

Changes

Remove a redundant check for text_tokens equal to 0 that was preventing the code from de-duplicating reasoning, audio, or image tokens already counted inside text_tokens. In the Haiku usage object from the issue, completion_tokens is 459, completion_tokens_details.text_tokens is 459, and completion_tokens_details.reasoning_tokens is 20. Because text_tokens is not 0, the double-counting check was skipped and text_tokens was counted as 459, when it should be 459 - 20 = 439. Without this change, the reasoning tokens are charged at both reasoning_cost_per_token and output_cost_per_token.
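The de-duplication described above can be sketched as follows. This is a minimal, hypothetical illustration, not the code in litellm/litellm_core_utils/llm_cost_calc/utils.py: CompletionTokensDetails, dedupe_text_tokens, and output_cost are names invented here, and the rule "subtract the other token types once when text plus those types exceeds completion_tokens" is an assumption about how the overlap can be detected. What it shows is the point of the fix: the overlap check runs whatever the value of text_tokens, with no early exit on text_tokens == 0.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompletionTokensDetails:
    # Hypothetical mirror of completion_tokens_details; None means "not reported".
    text_tokens: Optional[int] = None
    reasoning_tokens: Optional[int] = None
    audio_tokens: Optional[int] = None
    image_tokens: Optional[int] = None


def dedupe_text_tokens(completion_tokens: int, details: CompletionTokensDetails) -> int:
    """Return text tokens with reasoning/audio/image tokens removed (hypothetical helper)."""
    other = (
        (details.reasoning_tokens or 0)
        + (details.audio_tokens or 0)
        + (details.image_tokens or 0)
    )
    # If text_tokens is missing, fall back to the completion total.
    text = details.text_tokens if details.text_tokens is not None else completion_tokens
    # No early exit on text_tokens == 0: the overlap check always runs.
    # Assumption: text + other > completion_tokens means text already contains
    # the other token types, so they are subtracted once.
    if text + other > completion_tokens:
        text = max(text - other, 0)
    return text


def output_cost(
    completion_tokens: int,
    details: CompletionTokensDetails,
    output_cost_per_token: float,
    reasoning_cost_per_token: float,
) -> float:
    """Text tokens at the output rate plus reasoning tokens at the reasoning rate."""
    text = dedupe_text_tokens(completion_tokens, details)
    reasoning = details.reasoning_tokens or 0
    return text * output_cost_per_token + reasoning * reasoning_cost_per_token


# Haiku example from the issue: text_tokens == completion_tokens == 459, reasoning_tokens == 20.
haiku = CompletionTokensDetails(text_tokens=459, reasoning_tokens=20)
print(dedupe_text_tokens(459, haiku))  # 439
print(output_cost(459, haiku, 1, 3))  # 499
```

The prices 1 (output) and 3 (reasoning) are hypothetical unit prices, not Haiku's actual pricing; with them the sketch bills 439 text tokens and 20 reasoning tokens once each.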

Changed files

  • litellm/litellm_core_utils/llm_cost_calc/utils.py (modified, +17/-15)
  • tests/test_litellm/litellm_core_utils/llm_cost_calc/test_llm_cost_calc_utils.py (modified, +1/-1)
  • tests/test_litellm/litellm_core_utils/llm_cost_calc/test_utils.py (added, +57/-0)

Code Example

{
  "total_tokens": 46190,
  "prompt_tokens": 45731,
  "completion_tokens": 459,
  "prompt_tokens_details": {
    "text_tokens": null,
    "audio_tokens": null,
    "image_tokens": null,
    "cached_tokens": 41305,
    "cache_creation_tokens": 0
  },
  "cache_read_input_tokens": 41305,
  "completion_tokens_details": {
    "text_tokens": 459,
    "audio_tokens": null,
    "image_tokens": null,
    "reasoning_tokens": 20,
    "accepted_prediction_tokens": null,
    "rejected_prediction_tokens": null
  }
}

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

A bug happened! For Anthropic models such as Haiku, the usage object (see the log output below) reports the reasoning tokens inside text_tokens: completion_tokens_details.text_tokens is 459, equal to completion_tokens, even though reasoning_tokens is 20. The reasoning tokens are therefore counted twice, which makes the output-token cost inaccurate.
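As a worked example of the overcharge, assume the output side is billed linearly as tokens times per-token price, with c_out standing for output_cost_per_token and c_reason for reasoning_cost_per_token (symbols introduced here; tiered and cache pricing are ignored). Counting the 20 reasoning tokens inside the 459 text tokens overcharges by exactly 20 output tokens:

$$
\begin{aligned}
C_{\text{double}} &= 459\,c_{\text{out}} + 20\,c_{\text{reason}} \\
C_{\text{correct}} &= (459 - 20)\,c_{\text{out}} + 20\,c_{\text{reason}} = 439\,c_{\text{out}} + 20\,c_{\text{reason}} \\
C_{\text{double}} - C_{\text{correct}} &= 20\,c_{\text{out}}
\end{aligned}
$$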

Steps to Reproduce

  1. With Haiku 4.5 and spend logs enabled, make a model call via the API or SDK. The reported usage has the reasoning tokens embedded in the text tokens, i.e. text_tokens = completion_tokens.
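The symptom in step 1 can be checked directly on a logged usage object. The sketch below is a hypothetical helper (reasoning_embedded_in_text is a name invented here, not a LiteLLM API) that works on the usage dict as it appears in the log output; it flags the case where text_tokens plus reasoning_tokens exceeds completion_tokens, which is what happens when text_tokens equals completion_tokens while reasoning_tokens is non-zero.

```python
def reasoning_embedded_in_text(usage: dict) -> bool:
    """Return True if text_tokens appears to already include reasoning_tokens.

    Hypothetical heuristic: with non-zero reasoning_tokens, the breakdown
    text_tokens + reasoning_tokens should not exceed completion_tokens.
    Missing (None) counts are treated as 0.
    """
    total = usage.get("completion_tokens") or 0
    details = usage.get("completion_tokens_details") or {}
    text = details.get("text_tokens") or 0
    reasoning = details.get("reasoning_tokens") or 0
    return reasoning > 0 and text + reasoning > total


# Values taken from the Haiku usage object in the issue.
usage = {
    "completion_tokens": 459,
    "completion_tokens_details": {"text_tokens": 459, "reasoning_tokens": 20},
}
print(reasoning_embedded_in_text(usage))  # True
```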

Relevant log output

{
  "total_tokens": 46190,
  "prompt_tokens": 45731,
  "completion_tokens": 459,
  "prompt_tokens_details": {
    "text_tokens": null,
    "audio_tokens": null,
    "image_tokens": null,
    "cached_tokens": 41305,
    "cache_creation_tokens": 0
  },
  "cache_read_input_tokens": 41305,
  "completion_tokens_details": {
    "text_tokens": 459,
    "audio_tokens": null,
    "image_tokens": null,
    "reasoning_tokens": 20,
    "accepted_prediction_tokens": null,
    "rejected_prediction_tokens": null
  }
}

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.3

Twitter / LinkedIn details

No response

Extent analysis

Fix Plan

To stop reasoning tokens that are already included in text_tokens from being counted twice, the logic that derives text_tokens and reasoning_tokens from the completion_tokens_details section needs to change.

Step-by-Step Solution

  1. Update the calculation logic: subtract reasoning_tokens from text_tokens when text_tokens already includes them (for example, when text_tokens equals completion_tokens), so they are not counted twice.
  2. Adjust the API response: ensure the API returns the corrected text_tokens value.

Example Code Snippet

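def calculate_text_tokens(completion_tokens_details):
    """Return text_tokens with reasoning_tokens removed.
    Assumes text_tokens already includes reasoning_tokens, as in the Haiku
    example. Missing (None) counts are treated as 0; the result is clamped at 0.
    """
    text_tokens = completion_tokens_details.get("text_tokens") or 0
    reasoning_tokens = completion_tokens_details.get("reasoning_tokens") or 0
    # Clamp so inconsistent provider data never yields a negative count.
    return max(text_tokens - reasoning_tokens, 0)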

# Example usage:
completion_tokens_details = {
    "text_tokens": 459,
    "audio_tokens": None,
    "image_tokens": None,
    "reasoning_tokens": 20,
    "accepted_prediction_tokens": None,
    "rejected_prediction_tokens": None
}

corrected_text_tokens = calculate_text_tokens(completion_tokens_details)
print(corrected_text_tokens)  # Output: 439

Verification

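To verify the fix, make a model call via the API or SDK with spend logs enabled and check the usage and the logged cost. The text_tokens value used for costing should now exclude the reasoning tokens, so they are charged once at reasoning_cost_per_token instead of also at output_cost_per_token.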

Extra Tips

  • Update the documentation to reflect the corrected calculation logic.
  • Consider adding tests to verify the correctness of the text_tokens calculation.
