litellm - ✅(Solved) Fix [Bug]: Proper usage handling from VLLM (cached_tokens) [1 pull requests, 1 comments, 2 participants]

GitHub stats
BerriAI/litellm#22984 · Fetched 2026-04-08 00:39:01
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 2
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1, mentioned ×1

Fix Action

Fixed

PR fix notes

PR #22986: fix VLLM cached tokens handling in token cost calculator

Description (problem / solution / changelog)

Short description:

  • VLLM returns prompt_tokens_details.cached_tokens but not top-level cache_read_input_tokens, so the cost calculator and proxy UI showed "Cache Read Tokens: 0" even when cached tokens were present.
  • This PR maps prompt_tokens_details.cached_tokens to cache_read_input_tokens in Usage.__init__ when it is not already set by other providers.
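
The mapping described above can be sketched in isolation. This is a minimal illustration, not the code from PR #22986: UsageSketch and its constructor are hypothetical stand-ins for LiteLLM's Usage class, and only the field names prompt_tokens_details, cached_tokens, and cache_read_input_tokens come from the PR description.

```python
from typing import Any, Dict, Optional


class UsageSketch:
    """Hypothetical stand-in for a usage object; not LiteLLM's Usage class."""

    def __init__(
        self,
        prompt_tokens: int = 0,
        completion_tokens: int = 0,
        prompt_tokens_details: Optional[Dict[str, Any]] = None,
        cache_read_input_tokens: Optional[int] = None,
    ) -> None:
        self.prompt_tokens = prompt_tokens
        self.completion_tokens = completion_tokens
        self.prompt_tokens_details = prompt_tokens_details or {}

        # If a provider already set the top-level field, keep its value.
        if cache_read_input_tokens is None:
            # Otherwise fall back to the nested cached_tokens value,
            # treating a missing or null entry as 0.
            cached = self.prompt_tokens_details.get("cached_tokens")
            cache_read_input_tokens = cached if cached is not None else 0
        self.cache_read_input_tokens = cache_read_input_tokens


# Values taken from the usage block reported in the issue.
usage = UsageSketch(
    prompt_tokens=150383,
    completion_tokens=304,
    prompt_tokens_details={"cached_tokens": 149936},
)
print(usage.cache_read_input_tokens)  # 149936
```

Checking for None rather than falsiness on the top-level field means an explicit 0 from another provider is preserved instead of being overwritten.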

Relevant issues:

Fixes: #22984

  • Type: Bug fix

Changed files

  • litellm/types/utils.py (modified, +10/-0)
  • tests/test_litellm/types/test_types_utils.py (modified, +36/-0)

Code Example

usage:{
    total_tokens:150687,
    prompt_tokens:150383,
    completion_tokens:304,
    prompt_tokens_details:{
        text_tokens:null,
        audio_tokens:null,
        image_tokens:null,
        cached_tokens:149936
    },
    completion_tokens_details:{
        text_tokens:null,
        audio_tokens:null,
        image_tokens:null,
        reasoning_tokens:15,
        accepted_prediction_tokens:null,
        rejected_prediction_tokens:null
    }
},

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using VLLM as a provider, the token cost calculator ignores the cached-token count returned in the response.

I've looked through the implementation and, as far as I can tell, this logic is missing in https://github.com/BerriAI/litellm/blob/main/litellm/types/utils.py#L1538. Is that correct?

Steps to Reproduce

  1. Configure the vllm provider.
  2. Send a request, then find the VLLM response in the LiteLLM logs for that request:
usage:{
    total_tokens:150687,
    prompt_tokens:150383,
    completion_tokens:304,
    prompt_tokens_details:{
        text_tokens:null,
        audio_tokens:null,
        image_tokens:null,
        cached_tokens:149936
    },
    completion_tokens_details:{
        text_tokens:null,
        audio_tokens:null,
        image_tokens:null,
        reasoning_tokens:15,
        accepted_prediction_tokens:null,
        rejected_prediction_tokens:null
    }
},
  3. Observe "Cache Read Tokens: 0" and "Cache Creation Tokens: -".
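
The numbers in this response show why a reported zero is misleading. The following is a quick sanity check using only the values from the log above; the variable names are illustrative.

```python
# Values copied from the usage block in the log above.
prompt_tokens = 150383
completion_tokens = 304
cached_tokens = 149936

# The reported total is the sum of prompt and completion tokens.
assert prompt_tokens + completion_tokens == 150687

# Prompt tokens that were not served from the cache.
uncached_prompt_tokens = prompt_tokens - cached_tokens

# Fraction of the prompt that was served from the cache.
cached_share = cached_tokens / prompt_tokens

print(uncached_prompt_tokens)  # 447
print(f"{cached_share:.1%}")   # 99.7%
```

Almost the entire prompt was a cache hit, yet the reported cache-read count was 0.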

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.18.3

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To surface cached tokens in the cost calculation, read prompt_tokens_details.cached_tokens from the VLLM response and report it as the cache-read count whenever no top-level cache_read_input_tokens is present. The merged PR #22986 applies this mapping in litellm/types/utils.py; the calculate_token_cost function below is a hypothetical standalone illustration, not a LiteLLM API.

  • Parse cached_tokens from the VLLM response, treating a missing or null value as 0.
  • Report it as the cache-read token count. The usage block shown above carries no cache-creation count, so do not derive one from total_tokens.

Example code:

def calculate_token_cost(vllm_response):
    # Hypothetical helper for illustration only; not part of LiteLLM.
    # ...
    usage = vllm_response.get('usage') or {}
    details = usage.get('prompt_tokens_details') or {}
    # cached_tokens may be missing or null; treat both as 0.
    cache_read_tokens = details.get('cached_tokens') or 0
    # ...
    return {
        # ...
        'cache_read_tokens': cache_read_tokens,
        # No cache-creation count appears in this usage block, so leave it unset.
        'cache_creation_tokens': None,
    }
  • Update the logging to display the correct cache read tokens.

Verification

To verify the fix, send a request with the VLLM provider and check the LiteLLM logs for that request. For the response shown above, "Cache Read Tokens" should match cached_tokens (149936) instead of 0.

Extra Tips

Make sure to test the updated function with different VLLM responses to ensure it handles all possible cases correctly. Additionally, consider adding error handling for cases where the cached_tokens field is missing or null.
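
The cases mentioned above can be written down as a small table-driven check. read_cached_tokens is a hypothetical helper for this sketch, not a LiteLLM function; it assumes the usage payload is a plain dict and treats a missing or null cached_tokens (or a missing prompt_tokens_details) as 0.

```python
from typing import Any, Dict


def read_cached_tokens(usage: Dict[str, Any]) -> int:
    """Hypothetical helper: return cached prompt tokens, or 0 if unavailable."""
    # prompt_tokens_details may be absent or null in some responses.
    details = usage.get("prompt_tokens_details") or {}
    # cached_tokens may likewise be absent or null.
    return details.get("cached_tokens") or 0


# Each case pairs a usage payload with the expected cached-token count.
cases = [
    ({"prompt_tokens_details": {"cached_tokens": 149936}}, 149936),
    ({"prompt_tokens_details": {"cached_tokens": None}}, 0),
    ({"prompt_tokens_details": {}}, 0),
    ({"prompt_tokens_details": None}, 0),
    ({}, 0),
]

for usage, expected in cases:
    assert read_cached_tokens(usage) == expected
```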
