litellm - ✅(Solved) Fix [Bug]: Proper usage handling from VLLM (cached_tokens) [1 pull requests, 1 comments, 2 participants]

litellm · 2026-03-06 17:05:54

GitHub stats

BerriAI/litellm#22984 • Fetched 2026-04-08 00:39:01

Author: mfolnovic

Participants: mfolnovic, nehaaprasad

Timeline (top): labeled ×2, commented ×1, cross-referenced ×1, mentioned ×1

Fix Action

Fixed

Fixed by PR: fix VLLM cached tokens handling in token cost calculator (https://github.com/BerriAI/litellm/pull/22986)

PR fix notes

PR #22986: fix VLLM cached tokens handling in token cost calculator

Repository: BerriAI/litellm
Author: naaa760
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/22986

Description (problem / solution / changelog)

Short description:

VLLM returns prompt_tokens_details.cached_tokens but not top-level cache_read_input_tokens, so the cost calculator and proxy UI showed "Cache Read Tokens: 0" even when cached tokens were present.
This PR maps prompt_tokens_details.cached_tokens to cache_read_input_tokens in Usage.__init__ when it is not already set by other providers.
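
A minimal sketch of the mapping described above, assuming a simplified stand-in class. `SketchUsage` and its constructor are hypothetical and only reuse the field names mentioned in the PR description; this is not LiteLLM's actual `Usage` class, and the real change may differ in detail.

```python
from typing import Any, Dict, Optional


class SketchUsage:
    """Hypothetical stand-in for a usage object; not LiteLLM's Usage class."""

    def __init__(
        self,
        prompt_tokens: int = 0,
        completion_tokens: int = 0,
        total_tokens: int = 0,
        prompt_tokens_details: Optional[Dict[str, Any]] = None,
        cache_read_input_tokens: Optional[int] = None,
    ) -> None:
        self.prompt_tokens = prompt_tokens
        self.completion_tokens = completion_tokens
        self.total_tokens = total_tokens
        self.prompt_tokens_details = prompt_tokens_details or {}

        # Fall back to the nested cached_tokens value only when the provider
        # did not already supply a top-level cache_read_input_tokens.
        cached = self.prompt_tokens_details.get("cached_tokens")
        if cache_read_input_tokens is None and cached is not None:
            cache_read_input_tokens = cached
        self.cache_read_input_tokens = cache_read_input_tokens or 0


# A VLLM-style payload: only the nested field is present.
vllm_usage = SketchUsage(
    prompt_tokens=150383,
    completion_tokens=304,
    total_tokens=150687,
    prompt_tokens_details={"cached_tokens": 149936},
)
print(vllm_usage.cache_read_input_tokens)  # 149936

# A payload that already carries the top-level field keeps its own value.
other_usage = SketchUsage(
    prompt_tokens_details={"cached_tokens": 5},
    cache_read_input_tokens=7,
)
print(other_usage.cache_read_input_tokens)  # 7
```

The "only when not already set" check matters because other providers may report the top-level field themselves, and overwriting it could change their reported usage.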

Relevant issues:

Fixes : #22984

Type: Bug fix

Changed files

litellm/types/utils.py (modified, +10/-0)
tests/test_litellm/types/test_types_utils.py (modified, +36/-0)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using VLLM as a provider, the token cost calculator ignores the cached token count reported in the response.

I've looked through the implementation and based on my understanding, this logic is missing in: https://github.com/BerriAI/litellm/blob/main/litellm/types/utils.py#L1538 , am I correct?

Steps to Reproduce

1. Configure the vllm provider.
2. Send a request. In the LiteLLM logs for that request, see the VLLM response:

usage:{
    total_tokens:150687,
    prompt_tokens:150383,
    completion_tokens:304,
    prompt_tokens_details:{
        text_tokens:null,
        audio_tokens:null,
        image_tokens:null,
        cached_tokens:149936
    },
    completion_tokens_details:{
        text_tokens:null,
        audio_tokens:null,
        image_tokens:null,
        reasoning_tokens:15,
        accepted_prediction_tokens:null,
        rejected_prediction_tokens:null
    }
},

and see "Cache Read Tokens: 0", "Cache Creation Tokens: -".
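
For scale, the numbers in this response can be checked with a few lines of arithmetic. The sketch below uses only the values shown above; the variable names are illustrative.

```python
# Values copied from the usage block in the issue.
usage = {
    "total_tokens": 150687,
    "prompt_tokens": 150383,
    "completion_tokens": 304,
    "prompt_tokens_details": {"cached_tokens": 149936},
}

cached = usage["prompt_tokens_details"]["cached_tokens"]
uncached_prompt = usage["prompt_tokens"] - cached
cached_share = cached / usage["prompt_tokens"]
totals_match = (
    usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
)

print(uncached_prompt)        # 447
print(f"{cached_share:.2%}")  # 99.70%
print(totals_match)           # True
```

Almost the entire prompt was served from cache, so a displayed "Cache Read Tokens: 0" is far from what the provider reported.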

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.18.3

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

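To surface cached tokens in the token cost calculator and the proxy UI, the usage object has to expose VLLM's prompt_tokens_details.cached_tokens as cache_read_input_tokens. PR #22986 does this in Usage.__init__ in litellm/types/utils.py.

Read cached_tokens from prompt_tokens_details in the VLLM response.
Use it as the cache read token count only when the provider has not already set a top-level cache_read_input_tokens.

Example code (illustrative only; calculate_token_cost is a hypothetical helper, not a LiteLLM function):

def calculate_token_cost(vllm_response):
    # Illustrative helper: extracts cache token counts from a response dict.
    usage = vllm_response.get('usage') or {}
    details = usage.get('prompt_tokens_details') or {}
    # cached_tokens may be missing or null; treat both as 0.
    cached_tokens = details.get('cached_tokens') or 0
    # Prefer a top-level value if the provider already reported one.
    cache_read_tokens = usage.get('cache_read_input_tokens') or cached_tokens
    # The sample response carries no cache creation count, so it is passed
    # through if present rather than derived from total_tokens.
    cache_creation_tokens = usage.get('cache_creation_input_tokens')
    return {
        'cache_read_tokens': cache_read_tokens,
        'cache_creation_tokens': cache_creation_tokens,
    }

Confirm that the logs and proxy UI then display the cache read token count.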

Verification

To verify the fix, send a request through the VLLM provider and check the LiteLLM logs: "Cache Read Tokens" should match prompt_tokens_details.cached_tokens from the VLLM response (149936 in the reported request) instead of 0.

Extra Tips

Make sure to test the updated function with different VLLM responses to ensure it handles all possible cases correctly. Additionally, consider adding error handling for cases where the cached_tokens field is missing or null.
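
One way to cover the missing or null cases is a small tolerant reader exercised against several payload shapes. This is a sketch under stated assumptions: `read_cached_tokens` is a hypothetical helper, not part of LiteLLM, and returning 0 for unusable values is a design choice made here for illustration.

```python
from collections.abc import Mapping
from typing import Any


def read_cached_tokens(usage: Any) -> int:
    """Return cached prompt tokens, or 0 when the field is absent or unusable.

    Hypothetical helper for illustration; not part of LiteLLM.
    """
    if not isinstance(usage, Mapping):
        return 0
    details = usage.get("prompt_tokens_details")
    if not isinstance(details, Mapping):
        return 0
    cached = details.get("cached_tokens")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(cached, bool) or not isinstance(cached, int) or cached < 0:
        return 0
    return cached


cases = [
    {"prompt_tokens_details": {"cached_tokens": 149936}},  # normal VLLM shape
    {"prompt_tokens_details": {"cached_tokens": None}},    # null field
    {"prompt_tokens_details": None},                       # null details
    {},                                                    # no details at all
    None,                                                  # no usage object
]
results = [read_cached_tokens(case) for case in cases]
print(results)  # [149936, 0, 0, 0, 0]
```

Falling back to 0 keeps cost reporting from failing on providers that omit the field, at the price of hiding malformed payloads; logging a warning in those branches would be a reasonable alternative.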
