litellm - ✅(Solved) Fix [Bug]: Incorrect cost calculation for PDF attachments for Gemini models [1 pull requests, 1 comments, 2 participants]

litellm · 2026-03-22 21:59:18

BerriAI/litellm#24375 • Fetched 2026-04-08 01:18:09

Author

DmitriyAlergant

Participants

DmitriyAlergant

hshgogogo

Timeline (top)

cross-referenced ×5, referenced ×4, labeled ×2, closed ×1

Fix Action

Fixed

Fixed by PR: fix(cost): bill unaccounted Gemini token remainder (https://github.com/BerriAI/litellm/pull/24381)

PR fix notes

PR #24381: fix(cost): bill unaccounted Gemini token remainder

Repository: BerriAI/litellm
Author: hshgogogo
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24381

Description (problem / solution / changelog)

Summary

- normalize partial prompt_tokens_details breakdowns so any unaccounted remainder falls back to text-token billing
- apply the same remainder handling to completion_tokens_details to avoid silently dropping incomplete output token breakdowns
- add regression coverage for partial Gemini token breakdowns
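
The remainder handling described in this summary can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: the function name `fill_text_remainder` and the plain-dict representation of the token details are assumptions made for the example. The token counts are the ones from the usage object reported in the issue.

```python
def fill_text_remainder(details, total):
    """Return a copy of `details` in which tokens not covered by any
    subcategory are added to text_tokens (hypothetical helper)."""
    normalized = dict(details)
    remainder = total - sum(normalized.values())
    if remainder > 0:
        # Unaccounted tokens fall back to the text-token bucket
        normalized["text_tokens"] = normalized.get("text_tokens", 0) + remainder
    return normalized


# Partial prompt breakdown: only 9 of 783 tokens are categorized
prompt_details = fill_text_remainder({"text_tokens": 9}, total=783)

# Complete completion breakdown: 92 + 4 == 96, so nothing changes
completion_details = fill_text_remainder(
    {"reasoning_tokens": 92, "text_tokens": 4}, total=96
)

print(prompt_details)      # {'text_tokens': 783}
print(completion_details)  # {'reasoning_tokens': 92, 'text_tokens': 4}
```

With this shape, a breakdown that already sums to the total passes through unchanged, so only incomplete breakdowns are affected.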

Fixes #24375.

Testing

pytest tests/test_litellm/test_cost_calculator.py -k "partial_prompt_token_breakdown or partial_completion_token_breakdown or explicit_caching_cost_direct_usage" -q
ruff check litellm/litellm_core_utils/llm_cost_calc/utils.py
ruff check tests/test_litellm/test_cost_calculator.py --ignore T201

Changed files

tests/test_litellm/test_cost_calculator_partial_breakdown.py (added, +109/-0)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Cost is undercounted for Gemini models when a PDF is attached: prompt_tokens_details is incomplete.

# Download a sample PDF
curl -sL -o /tmp/Example.pdf "https://upload.wikimedia.org/wikipedia/commons/1/13/Example.pdf"
# Base64-encode it; tr strips the line wraps that GNU base64 inserts, which would break the JSON body
PDF_B64=$(base64 < /tmp/Example.pdf | tr -d '\n')

# Send to LiteLLM proxy (non-streaming, cost returned in response header)
curl -s -D /dev/stderr http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gemini-2.5-flash\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Summarize this PDF in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:application/pdf;base64,${PDF_B64}\"}}
      ]
    }],
    \"max_tokens\": 100
  }" | jq

Response usage object:

"usage": {
  "completion_tokens": 96,
  "prompt_tokens": 783,
  "total_tokens": 879,
  "completion_tokens_details": {
    "reasoning_tokens": 92,
    "text_tokens": 4
  },
  "prompt_tokens_details": {
    "text_tokens": 9
  }
}

Response cost header: x-litellm-response-cost: 0.000243

Problem

prompt_tokens_details reports only 9 of 783 prompt tokens — no other subcategory is present. The remaining 774 tokens (99%) are not accounted for in any subcategory. LiteLLM's cost calculation uses the subcategory breakdown rather than the total prompt_tokens, undercounting the spend.
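
To make the size of the gap concrete, the sketch below redoes the arithmetic from the usage object above. It assumes, purely for illustration, that the unaccounted tokens would be billed at the same rate as text tokens, so the cost ratio equals the token ratio; no real Gemini prices are used.

```python
# Prompt-side fields copied from the usage object in the report
usage = {
    "prompt_tokens": 783,
    "prompt_tokens_details": {"text_tokens": 9},
}

accounted = sum(usage["prompt_tokens_details"].values())
unaccounted = usage["prompt_tokens"] - accounted
unaccounted_share = unaccounted / usage["prompt_tokens"]

# Under a flat per-token rate, billing from the breakdown alone
# undercounts prompt cost by this factor
undercount_factor = usage["prompt_tokens"] / accounted

print(accounted, unaccounted)      # 9 774
print(f"{unaccounted_share:.1%}")  # 98.9%
print(undercount_factor)           # 87.0
```

The 98.9% share is the figure rounded to 99% in the report.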

Both streaming and non-streaming are affected.

Expected

When the prompt_tokens_details subcategories do not sum to the total prompt_tokens, any unaccounted tokens should probably be billed at the default (text-token) rate. The same should apply to completion_tokens_details.

What LiteLLM version are you on ?

v1.82.3

extent analysis

Fix Plan

To address the cost undercount issue for Gemini models when a PDF is attached, we need to modify the cost calculation logic to account for unclassified tokens.

Here are the steps:

1. Update the prompt_tokens_details and completion_tokens_details to include a default category for unaccounted tokens.
2. Modify the cost calculation to use the total tokens when the subcategory breakdown is incomplete.

Example code changes:

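# Sketch only: token counts are taken from the usage object in the issue.
# "default_tokens", calculate_cost, calculate_cost_from_subcategories and
# TEXT_TOKEN_RATE are hypothetical placeholders, not LiteLLM APIs or real prices.
TEXT_TOKEN_RATE = 1e-6  # placeholder cost per token


def calculate_cost(total_tokens):
    # Bill every token at the text-token rate
    return total_tokens * TEXT_TOKEN_RATE


def calculate_cost_from_subcategories(details):
    # Simplification: every subcategory is billed at the same placeholder rate
    return sum(details.values()) * TEXT_TOKEN_RATE


# Totals reported by the provider
total_prompt_tokens = 783
total_completion_tokens = 96

# Subcategory breakdowns as reported (the prompt breakdown is incomplete)
reported_prompt_details = {"text_tokens": 9}
reported_completion_details = {"reasoning_tokens": 92, "text_tokens": 4}

# Add a default category for unaccounted tokens
prompt_tokens_details = {
    **reported_prompt_details,
    "default_tokens": total_prompt_tokens - sum(reported_prompt_details.values()),
}
completion_tokens_details = {
    **reported_completion_details,
    "default_tokens": total_completion_tokens - sum(reported_completion_details.values()),
}

# Update usage object
usage = {
    "completion_tokens": total_completion_tokens,
    "prompt_tokens": total_prompt_tokens,
    "total_tokens": total_completion_tokens + total_prompt_tokens,
    "completion_tokens_details": completion_tokens_details,
    "prompt_tokens_details": prompt_tokens_details,
}

# Use total tokens when the reported subcategory breakdown is incomplete.
# The check runs against the reported breakdown: once default_tokens is
# added, the padded breakdown always sums to the total.
if sum(reported_prompt_details.values()) != total_prompt_tokens:
    # Use total prompt tokens for cost calculation
    cost = calculate_cost(total_prompt_tokens)
else:
    # Use subcategory breakdown for cost calculation
    cost = calculate_cost_from_subcategories(reported_prompt_details)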

Verification

To verify the fix, send a request with a PDF attachment and check the response usage object and cost header. The prompt_tokens_details should now include a default category for unaccounted tokens, and the cost calculation should use the total tokens when the subcategory breakdown is incomplete.

Example verification code:

# Send request with PDF attachment (PDF_B64 as set in the reproduction commands);
# response headers are saved to one file and the JSON body to another
curl -s -D /tmp/headers.txt -o /tmp/body.json http://localhost:4000/v1/chat/completions \
    -H "Authorization: Bearer sk-..." \
    -H "Content-Type: application/json" \
    -d "{
      \"model\": \"gemini-2.5-flash\",
      \"messages\": [{
        \"role\": \"user\",
        \"content\": [
          {\"type\": \"text\", \"text\": \"Summarize this PDF in one sentence.\"},
          {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:application/pdf;base64,${PDF_B64}\"}}
        ]
      }],
      \"max_tokens\": 100
    }"

# Check response usage object and cost header
jq '.usage' /tmp/body.json
grep -i '^x-litellm-response-cost' /tmp/headers.txt
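
As a complement to reading the header by hand, a small check can report whether a usage object's breakdown covers all of its tokens. This is only a sketch: `unaccounted_tokens` is a hypothetical helper, and the JSON literal is the usage object from the original report (pre-fix behaviour), not output from a patched build.

```python
import json

# Usage object copied from the issue report
USAGE_JSON = """
{
  "completion_tokens": 96,
  "prompt_tokens": 783,
  "total_tokens": 879,
  "completion_tokens_details": {"reasoning_tokens": 92, "text_tokens": 4},
  "prompt_tokens_details": {"text_tokens": 9}
}
"""


def unaccounted_tokens(usage, side):
    """Tokens on `side` ("prompt" or "completion") that no subcategory covers."""
    details = usage.get(f"{side}_tokens_details") or {}
    # Treat missing or null subcategory values as zero
    covered = sum(value or 0 for value in details.values())
    return usage[f"{side}_tokens"] - covered


usage = json.loads(USAGE_JSON)
print(unaccounted_tokens(usage, "prompt"))      # 774
print(unaccounted_tokens(usage, "completion"))  # 0
```

A non-zero prompt remainder marks tokens that a breakdown-based cost calculation could miss, so it is a useful signal to compare against the reported cost header.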
