litellm - ✅(Solved) Fix [Bug]: Incorrect cost calculation for PDF attachments for Gemini models [1 pull request, 1 comment, 2 participants]

GitHub stats
BerriAI/litellm#24375
Fetched 2026-04-08 01:18:09
Comments: 1
Participants: 2
Timeline: 13
Reactions: 0
Timeline (top): cross-referenced ×5, referenced ×4, labeled ×2, closed ×1

Fix Action

Fixed

PR fix notes

PR #24381: fix(cost): bill unaccounted Gemini token remainder

Description (problem / solution / changelog)

Summary

  • normalize partial prompt_tokens_details breakdowns so any unaccounted remainder falls back to text-token billing
  • apply the same remainder handling to completion_tokens_details to avoid silently dropping incomplete output token breakdowns
  • add regression coverage for partial Gemini token breakdowns

Fixes #24375.
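
The remainder handling described in the summary can be pictured with a minimal sketch. This is not the code from the PR: `fold_remainder_into_text` is a hypothetical helper, and plain dictionaries stand in for the `prompt_tokens_details` and `completion_tokens_details` objects. It assumes the categories in a breakdown are disjoint and integer-valued.

```python
def fold_remainder_into_text(details: dict, total: int) -> dict:
    """Return a copy of `details` whose values sum to `total`.

    Hypothetical helper, not LiteLLM's implementation: any tokens missing
    from the breakdown are added to `text_tokens`.
    """
    normalized = dict(details)
    remainder = total - sum(normalized.values())
    if remainder > 0:
        normalized["text_tokens"] = normalized.get("text_tokens", 0) + remainder
    return normalized


# Values from the usage object reported in issue #24375
prompt_details = fold_remainder_into_text({"text_tokens": 9}, total=783)
completion_details = fold_remainder_into_text(
    {"reasoning_tokens": 92, "text_tokens": 4}, total=96
)
print(prompt_details)      # {'text_tokens': 783}
print(completion_details)  # {'reasoning_tokens': 92, 'text_tokens': 4}
```

A complete breakdown passes through unchanged, so only partial breakdowns are affected.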

Testing

  • pytest tests/test_litellm/test_cost_calculator.py -k "partial_prompt_token_breakdown or partial_completion_token_breakdown or explicit_caching_cost_direct_usage" -q
  • ruff check litellm/litellm_core_utils/llm_cost_calc/utils.py
  • ruff check tests/test_litellm/test_cost_calculator.py --ignore T201

Changed files

  • tests/test_litellm/test_cost_calculator_partial_breakdown.py (added, +109/-0)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Cost is undercounted for Gemini models when a PDF is attached, because prompt_tokens_details is incomplete.

# Download a sample PDF
curl -sL -o /tmp/Example.pdf "https://upload.wikimedia.org/wikipedia/commons/1/13/Example.pdf"
# Encode it and strip newlines (GNU base64 wraps its output, which would break the JSON body)
PDF_B64=$(base64 < /tmp/Example.pdf | tr -d '\n')

# Send to LiteLLM proxy (non-streaming, cost returned in response header)
# -D /dev/stderr prints the response headers; the body goes to jq
curl -s -D /dev/stderr http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gemini-2.5-flash\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Summarize this PDF in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:application/pdf;base64,${PDF_B64}\"}}
      ]
    }],
    \"max_tokens\": 100
  }" | jq

Response usage object:

"usage": {
  "completion_tokens": 96,
  "prompt_tokens": 783,
  "total_tokens": 879,
  "completion_tokens_details": {
    "reasoning_tokens": 92,
    "text_tokens": 4
  },
  "prompt_tokens_details": {
    "text_tokens": 9
  }
}

Response cost header: x-litellm-response-cost: 0.000243

Problem

prompt_tokens_details reports only 9 of 783 prompt tokens, and no other subcategory is present. The remaining 774 tokens (about 99%) do not appear in any subcategory. LiteLLM's cost calculation uses the subcategory breakdown rather than the total prompt_tokens, so those tokens are not billed and the spend is undercounted.

Both streaming and non-streaming are affected.
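
A small worked example shows the size of the gap. The per-token rates below are hypothetical, chosen only to make the arithmetic visible; they are not taken from the issue or from any price list, and the calculation is a simplification rather than LiteLLM's actual cost logic.

```python
# Hypothetical per-token rates (assumptions for illustration only)
INPUT_RATE = 1.0e-6
OUTPUT_RATE = 4.0e-6

# Token counts from the usage object in the report
usage = {
    "prompt_tokens": 783,
    "completion_tokens": 96,
    "prompt_tokens_details": {"text_tokens": 9},
}

# Input tokens that get billed when only the breakdown is consulted
billed_from_breakdown = sum(usage["prompt_tokens_details"].values())
unbilled = usage["prompt_tokens"] - billed_from_breakdown

output_cost = usage["completion_tokens"] * OUTPUT_RATE
cost_from_breakdown = billed_from_breakdown * INPUT_RATE + output_cost
cost_from_total = usage["prompt_tokens"] * INPUT_RATE + output_cost

print(f"unbilled prompt tokens: {unbilled}")  # 774
print(f"cost from breakdown: {cost_from_breakdown:.6f}")
print(f"cost from total:     {cost_from_total:.6f}")
```

Whatever the real rates are, the input side of the bill is computed from 9 tokens instead of 783 whenever only the breakdown is used.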

Expected

Any unaccounted tokens should probably be billed at the default text-token rate when the prompt_tokens_details subcategories don't sum to the total prompt_tokens. The same should apply to completion_tokens_details.


What LiteLLM version are you on ?

v1.82.3


extent analysis

Fix Plan

To address the cost undercount issue for Gemini models when a PDF is attached, we need to modify the cost calculation logic to account for unclassified tokens.

Here are the steps:

  • Update the prompt_tokens_details and completion_tokens_details to include a default category for unaccounted tokens.
  • Modify the cost calculation to use the total tokens when the subcategory breakdown is incomplete.

Example code changes:

# Illustrative sketch: calculate_cost, calculate_cost_from_subcategories, and
# TEXT_RATE are placeholders, not LiteLLM functions or real Gemini prices.
TEXT_RATE = 1e-6  # hypothetical cost per token


def calculate_cost(total_tokens):
    return total_tokens * TEXT_RATE


def calculate_cost_from_subcategories(details):
    return sum(details.values()) * TEXT_RATE


# Totals and breakdowns as reported in the usage object
total_prompt_tokens = 783
total_completion_tokens = 96
reported_prompt_details = {"text_tokens": 9}
reported_completion_details = {"reasoning_tokens": 92, "text_tokens": 4}

# Use the total when the reported breakdown is incomplete. Run this check on
# the reported breakdown: once the default category is added, the sums match.
if sum(reported_prompt_details.values()) != total_prompt_tokens:
    cost = calculate_cost(total_prompt_tokens)
else:
    cost = calculate_cost_from_subcategories(reported_prompt_details)

# Add a default category for unaccounted tokens
prompt_tokens_details = {
    **reported_prompt_details,
    "default_tokens": total_prompt_tokens - sum(reported_prompt_details.values()),
}
completion_tokens_details = {
    **reported_completion_details,
    "default_tokens": total_completion_tokens - sum(reported_completion_details.values()),
}

# Update usage object
usage = {
    "completion_tokens": total_completion_tokens,
    "prompt_tokens": total_prompt_tokens,
    "total_tokens": total_completion_tokens + total_prompt_tokens,
    "completion_tokens_details": completion_tokens_details,
    "prompt_tokens_details": prompt_tokens_details,
}

Verification

To verify the fix, send a request with a PDF attachment and check the response usage object and cost header. The prompt_tokens_details should now include a default category for unaccounted tokens, and the cost calculation should use the total tokens when the subcategory breakdown is incomplete.

Example verification code:

# Send request with PDF attachment; save headers and body to separate files
# (assumes PDF_B64 is already set as in the reproduction steps)
curl -s -D /tmp/headers.txt -o /tmp/body.json http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-..." \
  -H "Content-Type: application/json" \
  -d "{
    \"model\": \"gemini-2.5-flash\",
    \"messages\": [{
      \"role\": \"user\",
      \"content\": [
        {\"type\": \"text\", \"text\": \"Summarize this PDF in one sentence.\"},
        {\"type\": \"image_url\", \"image_url\": {\"url\": \"data:application/pdf;base64,${PDF_B64}\"}}
      ]
    }],
    \"max_tokens\": 100
  }"

# Check response usage object and cost header
jq '.usage' /tmp/body.json
grep -i '^x-litellm-response-cost' /tmp/headers.txt
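
One way to check a usage object programmatically is to compare each total with the sum of its breakdown. The sketch below is an assumption-laden illustration: `unaccounted_tokens` is a hypothetical helper, and it treats the categories in each breakdown as disjoint integer counts.

```python
import json


def unaccounted_tokens(usage: dict) -> dict:
    """Return, per side, how many tokens are missing from the details breakdown."""
    gaps = {}
    for total_key, details_key in (
        ("prompt_tokens", "prompt_tokens_details"),
        ("completion_tokens", "completion_tokens_details"),
    ):
        details = usage.get(details_key) or {}
        # Fields may be null in the JSON; count only integer values
        categorized = sum(v for v in details.values() if isinstance(v, int))
        gaps[total_key] = usage.get(total_key, 0) - categorized
    return gaps


# Usage object from the original report
reported = json.loads("""
{
  "completion_tokens": 96,
  "prompt_tokens": 783,
  "total_tokens": 879,
  "completion_tokens_details": {"reasoning_tokens": 92, "text_tokens": 4},
  "prompt_tokens_details": {"text_tokens": 9}
}
""")
print(unaccounted_tokens(reported))  # {'prompt_tokens': 774, 'completion_tokens': 0}
```

A non-zero gap means the reported cost should be compared against what the total token count would imply, not against the breakdown alone.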
