litellm - 💡(How to fix) Fix [Bug]: Cost not getting reflected while using flex paygo with litellm [1 participants]

GitHub issue: BerriAI/litellm#24666 (fetched 2026-04-08 01:37:19)
Comments: 0 · Participants: 1 · Reactions: 0 · Timeline: 2 events (labeled ×2)

When routing requests through Vertex AI using the Flex PayGo tier, the LiteLLM proxy UI spend logs show the same cost as a standard on-demand request for the same model and token count. Flex PayGo has a lower per-token price than standard on-demand, so the logged cost is incorrect — it is being over-reported at the standard rate.

The response latency for Flex PayGo requests is significantly higher than standard calls, which confirms the request was indeed processed under the Flex PayGo tier and should be billed at Flex pricing.
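
Here is a rough sense of the size of the error. This assumes cost is simply tokens multiplied by a per-token rate. Both rates below are hypothetical placeholders (Flex is assumed to cost half the standard rate), not Vertex AI's published prices.

```python
# Hypothetical per-token rates in USD; substitute the real rates for your model.
STANDARD_RATE = 2e-6
FLEX_RATE = 1e-6


def cost(tokens: int, rate: float) -> float:
    """Cost of a request billed at a single per-token rate."""
    return tokens * rate


def over_report(tokens: int) -> float:
    """Extra spend logged when a Flex request is priced at the standard rate."""
    return cost(tokens, STANDARD_RATE) - cost(tokens, FLEX_RATE)


if __name__ == "__main__":
    tokens = 1_000_000
    print(f"logged at standard rate: ${cost(tokens, STANDARD_RATE):.2f}")
    print(f"expected at Flex rate:   ${cost(tokens, FLEX_RATE):.2f}")
    print(f"over-reported by:        ${over_report(tokens):.2f}")
```

With these placeholder rates, every Flex request would be logged at twice its expected cost.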

Root Cause

According to the reporter, LiteLLM does not detect that a request was served on the Flex PayGo tier. It falls back to standard on-demand pricing when calculating cost, so Flex requests are logged at the standard rate.

Original Issue

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug: Vertex AI Flex PayGo requests show incorrect cost in UI spend logs (charged at standard on-demand rate instead of Flex rate)

Expected Behavior

A Flex PayGo request should be logged in the UI spend logs at the Flex PayGo price (lower than standard on-demand). The cost entry should reflect the actual tier used, not the default on-demand rate.

Actual Behavior

  • UI spend logs show the same cost for a Flex PayGo request as for an equivalent standard on-demand request (same model, same prompt, same token count)
  • Response latency for the Flex PayGo request is much higher (about 13 s vs about 1.8 s for standard in the reporter's test), which indicates the Flex tier was used
  • This suggests LiteLLM is not detecting the Flex tier and is falling back to standard on-demand pricing for cost calculation

[Screenshot: LiteLLM UI spend logs. The requests alternate between Flex (about 13 s each) and default (about 1.8 s each), and all show the same cost.]

Steps to Reproduce

  1. Send the exact same request to both models:

    curl -X POST https://api-endpoint/chat/completions \
      -H 'Authorization: Bearer sk-...' \
      -H 'Content-Type: application/json' \
      -d '{"model": "gemini-standard", "messages": [{"role": "user", "content": "Hello"}]}'

    curl -X POST https://api-endpoint/chat/completions \
      -H 'Authorization: Bearer sk-...' \
      -H 'Content-Type: application/json' \
      -d '{"model": "gemini-flex", "messages": [{"role": "user", "content": "Hello"}]}'
  2. Note that the Flex PayGo response takes significantly longer, confirming it was routed through the Flex tier.

  3. Open LiteLLM UI → Spend Logs — both requests show identical cost, despite using different pricing tiers.
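
The two curl commands can also be scripted so the costs are compared side by side. The following is a minimal standard-library sketch. It assumes the proxy returns the computed cost in an x-litellm-response-cost response header and an OpenAI-style usage object in the body. PROXY_URL and API_KEY are the placeholders from the issue.

```python
import json
import urllib.request
from typing import Optional, Tuple

# Placeholders copied from the issue; substitute your proxy URL and key.
PROXY_URL = "https://api-endpoint/chat/completions"
API_KEY = "sk-..."


def build_request(model: str, prompt: str = "Hello") -> urllib.request.Request:
    """Build the same chat-completions request that the curl commands send."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def cost_and_tokens(model: str) -> Tuple[Optional[float], int]:
    """Send one request and return (reported cost, total tokens).

    Assumes an x-litellm-response-cost header and an OpenAI-style "usage"
    object in the response body. Cost is None if the header is absent.
    """
    # Generous timeout: Flex requests were observed to take ~13 s.
    with urllib.request.urlopen(build_request(model), timeout=120) as resp:
        header = resp.headers.get("x-litellm-response-cost")
        payload = json.loads(resp.read().decode("utf-8"))
    cost = float(header) if header is not None else None
    return cost, payload["usage"]["total_tokens"]


if __name__ == "__main__":
    for model in ("gemini-standard", "gemini-flex"):
        cost, tokens = cost_and_tokens(model)
        per_token = cost / tokens if cost is not None and tokens else None
        print(f"{model}: cost={cost} tokens={tokens} cost/token={per_token}")
```

Response length can vary between the two calls. Comparing cost per token is therefore more reliable than comparing the raw cost.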

Environment

  • LiteLLM Version: 1.80.11
  • Provider: Vertex AI
  • Deployment: LiteLLM Proxy

Fix Plan

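To fix the incorrect cost logging for Vertex AI Flex PayGo requests, the LiteLLM proxy needs to detect the Flex tier and apply the corresponding pricing. The steps below are a sketch. The names calculate_cost, calculate_on_demand_cost, calculate_flex_cost, and log_spend are illustrative and are not LiteLLM's actual internal functions.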

  • Define a per-token price for the Flex PayGo tier:
FLEX_PRICE_PER_TOKEN = 0.0005  # Placeholder value, not an actual Vertex AI rate
  • Add a function that calculates the Flex PayGo cost:
def calculate_flex_cost(request):
    # Simplified: bills every token at one Flex rate. Vertex AI prices input
    # and output tokens differently, so a real version needs both rates.
    return request['token_count'] * FLEX_PRICE_PER_TOKEN
  • Update calculate_cost to branch on the tier instead of always using on-demand pricing:
def calculate_cost(request):
    # ... existing code ...
    # Matching on the proxy alias is fragile; a tier flag stored on the
    # deployment would be more robust.
    if request['model'] == 'gemini-flex':
        # Apply Flex PayGo pricing
        cost = calculate_flex_cost(request)
    else:
        # Apply standard on-demand pricing (existing behavior)
        cost = calculate_on_demand_cost(request)
    return cost
  • Record the pricing tier next to the cost in the spend log entry:
def log_spend(request, cost):
    # ... existing code ...
    tier = 'Flex PayGo' if request['model'] == 'gemini-flex' else 'Standard On-Demand'
    log_entry = {'request': request, 'cost': cost, 'pricing_tier': tier}
    # ... existing code (persist log_entry) ...
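
The plan above hard-codes a single alias and a single per-token rate. Below is a slightly more general sketch. It assumes you maintain your own price table keyed by (model, tier), with separate input and output rates. The model name gemini-model, the DEPLOYMENTS mapping, and every rate are hypothetical placeholders. None of them come from LiteLLM's pricing map or Vertex AI's published prices.

```python
from dataclasses import dataclass

# Hypothetical price table: (model, tier) -> (input rate, output rate), USD per token.
PRICES = {
    ("gemini-model", "standard"): (0.000002, 0.000008),
    ("gemini-model", "flex"): (0.000001, 0.000004),
}

# Hypothetical mapping from proxy alias to (underlying model, pricing tier).
DEPLOYMENTS = {
    "gemini-standard": ("gemini-model", "standard"),
    "gemini-flex": ("gemini-model", "flex"),
}


@dataclass
class Usage:
    prompt_tokens: int
    completion_tokens: int


def calculate_cost(model: str, tier: str, usage: Usage) -> float:
    """Price a request at the rates of the tier it was actually served on.

    Raises KeyError for an unknown (model, tier) pair instead of silently
    falling back to standard pricing.
    """
    input_rate, output_rate = PRICES[(model, tier)]
    return usage.prompt_tokens * input_rate + usage.completion_tokens * output_rate


def cost_for_alias(alias: str, usage: Usage) -> float:
    """Resolve a proxy alias to its model and tier, then price the request."""
    model, tier = DEPLOYMENTS[alias]
    return calculate_cost(model, tier, usage)


if __name__ == "__main__":
    usage = Usage(prompt_tokens=1000, completion_tokens=200)
    for alias in DEPLOYMENTS:
        print(f"{alias}: ${cost_for_alias(alias, usage):.6f}")
```

Raising KeyError for an unknown pair is deliberate. Silently defaulting to the standard rate is exactly the behavior reported in this issue.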

Verification

To verify the fix, send the same prompt to both the standard and the Flex model, then check the UI spend logs. The Flex entry should be logged at the Flex PayGo price, lower than the standard entry for the same token count.
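
The check can be run with a small helper. This assumes you export the two spend-log entries as dictionaries with spend, prompt_tokens, and completion_tokens fields; verify the field names against what your proxy version returns. The sample entries are illustrative, not real log data.

```python
def cost_per_token(entry: dict) -> float:
    """Logged spend divided by total tokens for one spend-log entry."""
    tokens = entry["prompt_tokens"] + entry["completion_tokens"]
    if tokens == 0:
        raise ValueError("entry has no tokens")
    return entry["spend"] / tokens


def flex_priced_below_standard(standard: dict, flex: dict) -> bool:
    """True if the Flex entry was logged at a lower cost per token.

    Per-token comparison tolerates small differences in response length,
    but is only approximate when the input/output token mix differs.
    """
    return cost_per_token(flex) < cost_per_token(standard)


if __name__ == "__main__":
    # Illustrative entries: identical spend for identical usage reproduces the bug.
    standard = {"spend": 0.0036, "prompt_tokens": 1000, "completion_tokens": 200}
    flex = {"spend": 0.0036, "prompt_tokens": 1000, "completion_tokens": 200}
    print("fixed" if flex_priced_below_standard(standard, flex) else "still over-reported")
```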

Extra Tips

  • Make sure to update the FLEX_PRICE_PER_TOKEN variable with the correct price per token for the Flex PayGo tier.
  • Consider adding additional logging or monitoring to ensure that the Flex tier is being correctly detected and that the correct pricing is being applied.
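
As one way to act on the logging tip, here is a minimal sketch using Python's standard logging module. It records the tier and cost for each priced request and warns when an alias looks like a Flex deployment but was priced at another tier. The "flex" substring check is a hypothetical naming heuristic; replace it with however your deployments actually mark the Flex tier.

```python
import logging

logger = logging.getLogger("flex_pricing_audit")


def audit_pricing(alias: str, tier: str, cost: float) -> None:
    """Log the tier and cost used for one request; warn on a likely mismatch."""
    logger.info("alias=%s tier=%s cost=%.8f", alias, tier, cost)
    # Hypothetical heuristic: treat any alias containing "flex" as a Flex deployment.
    if "flex" in alias.lower() and tier != "flex":
        logger.warning("alias %s looks like a Flex deployment but was priced as %s", alias, tier)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    audit_pricing("gemini-standard", "standard", 0.0036)
    audit_pricing("gemini-flex", "standard", 0.0036)  # emits the mismatch warning
```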
