litellm - 💡(How to fix) Fix [Bug]: Cost not getting reflected while using flex paygo with litellm [1 participants]

litellm · 2026-03-27 05:39:44


GitHub stats

BerriAI/litellm#24666 • Fetched 2026-04-08 01:37:19

Author: gurvkm

Participants: gurvkm

Timeline (top): labeled ×2

When routing requests through Vertex AI using the Flex PayGo tier, the LiteLLM proxy UI spend logs show the same cost as a standard on-demand request for the same model and token count. Flex PayGo has a lower per-token price than standard on-demand, so the logged cost is incorrect — it is being over-reported at the standard rate.

The response latency for Flex PayGo requests is significantly higher than standard calls, which confirms the request was indeed processed under the Flex PayGo tier and should be billed at Flex pricing.

Root Cause

Code Example

curl -X POST https://api-endpoint/chat/completions \
  -H 'Authorization: Bearer sk-...' \
  -H 'Content-Type: application/json' \
  -d '{"model": "gemini-standard", "messages": [{"role": "user", "content": "Hello"}]}'

curl -X POST http://api-endpoint/chat/completions \
  -H 'Authorization: Bearer sk-...' \
  -H 'Content-Type: application/json' \
  -d '{"model": "gemini-flex", "messages": [{"role": "user", "content": "Hello"}]}'

---


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug: Vertex AI Flex PayGo requests show incorrect cost in UI spend logs (charged at standard on-demand rate instead of Flex rate)

Description

Expected Behavior

A Flex PayGo request should be logged in the UI spend logs at the Flex PayGo price (lower than standard on-demand). The cost entry should reflect the actual tier used, not the default on-demand rate.

Actual Behavior

UI spend logs show the same cost for a Flex PayGo request as for an equivalent standard on-demand request (same model, same prompt, same token count)
Response latency for the Flex PayGo request is much higher (e.g. ~Xs vs ~Ys for standard), confirming the flex tier was used
LiteLLM is clearly not detecting the Flex tier and is falling back to standard on-demand pricing for cost calculation

In the requests used here, the Flex call took ~13 sec, versus ~1.8 sec for the default (standard) call.

Steps to Reproduce

Send the exact same request to both models:

curl -X POST https://api-endpoint/chat/completions \
  -H 'Authorization: Bearer sk-...' \
  -H 'Content-Type: application/json' \
  -d '{"model": "gemini-standard", "messages": [{"role": "user", "content": "Hello"}]}'

curl -X POST http://api-endpoint/chat/completions \
  -H 'Authorization: Bearer sk-...' \
  -H 'Content-Type: application/json' \
  -d '{"model": "gemini-flex", "messages": [{"role": "user", "content": "Hello"}]}'

Note that the Flex PayGo response takes significantly longer, confirming it was routed through the Flex tier.
Open LiteLLM UI → Spend Logs — both requests show identical cost, despite using different pricing tiers.
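The two curl calls above can also be scripted so that latency is measured the same way for both aliases. The following is a minimal sketch using only the Python standard library. The endpoint and key are the placeholders from the report, and the helper names (build_request, send, timed_call) are illustrative assumptions, not part of LiteLLM.

```python
import json
import time
import urllib.request

# Placeholder endpoint and key copied from the report; replace with real values.
ENDPOINT = "https://api-endpoint/chat/completions"
API_KEY = "sk-..."


def build_request(model):
    """Build the same chat-completion request for a given model alias."""
    payload = {"model": model, "messages": [{"role": "user", "content": "Hello"}]}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request over the network and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def timed_call(model, sender=send):
    """Return (response body, wall-clock latency in seconds) for one call."""
    request = build_request(model)
    start = time.perf_counter()
    body = sender(request)
    return body, time.perf_counter() - start
```

Calling timed_call("gemini-standard") and then timed_call("gemini-flex") gives one latency figure per alias to set beside the two Spend Logs entries. The sender parameter exists only so the timing logic can be exercised without a live proxy.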

Environment

LiteLLM Version: 1.80.11
Provider: Vertex AI
Deployment: LiteLLM Proxy

Relevant log output

What part of LiteLLM is this about?

No response

What LiteLLM version are you on ?

v1.80.11

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

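To fix the incorrect cost logging for Vertex AI Flex PayGo requests, the LiteLLM proxy must detect the Flex tier and apply the corresponding pricing. The snippets below are an illustrative sketch, not LiteLLM's actual code: calculate_cost, calculate_flex_cost, calculate_on_demand_cost, and log_spend are hypothetical names, and the model alias gemini-flex comes from the reproduction steps. Here are the steps: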

Define the per-token prices used by the cost functions:

FLEX_PRICE_PER_TOKEN = 0.0005  # Example price per token for Flex PayGo
STANDARD_PRICE_PER_TOKEN = 0.001  # Placeholder standard on-demand price (assumption)

Add functions that calculate the cost at each tier:

def calculate_flex_cost(request):
    # Hypothetical helper: price the request from a Flex PayGo pricing table
    return request['token_count'] * FLEX_PRICE_PER_TOKEN

def calculate_on_demand_cost(request):
    # Hypothetical helper: price the request at the standard on-demand rate
    return request['token_count'] * STANDARD_PRICE_PER_TOKEN

Update the calculate_cost function to check for the Flex tier:

def calculate_cost(request):
    # ... existing code ...
    if request['model'] == 'gemini-flex':
        # Apply Flex PayGo pricing
        cost = calculate_flex_cost(request)
    else:
        # Apply standard on-demand pricing
        cost = calculate_on_demand_cost(request)
    return cost

Update the UI spend logs to record the pricing tier alongside the cost for each request:

def log_spend(request, cost):
    # Hypothetical helper: tag each spend-log entry with the tier used to price it
    # ... existing code ...
    if request['model'] == 'gemini-flex':
        pricing_tier = 'Flex PayGo'
    else:
        pricing_tier = 'Standard On-Demand'
    log_entry = {'request': request, 'cost': cost, 'pricing_tier': pricing_tier}
    # ... existing code that persists log_entry ...
    return log_entry
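The plan above prices a request from a single token count and detects the tier from the model alias. As a hedged alternative sketch, the lookup below keys prices on the underlying model and tier and charges prompt and completion tokens separately. Every name here (TierPrice, PRICES, cost_for, "example-model") is hypothetical, and the rates are placeholders rather than Vertex AI's published prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPrice:
    """Per-token prices in USD for one pricing tier."""
    input_per_token: float
    output_per_token: float


# Hypothetical price table keyed by (underlying model, tier).
# The numbers are placeholders, not real Vertex AI rates.
PRICES = {
    ("example-model", "standard"): TierPrice(1.0e-6, 4.0e-6),
    ("example-model", "flex"): TierPrice(0.5e-6, 2.0e-6),
}


def cost_for(model, tier, prompt_tokens, completion_tokens):
    """Price a request using the rates of the tier it actually ran on."""
    try:
        price = PRICES[(model, tier)]
    except KeyError:
        # Fail loudly instead of silently falling back to the standard rate.
        raise ValueError(f"no price configured for {model!r} on tier {tier!r}") from None
    return (prompt_tokens * price.input_per_token
            + completion_tokens * price.output_per_token)
```

Raising on an unknown tier is a deliberate choice in this sketch: a silent fallback to the standard rate would reproduce the over-reporting described in this issue.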

Verification

To verify the fix, send the same prompt to both gemini-standard and gemini-flex, then check the UI spend logs: the Flex PayGo entry should be priced at the Flex rate and should cost less than the standard entry for the same token count.

Extra Tips

Make sure to update the FLEX_PRICE_PER_TOKEN variable with the correct price per token for the Flex PayGo tier.
Consider adding additional logging or monitoring to ensure that the Flex tier is being correctly detected and that the correct pricing is being applied.
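One way to monitor for this bug is to scan exported spend-log entries for the symptom reported here: a Flex request whose cost matches a standard request with the same token count. The sketch below assumes each entry is a plain dict with model, total_tokens, and cost keys. That shape is an assumption made for illustration, not LiteLLM's spend-log schema, and find_mispriced_flex is a hypothetical name.

```python
import logging
import math

logger = logging.getLogger("flex_cost_check")


def find_mispriced_flex(entries, flex_alias="gemini-flex",
                        standard_alias="gemini-standard"):
    """Return Flex entries whose cost equals a standard entry with the same token count."""
    # Collect the costs seen for standard requests, grouped by token count.
    standard_costs = {}
    for entry in entries:
        if entry["model"] == standard_alias:
            standard_costs.setdefault(entry["total_tokens"], []).append(entry["cost"])

    suspicious = []
    for entry in entries:
        if entry["model"] != flex_alias:
            continue
        candidates = standard_costs.get(entry["total_tokens"], [])
        if any(math.isclose(entry["cost"], cost) for cost in candidates):
            logger.warning(
                "Flex request with %d tokens logged at standard cost %s",
                entry["total_tokens"], entry["cost"],
            )
            suspicious.append(entry)
    return suspicious
```

An empty result after the fix, on a log that contains both aliases with matching token counts, would be consistent with Flex pricing being applied.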
