litellm - 💡(How to fix) Fix [Bug]: model_info cost override ignored when calling upstream LiteLLM proxy (litellm_proxy/ prefix)

litellm2026-05-11 16:27:02

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Root Cause

Root cause — cost_calculator.py, lines 1749–1753:

provider_response_cost = get_response_cost_from_hidden_params(
    response_object._hidden_params
)
if provider_response_cost is not None:
    return provider_response_cost  # short-circuits before local model_info is checked

Fix Action

Fix / Workaround

Relation to #25204 / PR #25206: This is a distinct bug at a higher level. PR #25206 fixes the dispatch order inside cost_per_token(). This bug short-circuits in response_cost_calculator() before completion_cost() or cost_per_token() are called, so #25206 does not address this scenario.

v1.83.14-stable.patch.3

Code Example

provider_response_cost = get_response_cost_from_hidden_params(
    response_object._hidden_params
)
if provider_response_cost is not None:
    return provider_response_cost  # short-circuits before local model_info is checked

---

model_list:
  - model_name: glm-4.7
    litellm_params:
      model: litellm_proxy/hosted_vllm/glm-4.7-fp8
      api_key: sk-1234
      api_base: http://<X-host>:4000
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0
    model_info:
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0

---

Check Y's spend logs or the `x-litellm-response-cost` header on Y's response — it will match X's cost, not `0.0`.

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When a LiteLLM proxy instance (Y) is configured to call another LiteLLM proxy instance (X) using the litellm_proxy/ model prefix, setting input_cost_per_token: 0.0 / output_cost_per_token: 0.0 in model_info or litellm_params on Y has no effect. Y always records the cost that X computed, ignoring the local override entirely.

Expected behavior: Local model_info cost overrides on the calling proxy (Y) should take precedence over the cost reported by the upstream proxy (X).

Actual behavior: The upstream proxy (X) always includes an x-litellm-response-cost response header (common_request_processing.py:550). When Y receives the response, response_cost_calculator() extracts this header and returns immediately, before completion_cost() or cost_per_token() are ever reached — so Y's local model_info overrides are never consulted.

Root cause — cost_calculator.py, lines 1749–1753:

provider_response_cost = get_response_cost_from_hidden_params(
    response_object._hidden_params
)
if provider_response_cost is not None:
    return provider_response_cost  # short-circuits before local model_info is checked

Steps to Reproduce

Instance X — proxy config with a model that has a non-zero cost (e.g. pulled from the LiteLLM model registry).
Instance Y — proxy config pointing to X with zero-cost overrides:

model_list:
  - model_name: glm-4.7
    litellm_params:
      model: litellm_proxy/hosted_vllm/glm-4.7-fp8
      api_key: sk-1234
      api_base: http://<X-host>:4000
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0
    model_info:
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0

Send a request through Y. Observe that Y logs the same non-zero cost as X, not 0.0.

Relevant log output

Check Y's spend logs or the `x-litellm-response-cost` header on Y's response — it will match X's cost, not `0.0`.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.14-stable.patch.3

Twitter / LinkedIn details

No response

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #LLM response #model loading #dependency error #configuration error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: model_info cost override ignored when calling upstream LiteLLM proxy (litellm_proxy/ prefix)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Check for existing issues

What happened?

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: model_info cost override ignored when calling upstream LiteLLM proxy (litellm_proxy/ prefix)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Check for existing issues

What happened?

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

Still need to ship something?

RELATED_DISCOVERY

TRENDING