litellm - 💡(How to fix) Fix [Bug]: model_info cost override ignored when calling upstream LiteLLM proxy (litellm_proxy/ prefix)

Official PRs (…)
ON THIS PAGE

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

Root cause — cost_calculator.py, lines 1749–1753:

provider_response_cost = get_response_cost_from_hidden_params(
    response_object._hidden_params
)
if provider_response_cost is not None:
    return provider_response_cost  # short-circuits before local model_info is checked

Fix Action

Fix / Workaround

Relation to #25204 / PR #25206: This is a distinct bug at a higher level. PR #25206 fixes the dispatch order inside cost_per_token(). This bug short-circuits in response_cost_calculator() before completion_cost() or cost_per_token() are called, so #25206 does not address this scenario.

v1.83.14-stable.patch.3

Code Example

provider_response_cost = get_response_cost_from_hidden_params(
    response_object._hidden_params
)
if provider_response_cost is not None:
    return provider_response_cost  # short-circuits before local model_info is checked

---

model_list:
  - model_name: glm-4.7
    litellm_params:
      model: litellm_proxy/hosted_vllm/glm-4.7-fp8
      api_key: sk-1234
      api_base: http://<X-host>:4000
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0
    model_info:
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0

---

Check Y's spend logs or the `x-litellm-response-cost` header on Y's response — it will match X's cost, not `0.0`.
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When a LiteLLM proxy instance (Y) is configured to call another LiteLLM proxy instance (X) using the litellm_proxy/ model prefix, setting input_cost_per_token: 0.0 / output_cost_per_token: 0.0 in model_info or litellm_params on Y has no effect. Y always records the cost that X computed, ignoring the local override entirely.

Expected behavior: Local model_info cost overrides on the calling proxy (Y) should take precedence over the cost reported by the upstream proxy (X).

Actual behavior: The upstream proxy (X) always includes an x-litellm-response-cost response header (common_request_processing.py:550). When Y receives the response, response_cost_calculator() extracts this header and returns immediately, before completion_cost() or cost_per_token() are ever reached — so Y's local model_info overrides are never consulted.

Root cause — cost_calculator.py, lines 1749–1753:

provider_response_cost = get_response_cost_from_hidden_params(
    response_object._hidden_params
)
if provider_response_cost is not None:
    return provider_response_cost  # short-circuits before local model_info is checked

Relation to #25204 / PR #25206: This is a distinct bug at a higher level. PR #25206 fixes the dispatch order inside cost_per_token(). This bug short-circuits in response_cost_calculator() before completion_cost() or cost_per_token() are called, so #25206 does not address this scenario.

Steps to Reproduce

  1. Instance X — proxy config with a model that has a non-zero cost (e.g. pulled from the LiteLLM model registry).

  2. Instance Y — proxy config pointing to X with zero-cost overrides:

model_list:
  - model_name: glm-4.7
    litellm_params:
      model: litellm_proxy/hosted_vllm/glm-4.7-fp8
      api_key: sk-1234
      api_base: http://<X-host>:4000
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0
    model_info:
      input_cost_per_token: 0.0
      output_cost_per_token: 0.0
  1. Send a request through Y. Observe that Y logs the same non-zero cost as X, not 0.0.

Relevant log output

Check Y's spend logs or the `x-litellm-response-cost` header on Y's response — it will match X's cost, not `0.0`.

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.14-stable.patch.3

Twitter / LinkedIn details

No response

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

litellm - 💡(How to fix) Fix [Bug]: model_info cost override ignored when calling upstream LiteLLM proxy (litellm_proxy/ prefix)