litellm - ✅(Solved) Fix [Bug]: Custom Pricing is broken for /messages and /responses endpoints (streaming) [1 pull requests, 1 comments, 1 participants]

GitHub stats
BerriAI/litellm#23185 (fetched 2026-04-08 00:38:06)
Comments: 1 · Participants: 1 · Timeline events: 19 · Reactions: 0
Timeline (top): cross-referenced ×9, referenced ×5, labeled ×3, closed ×1

Root Cause

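Custom pricing is lost on the /v1/responses (streaming) and /v1/messages paths. Since PR #20679, the router strips pricing fields from the shared base-model entry in litellm.model_cost. On these paths the router also places model_info in kwargs["litellm_metadata"], while the cost-tracking code reads kwargs["metadata"], which the proxy always populates. As a result, custom pricing is not detected and cost falls back to the shared entry, which no longer carries the custom pricing. The /v1/chat/completions path is unaffected for two reasons. It uses get_litellm_params(**kwargs), which explicitly passes input_cost_per_token etc. into the logging object's litellm_params. In addition, the router's _acompletion calls _update_kwargs_with_deployment(deployment, kwargs) without a function_name, so model_info lands in kwargs["metadata"] (not litellm_metadata) and custom_pricing is detectable.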

Fix Action

Fixed

PR fix notes

PR #23239: fix: preserve deployment custom pricing for /responses and /messages

Description (problem / solution / changelog)

Relevant issues

https://github.com/BerriAI/litellm/issues/23185

Pre-Submission checklist

Please complete all items before asking a LiteLLM maintainer to review your PR

  • I have added testing in the tests/test_litellm/ directory; adding at least 1 test is a hard requirement (see details)
  • My PR passes all unit tests on make test-unit. Everything passes except 7 pre-existing failures caused by the responses package shadowing (AttributeError: module 'responses' has no attribute 'activate'); this is a pre-existing issue
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • [will request] I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

Custom pricing from the model_info config was ignored for the /v1/responses (streaming) and /v1/messages (all modes) endpoints, so cost calculations used standard pricing instead. The regression was introduced by PR #20679 (commit 1ee43b11de). See the repro steps in the issue linked above.

Changed files

  • litellm/litellm_core_utils/core_helpers.py (modified, +53/-1)
  • litellm/litellm_core_utils/litellm_logging.py (modified, +31/-17)
  • litellm/llms/custom_httpx/llm_http_handler.py (modified, +10/-1)
  • litellm/proxy/pass_through_endpoints/llm_provider_handlers/anthropic_passthrough_logging_handler.py (modified, +22/-1)
  • litellm/responses/main.py (modified, +8/-1)
  • tests/test_litellm/litellm_core_utils/test_litellm_logging.py (modified, +157/-0)
  • tests/test_litellm/proxy/pass_through_endpoints/llm_provider_handlers/test_anthropic_passthrough_logging_handler.py (modified, +213/-1)
  • tests/test_litellm/test_custom_pricing_metadata_propagation.py (added, +799/-0)

Original issue report

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

This regression is reproducible in v1.81.14-stable and v1.82.0-rc3; it still worked correctly in v1.81.0-stable.

Streaming requests to /v1/responses and /v1/messages ignore custom pricing: spend is calculated from the standard pricing map. No issue observed with /v1/chat/completions.

Steps to Reproduce

config.yaml

litellm_settings:
  turn_off_message_logging: true

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  store_model_in_db: True
  database_url: os.environ/DATABASE_URL

model_list:

  - model_name: "gpt-5"
    litellm_params:
      model: openai/gpt-5
      api_key: os.environ/OPENAI_API_KEY
    model_info:
      id: gpt-5-custom-pricing
      mode: "chat"
      input_cost_per_token: 0.000125        # 100x standard ($1.25/1M = $0.00000125)
      output_cost_per_token: 0.001         # 100x standard ($10.00/1M = $0.00001)

  - model_name: "claude-sonnet-4-20250514"
    litellm_params:
      model: anthropic/claude-sonnet-4-20250514
      api_key: os.environ/ANTHROPIC_API_KEY
    model_info:
      id: claude-sonnet-4-custom-pricing
      input_cost_per_token: 0.0003         # 100x standard ($0.000003)
      output_cost_per_token: 0.0015         # 100x standard ($0.000015)

Request: POST /v1/messages
  {
    "model": "claude-sonnet-4-20250514",
    "messages": [
      {
        "role": "user",
        "content": "Say exactly: hello world (ref:18599-9699)"
      }
    ],
    "max_tokens": 20,
    "stream": true
  }

Spend Log record: cost=0.000273, model=anthropic/claude-sonnet-4-20250514, in=21, out=14, custom_pricing=N/A
Expected with custom pricing: 0.02730000
Expected with standard pricing: 0.00027300
Actual cost recorded: 0.000273
Actual / Standard ratio: 1.0x

✗ FAIL: Cost is ~1x standard → standard pricing used (BUG!)

Request: POST /v1/responses
  {
    "model": "gpt-5",
    "input": "Say exactly: hello world (ref:5494-4215)",
    "stream": true
  }

Spend Log record: cost=0.002335, model=openai/gpt-5, in=20, out=231, custom_pricing=N/A
Expected with custom pricing: 0.23350000
Expected with standard pricing: 0.00233500
Actual cost recorded: 0.002335
Actual / Standard ratio: 1.0x

✗ FAIL: Cost is ~1x standard → standard pricing used (BUG!)
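The expected figures in the two spend log records above follow from cost = input tokens × input_cost_per_token + output tokens × output_cost_per_token. The minimal sketch below recomputes them. It assumes the token counts reported in the records, the custom prices from config.yaml, and the standard prices quoted in the config comments. The Pricing class and cost helper are illustrative only, not LiteLLM code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_cost_per_token: float
    output_cost_per_token: float


def cost(tokens_in: int, tokens_out: int, p: Pricing) -> float:
    """Linear per-token cost: input and output tokens are billed separately."""
    return tokens_in * p.input_cost_per_token + tokens_out * p.output_cost_per_token


# Standard prices from the config.yaml comments; custom prices from model_info (100x).
PRICES = {
    "claude-sonnet-4-20250514": {
        "standard": Pricing(0.000003, 0.000015),
        "custom": Pricing(0.0003, 0.0015),
    },
    "gpt-5": {
        "standard": Pricing(0.00000125, 0.00001),
        "custom": Pricing(0.000125, 0.001),
    },
}

# (model, input tokens, output tokens, cost actually recorded) from the spend logs above
RUNS = [
    ("claude-sonnet-4-20250514", 21, 14, 0.000273),
    ("gpt-5", 20, 231, 0.002335),
]

for model, tin, tout, recorded in RUNS:
    standard = cost(tin, tout, PRICES[model]["standard"])
    custom = cost(tin, tout, PRICES[model]["custom"])
    print(f"{model}: standard={standard:.8f} custom={custom:.8f} "
          f"recorded/standard={recorded / standard:.1f}x")
```

A recorded/standard ratio of about 1.0x means the standard price map was used. A ratio of about 100x would mean the custom model_info pricing was applied.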

Relevant log output

Agent's code analysis (not validated or attempted to fix):

Regression commit: 1ee43b11de — PR #20679: "[Fix] prevent shared backend model key from being polluted by per-deployment custom pricing"

LiteLLM registers each model deployment in litellm.model_cost under TWO keys:
  - Deployment-specific ID (the deployment's model_info.id, e.g. gpt-5-custom-pricing in the config above): with the full _model_info, including custom pricing
  - Shared base model name (e.g., openai/gpt-5.2 → resolves to gpt-5.2): used as a fallback lookup

  `v1.81.0 (router.py:5651)`: the shared key got the full _model_info including custom pricing:
  litellm.register_model(model_cost={_model_name: _model_info})

  `v1.81.14 (router.py:6217-6229)`: PR #20679 strips all pricing fields from the shared key to prevent cross-deployment pollution:
  _custom_pricing_fields = CustomPricingLiteLLMParams.model_fields.keys()
  _shared_model_info = {
      k: v for k, v in _model_info.items() if k not in _custom_pricing_fields
  }
  litellm.register_model(model_cost={_model_name: _shared_model_info})
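A toy model of the two-key registration shows why stripping the pricing fields changes what a base-model-name lookup returns. This is a hypothetical stand-in, not LiteLLM's code. A plain dict replaces litellm.model_cost, and `register` replaces the router's two register_model calls. CUSTOM_PRICING_FIELDS is an illustrative subset of CustomPricingLiteLLMParams.model_fields, and the merge-into-existing-entry behaviour is an assumption of the sketch.

```python
# Toy model of litellm.model_cost; names and merge behaviour are assumptions.
CUSTOM_PRICING_FIELDS = {"input_cost_per_token", "output_cost_per_token"}
STANDARD_GPT5 = {"input_cost_per_token": 0.00000125, "output_cost_per_token": 0.00001}


def register(model_cost: dict, deployment_id: str, shared_name: str,
             model_info: dict, strip_shared: bool) -> None:
    """Register one deployment under its own ID and under the shared base name."""
    # The deployment-specific key always keeps the full model_info, pricing included.
    model_cost[deployment_id] = dict(model_info)
    if strip_shared:
        # v1.81.14-style: drop pricing fields before touching the shared entry.
        shared = {k: v for k, v in model_info.items() if k not in CUSTOM_PRICING_FIELDS}
    else:
        # v1.81.0-style: the shared entry receives the full model_info.
        shared = dict(model_info)
    # This sketch merges into any existing shared entry.
    model_cost.setdefault(shared_name, {}).update(shared)


def build(strip_shared: bool) -> dict:
    # Start from a built-in entry carrying standard pricing for the base model.
    model_cost = {"gpt-5": dict(STANDARD_GPT5)}
    # Two deployments of the same base model with different custom pricing
    # ("gpt-5-team-b" is a hypothetical second deployment).
    register(model_cost, "gpt-5-custom-pricing", "gpt-5",
             {"mode": "chat", "input_cost_per_token": 0.000125,
              "output_cost_per_token": 0.001}, strip_shared)
    register(model_cost, "gpt-5-team-b", "gpt-5",
             {"mode": "chat", "input_cost_per_token": 0.0000025,
              "output_cost_per_token": 0.00002}, strip_shared)
    return model_cost


old = build(strip_shared=False)
new = build(strip_shared=True)

# Old behaviour: the shared key carries whichever deployment registered last.
print("old shared:", old["gpt-5"]["input_cost_per_token"])
# New behaviour: the shared key keeps standard pricing; custom pricing only by ID.
print("new shared:", new["gpt-5"]["input_cost_per_token"])
print("new by ID: ", new["gpt-5-custom-pricing"]["input_cost_per_token"])
```

Under either behaviour, a lookup by the shared name cannot return the right custom price for a specific deployment. Cost tracking therefore has to carry the deployment's own model_info (or ID) through to the cost calculation.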

  This is correct in principle (two deployments of the same model with different pricing shouldn't overwrite each other's shared entry), but it breaks every code path that looks up cost by base model name, including:

  1. /spend/calculate endpoint: the Chat App calls this with completion_response.model = "gpt-5.2", which looks up the now-stripped shared entry.
  2. /responses API internal cost tracking: the router puts model_info into kwargs["litellm_metadata"] (via _get_router_metadata_variable_name("generic_api_call") → "litellm_metadata"). However, responses/main.py reads metadata, which the proxy sets with user API key info and which is therefore always truthy, so it never reaches litellm_metadata. Custom pricing is not detected, and cost calculation falls back to the base model name lookup, which no longer has pricing.
  3. Anthropic /messages API: this takes the same _generic_api_call_with_fallbacks → litellm_metadata path and has the same issue in llm_http_handler.py:1886, where kwargs.get("metadata", {}) returns proxy metadata without model_info.

The /v1/chat/completions path works because it uses get_litellm_params(**kwargs) which explicitly passes input_cost_per_token etc. into the logging object's litellm_params, and the router's _acompletion uses _update_kwargs_with_deployment(deployment, kwargs) with no function_name — so model_info goes into kwargs["metadata"] (not litellm_metadata), making custom_pricing detectable.
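Both the metadata mix-up in points 2 and 3 and the working /v1/chat/completions path can be reproduced with plain dicts. This is a hypothetical sketch. `read_model_info_buggy` mimics the pattern the analysis describes (metadata is always truthy, so litellm_metadata is never reached); it is not the exact code in responses/main.py. The `user_api_key_hash` key is only an illustrative proxy-metadata entry.

```python
def read_model_info_buggy(kwargs: dict) -> dict:
    """Pick one metadata dict with `or`: a truthy `metadata` hides `litellm_metadata`."""
    metadata = kwargs.get("metadata") or kwargs.get("litellm_metadata") or {}
    return metadata.get("model_info") or {}


def has_custom_pricing(model_info: dict) -> bool:
    return any(k in model_info for k in ("input_cost_per_token", "output_cost_per_token"))


MODEL_INFO = {"id": "gpt-5-custom-pricing", "input_cost_per_token": 0.000125,
              "output_cost_per_token": 0.001}
PROXY_METADATA = {"user_api_key_hash": "hashed-key"}  # set by the proxy, never empty

# /v1/responses and /v1/messages: the router writes model_info into litellm_metadata.
generic_call = {"metadata": dict(PROXY_METADATA),
                "litellm_metadata": {"model_info": MODEL_INFO}}

# /v1/chat/completions: the router writes model_info into metadata itself.
chat_call = {"metadata": {**PROXY_METADATA, "model_info": MODEL_INFO}}

print(has_custom_pricing(read_model_info_buggy(generic_call)))  # False
print(has_custom_pricing(read_model_info_buggy(chat_call)))     # True
```

The same lookup succeeds or fails depending only on which dict the router wrote model_info into. That matches the observed split between chat completions and the /responses and /messages endpoints.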

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.82.0-rc3, v1.81.14-stable

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

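One way to fix the issue is to include the custom pricing fields in the shared model info again, by changing the router code that calls register_model for the shared key. Note that this restores the v1.81.0 behavior and therefore reintroduces the cross-deployment pricing pollution that PR #20679 fixed. PR #23239 instead preserves the deployment's custom pricing on the /responses and /messages code paths (see the changed files above).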

  • Update the router code that registers the shared key so it keeps the custom pricing fields. Filtering and then re-adding those fields is a no-op, so this is equivalent to registering the full _model_info:
# router.py: register the shared base-model key with the full model info, as in v1.81.0.
# Caveat: a later deployment of the same base model overwrites this entry's pricing.
litellm.register_model(model_cost={_model_name: dict(_model_info)})
  • Update responses/main.py and llm_http_handler.py to read model_info from litellm_metadata first, falling back to metadata, instead of reading metadata only:
# responses/main.py and llm_http_handler.py
# `or {}` also guards against keys that are present but set to None
model_info = (kwargs.get("litellm_metadata") or {}).get("model_info") or (
    kwargs.get("metadata") or {}
).get("model_info")
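  • Make sure router.py includes the model info in litellm_metadata (per the analysis above, the router already does this for generic API calls), merging into any existing litellm_metadata rather than replacing it:
kwargs["litellm_metadata"] = {**(kwargs.get("litellm_metadata") or {}), "model_info": _model_info}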
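The sketch below shows a None-safe version of the lookup in the second step, which checks litellm_metadata before falling back to metadata. `resolve_model_info` is a hypothetical helper, not the code from PR #23239. It only illustrates the lookup order, and why `kwargs.get("litellm_metadata", {})` on its own is fragile when the key is present but set to None.

```python
from typing import Any, Optional


def resolve_model_info(kwargs: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return model_info from litellm_metadata first, then from metadata.

    `or {}` also covers keys that exist but are None, which
    kwargs.get("litellm_metadata", {}) would return as None.
    """
    for key in ("litellm_metadata", "metadata"):
        model_info = (kwargs.get(key) or {}).get("model_info")
        if model_info:
            return model_info
    return None


MODEL_INFO = {"id": "gpt-5-custom-pricing", "input_cost_per_token": 0.000125}

# Hypothetical kwargs shapes for the three cases discussed above.
generic_call = {"metadata": {"user_api_key_hash": "hashed-key"},
                "litellm_metadata": {"model_info": MODEL_INFO}}
chat_call = {"metadata": {"model_info": MODEL_INFO}}
null_litellm_metadata = {"metadata": {"user_api_key_hash": "hashed-key"},
                         "litellm_metadata": None}

print(resolve_model_info(generic_call)["id"])
print(resolve_model_info(chat_call)["id"])
print(resolve_model_info(null_litellm_metadata))
```

The helper looks for model_info in each dict in turn rather than choosing one dict with `or`. A populated proxy metadata dict therefore can no longer hide the router's litellm_metadata.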

Verification

To verify that the fix worked, you can test the /v1/responses and /v1/messages endpoints with custom pricing and check that the spend log records show the correct cost.

  • Test the /v1/responses endpoint with custom pricing:
Request: POST /v1/responses
{
  "model": "gpt-5",
  "input": "Say exactly: hello world (ref:5494-4215)",
  "stream": true
}
  • Check the spend log record:
Expected spend log record: model=openai/gpt-5, in=20, out=231, cost ~0.23350000 (about 100x the standard-pricing cost of 0.00233500 for the same token counts)
  • Test the /v1/messages endpoint with custom pricing:
Request: POST /v1/messages
{
  "model": "claude-sonnet-4-20250514",
  "messages": [
    {
      "role": "user",
      "content": "Say exactly: hello world (ref:18599-9699)"
    }
  ],
  "max_tokens": 20,
  "stream": true
}
  • Check the spend log record:
Expected spend log record: model=anthropic/claude-sonnet-4-20250514, in=21, out=14, cost ~0.02730000 (about 100x the standard-pricing cost of 0.00027300 for the same token counts)
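Instead of eyeballing the spend log, the pass/fail check from the reproduction can be scripted by comparing the recorded cost with both expected costs. This is a minimal sketch: `classify_spend` and its 10% tolerance are assumptions. Take the token counts from the same spend log record, because output token counts vary between runs.

```python
def expected_cost(tokens_in: int, tokens_out: int,
                  in_price: float, out_price: float) -> float:
    """Per-token cost: input and output tokens billed at their own rates."""
    return tokens_in * in_price + tokens_out * out_price


def classify_spend(recorded: float, standard: float, custom: float,
                   rel_tol: float = 0.10) -> str:
    """Label a recorded cost as matching custom or standard pricing (or neither)."""
    if abs(recorded - custom) <= rel_tol * custom:
        return "PASS: custom pricing applied"
    if abs(recorded - standard) <= rel_tol * standard:
        return "FAIL: standard pricing used"
    return f"UNKNOWN: ratio to standard is {recorded / standard:.1f}x"


# Anthropic /v1/messages record: 21 input tokens, 14 output tokens.
std = expected_cost(21, 14, 0.000003, 0.000015)
cus = expected_cost(21, 14, 0.0003, 0.0015)
print(classify_spend(0.000273, std, cus))  # cost recorded before the fix
print(classify_spend(0.0273, std, cus))    # cost expected after the fix
```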
