litellm - [Bug]: Auto-routed Chat→Responses path drops usage.cost from streaming response despite include_cost_in_streaming_usage=true

What happened?

When litellm_settings.include_cost_in_streaming_usage: true is set together with LITELLM_ROUTE_ALL_CHAT_OPENAI_TO_RESPONSES=true (so OpenAI /v1/chat/completions requests are auto-routed through the Responses API bridge), the final streaming SSE event for OpenAI reasoning models (gpt-5, o1, o3, o4) does not include usage.cost. The cost is tracked internally — /spend/logs returns it correctly under model key openai/responses/gpt-5 — but the streaming response body delivered to the client carries a usage object without a cost field.

The same flag works correctly for anthropic/* (verified claude-haiku-4-5 and claude-opus-4-7 both emit usage.cost on the final SSE event).

Relevant log output / repro

LiteLLM v1.83.14-stable.patch.2, config:

litellm_settings:
  include_cost_in_streaming_usage: true

ENV: LITELLM_ROUTE_ALL_CHAT_OPENAI_TO_RESPONSES=true

Streaming request:

curl -sS -X POST "$LITELLM/v1/chat/completions" \
  -H "Authorization: Bearer $VKEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"openai/gpt-5","stream":true,"stream_options":{"include_usage":true},"max_completion_tokens":2000,"messages":[{"role":"user","content":"Reply with the single word: ok"}]}'

Final pre-[DONE] SSE event (gpt-5):

data: {"id":"...","choices":[{"index":0,"delta":{}}]}

(no usage object at all on the final chunk for some calls; on others a usage block arrives without cost)

Compare with anthropic/claude-haiku-4-5 final SSE event:

data: {"id":"...","usage":{"completion_tokens":4,"prompt_tokens":14,"total_tokens":18,"cost":0.000034,...}}
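A small client-side check makes the comparison repeatable. This is a minimal sketch using only the standard library; the helper name last_usage_from_sse and the sample lines are illustrative assumptions (the "id" values are placeholders for the elided ones above), not part of LiteLLM.

```python
import json
from typing import Iterable, Optional


def last_usage_from_sse(lines: Iterable[str]) -> Optional[dict]:
    """Return the last non-empty `usage` object seen before [DONE], or None."""
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore malformed or partial lines
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


# Sample streams shaped like the two final events quoted in this report.
gpt5_lines = [
    'data: {"id":"x","choices":[{"index":0,"delta":{}}]}',
    "data: [DONE]",
]
haiku_lines = [
    'data: {"id":"x","usage":{"completion_tokens":4,"prompt_tokens":14,'
    '"total_tokens":18,"cost":0.000034}}',
    "data: [DONE]",
]

for name, lines in (("gpt-5", gpt5_lines), ("claude-haiku-4-5", haiku_lines)):
    usage = last_usage_from_sse(lines)
    has_cost = usage is not None and "cost" in usage
    print(f"{name}: usage present={usage is not None}, cost present={has_cost}")
```

To run it against a live gateway, the curl output can be piped into a script that calls last_usage_from_sse(sys.stdin).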

Non-streaming gpt-5 /v1/chat/completions (also auto-routed) likewise returns usage without cost:

{
  "usage": {
    "completion_tokens": 54,
    "prompt_tokens": 13,
    "total_tokens": 67,
    "completion_tokens_details": {"reasoning_tokens": 0},
    "prompt_tokens_details": {"cached_tokens": 0}
  }
}

/spend/logs query for the same time window:

{
  "models": {
    "anthropic/claude-haiku-4-5": 0.000034,
    "anthropic/claude-opus-4-7": 0.007685,
    "openai/responses/gpt-5": 0.06601175
  }
}

So LiteLLM is computing the cost — it just isn't surfacing it on the response.

Likely root cause

PR #16236 (merged 2025-11-04) added two fixes:

1. streaming_iterator.py (BaseResponsesAPIStreamingIterator, lines 226-247 in v1.83.14-stable): on RESPONSE_COMPLETED, calls _response_cost_calculator and setattr(usage_obj, "cost", cost) on the deserialized ResponseAPIUsage object.
2. responses/utils.py::_transform_response_api_usage_to_chat_usage: preserves the cost attribute if present on the input.

But the auto-routed Chat Completions → Responses path (completion_extras/litellm_responses_transformation/transformation.py:1340-1349) translates each chunk to Chat Completions shape via:

if response_data.get("usage"):
    from litellm.responses.utils import ResponseAPILoggingUtils
    usage = ResponseAPILoggingUtils._transform_response_api_usage_to_chat_usage(
        response_data.get("usage")
    )

Here response_data is the raw chunk dict parsed from the upstream SSE before the iterator's setattr ran on the deserialized object. So response_data.get("usage") is a plain dict that never had cost injected into it. The preservation logic in _transform_response_api_usage_to_chat_usage builds a ResponseAPIUsage(**usage_input) from that dict, looks for cost on it, and finds nothing.

The two cost-injection sites work on different shapes (object vs. dict) and the auto-route translation reads the dict path.
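The mismatch can be reproduced in isolation. The sketch below uses a stand-in FakeUsage class and a stand-in to_chat_usage function (both hypothetical, not LiteLLM's real ResponseAPIUsage or _transform_response_api_usage_to_chat_usage) to show why setting cost on a deserialized object leaves the raw dict, and anything translated from it, without a cost. The cost value is illustrative.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class FakeUsage:
    """Hypothetical stand-in for a deserialized Responses API usage object."""
    input_tokens: int
    output_tokens: int
    total_tokens: int


def to_chat_usage(usage_input: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical stand-in for the Responses -> Chat usage translation.

    It only sees the dict it is handed, so it can forward `cost` only if
    that dict already contains one.
    """
    chat_usage = {
        "prompt_tokens": usage_input["input_tokens"],
        "completion_tokens": usage_input["output_tokens"],
        "total_tokens": usage_input["total_tokens"],
    }
    if usage_input.get("cost") is not None:
        chat_usage["cost"] = usage_input["cost"]
    return chat_usage


# Raw chunk as parsed from the upstream SSE (token counts from this report).
raw_chunk = {"usage": {"input_tokens": 13, "output_tokens": 54, "total_tokens": 67}}

# Iterator side: deserialize, then attach a cost to the OBJECT.
iterator_usage = FakeUsage(**raw_chunk["usage"])
setattr(iterator_usage, "cost", 0.001)  # illustrative value, not a real price

# Translation side: reads the raw DICT, which was never mutated.
chat_usage = to_chat_usage(raw_chunk["usage"])
print("cost" in chat_usage)  # False
```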

Suggested fix direction

One of the following:

1. Inject cost into the raw chunk dict before chunk_parser calls _transform_response_api_usage_to_chat_usage.
2. Have the auto-route translation read from the iterator's mutated object instead of the raw dict.
3. Move the cost injection earlier so it lands on response_data["usage"] before any downstream translation runs.
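As a rough illustration of the first option, the sketch below patches the raw usage dict before any translation would read it. The names inject_cost_into_raw_usage and flat_rate_cost are hypothetical, and the flat per-token price is made up; in LiteLLM the cost would presumably come from the existing cost calculator rather than a helper like this.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical signature: takes the raw chunk dict, returns a cost or None.
CostFn = Callable[[Dict[str, Any]], Optional[float]]


def inject_cost_into_raw_usage(
    response_data: Dict[str, Any], cost_fn: CostFn
) -> Dict[str, Any]:
    """Return response_data with `cost` added to its usage dict, if possible.

    The input is never mutated; a shallow copy is returned when a cost is
    added, and the original dict is returned unchanged otherwise.
    """
    usage = response_data.get("usage")
    if not isinstance(usage, dict) or "cost" in usage:
        return response_data  # nothing to patch, or cost already present
    cost = cost_fn(response_data)
    if cost is None:
        return response_data  # calculator could not price this chunk
    patched = dict(response_data)
    patched["usage"] = {**usage, "cost": cost}
    return patched


def flat_rate_cost(response_data: Dict[str, Any]) -> Optional[float]:
    """Illustrative calculator: a made-up flat price per token."""
    return response_data["usage"]["total_tokens"] * 1e-6


raw_chunk = {"usage": {"input_tokens": 13, "output_tokens": 54, "total_tokens": 67}}
patched_chunk = inject_cost_into_raw_usage(raw_chunk, flat_rate_cost)
print("cost" in patched_chunk["usage"], "cost" in raw_chunk["usage"])
```

Returning a copy keeps the upstream chunk intact for any other consumer; whether mutation in place would be acceptable depends on how the real iterator shares the dict.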

Happy to test a candidate fix against our staging gateway and report back.

What LiteLLM version are you on?

v1.83.14-stable.patch.2
