litellm - 💡(How to fix) Fix [Bug]: Synthetic include_usage chunk violates OpenAI spec — usage event has non-empty choices instead of choices: [] [2 pull requests]

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Related: #8450 (same symptom, closed as stale), PR #8751 (proposed fix, not merged).

With stream: true and stream_options: {"include_usage": true}, the OpenAI streaming spec requires the final usage chunk to have choices: []. LiteLLM instead emits a synthetic usage chunk at the end of the stream with a non-empty choices array (see the log output below).

OpenAI's documented shape for the final usage chunk:

{
  "choices": [],
  "usage": { "prompt_tokens": 13, "completion_tokens": 20, "total_tokens": 33 }
}

Spec reference: Chat Completions streaming, include_usage:

"The usage field on this chunk shows the token usage statistics for the entire request, and the choices field will always be an empty array."
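The requirement quoted above can be expressed as a per-chunk check that a client or a regression test could run on each decoded chunk. This is a minimal sketch; the function name is hypothetical and is not part of LiteLLM or the OpenAI SDK. It treats any chunk with a non-null usage as the usage chunk and requires its choices to be exactly [].

```python
from typing import Any, Mapping


def is_spec_compliant_usage_chunk(chunk: Mapping[str, Any]) -> bool:
    """Hypothetical check: a chunk carrying usage must have choices == []."""
    if chunk.get("usage") is None:
        # Ordinary content chunk (usage absent or null): the rule does not apply.
        return True
    # A missing choices key also fails, since the spec says "always an empty array".
    return chunk.get("choices") == []


usage = {"prompt_tokens": 13, "completion_tokens": 20, "total_tokens": 33}
openai_shape = {"choices": [], "usage": usage}
litellm_shape = {"choices": [{"index": 0, "delta": {}}], "usage": usage}

print(is_spec_compliant_usage_chunk(openai_shape))   # True
print(is_spec_compliant_usage_chunk(litellm_shape))  # False
```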

Steps to Reproduce

curl -N 'http://127.0.0.1:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-...' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "bedrock-claude-sonnet-4-6",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "stream": true,
    "stream_options": {"include_usage": true}
  }'

Inspect the last JSON event before data: [DONE]. The chunk with non-null usage has choices with one entry ({"index": 0, "delta": {}}) instead of [].
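To inspect that event programmatically rather than by eye, a small parser over the SSE lines is enough. The sketch below assumes plain OpenAI-style SSE, where each event is a single data: <json> line and the stream ends with data: [DONE]; the helper name is hypothetical. The sample stream is illustrative, with the final event trimmed from the log output in this issue.

```python
import json
from typing import Any, Iterable, Optional


def last_event_before_done(lines: Iterable[str]) -> Optional[dict[str, Any]]:
    """Return the last JSON payload seen before the `data: [DONE]` sentinel."""
    last: Optional[dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        last = json.loads(payload)
    return last


sample = """\
data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hello!"}}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":20,"prompt_tokens":13,"total_tokens":33}}

data: [DONE]
"""

event = last_event_before_done(sample.splitlines())
print(event["choices"])  # [{'index': 0, 'delta': {}}]
```

Saved as a script, the same function can be pointed at the curl output above by passing sys.stdin as the lines iterable.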

Root cause

On StopIteration, when send_stream_usage=True, CustomStreamWrapper builds a synthetic chunk via model_response_creator() and attaches usage from stream_chunk_builder:

# litellm/litellm_core_utils/streaming_handler.py
response = self.model_response_creator()
setattr(response, "usage", getattr(complete_streaming_response, "usage"))
return response

model_response_creator() fills in a default choice when none is provided:

# litellm/litellm_core_utils/streaming_handler.py
model_response.choices = [StreamingChoices(finish_reason=None)]

The usage chunk should set choices = [] before returning.
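The requested change amounts to one assignment before the synthetic chunk is returned. Because LiteLLM's internal types are not reproduced here, the sketch below uses hypothetical stub classes and functions (StreamingChoicesStub, StreamChunkStub, model_response_creator_stub, build_usage_chunk) that only mimic the two excerpts above. It illustrates where the fix goes, not LiteLLM's actual code or necessarily how PR #8751 implements it.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class StreamingChoicesStub:
    # Hypothetical stand-in for LiteLLM's StreamingChoices.
    index: int = 0
    delta: dict[str, Any] = field(default_factory=dict)
    finish_reason: Optional[str] = None


@dataclass
class StreamChunkStub:
    # Hypothetical stand-in for the object model_response_creator() returns.
    choices: list[Any] = field(default_factory=list)
    usage: Optional[dict[str, int]] = None


def model_response_creator_stub() -> StreamChunkStub:
    # Mirrors the described behavior: a default choice is filled in.
    chunk = StreamChunkStub()
    chunk.choices = [StreamingChoicesStub(finish_reason=None)]
    return chunk


def build_usage_chunk(usage: dict[str, int]) -> StreamChunkStub:
    response = model_response_creator_stub()
    response.usage = usage  # corresponds to the setattr(response, "usage", ...) call
    # Proposed fix: the usage-only chunk must carry an empty choices list.
    response.choices = []
    return response


chunk = build_usage_chunk({"prompt_tokens": 13, "completion_tokens": 20, "total_tokens": 33})
print(chunk.choices)  # []
```

Clearing choices after the creator runs leaves the default-choice behavior unchanged for ordinary content chunks and only affects the synthetic usage chunk.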

Impact

LangChain.js (@langchain/openai, streamUsage: true) double-counts tokens when backed by LiteLLM.

In _streamResponseChunks, LangChain captures usage from any chunk that has it, but only skips delta emission when choices is empty. When calling OpenAI directly, the final usage chunk has choices: [], so LangChain records usage once and emits a single trailing usage_metadata chunk. With LiteLLM, the usage chunk has choices: [{index: 0, delta: {}}], so LangChain treats it as a content event (yields an empty delta chunk with usage attached) and then emits its own trailing usage chunk. Aggregating the streamed chunks (e.g. with concat()) therefore counts usage twice, roughly doubling the reported prompt, completion, and total token counts.
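The doubling can be reproduced with a simplified Python model of the client logic described above. This is not LangChain's actual TypeScript code, and the function names are hypothetical: usage is captured from any chunk that has it, a delta is emitted for every chunk whose choices is non-empty, and one trailing usage chunk is appended at the end.

```python
from typing import Any, Optional


def simulate_stream_usage(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Simplified model of a client that emits delta chunks plus a trailing usage chunk."""
    emitted: list[dict[str, Any]] = []
    usage: Optional[dict[str, int]] = None
    for chunk in chunks:
        if chunk.get("usage"):
            usage = chunk["usage"]  # usage is captured from any chunk that has it
        if not chunk.get("choices"):
            continue  # only a chunk with empty choices skips delta emission
        emitted.append({"delta": chunk["choices"][0].get("delta", {}),
                        "usage": chunk.get("usage")})
    if usage is not None:
        emitted.append({"delta": {}, "usage": usage})  # trailing usage chunk
    return emitted


def aggregated_total_tokens(emitted: list[dict[str, Any]]) -> int:
    # Sum usage across emitted chunks, as a concat()-style merge would.
    return sum(e["usage"]["total_tokens"] for e in emitted if e["usage"])


USAGE = {"prompt_tokens": 13, "completion_tokens": 20, "total_tokens": 33}
content = {"choices": [{"index": 0, "delta": {"content": "Hello!"}}], "usage": None}
openai_direct = [content, {"choices": [], "usage": USAGE}]
via_litellm = [content, {"choices": [{"index": 0, "delta": {}}], "usage": USAGE}]

print(aggregated_total_tokens(simulate_stream_usage(openai_direct)))  # 33
print(aggregated_total_tokens(simulate_stream_usage(via_litellm)))    # 66
```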

This also affects Dify and other clients that detect the usage-only chunk via len(choices) == 0 (reference).
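Until the proxy emits choices: [], a client that relies on len(choices) == 0 could normalize chunks before its own detection logic. The sketch below is a hypothetical client-side workaround, not something LiteLLM, LangChain, or Dify provides: it rewrites a chunk to choices: [] only when the chunk carries usage and every choice has an empty delta and no finish_reason.

```python
from typing import Any


def normalize_usage_chunk(chunk: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical shim: return a copy with choices=[] for padded usage chunks."""
    choices = chunk.get("choices") or []
    if chunk.get("usage") is None or not choices:
        return chunk  # content chunk, or already spec-compliant
    if all(not c.get("delta") and c.get("finish_reason") is None for c in choices):
        # Only empty placeholder choices: safe to drop them.
        return {**chunk, "choices": []}
    return chunk  # real content or a finish_reason is present; leave untouched


padded = {"choices": [{"index": 0, "delta": {}}], "usage": {"total_tokens": 33}}
print(normalize_usage_chunk(padded)["choices"])  # []
```

The finish_reason guard is a precaution in case a backend attaches usage to the chunk that also signals the end of generation; clearing choices there would drop that signal.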

Relevant log output

Final SSE event before `data: [DONE]` (model `bedrock-claude-sonnet-4-6`, LiteLLM proxy):

data: {"id":"chatcmpl-31aaa6b2-e51f-4b42-88e9-e1be7c084a50","created":1779589160,"model":"bedrock-claude-sonnet-4-6","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}],"usage":{"completion_tokens":20,"prompt_tokens":13,"total_tokens":33,"completion_tokens_details":{"reasoning_tokens":0,"text_tokens":20},"prompt_tokens_details":{"cached_tokens":0,"text_tokens":13,"cache_creation_tokens":0},"cache_creation_input_tokens":0,"cache_read_input_tokens":0}}

choices should be [], not [{"index":0,"delta":{}}].

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.85.0

Twitter / LinkedIn details

No response
