litellm - ✅(Solved) Fix [Bug]: LiteLLM proxy doesn't include "role" for /chat/completions stream=true with Azure OpenAI and stream_options.include_usage=true [5 pull requests, 2 comments, 2 participants]

adam-carruthers · 2026-03-20T16:37:58Z

# PR #24326: fix: ensure role='assistant' in Azure streaming with include_usage

- Repository: BerriAI/litellm
- Author: majiayu000
- State: open | merged: False
- Link: https://github.com/BerriAI/litellm/pull/24326

## Description (problem / solution / changelog)

Fixes #24221

## Relevant issues

Fixes #24221: LiteLLM proxy doesn't include `role` for `/chat/completions` `stream=true` with Azure OpenAI and `stream_options.include_usage=true`

## What this PR does

**Root cause:** In `streaming_handler.py` `chunk_creator()`, when `original_chunk` has no choices (Azure's `prompt_filter_results` chunk) and `include_usage=True`, the code returns `model_response` without calling `strip_role_from_delta()`. This means:

1. The empty-choices chunk has no `role` in its delta
2. `__next__`/`__anext__` then sets `sent_first_chunk=True`
3. When the actual first content chunk arrives with `role='assistant'` from Azure, `strip_role_from_delta()` sees `sent_first_chunk=True` and strips the role

Net result: no chunk ever has `role='assistant'`.

**Fix:** Call `self.strip_role_from_delta(model_response)` before returning `model_response` at line 1559. This is consistent with the other return paths in `chunk_creator` (lines 895 and 989) that already call `strip_role_from_delta`.

## Pre-Submission checklist

- [x] I have Added testing in the [`tests/test_litellm/`](https://github.com/BerriAI/litellm/tree/main/tests/test_litellm) directory, **Adding at least 1 test is a hard requirement** - [see details](https://docs.litellm.ai/docs/extras/contributing_code)
- [x] My PR passes all unit tests on [`make test-unit`](https://docs.litellm.ai/docs/extras/contributing_code)
- [x] My PR's scope is as isolated as possible, it only solves 1 specific problem
- [ ] I have requested a Greptile review by commenting `@greptileai` and received a **Confidence Score of at least 4/5** before requesting a maintainer review

## Type

🐛 Bug Fix

## Changes

- `litellm/litellm_core_utils/streaming_handler.py`: Changed `return model_response` to `return self.strip_role_from_delta(model_response)` in the `include_usage` empty-choices path
- `tests/test_litellm/litellm_core_utils/test_streaming_handler.py`: Added `test_azure_streaming_role_with_include_usage` covering both sync and async iteration with mock Azure chunks

## Changed files

- `litellm/litellm_core_utils/streaming_handler.py` (modified, +8/-6)
- `tests/test_litellm/litellm_core_utils/test_streaming_handler.py` (modified, +42/-0)

---

# PR #24354: fix: preserve role='assistant' in Azure streaming with include_usage

- Repository: BerriAI/litellm
- Author: Chesars
- State: closed | merged: True
- Link: https://github.com/BerriAI/litellm/pull/24354

## Description (problem / solution / changelog)

## Relevant issues

Fixes #24221

## Pre-Submission checklist

- [x] I have Added testing in the [`tests/test_litellm/`](https://github.com/BerriAI/litellm/tree/main/tests/test_litellm) directory, **Adding at least 1 test is a hard requirement** - [see details](https://docs.litellm.ai/docs/extras/contributing_code)
- [x] My PR passes all unit tests on [`make test-unit`](https://docs.litellm.ai/docs/extras/contributing_code)
- [x] My PR's scope is as isolated as possible, it only solves 1 specific problem
- [ ] I have requested a Greptile review by commenting `@greptileai` and received a **Confidence Score of at least 4/5** before requesting a maintainer review

## Type

🐛 Bug Fix

## Changes

When `stream_options.include_usage=True`, Azure sends an initial chunk with `choices=[]` (prompt_filter_results) before the first content chunk. LiteLLM inflated this empty-choices chunk with a default `StreamingChoices`, which:

1. Consumed the `sent_first_chunk` flag
2. Caused `strip_role_from_delta` to strip `role` from the real first chunk
3. The real first chunk with `role='assistant'` and `content=''` was also discarded by `is_chunk_non_empty` as "empty"

Net result: no chunk ever contained `role='assistant'`.

**Fix (4 files):**

- **`streaming_handler.py` - `chunk_creator`**: Set `model_response.choices = []` for chunks without choices, forwarding them faithfully instead of inflating with a default `StreamingChoices`
- **`streaming_handler.py` - `is_chunk_non_empty`**: Treat chunks with `role` in delta as non-empty (the first chunk with `role='assistant'` and `content=''` is a valid OpenAI chunk)
- **`streaming_handler.py` - `__next__`/`__anext__`**: Guard `choices[0]` access and only mark `sent_first_chunk` for chunks with real choices
- **`main.py` + `streaming_chunk_builder_utils.py`**: Guard `choices[0]` access in `stream_chunk_builder` for chunks with `choices=[]`

## Changed files

- `litellm/litellm_core_utils/streaming_chunk_builder_utils.py` (modified, +4/-1)
- `litellm/litellm_core_utils/streaming_handler.py` (modified, +20/-12)
- `litellm/main.py` (modified, +5/
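The failure sequence both PRs describe can be reproduced with a small toy model. This is a simplified sketch under stated assumptions, not LiteLLM's actual `streaming_handler.py` code: chunks are plain dicts, the function names `buggy_stream`, `fixed_stream` and `roles` are hypothetical, and the three behaviours (inflating the `choices=[]` chunk, stripping `role` after the first chunk, dropping content-less deltas) are modelled after the PR #24354 description.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional

Chunk = Dict[str, Any]

# Simplified upstream Azure stream with include_usage=True (shape taken from the issue's raw output).
AZURE_CHUNKS: List[Chunk] = [
    {"choices": [], "prompt_filter_results": [{"prompt_index": 0, "content_filter_results": {}}]},
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"index": 0, "delta": {"content": "Hi"}}]},
    {"choices": [{"index": 0, "delta": {"content": "!"}}]},
]


def buggy_stream(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Toy model of the regression (hypothetical, not LiteLLM code)."""
    sent_first_chunk = False
    for chunk in chunks:
        if not chunk["choices"]:
            # The choices=[] chunk is inflated with a default choice and consumes the flag.
            sent_first_chunk = True
            yield {"choices": [{"index": 0, "delta": {}}]}
            continue
        delta = dict(chunk["choices"][0]["delta"])
        if sent_first_chunk:
            delta.pop("role", None)  # role is only kept on the first chunk
        if not delta.get("content"):
            continue  # a content-less delta is treated as empty and dropped
        sent_first_chunk = True
        yield {"choices": [{"index": 0, "delta": delta}]}


def fixed_stream(chunks: Iterable[Chunk]) -> Iterator[Chunk]:
    """Toy model of the behaviour PR #24354 describes (hypothetical, not LiteLLM code)."""
    sent_first_chunk = False
    for chunk in chunks:
        if not chunk["choices"]:
            yield chunk  # forward choices=[] as-is; the flag stays untouched
            continue
        delta = dict(chunk["choices"][0]["delta"])
        if sent_first_chunk:
            delta.pop("role", None)
        if not delta.get("content") and "role" not in delta:
            continue  # a delta carrying a role now counts as non-empty
        sent_first_chunk = True
        yield {"choices": [{"index": 0, "delta": delta}]}


def roles(stream: Iterable[Chunk]) -> List[Optional[str]]:
    """Collect delta.role from every emitted chunk that has choices."""
    return [c["choices"][0]["delta"].get("role") for c in stream if c["choices"]]


print(roles(buggy_stream(AZURE_CHUNKS)))  # [None, None, None]
print(roles(fixed_stream(AZURE_CHUNKS)))  # ['assistant', None, None]
```

The buggy model emits an empty first delta followed by content-only chunks, which mirrors the proxy output reported in the issue; the fixed model forwards the `choices=[]` chunk untouched so the role-only chunk is still treated as the first one.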


GitHub stats: BerriAI/litellm#24221 (fetched 2026-04-08 01:09:19)

Author: adam-carruthers

Participants: adam-carruthers, majiayu000

Timeline (top): cross-referenced ×3, labeled ×3, referenced ×3, commented ×2

Error Message

Our app stopped working after the upgrade to the latest stable version of LiteLLM. We realised this is because the response from /chat/completions in streaming mode was not returning the role attribute in any of the chunks, and this causes an error with the OpenAI frontend library.


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

I've managed to create a reproducible example of this problem (sent with Python requests rather than curl), and it only happens in a very specific combination of circumstances:

with Azure OpenAI
when stream=true
and when stream_options={include_usage:true}

For comparison, we tried Anthropic models on Bedrock, and the issue did not occur. We also tried without stream_options.include_usage, and the issue did not occur either.

This issue may be related to #20975, but ours involves a different endpoint (although it also concerns streaming responses from Azure OpenAI).

We have verified that this is a recent regression. Here are the versions we tried:

v1.82.0 ❌ broken
v1.81.12 ❌ broken
v1.80.6 ✅ not broken

Steps to Reproduce

Step 1: set up an Azure OpenAI endpoint and add it to your config.yaml

This happens with every Azure OpenAI model we tried; replace the example below with the config of any model you have deployed.

model_list:
  - model_name: "mh-gpt-5"
    litellm_params:
      model: azure/mh-gpt-5
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2025-01-01-preview"
    model_info:
      base_model: "azure/gpt-5"

Step 2: start the LiteLLM proxy

Make sure to set the environment variables for your Azure OpenAI deployment (AZURE_API_BASE and AZURE_API_KEY, as referenced in the config above).

Step 3: send the following request

I sent the request with Python's requests library (add your proxy API key if authentication is enabled):

import os
import requests as r

API_KEY = os.environ.get('LITELLM_API_KEY', '')  # hypothetical env var name; set your proxy key if auth is enabled

resp_litellm = r.post('http://localhost:4003/chat/completions', headers={'Authorization': 'Bearer ' + API_KEY},
                      json={'model': 'mh-gpt-5', 'messages': [{"role": "user", "content": "hiiiii"}],
                            'stream': True, 'stream_options': {'include_usage': True}})

print(resp_litellm.text)

This gave the following output:

data: {"id":"","created":1774022108,"model":"mh-gpt-5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}]}

data: {"id":"chatcmpl-DLWThNVcb9NOgngdgXAbExAFndmu5","created":1774022108,"model":"mh-gpt-5","object":"chat.completion.chunk","choices":[{"content_filter_results":{},"index":0,"delta":{"content":"Hi"}}],"service_tier":"default","obfuscation":"Yi6rJep2r4"}

data: {"id":"chatcmpl-DLWThNVcb9NOgngdgXAbExAFndmu5","created":1774022108,"model":"mh-gpt-5","object":"chat.completion.chunk","choices":[{"content_filter_results":{},"index":0,"delta":{"content":"!"}}],"service_tier":"default","obfuscation":"ajMSXaefOeE"}

# Output truncated: "role" is already missing from the first chunks, where it should appear

For comparison, the same request sent directly to Azure OpenAI returns:

data: {"choices":[],"created":0,"id":"","model":"","object":"","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{}}]}

data: {"choices":[{"content_filter_results":{},"delta":{"content":"","refusal":null,"role":"assistant"},"finish_reason":null,"index":0,"logprobs":null}],"created":1774022292,"id":"chatcmpl-DLWWio3KVrHNjUVbsLqOi2RG8wC2s","model":"gpt-5-2025-08-07","obfuscation":"obsxLdmrAF","object":"chat.completion.chunk","service_tier":"default","system_fingerprint":null,"usage":null}

data: {"choices":[{"content_filter_results":{},"delta":{"content":"Hi"},"finish_reason":null,"index":0,"logprobs":null}],"created":1774022292,"id":"chatcmpl-DLWWio3KVrHNjUVbsLqOi2RG8wC2s","model":"gpt-5-2025-08-07","obfuscation":"QExAMfzZky","object":"chat.completion.chunk","service_tier":"default","system_fingerprint":null,"usage":null}

data: {"choices":[{"content_filter_results":{},"delta":{"content":"!"},"finish_reason":null,"index":0,"logprobs":null}],"created":1774022292,"id":"chatcmpl-DLWWio3KVrHNjUVbsLqOi2RG8wC2s","model":"gpt-5-2025-08-07","obfuscation":"QGHVMCDH225","object":"chat.completion.chunk","service_tier":"default","system_fingerprint":null,"usage":null}

# Content truncated

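As the two outputs show, Azure sends role="assistant" in the first chunk that has choices, but the LiteLLM proxy drops it, so none of the chunks it returns carries the role attribute.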
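To compare the two captures mechanically, a small checker can parse the SSE `data:` lines and report the `role` of the first chunk whose `choices` list is non-empty (where OpenAI-style streams put the role). This is a sketch that assumes each event sits on a single `data: {...}` line, as in the outputs above; the helper name `first_choice_role` and the shortened sample strings are illustrative.

```python
import json
from typing import Optional


def first_choice_role(sse_text: str) -> Optional[str]:
    """Return delta.role from the first SSE chunk whose choices list is non-empty."""
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separator lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        if chunk.get("choices"):
            return chunk["choices"][0].get("delta", {}).get("role")
    return None


# Shortened versions of the proxy and direct-Azure outputs shown above.
proxy_output = '''data: {"id":"","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}]}

data: {"object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hi"}}]}'''

azure_output = '''data: {"choices":[],"prompt_filter_results":[{"prompt_index":0,"content_filter_results":{}}]}

data: {"choices":[{"delta":{"content":"","refusal":null,"role":"assistant"},"index":0}]}'''

print(first_choice_role(proxy_output))  # None
print(first_choice_role(azure_output))  # assistant
```

The Azure capture passes because its `choices=[]` chunk is skipped and the next chunk carries the role; the proxy capture fails because its first chunk with choices has an empty delta.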

Relevant log output

Already included above

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.0

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

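To fix the issue, the proxy must emit role='assistant' in the delta of the first chunk that has choices, even when Azure sends a choices=[] prompt_filter_results chunk first. According to the merged PR #24354, the change belongs in the streaming wrapper (litellm/litellm_core_utils/streaming_handler.py) rather than in the endpoint handler.

Here are the steps:

Forward chunks with choices=[] as-is instead of inflating them with a default choice, so they do not set sent_first_chunk.
Treat a chunk whose delta contains a role as non-empty, so the role-only first chunk (content='') is not discarded when include_usage is True.

Example code change (a simplified, hypothetical post-processing sketch on chunk dicts that enforces the same invariant; it is not LiteLLM's actual handler):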

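# Hypothetical post-processing of streamed chunk dicts (not LiteLLM's actual handler)
from typing import Any, Dict, Iterable, Iterator


def ensure_first_chunk_role(response_chunks: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    role_sent = False
    for chunk in response_chunks:
        choices = chunk.get('choices') or []
        # Skip Azure's choices=[] prompt_filter_results chunk; only the first chunk with choices gets the role
        if choices and not role_sent:
            choices[0].setdefault('delta', {}).setdefault('role', 'assistant')
            role_sent = True
        yield chunk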

Verification

To verify the fix, send the same request as in the reproduction steps (the Python example, not a curl command) with stream=true and stream_options={include_usage:true}. The first chunk that has choices should now carry role='assistant' in its delta.

Example verification code:

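import json
import os
import requests as r

API_KEY = os.environ.get('LITELLM_API_KEY', '')  # hypothetical env var name; set your proxy key if auth is enabled

resp_litellm = r.post('http://localhost:4003/chat/completions', headers={'Authorization': 'Bearer ' + API_KEY},
                      json={'model': 'mh-gpt-5', 'messages': [{"role": "user", "content": "hiiiii"}],
                            'stream': True, 'stream_options': {'include_usage': True}},
                      stream=True)

roles = []
for line in resp_litellm.iter_lines():
    print(line)
    # SSE events arrive as b'data: {...}'; skip blank lines and the final [DONE] marker
    if not line.startswith(b'data: ') or line == b'data: [DONE]':
        continue
    for choice in json.loads(line[len(b'data: '):]).get('choices', []):
        roles.append(choice.get('delta', {}).get('role'))

# A correct stream carries role='assistant' in the first chunk that has choices
print("Role attribute found" if roles and roles[0] == 'assistant' else "Role attribute missing from the first chunk")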

Extra Tips

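Add a regression test for an upstream stream that starts with a choices=[] chunk followed by a role-only chunk; PR #24326, for example, adds test_azure_streaming_role_with_include_usage to tests/test_litellm/litellm_core_utils/test_streaming_handler.py, covering both sync and async iteration.
When debugging similar streaming regressions, log the raw upstream chunks alongside the proxy output so dropped fields are easy to spot.
If you maintain a fork, commit the change and update the version number accordingly.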


Linked pull requests

PR #24326: fix: ensure role='assistant' in Azure streaming with include_usage
PR #24354: fix: preserve role='assistant' in Azure streaming with include_usage
PR #24374: Litellm staging 03 22 2026
PR #25638: fix: preserve role='assistant' in Azure streaming with include_usage
PR #25636: Litellm ishaan april13