litellm - ✅(Solved) Fix [Bug]: LiteLLM proxy doesn't include "role" for /chat/completions stream=true with Azure OpenAI and stream_options.include_usage=true [5 pull requests, 2 comments, 2 participants]

GitHub stats
BerriAI/litellm#24221 · Fetched 2026-04-08 01:09:19
Comments: 2
Participants: 2
Timeline: 12
Reactions: 1
Timeline (top): cross-referenced ×3, labeled ×3, referenced ×3, commented ×2

Error Message

Our app stopped working after the upgrade to the latest stable version of LiteLLM. We realised this is because the response from /chat/completions in streaming mode was not returning the role attribute in any of the chunks, and this causes an error with the OpenAI frontend library.

Root Cause

According to the PR notes below, when stream_options.include_usage=True Azure sends an initial chunk with choices=[] (prompt_filter_results) before the first content chunk. LiteLLM's streaming handler treats that chunk as the first chunk sent, so strip_role_from_delta later strips role='assistant' from the real first content chunk, and no chunk ends up carrying the role.

Fix Action

Fix / Workaround

The linked PRs (#24326, #24354, #25638) change litellm/litellm_core_utils/streaming_handler.py so that a chunk carrying role='assistant' is still emitted in this case. As a workaround, the issue reports that the problem does not occur without stream_options.include_usage, and that v.1.80.6 is not affected.

PR fix notes

PR #24326: fix: ensure role='assistant' in Azure streaming with include_usage

Description (problem / solution / changelog)

Fixes #24221

Relevant issues

Fixes #24221 — LiteLLM proxy doesn't include role for /chat/completions stream=true with Azure OpenAI and stream_options.include_usage=true

What this PR does

Root cause: In streaming_handler.py chunk_creator(), when original_chunk has no choices (Azure's prompt_filter_results chunk) and include_usage=True, the code returns model_response without calling strip_role_from_delta(). This means:

  1. The empty-choices chunk has no role in its delta
  2. __next__/__anext__ then sets sent_first_chunk=True
  3. When the actual first content chunk arrives with role='assistant' from Azure, strip_role_from_delta() sees sent_first_chunk=True and strips the role

Net result: no chunk ever has role='assistant'.

Fix: Call self.strip_role_from_delta(model_response) before returning model_response at line 1559. This is consistent with the other return paths in chunk_creator (lines 895 and 989) that already call strip_role_from_delta.
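The three-step sequence described above can be reproduced with a toy relay. This is a minimal sketch, not LiteLLM's code: relay_buggy, the dict-shaped chunks, and the inline role stripping are stand-ins invented for illustration. It assumes only the behavior the PR describes (the first emitted chunk consumes the sent_first_chunk flag, and later chunks have role removed).

```python
import copy

# Simplified version of the direct Azure stream shown in the issue.
AZURE_STREAM = [
    {"choices": []},  # Azure's prompt_filter_results chunk has no choices
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"index": 0, "delta": {"content": "Hi"}}]},
]


def relay_buggy(upstream_chunks):
    """Hypothetical toy relay that mimics the sequence described in the PR."""
    sent_first_chunk = False
    emitted = []
    for chunk in copy.deepcopy(upstream_chunks):
        # An empty-choices chunk is inflated with a default choice (empty delta).
        choices = chunk["choices"] or [{"index": 0, "delta": {}}]
        delta = choices[0]["delta"]
        if sent_first_chunk:
            # Stand-in for role stripping on every chunk after the first.
            delta.pop("role", None)
        # Whatever is emitted first consumes the flag, even if it has no role.
        sent_first_chunk = True
        emitted.append({"choices": choices})
    return emitted


if __name__ == "__main__":
    for chunk in relay_buggy(AZURE_STREAM):
        print(chunk["choices"][0]["delta"])
```

None of the printed deltas contains role, which matches the proxy output reported in the issue: the roleless first chunk used up the flag before the chunk that actually carried role='assistant' arrived.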

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

  • litellm/litellm_core_utils/streaming_handler.py: Changed return model_response to return self.strip_role_from_delta(model_response) in the include_usage empty-choices path
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py: Added test_azure_streaming_role_with_include_usage covering both sync and async iteration with mock Azure chunks

Changed files

  • litellm/litellm_core_utils/streaming_handler.py (modified, +8/-6)
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +42/-0)

PR #24354: fix: preserve role='assistant' in Azure streaming with include_usage

Description (problem / solution / changelog)

Relevant issues

Fixes #24221

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting @greptileai and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

When stream_options.include_usage=True, Azure sends an initial chunk with choices=[] (prompt_filter_results) before the first content chunk. LiteLLM inflated this empty-choices chunk with a default StreamingChoices, which:

  1. Consumed the sent_first_chunk flag
  2. Caused strip_role_from_delta to strip role from the real first chunk
  3. The real first chunk with role='assistant' and content='' was also discarded by is_chunk_non_empty as "empty"

Net result: no chunk ever contained role='assistant'.

Fix (4 files):

  • streaming_handler.py - chunk_creator: Set model_response.choices = [] for chunks without choices, forwarding them faithfully instead of inflating with a default StreamingChoices
  • streaming_handler.py - is_chunk_non_empty: Treat chunks with role in delta as non-empty (the first chunk with role='assistant' and content='' is a valid OpenAI chunk)
  • streaming_handler.py - __next__/__anext__: Guard choices[0] access and only mark sent_first_chunk for chunks with real choices
  • main.py + streaming_chunk_builder_utils.py: Guard choices[0] access in stream_chunk_builder for chunks with choices=[]
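The first three bullets above can be sketched with a toy relay. This is a hedged illustration under stated assumptions, not the code from the PR: relay_fixed and toy_is_chunk_non_empty are hypothetical names, and chunks are modeled as plain dicts.

```python
import copy

# Simplified version of the direct Azure stream shown in the issue.
AZURE_STREAM = [
    {"choices": []},  # Azure's prompt_filter_results chunk has no choices
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
    {"choices": [{"index": 0, "delta": {"content": "Hi"}}]},
]


def toy_is_chunk_non_empty(delta):
    """Toy emptiness check: a delta that carries a role counts as non-empty."""
    return "role" in delta or bool(delta.get("content"))


def relay_fixed(upstream_chunks):
    """Hypothetical toy relay following the approach the PR describes."""
    sent_first_chunk = False
    emitted = []
    for chunk in copy.deepcopy(upstream_chunks):
        if not chunk["choices"]:
            # Forward the empty-choices chunk as-is and leave the flag alone.
            emitted.append({"choices": []})
            continue
        delta = chunk["choices"][0]["delta"]
        if sent_first_chunk:
            # Stand-in for role stripping on chunks after the first real one.
            delta.pop("role", None)
        if toy_is_chunk_non_empty(delta):
            emitted.append(chunk)
            # Only a chunk with real choices marks the first chunk as sent.
            sent_first_chunk = True
    return emitted


if __name__ == "__main__":
    for chunk in relay_fixed(AZURE_STREAM):
        print(chunk["choices"])
```

In this sketch the empty-choices chunk passes through untouched, and the chunk with role='assistant' and content='' is kept because the role alone makes it non-empty.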

Changed files

  • litellm/litellm_core_utils/streaming_chunk_builder_utils.py (modified, +4/-1)
  • litellm/litellm_core_utils/streaming_handler.py (modified, +20/-12)
  • litellm/main.py (modified, +5/-2)
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +144/-0)

PR #24374: Litellm staging 03 22 2026

Changed files

  • .github/workflows/test-unit-caching-redis.yml (added, +38/-0)
  • docs/my-website/docs/completion/web_search.md (modified, +84/-0)
  • litellm/litellm_core_utils/core_helpers.py (modified, +3/-0)
  • litellm/litellm_core_utils/prompt_templates/factory.py (modified, +40/-0)
  • litellm/litellm_core_utils/streaming_chunk_builder_utils.py (modified, +4/-1)
  • litellm/litellm_core_utils/streaming_handler.py (modified, +20/-12)
  • litellm/llms/anthropic/experimental_pass_through/adapters/streaming_iterator.py (modified, +0/-4)
  • litellm/llms/gemini/cost_calculator.py (modified, +18/-9)
  • litellm/llms/openai/responses/transformation.py (modified, +54/-2)
  • litellm/llms/vertex_ai/gemini/cost_calculator.py (modified, +11/-28)
  • litellm/llms/vertex_ai/gemini/vertex_and_google_ai_studio_gemini.py (modified, +9/-0)
  • litellm/llms/vertex_ai/gemini_embeddings/batch_embed_content_transformation.py (modified, +15/-10)
  • litellm/main.py (modified, +5/-2)
  • litellm/model_prices_and_context_window_backup.json (modified, +329/-48)
  • model_prices_and_context_window.json (modified, +329/-48)
  • tests/litellm/llms/vertex_ai/test_gemini_batch_embeddings.py (modified, +48/-0)
  • tests/litellm_core_utils/test_bedrock_converse_dedup_factory.py (modified, +131/-0)
  • tests/llm_translation/test_gpt4o_audio.py (modified, +3/-1)
  • tests/local_testing/test_completion.py (modified, +7/-1)
  • tests/local_testing/test_streaming.py (modified, +19/-13)
  • tests/test_litellm/litellm_core_utils/test_core_helpers.py (modified, +8/-0)
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +144/-0)
  • tests/test_litellm/llms/anthropic/experimental_pass_through/adapters/test_anthropic_experimental_pass_through_adapters_transformation.py (modified, +151/-0)
  • tests/test_litellm/llms/gemini/test_cost_calculator.py (added, +65/-0)
  • tests/test_litellm/llms/openai/test_gpt5_transformation.py (modified, +128/-0)
  • tests/test_litellm/test_utils.py (modified, +4/-0)

PR #25638: fix: preserve role='assistant' in Azure streaming with include_usage

Description (problem / solution / changelog)

Relevant issues

Fixes #24221

Pre-Submission checklist

  • I have Added testing in the tests/test_litellm/ directory, Adding at least 1 test is a hard requirement - see details
  • My PR passes all unit tests on make test-unit
  • My PR's scope is as isolated as possible, it only solves 1 specific problem
  • I have requested a Greptile review by commenting `@greptileai` and received a Confidence Score of at least 4/5 before requesting a maintainer review

Type

🐛 Bug Fix

Changes

When stream_options.include_usage=True, Azure sends an initial chunk with choices=[] (prompt_filter_results) before the first content chunk. LiteLLM inflated this empty-choices chunk with a default StreamingChoices, which:

  1. Consumed the sent_first_chunk flag
  2. Caused strip_role_from_delta to strip role from the real first chunk
  3. The real first chunk with role='assistant' and content='' was also discarded by is_chunk_non_empty as "empty"

Net result: no chunk ever contained role='assistant'.

Fix (4 files):

  • streaming_handler.py - chunk_creator: Set model_response.choices = [] for chunks without choices, forwarding them faithfully instead of inflating with a default StreamingChoices
  • streaming_handler.py - is_chunk_non_empty: Treat chunks with role in delta as non-empty (the first chunk with role='assistant' and content='' is a valid OpenAI chunk)
  • streaming_handler.py - __next__/__anext__: Guard choices[0] access and only mark sent_first_chunk for chunks with real choices
  • main.py + streaming_chunk_builder_utils.py: Guard choices[0] access in stream_chunk_builder for chunks with choices=[]
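The last bullet, guarding choices[0], matters for any code that rebuilds a full message from streamed chunks once empty-choices chunks are forwarded as-is. A minimal sketch, assuming dict-shaped chunks; combine_stream is a hypothetical helper written for illustration and is not LiteLLM's stream_chunk_builder.

```python
def combine_stream(chunks):
    """Hypothetical helper: fold streamed chunks into one assistant message."""
    role = None
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            # Without this guard, choices[0] would raise IndexError
            # on a chunk with choices=[].
            continue
        delta = choices[0].get("delta", {})
        # Keep the first role seen; later chunks normally omit it.
        role = role or delta.get("role")
        if delta.get("content"):
            parts.append(delta["content"])
    return {"role": role, "content": "".join(parts)}


if __name__ == "__main__":
    stream = [
        {"choices": []},
        {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]},
        {"choices": [{"index": 0, "delta": {"content": "Hi"}}]},
        {"choices": [{"index": 0, "delta": {"content": "!"}}]},
    ]
    print(combine_stream(stream))
```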

Originally filed as #24354 (merged to a staging branch, not main).

Changed files

  • litellm/litellm_core_utils/streaming_chunk_builder_utils.py (modified, +4/-1)
  • litellm/litellm_core_utils/streaming_handler.py (modified, +20/-12)
  • litellm/main.py (modified, +5/-2)
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +144/-0)

PR #25636: Litellm ishaan april13

Changed files

  • docs/my-website/docs/proxy/config_settings.md (modified, +1/-0)
  • docs/my-website/docs/proxy/jwt_key_mapping.md (modified, +10/-6)
  • litellm/constants.py (modified, +5/-0)
  • litellm/litellm_core_utils/streaming_chunk_builder_utils.py (modified, +11/-10)
  • litellm/litellm_core_utils/streaming_handler.py (modified, +31/-18)
  • litellm/llms/anthropic/experimental_pass_through/adapters/transformation.py (modified, +19/-16)
  • litellm/main.py (modified, +3/-2)
  • litellm/proxy/_experimental/mcp_server/mcp_server_manager.py (modified, +6/-0)
  • litellm/proxy/_types.py (modified, +83/-48)
  • litellm/proxy/auth/user_api_key_auth.py (modified, +173/-30)
  • litellm/responses/litellm_completion_transformation/transformation.py (modified, +82/-7)
  • litellm/router_utils/get_retry_from_policy.py (modified, +11/-0)
  • tests/litellm/router_utils/test_get_retry_from_policy.py (added, +39/-0)
  • tests/llm_translation/test_gpt4o_audio.py (modified, +3/-1)
  • tests/proxy_unit_tests/test_jwt_key_mapping.py (modified, +418/-0)
  • tests/router_unit_tests/test_router_helper_utils.py (modified, +1/-0)
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +144/-0)
  • tests/test_litellm/proxy/_experimental/mcp_server/test_mcp_server_manager.py (modified, +33/-0)
  • tests/test_litellm/proxy/spend_tracking/test_spend_management_endpoints.py (modified, +29/-23)
  • tests/test_litellm/responses/litellm_completion_transformation/test_litellm_completion_responses.py (modified, +361/-0)
  • uv.lock (modified, +2/-2)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Our app stopped working after the upgrade to the latest stable version of LiteLLM. We realised this is because the response from /chat/completions in streaming mode was not returning the role attribute in any of the chunks, and this causes an error with the OpenAI frontend library.

I've managed to create a reproducible curl example of this problem, and it only happens in very specific circumstances:

  • with Azure OpenAI
  • when stream=true
  • and when stream_options={include_usage:true}

Specifically, we tried Anthropic models on Bedrock - the issue wasn't there. We also tried without stream_options.include_usage and the issue wasn't there either.

This issue could easily be related to #20975 but our issue relates to a different endpoint (although it does also involve streaming responses on Azure OpenAI)

We have verified that this is a recent regression, here are the versions we tried:

  • v.1.82.0 ❌ broken
  • v.1.81.12 ❌ broken
  • v.1.80.6 ✅ not broken

Steps to Reproduce

step 1 - have an Azure OpenAI endpoint set up and put it in your config.yaml

This happens with all of the Azure OpenAI models we tried; replace the config below with that of any model you have deployed.

model_list:
  - model_name: "mh-gpt-5"
    litellm_params:
      model: azure/mh-gpt-5
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2025-01-01-preview"
    model_info:
      base_model: "azure/gpt-5"

step 2 - start LiteLLM proxy

Make sure to set the right environment variables for your Azure OpenAI deployment

step 3 - do the following web request

I did the web request in Python

(make sure to add your API key if needed)

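import os
import requests as r

# Assumption: the key comes from an environment variable; the name LITELLM_API_KEY is illustrative
API_KEY = os.environ.get('LITELLM_API_KEY', '')

resp_litellm = r.post('http://localhost:4003/chat/completions', headers={'Authorization': 'Bearer '+API_KEY}, json={'model': 'mh-gpt-5', 'messages': [{"role": "user", "content": "hiiiii"}], 'stream': True, 'stream_options': {'include_usage': True}})

# Print the raw server-sent events returned by the proxy
print(resp_litellm.text)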

This gave the following output

data: {"id":"","created":1774022108,"model":"mh-gpt-5","object":"chat.completion.chunk","choices":[{"index":0,"delta":{}}]}

data: {"id":"chatcmpl-DLWThNVcb9NOgngdgXAbExAFndmu5","created":1774022108,"model":"mh-gpt-5","object":"chat.completion.chunk","choices":[{"content_filter_results":{},"index":0,"delta":{"content":"Hi"}}],"service_tier":"default","obfuscation":"Yi6rJep2r4"}

data: {"id":"chatcmpl-DLWThNVcb9NOgngdgXAbExAFndmu5","created":1774022108,"model":"mh-gpt-5","object":"chat.completion.chunk","choices":[{"content_filter_results":{},"index":0,"delta":{"content":"!"}}],"service_tier":"default","obfuscation":"ajMSXaefOeE"}

# Output truncated as you can already see that "role" is not included in the first chunks as it should be

For the record, when I make the request direct to Azure OpenAI, I get

data: {"choices":[],"created":0,"id":"","model":"","object":"","prompt_filter_results":[{"prompt_index":0,"content_filter_results":{}}]}

data: {"choices":[{"content_filter_results":{},"delta":{"content":"","refusal":null,"role":"assistant"},"finish_reason":null,"index":0,"logprobs":null}],"created":1774022292,"id":"chatcmpl-DLWWio3KVrHNjUVbsLqOi2RG8wC2s","model":"gpt-5-2025-08-07","obfuscation":"obsxLdmrAF","object":"chat.completion.chunk","service_tier":"default","system_fingerprint":null,"usage":null}

data: {"choices":[{"content_filter_results":{},"delta":{"content":"Hi"},"finish_reason":null,"index":0,"logprobs":null}],"created":1774022292,"id":"chatcmpl-DLWWio3KVrHNjUVbsLqOi2RG8wC2s","model":"gpt-5-2025-08-07","obfuscation":"QExAMfzZky","object":"chat.completion.chunk","service_tier":"default","system_fingerprint":null,"usage":null}

data: {"choices":[{"content_filter_results":{},"delta":{"content":"!"},"finish_reason":null,"index":0,"logprobs":null}],"created":1774022292,"id":"chatcmpl-DLWWio3KVrHNjUVbsLqOi2RG8wC2s","model":"gpt-5-2025-08-07","obfuscation":"QGHVMCDH225","object":"chat.completion.chunk","service_tier":"default","system_fingerprint":null,"usage":null}

# Content truncated

As you can see, LiteLLM proxy is dropping the essential role parameter.
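The difference between the two outputs can also be checked programmatically instead of by eye. The sketch below parses a server-sent-events body and collects every role found in a delta. roles_in_sse is a hypothetical helper, and the two sample strings are shortened versions of the outputs above, kept only to the fields that matter here.

```python
import json


def roles_in_sse(sse_text):
    """Return the role values found in the deltas of an SSE response body."""
    roles = []
    for line in sse_text.splitlines():
        if not line.startswith("data: "):
            continue  # skip blank separators and non-data lines
        payload = line[len("data: "):]
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # e.g. a terminating "[DONE]" marker or a truncated line
        for choice in event.get("choices", []):
            role = choice.get("delta", {}).get("role")
            if role is not None:
                roles.append(role)
    return roles


# Shortened from the LiteLLM proxy output in the issue.
PROXY_SAMPLE = (
    'data: {"choices":[{"index":0,"delta":{}}]}\n\n'
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"}}]}\n\n'
)

# Shortened from the direct Azure OpenAI output in the issue.
AZURE_SAMPLE = (
    'data: {"choices":[],"prompt_filter_results":[]}\n\n'
    'data: {"choices":[{"index":0,"delta":{"content":"","role":"assistant"}}]}\n\n'
    'data: {"choices":[{"index":0,"delta":{"content":"Hi"}}]}\n\n'
)

if __name__ == "__main__":
    print("proxy roles:", roles_in_sse(PROXY_SAMPLE))
    print("azure roles:", roles_in_sse(AZURE_SAMPLE))
```

Parsing the JSON rather than searching for the substring "role" avoids false positives when the model's own text happens to contain that word.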

Relevant log output

Already included above

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v.1.82.0

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

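According to the PR notes, the role attribute is lost inside LiteLLM's streaming handler (litellm/litellm_core_utils/streaming_handler.py), not in the /chat/completions endpoint itself, so the fix belongs there. Forcing role onto every chunk would be wrong: in the direct Azure output above, role appears only on the first content chunk, and Azure's initial chunk has no choices to attach it to.

Here are the steps:

  • Stop the empty-choices chunk (Azure's prompt_filter_results chunk) from being counted as the first chunk sent.
  • Make sure the first chunk with real choices keeps role='assistant', even when its content is ''.

Example code changes (an illustrative sketch only, not LiteLLM's actual code; the function name is hypothetical):

# Hypothetical helper: set role only on the first chunk that has real choices
def ensure_role_on_first_chunk(response_chunks):
    role_sent = False
    for chunk in response_chunks:
        choices = chunk.get('choices') or []
        # Leave empty-choices chunks (Azure's prompt_filter_results) untouched
        if choices and not role_sent:
            choices[0].setdefault('delta', {})['role'] = 'assistant'
            role_sent = True
        yield chunk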

Verification

To verify that the fix worked, rerun the Python request from the issue against the chat/completions endpoint with stream=true and stream_options={include_usage:true}. The first content chunk should now include role='assistant' in its delta; later chunks are not expected to repeat it.

Example verification code:

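import os
import requests as r

# Assumption: the key comes from an environment variable; the name LITELLM_API_KEY is illustrative
API_KEY = os.environ.get('LITELLM_API_KEY', '')

# stream=True lets requests hand over the lines as they arrive
resp_litellm = r.post('http://localhost:4003/chat/completions', headers={'Authorization': 'Bearer '+API_KEY}, json={'model': 'mh-gpt-5', 'messages': [{"role": "user", "content": "hiiiii"}], 'stream': True, 'stream_options': {'include_usage': True}}, stream=True)

role_found = False
for line in resp_litellm.iter_lines():
    if not line:
        continue  # skip the blank separators between server-sent events
    print(line)
    # Only one chunk is expected to carry the role attribute
    if b'"role"' in line:
        role_found = True

print("Role attribute found" if role_found else "Role attribute not found in any chunk")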

Extra Tips

  • Make sure to test the fix thoroughly to ensure that it does not introduce any new issues.
  • Consider adding additional logging or debugging statements to help diagnose any future issues.
  • If you are using a version control system, make sure to commit the changes and update the version number accordingly.
