- **`stream=false`:** LiteLLM returns `reasoning_content` on the message (already works). - **`stream=true`:** LiteLLM should also return `reasoning_content` in the streamed response (e.g. in `choices[0].delta.reasoning_content` or equivalent in each chunk, or aggregated in the final message), since the Watsonx endpoint already provides it in both modes.

litellm - ✅(Solved) Fix [Bug]: Watsonx returns reasoning_content for both stream=true and stream=false, but LiteLLM only returns it when stream=false [2 pull requests, 1 comments, 2 participants]

UmeshYakkundi · 2026-03-09T12:30:59Z

[litellm] The Watsonx API returns reasoning content for both stream=true and stream=false . LiteLLM correctly returns it when stream=false , but does not retur… The **Watsonx** API returns **`reasoning_content`** for both **`stream=true`** and **`stream=false`**. LiteLLM correctly returns it when **`stream=false`**, but **does not** return it when **`stream=true`**. So the bug is in LiteLLM’s streaming path for Watsonx: reasoning_content is not being forwarded or mapped into the streamed response even though the upstream Watsonx endpoint sends it. # PR #24002: Fix dropped reasoning_content in stream responses for WatsonX and OpenAILike providers - Repository: BerriAI/litellm - Author: lido-alexion - State: closed | merged: False - Link: https://github.com/BerriAI/litellm/pull/24002 ## Description (problem / solution / changelog) **Problem** The generic ModelResponseIterator.chunk_parser() in litellm/llms/databricks/streaming_utils.py extracts text, tool_calls, usage, and finish_reason from streaming chunks — but never extracts reasoning_content. This causes reasoning_content to be silently dropped for all OpenAI-like providers that use the generic iterator (Watsonx, Cerebras, etc.). Non-streaming works correctly because there's explicit extraction at the response level. Fixes https://github.com/BerriAI/litellm/issues/23148 Changes litellm/types/utils.py — Added reasoning_content: Optional[str] to the GenericStreamingChunk TypedDict litellm/llms/databricks/streaming_utils.py — Extract reasoning_content from processed_chunk.choices[0].delta.reasoning_content in chunk_parser() and include it in the returned GenericStreamingChunk litellm/litellm_core_utils/streaming_handler.py — Two changes: In chunk_creator(): propagate reasoning_content from the GenericStreamingChunk to completion_obj so it flows through to the Delta In convert_generic_chunk_to_model_response_stream(): include reasoning_content in the Delta when present Testing Added 3 unit tests in tests/test_litellm/llms/databricks/test_streaming_utils.py: test_chunk_parser_extracts_reasoning_content — verifies extraction when present test_chunk_parser_reasoning_content_none_when_absent — verifies None when absent test_chunk_parser_both_content_and_reasoning — verifies both fields coexist All existing databricks tests continue to pass. ## Changed files - `litellm/litellm_core_utils/streaming_handler.py` (modified, +25/-5) - `litellm/llms/databricks/streaming_utils.py` (modified, +8/-11) - `litellm/types/utils.py` (modified, +1/-0) - `tests/test_litellm/litellm_core_utils/test_streaming_handler.py` (modified, +44/-0) - `tests/test_litellm/llms/databricks/test_streaming_utils.py` (added, +105/-0) --- # PR #24010: Handle reasoning_content in streaming in generic ModelResponseIterator - Repository: BerriAI/litellm - Author: lido-alexion - State: open | merged: False - Link: https://github.com/BerriAI/litellm/pull/24010 ## Description (problem / solution / changelog) **Problem** The generic `ModelResponseIterator.chunk_parser()` in [litellm/llms/databricks/streaming_utils.py](file:///Users/nitid/projects/litellm/litellm/litellm/llms/databricks/streaming_utils.py) extracts text, tool_calls, usage, and finish_reason from streaming chunks — but never extracts [reasoning_content](file:///Users/nitid/projects/litellm/litellm/tests/local_testing/test_streaming.py#3824-3857). This causes [reasoning_content](file:///Users/nitid/projects/litellm/litellm/tests/local_testing/test_streaming.py#3824-3857) to be silently dropped for all OpenAI-like providers that use the generic iterator (Watsonx, Cerebras, etc.). Non-streaming works correctly because there's explicit extraction at the response level. Fixes https://github.com/BerriAI/litellm/issues/23148 **Changes** - [litellm/types/utils.py](file:///Users/nitid/projects/litellm/litellm/litellm/types/utils.py) — Added `reasoning_content: Optional[str]` to the [GenericStreamingChunk](file:///Users/nitid/projects/litellm/litellm/litellm/types/utils.py#267-278) TypedDict - [litellm/llms/databricks/streaming_utils.py](file:///Users/nitid/projects/litellm/litellm/litellm/llms/databricks/streaming_utils.py) — Extract [reasoning_content](file:///Users/nitid/projects/litellm/litellm/tests/local_testing/test_streaming.py#3824-3857) from `processed_chunk.choices[0].delta.reasoning_content` in [chunk_parser()](file:///Users/nitid/projects/litellm/litellm/litellm/llms/databricks/streaming_utils.py#18-79) and include it in the returned [GenericStreamingChunk](file:///Users/nitid/projects/litellm/litellm/litellm/types/utils.py#267-278) - [litellm/litellm_core_utils/streaming_handler.py](file:///Users/nitid/projects/litellm/litellm/litellm/litellm_core_utils/streaming_handler.py) — Propagate [reasoning_content](file:///Users/nitid/projects/litellm/litellm/tests/local_testing/test_streaming.py#3824-3857) through the streaming logic ([is_chunk_non_empty](file:///Users/nitid/projects/litellm/litellm/litellm/litellm_core_utils/streaming_handler

litellm2026-03-09 12:30:59

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#23148•Fetched 2026-04-08 00:38:25

View on GitHub

Comments

Participants

Timeline

Reactions

Author

UmeshYakkundi

Participants

lido-alexion

UmeshYakkundi

Timeline (top)

labeled ×3cross-referenced ×2commented ×1

The Watsonx API returns reasoning_content for both stream=true and stream=false. LiteLLM correctly returns it when stream=false, but does not return it when stream=true. So the bug is in LiteLLM’s streaming path for Watsonx: reasoning_content is not being forwarded or mapped into the streamed response even though the upstream Watsonx endpoint sends it.

Root Cause

Root cause: In the Watsonx provider’s streaming handling, is reasoning_content from the Watsonx stream ever read and written into the LiteLLM stream chunks (e.g. delta.reasoning_content)? If not, that would explain the bug.
Can the Watsonx streaming response path be updated to pass through (or map) reasoning_content so that it appears in the streamed response, consistent with stream=false?
If there is an existing helper or pattern used for other providers (e.g. Bedrock) for streaming reasoning_content, the same approach could be applied to Watsonx.

Fix Action

Fixed

Fixed by PR: Fix dropped reasoning_content in stream responses for WatsonX and OpenAILike providers (https://github.com/BerriAI/litellm/pull/24002)
Fixed by PR: Handle reasoning_content in streaming in generic ModelResponseIterator (https://github.com/BerriAI/litellm/pull/24010)

PR fix notes

PR #24002: Fix dropped reasoning_content in stream responses for WatsonX and OpenAILike providers

Repository: BerriAI/litellm
Author: lido-alexion
State: closed | merged: False
Link: https://github.com/BerriAI/litellm/pull/24002

Description (problem / solution / changelog)

Problem The generic ModelResponseIterator.chunk_parser() in litellm/llms/databricks/streaming_utils.py extracts text, tool_calls, usage, and finish_reason from streaming chunks — but never extracts reasoning_content. This causes reasoning_content to be silently dropped for all OpenAI-like providers that use the generic iterator (Watsonx, Cerebras, etc.).

Non-streaming works correctly because there's explicit extraction at the response level.

Fixes https://github.com/BerriAI/litellm/issues/23148

Changes litellm/types/utils.py — Added reasoning_content: Optional[str] to the GenericStreamingChunk TypedDict litellm/llms/databricks/streaming_utils.py — Extract reasoning_content from processed_chunk.choices[0].delta.reasoning_content in chunk_parser() and include it in the returned GenericStreamingChunk litellm/litellm_core_utils/streaming_handler.py — Two changes: In chunk_creator(): propagate reasoning_content from the GenericStreamingChunk to completion_obj so it flows through to the Delta In convert_generic_chunk_to_model_response_stream(): include reasoning_content in the Delta when present

Testing Added 3 unit tests in tests/test_litellm/llms/databricks/test_streaming_utils.py:

test_chunk_parser_extracts_reasoning_content — verifies extraction when present test_chunk_parser_reasoning_content_none_when_absent — verifies None when absent test_chunk_parser_both_content_and_reasoning — verifies both fields coexist All existing databricks tests continue to pass.

Changed files

litellm/litellm_core_utils/streaming_handler.py (modified, +25/-5)
litellm/llms/databricks/streaming_utils.py (modified, +8/-11)
litellm/types/utils.py (modified, +1/-0)
tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +44/-0)
tests/test_litellm/llms/databricks/test_streaming_utils.py (added, +105/-0)

PR #24010: Handle reasoning_content in streaming in generic ModelResponseIterator

Repository: BerriAI/litellm
Author: lido-alexion
State: open | merged: False
Link: https://github.com/BerriAI/litellm/pull/24010

Description (problem / solution / changelog)

Problem The generic ModelResponseIterator.chunk_parser() in litellm/llms/databricks/streaming_utils.py extracts text, tool_calls, usage, and finish_reason from streaming chunks — but never extracts reasoning_content. This causes reasoning_content to be silently dropped for all OpenAI-like providers that use the generic iterator (Watsonx, Cerebras, etc.).

Non-streaming works correctly because there's explicit extraction at the response level.

Fixes https://github.com/BerriAI/litellm/issues/23148

Changes

litellm/types/utils.py — Added reasoning_content: Optional[str] to the GenericStreamingChunk TypedDict
litellm/llms/databricks/streaming_utils.py — Extract reasoning_content from processed_chunk.choices[0].delta.reasoning_content in chunk_parser() and include it in the returned GenericStreamingChunk
litellm/litellm_core_utils/streaming_handler.py — Propagate reasoning_content through the streaming logic (is_chunk_non_empty, chunk_creator, and convert_generic_chunk_to_model_response_stream) to ensure it flows through to the final Delta.

Testing Added 3 unit tests in tests/test_litellm/llms/databricks/test_streaming_utils.py:

test_chunk_parser_extracts_reasoning_content — verifies extraction when present
test_chunk_parser_reasoning_content_none_when_absent — verifies None when absent
test_chunk_parser_both_content_and_reasoning — verifies both fields coexist Updated tests/test_litellm/litellm_core_utils/test_streaming_handler.py to verify that chunks containing only reasoning_content are correctly identified as non-empty.

Changed files

litellm/litellm_core_utils/streaming_handler.py (modified, +26/-5)
litellm/llms/databricks/streaming_utils.py (modified, +5/-0)
litellm/types/utils.py (modified, +1/-0)
tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +44/-0)
tests/test_litellm/llms/databricks/test_streaming_utils.py (added, +105/-0)

Code Example

model_list:
  - model_name: openai/gpt-oss-120b_si-storage-insights
    litellm_params:
      model: watsonx/openai/gpt-oss-120b
      api_base: os.environ/SI_WATSONX_URL
      api_key: os.environ/SI_WATSONX_APIKEY_1
      project_id: os.environ/SI_WATSONX_PROJECT_ID_1
      supports_function_calling: true

---

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Summary

Description

We use LiteLLM proxy in front of IBM Watsonx with the gpt-oss-120b model (watsonx/openai/gpt-oss-120b). We have confirmed:

Watsonx endpoint: Returns reasoning_content in both streaming and non-streaming responses.
LiteLLM with stream=false: Returns reasoning_content correctly (e.g. in response.choices[0].message.reasoning_content).
LiteLLM with stream=true: Does not return reasoning_content in the streamed chunks (e.g. nothing in choices[0].delta.reasoning_content or equivalent), even though the Watsonx API is sending it.

So the issue is not with the Watsonx API; it is that LiteLLM does not pass through or map reasoning_content when handling Watsonx streaming responses. We’d like this fixed so that streaming responses include reasoning_content in the same way non-streaming does.

We use the proxy with the OpenAI-compatible /v1/chat/completions endpoint; the same behavior applies when using the LiteLLM Python SDK with stream=True and this model.

Our config (snippet from proxy_config.yaml):

model_list:
  - model_name: openai/gpt-oss-120b_si-storage-insights
    litellm_params:
      model: watsonx/openai/gpt-oss-120b
      api_base: os.environ/SI_WATSONX_URL
      api_key: os.environ/SI_WATSONX_APIKEY_1
      project_id: os.environ/SI_WATSONX_PROJECT_ID_1
      supports_function_calling: true

(We use multiple model entries with different credentials; all use watsonx/openai/gpt-oss-120b and exhibit the same streaming behavior.)

Expected behavior

stream=false: LiteLLM returns reasoning_content on the message (already works).
stream=true: LiteLLM should also return reasoning_content in the streamed response (e.g. in choices[0].delta.reasoning_content or equivalent in each chunk, or aggregated in the final message), since the Watsonx endpoint already provides it in both modes.

Actual behavior

stream=false: reasoning_content is present in the response (correct).
stream=true: reasoning_content is missing from the streamed chunks and from the aggregated message. The Watsonx API does send it; LiteLLM is not exposing it in streaming mode.

Steps to Reproduce

Configure LiteLLM (proxy or SDK) with Watsonx and gpt-oss-120b, e.g.:
- Proxy: In proxy_config.yaml, add a model entry with litellm_params.model: watsonx/openai/gpt-oss-120b and the required api_base, api_key, project_id (or env vars).
- SDK: Set model="watsonx/openai/gpt-oss-120b" with valid Watsonx credentials.
Non-streaming (baseline): Send a chat completion with stream=false. Check response.choices[0].message.reasoning_content — it is present (Watsonx sends it, LiteLLM returns it).
Streaming (bug): Send the same request with stream=true:
- Proxy: POST /v1/chat/completions with "stream": true, plus model and messages.
- SDK: completion(..., stream=True).
Consume the stream and inspect each chunk (e.g. choices[0].delta) and any final aggregated message.
Observe: reasoning_content is not present in the streamed deltas or in the aggregated message, even though the Watsonx endpoint returns it when streaming. So the bug is in LiteLLM’s handling of Watsonx streaming responses (reasoning_content is not being forwarded/mapped).

Questions for maintainers

Root cause: In the Watsonx provider’s streaming handling, is reasoning_content from the Watsonx stream ever read and written into the LiteLLM stream chunks (e.g. delta.reasoning_content)? If not, that would explain the bug.
Can the Watsonx streaming response path be updated to pass through (or map) reasoning_content so that it appears in the streamed response, consistent with stream=false?
If there is an existing helper or pattern used for other providers (e.g. Bedrock) for streaming reasoning_content, the same approach could be applied to Watsonx.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.9, v1.82.0

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

To fix the issue of reasoning_content not being returned in the streamed response when using LiteLLM with Watsonx and stream=true, follow these steps:

Update the Watsonx provider in LiteLLM to handle reasoning_content in the streaming response path.
Modify the streaming handling code to read reasoning_content from the Watsonx stream and write it into the LiteLLM stream chunks.

Example code changes:

# In the Watsonx provider's streaming handling code
def handle_streaming_response(self, response):
    # ...
    for chunk in response:
        # Read reasoning_content from the Watsonx stream
        reasoning_content = chunk.get('reasoning_content')
        
        # Write reasoning_content into the LiteLLM stream chunk
        if reasoning_content:
            chunk['delta']['reasoning_content'] = reasoning_content
        # ...

Ensure that the updated code is properly tested to verify that reasoning_content is correctly passed through in the streamed response.

Verification

To verify that the fix worked:

Send a chat completion request with stream=true using the updated LiteLLM version.
Consume the stream and inspect each chunk (e.g. choices[0].delta) and any final aggregated message.
Verify that reasoning_content is present in the streamed deltas and in the aggregated message.

Extra Tips

Review the LiteLLM documentation and code to ensure that the fix is consistent with the existing architecture and design patterns.
Consider adding tests to cover the updated streaming handling code to prevent regressions in the future.
If similar issues are encountered with other providers, apply the same fix to ensure consistency across the platform.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

stream=false: LiteLLM returns reasoning_content on the message (already works).
stream=true: LiteLLM should also return reasoning_content in the streamed response (e.g. in choices[0].delta.reasoning_content or equivalent in each chunk, or aggregated in the final message), since the Watsonx endpoint already provides it in both modes.

#api #ssr #installation #tensor shape #autograd error #vector store #embedding generation #cache error #pipeline error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

litellm - ✅(Solved) Fix [Bug]: Watsonx returns reasoning_content for both stream=true and stream=false, but LiteLLM only returns it when stream=false [2 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

PR fix notes

PR #24002: Fix dropped reasoning_content in stream responses for WatsonX and OpenAILike providers

Description (problem / solution / changelog)

Changed files

PR #24010: Handle reasoning_content in streaming in generic ModelResponseIterator

Description (problem / solution / changelog)

Changed files

Code Example

Check for existing issues

What happened?

Summary

Description

Expected behavior

Actual behavior

Steps to Reproduce

Questions for maintainers

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

extent analysis

Fix Plan

Verification

Extra Tips

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING