litellm - ✅(Solved) Fix [Bug]: Watsonx returns reasoning_content for both stream=true and stream=false, but LiteLLM only returns it when stream=false [2 pull requests, 1 comments, 2 participants]

GitHub stats: BerriAI/litellm#23148 (fetched 2026-04-08 00:38:25) · Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0 · Top timeline events: labeled ×3, cross-referenced ×2, commented ×1

The Watsonx API returns reasoning_content for both stream=true and stream=false. LiteLLM correctly returns it when stream=false, but does not return it when stream=true. So the bug is in LiteLLM’s streaming path for Watsonx: reasoning_content is not being forwarded or mapped into the streamed response even though the upstream Watsonx endpoint sends it.

Root Cause

The generic ModelResponseIterator.chunk_parser() in litellm/llms/databricks/streaming_utils.py, which the Watsonx streaming path relies on, extracts text, tool_calls, usage, and finish_reason from each streaming chunk but never reads reasoning_content. The field is therefore silently dropped for Watsonx and the other OpenAI-like providers that use this iterator (Cerebras, etc.). Non-streaming responses are unaffected because reasoning_content is extracted explicitly at the response level.

Fix Action

Fixed

PR fix notes

PR #24002: Fix dropped reasoning_content in stream responses for WatsonX and OpenAILike providers

Description (problem / solution / changelog)

Problem: The generic ModelResponseIterator.chunk_parser() in litellm/llms/databricks/streaming_utils.py extracts text, tool_calls, usage, and finish_reason from streaming chunks, but never extracts reasoning_content. This causes reasoning_content to be silently dropped for all OpenAI-like providers that use the generic iterator (Watsonx, Cerebras, etc.).

Non-streaming works correctly because there's explicit extraction at the response level.
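A stripped-down parser makes the gap concrete. This is a minimal sketch, not LiteLLM's chunk_parser(): `parse_chunk` and `ParsedChunk` are hypothetical names, and it assumes each chunk arrives as an already-decoded OpenAI-style dict with reasoning_content under choices[0].delta, which is where the PR reads it from. Deleting the reasoning_content line reproduces the reported behavior: the field is never read, so it never reaches the caller.

```python
from typing import Any, Dict, List, Optional, TypedDict


class ParsedChunk(TypedDict):
    """Hypothetical stand-in for GenericStreamingChunk (only the fields discussed here)."""
    text: str
    tool_calls: Optional[List[Dict[str, Any]]]
    finish_reason: Optional[str]
    usage: Optional[Dict[str, Any]]
    reasoning_content: Optional[str]


def parse_chunk(chunk: Dict[str, Any]) -> ParsedChunk:
    """Map one decoded OpenAI-style streaming chunk to a flat dict."""
    choices = chunk.get("choices") or []
    choice = choices[0] if choices else {}
    delta = choice.get("delta") or {}
    return ParsedChunk(
        text=delta.get("content") or "",
        tool_calls=delta.get("tool_calls"),
        finish_reason=choice.get("finish_reason"),
        usage=chunk.get("usage"),
        # The field the fix adds; without this line reasoning_content is dropped.
        reasoning_content=delta.get("reasoning_content"),
    )
```

For example, a reasoning-only chunk such as {"choices": [{"delta": {"reasoning_content": "Think first."}}]} parses to text "" and reasoning_content "Think first.".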

Fixes https://github.com/BerriAI/litellm/issues/23148

Changes

  • litellm/types/utils.py: Added reasoning_content: Optional[str] to the GenericStreamingChunk TypedDict.
  • litellm/llms/databricks/streaming_utils.py: Extract reasoning_content from processed_chunk.choices[0].delta.reasoning_content in chunk_parser() and include it in the returned GenericStreamingChunk.
  • litellm/litellm_core_utils/streaming_handler.py: Two changes. In chunk_creator(), propagate reasoning_content from the GenericStreamingChunk to completion_obj so it flows through to the Delta. In convert_generic_chunk_to_model_response_stream(), include reasoning_content in the Delta when present.
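The downstream half of the change (carrying the new field from the generic chunk into the Delta) can be sketched with plain dicts. `GenericChunkSketch`, `to_delta`, and `to_stream_response` are hypothetical stand-ins for GenericStreamingChunk, chunk_creator(), and convert_generic_chunk_to_model_response_stream(); they only mirror the stated rule that reasoning_content is added to the Delta when present.

```python
from typing import Any, Dict, Optional, TypedDict


class GenericChunkSketch(TypedDict, total=False):
    """Hypothetical subset of GenericStreamingChunk, including the new field."""
    text: str
    finish_reason: Optional[str]
    reasoning_content: Optional[str]


def to_delta(chunk: GenericChunkSketch) -> Dict[str, Any]:
    """Build an OpenAI-style delta, adding reasoning_content only when it is set."""
    delta: Dict[str, Any] = {"content": chunk.get("text") or None}
    reasoning = chunk.get("reasoning_content")
    if reasoning is not None:
        delta["reasoning_content"] = reasoning
    return delta


def to_stream_response(chunk: GenericChunkSketch, index: int = 0) -> Dict[str, Any]:
    """Wrap the delta in a chat.completion.chunk-shaped dict."""
    return {
        "object": "chat.completion.chunk",
        "choices": [
            {"index": index, "delta": to_delta(chunk), "finish_reason": chunk.get("finish_reason")}
        ],
    }
```

Adding the key only when it is set leaves deltas unchanged for providers that never send reasoning, so existing consumers see the same shape as before.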

Testing: Added 3 unit tests in tests/test_litellm/llms/databricks/test_streaming_utils.py:

  • test_chunk_parser_extracts_reasoning_content: verifies extraction when present
  • test_chunk_parser_reasoning_content_none_when_absent: verifies None when absent
  • test_chunk_parser_both_content_and_reasoning: verifies both fields coexist

All existing databricks tests continue to pass.
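The three test names suggest tests along these lines. This is a guess at their shape, not the PR's test code: `extract_delta_fields` and `make_chunk` are hypothetical helpers, and the parser is a minimal stand-in so the tests run without LiteLLM installed.

```python
from typing import Any, Dict, Optional


def extract_delta_fields(chunk: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Minimal stand-in for chunk_parser(): read text and reasoning_content from choices[0].delta."""
    delta = (chunk.get("choices") or [{}])[0].get("delta") or {}
    return {"text": delta.get("content") or "", "reasoning_content": delta.get("reasoning_content")}


def make_chunk(**delta: Any) -> Dict[str, Any]:
    """Build an OpenAI-style streaming chunk whose delta holds the given keys."""
    return {"choices": [{"index": 0, "delta": delta, "finish_reason": None}]}


def test_chunk_parser_extracts_reasoning_content() -> None:
    parsed = extract_delta_fields(make_chunk(reasoning_content="Let me think."))
    assert parsed["reasoning_content"] == "Let me think."


def test_chunk_parser_reasoning_content_none_when_absent() -> None:
    parsed = extract_delta_fields(make_chunk(content="Hi"))
    assert parsed["reasoning_content"] is None


def test_chunk_parser_both_content_and_reasoning() -> None:
    parsed = extract_delta_fields(make_chunk(content="Answer", reasoning_content="Because"))
    assert parsed["text"] == "Answer"
    assert parsed["reasoning_content"] == "Because"


if __name__ == "__main__":
    test_chunk_parser_extracts_reasoning_content()
    test_chunk_parser_reasoning_content_none_when_absent()
    test_chunk_parser_both_content_and_reasoning()
    print("all three tests passed")
```

Under pytest the `__main__` block is ignored and the `test_` functions are collected directly.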

Changed files

  • litellm/litellm_core_utils/streaming_handler.py (modified, +25/-5)
  • litellm/llms/databricks/streaming_utils.py (modified, +8/-11)
  • litellm/types/utils.py (modified, +1/-0)
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +44/-0)
  • tests/test_litellm/llms/databricks/test_streaming_utils.py (added, +105/-0)

PR #24010: Handle reasoning_content in streaming in generic ModelResponseIterator

Description (problem / solution / changelog)

Problem: The generic ModelResponseIterator.chunk_parser() in litellm/llms/databricks/streaming_utils.py extracts text, tool_calls, usage, and finish_reason from streaming chunks, but never extracts reasoning_content. This causes reasoning_content to be silently dropped for all OpenAI-like providers that use the generic iterator (Watsonx, Cerebras, etc.).

Non-streaming works correctly because there's explicit extraction at the response level.

Fixes https://github.com/BerriAI/litellm/issues/23148

Changes

  • litellm/types/utils.py — Added reasoning_content: Optional[str] to the GenericStreamingChunk TypedDict
  • litellm/llms/databricks/streaming_utils.py — Extract reasoning_content from processed_chunk.choices[0].delta.reasoning_content in chunk_parser() and include it in the returned GenericStreamingChunk
  • litellm/litellm_core_utils/streaming_handler.py — Propagate reasoning_content through the streaming logic (is_chunk_non_empty, chunk_creator, and convert_generic_chunk_to_model_response_stream) to ensure it flows through to the final Delta.

Testing: Added 3 unit tests in tests/test_litellm/llms/databricks/test_streaming_utils.py:

  • test_chunk_parser_extracts_reasoning_content: verifies extraction when present
  • test_chunk_parser_reasoning_content_none_when_absent: verifies None when absent
  • test_chunk_parser_both_content_and_reasoning: verifies both fields coexist

Also updated tests/test_litellm/litellm_core_utils/test_streaming_handler.py to verify that chunks containing only reasoning_content are correctly identified as non-empty.
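PR #24010 also routes reasoning_content through is_chunk_non_empty, and its updated test checks that a reasoning-only chunk counts as non-empty. The idea is that a delta carrying only reasoning still has payload worth emitting. A minimal sketch, where `delta_has_payload` is a hypothetical standalone function and its list of payload keys is an assumption, not LiteLLM's actual check:

```python
from typing import Any, Dict

# Assumed set of delta fields that make a chunk worth forwarding to the client.
PAYLOAD_KEYS = ("content", "tool_calls", "function_call", "reasoning_content")


def delta_has_payload(delta: Dict[str, Any]) -> bool:
    """Return True if any payload field is set to a truthy value (empty strings do not count)."""
    return any(delta.get(key) for key in PAYLOAD_KEYS)
```

With "reasoning_content" left out of the key list, this check would return False for {"reasoning_content": "thinking"}, which is one way a reasoning-only chunk could be discarded before reaching the client.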

Changed files

  • litellm/litellm_core_utils/streaming_handler.py (modified, +26/-5)
  • litellm/llms/databricks/streaming_utils.py (modified, +5/-0)
  • litellm/types/utils.py (modified, +1/-0)
  • tests/test_litellm/litellm_core_utils/test_streaming_handler.py (modified, +44/-0)
  • tests/test_litellm/llms/databricks/test_streaming_utils.py (added, +105/-0)


---
Original issue text

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?


Description

We use the LiteLLM proxy in front of IBM Watsonx with the gpt-oss-120b model (watsonx/openai/gpt-oss-120b). We have confirmed:

  • Watsonx endpoint: Returns reasoning_content in both streaming and non-streaming responses.
  • LiteLLM with stream=false: Returns reasoning_content correctly (e.g. in response.choices[0].message.reasoning_content).
  • LiteLLM with stream=true: Does not return reasoning_content in the streamed chunks (e.g. nothing in choices[0].delta.reasoning_content or equivalent), even though the Watsonx API is sending it.

So the issue is not with the Watsonx API; it is that LiteLLM does not pass through or map reasoning_content when handling Watsonx streaming responses. We’d like this fixed so that streaming responses include reasoning_content in the same way non-streaming does.

We use the proxy with the OpenAI-compatible /v1/chat/completions endpoint; the same behavior applies when using the LiteLLM Python SDK with stream=True and this model.
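When calling the proxy directly, the stream arrives as server-sent events: one `data: {...}` line per chunk and a final `data: [DONE]`, the usual OpenAI-compatible framing. The minimal sketch below assumes that framing (one JSON object per data line); `split_sse_stream` is a hypothetical helper. It joins content and reasoning_content deltas separately, so an empty reasoning string after a complete response is the symptom described in this issue.

```python
import json
from typing import Iterable, List, Tuple


def split_sse_stream(lines: Iterable[str]) -> Tuple[str, str]:
    """Return (content, reasoning_content) joined across an OpenAI-style SSE stream."""
    content_parts: List[str] = []
    reasoning_parts: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and ': keep-alive' comments
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        event = json.loads(data)
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            content_parts.append(delta.get("content") or "")
            reasoning_parts.append(delta.get("reasoning_content") or "")
    return "".join(content_parts), "".join(reasoning_parts)
```

With the requests library, the lines could come from `response.iter_lines(decode_unicode=True)` on a POST made with `stream=True`. For example, the events `{"choices":[{"delta":{"reasoning_content":"Check units. "}}]}` and `{"choices":[{"delta":{"content":"42"}}]}` followed by `[DONE]` yield ("42", "Check units. ").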

Our config (snippet from proxy_config.yaml):

model_list:
  - model_name: openai/gpt-oss-120b_si-storage-insights
    litellm_params:
      model: watsonx/openai/gpt-oss-120b
      api_base: os.environ/SI_WATSONX_URL
      api_key: os.environ/SI_WATSONX_APIKEY_1
      project_id: os.environ/SI_WATSONX_PROJECT_ID_1
      supports_function_calling: true

(We use multiple model entries with different credentials; all use watsonx/openai/gpt-oss-120b and exhibit the same streaming behavior.)
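Values written as os.environ/NAME in the config are read from environment variables instead of being stored in the file, which is what lets each model entry point at its own credentials. The resolution step can be illustrated as below; `resolve_env_refs` is a hypothetical helper, not LiteLLM's config loader, and raising KeyError for an unset variable is this sketch's own choice.

```python
import os
from typing import Any, Dict, Mapping, Optional

ENV_PREFIX = "os.environ/"


def resolve_env_refs(params: Dict[str, Any], env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Replace 'os.environ/NAME' string values with the value of variable NAME.

    Non-string values (e.g. supports_function_calling: true) pass through unchanged.
    Raises KeyError if a referenced variable is not set.
    """
    source = os.environ if env is None else env
    resolved: Dict[str, Any] = {}
    for key, value in params.items():
        if isinstance(value, str) and value.startswith(ENV_PREFIX):
            resolved[key] = source[value[len(ENV_PREFIX):]]
        else:
            resolved[key] = value
    return resolved
```

Passing an explicit `env` mapping keeps the function testable without touching the real process environment.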

Expected behavior

  • stream=false: LiteLLM returns reasoning_content on the message (already works).
  • stream=true: LiteLLM should also return reasoning_content in the streamed response (e.g. in choices[0].delta.reasoning_content or equivalent in each chunk, or aggregated in the final message), since the Watsonx endpoint already provides it in both modes.

Actual behavior

  • stream=false: reasoning_content is present in the response (correct).
  • stream=true: reasoning_content is missing from the streamed chunks and from the aggregated message. The Watsonx API does send it; LiteLLM is not exposing it in streaming mode.

Steps to Reproduce

  1. Configure LiteLLM (proxy or SDK) with Watsonx and gpt-oss-120b, e.g.:

    • Proxy: In proxy_config.yaml, add a model entry with litellm_params.model: watsonx/openai/gpt-oss-120b and the required api_base, api_key, project_id (or env vars).
    • SDK: Set model="watsonx/openai/gpt-oss-120b" with valid Watsonx credentials.
  2. Non-streaming (baseline): Send a chat completion with stream=false. Check response.choices[0].message.reasoning_content — it is present (Watsonx sends it, LiteLLM returns it).

  3. Streaming (bug): Send the same request with stream=true:

    • Proxy: POST /v1/chat/completions with "stream": true, plus model and messages.
    • SDK: completion(..., stream=True).
  4. Consume the stream and inspect each chunk (e.g. choices[0].delta) and any final aggregated message.

  5. Observe: reasoning_content is not present in the streamed deltas or in the aggregated message, even though the Watsonx endpoint returns it when streaming. So the bug is in LiteLLM’s handling of Watsonx streaming responses (reasoning_content is not being forwarded/mapped).
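Steps 4 and 5 amount to walking the stream and aggregating it. A minimal sketch, assuming each chunk exposes choices[0].delta with optional content and reasoning_content attributes (the path the issue inspects); `collect_stream` and `fake_chunk` are hypothetical helpers, and getattr keeps the code working when a delta lacks the attribute entirely. On an affected version, reasoning_chunks stays at 0 while the non-streaming call still returns reasoning_content.

```python
from types import SimpleNamespace
from typing import Any, Dict, Iterable, List


def collect_stream(chunks: Iterable[Any]) -> Dict[str, Any]:
    """Aggregate streamed deltas into one message and count reasoning-bearing chunks."""
    content: List[str] = []
    reasoning: List[str] = []
    reasoning_chunks = 0
    for chunk in chunks:
        if not chunk.choices:
            continue  # e.g. a trailing usage-only chunk
        delta = chunk.choices[0].delta
        text = getattr(delta, "content", None)
        thought = getattr(delta, "reasoning_content", None)
        if text:
            content.append(text)
        if thought:
            reasoning.append(thought)
            reasoning_chunks += 1
    return {
        "content": "".join(content),
        "reasoning_content": "".join(reasoning) or None,
        "reasoning_chunks": reasoning_chunks,
    }


def fake_chunk(**delta: Any) -> SimpleNamespace:
    """Build a stand-in chunk so the aggregation can be checked offline."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(**delta))])
```

Against a live deployment the input would be the iterator returned by `litellm.completion(model="watsonx/openai/gpt-oss-120b", messages=..., stream=True)`.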

Questions for maintainers

  1. Root cause: In the Watsonx provider’s streaming handling, is reasoning_content from the Watsonx stream ever read and written into the LiteLLM stream chunks (e.g. delta.reasoning_content)? If not, that would explain the bug.
  2. Can the Watsonx streaming response path be updated to pass through (or map) reasoning_content so that it appears in the streamed response, consistent with stream=false?
  3. If there is an existing helper or pattern used for other providers (e.g. Bedrock) for streaming reasoning_content, the same approach could be applied to Watsonx.

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.81.9, v1.82.0

Twitter / LinkedIn details

No response

Extended analysis

Fix Plan

To fix the issue of reasoning_content not being returned in the streamed response when using LiteLLM with Watsonx and stream=true, follow these steps:

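  • Update the streaming path that Watsonx uses so that it handles reasoning_content. In the merged fix this is the generic ModelResponseIterator.chunk_parser() in litellm/llms/databricks/streaming_utils.py, not Watsonx-specific code.
  • Modify the streaming handling code to read reasoning_content from each upstream chunk (choices[0].delta.reasoning_content) and write it into the LiteLLM stream chunks.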

Example code changes:

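# Sketch of the plan above. handle_streaming_response is a hypothetical name;
# the merged fix (PR #24002 / #24010) made the equivalent change in the
# generic ModelResponseIterator.chunk_parser() used by Watsonx.
def handle_streaming_response(response):
    for chunk in response:
        # reasoning_content arrives inside choices[0].delta, not at the top level
        choices = chunk.get("choices") or [{}]
        delta = choices[0].get("delta") or {}
        reasoning_content = delta.get("reasoning_content")

        # Carry reasoning_content into the emitted chunk instead of dropping it
        out = {"text": delta.get("content") or ""}
        if reasoning_content is not None:
            out["reasoning_content"] = reasoning_content
        yield out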
  • Ensure that the updated code is properly tested to verify that reasoning_content is correctly passed through in the streamed response.

Verification

To verify that the fix worked:

  1. Send a chat completion request with stream=true using the updated LiteLLM version.
  2. Consume the stream and inspect each chunk (e.g. choices[0].delta) and any final aggregated message.
  3. Verify that reasoning_content is present in the streamed deltas and in the aggregated message.
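The verification can be reduced to one comparison between a non-streaming response and a streamed run of the same request. A sketch with plain dicts; `check_reasoning_parity` is a hypothetical helper. Two runs of a sampling model generally produce different text, so it compares the presence of reasoning_content rather than exact equality.

```python
from typing import Any, Dict, Iterable


def check_reasoning_parity(message: Dict[str, Any], deltas: Iterable[Dict[str, Any]]) -> Dict[str, bool]:
    """Compare a non-streaming message with the deltas of a streamed run of the same request."""
    streamed = "".join(d.get("reasoning_content") or "" for d in deltas)
    non_stream_has = bool(message.get("reasoning_content"))
    stream_has = bool(streamed)
    return {
        "non_stream_has_reasoning": non_stream_has,
        "stream_has_reasoning": stream_has,
        # The reported bug: reasoning present without streaming, absent with streaming.
        "bug_reproduced": non_stream_has and not stream_has,
    }
```

After the fix, both presence flags should be True and bug_reproduced False.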

Extra Tips

  • Review the LiteLLM documentation and code to ensure that the fix is consistent with the existing architecture and design patterns.
  • Consider adding tests to cover the updated streaming handling code to prevent regressions in the future.
  • If similar issues are encountered with other providers, apply the same fix to ensure consistency across the platform.


