litellm - ✅(Solved) Fix [Bug]: reasoning_content not parsed from raw <think> tags for nvidia_nim/minimax-m2.5 via LiteLLM Proxy [2 pull requests, 1 participants]

GitHub stats
BerriAI/litellm#24253 (fetched 2026-04-08 01:09:00)
Comments: 0
Participants: 1
Timeline: 7
Reactions: 1
Timeline (top): labeled ×3, cross-referenced ×2, referenced ×2

Fix Action

Fixed

PR fix notes

PR #24276: fix(nvidia_nim): extract <think> reasoning blocks from content in transform_response

Description (problem / solution / changelog)

Summary

Fixes #24253

Problem

NVIDIA NIM reasoning models (e.g. minimax/minimax-m1) return their chain-of-thought inside a raw <think>…</think> block in choices[0].message.content. The expected behaviour is for litellm to split this into reasoning_content + content, as it does for DeepSeek R1 and other reasoning providers. Instead, reasoning_content is null and the raw tags appear in content.

Root cause

The litellm response pipeline already calls _parse_content_for_reasoning() via _extract_reasoning_content() in convert_to_model_response_object(). NvidiaNimConfig inherits OpenAIGPTConfig.transform_response(), which routes through convert_to_model_response_object(), but that function only invokes _extract_reasoning_content() when the choice message dictionary has a string content. In the proxy path with a thinking: true / merge_reasoning_content_in_choices: false model config, a subtle ordering issue in how the proxy wraps the response means the extraction step can be skipped.

Fix

Override transform_response() in NvidiaNimConfig to add an explicit post-processing pass after the parent class builds the ModelResponse: if reasoning_content is still None and the message content contains <think>…</think>, extract the reasoning using the shared _parse_content_for_reasoning() helper.
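
The post-processing pass described above can be sketched roughly as follows. This is a standalone illustration, not the code from the PR: split_think_block and postprocess_message are hypothetical names, plain dicts stand in for litellm's ModelResponse, and a regular expression stands in for the shared _parse_content_for_reasoning() helper, whose exact behaviour is not shown here.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for litellm's shared helper; the real
# _parse_content_for_reasoning() may differ in signature and behaviour.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_think_block(content: str) -> Tuple[Optional[str], str]:
    """Return (reasoning, remaining_content) for the first <think> block.

    If there is no complete <think>...</think> pair, reasoning is None
    and the content is returned unchanged.
    """
    match = _THINK_RE.search(content)
    if match is None:
        return None, content
    remaining = content[: match.start()] + content[match.end():]
    return match.group(1), remaining


def postprocess_message(message: dict) -> dict:
    """Fill reasoning_content from content only when it is still missing."""
    content = message.get("content")
    if message.get("reasoning_content") is None and isinstance(content, str):
        reasoning, remaining = split_think_block(content)
        if reasoning is not None:
            message["reasoning_content"] = reasoning
            message["content"] = remaining
    return message


message = {
    "role": "assistant",
    "content": "<think>The user just sent ping...</think>\n\npong",
    "reasoning_content": None,
}
print(postprocess_message(message))
```

The guard on reasoning_content mirrors the "still None" condition in the PR description, so a response that already carries parsed reasoning is left untouched.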

After fix

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "\n\npong",
      "reasoning_content": "The user just sent ping..."
    }
  }]
}

Changed files

  • litellm/llms/nvidia_nim/chat/transformation.py (modified, +71/-2)
  • litellm/llms/openrouter/chat/transformation.py (modified, +6/-0)

PR #24280: fix: [Bug]: reasoning_content not parsed from raw <think> tags for nvidia_nim/minimax-m2.5 via LiteLLM Proxy

Description (problem / solution / changelog)

Summary

Fixes #24253

Changes


Test plan

  • Unit tests added
  • make test-unit passes
  • make lint passes

Changed files

  • litellm/litellm_core_utils/prompt_templates/common_utils.py (modified, +2/-2)
  • litellm/litellm_core_utils/streaming_handler.py (modified, +65/-0)
  • tests/llm_translation/test_nvidia_nim.py (modified, +234/-0)


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

When using nvidia_nim as the provider with the minimax-m1 model, the <think>...</think> block is returned as raw text inside content instead of being parsed into reasoning_content. Setting thinking: true and merge_reasoning_content_in_choices: false in the model config has no effect.

Expected behavior

reasoning_content should be populated and content should contain only the final answer, consistent with how other reasoning models behave (e.g. DeepSeek R1 via OpenRouter).

Actual behavior

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "<think>The user just sent ping...</think>\n\npong",
      "reasoning_content": null
    }
  }]
}

Config

model_list:
  - model_name: my-model
    litellm_params:
      model: nvidia_nim/minimax/minimax-m1
      api_base: https://integrate.api.nvidia.com/v1
      api_key: os.environ/NVIDIA_API_KEY
      thinking: true
      merge_reasoning_content_in_choices: false

Steps to Reproduce

  1. Configure proxy with the above config
  2. Send any message to the model
  3. Observe <think> block in raw content instead of reasoning_content

Relevant log output

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

Fix Plan

Changing litellm_params alone does not resolve the problem: the issue reports that these settings have no effect, and both PRs fix it in LiteLLM's code rather than in configuration.

  • Keep the model_list configuration from the issue as it is. PR #24276 targets exactly this configuration, so merge_reasoning_content_in_choices stays false:
model_list:
  - model_name: my-model
    litellm_params:
      model: nvidia_nim/minimax/minimax-m1
      api_base: https://integrate.api.nvidia.com/v1
      api_key: os.environ/NVIDIA_API_KEY
      thinking: true
      merge_reasoning_content_in_choices: false  # Unchanged from the issue
  • Do not add a parse_think_blocks parameter; it does not appear in the issue or in either PR.
  • Run a LiteLLM build that includes the fix from the PRs above. If that is not possible, the response can be post-processed in the spirit of PR #24276. For example:
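import re

# Illustrative only: this is not the function LiteLLM itself uses.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def process_response(response):
    """Move the first <think> block of each choice into reasoning_content."""
    for choice in response.get("choices", []):
        message = choice.get("message", {})
        content = message.get("content")
        # Skip messages that already have reasoning or have no text content
        if message.get("reasoning_content") is not None or not isinstance(content, str):
            continue
        match = THINK_RE.search(content)
        if match:
            message["reasoning_content"] = match.group(1)
            message["content"] = content[:match.start()] + content[match.end():]
    return response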

Verification

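After applying the fix, send a message to the model and verify that the <think> block is parsed into reasoning_content and removed from content. The response should look like the following. Note that the example in PR #24276 keeps the leading "\n\n" in content, so compare content after trimming whitespace: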

{
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "pong",
      "reasoning_content": "The user just sent ping..."
    }
  }]
}
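
A check along these lines could automate the verification step. It is a minimal sketch that works on the response already parsed into a plain dict; check_reasoning_split is a hypothetical helper, not part of LiteLLM, and the two sample responses are the ones quoted in this issue.

```python
from typing import List


def check_reasoning_split(response: dict) -> List[str]:
    """Return a list of problems found; an empty list means the split worked."""
    problems = []
    for index, choice in enumerate(response.get("choices", [])):
        message = choice.get("message", {})
        content = message.get("content") or ""
        # Raw tags in content mean the reasoning was not extracted
        if "<think>" in content or "</think>" in content:
            problems.append(f"choice {index}: <think> tags still present in content")
        # None or an empty string both count as missing reasoning
        if not message.get("reasoning_content"):
            problems.append(f"choice {index}: reasoning_content is empty")
    return problems


fixed = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "\n\npong",
            "reasoning_content": "The user just sent ping...",
        }
    }]
}
broken = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "<think>The user just sent ping...</think>\n\npong",
            "reasoning_content": None,
        }
    }]
}

print(check_reasoning_split(fixed))   # no problems expected
print(check_reasoning_split(broken))  # both checks should fail
```

Returning a list of problems rather than raising keeps the helper usable both in a quick manual check and inside a test assertion.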
