vllm - ✅ (Solved) Fix [Usage]: When running qwen3.5-27b with vllm 0.17.0, the Deep Thinking output is under "reasoning" and not under "reasoning_content". [3 pull requests, 1 comment, 2 participants]

GitHub stats
vllm-project/vllm#36730 (fetched 2026-04-08 00:35:13)
Comments: 1
Participants: 2
Timeline events: 9
Reactions: 0
Timeline (top): cross-referenced ×6, commented ×1, labeled ×1, referenced ×1

Fix Action

Fixed

PR fix notes

PR #21076: fix: support both reasoning and reasoning_content fields for vLLM compatibility

Description (problem / solution / changelog)

Summary

Fixes #21075.

Root cause: vLLM 0.17.0+ deprecated reasoning_content in favor of the reasoning field (see vllm-project/vllm#36730), but the OpenAI LLM integration only reads reasoning_content, so thinking content is not extracted when using vLLM.

Fix: Read both the reasoning (new standard) and reasoning_content (legacy) fields, preferring the new field. This maintains backward compatibility with providers still using reasoning_content.

Changes

  • llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py: Updated _stream_chat and _astream_chat to check both reasoning and reasoning_content attributes
  • llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py: Updated from_openai_message to check both fields
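
The "read both fields, prefer the new one" approach described above can be sketched as a small helper. This is a minimal illustration and not the PR's actual diff: the function name extract_reasoning is hypothetical, and SimpleNamespace objects stand in for the delta objects an OpenAI-compatible client would return.

```python
from types import SimpleNamespace
from typing import Any, Optional


def extract_reasoning(delta: Any) -> Optional[str]:
    """Return reasoning text from a streamed delta, or None if absent.

    Hypothetical helper: prefers the newer `reasoning` attribute and
    falls back to the legacy `reasoning_content` attribute.
    """
    for field in ("reasoning", "reasoning_content"):
        value = getattr(delta, field, None)
        if value is not None:
            return value
    return None


# Stand-ins for the two delta shapes shown in the issue.
new_style = SimpleNamespace(reasoning="Thinking")
old_style = SimpleNamespace(content=None, reasoning_content="Thinking")
no_reasoning = SimpleNamespace(content="hello")

for delta in (new_style, old_style, no_reasoning):
    print(extract_reasoning(delta))
```

Using getattr with a default keeps the helper tolerant of deltas that carry neither field, which is the case for ordinary content chunks.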

Testing

  • Verified fix addresses reported scenario (vLLM reasoning field now extracted)
  • Change is minimal and follows existing code patterns
  • Backward compatible with providers using reasoning_content


Changed files

  • llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py (modified, +2/-2)
  • llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py (modified, +1/-1)

PR #21078: fix(openai): handle deprecated reasoning_content in vLLM

Description (problem / solution / changelog)

Summary

The reasoning_content field is deprecated in vLLM (see vllm-project/vllm#36730). This fix updates the OpenAI LLM integration to handle the new reasoning field while maintaining backward compatibility with the old reasoning_content field.

Changes

  • Updated raw_reasoning extraction to check reasoning field first (new vLLM format), then fall back to reasoning_content (old format)
  • Applied the fix in both _stream_chat and _astream_chat methods
  • Added test test_stream_chat_reasoning_vllm_new_field to verify the new field works correctly

Testing

  • All existing tests pass (103 passed, 14 skipped)
  • New test verifies the new vLLM reasoning field is correctly captured as ThinkingBlock
  • Backward compatibility maintained - old reasoning_content field still works
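
A check in the spirit of the testing notes above might look like the following sketch. It is not the code of test_stream_chat_reasoning_vllm_new_field; collect_stream and make_chunk are hypothetical names, the chunks are plain dicts shaped like the chat.completion.chunk payloads in the issue, and the final "Hi" content chunk is made up for illustration.

```python
from typing import Dict, Iterable, List, Tuple


def collect_stream(chunks: Iterable[Dict]) -> Tuple[str, str]:
    """Concatenate reasoning text and answer text from streamed chunk dicts."""
    reasoning_parts: List[str] = []
    content_parts: List[str] = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta", {})
            # New field first, legacy field as the fallback.
            reasoning = delta.get("reasoning")
            if reasoning is None:
                reasoning = delta.get("reasoning_content")
            if reasoning:
                reasoning_parts.append(reasoning)
            if delta.get("content"):
                content_parts.append(delta["content"])
    return "".join(reasoning_parts), "".join(content_parts)


def make_chunk(**delta) -> Dict:
    """Build a minimal chunk dict holding a single delta (illustrative only)."""
    return {"choices": [{"index": 0, "delta": delta}]}


new_stream = [
    make_chunk(reasoning="Thinking"),
    make_chunk(reasoning=" Process"),
    make_chunk(content="Hi"),
]
old_stream = [
    make_chunk(content=None, reasoning_content="Thinking"),
    make_chunk(content=None, reasoning_content=" Process"),
    make_chunk(content="Hi"),
]

# Both field names should yield the same accumulated result.
assert collect_stream(new_stream) == collect_stream(old_stream)
print(collect_stream(new_stream))
```

Feeding the same text through both field names and comparing the results is a simple way to confirm that the fallback does not change behavior for legacy providers.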

Fixes run-llama/llama_index#21075

Changed files

  • llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py (modified, +12/-4)
  • llama-index-integrations/llms/llama-index-llms-openai/tests/test_openai.py (modified, +51/-0)

Issue body

Your current environment

vllm /Qwen/Qwen3.5-27B --port 40023 --host 10.0.30.105 --served-model-name local-qwen3.5-27b --tensor-parallel-size 2 --tool-call-parser=qwen3_coder --gpu-memory-utilization=0.8 --enable-auto-tool-choice --max-model-len=262144 --skip-mm-profiling --reasoning-parser=qwen3

How would you like to use vllm

When I run the following test request:

curl --location 'http://10.0.30.105:40023/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "local-qwen3.5-27b",
  "messages": [
    {
      "role": "user",
      "content": "你好"
    }
  ],
  "stream": true
}'

the output is:

data: {"id":"chatcmpl-740c22ad-c984-49a4-9a78-b7bbdc69b3d2","object":"chat.completion.chunk","created":1773161262,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"reasoning":"Thinking"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-740c22ad-c984-49a4-9a78-b7bbdc69b3d2","object":"chat.completion.chunk","created":1773161262,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"reasoning":" Process"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

I expect the output to be in the following format.

data: {"id":"as-d54pngnrxn","object":"chat.completion.chunk","created":1773151016,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"content":null,"reasoning_content":"Thinking"},"flag":0}]}

data: {"id":"as-d54pngnrxn","object":"chat.completion.chunk","created":1773151016,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"content":null,"reasoning_content":" Process:\n\n"},"flag":0}]}

How do I configure vLLM to produce this format?


extent analysis

Fix Plan

The field name is not controlled by the reasoning-parser setting. According to the PR notes above, vLLM 0.17.0+ deprecated reasoning_content in favor of the reasoning field, so the streamed output shown in the issue is the expected behavior and the launch command does not need to change. The fix belongs on the client side.

Here are the steps:

  • Update the client to read the reasoning field of each delta first and fall back to reasoning_content, so that it works with vLLM 0.17.0+ as well as providers that still emit the legacy field.
  • If you use the LlamaIndex OpenAI integration, check whether your installed version includes a fix for run-llama/llama_index#21075 (see PR #21076 and PR #21078 above).

Verification

To verify the behavior, run the curl command again and check which field carries the reasoning text:

curl --location 'http://10.0.30.105:40023/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "local-qwen3.5-27b",
  "messages": [
    {
      "role": "user",
      "content": "你好"
    }
  ],
  "stream": true
}'

With vLLM 0.17.0, the reasoning text should appear under reasoning in each delta:

data: {"id":"chatcmpl-740c22ad-c984-49a4-9a78-b7bbdc69b3d2","object":"chat.completion.chunk","created":1773161262,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"reasoning":"Thinking"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-740c22ad-c984-49a4-9a78-b7bbdc69b3d2","object":"chat.completion.chunk","created":1773161262,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"reasoning":" Process"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

The updated client should then surface this text as thinking content, the same way it previously handled reasoning_content.
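
For a downstream consumer that can only read reasoning_content, one option is to normalise each streamed line on the client side before handing it on. The sketch below is an assumption-laden illustration: normalize_sse_line is a hypothetical helper, the sample line is a trimmed copy of the chunk shown in the issue, and the "[DONE]" check assumes the usual OpenAI-style stream terminator.

```python
import json
from typing import Optional


def normalize_sse_line(line: str) -> Optional[dict]:
    """Parse one `data: {...}` line and mirror `reasoning` into `reasoning_content`.

    Returns None for lines that carry no JSON chunk (blank lines, the
    assumed "[DONE]" terminator, or non-data lines).
    """
    if not line.startswith("data:"):
        return None
    payload = line[len("data:"):].strip()
    if not payload or payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        delta = choice.get("delta", {})
        # Copy the new field to the legacy name only when the legacy name is missing.
        if "reasoning" in delta and "reasoning_content" not in delta:
            delta["reasoning_content"] = delta["reasoning"]
    return chunk


# Trimmed version of a chunk from the issue, kept short for readability.
sample = (
    'data: {"object":"chat.completion.chunk","model":"local-qwen3.5-27b",'
    '"choices":[{"index":0,"delta":{"reasoning":"Thinking"},"finish_reason":null}]}'
)

chunk = normalize_sse_line(sample)
print(chunk["choices"][0]["delta"])
```

The shim keeps the original reasoning key in place, so consumers that already read the new field are unaffected.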
