vllm - ✅ (Solved) Fix [Usage]: When running qwen3.5-27b with vllm 0.17.0, the Deep Thinking output is under "reasoning" and not under "reasoning_content". [3 pull requests, 1 comment, 2 participants]

vllm · 2026-03-11 02:08:23


GitHub stats

vllm-project/vllm#36730 · Fetched 2026-04-08 00:35:13


Author

AEGEGE

Participants

AEGEGE

chaunceyjiang

Timeline (top)

cross-referenced ×6, commented ×1, labeled ×1, referenced ×1

Fix Action

Fixed

Fixed by PR: fix(openai): handle deprecated reasoning_content in vLLM (https://github.com/run-llama/llama_index/pull/21078)
Fixed by PR: fix(llms-openai): fall back to reasoning field for vLLM compatibility (https://github.com/run-llama/llama_index/pull/21220)

PR fix notes

PR #21076: fix: support both reasoning and reasoning_content fields for vLLM compatibility

Repository: run-llama/llama_index
Author: themavik
State: open | merged: False
Link: https://github.com/run-llama/llama_index/pull/21076

Description (problem / solution / changelog)

Summary

Fixes #21075. Root cause: vLLM 0.17.0+ deprecated reasoning_content in favor of the reasoning field (see vllm-project/vllm#36730), but the OpenAI LLM integration only reads reasoning_content, so it cannot extract thinking content when used with vLLM. Fix: read both the reasoning (new standard) and reasoning_content (legacy) fields, preferring the new field. This keeps backward compatibility with providers that still use reasoning_content.

Changes

llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py: Updated _stream_chat and _astream_chat to check both reasoning and reasoning_content attributes
llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py: Updated from_openai_message to check both fields
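The two-field lookup described in this PR can be sketched as follows. This is a minimal illustration, not the llama_index code. read_reasoning is a hypothetical helper. The sketch assumes the streamed delta object exposes undeclared fields as attributes, as the OpenAI Python SDK's pydantic response models generally do. SimpleNamespace stands in for that object here.

```python
from types import SimpleNamespace
from typing import Any, Optional


def read_reasoning(delta: Any) -> Optional[str]:
    """Return the thinking text carried by one streamed delta, if any.

    Hypothetical helper: prefers the newer `reasoning` field (vLLM 0.17.0+)
    and falls back to the legacy `reasoning_content` field.
    """
    reasoning = getattr(delta, "reasoning", None)
    if reasoning is None:
        # Older servers and other OpenAI-compatible providers use this name.
        reasoning = getattr(delta, "reasoning_content", None)
    return reasoning


# Stand-ins for the two delta shapes shown in the issue.
new_style = SimpleNamespace(reasoning="Thinking")
old_style = SimpleNamespace(content=None, reasoning_content="Thinking")
plain = SimpleNamespace(content="Hello!")

print(read_reasoning(new_style))
print(read_reasoning(old_style))
print(read_reasoning(plain))
```

The fallback checks is None rather than truthiness. An empty-string reasoning chunk is therefore passed through as-is instead of being replaced by the legacy field.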

Testing

Verified fix addresses reported scenario (vLLM reasoning field now extracted)
Change is minimal and follows existing code patterns
Backward compatible with providers using reasoning_content

Made with Cursor

Changed files

llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py (modified, +2/-2)
llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/utils.py (modified, +1/-1)

PR #21078: fix(openai): handle deprecated reasoning_content in vLLM

Repository: run-llama/llama_index
Author: hkc5
State: open | merged: False
Link: https://github.com/run-llama/llama_index/pull/21078

Description (problem / solution / changelog)

Summary

The reasoning_content field is deprecated in vLLM (see vllm-project/vllm#36730). This fix updates the OpenAI LLM integration to handle the new reasoning field while maintaining backward compatibility with the old reasoning_content field.

Changes

Updated raw_reasoning extraction to check reasoning field first (new vLLM format), then fall back to reasoning_content (old format)
Applied the fix in both _stream_chat and _astream_chat methods
Added test test_stream_chat_reasoning_vllm_new_field to verify the new field works correctly

Testing

All existing tests pass (103 passed, 14 skipped)
New test verifies the new vLLM reasoning field is correctly captured as ThinkingBlock
Backward compatibility maintained - old reasoning_content field still works

Fixes run-llama/llama_index#21075

Changed files

llama-index-integrations/llms/llama-index-llms-openai/llama_index/llms/openai/base.py (modified, +12/-4)
llama-index-integrations/llms/llama-index-llms-openai/tests/test_openai.py (modified, +51/-0)


Your current environment

vllm /Qwen/Qwen3.5-27B --port 40023 --host 10.0.30.105 --served-model-name local-qwen3.5-27b --tensor-parallel-size 2 --tool-call-parser=qwen3_coder --gpu-memory-utilization=0.8 --enable-auto-tool-choice --max-model-len=262144 --skip-mm-profiling --reasoning-parser=qwen3

How would you like to use vllm

When I send this test request (the user message 你好 means "Hello"):

curl --location 'http://10.0.30.105:40023/v1/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
  "model": "local-qwen3.5-27b",
  "messages": [
    {
      "role": "user",
      "content": "你好"
    }
  ],
  "stream": true
}'

the server streams:

data: {"id":"chatcmpl-740c22ad-c984-49a4-9a78-b7bbdc69b3d2","object":"chat.completion.chunk","created":1773161262,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"reasoning":"Thinking"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

data: {"id":"chatcmpl-740c22ad-c984-49a4-9a78-b7bbdc69b3d2","object":"chat.completion.chunk","created":1773161262,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"reasoning":" Process"},"logprobs":null,"finish_reason":null,"token_ids":null}]}

I expected the output in the following format, with the thinking text under reasoning_content:

data: {"id":"as-d54pngnrxn","object":"chat.completion.chunk","created":1773151016,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"content":null,"reasoning_content":"Thinking"},"flag":0}]}

data: {"id":"as-d54pngnrxn","object":"chat.completion.chunk","created":1773151016,"model":"local-qwen3.5-27b","choices":[{"index":0,"delta":{"content":null,"reasoning_content":" Process:\n\n"},"flag":0}]}

How do I set it up?


Extended analysis

Fix Plan

The thinking output has not been lost; it has moved. According to the linked llama_index PRs, vLLM 0.17.0 deprecated the reasoning_content field in favor of reasoning (see vllm-project/vllm#36730). With --reasoning-parser=qwen3, the Qwen3.5 thinking text therefore now arrives in delta.reasoning. The fix belongs in the client, not in the vllm command.

Here are the steps:

Read the thinking text from the reasoning field of each streamed delta (and of the message object in non-streaming responses).
Fall back to reasoning_content when reasoning is absent, so the client keeps working with older vLLM versions and other providers that still emit the legacy field.
If you use llama_index, the OpenAI integration changes listed under Fix Action (PRs #21078 and #21220) implement this two-field lookup.

The vllm command from "Your current environment" can stay as it is. The --reasoning-parser=qwen3 flag is what separates the thinking text from content in the first place.

Verification

Send the same curl request again. With vLLM 0.17.0, the raw stream still carries the thinking text under reasoning, as in the output shown in the issue; that is the expected server behavior. The fix is verified when your client, with the fallback in place, surfaces that text as reasoning instead of discarding it.
