vllm - ✅(Solved) Fix [Bug]: Streaming output incorrectly mapped to `reasoning` field instead of `content` when `enable_thinking=False` [1 pull request, 1 participant]

vllm • 2026-04-21 13:59:41


GitHub stats

vllm-project/vllm#40466 • Fetched 2026-04-22 07:45:25


Author: linqiuu
Participants: linqiuu
Assignees: sfeng33

Timeline (top): assigned ×1, cross-referenced ×1, labeled ×1

PR fix notes

PR #40579: [Bugfix] Fix streaming output incorrectly mapped to reasoning field instead of content when enable_thinking=False

Repository: vllm-project/vllm
Author: sfeng33
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40579

Description (problem / solution / changelog)

Purpose

Fix https://github.com/vllm-project/vllm/issues/40466. Also added test coverage to CI.

Test Plan

Unit test:

pytest tests/parser/test_streaming.py -v

E2E test:

vllm serve Qwen/Qwen3-0.6B --reasoning-parser deepseek_r1

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true,
    "chat_template_kwargs": {"enable_thinking": false}
  }' | grep -o '"delta":{[^}]*}'
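For readers who prefer to script the E2E check, the curl command above can be approximated with the Python standard library. This is a minimal sketch, not part of the PR: it assumes the server from the test plan is listening on localhost:8000 and that the stream uses OpenAI-style server-sent events (data: {...} lines, terminated by data: [DONE]). The helper names build_request, iter_deltas, and stream_deltas are hypothetical.

```python
import json
import urllib.request
from typing import Iterable, Iterator

# Assumption: same host, port, and model as the curl command in the test plan.
URL = "http://localhost:8000/v1/chat/completions"


def build_request(model: str = "Qwen/Qwen3-0.6B") -> urllib.request.Request:
    """Build the streaming chat request with thinking disabled (mirrors the curl body)."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": "Hello"}],
        "stream": True,
        "chat_template_kwargs": {"enable_thinking": False},
    }
    return urllib.request.Request(
        URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def iter_deltas(lines: Iterable[str]) -> Iterator[dict]:
    """Yield choices[0].delta from OpenAI-style SSE lines ("data: {...}")."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if choices:  # e.g. a trailing usage-only chunk may carry no choices
            yield choices[0].get("delta", {})


def stream_deltas() -> Iterator[dict]:
    """Send the request to a running server and yield each delta as it arrives."""
    with urllib.request.urlopen(build_request()) as resp:
        yield from iter_deltas(line.decode("utf-8") for line in resp)
```

Iterating over stream_deltas() against the running server and printing each item should give output in the same shape as the grep in the curl command; on an affected build the text deltas carry a reasoning key rather than content.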

Changed files

.buildkite/test-amd.yaml (modified, +2/-0)
.buildkite/test_areas/misc.yaml (modified, +2/-0)
tests/parser/__init__.py (added, +0/-0)
tests/parser/test_streaming.py (added, +126/-0)
vllm/parser/abstract_parser.py (modified, +5/-4)



Your current environment

============================== vLLM Info

ROCM Version : Could not collect
vLLM Version : 0.11.1rc6.dev157+g3755c1453 (git sha: 3755c1453)
vLLM Build Flags:
  CUDA Archs: Not Set; ROCm: Disabled; XPU: Disabled
GPU Topology:
        GPU0    GPU1    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0    X       SYS     0-27,56-83      0               N/A
GPU1    SYS     X       28-55,84-111    1               N/A

Legend:

X    = Self
SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
PIX  = Connection traversing at most a single PCIe bridge
NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug

When the vllm/vllm-openai:nightly server is started without tool-calling flags and the client passes chat_template_kwargs={"enable_thinking": False} to explicitly disable reasoning, streaming output (stream=True) places the generated text in the reasoning field instead of the content field.

Observed chunk delta format:

{'role': 'assistant', 'content': ''}
{'reasoning': 'Hello'}
{'reasoning': '! How can'}
{'reasoning': ' I assist'}
{'reasoning': ' you today?'}
{}
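To make the symptom concrete, the observed deltas can be concatenated per field. In this small sketch (the helper name split_stream is hypothetical), the whole reply ends up in reasoning while content stays empty:

```python
# The deltas observed in the report (thinking disabled, stream=True).
OBSERVED = [
    {"role": "assistant", "content": ""},
    {"reasoning": "Hello"},
    {"reasoning": "! How can"},
    {"reasoning": " I assist"},
    {"reasoning": " you today?"},
    {},
]


def split_stream(deltas):
    """Concatenate streamed text per field and return (content, reasoning)."""
    content, reasoning = [], []
    for delta in deltas:
        # `or ""` also covers keys that are present but set to None
        content.append(delta.get("content") or "")
        reasoning.append(delta.get("reasoning") or "")
    return "".join(content), "".join(reasoning)


content, reasoning = split_stream(OBSERVED)
print(repr(content))    # ''
print(repr(reasoning))  # 'Hello! How can I assist you today?'
```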

Conditions where the bug occurs:

The vLLM server is started WITHOUT tool calling enabled (e.g., no --enable-auto-tool-choice or --tool-call-parser flags in the Docker run command).
stream=True is used in the request.
chat_template_kwargs={"enable_thinking": False} is passed in the request body.

Conditions where it works CORRECTLY:

Non-streaming mode (stream=False): The output is correctly placed in the content field.
Thinking enabled (omitting the enable_thinking: False flag): The stream correctly separates reasoning and content into their respective fields.
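The behavior the report expects in streaming mode (everything in content when thinking is disabled; reasoning first, then content, when it is enabled) can be summarized as a small routing rule. The class below is a hypothetical sketch of that rule, not the code in vllm/parser/abstract_parser.py or in PR #40579. It assumes the end-of-reasoning marker is </think> (as used by Qwen3 and DeepSeek-R1 style models), assumes the marker arrives whole within a single delta, and ignores any opening <think> marker for brevity.

```python
END_THINK = "</think>"  # assumed end-of-reasoning marker


class HypotheticalStreamRouter:
    """Route each streamed text fragment to "reasoning" or "content" (illustrative only)."""

    def __init__(self, thinking_enabled: bool):
        # With thinking disabled, no reasoning is expected, so every fragment
        # should go to "content" from the very first token.
        self.in_reasoning = thinking_enabled

    def route(self, text: str) -> dict:
        if not self.in_reasoning:
            return {"content": text}
        if END_THINK in text:
            # Reasoning ends inside this fragment: split around the marker.
            self.in_reasoning = False
            before, _, after = text.partition(END_THINK)
            delta = {}
            if before:
                delta["reasoning"] = before
            if after:
                delta["content"] = after
            return delta
        return {"reasoning": text}
```

Under this rule, the fragments from the report ("Hello", "! How can", ...) would each come back as {"content": ...} when thinking_enabled is False, which is the output the bug report expected.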

$ sudo docker inspect --format='Created: {{.Created}}' vllm/vllm-openai:nightly
Created: 2026-04-19T05:21:21.785332291Z


Extended analysis

TL;DR

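Based on the reported conditions, the reply stays in the content field if you use non-streaming requests (stream=False) or start the server with tool calling enabled. Omitting the enable_thinking: False flag also avoids the misrouting, but only because thinking is then left on, so the response will include reasoning.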

Guidance

Check whether the vllm/vllm-openai:nightly server was started with --enable-auto-tool-choice or --tool-call-parser; the report only reproduces the bug when neither flag is set.
Test the streaming output with stream=True and chat_template_kwargs={"enable_thinking": True} to confirm that reasoning and content are separated correctly when thinking is on.
If the problem only appears with enable_thinking: False, removing that flag is a temporary workaround, at the cost of leaving thinking enabled.
Follow PR #40579, which targets this bug, and upgrade once it is merged and available in the image you deploy.
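Until a fix reaches the image you run, a client can also guard against the misrouting directly. The function below is a hypothetical stopgap, not a vLLM or OpenAI client API: when thinking was disabled but content came back empty and reasoning did not, it treats the reasoning text as the answer. The check is deliberately narrow, so it changes nothing once the server routes text to content correctly.

```python
def answer_text(content: str, reasoning: str, thinking_enabled: bool) -> str:
    """Pick the user-visible answer from accumulated stream fields (hypothetical helper)."""
    # Misrouted case from the report: thinking off, empty content, text in reasoning.
    if not thinking_enabled and not content and reasoning:
        return reasoning
    # Normal case: content already holds the answer (reasoning is shown separately, if at all).
    return content
```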

Example

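The issue contains no client code beyond the request shown in the test plan. The trigger is a combination of server flags and request parameters, and the fix in PR #40579 targets server code (vllm/parser/abstract_parser.py) rather than client configuration.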

Notes

The bug is specific to the combination of streaming mode, tool calling disabled at startup, and the enable_thinking: False flag. PR #40579 modifies vllm/parser/abstract_parser.py (+5/-4) and adds streaming tests in tests/parser/test_streaming.py, which points to the server's streaming parser path as the location of the root cause; the PR description itself does not spell out the mechanism.

Recommendation

Apply a workaround until PR #40579 is merged and available in the image you use: if the reply must land in the content field with thinking disabled, prefer stream=False or start the server with tool calling enabled. Omitting the enable_thinking: False flag also avoids the misrouting, but only because thinking stays on, so responses will include reasoning.
