vllm - ✅ (Solved) Fix [Bug]: Streaming output incorrectly mapped to `reasoning` field instead of `content` when `enable_thinking=False` [1 pull request, 1 participant]

GitHub stats

vllm-project/vllm#40466 (fetched 2026-04-22 07:45:25)

  • Comments: 0
  • Participants: 1
  • Timeline: 3
  • Reactions: 0
  • Timeline (top): assigned ×1, cross-referenced ×1, labeled ×1

PR fix notes

PR #40579: [Bugfix] Fix streaming output incorrectly mapped to reasoning field instead of content when enable_thinking=False

Description (problem / solution / changelog)

Purpose

Fixes https://github.com/vllm-project/vllm/issues/40466. Also adds test coverage to CI.

Test Plan

Unit test:

pytest tests/parser/test_streaming.py -v

E2E test:

vllm serve Qwen/Qwen3-0.6B --reasoning-parser deepseek_r1

curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-0.6B",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": true,
    "chat_template_kwargs": {"enable_thinking": false}
  }' | grep -o '"delta":{[^}]*}'
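The grep above only prints the raw deltas. The same check can be made explicit in a script. This is a minimal sketch, assuming the server emits OpenAI-style server-sent events (`data: <json>` lines ending with a `data: [DONE]` sentinel). The helper name `iter_deltas` and the sample lines are illustrative and are not taken from the PR.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(sse_lines: Iterable[str]) -> Iterator[dict]:
    """Yield the `delta` dict from each streamed chat-completion chunk.

    Assumes OpenAI-style SSE framing: `data: <json>` lines and a final
    `data: [DONE]` sentinel. Blank and non-data lines are skipped.
    """
    for raw in sse_lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            yield choice.get("delta", {})


# Hypothetical captured output, shaped like the deltas in the bug report.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"reasoning": "Hello"}}]}',
    "data: [DONE]",
]
deltas = list(iter_deltas(sample))
# Collect every field name that appeared in any delta.
fields = sorted({key for d in deltas for key in d})
print(fields)
```

With `enable_thinking` set to false, a fixed server would be expected to stream text under `content` only, so a `reasoning` key in `fields` signals the bug described in the issue.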

Changed files

  • .buildkite/test-amd.yaml (modified, +2/-0)
  • .buildkite/test_areas/misc.yaml (modified, +2/-0)
  • tests/parser/__init__.py (added, +0/-0)
  • tests/parser/test_streaming.py (added, +126/-0)
  • vllm/parser/abstract_parser.py (modified, +5/-4)


Your current environment

============================== vLLM Info

ROCM Version : Could not collect
vLLM Version : 0.11.1rc6.dev157+g3755c1453 (git sha: 3755c1453)
vLLM Build Flags: CUDA Archs: Not Set; ROCm: Disabled; XPU: Disabled
GPU Topology:
      GPU0  GPU1  CPU Affinity   NUMA Affinity  GPU NUMA ID
GPU0  X     SYS   0-27,56-83     0              N/A
GPU1  SYS   X     28-55,84-111   1              N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

🐛 Describe the bug


When starting the vllm/vllm-openai:nightly server without enabling tool calling arguments at startup, and passing chat_template_kwargs={"enable_thinking": False} via the client to explicitly disable reasoning, the streaming output (stream=True) incorrectly maps the generated text into the reasoning field instead of the content field.

Observed chunk delta format:

{'role': 'assistant', 'content': ''}
{'reasoning': 'Hello'}
{'reasoning': '! How can'}
{'reasoning': ' I assist'}
{'reasoning': ' you today?'}
{}
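To make the symptom easy to check programmatically, the deltas can be concatenated per field. The following is a minimal sketch using the observed deltas from the report as input. The helper `collect_fields` is a hypothetical name, not part of vLLM or the OpenAI client.

```python
from collections import defaultdict


def collect_fields(deltas):
    """Concatenate streamed text per delta field (e.g. content, reasoning)."""
    text = defaultdict(str)
    for delta in deltas:
        for key, value in delta.items():
            # `role` is metadata, not generated text.
            if key != "role" and isinstance(value, str):
                text[key] += value
    return dict(text)


# Deltas exactly as listed in the bug report.
observed = [
    {'role': 'assistant', 'content': ''},
    {'reasoning': 'Hello'},
    {'reasoning': '! How can'},
    {'reasoning': ' I assist'},
    {'reasoning': ' you today?'},
    {},
]
totals = collect_fields(observed)
# The bug signature: no content text, but non-empty reasoning text.
misrouted = not totals.get("content") and bool(totals.get("reasoning"))
print(totals, misrouted)
```

Here the whole reply ends up under `reasoning` while `content` stays empty, which is the mapping the issue reports as incorrect when thinking is disabled.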

Conditions where the bug occurs:

  • The vLLM server is started WITHOUT tool calling enabled (e.g., no --enable-auto-tool-choice or --tool-call-parser flags in the Docker run command).
  • stream=True is used in the request.
  • chat_template_kwargs={"enable_thinking": False} is passed in the request body.

Conditions where it works CORRECTLY:

  1. Non-streaming mode (stream=False): The output is correctly placed in the content field.
  2. Thinking enabled (omitting the enable_thinking: False flag): The stream correctly separates reasoning and content into their respective fields.
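The failing and working conditions above can be summarized as a small predicate. This is only a sketch that encodes what the report states. It says nothing about vLLM internals, and the function name `is_affected` and its parameters are hypothetical.

```python
from typing import Optional


def is_affected(
    stream: bool,
    enable_thinking: Optional[bool],
    tool_calling_enabled: bool,
) -> bool:
    """Return True for the combination the report describes as broken.

    `enable_thinking=None` models a request that omits the flag, which the
    report lists as a working case.
    """
    return stream and enable_thinking is False and not tool_calling_enabled


# The reported failing case: streaming, thinking disabled, no tool calling.
print(is_affected(stream=True, enable_thinking=False, tool_calling_enabled=False))
# Non-streaming mode is reported to work correctly.
print(is_affected(stream=False, enable_thinking=False, tool_calling_enabled=False))
```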

$ sudo docker inspect --format='Created: {{.Created}}' vllm/vllm-openai:nightly
Created: 2026-04-19T05:21:21.785332291Z


extent analysis

TL;DR

The issue can be worked around by enabling tool calling or omitting the enable_thinking: False flag when using streaming mode.

Guidance

  • Verify that the vllm/vllm-openai:nightly server is started with the correct flags, specifically checking if --enable-auto-tool-choice or --tool-call-parser are included.
  • Test the streaming output with stream=True and chat_template_kwargs={"enable_thinking": True} to see if the issue persists.
  • If the issue only occurs with enable_thinking: False, consider removing this flag as a temporary workaround.
  • Check the documentation for any updates or known issues related to streaming mode and tool calling.

Example

No code snippet accompanies this analysis. The trigger is a combination of request and server flag settings, although the fix itself (PR #40579) is a code change in vllm/parser/abstract_parser.py.

Notes

The issue is specific to the combination of streaming mode, disabled tool calling, and the enable_thinking: False flag. The report itself does not identify the root cause. The linked PR #40579 addresses it with a small change to vllm/parser/abstract_parser.py, which suggests the fault lies in the streaming parser path rather than in client configuration.

Recommendation

Apply workaround: omit the enable_thinking: False flag when using streaming mode, since that flag triggers the incorrect mapping of generated text into the reasoning field instead of the content field. Note that omitting the flag leaves thinking enabled, so the stream will carry reasoning output as well as content. Treat this as a stopgap until you run a build that includes the fix from PR #40579.
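If thinking must stay disabled on an affected build, one alternative is to remap the field on the client side. This stopgap is not proposed in the issue or the PR. It is a sketch under the assumption that, with thinking disabled, any `reasoning` text is really answer text, and the helper name `normalize_delta` is hypothetical.

```python
def normalize_delta(delta: dict, thinking_enabled: bool) -> dict:
    """Move `reasoning` text into `content` when thinking was disabled.

    Deltas are returned unchanged when thinking is enabled or when they
    carry no `reasoning` key.
    """
    if thinking_enabled or "reasoning" not in delta:
        return delta
    fixed = {k: v for k, v in delta.items() if k != "reasoning"}
    # Append to any content already present in the same delta.
    fixed["content"] = (fixed.get("content") or "") + (delta["reasoning"] or "")
    return fixed


# Deltas shaped like the ones in the bug report.
stream = [{'role': 'assistant', 'content': ''}, {'reasoning': 'Hello'}, {}]
repaired = [normalize_delta(d, thinking_enabled=False) for d in stream]
print(repaired)
```

Remove the remapping once the server runs a build with the fix, otherwise it would hide genuine reasoning output if the request ever enables thinking without updating `thinking_enabled`.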
