vllm - ✅(Solved) Fix [Bug]: Forced tool_choice crashes with AssertionError when reasoning_parser consumes all content [1 pull requests, 2 comments, 1 participants]

GitHub stats
vllm-project/vllm#40528 (fetched 2026-04-22 07:44:02)
Comments: 2 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×1, referenced ×1

Error Message

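Error traceback

AssertionError
  File "vllm/entrypoints/openai/engine/serving.py", line 615, in _parse_tool_calls_from_content
    assert content is not None

No descriptive error message, just a bare AssertionError.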

Root Cause

In vllm/entrypoints/openai/engine/serving.py, method _parse_tool_calls_from_content:

# Line 615 — ToolChoiceFunction branch
if request.tool_choice and isinstance(request.tool_choice, ToolChoiceFunction):
    assert content is not None  # <-- CRASH when reasoning_parser consumed all content

# Line 624 — ChatCompletionNamedToolChoiceParam branch  
elif request.tool_choice and isinstance(request.tool_choice, ChatCompletionNamedToolChoiceParam):
    assert content is not None  # <-- CRASH

Additionally, in vllm/entrypoints/openai/chat_completion/serving.py line 1325:

elif (request.tool_choice and type(request.tool_choice) is ChatCompletionNamedToolChoiceParam):
    assert tool_calls is not None and len(tool_calls) > 0  # <-- CRASH (downstream)
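
The bare assert is why the traceback carries no explanation. The following is a minimal sketch in plain Python, independent of vLLM: parse_forced_tool_call and describe_failure are hypothetical stand-ins, not the actual vLLM methods. It shows that an assert without a message raises an AssertionError whose text is empty.

```python
def parse_forced_tool_call(content):
    """Hypothetical stand-in for the forced tool_choice branch (not vLLM code)."""
    # Same shape as the failing check: no message is attached to the assert.
    assert content is not None
    return {"name": "get_weather", "arguments": content}


def describe_failure(content):
    """Return the exception type name and message produced for `content`."""
    try:
        parse_forced_tool_call(content)
    except AssertionError as exc:
        return type(exc).__name__, str(exc)
    return None, None


# content=None models a reasoning parser that moved all output into reasoning.
error_name, error_message = describe_failure(None)
print(error_name, repr(error_message))
```

Assert statements are also skipped entirely when Python runs with -O, which makes them a weak fit for validating data that depends on model output.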

Fix Action

Fixed

PR fix notes

PR #40529: [Bugfix] Fix forced tool_choice crash when reasoning_parser consumes all content

Description (problem / solution / changelog)

Summary

When using --reasoning-parser (e.g., glm45, deepseek_r1) with forced tool_choice, the reasoning parser may consume the entire model output into <think>...</think> tags, leaving content as None. This causes AssertionError crashes in the forced tool_choice code paths.

Fix: Normalize None content to empty string "" (consistent with the existing "required" branch which already does content = content or ""), and replace the downstream assert tool_calls with a defensive fallback.
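
The normalization idiom named in the fix can be illustrated in isolation. This is a sketch only: normalize_content and normalize_tool_calls are hypothetical helper names introduced here for illustration, whereas the PR applies the expressions inline.

```python
def normalize_content(content):
    """Map None to "" so later string handling never sees None."""
    return content or ""


def normalize_tool_calls(tool_calls):
    """Map None to [] so callers can iterate or test emptiness safely."""
    return tool_calls or []


# None and "" both normalize to ""; real content passes through unchanged.
for raw in (None, "", '{"city": "Beijing"}'):
    print(repr(raw), "->", repr(normalize_content(raw)))
```

Because `or` falls back on any falsy value, an empty string or an empty list is treated the same way as None, which is the intended behavior here.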

Changes

File | Change
vllm/entrypoints/openai/engine/serving.py | Replace 2x assert content is not None → content = content or ""
vllm/parser/abstract_parser.py | Replace 2x assert content is not None → content = content or ""
vllm/entrypoints/openai/chat_completion/serving.py | Replace assert tool_calls is not None and len(tool_calls) > 0 → tool_calls = tool_calls or []

Total: 5 lines changed across 3 files.

How to reproduce

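Start the server:

vllm serve <reasoning-model> --reasoning-parser glm45 --enable-auto-tool-choice --tool-call-parser hermes

Then send a request with a forced tool_choice:

import openai

# api_key is a placeholder: the OpenAI client requires a key to be set
# (argument or OPENAI_API_KEY) even when the server does not check it.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<model>",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[{"type": "function", "function": {"name": "get_weather", "parameters": {}}}],
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)
# Crash: AssertionError (no message) when model output is all <think>...</think>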

Why this is not a duplicate

PR #40148 fixes engine/serving.py and abstract_parser.py but does not fix the downstream assert in chat_completion/serving.py (line 1417). This PR covers all 5 crash points across all 3 files.

Fixes #40528 Related: #40147, #40148

Test

The fix is consistent with the existing "required" branch pattern which already uses content = content or "" and tool_calls = tool_calls or [].

AI Disclosure

AI assistance was used in identifying and fixing this bug.

Signed-off-by: liuchenbing

Changed files

  • vllm/entrypoints/openai/chat_completion/serving.py (modified, +1/-1)
  • vllm/entrypoints/openai/engine/serving.py (modified, +2/-2)
  • vllm/parser/abstract_parser.py (modified, +2/-2)


Your current environment

vllm main branch (upstream/main commit 0008729ab)

Bug Description

When using --reasoning-parser (e.g., glm45, deepseek_r1) with forced tool_choice, the server crashes with an AssertionError if the reasoning parser consumes the entire model output into <think>...</think> tags, leaving content as None or empty string.

This affects two tool_choice modes:

  1. tool_choice: {type: "function", function: {name: "xxx"}} → ToolChoiceFunction path
  2. tool_choice: {function: {name: "xxx"}} → ChatCompletionNamedToolChoiceParam path

The "required" mode was partially fixed in a recent commit but the named function modes still crash.

How to reproduce

  1. Start vllm with a reasoning model and tool_choice support:

vllm serve <model> --reasoning-parser glm45 --enable-auto-tool-choice --tool-call-parser hermes

  2. Send a request with forced tool_choice where the model wraps all output in <think> tags:

import openai

# api_key is a placeholder: the OpenAI client requires a key to be set
# (argument or OPENAI_API_KEY) even when the server does not check it.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<model>",
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
        }
    }],
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

  3. If the model produces output like <think>Let me check the weather...</think> with no content outside the think tags, the reasoning parser strips everything into reasoning_content, leaving content = None.
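
The effect described in the last step can be mimicked with a toy splitter. This is a sketch under stated assumptions: split_reasoning and THINK_BLOCK are hypothetical and only imitate the described behavior; the real vLLM reasoning parsers are model-specific and are not reproduced here.

```python
import re

# Hypothetical stand-in for a reasoning parser, for illustration only.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(output):
    """Split raw model output into (reasoning, content).

    content is None when nothing remains outside the <think> block.
    """
    match = THINK_BLOCK.search(output)
    if match is None:
        return None, output
    reasoning = match.group(1).strip()
    remainder = (output[: match.start()] + output[match.end():]).strip()
    return reasoning, remainder or None


reasoning, content = split_reasoning("<think>Let me check the weather...</think>")
print(repr(reasoning))
print(repr(content))
```

With this input nothing is left outside the think tags, so content comes back as None, which is the value the forced tool_choice branches then assert against.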

Error traceback

AssertionError
  File "vllm/entrypoints/openai/engine/serving.py", line 615, in _parse_tool_calls_from_content
    assert content is not None

No descriptive error message — just a bare AssertionError.

Root Cause

In vllm/entrypoints/openai/engine/serving.py, method _parse_tool_calls_from_content:

# Line 615 — ToolChoiceFunction branch
if request.tool_choice and isinstance(request.tool_choice, ToolChoiceFunction):
    assert content is not None  # <-- CRASH when reasoning_parser consumed all content

# Line 624 — ChatCompletionNamedToolChoiceParam branch  
elif request.tool_choice and isinstance(request.tool_choice, ChatCompletionNamedToolChoiceParam):
    assert content is not None  # <-- CRASH

Additionally, in vllm/entrypoints/openai/chat_completion/serving.py line 1325:

elif (request.tool_choice and type(request.tool_choice) is ChatCompletionNamedToolChoiceParam):
    assert tool_calls is not None and len(tool_calls) > 0  # <-- CRASH (downstream)

Suggested Fix

Replace the assert statements with graceful handling — log a warning and return None for tool_calls, allowing the response to be returned as a plain message with reasoning content:

In engine/serving.py:

if request.tool_choice and isinstance(request.tool_choice, ToolChoiceFunction):
    if not content:
        logger.warning(
            "Forced tool_choice=%s but content is empty "
            "(reasoning_parser may have consumed all output). "
            "Skipping tool call parsing.",
            request.tool_choice.name,
        )
        return None, content
    # ... original logic

In chat_completion/serving.py:

elif type(request.tool_choice) is ChatCompletionNamedToolChoiceParam:
    if not tool_calls:
        message = ChatMessage(role=role, reasoning=reasoning, content=content or "")
    else:
        # ... original tool_call construction logic
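
A self-contained sketch of the fallback suggested for chat_completion/serving.py follows. The ChatMessage dataclass and build_message function here are simplified, hypothetical stand-ins introduced for illustration; they are not vLLM's own types, and the real construction logic is elided in the issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ChatMessage:
    """Simplified, hypothetical message type (not vLLM's ChatMessage)."""
    role: str
    content: str = ""
    reasoning: Optional[str] = None
    tool_calls: List[dict] = field(default_factory=list)


def build_message(role, reasoning, content, tool_calls):
    """Build a response message, falling back to a plain message without tool calls."""
    if not tool_calls:
        # Covers both None and []: return reasoning plus (possibly empty) content.
        return ChatMessage(role=role, reasoning=reasoning, content=content or "")
    return ChatMessage(
        role=role, reasoning=reasoning, content="", tool_calls=list(tool_calls)
    )


fallback = build_message("assistant", "Let me check the weather...", None, None)
print(fallback)
```

The design choice is to degrade to a plain message that still carries the reasoning, rather than failing the whole request when no tool call could be parsed.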

Affected code

File | Line | Issue
vllm/entrypoints/openai/engine/serving.py | 615 | assert content is not None (ToolChoiceFunction)
vllm/entrypoints/openai/engine/serving.py | 624 | assert content is not None (ChatCompletionNamedToolChoiceParam)
vllm/entrypoints/openai/chat_completion/serving.py | 1325 | assert tool_calls is not None and len(tool_calls) > 0


extent analysis

TL;DR

Replace the bare assert statements in vllm/entrypoints/openai/engine/serving.py and vllm/entrypoints/openai/chat_completion/serving.py with graceful handling: when the reasoning parser consumes all content, log a warning and return None for tool_calls instead of crashing.

Guidance

  • Identify the lines of code causing the crash: vllm/entrypoints/openai/engine/serving.py lines 615 and 624, and vllm/entrypoints/openai/chat_completion/serving.py line 1325.
  • Replace the assert statements with conditional checks to handle the case where content is None or empty.
  • Log a warning message when the reasoning parser consumes all content, and return None for tool_calls to allow the response to be returned as a plain message with reasoning content.
  • Verify the fix by reproducing the crash on the original code, then applying the changes and confirming that the same request no longer raises.
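
The verification step could be captured as a small regression test. This is a sketch, assuming a hypothetical parse_forced_tool_call function that follows the guidance above (warn and return None for tool calls on empty content); it does not exercise the real vLLM code path, and the logger name is made up for the example.

```python
import logging
import unittest

logger = logging.getLogger("forced_tool_choice_sketch")  # hypothetical logger name


def parse_forced_tool_call(content, tool_name):
    """Hypothetical stand-in: returns (tool_calls, remaining_content)."""
    if not content:
        logger.warning(
            "Forced tool_choice=%s but content is empty; skipping tool call parsing.",
            tool_name,
        )
        return None, content
    return [{"name": tool_name, "arguments": content}], None


class ForcedToolChoiceRegressionTest(unittest.TestCase):
    def test_empty_content_does_not_crash(self):
        for empty in (None, ""):
            with self.subTest(content=empty):
                # assertLogs fails the test if no WARNING record is emitted.
                with self.assertLogs(logger, level="WARNING"):
                    tool_calls, remaining = parse_forced_tool_call(empty, "get_weather")
                self.assertIsNone(tool_calls)
                self.assertEqual(remaining, empty)

    def test_non_empty_content_still_parses(self):
        tool_calls, remaining = parse_forced_tool_call('{"city": "Beijing"}', "get_weather")
        self.assertEqual(
            tool_calls, [{"name": "get_weather", "arguments": '{"city": "Beijing"}'}]
        )
        self.assertIsNone(remaining)
```

Saved to a file, the test case can be run with python -m unittest; against an implementation that still asserts, the first test would fail with the same bare AssertionError.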

Example

if request.tool_choice and isinstance(request.tool_choice, ToolChoiceFunction):
    if not content:
        logger.warning(
            "Forced tool_choice=%s but content is empty "
            "(reasoning_parser may have consumed all output). "
            "Skipping tool call parsing.",
            request.tool_choice.name,
        )
        return None, content
    # ... original logic

Notes

The suggested fix assumes that the reasoning parser consuming all content is a valid scenario and that returning None for tool_calls is the desired behavior. If this is not the case, additional changes may be needed to handle this scenario correctly.
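
If a response without tool calls is accepted as valid, callers that force a tool_choice should no longer assume one is present. A minimal client-side sketch follows; first_tool_call is a hypothetical helper, and SimpleNamespace objects stand in for response.choices[0].message so the example runs without a server.

```python
from types import SimpleNamespace


def first_tool_call(message):
    """Return (name, arguments) of the first tool call, or None if there is none."""
    tool_calls = getattr(message, "tool_calls", None)
    if not tool_calls:
        return None
    call = tool_calls[0]
    return call.function.name, call.function.arguments


# Stand-ins for a message with no tool calls and a message with one tool call.
empty_message = SimpleNamespace(content="", tool_calls=None)
call_message = SimpleNamespace(
    content=None,
    tool_calls=[
        SimpleNamespace(
            function=SimpleNamespace(name="get_weather", arguments='{"city": "Beijing"}')
        )
    ],
)

print(first_tool_call(empty_message))
print(first_tool_call(call_message))
```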

Recommendation

Apply the suggested fix to replace the assert statements with graceful handling, as this will allow the code to handle the case where the reasoning parser consumes all content without crashing.
