vllm - ✅(Solved) Fix [Bug]: Forced tool_choice crashes with AssertionError when reasoning_parser consumes all content [1 pull request, 2 comments, 1 participant]

vllm · 2026-04-21 15:53:08

GitHub stats

Issue: vllm-project/vllm#40528 (fetched 2026-04-22 07:44:02)
Author: liuchenbing2026
Participants: liuchenbing2026
Timeline (top): commented ×2, cross-referenced ×1, referenced ×1

Error Message

A bare AssertionError with no descriptive message, raised in _parse_tool_calls_from_content (vllm/entrypoints/openai/engine/serving.py, line 615).

Root Cause

In vllm/entrypoints/openai/engine/serving.py, method _parse_tool_calls_from_content:

# Line 615: ToolChoiceFunction branch
if request.tool_choice and isinstance(request.tool_choice, ToolChoiceFunction):
    assert content is not None  # <-- CRASH when reasoning_parser consumed all content

# Line 624: ChatCompletionNamedToolChoiceParam branch
elif request.tool_choice and isinstance(request.tool_choice, ChatCompletionNamedToolChoiceParam):
    assert content is not None  # <-- CRASH

Additionally, in vllm/entrypoints/openai/chat_completion/serving.py line 1325:

elif (request.tool_choice and type(request.tool_choice) is ChatCompletionNamedToolChoiceParam):
    assert tool_calls is not None and len(tool_calls) > 0  # <-- CRASH (downstream)
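The two guards can be reproduced in isolation to show why the traceback carries no message. This is an illustrative sketch only: `parse_forced_tool_call`, `finalize_tool_calls`, and `assertion_text` are hypothetical helpers, not vLLM functions, and they keep nothing but the assert lines quoted above.

```python
from typing import Callable, Optional


def parse_forced_tool_call(content: Optional[str]) -> str:
    # Hypothetical stand-in for the forced tool_choice branch in
    # _parse_tool_calls_from_content: the only guard is a bare assert.
    assert content is not None
    return content


def finalize_tool_calls(tool_calls: Optional[list]) -> list:
    # Hypothetical stand-in for the downstream check in chat_completion/serving.py.
    assert tool_calls is not None and len(tool_calls) > 0
    return tool_calls


def assertion_text(fn: Callable[..., object], arg: object) -> Optional[str]:
    """Call fn(arg); return str() of any AssertionError raised, else None."""
    try:
        fn(arg)
    except AssertionError as exc:
        return str(exc)
    return None


# A reasoning parser that swallowed the whole output leaves content=None.
print(repr(assertion_text(parse_forced_tool_call, None)))  # ''
print(repr(assertion_text(finalize_tool_calls, [])))       # ''
```

A bare assert raises an AssertionError whose str() is empty, which matches the report. Assert statements are also stripped entirely when Python runs with -O, so they are a fragile way to validate state that depends on model output.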

Fix Action

Fix proposed (PR open, not yet merged when this page was fetched)

Addressed by PR: [Bugfix] Fix forced tool_choice crash when reasoning_parser consumes all content (https://github.com/vllm-project/vllm/pull/40529)

PR fix notes

PR #40529: [Bugfix] Fix forced tool_choice crash when reasoning_parser consumes all content

Repository: vllm-project/vllm
Author: liuchenbing2026
State: open | merged: False
Link: https://github.com/vllm-project/vllm/pull/40529

Description (problem / solution / changelog)

Summary

When using --reasoning-parser (e.g., glm45, deepseek_r1) with forced tool_choice, the reasoning parser may consume the entire model output into <think>...</think> tags, leaving content as None. This causes AssertionError crashes in the forced tool_choice code paths.

Fix: normalize None content to the empty string "" (consistent with the existing "required" branch, which already does content = content or ""), and replace the downstream assert on tool_calls with a defensive fallback (tool_calls = tool_calls or []).
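The normalization pattern described here can be sketched as follows. This is a simplified, hypothetical model: `ParsedOutput` and `normalize_forced_tool_output` are illustrative names, not vLLM code, and the real branches do more work after normalizing.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ParsedOutput:
    # Hypothetical container for what the forced tool_choice path produces.
    content: str
    tool_calls: list = field(default_factory=list)


def normalize_forced_tool_output(
    content: Optional[str], tool_calls: Optional[list]
) -> ParsedOutput:
    # Same idiom as the existing "required" branch: None becomes "".
    content = content or ""
    # Defensive fallback in place of the downstream assert: None becomes [].
    tool_calls = tool_calls or []
    return ParsedOutput(content=content, tool_calls=tool_calls)


print(normalize_forced_tool_output(None, None))
# ParsedOutput(content='', tool_calls=[])
```

For strings, `content or ""` maps both None and "" to "" and passes any non-empty string through unchanged, so the later code only ever sees a str.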

Changes

File	Change
`vllm/entrypoints/openai/engine/serving.py`	Replace 2x `assert content is not None` → `content = content or ""`
`vllm/parser/abstract_parser.py`	Replace 2x `assert content is not None` → `content = content or ""`
`vllm/entrypoints/openai/chat_completion/serving.py`	Replace `assert tool_calls is not None and len(tool_calls) > 0` → `tool_calls = tool_calls or []`

Total: 5 lines changed across 3 files.

How to reproduce

vllm serve <reasoning-model> --reasoning-parser glm45 --enable-auto-tool-choice --tool-call-parser hermes

import openai

client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<model>",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=[{"type": "function", "function": {"name": "get_weather", "parameters": {}}}],
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)
# Crash: AssertionError (no message) when model output is all <think>...</think>

Why this is not a duplicate

PR #40148 fixes engine/serving.py and abstract_parser.py but does not fix the downstream assert in chat_completion/serving.py (line 1417). This PR covers all 5 crash points across all 3 files.

Fixes #40528. Related: #40147, #40148.

Test

The fix is consistent with the existing "required" branch pattern which already uses content = content or "" and tool_calls = tool_calls or [].
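A regression test for this path could look roughly like the sketch below. It is a hypothetical, pure-Python model: `forced_tool_choice_pipeline` is not a vLLM entry point, the "content is the arguments JSON" step is an assumption, and the test encodes the assumption that an all-reasoning output should yield no tool calls rather than a crash. A real test would exercise vLLM's serving layer instead.

```python
import json
import re
from typing import Optional

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def forced_tool_choice_pipeline(raw: str, tool_name: str) -> dict:
    """Hypothetical model: split off reasoning, then build the forced tool call."""
    reasoning = "".join(THINK_RE.findall(raw)) or None
    content: Optional[str] = THINK_RE.sub("", raw).strip() or None
    content = content or ""  # normalization from the fix instead of an assert
    tool_calls = []
    if content:
        # Assumed here: with forced tool_choice, the remaining content is the arguments JSON.
        tool_calls.append({"name": tool_name, "arguments": json.loads(content)})
    return {"reasoning": reasoning, "content": content, "tool_calls": tool_calls}


def test_all_reasoning_output_does_not_crash():
    raw = "<think>Let me check the weather...</think>"
    out = forced_tool_choice_pipeline(raw, "get_weather")
    assert out["tool_calls"] == []
    assert out["reasoning"] == "Let me check the weather..."


def test_arguments_after_reasoning_still_parse():
    raw = '<think>Need the city.</think>{"city": "Beijing"}'
    out = forced_tool_choice_pipeline(raw, "get_weather")
    assert out["tool_calls"] == [
        {"name": "get_weather", "arguments": {"city": "Beijing"}}
    ]


if __name__ == "__main__":
    test_all_reasoning_output_does_not_crash()
    test_arguments_after_reasoning_still_parse()
    print("ok")
```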

AI Disclosure

AI assistance was used in identifying and fixing this bug.

Signed-off-by: liuchenbing

Changed files

vllm/entrypoints/openai/chat_completion/serving.py (modified, +1/-1)
vllm/entrypoints/openai/engine/serving.py (modified, +2/-2)
vllm/parser/abstract_parser.py (modified, +2/-2)

Original Issue

Your current environment

vllm main branch (upstream/main commit 0008729ab)

Bug Description

When using --reasoning-parser (e.g., glm45, deepseek_r1) with forced tool_choice, the server crashes with an AssertionError if the reasoning parser consumes the entire model output into <think>...</think> tags, leaving content as None or an empty string.

This affects two tool_choice modes:

tool_choice: {type: "function", function: {name: "xxx"}} (ToolChoiceFunction path)
tool_choice: {function: {name: "xxx"}} (ChatCompletionNamedToolChoiceParam path)

The "required" mode was partially fixed in a recent commit, but the named-function modes still crash.

How to reproduce

Start vllm with a reasoning model and tool_choice support:

vllm serve <model> --reasoning-parser glm45 --enable-auto-tool-choice --tool-call-parser hermes

Send a request with forced tool_choice where the model wraps all output in <think> tags:

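import openai

# The OpenAI client requires an API key; any placeholder works unless vllm serve was started with --api-key.
client = openai.OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")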

response = client.chat.completions.create(
    model="<model>",
    messages=[{"role": "user", "content": "What's the weather in Beijing?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {"type": "object", "properties": {"city": {"type": "string"}}}
        }
    }],
    tool_choice={"type": "function", "function": {"name": "get_weather"}}
)

If the model produces output like <think>Let me check the weather...</think> with no content outside the think tags, the reasoning parser strips everything into reasoning_content, leaving content = None.
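A toy version of that split makes the failure mode concrete. It is only a sketch, assuming a parser that moves everything inside <think>...</think> into the reasoning field and returns None for content when nothing else remains; `split_reasoning` is hypothetical, and vLLM's glm45 and deepseek_r1 parsers have their own, more involved logic.

```python
from typing import Optional, Tuple


def split_reasoning(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Hypothetical splitter: return (reasoning, content) for one completion."""
    start, end = "<think>", "</think>"
    if start not in text or end not in text:
        # No reasoning block: everything is ordinary content.
        return None, text or None
    head, _, rest = text.partition(start)
    reasoning, _, tail = rest.partition(end)
    content = (head + tail).strip()
    # Mirrors the behavior described in the issue: no text outside the tags -> None.
    return reasoning, content or None


print(split_reasoning("<think>Let me check the weather...</think>"))
# ('Let me check the weather...', None)
print(split_reasoning('<think>Need the city.</think>{"city": "Beijing"}'))
# ('Need the city.', '{"city": "Beijing"}')
```

The first call is the case from the report: content comes back as None, which is exactly what the forced tool_choice asserts reject.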

Error traceback

AssertionError
  File "vllm/entrypoints/openai/engine/serving.py", line 615, in _parse_tool_calls_from_content
    assert content is not None

No descriptive error message — just a bare AssertionError.

Suggested Fix

Replace the assert statements with graceful handling: log a warning and return None for tool_calls, so the response is returned as a plain message that still carries the reasoning content:

In engine/serving.py:

if request.tool_choice and isinstance(request.tool_choice, ToolChoiceFunction):
    if not content:
        logger.warning(
            "Forced tool_choice=%s but content is empty "
            "(reasoning_parser may have consumed all output). "
            "Skipping tool call parsing.",
            request.tool_choice.name,
        )
        return None, content
    # ... original logic
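The warn-and-skip guard can be exercised outside vLLM with the standard logging module. In this sketch `parse_forced_function_call` is a hypothetical function that only models the guard; it returns a (tool_calls, content) pair on the assumption, taken from the snippet above, that the real method returns a two-element tuple, and its non-empty branch is an invented stand-in for "original logic".

```python
import json
import logging
from typing import Optional, Tuple

logger = logging.getLogger("forced_tool_choice_sketch")


def parse_forced_function_call(
    tool_name: str, content: Optional[str]
) -> Tuple[Optional[list], Optional[str]]:
    if not content:
        # Guard from the suggested fix: warn and skip instead of asserting.
        logger.warning(
            "Forced tool_choice=%s but content is empty "
            "(reasoning_parser may have consumed all output). "
            "Skipping tool call parsing.",
            tool_name,
        )
        return None, content
    # Hypothetical stand-in for "... original logic": content is treated as arguments JSON.
    return [{"name": tool_name, "arguments": json.loads(content)}], content


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    # Emits one WARNING line on stderr, then prints (None, None).
    print(parse_forced_function_call("get_weather", None))
```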

In chat_completion/serving.py:

elif type(request.tool_choice) is ChatCompletionNamedToolChoiceParam:
    if not tool_calls:
        message = ChatMessage(role=role, reasoning=reasoning, content=content or "")
    else:
        # ... original tool_call construction logic
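The downstream fallback can be modeled the same way. `ChatMessage` below is a minimal dataclass stand-in with the fields used in the snippet plus a hypothetical `tool_calls` field; it is not vLLM's protocol class, and `build_named_tool_message` is a hypothetical helper whose tool-call branch is invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ChatMessage:
    # Minimal stand-in; vLLM's real ChatMessage is a richer model.
    role: str
    content: str = ""
    reasoning: Optional[str] = None
    tool_calls: list = field(default_factory=list)


def build_named_tool_message(
    role: str,
    reasoning: Optional[str],
    content: Optional[str],
    tool_calls: Optional[list],
) -> ChatMessage:
    if not tool_calls:
        # Fallback from the suggested fix: a plain message that keeps the reasoning.
        return ChatMessage(role=role, reasoning=reasoning, content=content or "")
    # Hypothetical version of "original tool_call construction logic".
    return ChatMessage(
        role=role, reasoning=reasoning, content="", tool_calls=list(tool_calls)
    )


msg = build_named_tool_message("assistant", "Let me check the weather...", None, None)
print(msg.content == "", msg.reasoning)  # True Let me check the weather...
```

Returning a plain message instead of raising keeps the reasoning visible to the client, which is the behavior the suggested fix aims for.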

Affected code

File	Line	Issue
`vllm/entrypoints/openai/engine/serving.py`	615	`assert content is not None` (ToolChoiceFunction)
`vllm/entrypoints/openai/engine/serving.py`	624	`assert content is not None` (ChatCompletionNamedToolChoiceParam)
`vllm/entrypoints/openai/chat_completion/serving.py`	1325	`assert tool_calls is not None and len(tool_calls) > 0`

Before submitting a new issue...

I have searched for similar issues and couldn't find anything relevant.
I have read the documentation.

Extent analysis

TL;DR

Replace the assert statements in vllm/entrypoints/openai/engine/serving.py and vllm/entrypoints/openai/chat_completion/serving.py with graceful handling to log a warning and return None for tool_calls when the reasoning parser consumes all content.

Guidance

Identify the lines of code causing the crash: vllm/entrypoints/openai/engine/serving.py lines 615 and 624, and vllm/entrypoints/openai/chat_completion/serving.py line 1325.
Replace the assert statements with conditional checks to handle the case where content is None or empty.
Log a warning message when the reasoning parser consumes all content, and return None for tool_calls to allow the response to be returned as a plain message with reasoning content.
Verify the fix by reproducing the crash on the unpatched code, applying the changes, and confirming that the same request no longer raises.

Notes

The suggested fix assumes that the reasoning parser consuming all content is a valid scenario and that returning None for tool_calls is the desired behavior. If this is not the case, additional changes may be needed to handle this scenario correctly.

Recommendation

Apply the suggested fix to replace the assert statements with graceful handling, as this will allow the code to handle the case where the reasoning parser consumes all content without crashing.
