vllm - [Bug]: Chat completions emits empty tool_calls arrays after tool results [3 pull requests]

vllm · 2026-05-31 09:57:20

Fix Action

Fixed

Fixed by PR: [BugFix] Omit empty tool_calls from OpenAI chat responses (https://github.com/vllm-project/vllm/pull/44105)
Fixed by PR: [BugFix] Omit empty tool_calls from OpenAI chat responses (https://github.com/vllm-project/vllm-ascend/pull/9791)
Fixed by PR: [BugFix][v0.20.2rc] Omit empty tool_calls from OpenAI chat responses (https://github.com/vllm-project/vllm-ascend/pull/9792)


Your current environment

Observed on both:

vLLM 0.20.2 pinned checkout: bc150f5
vLLM main/latest checkout tested locally: 7bd738988

🐛 Describe the bug

The OpenAI-compatible chat completions API can serialize a normal assistant response with an empty tool_calls array after the client returns a tool result.

The response is semantically a final assistant message:

finish_reason is "stop"
message.content / stream delta.content contains normal text
there is no tool call to execute

But the JSON payload still contains "tool_calls": []. In the OpenAI Python SDK this becomes a non-None message.tool_calls value, so common client loops treat the response as another tool-call response and then fail when indexing the empty list.

Example client failure:

while assistant_output.tool_calls is not None:  # [] is not None, so the loop body runs
    tool_call = assistant_output.tool_calls[0]  # IndexError on an empty list
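Until a fixed server version is deployed, a client can guard against the empty list by testing truthiness instead of `is not None`. The following is a minimal sketch, not the OpenAI SDK's own types: `AssistantMessage`, `ToolCall`, and `has_tool_calls` are hypothetical stand-ins that only mirror the `content` / `tool_calls` attributes discussed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToolCall:
    # Hypothetical stand-in for an SDK tool-call object.
    name: str
    arguments: str


@dataclass
class AssistantMessage:
    # Hypothetical stand-in mirroring message.content / message.tool_calls.
    content: Optional[str] = None
    tool_calls: Optional[List[ToolCall]] = None


def has_tool_calls(message: AssistantMessage) -> bool:
    """Return True only when there is at least one tool call to execute.

    bool() treats both None and [] as "no tool calls", so an empty
    array from the server no longer looks like a tool-call turn.
    """
    return bool(message.tool_calls)


# Shape reported in this issue: final answer plus an empty tool_calls list.
final_message = AssistantMessage(content="...final answer...", tool_calls=[])

passes_is_not_none = final_message.tool_calls is not None  # the fragile check
passes_truthiness = has_tool_calls(final_message)          # the guarded check
```

With this guard, the loop condition becomes `while has_tool_calls(assistant_output):`, which exits on both a missing field and an empty array.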

Actual vLLM payload shape after the tool result:

{
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "...final answer...",
        "tool_calls": []
      }
    }
  ]
}

Expected payload shape:

{
  "choices": [
    {
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "...final answer..."
      }
    }
  ]
}
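The difference between the actual and expected payloads can be expressed as a small normalization step. This is a sketch over plain dictionaries, assuming the response has already been parsed from JSON; `omit_empty_tool_calls` is a hypothetical helper and is not the code from the linked pull requests.

```python
import copy


def omit_empty_tool_calls(response: dict) -> dict:
    """Return a copy of a chat completion response in which any
    message whose tool_calls is empty or None has the key removed.

    Non-empty tool_calls lists are left untouched.
    """
    cleaned = copy.deepcopy(response)  # do not mutate the caller's payload
    for choice in cleaned.get("choices", []):
        message = choice.get("message")
        if isinstance(message, dict) and not message.get("tool_calls"):
            message.pop("tool_calls", None)
    return cleaned


# Payload shape reported in this issue.
actual = {
    "choices": [
        {
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "...final answer...",
                "tool_calls": [],
            },
        }
    ]
}

expected_shape = omit_empty_tool_calls(actual)
```

Working on a deep copy keeps the original payload available for logging or comparison, at the cost of one extra allocation per response.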

The same issue appears in streaming chunks as delta.tool_calls: [] alongside normal text deltas.
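For the streaming case, a client-side filter can drop the empty field from each chunk before it reaches the accumulation logic. This is a sketch assuming chunks have been parsed into dictionaries with a `choices[].delta` layout; `strip_empty_delta_tool_calls` is a hypothetical helper, and it modifies each chunk in place.

```python
from typing import Dict, Iterable, Iterator


def strip_empty_delta_tool_calls(chunks: Iterable[Dict]) -> Iterator[Dict]:
    """Yield each streamed chunk after removing empty delta.tool_calls.

    Deltas that carry real tool-call fragments pass through unchanged.
    """
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if "tool_calls" in delta and not delta["tool_calls"]:
                del delta["tool_calls"]  # in-place removal on the chunk
        yield chunk


# Two text deltas shaped like the ones described in this issue.
stream = [
    {"choices": [{"delta": {"content": "...final ", "tool_calls": []}}]},
    {"choices": [{"delta": {"content": "answer...", "tool_calls": []}}]},
]

filtered = list(strip_empty_delta_tool_calls(stream))
text = "".join(c["choices"][0]["delta"].get("content", "") for c in filtered)
```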

This is not a DeepSeek V4 tool-call parser failure and not a model accuracy issue. The model returns the expected final natural-language answer after the tool result; the API response serializer emits a misleading empty tool-call field.

Before submitting a new issue...

I have searched the existing issues and did not find a duplicate.
