vllm - 💡(How to fix) Fix [Bug]: Not support role tool of image [1 pull requests]

vllm2026-05-20 10:11:24

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

When serving Kimi K2.5/K2.6 with vLLM OpenAI-compatible Chat Completions API, I found that multimodal image_url content is accepted in role: "user" messages, but rejected when the same image_url content is returned from a tool message using role: "tool".

The error says that tool message content only supports text content.

I would like to clarify whether this is expected behavior, and if so, whether vLLM plans to support multimodal tool results for vision-capable models such as Kimi K2.5/K2.6.

Error Message

The error says that tool message content only supports text content.

Root Cause

The error says that tool message content only supports text content.

I would like to clarify whether this is expected behavior, and if so, whether vLLM plans to support multimodal tool results for vision-capable models such as Kimi K2.5/K2.6.

Fix Action

Fixed

Fixed by PR: fix(chat): allow multimodal content in tool messages for vision models (https://github.com/vllm-project/vllm/pull/43216)

Code Example

{
  "role": "tool",
  "tool_call_id": "functions.get_image:0",
  "content": [
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png"
      }
    }
  ]
}

RAW_BUFFERClick to expand / collapse

Your current environment

Title

Kimi K2.5/K2.6 vision model rejects image_url content in role: "tool" messages, although image input works in user messages

Description

The error says that tool message content only supports text content.

I would like to clarify whether this is expected behavior, and if so, whether vLLM plans to support multimodal tool results for vision-capable models such as Kimi K2.5/K2.6.

Reproduction

A tool call flow where the assistant calls a tool, and the tool returns an image:

{
  "role": "tool",
  "tool_call_id": "functions.get_image:0",
  "content": [
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png"
      }
    }
  ]
}

🐛 Describe the bug

{ "role": "tool", "tool_call_id": "functions.get_image:0", "content": [ { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } } ] }

Before submitting a new issue...

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

vllm - 💡(How to fix) Fix [Bug]: Not support role tool of image [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

Code Example

Your current environment

Title

Description

Reproduction

🐛 Describe the bug

Before submitting a new issue...

Still need to ship something?

TRENDING