vllm - 💡(How to fix) Fix [Bug]: Not support role tool of image [1 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When serving Kimi K2.5/K2.6 with vLLM OpenAI-compatible Chat Completions API, I found that multimodal image_url content is accepted in role: "user" messages, but rejected when the same image_url content is returned from a tool message using role: "tool".

The error says that tool message content only supports text content.

I would like to clarify whether this is expected behavior, and if so, whether vLLM plans to support multimodal tool results for vision-capable models such as Kimi K2.5/K2.6.

Error Message

The error says that tool message content only supports text content.

Root Cause

When serving Kimi K2.5/K2.6 with vLLM OpenAI-compatible Chat Completions API, I found that multimodal image_url content is accepted in role: "user" messages, but rejected when the same image_url content is returned from a tool message using role: "tool".

The error says that tool message content only supports text content.

I would like to clarify whether this is expected behavior, and if so, whether vLLM plans to support multimodal tool results for vision-capable models such as Kimi K2.5/K2.6.

Fix Action

Fixed

Code Example

{
  "role": "tool",
  "tool_call_id": "functions.get_image:0",
  "content": [
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png"
      }
    }
  ]
}
RAW_BUFFERClick to expand / collapse

Your current environment

Title

Kimi K2.5/K2.6 vision model rejects image_url content in role: "tool" messages, although image input works in user messages

Description

When serving Kimi K2.5/K2.6 with vLLM OpenAI-compatible Chat Completions API, I found that multimodal image_url content is accepted in role: "user" messages, but rejected when the same image_url content is returned from a tool message using role: "tool".

The error says that tool message content only supports text content.

I would like to clarify whether this is expected behavior, and if so, whether vLLM plans to support multimodal tool results for vision-capable models such as Kimi K2.5/K2.6.

Reproduction

A tool call flow where the assistant calls a tool, and the tool returns an image:

{
  "role": "tool",
  "tool_call_id": "functions.get_image:0",
  "content": [
    {
      "type": "image_url",
      "image_url": {
        "url": "https://example.com/image.png"
      }
    }
  ]
}

🐛 Describe the bug

{ "role": "tool", "tool_call_id": "functions.get_image:0", "content": [ { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } } ] }

Before submitting a new issue...

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

vllm - 💡(How to fix) Fix [Bug]: Not support role tool of image [1 pull requests]