hermes - 💡(How to fix) Fix [Bug] computer_use multimodal tool message causes 400 error on providers that don't support multimodal tool content (e.g. Xiaomi MiMo) [2 pull requests]

hermes · 2026-05-17 08:02:46

When using computer_use with a vision-capable model (e.g. xiaomi/mimo-v2.5), the tool captures a screenshot and returns a _multimodal dict with content as a list containing both text and image_url parts. This list is then set as the content of the role: "tool" message sent to the API.

However, MiMo's API does not accept list-type content in tool messages — it requires content to be a string for role: "tool". This causes a 400 error:

Error code: 400 - {'error': {'code': '400', 'message': 'Param Incorrect', 'param': 'text is not set', 'type': ''}}
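The difference between the two payload shapes can be sketched as follows. This is a minimal illustration, not captured traffic: the tool_call_id, the summary text, the base64 stub, and the has_string_content helper are all placeholders introduced here.

```python
# Illustrative payloads only; every field value below is a placeholder.

# Shape produced for a screenshot result: content is a list of parts.
multimodal_tool_message = {
    "role": "tool",
    "tool_call_id": "call_0",  # hypothetical id
    "content": [
        {"type": "text", "text": "Captured screenshot (placeholder summary)"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
    ],
}

# Shape a string-only provider expects: content is a plain string.
string_tool_message = {
    "role": "tool",
    "tool_call_id": "call_0",
    "content": "Captured screenshot (placeholder summary)",
}


def has_string_content(message: dict) -> bool:
    """Return True when the message content is a plain string."""
    return isinstance(message.get("content"), str)
```

Only the second message passes the check, which matches the requirement described above for role: "tool".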

Root Cause

In run_agent.py, _tool_result_content_for_active_model() (lines 9621-9659) checks _model_supports_vision() to decide whether to pass the multimodal content through or fall back to a text summary.

_model_supports_vision() (lines 9479-9497) calls agent.models_dev.get_model_capabilities(), which checks modalities.input from models.dev. For mimo-v2.5, modalities.input = ['text', 'image', 'audio', 'video'], so supports_vision returns True.

The bug: _model_supports_vision() checks if the model supports images in user messages, but doesn't check if it supports images in tool messages. These are different things:

- Most OpenAI-compatible providers support images in user messages (via content as a list)
- But many providers (including MiMo) require tool message content to be a string, not a list

The OpenAI API spec says tool message content should be a string. Some providers extend this to support multimodal tool messages (Anthropic, GPT-4o), but MiMo does not.
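The distinction between the two capabilities can be sketched like this. It is a minimal sketch under stated assumptions: the Capabilities class, its field names, and the tool_content function are hypothetical and are not taken from the Hermes code base, and the mimo_like values only restate what the issue describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Capabilities:
    # Both field names are hypothetical; they are not Hermes identifiers.
    vision_in_user_messages: bool
    multimodal_in_tool_messages: bool


def tool_content(caps: Capabilities, parts: list, text_summary: str):
    """Pass the parts list through only if tool messages may carry it."""
    if caps.vision_in_user_messages and caps.multimodal_in_tool_messages:
        return parts
    return text_summary


parts = [
    {"type": "text", "text": "screenshot summary"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}},
]

# Assumed values: vision input is reported, list-type tool content is rejected.
mimo_like = Capabilities(vision_in_user_messages=True,
                         multimodal_in_tool_messages=False)
```

With a single vision flag, mimo_like would receive the parts list; checking the second flag as well makes it receive the string summary.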

Fix Action

Fixed

Fixed by PR: fix(run_agent): guard multimodal tool content by provider capability (fixes #27344) (https://github.com/NousResearch/hermes-agent/pull/27351)
Fixed by PR: fix(agent): guard multimodal tool content behind provider profile flag (https://github.com/NousResearch/hermes-agent/pull/27597)

Reproduction

1. Configure Hermes with xiaomi/mimo-v2.5 as the main model
2. Call computer_use(action='capture', mode='som')
3. The tool returns _multimodal content with image
4. _tool_result_content_for_active_model returns the content list (because supports_vision=True)
5. The tool message with list content is sent to MiMo API
6. MiMo API returns 400: text is not set

Suggested Fix

Add a provider/model-level flag for supports_multimodal_tool_content (or similar) that controls whether multimodal content is allowed in tool messages specifically. Providers that don't support it should always receive string content (the text_summary fallback).

Possible approaches:

1. Provider-specific flag: Add supports_multimodal_tool_content = False to the xiaomi provider profile
2. Conservative default: Only use multimodal tool content for providers known to support it (Anthropic, OpenAI), and use text summary for all others
3. Fallback with retry: Send multimodal content first; if it fails, retry with text summary (but this wastes a round-trip)

Option 2 is the safest and most backward-compatible.
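Options 2 and 3 could look roughly like the sketch below. Everything in it is an assumption made for illustration: the allowlist contents simply mirror the providers named above, and BadRequest, send_with_fallback, and strict_send are hypothetical stand-ins rather than Hermes or client-library APIs. The merged PRs may implement the guard differently.

```python
# Hypothetical allowlist; the entries mirror the providers named in the issue.
MULTIMODAL_TOOL_PROVIDERS = frozenset({"anthropic", "openai"})


def allows_multimodal_tool_content(provider: str) -> bool:
    """Option 2: unknown providers default to string content."""
    return provider.lower() in MULTIMODAL_TOOL_PROVIDERS


class BadRequest(Exception):
    """Stand-in for whatever a client raises on HTTP 400."""


def send_with_fallback(send, parts, text_summary):
    """Option 3: try the parts list first, retry once with the summary."""
    try:
        return send(parts)
    except BadRequest:
        # Costs one extra round-trip, as noted in the list above.
        return send(text_summary)


def strict_send(content):
    # Fake provider that rejects non-string tool content.
    if not isinstance(content, str):
        raise BadRequest("text is not set")
    return "ok"
```

For example, allows_multimodal_tool_content("xiaomi") is False, so a caller using the allowlist would never send the list at all, while send_with_fallback(strict_send, parts, summary) only succeeds on its second attempt.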

Related

A related MiMo compatibility problem was previously reported in GitHub issue #27325 (MiMo thinking parameter bug). That issue is about reasoning_content being stripped, so it is a different bug from this one.
The _tool_result_content_for_active_model method already has the right fallback logic for non-vision models (lines 9640-9659), but that fallback is not applied to vision-capable models that don't support multimodal tool messages.
