hermes - ✅(Solved) Fix [models.dev] mimo-v2.5-pro incorrectly marked as attachment:true — mimo-2.5-pro is text-only, mimo-2.5 is the omnimodal model [1 pull requests, 1 participants]

hermes · 2026-05-02 16:09:09

GitHub stats

NousResearch/hermes-agent#18884 · Fetched 2026-05-03 04:53:44

Author

dewu0224

Participants

dewu0224

Timeline (top)

labeled ×4, cross-referenced ×1

Fix Action

Workaround

Users can set agent.image_input_mode: text in config.yaml to force the text pipeline (vision_analyze pre-analysis) instead of native image attachment.

PR fix notes

PR #18889: fix(models-dev): let modalities.input take precedence over attachment flag

Repository: NousResearch/hermes-agent
Author: liuhao1024
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/18889

Description (problem / solution / changelog)

Problem

The models.dev data marks mimo-v2.5-pro on xiaomi-token-plan-cn with attachment: true, but the model's modalities.input only contains ["text"]. mimo-v2.5-pro is a text-only model — the actual omnimodal model is mimo-2.5.

The current code in get_model_capabilities() uses OR logic:

supports_vision = bool(entry.get("attachment", False)) or "image" in input_mods

This means attachment: true overrides the explicit modalities.input, causing image_routing.py to select native mode and send base64 images to a text-only model that silently drops them.

Fix

When modalities.input is explicitly provided and non-empty, it takes precedence over the attachment flag:

if input_mods:
    supports_vision = "image" in input_mods
else:
    supports_vision = bool(entry.get("attachment", False))

Applied to both get_model_capabilities() and ModelEntry.supports_vision().
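To make the change concrete, here is a minimal, self-contained sketch that contrasts the two rules on the entries quoted in this issue. The helper names `supports_vision_old` and `supports_vision_new` are hypothetical. The sketch also assumes that `input_mods` is read from `entry["modalities"]["input"]`. The actual code in agent/models_dev.py may differ.

```python
from typing import Any


def _input_mods(entry: dict[str, Any]) -> list[str]:
    # Assumption: input modalities live at entry["modalities"]["input"].
    return list((entry.get("modalities") or {}).get("input") or [])


def supports_vision_old(entry: dict[str, Any]) -> bool:
    # Pre-PR behaviour: attachment OR "image" in modalities.input.
    return bool(entry.get("attachment", False)) or "image" in _input_mods(entry)


def supports_vision_new(entry: dict[str, Any]) -> bool:
    # Rule described in PR #18889: a non-empty modalities.input wins.
    input_mods = _input_mods(entry)
    if input_mods:
        return "image" in input_mods
    # No explicit modalities: fall back to the attachment flag.
    return bool(entry.get("attachment", False))


# Entries as quoted in the issue (xiaomi-token-plan-cn vs. main xiaomi API).
token_plan = {"attachment": True, "modalities": {"input": ["text"]}}
main_api = {
    "attachment": True,
    "modalities": {"input": ["text", "image", "audio", "video", "pdf"]},
}
# Hypothetical entry without modalities, to show the fallback path.
no_modalities = {"attachment": True}

for name, entry in [("token_plan", token_plan), ("main_api", main_api),
                    ("no_modalities", no_modalities)]:
    print(name, supports_vision_old(entry), supports_vision_new(entry))
```

Running it prints `token_plan True False`, `main_api True True` and `no_modalities True True`. Only the text-only Token Plan entry changes classification. Entries that omit modalities.input keep the old attachment-based behaviour.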

Test

Added test_modalities_input_text_only_overrides_attachment covering the exact scenario from #18884.

Fixes

Fixes #18884

Changed files

agent/models_dev.py (modified, +14/-4)
tests/agent/test_models_dev.py (modified, +23/-0)


Problem

The models.dev data marks mimo-v2.5-pro with attachment: true on the xiaomi-token-plan-cn provider, but mimo-2.5-pro is a text-only model. The actual omnimodal model is mimo-2.5 (supports image/audio/video/pdf).

This causes image_routing.py to choose "native" mode for mimo-v2.5-pro, sending base64 images directly to the model. The model silently drops the image data since it does not support multimodal input, and the user receives a response with no image understanding.

Evidence

Fresh data from https://models.dev/api.json:

xiaomi-token-plan-cn / mimo-v2.5-pro:

"mimo-v2.5-pro": {
  "attachment": true,
  "modalities": {
    "input": ["text"]
  }
}

xiaomi-token-plan-cn / mimo-v2.5 (if listed; this is the actual multimodal model): the attachment: true flag belongs on mimo-2.5, not on mimo-2.5-pro.

xiaomi (main API) / mimo-v2.5-pro:

"mimo-v2.5-pro": {
  "attachment": true,
  "modalities": {
    "input": ["text", "image", "audio", "video", "pdf"]
  }
}

The main xiaomi API entry lists full modalities for mimo-v2.5-pro, which may be correct for that endpoint. However, the xiaomi-token-plan-cn entry has only ["text"] in modalities.input and still sets attachment: true.

Model clarification

Model	Type	Vision	Tool calling	Reasoning
mimo-2.5-pro	Text-only	❌	✅	✅
mimo-2.5	Omnimodal	✅ (image/audio/video/pdf)	✅	✅

The Token Plan API (token-plan-cn.xiaomimimo.com) provides access to both models, but they have different capabilities. The current data incorrectly treats mimo-2.5-pro as multimodal.

Impact

In agent/image_routing.py, decide_image_input_mode() checks supports_vision, which is derived from the attachment flag in the models.dev data. Because mimo-2.5-pro has attachment: true, the router selects "native" mode and sends images as base64 content parts. The model ignores the image data, so the user gets no image understanding and the image is silently lost.
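The following sketch shows how the routing decision described above could combine a capability flag with the `agent.image_input_mode` setting from the workaround. It is only an illustration. The signature of `decide_image_input_mode` is hypothetical, and the real function in agent/image_routing.py may take different arguments. Treating `"native"` as a valid config value, and using capability-based routing when no mode is set, are also assumptions.

```python
from typing import Optional


def decide_image_input_mode(supports_vision: bool,
                            configured_mode: Optional[str] = None) -> str:
    """Hypothetical sketch: choose "native" or "text" image handling.

    configured_mode stands in for agent.image_input_mode in config.yaml.
    A value of "text" forces the vision_analyze pre-analysis pipeline
    regardless of what the models.dev data claims.
    """
    if configured_mode in ("native", "text"):
        # An explicit user setting wins over the capability flag.
        return configured_mode
    # Assumed default: trust the capability flag from models.dev.
    return "native" if supports_vision else "text"


# With the bad data, mimo-v2.5-pro on Token Plan resolves to
# supports_vision=True, so images are sent as base64 parts and dropped.
print(decide_image_input_mode(supports_vision=True))          # native
# Workaround: force the text pipeline via config.
print(decide_image_input_mode(True, configured_mode="text"))  # text
```

The sketch shows why the workaround is effective. An explicit `text` setting bypasses the capability flag entirely, so incorrect models.dev data can no longer route images to a text-only model.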

Suggested fix

Set attachment: false for mimo-2.5-pro on xiaomi-token-plan-cn (and other Token Plan providers), matching its modalities.input of ["text"].
Ensure that mimo-2.5 (if listed) has attachment: true with full modalities.
If the main xiaomi API entry also marks mimo-2.5-pro incorrectly, fix that as well.

Workaround

Users can set agent.image_input_mode: text in config.yaml to force the text pipeline (vision_analyze pre-analysis) instead of native image attachment.

extent analysis

TL;DR

Update the models.dev data to set attachment: false for mimo-v2.5-pro on xiaomi-token-plan-cn so that the entry reflects the model's text-only input.

Guidance

Verify the models.dev data for xiaomi-token-plan-cn and other Token Plan providers to ensure mimo-v2.5-pro has attachment: false.
Check whether mimo-2.5 is listed and, if so, that it has attachment: true with full modalities so that multimodal input is routed to it.
Consider updating the main xiaomi API entry for mimo-v2.5-pro to reflect its correct capabilities, if necessary.
As a temporary workaround, users can set agent.image_input_mode: text in config.yaml to force the text pipeline.
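The verification step above can be automated with a small audit that flags any entry whose attachment flag is true while its non-empty modalities.input lacks "image". This is a sketch that assumes api.json maps each provider id to an object with a `models` mapping, as the excerpts quoted earlier suggest. `load_models_dev` and `find_inconsistent_entries` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any, Iterator

API_URL = "https://models.dev/api.json"


def load_models_dev(url: str = API_URL) -> dict[str, Any]:
    # Fetch and parse the models.dev catalogue (requires network access).
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def find_inconsistent_entries(
    data: dict[str, Any],
) -> Iterator[tuple[str, str, list[str]]]:
    """Yield (provider, model, input_mods) for entries where attachment is
    true but the non-empty modalities.input list does not include "image"."""
    for provider_id, provider in data.items():
        # Assumption: each provider object holds its models under "models".
        models = provider.get("models", {}) if isinstance(provider, dict) else {}
        for model_id, entry in models.items():
            input_mods = (entry.get("modalities") or {}).get("input") or []
            if entry.get("attachment") and input_mods and "image" not in input_mods:
                yield provider_id, model_id, input_mods
```

For example, `for row in find_inconsistent_entries(load_models_dev()): print(row)` lists every suspect entry. Against the data quoted in this issue, it would report xiaomi-token-plan-cn / mimo-v2.5-pro with input `["text"]`.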

Example

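No code snippet is needed for the data correction itself, since it only involves editing the models.dev data. Separately, PR #18889 changes agent/models_dev.py so that a non-empty modalities.input takes precedence over the attachment flag.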

Notes

The suggested fix assumes that the models.dev data is the source of the issue. If the problem persists after updating the data, further investigation may be necessary.

Recommendation

Apply the workaround by setting agent.image_input_mode: text in config.yaml until the models.dev data is corrected or PR #18889 is merged. This ensures that images are not silently dropped by the model.

