openclaw - 💡(How to fix) Fix [Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model



[Feature] Add MiMo-V2.5 to Xiaomi catalog + automatic multimodal routing when DeepSeek V4-Pro is primary model

Summary

Two related asks in one issue:

  1. Add xiaomi/mimo-v2.5 to the built-in Xiaomi provider catalog. It was released on April 22, 2026 and supersedes mimo-v2-omni across every multimodal benchmark.
  2. Auto-route to a multimodal model when the primary model is text-only and the incoming message contains image / video / audio attachments, so users running DeepSeek V4-Pro as their default never have to manually switch models.

Why MiMo-V2.5 instead of MiMo-V2-Omni

| Property | mimo-v2-omni | mimo-v2.5 |
| --- | --- | --- |
| Parameters | | 310B total / 15B active (sparse MoE) |
| Modalities | text, image | text, image, video, audio |
| Context window | 262,144 | 1,048,576 |
| Max output | 32,000 | 131,072 |
| Reasoning | | |
| Input price | | $0.40 / 1M tokens |
| Output price | | $2.00 / 1M tokens |
| API model id | mimo-v2-omni | mimo-v2.5 |
| Base URL | https://api.xiaomimimo.com/v1 | same |

MiMo-V2.5 trains its image, video, and audio encoders jointly from the start rather than bolting them on as adapters. It also beats MiMo-V2-Omni on Video-MME (87.7) and MMMU-Pro (77.9), and runs at roughly half the cost of MiMo-V2-Pro.


Requested catalog entry

Add to src/providers/xiaomi/catalog.ts (or equivalent):

{
 id: "mimo-v2.5",
 name: "Xiaomi MiMo V2.5",
 reasoning: true,
 input: ["text", "image", "video", "audio"],
 cost: { input: 0.40, output: 2.00, cacheRead: 0.04, cacheWrite: 0 },
 contextWindow: 1_048_576,
 maxTokens: 131_072,
},

The existing mimo-v2-omni entry should be kept for backward compatibility but marked as superseded in the docs.
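To make the cost fields of the requested entry concrete, here is a minimal sketch that types the entry and estimates per-request spend. The `CatalogEntry` interface and the `estimateCostUsd` helper are hypothetical names introduced for illustration, and the sketch assumes the `cost` values are USD per 1M tokens, as the comparison table suggests. The real types in the Xiaomi catalog may differ.

```typescript
// Hypothetical types; field names mirror the requested catalog entry.
type Modality = "text" | "image" | "video" | "audio";

interface CatalogEntry {
  id: string;
  name: string;
  reasoning: boolean;
  input: Modality[];
  // Assumption: all prices are USD per 1M tokens.
  cost: { input: number; output: number; cacheRead: number; cacheWrite: number };
  contextWindow: number;
  maxTokens: number;
}

const mimoV25: CatalogEntry = {
  id: "mimo-v2.5",
  name: "Xiaomi MiMo V2.5",
  reasoning: true,
  input: ["text", "image", "video", "audio"],
  cost: { input: 0.40, output: 2.00, cacheRead: 0.04, cacheWrite: 0 },
  contextWindow: 1_048_576,
  maxTokens: 131_072,
};

// Estimate the cost of one uncached request in USD.
function estimateCostUsd(entry: CatalogEntry, inputTokens: number, outputTokens: number): number {
  return (inputTokens * entry.cost.input + outputTokens * entry.cost.output) / 1_000_000;
}
```

Under that pricing assumption, a request with 100,000 input tokens and 10,000 output tokens would cost about $0.06 (`estimateCostUsd(mimoV25, 100_000, 10_000)`).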


Requested behavior: automatic multimodal routing

Problem

When a user sets deepseek/deepseek-v4-pro (or any text-only model) as primary:

// openclaw.config.json
{
 "agents": {
 "defaults": {
 "model": { "primary": "deepseek/deepseek-v4-pro" }
 }
 }
}

…and then attaches an image, video, or audio file to a message, OpenClaw currently either errors out or strips the attachment silently.

Proposed behavior

OpenClaw should inspect the input capability array of the resolved primary model before dispatching. If the message contains attachment types the primary model does not declare support for, it should automatically re-route that single turn to a configured multimodal fallback, then resume the primary model on subsequent text-only turns.

Suggested config surface:

{
 "agents": {
 "defaults": {
 "model": {
 "primary": "deepseek/deepseek-v4-pro",
 "multimodalFallback": "xiaomi/mimo-v2.5"
 }
 }
 }
}
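The suggested config surface could be read along the following lines. This is only a sketch: `AgentModelConfig`, `parseModelRef`, and `resolveModelConfig` are hypothetical names, and splitting on the first `/` is an assumption inferred from the `provider/model-id` shape of the examples in this issue.

```typescript
// Hypothetical shape of agents.defaults.model with the proposed key.
interface AgentModelConfig {
  primary: string;
  multimodalFallback?: string; // proposed, optional
}

interface ModelRef {
  provider: string;
  modelId: string;
}

// Assumption: references look like "provider/model-id" and split on the first slash.
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Invalid model reference: ${ref}`);
  }
  return { provider: ref.slice(0, slash), modelId: ref.slice(slash + 1) };
}

// Resolve both references up front so a malformed fallback fails at load time,
// not in the middle of a conversation.
function resolveModelConfig(cfg: AgentModelConfig): { primary: ModelRef; multimodalFallback?: ModelRef } {
  return {
    primary: parseModelRef(cfg.primary),
    multimodalFallback:
      cfg.multimodalFallback === undefined ? undefined : parseModelRef(cfg.multimodalFallback),
  };
}
```

Keeping `multimodalFallback` optional means existing configs that only set `primary` would continue to load unchanged.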

Routing rules

| Primary model supports | Message contains | Action |
| --- | --- | --- |
| ["text"] | text only | dispatch to primary ✅ |
| ["text"] | image / video / audio | re-route turn to multimodalFallback |
| ["text", "image"] | image | dispatch to primary ✅ |
| ["text", "image"] | video or audio | re-route turn to multimodalFallback |
| ["text", "image", "video", "audio"] | anything | dispatch to primary ✅ |
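The routing rules above reduce to a set-difference check between what the message contains and what the primary model declares. A minimal sketch follows, with `routeTurn` and `RoutingDecision` as hypothetical names. The behavior when no fallback is configured is not specified in this issue; the sketch assumes the turn stays on the primary and reports the unsupported modalities so the caller can decide what to do.

```typescript
type Modality = "text" | "image" | "video" | "audio";

interface RoutingDecision {
  model: string;       // model reference the turn should be dispatched to
  rerouted: boolean;   // true when the multimodal fallback was chosen
  missing: Modality[]; // modalities the primary does not declare
}

function routeTurn(
  primary: { ref: string; input: Modality[] },
  messageModalities: Modality[],
  multimodalFallback?: string,
): RoutingDecision {
  const supported = new Set<Modality>(primary.input);
  const missing = messageModalities.filter((m) => !supported.has(m));
  // Assumption: without a configured fallback, stay on the primary.
  if (missing.length === 0 || multimodalFallback === undefined) {
    return { model: primary.ref, rerouted: false, missing };
  }
  return { model: multimodalFallback, rerouted: true, missing };
}
```

For example, `routeTurn({ ref: "deepseek/deepseek-v4-pro", input: ["text"] }, ["text", "video"], "xiaomi/mimo-v2.5")` selects the fallback, while the same call with `["text"]` stays on the primary.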

The multimodal fallback response should be injected into the conversation history as an assistant message attributed to the primary model alias, so the context is seamless.
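One way to picture the history injection is sketched below. The `HistoryMessage` shape, the `servedBy` audit field, and `appendFallbackReply` are all hypothetical; OpenClaw's real history type is not shown in this issue.

```typescript
// Hypothetical history record.
interface HistoryMessage {
  role: "user" | "assistant";
  content: string;
  model?: string;    // alias the conversation sees
  servedBy?: string; // hypothetical: which model actually produced the reply
}

// Append the fallback's reply as an assistant message attributed to the
// primary alias. Returns a new array and leaves the input untouched.
function appendFallbackReply(
  history: HistoryMessage[],
  replyText: string,
  primaryAlias: string,
  servedBy: string,
): HistoryMessage[] {
  return [...history, { role: "assistant", content: replyText, model: primaryAlias, servedBy }];
}
```

Recording the real producer in a separate field is a design suggestion, not part of the request: it keeps the context seamless for the primary model while leaving a trace for debugging and cost accounting.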

Why not just use failover?

Failover fires on error. This needs to fire before the request is sent, based on input type inspection — a different code path.
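The difference between the two code paths can be sketched as follows. `dispatchTurn`, the `send` callback, and the `failover` option are hypothetical stand-ins: the capability check picks the target before anything is sent, and error-triggered failover only runs if that send throws.

```typescript
type Modality = "text" | "image" | "video" | "audio";

interface DispatchOptions {
  primary: string;
  primaryInput: Modality[];
  multimodalFallback: string;
  failover: string;               // hypothetical error-failover target
  attachments: Modality[];
  send: (model: string) => string; // hypothetical transport; throws on provider error
}

function dispatchTurn(opts: DispatchOptions): string {
  // Path 1: pre-dispatch inspection. No request has been made yet.
  const supported = new Set<Modality>(opts.primaryInput);
  const needsReroute = opts.attachments.some((m) => !supported.has(m));
  const target = needsReroute ? opts.multimodalFallback : opts.primary;
  try {
    return opts.send(target);
  } catch {
    // Path 2: failover. Only reached after a request has already failed.
    return opts.send(opts.failover);
  }
}
```

With a text-only primary and a video attachment, `send` is never called with the primary model, which is the point of the request: the unsupported input never reaches the text-only provider.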


Minimal reproducible config (for testing)

{
 "env": {
 "DEEPSEEK_API_KEY": "sk-...",
 "XIAOMI_API_KEY": "xm-..."
 },
 "agents": {
 "defaults": {
 "model": {
 "primary": "deepseek/deepseek-v4-pro",
 "multimodalFallback": "xiaomi/mimo-v2.5"
 }
 }
 },
 "models": {
 "mode": "merge",
 "providers": {
 "deepseek": {
 "baseUrl": "https://api.deepseek.com/v1",
 "api": "openai-completions",
 "apiKey": "***"
 },
 "xiaomi": {
 "baseUrl": "https://api.xiaomimimo.com/v1",
 "api": "openai-completions",
 "apiKey": "***",
 "models": [
 {
 "id": "mimo-v2.5",
 "name": "Xiaomi MiMo V2.5",
 "reasoning": true,
 "input": ["text", "image", "video", "audio"],
 "cost": { "input": 0.40, "output": 2.00, "cacheRead": 0.04, "cacheWrite": 0 },
 "contextWindow": 1048576,
 "maxTokens": 131072
 }
 ]
 }
 }
 }
}

Expected behavior when a user sends a message with a video attached:

  • Turn is dispatched to xiaomi/mimo-v2.5
  • Response is returned normally in the same conversation thread
  • Next text-only message resumes on deepseek/deepseek-v4-pro
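The expected sequence follows if the routing decision is made per turn and carries no state between turns. A small sketch, using a hypothetical `modelsForConversation` helper in which each turn is the list of modalities the message contains:

```typescript
type Modality = "text" | "image" | "video" | "audio";

// Decide the target model for every turn independently of earlier turns.
function modelsForConversation(
  turns: Modality[][],
  primary: { ref: string; input: Modality[] },
  multimodalFallback: string,
): string[] {
  const supported = new Set<Modality>(primary.input);
  return turns.map((modalities) =>
    modalities.every((m) => supported.has(m)) ? primary.ref : multimodalFallback,
  );
}

const models = modelsForConversation(
  [["text"], ["text", "video"], ["text"]],
  { ref: "deepseek/deepseek-v4-pro", input: ["text"] },
  "xiaomi/mimo-v2.5",
);
```

Here `models` lists the primary, then the fallback for the video turn, then the primary again.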


Checklist

  • Add mimo-v2.5 to built-in Xiaomi catalog (input: ["text","image","video","audio"])
  • Add mimo-v2.5 to onboarding model selection UI
  • Implement multimodalFallback key in agent model config
  • Pre-dispatch input-type inspection (not error-triggered failover)
  • Inject fallback response into history under primary model alias
  • Update https://docs.openclaw.ai/providers/xiaomi with V2.5 catalog row
  • Add migration note: V2.5 supersedes V2-Omni for new installs
