openclaw - 💡(How to fix) Fix Native Xiaomi MiMo Ecosystem Integration (Image, TTS, Full Provider Support) [1 comments, 1 participants]

M-Lietz · 2026-03-25T08:44:10Z

[openclaw] Request for native, first-class support of the Xiaomi MiMo AI ecosystem in OpenClaw, including image understanding, TTS, and full provider integrati… Request for native, first-class support of the Xiaomi MiMo AI ecosystem in OpenClaw, including image understanding, TTS, and full provider integration — similar to how Anthropic, OpenAI, and Google are supported. ## Feature Request: Native Xiaomi MiMo Ecosystem Integration ### Summary Request for native, first-class support of the Xiaomi MiMo AI ecosystem in OpenClaw, including image understanding, TTS, and full provider integration — similar to how Anthropic, OpenAI, and Google are supported. ### Background Xiaomi's MiMo platform (platform.xiaomimimo.com) offers an OpenAI-compatible API with several models: - **MiMo-V2-Pro** — Primary reasoning model (1M context) - **MiMo-V2-Flash** — Budget reasoning model (262K context) - **MiMo-V2-Omni** — Multimodal model (text + image understanding) - **MiMo-V2-TTS** — Text-to-speech model The API base URL is `https://api.xiaomimimo.com/v1` and uses standard OpenAI-compatible endpoints (`/chat/completions`, `/audio/speech`). ### Current Status in OpenClaw - ✅ **Text completion** — Works via OpenRouter or direct Xiaomi provider - ✅ **Model configuration** — All 4 models registered in models.providers.xiaomi - ✅ **Aliases** — mimo-omni, mimo-direct, mimo-flash-direct, etc. - ❌ **Image understanding** — image tool fails with 'No media-understanding provider registered for openrouter' even though agents.defaults.imageModel is set - ❌ **TTS** — MiMo-V2-TTS is not available as a TTS provider option - ❌ **Native provider recognition** — Xiaomi multimodal models not recognized by image tool ### What's Needed #### 1. Image Understanding Support (HIGH PRIORITY) Add Xiaomi as a recognized image-understanding provider in the image tool's provider routing logic. Direct API calls work via curl. #### 2. TTS Provider Integration (MEDIUM PRIORITY) Add Xiaomi TTS (mimo-v2-tts) as a TTS provider option in messages.tts, supporting model selection and voice options. #### 3. Provider Metadata Enhancement Ensure provider metadata system correctly identifies Xiaomi models' multimodal capabilities when input includes ['text', 'image']. ### Environment - OpenClaw version: 2026.3.23-2 - Xiaomi API: https://api.xiaomimimo.com/v1 (OpenAI-compatible) - All 4 MiMo models confirmed working via direct API calls ### Benefits - Cost-effective image analysis (MiMo-V2-Omni: /bin/bash.40/.00 per 1M tokens) - Native TTS option - Complete Xiaomi ecosystem integration - First-class MiMo support in OpenClaw

openclaw2026-03-25 08:44:10

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#54367•Fetched 2026-04-08 01:28:30

View on GitHub

Comments

Participants

Timeline

Reactions

Author

M-Lietz

Participants

M-Lietz

Timeline (top)

closed ×1commented ×1locked ×1

Request for native, first-class support of the Xiaomi MiMo AI ecosystem in OpenClaw, including image understanding, TTS, and full provider integration — similar to how Anthropic, OpenAI, and Google are supported.

Root Cause

RAW_BUFFERClick to expand / collapse

Feature Request: Native Xiaomi MiMo Ecosystem Integration

Summary

Background

Xiaomi's MiMo platform (platform.xiaomimimo.com) offers an OpenAI-compatible API with several models:

MiMo-V2-Pro — Primary reasoning model (1M context)
MiMo-V2-Flash — Budget reasoning model (262K context)
MiMo-V2-Omni — Multimodal model (text + image understanding)
MiMo-V2-TTS — Text-to-speech model

The API base URL is https://api.xiaomimimo.com/v1 and uses standard OpenAI-compatible endpoints (/chat/completions, /audio/speech).

Current Status in OpenClaw

✅ Text completion — Works via OpenRouter or direct Xiaomi provider
✅ Model configuration — All 4 models registered in models.providers.xiaomi
✅ Aliases — mimo-omni, mimo-direct, mimo-flash-direct, etc.
❌ Image understanding — image tool fails with 'No media-understanding provider registered for openrouter' even though agents.defaults.imageModel is set
❌ TTS — MiMo-V2-TTS is not available as a TTS provider option
❌ Native provider recognition — Xiaomi multimodal models not recognized by image tool

What's Needed

1. Image Understanding Support (HIGH PRIORITY)

Add Xiaomi as a recognized image-understanding provider in the image tool's provider routing logic. Direct API calls work via curl.

2. TTS Provider Integration (MEDIUM PRIORITY)

Add Xiaomi TTS (mimo-v2-tts) as a TTS provider option in messages.tts, supporting model selection and voice options.

3. Provider Metadata Enhancement

Ensure provider metadata system correctly identifies Xiaomi models' multimodal capabilities when input includes ['text', 'image'].

Environment

OpenClaw version: 2026.3.23-2
Xiaomi API: https://api.xiaomimimo.com/v1 (OpenAI-compatible)
All 4 MiMo models confirmed working via direct API calls

Benefits

Cost-effective image analysis (MiMo-V2-Omni: /bin/bash.40/.00 per 1M tokens)
Native TTS option
Complete Xiaomi ecosystem integration
First-class MiMo support in OpenClaw

extent analysis

Fix Plan

To integrate the Xiaomi MiMo ecosystem into OpenClaw, we need to make the following changes:

1. Image Understanding Support

Register Xiaomi as an image-understanding provider in the image tool's provider routing logic.
Update the image_tool.py file to include Xiaomi in the providers dictionary:

providers = {
    # ... existing providers ...
    'xiaomi': {
        'api_url': 'https://api.xiaomimimo.com/v1',
        'model': 'MiMo-V2-Omni'
    }
}

Update the image_tool function to use the Xiaomi provider for image understanding:

def image_tool(input_image):
    # ... existing code ...
    if provider == 'xiaomi':
        response = requests.post(
            f"{providers['xiaomi']['api_url']}/chat/completions",
            json={'model': providers['xiaomi']['model'], 'prompt': input_image}
        )
        # ... process response ...

2. TTS Provider Integration

Add Xiaomi TTS as a TTS provider option in messages.tts.
Update the tts_providers dictionary to include Xiaomi:

tts_providers = {
    # ... existing providers ...
    'xiaomi': {
        'api_url': 'https://api.xiaomimimo.com/v1',
        'model': 'MiMo-V2-TTS'
    }
}

Update the tts function to use the Xiaomi provider for text-to-speech:

def tts(input_text):
    # ... existing code ...
    if provider == 'xiaomi':
        response = requests.post(
            f"{tts_providers['xiaomi']['api_url']}/audio/speech",
            json={'model': tts_providers['xiaomi']['model'], 'prompt': input_text}
        )
        # ... process response ...

3. Provider Metadata Enhancement

Update the provider metadata system to correctly identify Xiaomi models' multimodal capabilities.
Add a new field to the provider_metadata dictionary to indicate multimodal support:

provider_metadata = {
    # ... existing providers ...
    'xiaomi': {
        'multimodal': True
    }
}

Update the get_provider_metadata function to return the multimodal field:

def get_provider_metadata(provider):
    # ... existing code ...
    if provider == 'xiaomi':
        return {'multimodal': True}

Verification

To verify that the fixes worked, test the following scenarios:

Image understanding: Use the image_tool function with a test image and verify that the response is correct.
TTS: Use the tts function with a test input text and verify that the audio response is correct.
Provider metadata: Verify that the

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #configuration error #environment variable #network issue #logging issue

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix Native Xiaomi MiMo Ecosystem Integration (Image, TTS, Full Provider Support) [1 comments, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Feature Request: Native Xiaomi MiMo Ecosystem Integration

Summary

Background

Current Status in OpenClaw

What's Needed

1. Image Understanding Support (HIGH PRIORITY)

2. TTS Provider Integration (MEDIUM PRIORITY)

3. Provider Metadata Enhancement

Environment

Benefits

extent analysis

Fix Plan

1. Image Understanding Support

2. TTS Provider Integration

3. Provider Metadata Enhancement

Verification

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix Native Xiaomi MiMo Ecosystem Integration (Image, TTS, Full Provider Support) [1 comments, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Feature Request: Native Xiaomi MiMo Ecosystem Integration

Summary

Background

Current Status in OpenClaw

What's Needed

1. Image Understanding Support (HIGH PRIORITY)

2. TTS Provider Integration (MEDIUM PRIORITY)

3. Provider Metadata Enhancement

Environment

Benefits

extent analysis

Fix Plan

1. Image Understanding Support

2. TTS Provider Integration

3. Provider Metadata Enhancement

Verification

Still need to ship something?

RELATED_DISCOVERY

TRENDING