openclaw - 💡(How to fix) Fix Native Xiaomi MiMo Ecosystem Integration (Image, TTS, Full Provider Support) [1 comments, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#54367Fetched 2026-04-08 01:28:30
View on GitHub
Comments
1
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
closed ×1commented ×1locked ×1

Request for native, first-class support of the Xiaomi MiMo AI ecosystem in OpenClaw, including image understanding, TTS, and full provider integration — similar to how Anthropic, OpenAI, and Google are supported.

Root Cause

Request for native, first-class support of the Xiaomi MiMo AI ecosystem in OpenClaw, including image understanding, TTS, and full provider integration — similar to how Anthropic, OpenAI, and Google are supported.

RAW_BUFFERClick to expand / collapse

Feature Request: Native Xiaomi MiMo Ecosystem Integration

Summary

Request for native, first-class support of the Xiaomi MiMo AI ecosystem in OpenClaw, including image understanding, TTS, and full provider integration — similar to how Anthropic, OpenAI, and Google are supported.

Background

Xiaomi's MiMo platform (platform.xiaomimimo.com) offers an OpenAI-compatible API with several models:

  • MiMo-V2-Pro — Primary reasoning model (1M context)
  • MiMo-V2-Flash — Budget reasoning model (262K context)
  • MiMo-V2-Omni — Multimodal model (text + image understanding)
  • MiMo-V2-TTS — Text-to-speech model

The API base URL is https://api.xiaomimimo.com/v1 and uses standard OpenAI-compatible endpoints (/chat/completions, /audio/speech).

Current Status in OpenClaw

  • Text completion — Works via OpenRouter or direct Xiaomi provider
  • Model configuration — All 4 models registered in models.providers.xiaomi
  • Aliases — mimo-omni, mimo-direct, mimo-flash-direct, etc.
  • Image understanding — image tool fails with 'No media-understanding provider registered for openrouter' even though agents.defaults.imageModel is set
  • TTS — MiMo-V2-TTS is not available as a TTS provider option
  • Native provider recognition — Xiaomi multimodal models not recognized by image tool

What's Needed

1. Image Understanding Support (HIGH PRIORITY)

Add Xiaomi as a recognized image-understanding provider in the image tool's provider routing logic. Direct API calls work via curl.

2. TTS Provider Integration (MEDIUM PRIORITY)

Add Xiaomi TTS (mimo-v2-tts) as a TTS provider option in messages.tts, supporting model selection and voice options.

3. Provider Metadata Enhancement

Ensure provider metadata system correctly identifies Xiaomi models' multimodal capabilities when input includes ['text', 'image'].

Environment

  • OpenClaw version: 2026.3.23-2
  • Xiaomi API: https://api.xiaomimimo.com/v1 (OpenAI-compatible)
  • All 4 MiMo models confirmed working via direct API calls

Benefits

  • Cost-effective image analysis (MiMo-V2-Omni: /bin/bash.40/.00 per 1M tokens)
  • Native TTS option
  • Complete Xiaomi ecosystem integration
  • First-class MiMo support in OpenClaw

extent analysis

Fix Plan

To integrate the Xiaomi MiMo ecosystem into OpenClaw, we need to make the following changes:

1. Image Understanding Support

  • Register Xiaomi as an image-understanding provider in the image tool's provider routing logic.
  • Update the image_tool.py file to include Xiaomi in the providers dictionary:
providers = {
    # ... existing providers ...
    'xiaomi': {
        'api_url': 'https://api.xiaomimimo.com/v1',
        'model': 'MiMo-V2-Omni'
    }
}
  • Update the image_tool function to use the Xiaomi provider for image understanding:
def image_tool(input_image):
    # ... existing code ...
    if provider == 'xiaomi':
        response = requests.post(
            f"{providers['xiaomi']['api_url']}/chat/completions",
            json={'model': providers['xiaomi']['model'], 'prompt': input_image}
        )
        # ... process response ...

2. TTS Provider Integration

  • Add Xiaomi TTS as a TTS provider option in messages.tts.
  • Update the tts_providers dictionary to include Xiaomi:
tts_providers = {
    # ... existing providers ...
    'xiaomi': {
        'api_url': 'https://api.xiaomimimo.com/v1',
        'model': 'MiMo-V2-TTS'
    }
}
  • Update the tts function to use the Xiaomi provider for text-to-speech:
def tts(input_text):
    # ... existing code ...
    if provider == 'xiaomi':
        response = requests.post(
            f"{tts_providers['xiaomi']['api_url']}/audio/speech",
            json={'model': tts_providers['xiaomi']['model'], 'prompt': input_text}
        )
        # ... process response ...

3. Provider Metadata Enhancement

  • Update the provider metadata system to correctly identify Xiaomi models' multimodal capabilities.
  • Add a new field to the provider_metadata dictionary to indicate multimodal support:
provider_metadata = {
    # ... existing providers ...
    'xiaomi': {
        'multimodal': True
    }
}
  • Update the get_provider_metadata function to return the multimodal field:
def get_provider_metadata(provider):
    # ... existing code ...
    if provider == 'xiaomi':
        return {'multimodal': True}

Verification

To verify that the fixes worked, test the following scenarios:

  • Image understanding: Use the image_tool function with a test image and verify that the response is correct.
  • TTS: Use the tts function with a test input text and verify that the audio response is correct.
  • Provider metadata: Verify that the

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING