openclaw - ✅(Solved) Fix [Bug] Image tool returns empty text for all OpenRouter vision models [1 pull requests, 1 participants]

openclaw 2026-04-23 00:52:41

openclaw/openclaw#70410 • Fetched 2026-04-23 07:25:07

Author: Deepwater1000
Participants: Deepwater1000

The image tool returns "Image model returned no text" for all OpenRouter vision models, both free and paid. Local models (LM Studio) work fine.

Error Message

Image model returned no text

Root Cause

Models resolve correctly — no "Unknown model" error
API auth works — OpenRouter accepts the request (no 401/403)
Response: finish_reason: "stop" but content is empty string
Format: pi-ai correctly converts {type: "image", data: base64} to OpenAI {type: "image_url", image_url: {url: "data:..."}}
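
The format conversion in the last point can be illustrated with a minimal sketch. The helper name toOpenAIImagePart is hypothetical and is not taken from pi-ai; it only shows the shape of the mapping described above, assuming the internal part carries base64 data and an optional mimeType.

```javascript
// Hypothetical helper (not pi-ai's actual code): maps an internal image part
// to an OpenAI-style image_url part that carries a base64 data URL.
function toOpenAIImagePart(part) {
    // Same fallback MIME type that buildImageContext uses.
    const mimeType = part.mimeType ?? "image/jpeg";
    return {
        type: "image_url",
        image_url: { url: `data:${mimeType};base64,${part.data}` },
    };
}

const examplePart = toOpenAIImagePart({
    type: "image",
    data: Buffer.from("hello").toString("base64"),
    mimeType: "image/png",
});
console.log(examplePart.image_url.url); // prints: data:image/png;base64,aGVsbG8=
```

If the logged URL has this shape, the request side of the conversion is probably not where the empty text comes from, which is consistent with the report.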

Fix / Workaround

I tried patching this to include {type: "text", text: prompt} in the user message content array, but it still returned empty text after a full gateway restart. The issue may be deeper — possibly in how extractAssistantText processes the response, or in pi-ai's response handling for OpenRouter.

PR fix notes

PR #50492: feat(openrouter): add media-understanding provider for image analysis

Repository: openclaw/openclaw
Author: mculp
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/50492

Description (problem / solution / changelog)

AI-assisted: Yes

Built with Vox (OpenClaw agent) + Claude.

Problem

The image tool doesn't work with OpenRouter vision models (e.g. openrouter/xiaomi/mimo-v2-omni) because there's no OpenRouter media-understanding provider registered. The provider whitelist in src/media-understanding/providers/index.ts is hardcoded to anthropic, openai, minimax, moonshot, zai, groq, and deepgram.

Solution

Add a media-understanding provider to the OpenRouter extension that:

Bypasses the model registry (OpenRouter models aren't registered there)
Calls OpenRouter's /v1/chat/completions API directly with inline base64 images
Handles both single-image (describeImage) and multi-image (describeImages) requests
Registers in the extension's register() method alongside the existing provider registration

This means any OpenRouter vision-capable model works with the image tool, including ones like openrouter/xiaomi/mimo-v2-omni that aren't in the model registry.
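
The direct API call described in the solution might look roughly like the sketch below. This is not the PR's code: buildVisionRequestBody and describeImages are hypothetical names, the full URL is an assumption based on the /v1/chat/completions path mentioned above, the model id format OpenRouter expects is not covered here, and the response is assumed to follow the OpenAI-compatible choices[0].message.content shape.

```javascript
// Assumption: OpenRouter's OpenAI-compatible chat completions endpoint.
const OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions";

// Hypothetical helper: builds the JSON body for one or more inline base64 images.
function buildVisionRequestBody(model, prompt, images) {
    return {
        model,
        messages: [{
            role: "user",
            content: [
                // Text prompt first, then the images.
                { type: "text", text: prompt },
                ...images.map((image) => ({
                    type: "image_url",
                    image_url: {
                        url: `data:${image.mime ?? "image/jpeg"};base64,${image.buffer.toString("base64")}`,
                    },
                })),
            ],
        }],
    };
}

// Hypothetical wrapper; relies on the global fetch available in Node 18+.
async function describeImages(apiKey, model, prompt, images) {
    const response = await fetch(OPENROUTER_URL, {
        method: "POST",
        headers: {
            Authorization: `Bearer ${apiKey}`,
            "Content-Type": "application/json",
        },
        body: JSON.stringify(buildVisionRequestBody(model, prompt, images)),
    });
    if (!response.ok) {
        throw new Error(`OpenRouter request failed: HTTP ${response.status}`);
    }
    const json = await response.json();
    // Assumed OpenAI-compatible response shape; empty string if no text came back.
    return json.choices?.[0]?.message?.content ?? "";
}
```

Keeping the body builder separate from the network call makes the payload easy to inspect or unit-test without an API key.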

Files changed

extensions/openrouter/index.ts (modified, +2/-0): import + register the media-understanding provider
extensions/openrouter/media-understanding-provider.ts (added, +129/-0): new file, OpenAI-compatible vision API wrapper

Description

The image tool returns "Image model returned no text" for all OpenRouter vision models, both free and paid. Local models (LM Studio) work fine.

Environment

OpenClaw: 2026.4.9
Node: v22.22.0
OS: macOS arm64 (Darwin 25.3.0)

Models Tested (all fail)

openrouter/openai/gpt-4.1-mini (paid, non-reasoning)
openrouter/google/gemini-2.5-flash (paid)
openrouter/google/gemma-4-31b-it:free (free)
openrouter/qwen/qwen3.5-flash-02-23 (paid)

Working

lm-studio/qwen/qwen3.5-9b (local) — works in 14-21 seconds

Source Code Findings

In image-BekbXrJh.js, buildImageContext puts the prompt in systemPrompt but the user message content array contains only images, no text:

function buildImageContext(prompt, images) {
    return {
        systemPrompt: prompt,
        messages: [{
            role: "user",
            content: images.map((image) => ({
                type: "image",
                data: image.buffer.toString("base64"),
                mimeType: image.mime ?? "image/jpeg"
            })),
            // ^ No text content in user message!
        }]
    };
}

OpenRouter docs recommend: "we recommend sending the text prompt first, then the images" — the current implementation only sends images in the user message.

Suggested Fix

Add {type: "text", text: prompt} to user message content alongside images (per OpenRouter docs)
Add debug logging in coerceImageAssistantText to log the raw stopReason, errorMessage, and message content when text is empty
Investigate if pi-ai openai-completions.js handles OpenRouter's response format correctly for vision requests
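
The debug logging suggested in the second point could be sketched as follows. The helper summarizeEmptyImageResponse is hypothetical, and the message shape (stopReason, errorMessage, content) is an assumption based only on the field names listed above, not on the actual coerceImageAssistantText implementation.

```javascript
// Hypothetical helper: summarizes an assistant message whose text came back
// empty, so the raw fields can be logged before the error is raised.
function summarizeEmptyImageResponse(message) {
    const content = message?.content;
    return {
        stopReason: message?.stopReason ?? null,
        errorMessage: message?.errorMessage ?? null,
        contentType: Array.isArray(content) ? "array" : typeof content,
        // Truncated so that a large payload does not flood the log.
        contentPreview: JSON.stringify(content ?? null).slice(0, 200),
    };
}

const summary = summarizeEmptyImageResponse({ stopReason: "stop", content: [] });
console.warn("[image] empty text from model:", JSON.stringify(summary));
```

Logging contentType alongside the preview helps distinguish an empty string from an empty or unexpected content array, which is the kind of difference the third point asks to investigate.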

Error Log

[tools] image failed: All image models failed (2): openrouter/openai/gpt-4.1-mini: Image model returned no text (openrouter/openai/gpt-4.1-mini). | openrouter/google/gemini-2.5-flash: Image model returned no text (openrouter/google/gemini-2.5-flash).

extent analysis

TL;DR

Modify the buildImageContext function to include the text prompt in the user message content array, as recommended by OpenRouter docs.

Guidance

Verify that the buildImageContext function is correctly modified to include the text prompt in the user message content array, alongside the images.
Add debug logging in coerceImageAssistantText to log the raw stopReason, errorMessage, and message content when text is empty, to help identify potential issues in the response handling.
Investigate if pi-ai's openai-completions.js handles OpenRouter's response format correctly for vision requests, as this may be a contributing factor to the issue.
Test the modified implementation with a local model, such as lm-studio/qwen/qwen3.5-9b, to ensure that the fix does not introduce any regressions.

Example

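function buildImageContext(prompt, images) {
    return {
        // The prompt is still sent as the system prompt, as before.
        systemPrompt: prompt,
        messages: [{
            role: "user",
            content: [
                // Text part first, then the images, following the OpenRouter recommendation.
                { type: "text", text: prompt },
                ...images.map((image) => ({
                    type: "image",
                    data: image.buffer.toString("base64"),
                    // Fall back to JPEG when the MIME type is unknown.
                    mimeType: image.mime ?? "image/jpeg"
                })),
            ],
        }]
    };
}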
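
To verify that a patched buildImageContext actually produces the intended ordering, a small check along these lines may help. The function hasTextBeforeImages is a hypothetical helper, and the two contexts are hand-built stand-ins for the patched and original outputs rather than real tool output.

```javascript
// Hypothetical check: true when the first user message contains a text part
// that appears before any image part.
function hasTextBeforeImages(context) {
    const content = context.messages?.[0]?.content ?? [];
    const firstText = content.findIndex((part) => part.type === "text");
    const firstImage = content.findIndex((part) => part.type === "image");
    if (firstText === -1) return false;
    return firstImage === -1 || firstText < firstImage;
}

// Stand-in for the patched output: text part, then image part.
const patchedContext = { messages: [{ role: "user", content: [
    { type: "text", text: "Describe this image." },
    { type: "image", data: "aGk=", mimeType: "image/png" },
] }] };

// Stand-in for the original output: image part only.
const originalContext = { messages: [{ role: "user", content: [
    { type: "image", data: "aGk=", mimeType: "image/png" },
] }] };

console.log(hasTextBeforeImages(patchedContext), hasTextBeforeImages(originalContext)); // prints: true false
```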

Notes

The issue may be deeper than just the buildImageContext function, and may require further investigation into the response handling and pi-ai's implementation.

Recommendation

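Apply the suggested change to buildImageContext first, since it brings the request in line with the OpenRouter recommendation, but do not treat it as a confirmed fix: the reporter tried this patch and still got empty text after a full gateway restart. Combine it with the debug logging in coerceImageAssistantText so the raw response can be inspected, and test thoroughly to confirm that the change resolves the problem without introducing regressions.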
