openclaw - ✅(Solved) Fix Image tool fails with reasoning-capable vision models that return empty content [1 pull requests]

openclaw · 2026-04-20 15:46:15

The OpenClaw image tool fails with "Image model returned no text" when using any OpenAI-compatible reasoning-capable vision model that returns its response in reasoning_content instead of content.

Error Message

Error: "Image model returned no text (<provider>/<model>)"

Raised by: throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);

Root Cause

The streaming provider correctly parses reasoning_content from the API response and creates thinking blocks in the message content array.

However, coerceImageAssistantText() in minimax-vlm.ts calls only extractAssistantText(), which reads only text-type blocks:

function coerceImageAssistantText(params) {
  // ...
  const text = extractAssistantText(params.message);
  if (text.trim()) return text.trim();
  throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);
}

When a reasoning-capable model returns:

content: "" (empty string)
reasoning_content: "actual response text"

The message content array becomes: [{ type: "thinking", thinking: "actual response text" }]

extractAssistantText() returns "" → throws error.
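
The failure mode above can be reproduced in isolation. The following is a minimal sketch, assuming simplified block types; `extractTextOnly` is a hypothetical stand-in for `extractAssistantText()`, and the real OpenClaw and pi-ai types may differ.

```typescript
// Hypothetical, simplified block types; the real OpenClaw/pi-ai types may differ.
type TextBlock = { type: "text"; text: string };
type ThinkingBlock = { type: "thinking"; thinking: string };
type AssistantMessage = { content: Array<TextBlock | ThinkingBlock> };

// Stand-in for extractAssistantText(): joins only the "text" blocks.
function extractTextOnly(message: AssistantMessage): string {
  return message.content
    .filter((block): block is TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n");
}

// Shape of the message after a reasoning-only response has been parsed.
const reasoningOnly: AssistantMessage = {
  content: [{ type: "thinking", thinking: "actual response text" }],
};

const extracted = extractTextOnly(reasoningOnly);
console.log(JSON.stringify(extracted)); // prints ""
console.log(extracted.trim().length === 0); // prints true
```

Because the only block is of type "thinking", the text extractor has nothing to join, and the empty result is what triggers the "returned no text" error.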

Fix Action

Workaround

Set reasoning: false on the vision model config in openclaw.json. This causes the model to return content in the content field directly, bypassing the issue. However, this disables the reasoning capability.

PR fix notes

PR #69444: fix: handle reasoning-only image responses

Repository: openclaw/openclaw
Author: sallyom
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/69444

Description (problem / solution / changelog)

Summary

Reasoning-capable image models can return only signed reasoning blocks and no final text, causing the image tool to fail instead of retrying for a user-visible text description.

This PR adds:

retry once with reasoning disabled to request a normal text response
return only final assistant text from that retry
if the retry still has no final text, fail closed
add regression coverage
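
The retry behaviour listed above can be sketched roughly as follows. This is an illustration only: `describeWithRetry`, `DescribeImage`, `finalText`, and `fakeModel` are hypothetical names, not the identifiers used in PR #69444, and the block types are simplified assumptions.

```typescript
// All names here are hypothetical; they are not taken from PR #69444.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };
type AssistantMessage = { content: Block[] };
type DescribeImage = (options: { reasoning: boolean }) => Promise<AssistantMessage>;

// Collect only final assistant text; thinking blocks are deliberately ignored.
function finalText(message: AssistantMessage): string {
  const parts: string[] = [];
  for (const block of message.content) {
    if (block.type === "text") parts.push(block.text);
  }
  return parts.join("\n").trim();
}

async function describeWithRetry(call: DescribeImage, label: string): Promise<string> {
  const first = finalText(await call({ reasoning: true }));
  if (first) return first;

  // Retry exactly once with reasoning disabled to request a normal text reply.
  const second = finalText(await call({ reasoning: false }));
  if (second) return second;

  // Fail closed: reasoning blocks are never surfaced as the answer.
  throw new Error(`Image model returned no text (${label}).`);
}

// Fake model: reasoning-only on the first call, plain text on the retry.
const fakeModel: DescribeImage = async ({ reasoning }) =>
  reasoning
    ? { content: [{ type: "thinking", thinking: "internal notes" }] }
    : { content: [{ type: "text", text: "A red bicycle against a wall." }] };

describeWithRetry(fakeModel, "provider/model").then((text) => console.log(text));
```

The design choice worth noting is that the sketch never returns thinking content, which matches the "return only final assistant text" and "fail closed" items in the PR summary.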

Fixes #69380. Reported by @shahyashish in #69382/#69389.

Test plan

pnpm test src/agents/tools/image-tool.test.ts
pnpm check
pnpm build

Changed files

src/agents/tools/image-tool.helpers.ts (modified, +70/-0)
src/agents/tools/image-tool.test.ts (modified, +110/-0)
src/agents/tools/image-tool.ts (modified, +2/-0)
src/media-understanding/image.test.ts (modified, +117/-0)
src/media-understanding/image.ts (modified, +94/-14)

Original issue report

Environment

OpenClaw version: latest (container)
Provider: Custom OpenAI-compatible provider
Model: Any vision model with reasoning: true configuration

Steps to Reproduce

Configure a vision model with reasoning: true
Set it as agents.defaults.imageModel.primary
Use the image tool with any image
Observe the error: "Image model returned no text (<provider>/<model>)"

Affected Models

Any OpenAI-compatible vision model with reasoning enabled that returns empty content. Tested with multiple models from the same provider family:

Vision model A (reasoning enabled) → ❌ fails
Vision model B (reasoning enabled) → ❌ fails
Vision model C (reasoning enabled) → ❌ fails
Vision model A (reasoning disabled) → ✅ works
Vision model A (reasoning enabled, patched code) → ✅ works

Suggested Fix

In src/agents/minimax-vlm.ts, coerceImageAssistantText() should fall back to extractAssistantThinking() when text is empty:

function coerceImageAssistantText(params) {
  const stop = params.message.stopReason;
  const errorMessage = params.message.errorMessage?.trim();
  if (stop === "error" || stop === "aborted") throw new Error(/* ... */);
  if (errorMessage) throw new Error(/* ... */);
  
  const text = extractAssistantText(params.message);
  if (text.trim()) return text.trim();
  
  // Fallback: reasoning models may return content in thinking blocks
  const thinking = extractAssistantThinking(params.message);
  if (thinking.trim()) return thinking.trim();
  
  throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);
}
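
For experimenting with the suggested fallback outside the repository, here is a self-contained sketch. `textOf`, `thinkingOf`, and `coerceWithThinkingFallback` are hypothetical stand-ins for `extractAssistantText()`, `extractAssistantThinking()`, and `coerceImageAssistantText()`; the stop-reason checks are omitted because their error messages are elided in the report. Note that, per its summary, the merged PR returns only final assistant text instead of falling back to thinking content.

```typescript
// Hypothetical stand-ins for the repo's types and helpers; real signatures may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };
type AssistantMessage = { content: Block[] };
type CoerceParams = { message: AssistantMessage; provider: string; model: string };

// Joins the text of all "text" blocks.
function textOf(message: AssistantMessage): string {
  const parts: string[] = [];
  for (const block of message.content) {
    if (block.type === "text") parts.push(block.text);
  }
  return parts.join("\n");
}

// Joins the content of all "thinking" blocks.
function thinkingOf(message: AssistantMessage): string {
  const parts: string[] = [];
  for (const block of message.content) {
    if (block.type === "thinking") parts.push(block.thinking);
  }
  return parts.join("\n");
}

function coerceWithThinkingFallback(params: CoerceParams): string {
  const text = textOf(params.message).trim();
  if (text) return text;

  // Fallback proposed in the issue: use thinking content when no text exists.
  const thinking = thinkingOf(params.message).trim();
  if (thinking) return thinking;

  throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);
}

const result = coerceWithThinkingFallback({
  message: { content: [{ type: "thinking", thinking: "  actual response text  " }] },
  provider: "provider",
  model: "model",
});
console.log(result); // prints "actual response text"
```

Text blocks still take priority, so models that populate content normally are unaffected by the fallback.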

Additional Context

media-understanding.ts already handles this case: coerceOpenAiCompatibleVideoText() checks reasoning_content as a fallback.
The complete() function from @mariozechner/pi-ai correctly parses reasoning_content into thinking blocks via the streaming handler.
Only the image tool path (coerceImageAssistantText) is missing the fallback.

Files Involved

src/agents/minimax-vlm.ts — coerceImageAssistantText() function
src/utils/pi-embedded-utils.ts — extractAssistantThinking() (already exists, used elsewhere)

extent analysis

TL;DR

The issue can be fixed by modifying the coerceImageAssistantText() function in minimax-vlm.ts to fall back to extractAssistantThinking() when the text is empty.

Guidance

The root cause of the issue is that coerceImageAssistantText() only checks for text type blocks and does not handle thinking blocks returned by reasoning-capable models.
To fix the issue, update the coerceImageAssistantText() function to check for thinking blocks when the text is empty, as shown in the suggested fix.
As a temporary workaround, setting reasoning: false on the vision model config in openclaw.json can bypass the issue, but this will disable the reasoning capability.
Verify the fix by testing the image tool with a reasoning-capable vision model and checking that it no longer returns the "Image model returned no text" error.

Example

function coerceImageAssistantText(params) {
  const text = extractAssistantText(params.message);
  if (text.trim()) return text.trim();
  const thinking = extractAssistantThinking(params.message);
  if (thinking.trim()) return thinking.trim();
  throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);
}

Notes

The issue only affects the image tool path and does not impact other parts of the system.
The media-understanding.ts file already handles this correctly with coerceOpenAiCompatibleVideoText(), which checks reasoning_content as a fallback.

Recommendation

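Apply the suggested fix by updating coerceImageAssistantText() to fall back to extractAssistantThinking() when the text is empty; this resolves the error without disabling reasoning on the vision model. Note that the merged PR #69444 took a different route: it retries once with reasoning disabled and returns only final assistant text, failing closed if the retry also has none.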
