openclaw - ✅(Solved) Fix: Image tool fails with reasoning-capable vision models that return empty content [1 pull request]

The OpenClaw image tool fails with "Image model returned no text" when using any OpenAI-compatible reasoning-capable vision model that returns its response in reasoning_content instead of content.

Error Message

  Error: "Image model returned no text (<provider>/<model>)"

This is the final throw in coerceImageAssistantText(). It is reached when the stop reason is neither "error" nor "aborted", no errorMessage is set, and extractAssistantText() returns "".

Root Cause

The streaming provider correctly parses reasoning_content from the API response and creates thinking blocks in the message content array.

However, coerceImageAssistantText() in minimax-vlm.ts calls only extractAssistantText(), which reads only text-type blocks and ignores thinking blocks:

function coerceImageAssistantText(params) {
  // ...
  const text = extractAssistantText(params.message);
  if (text.trim()) return text.trim();
  throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);
}

When a reasoning-capable model returns:

  • content: "" (empty string)
  • reasoning_content: "actual response text"

The message content array becomes: [{ type: "thinking", thinking: "actual response text" }]

extractAssistantText() returns "" → throws error.
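
The failure can be reproduced in isolation. The sketch below is a minimal illustration, assuming simplified block shapes; the type names and extractTextOnly() are hypothetical stand-ins for the real OpenClaw types and extractAssistantText(), which may differ in detail.

```typescript
// Hypothetical, simplified block shapes; the real OpenClaw types may differ.
type TextBlock = { type: "text"; text: string };
type ThinkingBlock = { type: "thinking"; thinking: string };
type AssistantMessage = { content: Array<TextBlock | ThinkingBlock> };

// Stand-in for extractAssistantText(): joins only the "text" blocks.
function extractTextOnly(message: AssistantMessage): string {
  return message.content
    .filter((block): block is TextBlock => block.type === "text")
    .map((block) => block.text)
    .join("\n")
    .trim();
}

// What the message looks like when content is "" and
// reasoning_content carries the actual answer.
const reasoningOnly: AssistantMessage = {
  content: [{ type: "thinking", thinking: "actual response text" }],
};

// No text blocks exist, so the result is the empty string and the
// image tool would go on to throw "Image model returned no text".
const extracted = extractTextOnly(reasoningOnly);
console.log(JSON.stringify(extracted)); // prints ""
```

The answer is present in the message, but only in a block type that the text extractor never inspects.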

Fix Action

Workaround

Set reasoning: false on the vision model config in openclaw.json. This causes the model to return content in the content field directly, bypassing the issue. However, this disables the reasoning capability.

PR fix notes

PR #69444: fix: handle reasoning-only image responses

Description (problem / solution / changelog)

Summary

Reasoning-capable image models can return only signed reasoning blocks and no final text, causing the image tool to fail instead of retrying for a user-visible text description.

This PR adds:

  • retry once with reasoning disabled to request a normal text response
  • return only final assistant text from that retry
  • if the retry still has no final text, fail closed
  • add regression coverage
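
The retry behaviour listed above could look roughly like the following. This is a sketch under stated assumptions, not the code from the PR: describeWithRetry(), finalText(), the DescribeImage callback and its reasoning option are all hypothetical names chosen for illustration.

```typescript
// Hypothetical shapes and names for illustration; not the code from the PR.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };
type AssistantMessage = { content: Block[] };
type DescribeImage = (opts: { reasoning: boolean }) => Promise<AssistantMessage>;

// Collect only final assistant text; thinking blocks are ignored on purpose.
function finalText(message: AssistantMessage): string {
  return message.content
    .map((block) => (block.type === "text" ? block.text : ""))
    .join("")
    .trim();
}

async function describeWithRetry(call: DescribeImage, label: string): Promise<string> {
  const first = finalText(await call({ reasoning: true }));
  if (first) return first;

  // Reasoning-only response: retry exactly once with reasoning disabled.
  const retry = finalText(await call({ reasoning: false }));
  if (retry) return retry;

  // Fail closed: never surface reasoning blocks as the description.
  throw new Error(`Image model returned no text (${label}).`);
}
```

In this shape, reasoning output is never returned to the user; the only outcomes are final text or an error.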

Fixes #69380. Reported by @shahyashish in #69382/#69389.

Test plan

  • pnpm test src/agents/tools/image-tool.test.ts
  • pnpm check
  • pnpm build

Changed files

  • src/agents/tools/image-tool.helpers.ts (modified, +70/-0)
  • src/agents/tools/image-tool.test.ts (modified, +110/-0)
  • src/agents/tools/image-tool.ts (modified, +2/-0)
  • src/media-understanding/image.test.ts (modified, +117/-0)
  • src/media-understanding/image.ts (modified, +94/-14)

Summary

The OpenClaw image tool fails with "Image model returned no text" when using any OpenAI-compatible reasoning-capable vision model that returns its response in reasoning_content instead of content.

Environment

  • OpenClaw version: latest (container)
  • Provider: Custom OpenAI-compatible provider
  • Model: Any vision model with reasoning: true configuration

Steps to Reproduce

  1. Configure a vision model with reasoning: true
  2. Set it as agents.defaults.imageModel.primary
  3. Use the image tool with any image
  4. Error: "Image model returned no text (<provider>/<model>)"

Affected Models

Any OpenAI-compatible vision model with reasoning enabled that returns empty content. Tested with multiple models from the same provider family:

  • Vision model A (reasoning enabled) → ❌ fails
  • Vision model B (reasoning enabled) → ❌ fails
  • Vision model C (reasoning enabled) → ❌ fails
  • Vision model A (reasoning disabled) → ✅ works
  • Vision model A (reasoning enabled, patched code) → ✅ works

Suggested Fix

In src/agents/minimax-vlm.ts, coerceImageAssistantText() should fall back to extractAssistantThinking() when text is empty:

function coerceImageAssistantText(params) {
  const stop = params.message.stopReason;
  const errorMessage = params.message.errorMessage?.trim();
  if (stop === "error" || stop === "aborted") throw new Error(/* ... */);
  if (errorMessage) throw new Error(/* ... */);
  
  const text = extractAssistantText(params.message);
  if (text.trim()) return text.trim();
  
  // Fallback: reasoning models may return content in thinking blocks
  const thinking = extractAssistantThinking(params.message);
  if (thinking.trim()) return thinking.trim();
  
  throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);
}
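
For experimenting outside the codebase, the fallback order can be exercised with a self-contained version. This is a minimal sketch: extractText(), extractThinking() and coerceWithFallback() are simplified, hypothetical stand-ins for the real helpers, and the stop-reason and errorMessage checks are omitted.

```typescript
// Simplified stand-ins; the real helpers live in OpenClaw and may differ.
type Block =
  | { type: "text"; text: string }
  | { type: "thinking"; thinking: string };
type AssistantMessage = { content: Block[] };

function extractText(message: AssistantMessage): string {
  return message.content.map((b) => (b.type === "text" ? b.text : "")).join("");
}

function extractThinking(message: AssistantMessage): string {
  return message.content.map((b) => (b.type === "thinking" ? b.thinking : "")).join("");
}

function coerceWithFallback(message: AssistantMessage, label: string): string {
  const text = extractText(message).trim();
  if (text) return text; // final text always wins when present

  const thinking = extractThinking(message).trim();
  if (thinking) return thinking; // reasoning-only response

  throw new Error(`Image model returned no text (${label}).`);
}
```

Text blocks take precedence, so models that already return normal content behave exactly as before; the thinking fallback only applies when no text is present.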

Additional Context

  • media-understanding.ts already handles this correctly with coerceOpenAiCompatibleVideoText() which checks reasoning_content as fallback
  • The complete() function from @mariozechner/pi-ai correctly parses reasoning_content into thinking blocks via the streaming handler
  • Only the image tool path (coerceImageAssistantText) is missing the fallback
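
At the raw response level, the fallback described for the video path amounts to preferring content and falling back to reasoning_content. The sketch below only illustrates that idea: CompatMessage and pickResponseText() are hypothetical names, and the actual body of coerceOpenAiCompatibleVideoText() is not shown in this issue.

```typescript
// Assumed shape of a message in an OpenAI-compatible response.
// reasoning_content is a provider extension and may be absent.
type CompatMessage = {
  content?: string | null;
  reasoning_content?: string | null;
};

// Hypothetical helper: prefer content, fall back to reasoning_content.
function pickResponseText(message: CompatMessage): string {
  const content = message.content?.trim();
  if (content) return content;
  return message.reasoning_content?.trim() ?? "";
}
```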

Files Involved

  • src/agents/minimax-vlm.ts: coerceImageAssistantText() function
  • src/utils/pi-embedded-utils.ts: extractAssistantThinking() (already exists, used elsewhere)

extent analysis

TL;DR

The issue can be fixed by modifying the coerceImageAssistantText() function in minimax-vlm.ts to fall back to extractAssistantThinking() when the text is empty.

Guidance

  • The root cause of the issue is that coerceImageAssistantText() only checks for text type blocks and does not handle thinking blocks returned by reasoning-capable models.
  • To fix the issue, update the coerceImageAssistantText() function to check for thinking blocks when the text is empty, as shown in the suggested fix.
  • As a temporary workaround, setting reasoning: false on the vision model config in openclaw.json can bypass the issue, but this will disable the reasoning capability.
  • Verify the fix by testing the image tool with a reasoning-capable vision model and checking that it no longer returns the "Image model returned no text" error.

Example

function coerceImageAssistantText(params) {
  const text = extractAssistantText(params.message);
  if (text.trim()) return text.trim();
  const thinking = extractAssistantThinking(params.message);
  if (thinking.trim()) return thinking.trim();
  throw new Error(`Image model returned no text (${params.provider}/${params.model}).`);
}

Notes

  • The issue only affects the image tool path and does not impact other parts of the system.
  • The media-understanding.ts file already handles this correctly with coerceOpenAiCompatibleVideoText(), which checks reasoning_content as a fallback.

Recommendation

Apply the suggested fix by updating the coerceImageAssistantText() function to fall back to extractAssistantThinking() when the text is empty. This resolves the error while keeping reasoning enabled for vision models. Note that PR #69444 takes a different approach: it retries once with reasoning disabled and returns only final assistant text.
