openclaw - ✅ (Solved) Fix [Bug]: WEBUI / Edge cases: Image input causes some multimodal models to misuse tool calling [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#62514 (fetched 2026-04-08 03:03:15)
Comments: 0
Participants: 1
Timeline: 3
Reactions: 0
Timeline (top): cross-referenced ×2, labeled ×1

In the Web UI, image input can cause some multimodal models with weak tool-calling capabilities to misuse tool calls, which may result in the image input being ignored or hallucinations in image understanding.

Error Message

Errors appear in the logs in one of two forms:

12:29:45+00:00 error [tools] image failed: Local media file not found: /Users/hyf/.openclaw/workspace/.openclaw_cache/attachments/2026-04-07_2026-04-07T12_28_28_672Z_image_0.jpg raw_params={"prompt":"分析这张图片,描述图中包含的元素、场景和特点。请详细说明看到的内容,包括自然景观、颜色、物体等。","image":"/Users/hyf/.openclaw/workspace/.openclaw_cache/attachments/2026-04-07_2026-04-07T12_28_28_672Z_image_0.jpg"}

12:46:09+00:00 warn agents/tools/image {"subsystem":"agents/tools/image"} image tool local input missing raw=https://picsum.photos/id/1018/800/600 resolved=https://picsum.photos/id/1018/800/600 source=local-path

The Chinese prompt in raw_params translates to: "Analyze this image and describe the elements, scene, and features it contains. Please explain in detail what you see, including natural scenery, colors, objects, and so on."

Root Cause

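The runtime relied on the tool description to discourage use of the image tool, but still exposed the tool even when the model already had native image input for the current turn. Models with weak tool-calling discipline, such as glm-4.6v, could ignore that guidance and call the tool with a hallucinated local path or an unrelated public URL, so the image input was ignored or misdescribed.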

Fix Action

Fixed

PR fix notes

PR #62523: fix(chat): webui image chat fix

Description (problem / solution / changelog)

Summary

  • Problem: vision-capable runs still injected the image tool even when the current user turn already included native image input.
  • Why it matters: models with weak tool-calling discipline could ignore the tool description, hallucinate a local path or unrelated remote URL, and call image against the wrong target.
  • What changed: the embedded runner now marks runs that already carry prompt images, and createOpenClawCodingTools() removes image from that turn’s tool list when the selected model already supports native image input.
  • What did NOT change (scope boundary): image handling/storage, prompt-image loading, and non-vision or text-only runs still behave as before.
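
The change summarized above can be sketched as a filter applied when the tool list is built for a turn. This is a minimal illustration under stated assumptions, not the code from PR #62523: the `ToolDef` and `ToolContext` types and the `selectToolsForTurn` function are hypothetical names; only the tool name `image` and the overall rule come from the PR summary.

```typescript
// Hypothetical shapes for illustration; the real types in
// src/agents/pi-tools.ts are not shown in the source.
interface ToolDef {
  name: string;
  description: string;
}

interface ToolContext {
  /** True when the selected model accepts native image input. */
  modelSupportsImages: boolean;
  /** True when the current user turn already carries prompt images. */
  hasPromptImages: boolean;
}

// Drop the fallback `image` tool only when both conditions hold, so that
// text-only turns and non-vision models keep the tool as before.
function selectToolsForTurn(tools: ToolDef[], ctx: ToolContext): ToolDef[] {
  const hideImageTool = ctx.modelSupportsImages && ctx.hasPromptImages;
  return hideImageTool ? tools.filter((t) => t.name !== "image") : tools;
}

const tools: ToolDef[] = [
  { name: "read", description: "Read a file." },
  { name: "image", description: "Analyze images with a vision model." },
];

// Vision model, image attached: the `image` tool is filtered out.
const visionTurnWithImage = selectToolsForTurn(tools, {
  modelSupportsImages: true,
  hasPromptImages: true,
});

// Vision model, no image attached: the tool list is unchanged.
const textOnlyTurn = selectToolsForTurn(tools, {
  modelSupportsImages: true,
  hasPromptImages: false,
});
```

Filtering per turn, rather than removing the tool for the whole session, matches the stated scope boundary: later turns without an attached image can still use the tool.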

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #62514
  • Related #
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the runtime relied on tool-description guidance to discourage image tool use, but still exposed the tool even when the model already had native image inputs for the current turn.
  • Missing detection / guardrail: there was no runtime guard that removed image from the tool list when params.images / prompt image order already indicated native image input.
  • Contributing context (if known): glm-4.6v could ignore the descriptive prohibition and hallucinate nonexistent local attachment paths or unrelated public image URLs.
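
The missing guardrail amounts to a per-turn check for native image input. The following is a minimal sketch, assuming a run-parameters object with an optional `images` array and an optional `imageOrder` list. Both field shapes and the `hasNativeImageInput` helper are hypothetical, chosen only to mirror the "params.images / prompt image order" wording above.

```typescript
// Hypothetical run parameters; the real structure used by the embedded
// runner is not shown in the source.
interface RunParams {
  prompt: string;
  /** Images attached to the current user turn, if any. */
  images?: { mimeType: string; data: string }[];
  /** Placement order of prompt images, e.g. ["inline"] (assumed shape). */
  imageOrder?: string[];
}

// A turn counts as carrying native image input when either signal is present.
function hasNativeImageInput(params: RunParams): boolean {
  const attached = (params.images?.length ?? 0) > 0;
  const ordered = (params.imageOrder?.length ?? 0) > 0;
  return attached || ordered;
}

const turnWithImage: RunParams = {
  prompt: "What is in the image?",
  images: [{ mimeType: "image/jpeg", data: "<base64 data>" }],
  imageOrder: ["inline"],
};

const turnWithoutImage: RunParams = { prompt: "Summarize this file." };
```

A boolean derived this way could be recorded on the run and consulted when the tool list is assembled, which is the role the PR description assigns to the embedded runner.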

User-visible / Behavior Changes

When a vision-capable model already receives image input in the current turn, the assistant no longer exposes the fallback image tool for that turn, reducing bogus image-tool calls against hallucinated paths or URLs.

Diagram

Before:
[user sends image] -> [native image input added] -> [image tool still injected] -> [model may hallucinate path/URL] -> [wrong image tool call]

After:
[user sends image] -> [native image input added] -> [image tool removed for this turn] -> [model answers from native image input]

## Changed files

- `src/agents/pi-embedded-runner/run/attempt.ts` (modified, +2/-0)
- `src/agents/pi-tools.ts` (modified, +12/-1)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

In the Web UI, image input can cause some multimodal models with weak tool-calling capabilities to misuse tool calls, which may result in the image input being ignored or hallucinations in image understanding.

Steps to reproduce

  1. Start the Web UI and open a new session.
  2. Choose a multimodal model (glm-4.6v). Note: this is an edge case that only occurs with weaker multimodal models; state-of-the-art models such as GPT-5.4 do not have this issue.
  3. Send an image and check the logs.

Expected behavior

The model understands the image and responds correctly.

Actual behavior

Errors appear in the logs in one of two forms. The first is:

12:29:45+00:00 error [tools] image failed: Local media file not found: /Users/hyf/.openclaw/workspace/.openclaw_cache/attachments/2026-04-07_2026-04-07T12_28_28_672Z_image_0.jpg raw_params={"prompt":"分析这张图片,描述图中包含的元素、场景和特点。请详细说明看到的内容,包括自然景观、颜色、物体等。","image":"/Users/hyf/.openclaw/workspace/.openclaw_cache/attachments/2026-04-07_2026-04-07T12_28_28_672Z_image_0.jpg"}

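Here the local image path "/Users/hyf/.openclaw/workspace/.openclaw_cache/attachments/2026-04-07_2026-04-07T12_28_28_672Z_image_0.jpg" is hallucinated. (The Chinese prompt in raw_params translates to: "Analyze this image and describe the elements, scene, and features it contains. Please explain in detail what you see, including natural scenery, colors, objects, and so on.")

The second is: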

12:46:09+00:00 warn agents/tools/image {"subsystem":"agents/tools/image"} image tool local input missing raw=https://picsum.photos/id/1018/800/600 resolved=https://picsum.photos/id/1018/800/600 source=local-path

Here the image URL "https://picsum.photos/id/1018/800/600" is hallucinated.
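
When triaging entries like the two above, it can help to pull the key=value fields out of the image tool log lines. The following is a minimal sketch, assuming fields are space-separated and values contain no whitespace. That holds for the `raw`, `resolved`, and `source` fields shown here, but not necessarily for every OpenClaw log entry. `parseImageToolFields` and `isUrlTaggedAsLocalPath` are hypothetical helpers, not part of OpenClaw.

```typescript
// Collect space-separated key=value fields from one log line.
// Assumes keys are plain identifiers and values contain no whitespace.
function parseImageToolFields(line: string): Record<string, string> {
  const fields: Record<string, string> = {};
  const pattern = /(?:^|\s)([A-Za-z_]+)=(\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    const key = match[1];
    const value = match[2];
    if (key !== undefined && value !== undefined) {
      fields[key] = value;
    }
  }
  return fields;
}

// Flags entries whose raw input looks like an http(s) URL even though the
// entry is tagged source=local-path, as in the second log line above.
function isUrlTaggedAsLocalPath(fields: Record<string, string>): boolean {
  return fields["source"] === "local-path" && /^https?:\/\//.test(fields["raw"] ?? "");
}

// The second log entry from this issue, without its timestamp.
const sampleLine =
  'warn agents/tools/image {"subsystem":"agents/tools/image"} image tool local input missing ' +
  "raw=https://picsum.photos/id/1018/800/600 resolved=https://picsum.photos/id/1018/800/600 source=local-path";

const parsed = parseImageToolFields(sampleLine);
```

For the sample line, `parsed` holds the three fields `raw`, `resolved`, and `source`, and `isUrlTaggedAsLocalPath(parsed)` returns `true`.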

OpenClaw version

2026.4.6

Operating system

macOS

Install method

pnpm dev

Model

glm-4.6v

Provider / routing chain

openclaw - zai - glm-4.6v

Additional provider/model setup details

No response

Logs, screenshots, and evidence

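Some debug logs follow. The user prompt 图中有什么 means "What is in the image?", and the tool-call prompt 描述这张图片的内容,包括场景、元素和整体氛围 means "Describe the content of this image, including the scene, elements, and overall atmosphere":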

12:46:03+00:00 info agent/embedded {"subsystem":"agent/embedded"} embedded run tool defs: runId=480fd177-8207-4beb-98f9-9b88be22503c sessionKey=agent:main:main provider=zai/glm-4.6v builtInTools=0 customTools=18 customToolNames=read,edit,write,exec,process,cron,sessions_list,sessions_history,sessions_send,sessions_yield,sessions_spawn,subagents,session_status,web_search,web_fetch,image,memory_search,memory_get imageTool=name=image description=Analyze one or more images with a vision model. Use image for a single path/URL, or images for multiple (up to 20). Only use this tool when images were NOT already provided in the user's message. Images mentioned in the prompt are automatically visible to you. parameters={"type":"object","properties":{"prompt":{"type":"string"},"image":{"description":"Single image path or URL.","type":"string"},"images":{"description":"Multiple image paths or URLs (up to maxImages, default 20).","type":"array","items":{"type":"string"}},"model":{"type":"string"},"maxBytesMb":{"type":"number"},"maxImages":{"type":"number"}}}
12:46:03+00:00 info agent/embedded {"subsystem":"agent/embedded"} embedded run model request: runId=480fd177-8207-4beb-98f9-9b88be22503c sessionKey=agent:main:main provider=zai/glm-4.6v historyMessages=7 roleCounts=assistant:2,toolResult:4,user:1 historyImageBlocks=0 inputImages=1 imageOrder=inline detectedPromptRefs=0 loadedPromptImages=0 skippedPromptImages=0 promptImages=1 systemPromptChars=46346 promptChars=144 systemPromptPreview=You are a personal assistant operating inside OpenClaw. ## Tooling Structured tool definitions are the source of truth for tool names, descriptions, and parameters. Tool names are case-sensitive. Call tools exactly as listed in the structured tool definitions. If a tool is pre... promptPreview=Sender (untrusted metadata):  { "label": "openclaw-control-ui", "id": "openclaw-control-ui" }  [Tue 2026-04-07 20:46 GMT+8] 图中有什么
12:46:09+00:00 info agent/embedded {"subsystem":"agent/embedded"} embedded run tool call received: runId=480fd177-8207-4beb-98f9-9b88be22503c rawTool=image tool=image toolCallId=call_eaf0f08426a843bcadedd802 meta=描述这张图片的内容,包括场景、元素和整体氛围 args={"prompt":"描述这张图片的内容,包括场景、元素和整体氛围","image":"https://picsum.photos/id/1018/800/600"}
12:46:09+00:00 info agents/tools/image {"subsystem":"agents/tools/image"} image tool local input raw=https://picsum.photos/id/1018/800/600 resolved=https://picsum.photos/id/1018/800/600 exists=false source=local-path
12:46:09+00:00 warn agents/tools/image {"subsystem":"agents/tools/image"} image tool local input missing raw=https://picsum.photos/id/1018/800/600 resolved=https://picsum.photos/id/1018/800/600 source=local-path

Impact and severity

Models with weaker tool-calling discipline may produce errors or hallucinated descriptions when given an image, even though their underlying multimodal capability is sufficient for the recognition task.

Additional information

No response

extent analysis

TL;DR

PR #62523 addresses the issue at the runtime level: when a vision-capable model already receives the image as native input for the current turn, the image tool is removed from that turn's tool list, so the model can no longer call it against a hallucinated path or URL.

Guidance

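  • Look for "image failed: Local media file not found" or "image tool local input missing" in the logs. In this issue the path or URL in those entries was hallucinated by the model, so a missing file here does not indicate a file-system fault.
  • In the debug logs, check whether the model request entry reports inputImages=1 (or higher) while the tool defs entry still lists image in customToolNames; that combination is the one the fix targets.
  • Apply the change from PR #62523, which removes the image tool for turns that already carry native image input.
  • As a temporary workaround, use a model with stronger tool-calling discipline; the reporter notes that GPT-5.4 does not exhibit the issue.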
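
One pattern worth checking for in the logs is the combination that triggers this bug: a model request with `inputImages` greater than zero in a run whose tool definitions still list `image`. The following is a minimal sketch, assuming the `customToolNames=` and `inputImages=` field formats seen in the debug logs of this issue. `readField` and `imageToolOfferedWithNativeImage` are hypothetical helpers, and the sample lines are abbreviated versions of the logged entries.

```typescript
// Read a single key=value field from a log line; undefined if absent.
function readField(line: string, key: string): string | undefined {
  const match = new RegExp("(?:^|\\s)" + key + "=(\\S+)").exec(line);
  return match ? match[1] : undefined;
}

// True when the run exposed the `image` tool while the request already
// carried at least one native input image.
function imageToolOfferedWithNativeImage(toolDefsLine: string, requestLine: string): boolean {
  const toolNames = (readField(toolDefsLine, "customToolNames") ?? "").split(",");
  const inputImages = Number(readField(requestLine, "inputImages") ?? "0");
  return toolNames.indexOf("image") !== -1 && inputImages > 0;
}

// Abbreviated versions of the two debug log entries from this issue.
const toolDefsLine =
  "embedded run tool defs: provider=zai/glm-4.6v customToolNames=read,edit,web_fetch,image,memory_search,memory_get";
const requestLine =
  "embedded run model request: provider=zai/glm-4.6v historyImageBlocks=0 inputImages=1 imageOrder=inline promptImages=1";

const triggered = imageToolOfferedWithNativeImage(toolDefsLine, requestLine);
```

With the two sample lines, `triggered` is `true`, which matches the run in which glm-4.6v went on to call the tool with an unrelated URL.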

Example

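The issue stems from model behavior combined with the tool list exposed to the model, not from a single faulty line of code. The change in PR #62523 is limited to src/agents/pi-embedded-runner/run/attempt.ts and src/agents/pi-tools.ts.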

Notes

The issue is specific to multimodal models with weaker tool-calling discipline. The logs show the image tool reporting its input as missing because the path or URL it was given was hallucinated by the model. According to the PR, the underlying defect is that the tool was offered at all on a turn that already carried native image input.

Recommendation

Apply the fix from PR #62523. Where that is not yet possible, use a model with stronger tool-calling discipline such as GPT-5.4, which the reporter states does not exhibit this issue, as a temporary workaround.
