openclaw - [Bug]: read tool and gateway media handler strip image data, so multimodal models cannot see images

GitHub stats
openclaw/openclaw#81452 (fetched 2026-05-14 03:32:07)
Comments: 1
Participants: 2
Timeline events: 4 (labeled ×2, closed ×1, commented ×1)
Reactions: 2


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

The read tool and gateway media handler strip image data entirely, preventing vision-capable models from receiving user-uploaded images in their prompt context. This is a follow-up to #14707 (self-generated images) — the same root cause also affects images sent by users via webchat.

Steps to reproduce

  1. Start the OpenClaw gateway with a vision-capable model configured (e.g., kimi/kimi-k2.6).
  2. Open the Control UI webchat (openclaw dashboard).
  3. Send a screenshot/image to the agent via the chat input.
  4. Ask the agent to describe what is in the image.
  5. Observe that the image data is replaced with [media reference removed - already processed by model] in the agent context.
  6. Alternatively, save an image to disk and use the read tool on it.
  7. Observe that read outputs [image data removed - already processed by model] instead of passing the image to the model.

Expected behavior

When a vision-capable multimodal model (e.g., Kimi K2.6) is active, uploaded or read images should be passed as native image blocks in the model's prompt context, allowing the model to "see" and analyze the image directly.
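
To make "native image blocks" concrete, here is a minimal sketch of what the user turn could look like if the image were passed through instead of replaced. The block shapes and the names mimeTypeFor, toImageBlock, and buildUserContent are hypothetical and chosen for illustration; OpenClaw's real content types are not shown in this report and may differ.

```typescript
// Hypothetical content-block shapes; OpenClaw's real types may differ.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; mimeType: string; data: Uint8Array };
type ContentBlock = TextBlock | ImageBlock;

// Illustrative extension-to-MIME lookup, not exhaustive.
function mimeTypeFor(fileName: string): string | undefined {
  const dot = fileName.lastIndexOf(".");
  const ext = dot >= 0 ? fileName.slice(dot + 1).toLowerCase() : "";
  switch (ext) {
    case "png": return "image/png";
    case "jpg":
    case "jpeg": return "image/jpeg";
    case "gif": return "image/gif";
    case "webp": return "image/webp";
    default: return undefined;
  }
}

// Returns an image block for recognized image files, otherwise undefined.
function toImageBlock(fileName: string, data: Uint8Array): ImageBlock | undefined {
  const mimeType = mimeTypeFor(fileName);
  return mimeType ? { type: "image", mimeType, data } : undefined;
}

// Builds the user turn: the question as text, followed by the image itself.
function buildUserContent(question: string, fileName: string, data: Uint8Array): ContentBlock[] {
  const blocks: ContentBlock[] = [{ type: "text", text: question }];
  const image = toImageBlock(fileName, data);
  if (image) blocks.push(image);
  return blocks;
}
```

With this shape, the model context for step 4 of the reproduction would contain one text block and one image block, rather than a text placeholder.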

Actual behavior

  • Gateway media handler replaces uploaded images with [media reference removed - already processed by model].
  • The read tool replaces image file content with [image data removed - already processed by model].
  • The model receives only a text placeholder instead of the actual image, making it impossible to analyze visual content.
  • This occurs even when the active model (kimi/kimi-k2.6) natively supports multimodal image input.

OpenClaw version

2026.5.7 (eeef486)

Operating system

Windows 11 (26200)

Install method

npm global

Model

kimi/kimi-k2.6 (vision-capable multimodal model)

Provider / routing chain

openclaw -> kimi

Additional provider/model setup details

Both deepseek and kimi providers are configured in ~/.openclaw/openclaw.json. The active session model is kimi/kimi-k2.6, which supports multimodal image input natively. The agents.defaults.models catalog includes kimi/kimi-k2.6 with alias "Kimi".

Logs, screenshots, and evidence

Gateway message processing shows:
[media reference removed - already processed by model]

read tool output for image files shows:
[image data removed - already processed by model]

In both cases, no image data reaches the model context.

Impact and severity

  • Affected: All users attempting to send images to vision-capable models via webchat or the read tool.
  • Severity: High (breaks core multimodal functionality).
  • Frequency: 100% reproducible (every image upload or read attempt).
  • Consequence: Users cannot use image input with multimodal models; agents cannot analyze screenshots, photos, or diagrams. This significantly limits OpenClaw's utility for visual tasks.

Additional information

This appears related to #14707 (self-generated images cannot be injected into agent context) and #62514 (WEBUI image input edge cases). The root cause may be that the gateway media handler and read tool treat images as opaque attachments rather than passing them as multimodal content blocks to vision-capable models. A unified fix for native multimodal context injection might address all three issues.
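
One way to picture the unified fix suggested above is a single capability check that both the gateway media handler and the read tool call before deciding what to emit. This is a sketch under assumptions: ModelInfo, supportsVision, and resolveImageContent are hypothetical names, and the report does not say how OpenClaw actually records whether a model accepts image input.

```typescript
// Hypothetical content-block shapes; OpenClaw's real types may differ.
type TextBlock = { type: "text"; text: string };
type ImageBlock = { type: "image"; mimeType: string; data: Uint8Array };
type ContentBlock = TextBlock | ImageBlock;

// Hypothetical capability record for the active model.
interface ModelInfo {
  id: string;
  supportsVision: boolean;
}

// Single decision point: pass the image through for vision-capable models,
// and fall back to an explanatory text block only when the model cannot use it.
function resolveImageContent(model: ModelInfo, image: ImageBlock): ContentBlock {
  if (model.supportsVision) return image;
  return {
    type: "text",
    text: "[image omitted: " + model.id + " does not accept image input]",
  };
}
```

Routing both code paths through one function like this would keep the webchat upload and read tool behaviors consistent, which is the property the three related issues appear to lack.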

Note on testing: This should be caught by a basic E2E test — upload an image to a vision-capable model and assert the model receives the image in its context. If such a test exists, it is not working. If it doesn't exist, it should be added.
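
The core assertion of such a test could look like the sketch below. It assumes the test harness can capture the content blocks sent to the model; the ContentBlock shape and the findImageDeliveryProblems helper are hypothetical, while the two placeholder strings are the ones quoted in this report.

```typescript
// Hypothetical content-block shape captured from the model request.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; mimeType: string; data: Uint8Array };

// Placeholder strings observed in this report.
const PLACEHOLDERS = [
  "[media reference removed - already processed by model]",
  "[image data removed - already processed by model]",
];

// Returns a list of problems; an empty list means the image reached the model.
function findImageDeliveryProblems(context: ContentBlock[]): string[] {
  const problems: string[] = [];
  if (!context.some((b) => b.type === "image")) {
    problems.push("no image block in model context");
  }
  for (const block of context) {
    if (block.type !== "text") continue;
    for (const placeholder of PLACEHOLDERS) {
      if (block.text.indexOf(placeholder) >= 0) {
        problems.push("placeholder found: " + placeholder);
      }
    }
  }
  return problems;
}
```

An E2E test would upload an image, capture the context, and fail if this function returns anything.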
