openclaw - 💡(How to fix) Fix Persist image understanding summaries for agent attachments

Problem

A local image-context probe showed that OpenClaw can send a native image to the model, and that openclaw infer image describe can produce a good, durable summary/OCR-style description. However, the agent/LCM path did not persist any attachment metadata or media-understanding output for reuse. As a result, follow-up turns can only recover image content if the assistant happened to mention it in its own replies.

Desired behavior

  • Agent image attachments should expose/persist durable media memory alongside the raw image reference: filename/source label, MIME type, byte size, dimensions/hash when known, attachment provenance, and image.description text/OCR output.
  • Reuse existing media-understanding output when OpenClaw already produced it.
  • If no prior summary exists, run image.describe once at ingest or maintenance time, not repeatedly during hot context assembly.
  • Default summary budget should be about 512 tokens, with an OCR-heavy flyer/screenshot cap around 2,000 tokens.
  • Fresh turns can still send the native image to capable models, but old/compacted/tool-stubbed context should receive a metadata+summary stub by default; raw image replay should be explicit.
  • If summary generation fails, persist a metadata-only stub and raw reference so the agent still knows media existed.
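
The ingest behavior listed above could be sketched roughly as follows. This is a minimal illustration, not OpenClaw's implementation: every name here (ImageMemoryRecord, ingestImage, the describe callback, the ocrHeavy flag) is hypothetical, the describe callback stands in for whatever actually runs image.describe, and the 4-characters-per-token estimate is a crude placeholder for a real tokenizer.

```typescript
// Hypothetical shapes for illustration; none of these names come from OpenClaw.
interface ImageMemoryRecord {
  rawRef: string;        // pointer to the stored image, never inline base64
  filename?: string;     // filename or source label
  mimeType: string;
  byteSize: number;
  width?: number;        // dimensions and hash only when known
  height?: number;
  sha256?: string;
  provenance: string;    // e.g. "user-upload" (assumed label)
  summary?: string;      // image.description text / OCR output
  summaryStatus: "reused" | "generated" | "failed";
}

// Stand-in for the real image.describe call (assumed signature).
type Describe = (rawRef: string, maxTokens: number) => Promise<string>;

const DEFAULT_SUMMARY_TOKENS = 512;
const OCR_HEAVY_SUMMARY_TOKENS = 2000;

// Crude estimate (~4 characters per token); a real tokenizer would replace this.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Truncate text so its estimated token count does not exceed the budget.
function clampToBudget(text: string, maxTokens: number): string {
  return estimateTokens(text) <= maxTokens ? text : text.slice(0, maxTokens * 4);
}

// Runs once at ingest or maintenance time, not during hot context assembly.
async function ingestImage(
  meta: Omit<ImageMemoryRecord, "summary" | "summaryStatus">,
  opts: { existingSummary?: string; ocrHeavy?: boolean; describe: Describe },
): Promise<ImageMemoryRecord> {
  const budget = opts.ocrHeavy ? OCR_HEAVY_SUMMARY_TOKENS : DEFAULT_SUMMARY_TOKENS;

  // Reuse media-understanding output that already exists.
  if (opts.existingSummary !== undefined) {
    return { ...meta, summary: clampToBudget(opts.existingSummary, budget), summaryStatus: "reused" };
  }
  try {
    const text = await opts.describe(meta.rawRef, budget);
    return { ...meta, summary: clampToBudget(text, budget), summaryStatus: "generated" };
  } catch {
    // Metadata-only stub: the agent still learns that media existed.
    return { ...meta, summaryStatus: "failed" };
  }
}
```

Keeping summaryStatus on the record lets a later maintenance pass retry only the entries marked "failed" instead of re-describing every image.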

Why now

Martian-Engineering/lossless-claw#775 is being hardened so LCM can store image files and assemble bounded image-memory stubs instead of replaying raw base64. For best results, OpenClaw should hand lossless-claw the media-understanding summary/metadata directly instead of requiring LCM to infer it later from assistant prose.

Acceptance criteria

  • A deterministic image with visible text is sent through a local agent session.
  • The first model call can use the native image.
  • The durable context path stores/reuses image metadata and summary without retaining base64 in normal assembled context.
  • A follow-up turn can answer from cached image memory after raw image replay is disabled.
  • OCR-heavy flyer text is preserved under the 2,000-token cap.
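
One way to picture the context-assembly side of these criteria is the sketch below. It assumes hypothetical types (StoredImage, ContextPart, AssemblyOptions) and a hypothetical assembleImagePart function; none of these are OpenClaw or lossless-claw APIs. The point is only that a fresh turn may carry the native image, while later turns get a bounded metadata+summary stub unless raw replay is requested explicitly.

```typescript
// Hypothetical types for illustration; not OpenClaw's or lossless-claw's API.
interface StoredImage {
  rawRef: string;      // reference to the stored file, not base64
  filename?: string;
  mimeType: string;
  byteSize: number;
  summary?: string;    // cached description / OCR text, if any
}

type ContextPart =
  | { kind: "native-image"; rawRef: string }
  | { kind: "text-stub"; text: string };

interface AssemblyOptions {
  freshTurn: boolean;          // the turn in which the image was attached
  modelAcceptsImages: boolean;
  replayRawImage?: boolean;    // explicit opt-in for old context
}

// Render metadata plus summary as plain text; no image bytes are included.
function renderStub(img: StoredImage): string {
  const label = img.filename ?? "unnamed";
  const header = `[image ${label} | ${img.mimeType} | ${img.byteSize} bytes | ref=${img.rawRef}]`;
  return img.summary ? `${header}\n${img.summary}` : `${header}\n(no summary available)`;
}

function assembleImagePart(img: StoredImage, opts: AssemblyOptions): ContextPart {
  const sendNative =
    opts.modelAcceptsImages && (opts.freshTurn || opts.replayRawImage === true);
  return sendNative
    ? { kind: "native-image", rawRef: img.rawRef }
    : { kind: "text-stub", text: renderStub(img) };
}

// Usage: a follow-up turn with raw replay disabled receives only the stub.
const flyer: StoredImage = {
  rawRef: "media://example-flyer", // made-up reference for the example
  filename: "flyer.png",
  mimeType: "image/png",
  byteSize: 2048,
  summary: "Flyer text: OPEN HOUSE",
};
const followUp = assembleImagePart(flyer, { freshTurn: false, modelAcceptsImages: true });
```

A test for the acceptance criteria could then assert that the follow-up part is a text stub containing the cached flyer text and no base64 payload.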
