openclaw - [Feature]: Expose image-tool to codex kernel so ChatGPT Plus OAuth agents can read images

Summary

The codex kernel does not expose the image tool that the pi kernel has, so agents running under ChatGPT Plus OAuth cannot natively read screenshots, even when the underlying model supports vision.

Problem to solve

createImageTool (in src/agents/tools/image-tool.ts) is registered via createOpenClawTools and consumed by the pi kernel (src/agents/pi-tools.ts), as well as by the auto-reply and gateway HTTP paths. The codex kernel (used when agents.defaults.model.primary = openai-codex/*) never receives it. The codex kernel also bypasses the detectAndLoadPromptImages auto-injection in src/agents/pi-embedded-runner/run/attempt.ts.
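The gap can be pictured with a deliberately simplified sketch. Everything in it is hypothetical: AgentTool, selectKernel, createSharedMediaTools and buildToolSet are illustrative stand-ins, not OpenClaw's real createOpenClawTools signatures. The sketch takes only two facts from the issue: the codex kernel is chosen for openai-codex/* models, and today only the pi path receives the image tool.

```typescript
// Hypothetical, simplified shapes; these are NOT OpenClaw's real types.
interface AgentTool {
  name: string;
  description: string;
}

type Kernel = "pi" | "codex";

// From the issue: the codex kernel is used when
// agents.defaults.model.primary is openai-codex/*.
function selectKernel(primaryModel: string): Kernel {
  return primaryModel.startsWith("openai-codex/") ? "codex" : "pi";
}

// Illustrative stand-ins for the real media tool factories.
function createSharedMediaTools(): AgentTool[] {
  return [
    { name: "image", description: "Read an image file from the workspace" },
    { name: "pdf", description: "Read a PDF file from the workspace" },
  ];
}

// Current behavior: only the pi kernel gets the media tools.
// Option 1 amounts to exposeMediaToCodex = true.
function buildToolSet(kernel: Kernel, baseTools: AgentTool[], exposeMediaToCodex: boolean): AgentTool[] {
  const includeMedia = kernel === "pi" || exposeMediaToCodex;
  return includeMedia ? [...baseTools, ...createSharedMediaTools()] : [...baseTools];
}

const kernel = selectKernel("openai-codex/gpt-5.4");
console.log(kernel, buildToolSet(kernel, [], false).map((t) => t.name)); // codex []
console.log(kernel, buildToolSet(kernel, [], true).map((t) => t.name)); // codex [ 'image', 'pdf' ]
```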

Result: ChatGPT Plus OAuth agents cannot read PNGs from the workspace from inside the agent loop. Models like gpt-5.4 / gpt-5.5 clearly support vision, but the kernel does not forward image content to them as multimodal input. Users have to write external shell wrappers that re-authenticate and call codex exec --image once per image. Each call costs ~75 s of overhead: a codex subprocess spawn plus a reload of the 78k-token system prompt.
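Forwarding an image as multimodal input comes down to turning file bytes into a content part that the model API accepts. The sketch below assumes the upstream request uses an OpenAI Responses-API-style input_image part carrying a base64 data URL. Whether the codex kernel actually builds its requests this way is an assumption. toInputImagePart and loadImagePart are hypothetical helpers.

```typescript
import { readFileSync } from "node:fs";
import { extname } from "node:path";

// Assumed shape: an OpenAI Responses-API-style image input part.
// The codex kernel's actual upstream request format may differ.
interface InputImagePart {
  type: "input_image";
  image_url: string; // base64 data URL
}

const MIME_BY_EXT: Record<string, string> = {
  ".png": "image/png",
  ".jpg": "image/jpeg",
  ".jpeg": "image/jpeg",
  ".webp": "image/webp",
  ".gif": "image/gif",
};

// Hypothetical helper: wrap raw image bytes as a multimodal content part.
function toInputImagePart(bytes: Uint8Array, filePath: string): InputImagePart {
  const mime = MIME_BY_EXT[extname(filePath).toLowerCase()];
  if (!mime) {
    throw new Error(`Unsupported image type: ${filePath}`);
  }
  const base64 = Buffer.from(bytes).toString("base64");
  return { type: "input_image", image_url: `data:${mime};base64,${base64}` };
}

// Hypothetical helper: read a workspace image from disk and wrap it.
function loadImagePart(filePath: string): InputImagePart {
  return toInputImagePart(readFileSync(filePath), filePath);
}

// First four bytes of the PNG signature, for illustration only.
const part = toInputImagePart(new Uint8Array([0x89, 0x50, 0x4e, 0x47]), "shot.PNG");
console.log(part.image_url); // data:image/png;base64,iVBORw==
```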

Proposed solution

Any one of the following would solve it:

  1. Register imageTool (and pdfTool while at it) when building the codex kernel tool set, the way the pi kernel does.
  2. Enable detectAndLoadPromptImages on the codex kernel input path so bare image paths in agent prompts/tool results get auto-injected as multimodal content.
  3. Failing both, document explicitly that ChatGPT Plus OAuth users do not have native vision and recommend a wrapper pattern.
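Option 2 would start by spotting bare image paths in prompt or tool-result text. The regex-based detectImagePaths below is a hypothetical illustration and does not reflect how detectAndLoadPromptImages actually works. It recognizes only common raster extensions. It handles relative, ~-prefixed, and absolute paths, but only when they contain no spaces.

```typescript
// Hypothetical sketch: NOT OpenClaw's detectAndLoadPromptImages.
// Captures path-like tokens ending in a common image extension, e.g.
// shots/a.png, ./b.jpg, ~/c.webp, /tmp/d.gif (paths without spaces).
// A trailing sentence period after the extension is tolerated.
const IMAGE_PATH_RE =
  /(?:^|\s)((?:~|\.{1,2})?\/?[\w.\-\/]+\.(?:png|jpe?g|webp|gif))(?=$|[\s,;:)]|\.(?:$|\s))/gi;

function detectImagePaths(text: string): string[] {
  const found = new Set<string>(); // de-duplicate repeated mentions
  for (const match of text.matchAll(IMAGE_PATH_RE)) {
    found.add(match[1]);
  }
  return [...found];
}

console.log(detectImagePaths("Compare shots/a.png with ./shots/b.JPG, then re-check shots/a.png."));
// [ 'shots/a.png', './shots/b.JPG' ]
```

Each detected path would still need to be resolved against the workspace, checked for existence, and loaded as image content before the turn is sent to the model.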

Happy to take a shot at a PR for option 1 or 2 if a maintainer can sanity-check the right injection point. The codex kernel's tool plumbing is less obvious than the pi kernel's createOpenClawTools chain.

Alternatives considered

Current workaround in use: an external shell wrapper that reads ~/.openclaw/agents/main/agent/auth-profiles.json, writes the OAuth token to ~/.codex/auth.json, and invokes codex exec --skip-git-repo-check --model gpt-5.4 --image <path> - per image.
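For reference, the per-image workaround can be sketched in TypeScript with Node's child_process.spawnSync. It mirrors the flags quoted above; the trailing - asks codex exec to read the prompt from stdin. The sketch leaves out the step that copies the OAuth token from auth-profiles.json into ~/.codex/auth.json, because the issue does not describe either file's format. buildCodexExecArgs and describeImage are hypothetical names.

```typescript
import { spawnSync } from "node:child_process";

// Same flags as the wrapper described in the issue; the trailing "-"
// makes `codex exec` read the prompt from stdin.
function buildCodexExecArgs(imagePath: string, model = "gpt-5.4"): string[] {
  return ["exec", "--skip-git-repo-check", "--model", model, "--image", imagePath, "-"];
}

// One codex subprocess per image. The issue attributes the ~75 s per call
// to this process spawn plus a full system prompt reload every time.
// Assumes the OAuth token is already present in ~/.codex/auth.json.
function describeImage(imagePath: string, prompt: string): string {
  const result = spawnSync("codex", buildCodexExecArgs(imagePath), {
    input: prompt,
    encoding: "utf8",
  });
  if (result.error) {
    throw result.error; // e.g. codex is not on PATH
  }
  if (result.status !== 0) {
    throw new Error(`codex exec exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}
```

Every image spawns a fresh codex process, so the overhead is paid once per screenshot. Nothing in this pattern lets the existing agent session reuse its context.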

This works, but it is slow (~75 s per call) and wasteful. Each call reloads 78k tokens of codex system prompt that should already be part of the existing long-running session.

Alternative: tell users to drag images into the chat manually. This defeats the point of agent automation when there are 30+ screenshots to analyze.

Impact

  • Affected: Any agent flow that needs to read images from disk (competitor screenshot analysis, OCR of saved PNGs, multi-image visual QA). Specifically affects ChatGPT Plus OAuth users because they cannot switch to a pi-kernel-eligible provider without giving up the OAuth subscription.
  • Severity: Medium. A workaround exists, but it is ~10x slower than a native path should be.
  • Frequency: Every multi-image analysis turn. In our workflow, that means every product competitor analysis run (3 ASINs × ~15 screenshots).
  • Consequence: ~45 minutes per product run on the wrapper path vs an estimated ~5 minutes with native image-tool / auto-injection.

Evidence/examples

Repro:

  1. Set agents.defaults.model.primary = openai-codex/gpt-5.4.
  2. In chat, ask the agent to "read this PNG and describe it", pointing at a workspace file.
  3. The agent has no native image tool, so it either fabricates an answer or shells out via a wrapper.

Token cost per wrapper call (from our logs):

  • input_token_count ≈ 78k (cached system prompt reloaded every time)
  • output_token_count ≈ 200-300
  • response time ≈ 70-75 s
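The per-call figures above can be scaled to a full product run with a rough estimate. It assumes one wrapper call per screenshot, so 3 ASINs × ~15 screenshots gives about 45 calls. estimateWrapperRun is a hypothetical helper that only multiplies the logged averages and ignores variance.

```typescript
interface WrapperRunEstimate {
  calls: number;
  inputTokens: number;
  outputTokensRange: [number, number];
}

// Multiplies the logged per-call averages by the number of images,
// assuming one `codex exec` call per image.
function estimateWrapperRun(
  images: number,
  inputTokensPerCall = 78_000,
  outputTokensPerCall: [number, number] = [200, 300],
): WrapperRunEstimate {
  return {
    calls: images,
    inputTokens: images * inputTokensPerCall,
    outputTokensRange: [images * outputTokensPerCall[0], images * outputTokensPerCall[1]],
  };
}

const run = estimateWrapperRun(3 * 15);
console.log(run.inputTokens); // 3510000
console.log(run.outputTokensRange); // [ 9000, 13500 ]
```

That comes to roughly 3.5M input tokens per product run, almost all of it the same system prompt sent 45 times. The actual output is only about 9k to 13.5k tokens.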

Additional information

Environment:

  • OpenClaw image: ghcr.io/openclaw/openclaw:2026.5.18
  • codex CLI: 0.130.0
  • Model: openai-codex/gpt-5.4
  • Auth: ChatGPT Plus OAuth