codex: Expose held-pointer primitives in browser automation

Feature request

Please expose low-level held-pointer primitives for Codex browser automation, especially in the Codex desktop app's in-app browser.

Any one of the following API shapes would be useful:

  • mouseDown(x, y), mouseMove(x, y), mouseUp(x, y)
  • beginDrag(...), moveDrag(...), endDrag(...)
  • dragHold(path) that leaves the pointer down so the agent can inspect/screenshot before releasing
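
To make the first shape concrete, here is a minimal sketch in Python of what a stateful held-pointer surface could look like. The class name HeldPointer, its methods, and the event log are all hypothetical and chosen for illustration; nothing here is an existing Codex API. The point is only that the button state persists between calls, so the caller can do other work while the pointer stays down.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeldPointer:
    """Hypothetical held-pointer surface; not an existing Codex API."""

    is_down: bool = False
    position: tuple[int, int] = (0, 0)
    events: list[tuple[str, int, int]] = field(default_factory=list)

    def mouse_down(self, x: int, y: int) -> None:
        # Pressing twice without a release is treated as a caller error.
        if self.is_down:
            raise RuntimeError("pointer is already held down")
        self.is_down = True
        self._record("down", x, y)

    def mouse_move(self, x: int, y: int) -> None:
        # Allowed in both states: a hover when up, a drag step when down.
        self._record("drag" if self.is_down else "hover", x, y)

    def mouse_up(self, x: int, y: int) -> None:
        if not self.is_down:
            raise RuntimeError("pointer is not held down")
        self.is_down = False
        self._record("up", x, y)

    def _record(self, kind: str, x: int, y: int) -> None:
        self.position = (x, y)
        self.events.append((kind, x, y))
```

Between mouse_down and mouse_up the caller keeps control, which is the gap where a screenshot or state inspection would go.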

Why this matters

Many interactive browser experiences have important transient state while the pointer is held down: drag-and-drop editors, drawing/canvas tools, WebGL apps, maps, games, timeline editors, design tools, sliders, scrubbers, resize handles, and sortable lists.

A concrete example is a Unity WebGL game where dragging a card from the hand into the world shows a real-time placement preview: range circles, blocked placement text, interaction highlights, and move score deltas. The agent needs to inspect that held state before deciding where to release.

The current in-app browser CUA surface appears to provide high-level input methods such as click, double_click, drag, move, scroll, keypress, and type. The drag method works for all-in-one interactions, but it is atomic: it presses, moves, and releases in a single call. The agent can observe the page only after release, so it never sees the held-pointer UI state.
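
The limitation can be shown with a toy model. The sketch below assumes a hypothetical ToyCanvas whose preview exists only between press and release; it is a stand-in for the kind of app described above, not a model of the real in-app browser. An atomic drag returns after release, so the observation it allows always comes too late.

```python
class ToyCanvas:
    """Hypothetical app that shows a preview only while the pointer is held."""

    def __init__(self):
        self.held = False
        self.preview = None  # visible only between press and release

    def press(self, x, y):
        self.held = True
        self.preview = f"placing at ({x}, {y})"

    def move(self, x, y):
        if self.held:
            self.preview = f"placing at ({x}, {y})"

    def release(self, x, y):
        self.held = False
        self.preview = None

    def observe(self):
        return self.preview


def atomic_drag(app, start, end):
    """Press, move, and release in one call, then observe."""
    app.press(*start)
    app.move(*end)
    app.release(*end)
    return app.observe()


app = ToyCanvas()
after_release = atomic_drag(app, (0, 0), (5, 5))  # None: preview already gone

app.press(0, 0)
app.move(5, 5)
while_held = app.observe()  # "placing at (5, 5)"
app.release(5, 5)
```

The same three pointer events occur in both cases; the only difference is whether the caller gets a turn before the release.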

Expected behavior

Codex should be able to perform a general held-pointer workflow:

  1. press down at a chosen screen coordinate or target element
  2. move while keeping the pointer/button held down
  3. take a screenshot or inspect visible state before release
  4. continue moving while still held, if needed
  5. release at the chosen coordinate

This should work for canvas/WebGL content as well as ordinary DOM-backed drag interactions.
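
The five steps above could be driven by a loop like the following sketch. FakeGame, inspect, and drag_with_inspection are hypothetical names: FakeGame stands in for a page, and inspect plays the role of a screenshot by returning a score delta for the current position (None when placement is blocked). A real agent would read that information from pixels rather than from a dictionary.

```python
class FakeGame:
    """Hypothetical stand-in for a page with a held-state placement preview."""

    def __init__(self, scores):
        self.scores = scores  # point -> score delta; None means blocked
        self.held = False
        self.cursor = (0, 0)
        self.placed_at = None

    def mouse_down(self, x, y):
        self.held = True
        self.cursor = (x, y)

    def mouse_move(self, x, y):
        self.cursor = (x, y)

    def mouse_up(self, x, y):
        self.held = False
        self.placed_at = (x, y)

    def inspect(self):
        # Plays the role of a screenshot: readable only while held.
        return self.scores.get(self.cursor) if self.held else None


def drag_with_inspection(browser, start, candidates):
    """Hold the pointer, probe each candidate, release at the best one."""
    browser.mouse_down(*start)                  # step 1: press
    best, best_score = start, None
    for point in candidates:
        browser.mouse_move(*point)              # steps 2 and 4: move while held
        score = browser.inspect()               # step 3: inspect before release
        if score is not None and (best_score is None or score > best_score):
            best, best_score = point, score
    # If every candidate was blocked, this releases back at the start point.
    browser.mouse_move(*best)
    browser.mouse_up(*best)                     # step 5: release
    return best
```

For example, with scores of {(1, 1): None, (2, 2): 3, (3, 3): 7, (4, 4): 5} and those four points as candidates, the loop releases at (3, 3), the highest-scoring unblocked position.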

Impact

This would make Codex much better at manually QA-ing browser games, WebGL tools, drawing/canvas apps, drag-and-drop editors, design tools, and any interface where hover/held pointer state is the behavior under test.
