hermes: [Bug] computer_use toolset: 5 bugs found during hands-on testing (macOS 26.4.1, cua-driver v0.1.6)

Environment

  • macOS: 26.4.1 (Tahoe)
  • cua-driver: v0.1.6
  • Hermes: latest (as of 2026-05-12)
  • capture_mode: ax (ScreenCaptureKit broken on macOS 26.4.1, see trycua/cua#1467)

Summary

During thorough testing of the computer_use toolset (Path A wrapper), I found 5 bugs. The raw MCP path (mcp_cua_driver_*) works correctly for all of these — the bugs are specific to the Hermes wrapper layer.


Bug 1: app= parameter ignored on initial capture

Repro:

computer_use(action="capture", mode="som", app="Calculator")

Expected: Captures the Calculator app window.

Actual: Captures the frontmost app (in my case, "Fuwari" — a menu bar utility). The app parameter is completely ignored on the first capture call.

Workaround: Call focus_app(app="計算機") first, then capture(mode="som"). Note: the app name must match the macOS localized name (e.g., "計算機" not "Calculator").
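
The workaround above can be sketched as a small helper. This is only an illustration under stated assumptions: run_action is a hypothetical callable standing in for however your agent runtime invokes these actions, and the action names focus_app and capture follow the shorthand used in this report rather than a verified API.

```python
from typing import Any, Callable

# Hypothetical signature: run_action(action_name, **params) -> result
RunAction = Callable[..., Any]


def capture_app(run_action: RunAction, localized_name: str) -> Any:
    """Focus the app by its localized name, then capture without app=."""
    # Step 1: bring the target app to the front explicitly.
    run_action("focus_app", app=localized_name)
    # Step 2: capture the now-frontmost app; app= is deliberately omitted
    # because the report says it is ignored on the first capture.
    return run_action("capture", mode="som")


if __name__ == "__main__":
    def print_action(action: str, **params: Any) -> str:
        # Stand-in that only prints what would be invoked.
        print(action, params)
        return action

    capture_app(print_action, "計算機")
```

Passing the localized name in as an argument keeps the "計算機" vs. "Calculator" distinction visible at the call site.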


Bug 2: capture_after=True loses app context after actions

Repro:

computer_use(action="capture", mode="som", app="計算機")  # Works after focus_app
computer_use(action="click", element=14, capture_after=True)

Expected: Clicks element 14, then recaptures the same app (計算機) for verification.

Actual: The click itself succeeds, but the follow-up capture reverts to the wrong app (Fuwari in my case). The app context is lost between the action and the post-action capture.

Workaround: Don't use capture_after=True. Click first, then separately call focus_app + capture.
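
One way to picture the missing behavior is a wrapper that remembers the last capture target and reuses it for the post-action capture. The sketch below is hypothetical and is not the Hermes wrapper's code: the class name CaptureSession, the injected run_action callable, and the action names are all assumptions made for illustration.

```python
from typing import Any, Callable, Optional


class CaptureSession:
    """Remembers the last capture target so follow-up captures reuse it.

    Hypothetical illustration only; not the Hermes wrapper's implementation.
    """

    def __init__(self, run_action: Callable[..., Any]) -> None:
        self._run = run_action
        self._app: Optional[str] = None  # last app the caller asked for

    def capture(self, app: Optional[str] = None) -> Any:
        if app is not None:
            self._app = app  # update the remembered target
        if self._app is not None:
            # Re-focus the remembered app so the capture cannot drift
            # to whatever happens to be frontmost.
            self._run("focus_app", app=self._app)
        return self._run("capture", mode="som")

    def click(self, element: int, capture_after: bool = False) -> Any:
        result = self._run("click", element=element)
        if capture_after:
            # Reuse the stored target instead of the frontmost app.
            return self.capture()
        return result
```

With this shape, a click with capture_after=True issues focus_app for the remembered app before recapturing, which is the same sequence as the manual workaround.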


Bug 3: type action broken — "Unknown tool: type_text_chars"

Repro:

computer_use(action="type", text="hello")

Expected: Types "hello" into the focused element.

Actual: Error: cua-driver error: Invalid params: Unknown tool: type_text_chars

The wrapper appears to map action="type" to a type_text_chars tool that doesn't exist in cua-driver. The correct MCP tool name is type_text.

Workaround: Use mcp_cua_driver_type_text(pid=..., text="hello") directly.
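
A mismatch like this could be caught early by checking the wrapper's action-to-tool mapping against the tool names the driver actually advertises. The following is a minimal sketch, not the wrapper's real code: ACTION_TO_TOOL and validate_mapping are hypothetical names, and the driver tool names other than type_text are inferred from the mcp_cua_driver_* names mentioned in this report.

```python
from typing import Dict, Iterable, List

# Hypothetical mapping from wrapper actions to cua-driver tool names.
# "type_text" is the name this report gives as correct; "click" and "drag"
# are inferred by stripping the mcp_cua_driver_ prefix (an assumption).
ACTION_TO_TOOL: Dict[str, str] = {
    "type": "type_text",
    "click": "click",
    "drag": "drag",
}


def validate_mapping(mapping: Dict[str, str],
                     available_tools: Iterable[str]) -> List[str]:
    """Return the actions whose mapped tool the driver does not advertise."""
    available = set(available_tools)
    return sorted(action for action, tool in mapping.items()
                  if tool not in available)


if __name__ == "__main__":
    advertised = ["type_text", "click", "drag", "get_window_state"]
    # The mapping described in the bug report fails the check:
    print(validate_mapping({"type": "type_text_chars"}, advertised))
    # The corrected mapping passes:
    print(validate_mapping(ACTION_TO_TOOL, advertised))
```

Running such a check once at startup would surface an unknown tool name before any action is dispatched.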


Bug 4: drag action not supported

Repro:

computer_use(action="drag", from_coordinate=[100,200], to_coordinate=[400,500])

Expected: Performs a drag gesture from (100,200) to (400,500).

Actual: Error: drag is not supported by the cua-driver backend.

However, the raw MCP tool mcp_cua_driver_drag works correctly, so the driver itself supports dragging. The wrapper appears not to have implemented the drag action mapping.

Workaround: Use mcp_cua_driver_drag(pid=..., from_x=100, from_y=200, to_x=400, to_y=500).
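
The parameter translation between the two call shapes can be sketched as follows. The helper name drag_params is hypothetical; the keyword names pid, from_x, from_y, to_x, and to_y are taken from the workaround call above, and the pid value in the usage is a placeholder, since the report does not give one.

```python
from typing import Dict, Sequence


def drag_params(pid: int,
                from_coordinate: Sequence[int],
                to_coordinate: Sequence[int]) -> Dict[str, int]:
    """Convert wrapper-style coordinate pairs into flat raw-MCP keywords."""
    if len(from_coordinate) != 2 or len(to_coordinate) != 2:
        raise ValueError("coordinates must be [x, y] pairs")
    from_x, from_y = from_coordinate
    to_x, to_y = to_coordinate
    return {
        "pid": pid,
        "from_x": from_x,
        "from_y": from_y,
        "to_x": to_x,
        "to_y": to_y,
    }


if __name__ == "__main__":
    # 4242 is a placeholder pid for illustration only.
    print(drag_params(4242, [100, 200], [400, 500]))
```

The resulting dictionary can be passed as keyword arguments to whatever mechanism your runtime uses to invoke mcp_cua_driver_drag.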


Bug 5: Element labels stripped in capture results

Repro:

computer_use(action="capture", mode="som", app="計算機")

Actual output (Path A):

#14 AXButton "" @ (0, 0, 0, 0)
#15 AXButton "" @ (0, 0, 0, 0)

Expected output (raw MCP mcp_cua_driver_get_window_state):

[14] AXButton (1) id=One
[15] AXButton (2) id=Two

All element labels are empty strings in the Path A wrapper output, and the four values after @ are all zero, while the raw MCP path preserves the labels. Without labels, elements can only be identified by trial-and-error clicking.

Workaround: Use raw MCP for discovery, then use element indices with Path A if needed.
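
For the discovery step, the raw MCP lines can be parsed into a label-to-index lookup. This is a sketch based only on the two sample lines shown above; it assumes the parenthesized text is the element label and that every line follows the same "[index] role (label) id=identifier" shape, which may not hold for all elements.

```python
import re
from typing import Dict, Iterable, NamedTuple, Optional


class Element(NamedTuple):
    index: int
    role: str
    label: str
    ident: str


# Matches lines shaped like: [14] AXButton (1) id=One
_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)\s+\((.*)\)\s+id=(\S+)$")


def parse_element(line: str) -> Optional[Element]:
    """Parse one raw-MCP element line; return None if it does not match."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return Element(int(match.group(1)), match.group(2),
                   match.group(3), match.group(4))


def index_by_label(lines: Iterable[str]) -> Dict[str, int]:
    """Build a label -> element index lookup, skipping unparseable lines."""
    parsed = (parse_element(line) for line in lines)
    return {e.label: e.index for e in parsed if e is not None}


if __name__ == "__main__":
    raw = ["[14] AXButton (1) id=One", "[15] AXButton (2) id=Two"]
    print(index_by_label(raw))
```

The indices found this way can then be used as the element argument in Path A calls, as the workaround suggests.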


E2E Verification

Despite these bugs, I verified the underlying mechanism works via the raw MCP path:

  • Calculator test: 12 + 3 = 15 ✅ (using a mcp_cua_driver_get_window_state → mcp_cua_driver_click sequence)
  • All 7 click operations succeeded
  • Result verified via AX tree query

Suggested Fixes

  1. Bug 1: Ensure the app= parameter is passed through to the cua-driver focus/lookup step before capture
  2. Bug 2: Preserve the app context (pid/window_id) across action → capture_after calls
  3. Bug 3: Map action="type" to type_text instead of type_text_chars
  4. Bug 4: Implement the drag action mapping to mcp_cua_driver_drag
  5. Bug 5: Include element labels (AXTitle/AXDescription) in the capture output

Related

  • trycua/cua#1467 (ScreenCaptureKit broken on macOS 26.4.1)
