hermes - [Feature]: GUI Teaching for Hermes Agent – Record mouse/keyboard workflows as executable skills (macOS)

NousResearch/hermes-agent#19802 · fetched 2026-05-05 06:05:07
Comments: 0 · Participants: 1 · Reactions: 0 · Timeline: 5 events (labeled ×3, closed ×1, reopened ×1)


Problem or Use Case

GUI Teaching for Hermes Agent – Record mouse/keyboard workflows as executable skills (macOS)

Repository

https://github.com/VodoooFilms/Linka

This repository contains a complete, working implementation of a GUI Teaching Mode for Hermes Agent on macOS. Linka is an open‑source (MIT) remote control app that now includes full Hermes integration.


TL;DR

Click Teach in Linka’s UI → perform a GUI workflow once (e.g., open Safari → YouTube → play a mix) → click Teach again → name your skill. Linka generates an executable skill file. Hermes Agent can replay it anytime.

No code. No scripting. GUI automation without an API.


What this enables for Hermes Agent

| Before | After |
|--------|-------|
| Hermes could only automate code and terminal tasks | Hermes can now learn and replay any GUI workflow |
| Skills required manual coding | Skills are demonstrated once and auto‑generated |
| No native GUI context | Events include app name, window title, bounds |

Hermes Agent can now:

  • Ingest recordings via linka_ingest
  • Detect repeated patterns with linka_analyze
  • Load skills from ~/.hermes/skills/linka/
  • Replay workflows with window‑offset compensation
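Window‑offset compensation is named here but not specified. One plausible reading, sketched below in JavaScript (the language of Linka's Node side), is to store each click as an offset from the recorded window's origin and re‑anchor it to wherever that window sits at replay time. The event fields (`x`, `y`), the bounds shape (`{ x, y, width, height }`) and both helper names are assumptions, not Linka's actual format.

```javascript
// Hypothetical shapes: event = { type, x, y, ... } in absolute screen points,
// bounds = { x, y, width, height } as captured from the target window.

/** Re-anchor a recorded point to the window's current origin (size ignored). */
function compensate(event, recordedBounds, currentBounds) {
  const dx = event.x - recordedBounds.x; // offset inside the recorded window
  const dy = event.y - recordedBounds.y;
  return { ...event, x: currentBounds.x + dx, y: currentBounds.y + dy };
}

/** Variant: also scale the offset when the window was resized. */
function compensateScaled(event, recordedBounds, currentBounds) {
  const fx = (event.x - recordedBounds.x) / recordedBounds.width;
  const fy = (event.y - recordedBounds.y) / recordedBounds.height;
  return {
    ...event,
    x: Math.round(currentBounds.x + fx * currentBounds.width),
    y: Math.round(currentBounds.y + fy * currentBounds.height),
  };
}

// A click at (420, 310) inside a window recorded at (100, 50)
// replays at (620, 410) once that window has moved to (300, 150).
const moved = compensate(
  { type: "click", x: 420, y: 310 },
  { x: 100, y: 50, width: 800, height: 600 },
  { x: 300, y: 150, width: 800, height: 600 },
);
console.log(moved);
```

The unscaled version suits windows that were only moved; the scaled variant is a guess at handling resized windows and can misplace clicks in layouts that do not stretch proportionally.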

What is included in Linka’s Hermes integration

🎓 Teach Mode (macOS only)

  • One‑button recording – Teach button in the controller UI starts/stops capture
  • CGEventTap – listens to mouse clicks, drags, keystrokes, scrolls (uses Accessibility permission)
  • 60‑second circular buffer – continuously holds the last 60 seconds of events; only the events recorded during the teaching period are saved
  • Smart throttling – drops noisy mouse moves (only moves larger than 20 px are kept, at roughly 7 events/sec)
  • Native modal for skill naming – dark overlay with Cancel / Save Skill
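A rough model of the capture buffer, written in JavaScript rather than the Swift Linka actually uses: events older than 60 seconds are evicted, `teach_start`/`teach_stop` markers bracket the teaching period, and mouse moves are thinned. The throttling rule is an interpretation of "(>20px, ~7 events/sec)": a move is kept only if it travelled more than 20 px and at least ~150 ms have passed since the last kept move. `EventBuffer` and the event fields are hypothetical, and a plain array stands in for a fixed‑size ring.

```javascript
// Hypothetical sketch; Linka's real buffer lives in native/mac-input/main.swift.
const WINDOW_MS = 60_000;         // keep the last 60 seconds
const MIN_MOVE_PX = 20;           // drop mouse moves of 20 px or less
const MIN_MOVE_INTERVAL_MS = 150; // roughly 7 kept moves per second at most

class EventBuffer {
  constructor({ windowMs = WINDOW_MS } = {}) {
    this.windowMs = windowMs;
    this.events = [];     // time-ordered, oldest first; each event has t (ms)
    this.lastMove = null; // last mouse move that was kept
  }

  push(event) {
    if (event.type === "mouseMove" && !this.#keepMove(event)) return false;
    this.events.push(event);
    // Evict anything older than the window, measured from the newest event.
    const cutoff = event.t - this.windowMs;
    while (this.events.length > 0 && this.events[0].t < cutoff) {
      this.events.shift();
    }
    return true;
  }

  teachStart(t) {
    this.push({ type: "teach_start", t });
  }

  /** Returns the events recorded since the most recent teach_start marker. */
  teachStop(t) {
    this.push({ type: "teach_stop", t });
    let start = -1;
    for (let i = this.events.length - 1; i >= 0; i--) {
      if (this.events[i].type === "teach_start") {
        start = i;
        break;
      }
    }
    // If the start marker was already evicted, fall back to everything buffered.
    return this.events.slice(start + 1, -1); // excludes both markers
  }

  #keepMove(event) {
    const prev = this.lastMove;
    if (prev) {
      const dist = Math.hypot(event.x - prev.x, event.y - prev.y);
      if (dist <= MIN_MOVE_PX || event.t - prev.t < MIN_MOVE_INTERVAL_MS) return false;
    }
    this.lastMove = event;
    return true;
  }
}
```

In this sketch a teaching session longer than 60 seconds loses its oldest events, since eviction does not know about the markers.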

🤖 Auto‑generated Hermes skills

Skills are saved to ~/.hermes/skills/linka/<name>.md with:

  • YAML frontmatter (name, type, created, events_count)
  • Human‑readable step list (timestamps, coordinate context)
  • Raw JSON events for exact replay
  • Reference screenshot (<name>.png) captured at recording time
  • Window bounds (Swift CGWindowList) – works around macOS 15 AppleScript coordinate bugs
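The issue lists the parts of a skill file but not its exact layout. The sketch below assembles YAML frontmatter with the four listed keys, a numbered step list with relative timestamps and context, and the raw events as a JSON block. `buildSkillMarkdown`, the `type: gui_workflow` value and the event fields are assumptions; the real generator is `generateTeachSkill()` in `server.js`, and its output may differ.

```javascript
const FENCE = "`".repeat(3); // builds the JSON fence without a literal one here

/** Hypothetical human-readable description of one event. */
function describeStep(event) {
  switch (event.type) {
    case "click": return `click at (${event.x}, ${event.y})`;
    case "keyDown": return `press "${event.key}"`;
    case "scroll": return `scroll by ${event.dy}`;
    default: return event.type;
  }
}

function buildSkillMarkdown(name, events, created = new Date()) {
  const t0 = events.length > 0 ? events[0].t : 0;
  const lines = [
    "---",
    `name: ${JSON.stringify(name)}`, // JSON strings are valid YAML scalars
    "type: gui_workflow",            // hypothetical type value
    `created: ${created.toISOString()}`,
    `events_count: ${events.length}`,
    "---",
    "",
    `# ${name}`,
    "",
    "## Steps",
    "",
  ];
  events.forEach((e, i) => {
    const secs = ((e.t - t0) / 1000).toFixed(2); // seconds since first event
    const ctx = e.app ? ` in ${e.app}` : "";
    lines.push(`${i + 1}. [+${secs}s] ${describeStep(e)}${ctx}`);
  });
  // Raw events for exact replay.
  lines.push("", "## Raw events", "", FENCE + "json", JSON.stringify(events, null, 2), FENCE, "");
  return lines.join("\n");
}
```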

📡 Event capture pipeline

  • Hotkey Cmd+Shift+Option+L – dumps the current CGEvent buffer to ~/.hermes/linka/inbox/
  • HTTP endpoint GET /hermes/events – programmatic access for Hermes
  • Foreground app context – app name, bundle ID, window title captured alongside events
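The hotkey dump can be pictured as serializing the buffer plus the foreground‑app context into a JSON file in the inbox. This sketch assumes a `linka-<timestamp>.json` naming scheme and a `{ dumped_at, context, events }` payload, neither of which the issue confirms. It writes to a temporary file and then renames it, so a consumer watching the inbox never reads a half‑written dump.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

const DEFAULT_INBOX = path.join(os.homedir(), ".hermes", "linka", "inbox");

/** Hypothetical dump helper; returns the path of the written file. */
function dumpToInbox(events, context, inboxDir = DEFAULT_INBOX, now = new Date()) {
  fs.mkdirSync(inboxDir, { recursive: true });
  const stamp = now.toISOString().replace(/[:.]/g, "-"); // filename-safe
  const finalPath = path.join(inboxDir, `linka-${stamp}.json`);
  const tmpPath = `${finalPath}.tmp`;
  const payload = { dumped_at: now.toISOString(), context, events };
  fs.writeFileSync(tmpPath, JSON.stringify(payload, null, 2));
  fs.renameSync(tmpPath, finalPath); // atomic within one directory on POSIX
  return finalPath;
}
```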

🖥️ Desktop responsive UI

  • Media queries for desktop layouts (logo visible, 3‑column status bar)
  • Both source (index.html) and packaged (dist/index.html) kept in sync

🔐 Localhost auto‑authentication

  • WebSocket connections from 127.0.0.1/::1 are auto‑authenticated
  • Client‑side stale‑session retry – clears stored session and reconnects on auth_error
  • Desktop Bridge opens without QR scan when accessed from localhost
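Both halves of the localhost behaviour reduce to small checks. The sketch assumes the server reads the peer address from `req.socket.remoteAddress` (with the `ws` package, the `connection` event passes the HTTP request as its second argument) and that the client keeps its session under a hypothetical `linka_session` storage key; `isLocalhost` and `makeAuthErrorHandler` are illustrative names.

```javascript
// Loopback forms Node may report; "::ffff:127.0.0.1" appears on dual-stack sockets.
const LOOPBACK = new Set(["127.0.0.1", "::1", "::ffff:127.0.0.1"]);

/** Server side: decide whether a connection can skip pairing. */
function isLocalhost(remoteAddress) {
  return LOOPBACK.has(remoteAddress);
}

/**
 * Client side: on auth_error, clear the stored session and reconnect,
 * at most maxRetries times. storage needs removeItem(key), like localStorage.
 */
function makeAuthErrorHandler({ storage, reconnect, maxRetries = 1 }) {
  let retries = 0;
  return function onMessage(msg) {
    if (msg.type !== "auth_error") return false;
    if (retries >= maxRetries) return false; // give up and surface the error
    retries += 1;
    storage.removeItem("linka_session"); // hypothetical key name
    reconnect();
    return true;
  };
}
```

Trusting the loopback address is only sound when nothing, such as a local reverse proxy, forwards remote traffic from 127.0.0.1. Capping retries keeps a persistent `auth_error` from becoming a reconnect loop.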

Files changed (14 files, +3,920 −2,327 lines)

| File | Key changes |
|------|-------------|
| `main.js` | Hotkey registration, event dump to inbox |
| `server.js` | `/hermes/events`, `teach_save`, localhost auto‑auth, `generateTeachSkill()` |
| `native/mac-input/main.swift` | CGEventTap capture, 60‑second buffer, teach markers, `EVENTS_JSON` protocol |
| `platform/input/macos.js` | `teachStart()`, `teachStop()`, `dumpEvents()` |
| `index.html` | Teach button, modal, desktop responsive CSS, `auth_error` retry |
| `README.md` | Full Hermes integration documentation |
| `tests/smoke.test.js` | New smoke test (`/api/status`, `/hermes/events`, security headers) |

Compatibility

  • macOS only – requires CGEventTap and Accessibility permission
  • Hermes Agent v0.12.0 or newer
  • Node.js 20+
  • Xcode Command Line Tools (for native helper)

The Windows version retains full remote control but no Hermes integration (GET /hermes/events returns 501 on Windows).
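The platform gate on the events endpoint might look like the following, using only Node's built‑in `http` module. The response bodies, the `eventsResponse` helper and the empty event source are placeholders; only the 501 status outside macOS comes from the issue, and this sketch applies it to every non‑`darwin` platform rather than to Windows alone.

```javascript
const http = require("node:http");

/** Pure decision: 501 outside macOS, otherwise the buffered events. */
function eventsResponse(platform, getEvents) {
  if (platform !== "darwin") {
    return { status: 501, body: { error: "Hermes integration is macOS-only" } };
  }
  return { status: 200, body: { events: getEvents() } };
}

function handler(req, res) {
  const url = new URL(req.url, "http://localhost");
  if (req.method === "GET" && url.pathname === "/hermes/events") {
    // Placeholder event source; a real server would read the native buffer.
    const { status, body } = eventsResponse(process.platform, () => []);
    res.writeHead(status, { "Content-Type": "application/json" });
    res.end(JSON.stringify(body));
    return;
  }
  res.writeHead(404);
  res.end();
}

/** Binds to loopback only; the port is up to the caller. */
function startServer(port) {
  return http.createServer(handler).listen(port, "127.0.0.1");
}
```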


Testing

```bash
git clone https://github.com/VodoooFilms/Linka
cd Linka
npm install
npm run build:native:mac
npm run dev
```

### Proposed Solution

### Problem Statement

Hermes Agent currently excels at automating code and terminal tasks, but it lacks a native way to learn or automate repetitive **Graphical User Interface (GUI)** workflows (e.g., opening a browser, navigating a specific site, clicking a series of buttons).

| Before | After |
|--------|-------|
| Hermes could only automate code and terminal tasks | Hermes can now learn and replay **any GUI workflow** |
| Skills required manual coding | Skills are **demonstrated once** and auto‑generated |
| No native GUI context | Events include **app name, window title, bounds** |

---

### Proposed Solution

I have developed **Linka**, an open‑source (MIT) remote control app that now includes a full Hermes integration with **Teach Mode** on macOS.

**Repository:** https://github.com/VodoooFilms/Linka

**How it works:**

1. Click **Teach** in Linka’s UI
2. Perform a GUI workflow once (e.g., open Safari → YouTube → play a mix)
3. Click **Teach** again → name your skill
4. Linka auto‑generates an executable skill file at `~/.hermes/skills/linka/<name>.md`
5. Hermes Agent can replay it anytime
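The Teach flow above behaves like a small state machine: idle, recording, naming, then back to idle once the skill is saved. In this sketch the skill name is also slugified before the `~/.hermes/skills/linka/<name>.md` path is built, which keeps a name such as `../x` from escaping the skills directory. `TeachSession`, `slugify` and the slug rules are hypothetical.

```javascript
const os = require("node:os");
const path = require("node:path");

const SKILLS_DIR = path.join(os.homedir(), ".hermes", "skills", "linka");

/** Lowercase, collapse non-alphanumerics to "-", trim dashes. */
function slugify(name) {
  const slug = name.trim().toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
  if (!slug) throw new Error("skill name must contain letters or digits");
  return slug;
}

class TeachSession {
  state = "idle"; // idle -> recording -> naming -> idle

  /** The Teach button: first click starts recording, second asks for a name. */
  toggle() {
    if (this.state === "idle") this.state = "recording";
    else if (this.state === "recording") this.state = "naming";
    else throw new Error(`cannot toggle while ${this.state}`);
    return this.state;
  }

  /** "Save Skill" in the naming modal: returns the target .md path. */
  save(name, skillsDir = SKILLS_DIR) {
    if (this.state !== "naming") throw new Error("nothing to save");
    const target = path.join(skillsDir, `${slugify(name)}.md`); // may throw; state unchanged
    this.state = "idle";
    return target;
  }

  cancel() {
    this.state = "idle";
  }
}
```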

**What this enables Hermes to do:**

- Ingest recordings via `linka_ingest`
- Detect repeated patterns with `linka_analyze`
- Load skills from `~/.hermes/skills/linka/`
- Replay workflows with window‑offset compensation
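The issue does not say how `linka_analyze` detects repeated patterns. One simple approach, shown here purely as an assumption, is to reduce each event to a coarse token (dropping raw coordinates but keeping app and window context) and count length‑k sequences that occur in several recordings. `findRepeatedSequences` and the token format are illustrative.

```javascript
/** Coarse token so near-identical actions compare equal (hypothetical format). */
function token(e) {
  switch (e.type) {
    case "click": return `click:${e.app ?? "?"}:${e.windowTitle ?? "?"}`;
    case "keyDown": return `key:${e.key}`;
    default: return e.type;
  }
}

/**
 * Count length-k token sequences across recordings; each sequence is counted
 * at most once per recording. Returns those seen in >= minCount recordings.
 */
function findRepeatedSequences(recordings, k = 3, minCount = 2) {
  const counts = new Map();
  for (const events of recordings) {
    const toks = events.map(token);
    const seen = new Set();
    for (let i = 0; i + k <= toks.length; i++) {
      const key = toks.slice(i, i + k).join(" > ");
      if (seen.has(key)) continue;
      seen.add(key);
      counts.set(key, (counts.get(key) ?? 0) + 1);
    }
  }
  return [...counts]
    .filter(([, n]) => n >= minCount)
    .sort((a, b) => b[1] - a[1])
    .map(([sequence, count]) => ({ sequence, count }));
}
```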

**Technical implementation (already working):**

| Component | Technology | Status |
|-----------|------------|--------|
| Event capture | CGEventTap (Swift) | Working |
| 60‑second circular buffer | Native Swift | Working |
| Teach markers | `teach_start/stop` | Working |
| HTTP endpoint | `GET /hermes/events` | Working |
| Hotkey | `Cmd+Shift+Option+L` | Working |
| Skill generation | Markdown + YAML + JSON | Working |
| Hermes tools | `linka_ingest`, `linka_analyze` | Working |
| Smoke tests | `node:test` (3/3 pass) | Working |

**Files changed (14 files, +3,920 −2,327 lines):**

- `main.js` – hotkey, inbox dump
- `server.js` – `/hermes/events`, `teach_save`, skill generator
- `native/mac-input/main.swift` – CGEventTap, buffer, markers
- `platform/input/macos.js` – `teachStart/Stop`, `dumpEvents`
- `index.html` – Teach button, modal, responsive UI
- `README.md` – full documentation
- `tests/smoke.test.js` – smoke test suite

---

### Alternatives Considered

| Approach | Why not chosen |
|----------|----------------|
| Manual scripting (PyAutoGUI, AppleScript) | Requires coding, no pattern detection |
| Robot.js | No native CGEvent, permission issues |
| Hammerspoon | Lua‑only, no Hermes integration |
| Commercial tools (Keyboard Maestro) | Closed source, no Hermes bridge |

**Why Linka is different:** It records **context** (app name, window bounds, screenshot) and generates **executable Hermes skills** with zero friction.

---

### Additional Context

**Testing instructions:**

```bash
git clone https://github.com/VodoooFilms/Linka
cd Linka
npm install
npm run build:native:mac
npm run dev
```

### Alternatives Considered

_No response_

### Feature Type

New tool

### Scope

None

### Contribution

- [ ] I'd like to implement this myself and submit a PR

### Debug Report (optional)

```shell
```

Extent analysis

TL;DR

To resolve issues with Linka's Hermes integration, make sure you are running Hermes Agent v0.12.0 or newer and have the required dependencies installed: Node.js 20+ and the Xcode Command Line Tools.

Guidance

  • Verify that your environment meets the compatibility requirements: macOS only (with Accessibility permission granted), Hermes Agent v0.12.0 or newer, Node.js 20+, and Xcode Command Line Tools.
  • Check the README.md file in the Linka repository for the full Hermes integration documentation and troubleshooting tips.
  • If event capture or skill generation misbehaves, review the files that implement them: main.js (hotkey, inbox dump), server.js (/hermes/events, teach_save, skill generation) and native/mac-input/main.swift (CGEventTap capture, buffer, teach markers).
  • Run the smoke tests in tests/smoke.test.js, which cover /api/status, /hermes/events and security headers; because the suite uses node:test, node --test tests/smoke.test.js should run it.

Example

No specific code example is provided due to the nature of the issue, but reviewing the main.js and server.js files for correct implementation of the /hermes/events endpoint and skill generation can be helpful.

Notes

The solution provided is based on the information given in the issue and might not cover all possible scenarios or edge cases. Ensure you have the latest version of all dependencies and follow the testing instructions provided in the issue for the best results.

Recommendation

Apply the workaround: meet all compatibility requirements and review the code changes for a correct implementation. The problem appears to lie in the setup and configuration of Linka's Hermes integration rather than in a version-specific bug, so upgrading to a fixed release is not expected to help.
