openclaw - [Feature]: Clicky companion support - screen-aware voice assistant as node

GitHub stats
openclaw/openclaw#63350 · Fetched 2026-04-09 07:54:57
Comments: 0 · Participants: 1 · Timeline: 1 (labeled ×1) · Reactions: 0

Summary

Support Clicky (https://github.com/farzaa/clicky) as a native macOS node/client that connects to the Gateway via WS Protocol v3 — enabling a screen-aware voice companion with persistent memory, tools, and multi-model support.

Problem to solve

OpenClaw has Talk Mode and Voice Wake for voice interaction, but lacks a visual desktop companion that can see the user's screen and point at UI elements. Clicky (4.5K+ stars, trending) does exactly this — screen capture, push-to-talk, blue cursor overlay with [POINT:x,y:label] — but it's a stateless Claude wrapper with no memory, no tools, and no persistent context.

Users who want a screen-aware AI assistant currently have to choose: Clicky (great UX, no backend) or OpenClaw (great backend, no visual overlay). There's no way to combine them.

Related Clicky-side proposal: https://github.com/farzaa/clicky/issues/30

Proposed solution

  1. Clicky companion skill — A skill (clicky-companion) that appends POINT coordinate instructions to the system prompt when the session originates from a Clicky client. This moves the POINT prompt from Clicky's hardcoded Swift string into a maintainable, editable skill file.
  2. Session routing for Clicky clients — Clicky connects as a node with caps: ["screen", "voice"]. Sessions routed to a dedicated key pattern (e.g., clicky:<device-id>) with optional model/prompt overrides via sessionOverrides.
  3. Verify multi-image attachments in chat.send — Clicky sends labeled screenshots from multiple monitors as base64 image attachments. The media pipeline should already handle this, but needs verification with the multi-image + label pattern.
  4. TTS via talk.speak/tts.convert — Clicky would use existing Gateway TTS RPCs instead of direct ElevenLabs calls. Main concern: low-latency audio delivery for real-time voice responses.
  5. (Optional future) POINT metadata extraction — Extract [POINT:x,y:label] tags from agent responses server-side and include as structured metadata in event:agent streaming payloads. Non-blocking — client-side parsing works fine initially.
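The POINT metadata extraction in item 5 can be sketched as follows. This is a minimal illustration, not Clicky's or OpenClaw's actual parser: the issue only gives the tag shape [POINT:x,y:label], so the integer-coordinate assumption and the names `Point` and `extract_points` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches tags such as [POINT:412,236:Save button].
# Assumption: x and y are non-negative integers; the issue only
# specifies the general shape [POINT:x,y:label].
POINT_TAG = re.compile(r"\[POINT:(\d+),(\d+):([^\]]*)\]")


@dataclass(frozen=True)
class Point:
    x: int
    y: int
    label: str


def extract_points(text: str) -> tuple[str, list[Point]]:
    """Return (text with POINT tags removed, tags found in order)."""
    points = [
        Point(int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in POINT_TAG.finditer(text)
    ]
    cleaned = POINT_TAG.sub("", text)
    # Collapse the double spaces left behind where a tag was removed.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned).strip()
    return cleaned, points
```

The same function could run client-side at first and move server-side later, since it only depends on the response text. Tags with negative or fractional coordinates would not match this pattern and would be left in the text unchanged.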

Alternatives considered

  • Clicky stays standalone: users configure their own Cloudflare Worker + 3 API keys. Works but no memory, no tools, no cross-device continuity.
  • OCPlatform builds its own overlay: duplicates the cursor/POINT/overlay work Clicky already does well. Not worth it when integration is simpler.
  • Generic "screen capture" node protocol: more abstract but overkill for the immediate use case. Clicky integration is concrete and achievable now.

Impact

  • Affected: macOS users who want a visual desktop AI companion with full agent capabilities
  • Severity: New capability (not blocking existing workflows)
  • Frequency: Every session; this would be the primary interaction mode for desktop users
  • Consequence: Without this, OpenClaw desktop interaction is text/voice only (no screen awareness, no visual pointing). Clicky users get no memory or tools.

Evidence/examples

  • Clicky repo: https://github.com/farzaa/clicky (4.5K+ stars, MIT, active development)
  • Clicky-side feature proposal: https://github.com/farzaa/clicky/issues/30
  • OpenClaw already has the full node protocol, TTS integration, and media pipeline needed
  • The POINT system ([POINT:x,y:label]) is proven: Clicky uses it successfully for screen element pointing

Additional information

https://github.com/farzaa/clicky

The Clicky-side PR would add OpenClawClient.swift implementing Gateway WS Protocol v3 (connect, chat.send, event:agent streaming, tts.convert). The current Cloudflare Worker mode remains the default — OpenClaw mode is opt-in.
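To make the multi-monitor chat.send call concrete, here is a rough sketch of how a client might assemble the request. The issue does not show the WS Protocol v3 frame schema, so the envelope (`id`, `method`, `params`) and every field name inside it (`sessionKey`, `text`, `attachments`, `type`, `label`, `mimeType`, `data`) are assumptions for illustration only, as is the helper name `build_chat_send`. Only the method name chat.send and the idea of labeled base64 image attachments come from the issue.

```python
import base64
import json
import uuid


def build_chat_send(session_key: str, text: str,
                    screenshots: dict[str, bytes]) -> str:
    """Serialize a hypothetical chat.send request, one image per monitor.

    `screenshots` maps a human-readable label (for example "Monitor 1")
    to the raw PNG bytes captured from that display.
    """
    attachments = [
        {
            "type": "image",          # assumed field name
            "label": label,           # assumed field name
            "mimeType": "image/png",  # assumed field name and format
            "data": base64.b64encode(png).decode("ascii"),
        }
        for label, png in screenshots.items()
    ]
    frame = {
        "id": str(uuid.uuid4()),      # assumed request-id convention
        "method": "chat.send",
        "params": {
            "sessionKey": session_key,
            "text": text,
            "attachments": attachments,
        },
    }
    return json.dumps(frame)
```

A payload built this way is also a convenient fixture for the verification step in the proposal: send two or more labeled images and check that the agent receives all of them with their labels intact.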

Key design question: should Clicky connect as role: "node" (natural fit for screen/voice caps) or would a lighter-weight client role be more appropriate?

Detailed architecture doc available on request.

extent analysis

TL;DR

Integrate Clicky as a native macOS node/client that connects to the Gateway via WS Protocol v3 to enable a screen-aware voice companion with persistent memory, tools, and multi-model support.

Guidance

  • Implement a Clicky companion skill that appends POINT coordinate instructions to the system prompt when the session originates from a Clicky client.
  • Route sessions from Clicky clients to a dedicated key pattern (e.g., clicky:<device-id>) with optional model/prompt overrides via sessionOverrides.
  • Verify that the media pipeline handles multi-image attachments with labels sent by Clicky.
  • Use the existing Gateway TTS RPCs (talk.speak/tts.convert) instead of direct ElevenLabs calls, and measure audio-delivery latency, which the issue flags as the main concern for real-time voice responses.
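The first two guidance items (the companion skill and session routing) could look roughly like the sketch below. Everything here is hypothetical: `ClientInfo`, `is_clicky`, `session_key_for`, and `build_system_prompt` are illustrative names, detecting Clicky by a `client_name` field is an assumption, and `POINT_INSTRUCTIONS` is placeholder wording rather than Clicky's real prompt. Only the clicky:<device-id> key pattern and the caps ["screen", "voice"] come from the issue.

```python
from dataclasses import dataclass, field
from typing import Optional

# Placeholder text; the real instructions would live in the
# clicky-companion skill file, not in code.
POINT_INSTRUCTIONS = (
    "When referring to something on screen, append a tag of the form "
    "[POINT:x,y:label] with the element's coordinates."
)


@dataclass
class ClientInfo:
    client_name: str                      # assumed identification field
    device_id: str
    caps: list[str] = field(default_factory=list)


def is_clicky(client: ClientInfo) -> bool:
    """Assumption: Clicky identifies itself by client name at connect time."""
    return client.client_name == "clicky"


def session_key_for(client: ClientInfo) -> Optional[str]:
    """Return the dedicated key for Clicky clients, else None (default routing)."""
    return f"clicky:{client.device_id}" if is_clicky(client) else None


def build_system_prompt(base_prompt: str, client: ClientInfo) -> str:
    """Append the POINT instructions only for sessions from a Clicky client."""
    if not is_clicky(client):
        return base_prompt
    return f"{base_prompt}\n\n{POINT_INSTRUCTIONS}"
```

Keeping the Clicky check in one function means the routing rule and the prompt rule cannot drift apart, whichever signal the Gateway ends up using to recognize a Clicky client.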

Example

The issue itself includes no code or explicit code references.

Notes

The main open design question is whether Clicky should connect as role: "node" (a natural fit for its screen and voice caps) or as a lighter-weight client role. Two technical risks are also called out in the issue: the multi-image + label attachment pattern in chat.send still needs verification, and TTS audio delivery must be low-latency enough for real-time voice responses.

Recommendation

Proceed with the proposed integration. It is concrete and achievable with the node protocol, TTS integration, and media pipeline OpenClaw already has, and it gives macOS users a screen-aware voice companion with persistent memory, tools, and multi-model support. Because the Clicky-side OpenClaw mode is opt-in, the existing Cloudflare Worker default is unaffected.
