openclaw - [Feature]: Clicky companion support - screen-aware voice assistant as node [1 participant]

openclaw · 2026-04-08 20:16:24


GitHub stats

openclaw/openclaw#63350 • Fetched 2026-04-09 07:54:57


Author

kokoima

Participants

kokoima

Timeline (top)

labeled ×1

Support Clicky (https://github.com/farzaa/clicky) as a native macOS node/client that connects to the Gateway via WS Protocol v3 — enabling a screen-aware voice companion with persistent memory, tools, and multi-model support.

Root Cause


Summary

Problem to solve

OpenClaw has Talk Mode and Voice Wake for voice interaction, but lacks a visual desktop companion that can see the user's screen and point at UI elements. Clicky (4.5K+ stars, trending) does exactly this — screen capture, push-to-talk, blue cursor overlay with [POINT:x,y:label] — but it's a stateless Claude wrapper with no memory, no tools, and no persistent context.

Users who want a screen-aware AI assistant currently have to choose: Clicky (great UX, no backend) or OpenClaw (great backend, no visual overlay). There's no way to combine them.

Related Clicky-side proposal: https://github.com/farzaa/clicky/issues/30

Proposed solution

1. Clicky companion skill: a skill (clicky-companion) that appends POINT coordinate instructions to the system prompt when the session originates from a Clicky client. This moves the POINT prompt out of Clicky's hardcoded Swift string and into a maintainable, editable skill file.
2. Session routing for Clicky clients: Clicky connects as a node with caps: ["screen", "voice"]. Its sessions are routed to a dedicated key pattern (e.g., clicky:<device-id>), with optional model/prompt overrides via sessionOverrides.
3. Verify multi-image attachments in chat.send: Clicky sends labeled screenshots from multiple monitors as base64 image attachments. The media pipeline should already handle this, but the multi-image + label pattern still needs to be verified.
4. TTS via talk.speak/tts.convert: Clicky would use the existing Gateway TTS RPCs instead of calling ElevenLabs directly. The main concern is low-latency audio delivery for real-time voice responses.
5. (Optional, future) POINT metadata extraction: extract [POINT:x,y:label] tags from agent responses server-side and include them as structured metadata in event:agent streaming payloads. This is non-blocking, since client-side parsing works fine initially.
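Whether the tags are parsed on the client (the initial plan) or extracted server-side later, the parsing step is the same. The following is a minimal sketch of it, not code from Clicky or OpenClaw. The issue only shows the [POINT:x,y:label] shape, so the integer-coordinate grammar, the Point type, and the extract_points helper are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass

# Matches tags such as [POINT:412,96:Save button].
# Assumption: x and y are integers and the label contains no "]".
POINT_TAG = re.compile(r"\[POINT:(-?\d+),(-?\d+):([^\]]*)\]")


@dataclass(frozen=True)
class Point:
    """Hypothetical structured form of one POINT tag."""
    x: int
    y: int
    label: str


def extract_points(text: str) -> tuple[str, list[Point]]:
    """Return (text with POINT tags removed, points in order of appearance)."""
    points = [
        Point(int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in POINT_TAG.finditer(text)
    ]
    # Strip the tags so the remaining text can be displayed or sent to TTS.
    cleaned = POINT_TAG.sub("", text)
    # Collapse the double spaces left behind where a tag was removed.
    cleaned = re.sub(r"[ \t]{2,}", " ", cleaned).strip()
    return cleaned, points
```

For example, extract_points("Click here [POINT:412,96:Save button] to save.") returns the text "Click here to save." together with a single point at (412, 96) labeled "Save button". Stripping the tags before speech synthesis keeps the coordinates out of the spoken response.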

Alternatives considered

• Clicky stays standalone: users configure their own Cloudflare Worker + 3 API keys. Works but no memory, no tools, no cross-device continuity.
• OCPlatform builds its own overlay: duplicates the cursor/POINT/overlay work Clicky already does well. Not worth it when integration is simpler.
• Generic "screen capture" node protocol: more abstract but overkill for the immediate use case. Clicky integration is concrete and achievable now.

Impact

• Affected: macOS users who want a visual desktop AI companion with full agent capabilities
• Severity: New capability (not blocking existing workflows)
• Frequency: Every session; this would be the primary interaction mode for desktop users
• Consequence: Without this, OpenClaw desktop interaction is text/voice only (no screen awareness, no visual pointing). Clicky users get no memory or tools.

Evidence/examples

• Clicky repo: https://github.com/farzaa/clicky (4.5K+ stars, MIT, active development)
• Clicky-side feature proposal: https://github.com/farzaa/clicky/issues/30
• OpenClaw already has the full node protocol, TTS integration, and media pipeline needed
• The POINT system ([POINT:x,y:label]) is proven: Clicky uses it successfully for screen element pointing

Additional information

https://github.com/farzaa/clicky

The Clicky-side PR would add OpenClawClient.swift implementing Gateway WS Protocol v3 (connect, chat.send, event:agent streaming, tts.convert). The current Cloudflare Worker mode remains the default — OpenClaw mode is opt-in.
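To make the multi-monitor chat.send case concrete, here is a rough sketch of how a client might serialize one request with a labeled screenshot per monitor. It is written in Python for brevity rather than Swift, and it is not the Gateway's actual schema: the frame envelope (type, id, method, params) and every field name inside params and attachments are assumptions for illustration, as is the build_chat_send helper. Only the method name chat.send, the base64 encoding, and the per-monitor labels come from the issue.

```python
import base64
import json
import uuid


def build_chat_send(session_key: str, message: str,
                    screenshots: dict[str, bytes]) -> str:
    """Serialize a hypothetical chat.send request with one labeled image per monitor."""
    attachments = [
        {
            "type": "image",          # assumed field name
            "mimeType": "image/png",  # assumed field name
            "label": label,           # e.g. "monitor-1"
            "content": base64.b64encode(png).decode("ascii"),
        }
        for label, png in screenshots.items()
    ]
    frame = {
        "type": "req",                # assumed request envelope
        "id": str(uuid.uuid4()),      # lets the client match the response
        "method": "chat.send",
        "params": {
            "sessionKey": session_key,   # assumed field name
            "message": message,
            "attachments": attachments,
        },
    }
    return json.dumps(frame)
```

A payload built this way is also a convenient fixture for the verification step in the proposal: send two or more labeled images in one request and confirm that the agent receives all of them with their labels intact.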

Key design question: should Clicky connect as role: "node" (natural fit for screen/voice caps) or would a lighter-weight client role be more appropriate?

Detailed architecture doc available on request.

extent analysis

TL;DR

Integrate Clicky as a native macOS node/client that connects to the Gateway via WS Protocol v3 to enable a screen-aware voice companion with persistent memory, tools, and multi-model support.

Guidance

Implement a Clicky companion skill that appends POINT coordinate instructions to the system prompt when the session originates from a Clicky client.
Route sessions from Clicky clients to a dedicated key pattern (e.g., clicky:<device-id>) with optional model/prompt overrides via sessionOverrides.
Verify that the media pipeline handles multi-image attachments with labels sent by Clicky.
Use the existing Gateway TTS RPCs (talk.speak/tts.convert) instead of direct ElevenLabs calls, and check that audio delivery latency is low enough for real-time voice responses.
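The first two guidance items can be sketched together as a small routing function. This is an illustrative outline only, not OpenClaw's implementation: identifying a Clicky client by its caps, the "main" fallback key, the config field names, the route_session helper, and the placeholder wording of POINT_INSTRUCTIONS are all assumptions. The clicky:<device-id> key pattern and the idea of per-session overrides come from the issue.

```python
# Capabilities the issue says a Clicky node advertises.
CLICKY_CAPS = {"screen", "voice"}

# Placeholder wording; the real prompt would live in the clicky-companion skill file.
POINT_INSTRUCTIONS = (
    "When you refer to an on-screen element, append a tag of the form "
    "[POINT:x,y:label]."
)


def route_session(device_id, caps, defaults, session_overrides=None):
    """Return (session_key, config) for a connecting client.

    Assumption: a client counts as Clicky if it advertises both caps.
    """
    if not CLICKY_CAPS.issubset(caps):
        # Hypothetical fallback key for non-Clicky clients.
        return "main", dict(defaults)
    # Overrides win over defaults; neither input dict is mutated.
    config = {**defaults, **(session_overrides or {})}
    # Append the POINT instructions to whatever system prompt is in effect.
    config["systemPrompt"] = (
        config["systemPrompt"].rstrip() + "\n\n" + POINT_INSTRUCTIONS
    )
    return f"clicky:{device_id}", config
```

Keeping the key derivation and the prompt augmentation in one place means that a Clicky session always gets both, while other clients keep the default prompt untouched.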

Example

The original issue contains no code examples.

Notes

Two points remain open in the issue: whether Clicky should connect as role: "node" or with a lighter-weight client role, and whether the Gateway TTS RPCs can deliver audio with low enough latency for real-time voice responses.

Recommendation

Apply the proposed solution to integrate Clicky with OpenClaw, as it provides a concrete and achievable way to enable a screen-aware voice companion with persistent memory, tools, and multi-model support. This approach allows for a more comprehensive and user-friendly experience for macOS users.
