openclaw - 💡(How to fix) Fix AI-powered in-app help that guides users by walking them through the UI [1 participant]

GitHub stats
openclaw/openclaw#62512 · Fetched 2026-04-08 03:03:18
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0
Root Cause

The consequence is important: the agent never has to guess the state of the UI. It is told. If you refactor a screen tomorrow, the guide adapts on the next question with zero changes to its prompt or its training. There is no documentation to keep in sync, because the documentation is the live DOM.


The idea

Help in most apps is a dead end: a search box, a docs site in another tab, a tooltip pointing at the wrong thing. People don't want to read how to do something — they want someone to show them.

This issue proposes a new kind of help for OpenClaw: an AI guide that lives inside the app, listens to what you're trying to do in plain language, and physically walks you through it on your real screen. You ask "how do I connect Telegram?" and a small mascot ("Claw") flies to the right tab, opens the right section, highlights the right button, and explains each step in your language. If the answer lives on another page, it takes you there. Follow-ups just continue the conversation.

It's help, but powered by an LLM that can actually see your UI and act on it — not a chatbot that says "go to Settings → Channels" and leaves you to find them.

Why this is the right shape for help

  • Zero learning curve. Ask in your own words. No menus to memorize, no docs to skim.
  • Always correct. The guide reads the live DOM every turn, so it can never point at a button that no longer exists.
  • Shows instead of tells. Walking + highlighting + auto-clicking is dramatically faster than prose for "where is X?" — which is the vast majority of support questions.
  • Cross-tab aware. If the answer lives elsewhere, the guide navigates there instead of saying "go to…".
  • Conversational and multilingual. Follow-ups work; the tone stays warm; replies come in your language.
  • Replaces docs for the 80% case. Most "how do I…" questions collapse into a single sentence to Claw.

For a product as broad as OpenClaw (channels, providers, agents, gateway, sessions, plugins…), this turns the scariest part of onboarding — "I have no idea where anything is" — into a one-line ask.

Core concepts

The whole system rests on four ideas. Each one exists to solve a specific failure mode that traditional product tours and chat-based help suffer from.

1. The UI describes itself, every turn

Most "AI assistants" inside apps are blind: they answer from a static knowledge base written months ago. This one isn't. Every time the user asks something, the controller takes a fresh snapshot of the page — which tab is active, which elements are actually visible right now, what's expanded, what's collapsed, what the route is — and feeds that snapshot to the agent as part of its prompt.

The consequence is important: the agent never has to guess the state of the UI. It is told. If you refactor a screen tomorrow, the guide adapts on the next question with zero changes to its prompt or its training. There is no documentation to keep in sync, because the documentation is the live DOM.
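A minimal sketch of what such a per-turn snapshot could look like. The `TaggedElement` and `UiSnapshot` shapes and the `takeSnapshot` function are assumptions made for illustration; the issue does not define a snapshot format, and a real controller would read these values from the DOM rather than from a plain array.

```typescript
// Hypothetical, minimal description of one tagged element. A real controller
// would fill this in from the DOM (for example, from elements that carry a
// data-guide attribute); the field names here are assumptions.
interface TaggedElement {
  guideId: string;
  visible: boolean;
  expanded?: boolean; // set only for collapsible sections
}

interface UiSnapshot {
  route: string;
  activeTab: string;
  visibleIds: string[];
  expandedIds: string[];
  collapsedIds: string[];
}

// Build a fresh snapshot. Hidden elements are left out entirely, so the agent
// is only told about things the user can see right now.
function takeSnapshot(route: string, activeTab: string, elements: TaggedElement[]): UiSnapshot {
  const visible = elements.filter((el) => el.visible);
  return {
    route,
    activeTab,
    visibleIds: visible.map((el) => el.guideId),
    expandedIds: visible.filter((el) => el.expanded === true).map((el) => el.guideId),
    collapsedIds: visible.filter((el) => el.expanded === false).map((el) => el.guideId),
  };
}

// Illustrative data only; these ids are not taken from the OpenClaw source.
const snapshot = takeSnapshot("/settings", "settings", [
  { guideId: "language-picker", visible: true },
  { guideId: "advanced-section", visible: true, expanded: false },
  { guideId: "model-select", visible: false },
]);
console.log(JSON.stringify(snapshot, null, 2));
```

Because the snapshot is recomputed on every question, nothing in it has to be maintained by hand when a screen changes.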

2. A catalog of helpable elements as ground truth

A live DOM snapshot tells the agent what's currently on screen, but it doesn't tell it what exists elsewhere in the app. To answer "how do I change my language?" the agent has to know that the language picker exists at all, even if the user is currently on a different tab.

That's the job of the catalog. Every UI element the guide can talk about is tagged with a stable data-guide id and listed in a single registry. Each entry knows which tab owns it, which feature keywords describe it ("language", "model", "token"…), and whether it's hidden behind something the user has to expand first. The catalog is the agent's map of the entire product — a flat, searchable index of every helpable surface.

Two properties make this trustworthy:

  • Stable ids, not CSS selectors. Selectors break the moment a className changes. data-guide ids are intentional contracts; they survive refactors.
  • Drift is impossible. A test enforces that every data-guide literal in the source has a matching catalog entry, and vice versa. The catalog cannot silently lie about the UI, because CI fails the moment they disagree.
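The catalog and the drift test described above could be sketched as follows. This is an illustration under stated assumptions: the `CatalogEntry` fields, the sample ids, and the regex-based scan in `findDrift` are hypothetical, and the real test may discover `data-guide` literals differently.

```typescript
interface CatalogEntry {
  id: string;          // stable data-guide id
  tab: string;         // tab that owns the element
  keywords: string[];  // feature keywords used for lookup
  revealedBy?: string; // id of the section to expand first, if hidden
}

// Illustrative entries only; these ids are not taken from the OpenClaw source.
const catalog: CatalogEntry[] = [
  { id: "language-picker", tab: "settings", keywords: ["language", "locale"] },
  { id: "advanced-section", tab: "settings", keywords: ["advanced"] },
  { id: "model-select", tab: "settings", keywords: ["model"], revealedBy: "advanced-section" },
];

// Keyword lookup: return every entry whose keywords appear in the question.
function searchCatalog(entries: CatalogEntry[], question: string): CatalogEntry[] {
  const words = question.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  return entries.filter((e) => e.keywords.some((k) => words.indexOf(k.toLowerCase()) !== -1));
}

// Drift check: compare data-guide literals in source text with the catalog,
// in both directions. A CI test would fail if either list is non-empty.
function findDrift(
  sourceText: string,
  entries: CatalogEntry[]
): { missingFromCatalog: string[]; missingFromSource: string[] } {
  const inSource: string[] = [];
  const pattern = /data-guide="([^"]+)"/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sourceText)) !== null) {
    const id = match[1];
    if (id !== undefined && inSource.indexOf(id) === -1) inSource.push(id);
  }
  const catalogIds = entries.map((e) => e.id);
  return {
    missingFromCatalog: inSource.filter((id) => catalogIds.indexOf(id) === -1),
    missingFromSource: catalogIds.filter((id) => inSource.indexOf(id) === -1),
  };
}

const sampleSource =
  '<select data-guide="language-picker"></select><button data-guide="orphan-button"></button>';
const drift = findDrift(sampleSource, catalog);
console.log(searchCatalog(catalog, "How do I change my language?").map((e) => e.id));
console.log(drift);
```

In this sample, `orphan-button` is tagged in the source but absent from the catalog, and two catalog entries have no matching tag, so a test asserting that both lists are empty would fail.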

3. A typed action protocol instead of free text

This is the most important constraint, and the one that makes the whole thing safe.

A normal chat agent replies with prose. Prose is unbounded: it can hallucinate buttons, invent flows, contradict itself, or quietly drift off topic. None of that is acceptable when the agent is driving the UI on the user's behalf.

So the guide agent doesn't reply with prose. It replies with one typed JSON action at a time, drawn from a tiny fixed vocabulary:

  • highlight — fly to an element and explain it
  • click — auto-click an element (toggles, accordions, tabs)
  • reveal — expand a section so its children become discoverable
  • navigate — switch to another tab
  • done — terminate a successful tour
  • answer — reply to a conceptual question without a tour

Each action references an element by its catalog id, never by selector. Before the mascot executes anything, the controller validates the action against the live DOM: does this id exist? is it in the catalog? is it currently reachable, or do we need to reveal something first? Invalid actions are rejected outright.

The result: the LLM gets the freedom to plan, but the blast radius of any single response is one validated, well-typed operation on a known element. Hallucinated buttons are impossible by construction. The agent cannot lie its way into clicking the wrong thing, because there is no "wrong thing" it can name.
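One way to express the action vocabulary and the validation step is a discriminated union plus a validator that accepts only known ids. This is a sketch, not the issue's schema: the field names (`action`, `elementId`, `tab`, `message`), the `UiContext` shape, and the rejection reasons are all assumptions.

```typescript
type GuideAction =
  | { action: "highlight" | "click" | "reveal"; elementId: string; message: string }
  | { action: "navigate"; tab: string; message: string }
  | { action: "done" | "answer"; message: string };

type Validation =
  | { ok: true; action: GuideAction }
  | { ok: false; reason: string };

// What the controller knows at validation time (hypothetical shape).
interface UiContext {
  catalogIds: string[]; // every id listed in the catalog
  visibleIds: string[]; // ids reachable in the live DOM right now
  tabs: string[];       // tabs the guide may navigate to
}

function validateAction(raw: string, ui: UiContext): Validation {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return { ok: false, reason: "reply is not JSON" };
  }
  if (typeof data !== "object" || data === null) {
    return { ok: false, reason: "reply is not an object" };
  }
  const obj = data as Record<string, unknown>;
  const kind = obj["action"];
  const message = obj["message"];
  if (typeof kind !== "string") return { ok: false, reason: "missing action" };
  if (typeof message !== "string") return { ok: false, reason: "missing message" };

  if (kind === "highlight" || kind === "click" || kind === "reveal") {
    const id = obj["elementId"];
    if (typeof id !== "string") return { ok: false, reason: "missing elementId" };
    if (ui.catalogIds.indexOf(id) === -1) return { ok: false, reason: "id is not in the catalog" };
    if (ui.visibleIds.indexOf(id) === -1) return { ok: false, reason: "element is not reachable yet" };
    return { ok: true, action: { action: kind, elementId: id, message } };
  }
  if (kind === "navigate") {
    const tab = obj["tab"];
    if (typeof tab !== "string" || ui.tabs.indexOf(tab) === -1) {
      return { ok: false, reason: "unknown tab" };
    }
    return { ok: true, action: { action: kind, tab, message } };
  }
  if (kind === "done" || kind === "answer") {
    return { ok: true, action: { action: kind, message } };
  }
  return { ok: false, reason: "unknown action" };
}

// Illustrative context; these ids are not taken from the OpenClaw source.
const ui: UiContext = {
  catalogIds: ["language-picker", "model-select"],
  visibleIds: ["language-picker"],
  tabs: ["settings", "channels"],
};
const good = validateAction(
  '{"action":"highlight","elementId":"language-picker","message":"Select your language"}',
  ui
);
const bad = validateAction('{"action":"click","elementId":"export-button","message":"Click here"}', ui);
console.log(good, bad);
```

A reply that names an id outside the catalog, or that is plain prose instead of JSON, never reaches the mascot; the controller gets a rejection reason it can act on instead.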

4. A multi-turn loop, not a single-shot tour

Traditional product tours are scripted: a fixed list of "step 1, step 2, step 3" written ahead of time. The guide doesn't work that way. It plans one step at a time, and after every step the loop restarts with a fresh snapshot of the page.

This matters because of navigate and reveal. The moment the mascot switches tabs or expands a section, an entirely new set of elements becomes reachable. A pre-planned tour would be working from a stale model of the world; a per-turn loop is always grounded in what the user can actually see now.

It also lets the agent be honest about uncertainty. Instead of committing to a five-step plan and being wrong on step three, it commits to one step, observes the result, and decides what to do next — much closer to how a human helper would behave.

A short conversation memory rides alongside the loop so follow-up questions ("and how do I change the model?") continue naturally from the previous tour, instead of starting from scratch every time.
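The per-turn loop and its short memory could be sketched like this. All names (`Step`, `GuideDeps`, `runTour`, `MEMORY_LIMIT`) and the step limit are assumptions. To keep the sketch short, the agent call is modeled as a synchronous function; a real call over the gateway WebSocket would be asynchronous.

```typescript
interface Step {
  action: string;     // one of the protocol actions
  elementId?: string; // catalog id, when the action targets an element
  message: string;
}

// Everything the loop needs from the outside world (hypothetical interface).
interface GuideDeps {
  snapshot: () => string[]; // ids visible right now
  askAgent: (question: string, visibleIds: string[], memory: string[]) => Step;
  execute: (step: Step) => void; // animate the mascot, click, navigate
}

type Outcome = "done" | "rejected" | "step-limit";

const MEMORY_LIMIT = 6; // assumed size of the short conversation memory

function runTour(
  question: string,
  deps: GuideDeps,
  memory: string[],
  maxSteps: number = 8
): { outcome: Outcome; executed: Step[] } {
  const executed: Step[] = [];
  let outcome: Outcome = "step-limit";
  for (let turn = 0; turn < maxSteps; turn++) {
    const visibleIds = deps.snapshot(); // fresh view of the page every turn
    const step = deps.askAgent(question, visibleIds, memory); // exactly one action
    if (step.action === "done" || step.action === "answer") {
      outcome = "done";
      break;
    }
    if (step.elementId !== undefined && visibleIds.indexOf(step.elementId) === -1) {
      outcome = "rejected"; // the target is not on screen, so nothing runs
      break;
    }
    deps.execute(step); // may change what is visible on the next turn
    executed.push(step);
  }
  // Remember the exchange so a follow-up question can build on it.
  memory.push(question + " -> " + outcome);
  while (memory.length > MEMORY_LIMIT) memory.shift();
  return { outcome, executed };
}

// Scripted stand-in for the agent and the UI, for illustration only.
const visible: string[] = ["advanced-section"];
const script: Step[] = [
  { action: "reveal", elementId: "advanced-section", message: "Opening advanced settings" },
  { action: "highlight", elementId: "model-select", message: "Pick a model here" },
  { action: "done", message: "All set" },
];
const memory: string[] = [];
const tour = runTour(
  "how do I change the model?",
  {
    snapshot: () => visible.slice(),
    askAgent: () => script.shift() || { action: "done", message: "" },
    execute: (step) => {
      if (step.action === "reveal") visible.push("model-select");
    },
  },
  memory
);
console.log(tour.outcome, tour.executed.length, memory);
```

The second step only passes the visibility check because the first step's `reveal` changed the page and the loop took a new snapshot afterwards, which is the situation a pre-planned tour would get wrong.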

A new dedicated agent: guide

The guide isn't a generic chat assistant with a clever prompt bolted on — it's a new first-class OpenClaw agent, with its own session, its own system prompt, and its own response contract. It rides on the same gateway WebSocket and chat plumbing every other agent uses, so it inherits streaming, history, model selection, and provider routing for free.

What makes the guide agent different from a normal chat agent is exactly the four concepts above: its system prompt is rebuilt every turn from the live UI snapshot and the catalog, its replies are constrained to the typed action protocol, and it runs as a per-turn loop rather than a single shot.
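As a rough illustration of "rebuilt every turn", the system prompt can be a pure function of the current snapshot and the catalog. The `PromptInput` shape, the `buildSystemPrompt` name, and the prompt wording are placeholders invented for this sketch; the actual guide prompt is not shown in the issue.

```typescript
interface PromptInput {
  route: string;
  activeTab: string;
  visibleIds: string[];
  catalog: { id: string; tab: string; keywords: string[] }[];
}

// The wording below is a placeholder, not the actual guide prompt.
function buildSystemPrompt(input: PromptInput): string {
  const catalogLines = input.catalog.map(
    (e) => "- " + e.id + " (tab: " + e.tab + "; keywords: " + e.keywords.join(", ") + ")"
  );
  return [
    "You are the in-app guide. Reply with exactly one JSON action.",
    "Allowed actions: highlight, click, reveal, navigate, done, answer.",
    "Current route: " + input.route,
    "Active tab: " + input.activeTab,
    "Visible element ids: " + (input.visibleIds.join(", ") || "(none)"),
    "Catalog:",
    ...catalogLines,
  ].join("\n");
}

// Illustrative entries only; these ids are not taken from the OpenClaw source.
const entries = [
  { id: "language-picker", tab: "settings", keywords: ["language"] },
  { id: "telegram-connect", tab: "channels", keywords: ["telegram", "channel"] },
];
const beforeNavigate = buildSystemPrompt({
  route: "/settings",
  activeTab: "settings",
  visibleIds: ["language-picker"],
  catalog: entries,
});
const afterNavigate = buildSystemPrompt({
  route: "/channels",
  activeTab: "channels",
  visibleIds: ["telegram-connect"],
  catalog: entries,
});
console.log(beforeNavigate);
console.log(afterNavigate);
```

The catalog section stays the same across turns while the route, tab, and visible ids change, so the agent always sees both the full map and the user's current position in it.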

In short: a new agent purpose-built to drive a UI, with the LLM constrained to a safe, typed action protocol instead of free text.

The loop in one sentence: snapshot the page → ask the guide agent → get one typed action → validate against the live DOM → animate the mascot → repeat until done.

extent analysis

TL;DR

Implement a new AI guide agent in OpenClaw that utilizes a live DOM snapshot, a catalog of helpable elements, a typed action protocol, and a multi-turn loop to provide interactive and accurate guidance to users.

Guidance

  • Define a catalog of helpable elements with stable data-guide ids to ensure the agent's knowledge of the UI is trustworthy and up-to-date.
  • Implement a typed action protocol to constrain the agent's replies to a safe and well-defined set of actions, such as highlight, click, and navigate.
  • Develop a multi-turn loop that allows the agent to plan one step at a time, with a fresh snapshot of the page after every step, to ensure the agent is always grounded in the current state of the UI.
  • Integrate the new guide agent with the existing gateway WebSocket and chat plumbing to leverage streaming, history, model selection, and provider routing.

Example

A single action in the typed protocol could be a JSON object like the following. The field names are illustrative; the issue does not specify a schema.

{
  "action": "highlight",
  "elementId": "language-picker",
  "message": "Select your language"
}

This object is the agent's reply. Once the controller has validated it against the catalog and the live DOM, the mascot flies to the language picker, highlights it, and shows the message to the user.

Notes

The guide agent will need substantial development and testing before it can be considered safe and accurate. The edge cases named in the issue deserve particular attention: actions that reference ids missing from the catalog or the live DOM, elements hidden behind a collapsed section, and answers that live on another tab.

Recommendation

Implement the guide agent with the four constraints described in the issue: a live UI snapshot, a catalog of helpable elements, a typed action protocol, and a per-turn loop. The issue argues that this would turn most "where is X?" questions into a one-line ask and reduce support queries.
