
openclaw - 💡(How to fix) Fix [Enhancement]: Control UI — Standalone browser-native voice input (SpeechRecognition API) [1 comments, 2 participants]

openclaw · 2026-05-06 17:08:58


GitHub stats

openclaw/openclaw#78560 • Fetched 2026-05-07 03:35:23


Author

erforschtbot-cmyk

Participants

clawsweeper[bot]

erforschtbot-cmyk

Timeline (top)

subscribed ×3, mentioned ×2, commented ×1

Add a self-contained, browser-native voice input button to the Control UI chat using the Web Speech API (SpeechRecognition) — no server-side audio pipeline needed. One button, two modes: short-click for auto-send dictation, hold 3 seconds for continuous conversation.

This is a different approach from the MediaRecorder-based proposals in #73634 and #41363: SpeechRecognition works entirely client-side with zero server load, supports interim (live) transcripts, and integrates cleanly via script injection.

Tested and working on: OpenClaw v2026.5.3+, Chrome/Edge (any browser with SpeechRecognition support).



Motivation

The Control UI had a built-in STT button, but it was removed after v2026.5.3 (#51085 identified the Permissions-Policy blocker, and #50579 reported it non-responsive). Users currently have no browser-based voice input in the WebChat.

SpeechRecognition offers advantages over MediaRecorder approaches:

- Zero load on the OpenClaw server: transcription is handled by the browser's speech service (in Chrome, audio goes to Google's speech servers), with no OpenClaw gateway involvement
- Live interim transcripts: the user sees text appear in real time while speaking
- No encoding/streaming complexity: the browser handles all audio capture and recognition
- No additional dependencies: works out of the box in Chrome, Edge, Safari

Proposed Implementation

Architecture

index.html (script tag before </head>)
  └── assets/stt-inject.js (standalone IIFE, ~300 lines)
        ├── MutationObserver waits for .agent-chat__toolbar-left
        ├── Creates mic button with 2 interaction modes
        ├── Uses SpeechRecognition API (lang configurable)
        └── Injects text via native setter + KeyboardEvent / click fallback
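
A minimal sketch of the "MutationObserver waits for .agent-chat__toolbar-left" step, not the actual script. The helper name `whenElementAppears` and its parameters are hypothetical; `root` and `ObserverCtor` are passed in (they would be `document` and `MutationObserver` in the page) so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: calls onFound(element) once `selector` matches under `root`.
function whenElementAppears(root, selector, onFound, ObserverCtor) {
  const existing = root.querySelector(selector);
  if (existing) {
    onFound(existing);
    return null; // already present, nothing to observe
  }
  const observer = new ObserverCtor(function () {
    const el = root.querySelector(selector);
    if (el) {
      observer.disconnect(); // stop watching once the toolbar exists
      onFound(el);
    }
  });
  // Watch the whole tree, since the chat toolbar is rendered after page load.
  observer.observe(root.documentElement || root, { childList: true, subtree: true });
  return observer;
}
```

In the page this would be called roughly as `whenElementAppears(document, '.agent-chat__toolbar-left', injectButton, MutationObserver)`, where `injectButton` is whatever function builds the mic button.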

Files to add/modify

1. New file: dist/control-ui/assets/stt-inject.js

Self-contained IIFE that:

- Injects a microphone button into .agent-chat__toolbar-left (after attachment button)
- Short click → starts SpeechRecognition (continuous=false, interimResults=true), inserts final transcript into textarea, sends message via simulated Enter keydown + click fallback, mic turns off
- Hold 3 seconds → starts SpeechRecognition (continuous=true, interimResults=true), with a 1.8s silence timer per phrase: on silence → send accumulated text → reset buffer → stay listening. Works for multiple consecutive phrases without restarting
- Click while recording → stops immediately
- Shows live interim transcript in a div above the input field
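
One way the short-click versus 3-second-hold distinction could be wired up is sketched below. This is an illustration under assumptions, not the shipped code: `createPressTracker`, `onShortClick`, `onHold`, `setTimer` and `clearTimer` are hypothetical names, and the timer functions are injectable only so the logic can run without a browser.

```javascript
// Hypothetical press tracker: tells a short click apart from a hold of `holdMs`.
function createPressTracker(options) {
  const holdMs = options.holdMs || 3000;
  const setTimer = options.setTimer || setTimeout;
  const clearTimer = options.clearTimer || clearTimeout;
  let timer = null;
  let holdFired = false;

  return {
    // Call on pointerdown / mousedown / touchstart.
    press: function () {
      holdFired = false;
      timer = setTimer(function () {
        timer = null;
        holdFired = true;
        options.onHold(); // held long enough: continuous mode
      }, holdMs);
    },
    // Call on pointerup / mouseup / touchend.
    release: function () {
      if (timer !== null) {
        clearTimer(timer);
        timer = null;
      }
      if (!holdFired) options.onShortClick(); // released early: single dictation
      holdFired = false;
    }
  };
}
```

The "click while recording → stop" case would live inside `onShortClick`, which can check whether recognition is already running before deciding to start or stop.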

Key implementation details:

// Silence timer — the core of continuous mode stability
recognition.onresult = function (event) {
  if (silenceTimer) { clearTimeout(silenceTimer); silenceTimer = null; }
  // ... extract interim + final text ...
  silenceTimer = setTimeout(function () {
    if (finalText.trim()) {
      setNativeValue(input, finalText.trim());
      sendMessage();
      finalText = '';
    }
  }, 1800);
};
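
The elided "extract interim + final text" step in the handler above is not shown in the issue. A possible shape for it, assuming the standard `results` / `resultIndex` / `isFinal` fields of the SpeechRecognition result event, is sketched here; `collectTranscripts` is a hypothetical name.

```javascript
// Hypothetical helper: splits the new results of one `result` event
// into finalized text and still-changing interim text.
function collectTranscripts(event) {
  let finalText = "";
  let interimText = "";
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    const transcript = result[0].transcript; // first (best) alternative
    if (result.isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}
```

The caller would append the returned `finalText` to its accumulated buffer and show `interimText` in the live transcript div.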

Error handling:

- no-speech errors → ignored (harmless with continuous=true)
- aborted errors → only cleanup if user deliberately stopped, otherwise restart
- Unexpected onend → send pending text and restart
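
The three rules above can be written as small pure functions, which keeps the `onerror` / `onend` handlers thin. This is a sketch only: `decideOnError` and `decideOnEnd` are hypothetical names, and the handling of error codes other than `no-speech` and `aborted` is an assumption that the issue does not specify.

```javascript
// Hypothetical decision function for the `error` event in continuous mode.
function decideOnError(errorCode, userStopped) {
  if (errorCode === "no-speech") return "ignore";
  if (errorCode === "aborted") return userStopped ? "cleanup" : "restart";
  // Assumption: any other code (e.g. "not-allowed", "network") ends the session.
  return "cleanup";
}

// Hypothetical decision function for the `end` event.
function decideOnEnd(userStopped, pendingText) {
  // Assumption: a deliberate stop neither sends nor restarts.
  if (userStopped) return { send: false, restart: false };
  // Unexpected end: flush any pending text, then keep listening.
  return { send: pendingText.trim().length > 0, restart: true };
}
```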

Text insertion uses React/Lit-compatible native setter approach:

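// Call the prototype's own value setter, then fire `input` so the framework sees the change.
var nativeSetter = Object.getOwnPropertyDescriptor(
  HTMLTextAreaElement.prototype, 'value'
)?.set;
if (nativeSetter) nativeSetter.call(el, value);
else el.value = value; // fallback if the descriptor is unavailable
el.dispatchEvent(new Event('input', { bubbles: true }));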

Message sending uses two mechanisms for reliability:

- Simulated KeyboardEvent('keydown', { key: 'Enter' }) on the textarea (Lit catches this natively)
- Fallback: click .chat-send-btn after 50ms delay
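
A sketch of how the two send mechanisms might be combined. The function signature and the `deps` parameter are hypothetical (they exist so the logic can run without a browser), and the check that skips the fallback click when the textarea is already empty is an assumption added to avoid a double send; the issue does not say whether the original script does this.

```javascript
// Hypothetical sendMessage: Enter keydown first, send-button click as a delayed fallback.
// `doc` stands in for `document`.
function sendMessage(textarea, doc, deps) {
  const makeEnterEvent = (deps && deps.makeEnterEvent) || function () {
    return new KeyboardEvent("keydown", {
      key: "Enter", code: "Enter", bubbles: true, cancelable: true
    });
  };
  const schedule = (deps && deps.schedule) || setTimeout;

  textarea.dispatchEvent(makeEnterEvent());

  schedule(function () {
    // Assumption: if the Enter handler already sent, the textarea is empty by now.
    if (textarea.value.trim() === "") return;
    const button = doc.querySelector(".agent-chat__toolbar-right .chat-send-btn");
    if (button) button.click();
  }, 50);
}
```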

2. Modified file: dist/control-ui/index.html

One line added before </head>:

    <script src="./assets/stt-inject.js"></script>

CSS selectors used (must stay stable)

| Selector | Purpose |
|---|---|
| `.agent-chat__toolbar-left` | Container to inject mic button into |
| `.agent-chat__input textarea` | Chat input field |
| `.agent-chat__toolbar-right .chat-send-btn` | Send button (fallback) |
| `.agent-chat__input` | Anchor for interim transcript div |
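
Because the script breaks silently if any of these selectors is renamed, a small self-check could make that failure visible. The sketch below is hypothetical (`STT_SELECTORS` and `findMissingSelectors` are illustrative names); `root` would be `document` in the page.

```javascript
// The four selectors from the table above, keyed by role.
const STT_SELECTORS = {
  toolbar: ".agent-chat__toolbar-left",
  textarea: ".agent-chat__input textarea",
  sendButton: ".agent-chat__toolbar-right .chat-send-btn",
  transcriptAnchor: ".agent-chat__input"
};

// Returns the role names whose selector currently matches nothing under `root`.
function findMissingSelectors(root) {
  return Object.keys(STT_SELECTORS).filter(function (name) {
    return root.querySelector(STT_SELECTORS[name]) === null;
  });
}
```

Logging a warning when `findMissingSelectors(document)` is non-empty would point directly at the selector that changed after a Control UI update.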

Language

Currently hardcoded to 'de-DE' — should be made configurable or auto-detected from navigator.language.
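
A possible way to resolve the recognition language, as a sketch: prefer an explicit setting, fall back to `navigator.language`, and only then use a fixed default. `resolveRecognitionLang` is a hypothetical name, and the `'en-US'` last-resort default is an assumption.

```javascript
// Hypothetical resolver for the value assigned to recognition.lang.
// `configured` is an optional explicit setting; `nav` would be `navigator` in the page.
function resolveRecognitionLang(configured, nav) {
  if (typeof configured === "string" && configured.trim() !== "") {
    return configured.trim();
  }
  if (nav && typeof nav.language === "string" && nav.language !== "") {
    return nav.language;
  }
  return "en-US"; // assumption: last-resort default
}
```

Usage would look like `recognition.lang = resolveRecognitionLang(userSetting, navigator);`, where `userSetting` is wherever the configurable value ends up being stored.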

What this does NOT do

❌ Does not record audio files or upload to server
❌ Does not use MediaRecorder or getUserMedia
❌ Does not require any server-side transcription pipeline
❌ Does not touch the minified JS bundle

Migration path to built-in solution

If a proper built-in STT component is re-added to the Control UI source:

- Remove the <script> tag from index.html
- Delete stt-inject.js
- No other cleanup needed: the script is fully self-contained and checks for existing #stt-mic-btn before injecting
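
The "checks for existing #stt-mic-btn before injecting" guard could look roughly like this. It is a sketch with hypothetical names (`injectMicButtonOnce`, `createButton`); appending to the end of the toolbar is an assumption standing in for "after attachment button".

```javascript
// Hypothetical idempotent injection: never creates a second #stt-mic-btn.
// `doc` would be `document`; `toolbar` is the .agent-chat__toolbar-left element.
function injectMicButtonOnce(doc, toolbar, createButton) {
  const existing = doc.getElementById("stt-mic-btn");
  if (existing) return existing; // already injected, or a built-in button uses this id
  const button = createButton();
  button.id = "stt-mic-btn";
  toolbar.appendChild(button); // assumption: end of toolbar
  return button;
}
```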

The selector-based injection pattern can serve as a reference for the component's integration points and the event dispatching needed for Lit compatibility.

Related Issues

- #51085: Identified the Permissions-Policy: microphone=() blocker → fix: microphone=(self)
- #73634: Requests MediaRecorder-based approach (different architecture)
- #41363: Comprehensive MediaRecorder + server-side proposal
- #50579: Bug: old STT button non-responsive

Verification

Tested on OpenClaw v2026.5.3+ (Linux, Chrome):

- Mic button appears in toolbar-left (after attachment button)
- Tooltip: "Voice (hold 3s for continuous)" when idle
- Short click → red icon → speak → silence → text auto-sends → mic off
- Hold 3s → continuous mode → multiple phrases → each auto-sends → mic stays active
- Click mic while recording → stops, icon reverts
- Survives gateway restart (re-injected on page load)
- No console errors during normal operation

The full working script is available — happy to provide it as a PR or reference implementation.
