[Enhancement]: Control UI — Standalone browser-native voice input (SpeechRecognition API)

Author: erforschtbot-cmyk (https://github.com/erforschtbot-cmyk) · Published: 2026-05-06T17:08:58Z

GitHub: openclaw/openclaw#78560 (fetched 2026-05-07 03:35:23)
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 2
Timeline (top): subscribed ×3, mentioned ×2, commented ×1

Summary

Add a self-contained, browser-native voice input button to the Control UI chat using the Web Speech API (SpeechRecognition) — no server-side audio pipeline needed. One button, two modes: short-click for auto-send dictation, hold 3 seconds for continuous conversation.

This is a different approach from the MediaRecorder-based proposals in #73634 and #41363: SpeechRecognition needs no OpenClaw server-side processing, supports interim (live) transcripts, and integrates cleanly via script injection.

Tested and working on: OpenClaw v2026.5.3+, Chrome/Edge (any browser with SpeechRecognition support).

Motivation

The Control UI had a built-in STT button, but it was removed after v2026.5.3 (#51085 identified the Permissions-Policy blocker, and #50579 reported it non-responsive). Users currently have no browser-based voice input in the WebChat.

SpeechRecognition offers advantages over MediaRecorder approaches:

  • No OpenClaw server load: transcription is handled by the browser (Chrome sends audio to Google's speech servers, but the OpenClaw gateway is not involved)
  • Live interim transcripts — user sees text appear in real-time while speaking
  • No encoding/streaming complexity — the browser handles all audio capture and recognition
  • No additional dependencies — works out of the box in Chrome, Edge, Safari
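
Because support varies by browser, the script can check for the API before injecting anything. A minimal sketch, assuming a hypothetical helper name `getSpeechRecognitionCtor`; Chromium-based browsers and Safari have historically exposed the constructor under the `webkit` prefix:

```javascript
// Hypothetical helper (not part of the issue's script): returns the
// SpeechRecognition constructor exposed by a window-like object, or null.
function getSpeechRecognitionCtor(win) {
  if (!win) return null;
  return win.SpeechRecognition || win.webkitSpeechRecognition || null;
}

// Browser usage (sketch): skip injection when the API is missing.
// var Ctor = getSpeechRecognitionCtor(window);
// if (!Ctor) { /* leave the toolbar untouched */ }
```

Passing the window in as a parameter keeps the check easy to exercise outside a browser.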

Proposed Implementation

Architecture

index.html (script tag before </head>)
  └── assets/stt-inject.js (standalone IIFE, ~300 lines)
        ├── MutationObserver waits for .agent-chat__toolbar-left
        ├── Creates mic button with 2 interaction modes
        ├── Uses SpeechRecognition API (lang configurable)
        └── Injects text via native setter + KeyboardEvent / click fallback
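
The first two steps of the tree above (wait for the toolbar, then inject once) might look roughly like this. It is a sketch under assumptions, not the author's script: `injectWhenReady` and `createButton` are hypothetical names, and disconnecting the observer after the first injection is a design choice made here.

```javascript
// Sketch only. `doc` is a Document, `ObserverCtor` is MutationObserver, and
// `createButton` is a hypothetical factory that returns the mic button element.
function injectWhenReady(doc, ObserverCtor, createButton) {
  function tryInject() {
    var toolbar = doc.querySelector('.agent-chat__toolbar-left');
    if (!toolbar) return false;                          // toolbar not rendered yet
    if (doc.getElementById('stt-mic-btn')) return true;  // already injected
    var btn = createButton();
    btn.id = 'stt-mic-btn';
    // appendChild places the button last; the issue places it after the
    // attachment button, whose selector is not given.
    toolbar.appendChild(btn);
    return true;
  }

  if (tryInject()) return null; // toolbar was already present

  var observer = new ObserverCtor(function () {
    if (tryInject()) observer.disconnect();
  });
  // The script tag sits in <head>, so <body> may not exist yet:
  // observe the root element instead.
  observer.observe(doc.documentElement, { childList: true, subtree: true });
  return observer;
}

// Browser usage (sketch):
// injectWhenReady(document, MutationObserver, function () {
//   return document.createElement('button');
// });
```

If the UI re-renders the toolbar and drops the button, the observer would need to stay connected instead; the `#stt-mic-btn` check keeps repeated attempts harmless.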

Files to add/modify

1. New file: dist/control-ui/assets/stt-inject.js

Self-contained IIFE that:

  • Injects a microphone button into .agent-chat__toolbar-left (after attachment button)
  • Short click → starts SpeechRecognition (continuous=false, interimResults=true), inserts final transcript into textarea, sends message via simulated Enter keydown + click fallback, mic turns off
  • Hold 3 seconds → starts SpeechRecognition (continuous=true, interimResults=true), with a 1.8s silence timer per phrase: on silence → send accumulated text → reset buffer → stay listening. Works for multiple consecutive phrases without restarting
  • Click while recording → stops immediately
  • Shows live interim transcript in a div above the input field

Key implementation details:

// Silence timer — the core of continuous mode stability
recognition.onresult = function (event) {
  if (silenceTimer) { clearTimeout(silenceTimer); silenceTimer = null; }
  // ... extract interim + final text ...
  silenceTimer = setTimeout(function () {
    if (finalText.trim()) {
      setNativeValue(input, finalText.trim());
      sendMessage();
      finalText = '';
    }
  }, 1800);
};
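
The elided "extract interim + final text" step is not shown in the issue. A minimal sketch of what it could look like, assuming a hypothetical helper `extractTranscripts` and relying only on the standard event shape (`event.resultIndex`, `results[i].isFinal`, `results[i][0].transcript`):

```javascript
// Hypothetical helper: splits the new results of a SpeechRecognitionEvent
// into finalized text and still-changing interim text.
function extractTranscripts(event) {
  var finalText = '';
  var interimText = '';
  // resultIndex marks the first result that changed in this event.
  for (var i = event.resultIndex; i < event.results.length; i++) {
    var result = event.results[i];
    var transcript = result[0].transcript; // best alternative
    if (result.isFinal) finalText += transcript;
    else interimText += transcript;
  }
  return { finalText: finalText, interimText: interimText };
}

// Inside onresult (sketch):
// var parts = extractTranscripts(event);
// finalText += parts.finalText;            // buffer flushed by the silence timer
// interimDiv.textContent = parts.interimText;
```

Here `interimDiv` stands for the live transcript element above the input; its name is an assumption.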

Error handling:

  • no-speech errors → ignored (harmless with continuous=true)
  • aborted errors → only cleanup if user deliberately stopped, otherwise restart
  • Unexpected onend → send pending text and restart
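
The three rules above can be written as two small decision functions. This is a sketch with hypothetical names (`decideOnError`, `decideOnEnd`); how errors other than `no-speech` and `aborted` are handled is an assumption, since the issue does not say.

```javascript
// Hypothetical helper: maps a SpeechRecognitionErrorEvent.error code to an action.
function decideOnError(errorCode, userStopped) {
  if (errorCode === 'no-speech') return 'ignore';
  if (errorCode === 'aborted') return userStopped ? 'cleanup' : 'restart';
  return 'cleanup'; // assumption: anything else (e.g. 'not-allowed') ends the session
}

// Hypothetical helper: decides what to do when onend fires.
function decideOnEnd(userStopped, continuousMode) {
  // Assumption: in single-phrase dictation, onend is the expected finish.
  if (userStopped || !continuousMode) return 'cleanup';
  return 'flush-and-restart'; // unexpected end: send pending text, then restart
}

// Inside the handlers (sketch):
// recognition.onerror = function (e) { act(decideOnError(e.error, userStopped)); };
// recognition.onend = function () { act(decideOnEnd(userStopped, continuousMode)); };
```

`act`, `userStopped`, and `continuousMode` are placeholder names for state the real script would track.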

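Text insertion uses a React/Lit-compatible native-setter approach:

function setNativeValue(el, value) {
  // Call the prototype's setter directly so frameworks that track `value` on the element still see the change.
  var nativeSetter = Object.getOwnPropertyDescriptor(
    HTMLTextAreaElement.prototype, 'value'
  )?.set;
  if (nativeSetter) nativeSetter.call(el, value);
  else el.value = value; // fallback if the descriptor is unavailable
  el.dispatchEvent(new Event('input', { bubbles: true })); // notify input listeners
}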

Message sending uses two mechanisms for reliability:

  1. Simulated KeyboardEvent('keydown', { key: 'Enter' }) on the textarea (Lit catches this natively)
  2. Fallback: click .chat-send-btn after 50ms delay
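
The two mechanisms can be combined in one function. This is a sketch, not the author's implementation: the `schedule` parameter exists only to make the delay testable, and clicking the button only when the textarea still holds text is an assumption about when the fallback should fire.

```javascript
// Sketch of the two-step send. `textarea` is the chat input, `sendBtn` is the
// element matched by `.chat-send-btn`, `schedule` defaults to setTimeout.
function sendMessage(textarea, sendBtn, schedule) {
  var later = schedule || setTimeout;

  // Step 1: simulate Enter on the textarea.
  textarea.dispatchEvent(new KeyboardEvent('keydown', {
    key: 'Enter', code: 'Enter', bubbles: true, cancelable: true
  }));

  // Step 2: after 50 ms, fall back to clicking the send button.
  later(function () {
    // Assumption: if Enter was handled, the UI has cleared the input by now.
    if (sendBtn && textarea.value.trim() !== '') sendBtn.click();
  }, 50);
}

// Browser usage (sketch):
// sendMessage(
//   document.querySelector('.agent-chat__input textarea'),
//   document.querySelector('.agent-chat__toolbar-right .chat-send-btn')
// );
```

Without some check of this kind, a message that was already sent by the Enter event could be submitted a second time by the click.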

2. Modified file: dist/control-ui/index.html

One line added before </head>:

    <script src="./assets/stt-inject.js"></script>

CSS selectors used (must stay stable)

| Selector | Purpose |
|---|---|
| `.agent-chat__toolbar-left` | Container to inject mic button into |
| `.agent-chat__input textarea` | Chat input field |
| `.agent-chat__toolbar-right .chat-send-btn` | Send button (fallback) |
| `.agent-chat__input` | Anchor for interim transcript div |

Language

Currently hardcoded to 'de-DE' — should be made configurable or auto-detected from navigator.language.
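
A possible shape for that change, as a sketch: the helper name `resolveRecognitionLang`, the idea of a user-configured value, and the `'en-US'` last-resort default are all assumptions rather than part of the proposal.

```javascript
// Hypothetical helper: an explicit setting wins, then the browser language,
// then a fixed default.
function resolveRecognitionLang(configured, nav) {
  if (typeof configured === 'string' && configured.trim() !== '') {
    return configured.trim();
  }
  if (nav && typeof nav.language === 'string' && nav.language !== '') {
    return nav.language; // BCP 47 tag such as 'de-DE'
  }
  return 'en-US'; // assumption: last-resort default
}

// Browser usage (sketch):
// recognition.lang = resolveRecognitionLang(userSetting, navigator);
```

With no setting supplied, a browser configured for German would still resolve to `'de-DE'`, so current behavior is preserved for the original use case.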

What this does NOT do

  • ❌ Does not record audio files or upload to server
  • ❌ Does not use MediaRecorder or getUserMedia
  • ❌ Does not require any server-side transcription pipeline
  • ❌ Does not touch the minified JS bundle

Migration path to built-in solution

If a proper built-in STT component is re-added to the Control UI source:

  1. Remove the <script> tag from index.html
  2. Delete stt-inject.js
  3. No other cleanup needed — the script is fully self-contained and checks for existing #stt-mic-btn before injecting

The selector-based injection pattern can serve as a reference for the component's integration points and the event dispatching needed for Lit compatibility.

Related Issues

  • #51085 — Identified the Permissions-Policy: microphone=() blocker → fix: microphone=(self)
  • #73634 — Requests MediaRecorder-based approach (different architecture)
  • #41363 — Comprehensive MediaRecorder + server-side proposal
  • #50579 — Bug: old STT button non-responsive

Verification

Tested on OpenClaw v2026.5.3+ (Linux, Chrome):

  • Mic button appears in toolbar-left (after attachment button)
  • Tooltip: "Voice (hold 3s for continuous)" when idle
  • Short click → red icon → speak → silence → text auto-sends → mic off
  • Hold 3s → continuous mode → multiple phrases → each auto-sends → mic stays active
  • Click mic while recording → stops, icon reverts
  • Survives gateway restart (re-injected on page load)
  • No console errors during normal operation

The full working script is available — happy to provide it as a PR or reference implementation.
