hermes - 💡(How to fix) Fix [Feature Request]: Hold-to-Record Voice Mode (Alternative to Push-to-Toggle)



Feature Request: Hold-to-Record Voice Mode (Alternative to Push-to-Talk Toggle)

Summary

Add an optional hold-to-record voice input mode as an alternative to the current push-to-talk toggle (Ctrl+B to start, Ctrl+B to stop). In hold-to-record mode, the user holds the record key to capture audio and releases it to stop recording and submit for transcription.

Motivation

The current push-to-talk toggle workflow in CLI/TUI requires two separate key presses for every voice message:

  1. Press Ctrl+B → recording starts
  2. Speak
  3. Press Ctrl+B again → recording stops, audio is transcribed

This is functional but ergonomically suboptimal for short, frequent voice messages — the kind of interaction common in messaging apps (Telegram, Discord voice messages). Users accustomed to mobile-style "press and hold to talk" find the toggle pattern disruptive.

A hold-to-record mode would reduce the interaction to a single press-and-hold gesture, matching user expectations from modern voice interfaces.

Current Behavior

  • voice.record_key (default: ctrl+b) is a toggle keybinding
  • Implemented via prompt_toolkit @kb.add(key) in cli.py (handle_voice_record)
  • TUI uses JSON-RPC voice.toggle via tui_gateway/server.py
  • Recording state: _voice_recording boolean + _voice_lock mutex
  • Auto-stop on silence is supported (supports_silence_autostop)

Proposed Behavior

Add a new config option voice.record_mode with two values:

| Mode | Behavior |
| --- | --- |
| toggle (default) | Current behavior: press to start, press again to stop |
| hold | Hold to record, release to stop and transcribe |

CLI (prompt_toolkit)

  • Detect KeyPress / KeyRelease events (or use a timer-based heuristic)
  • On key press: start recording (same as current)
  • On key release: stop recording + trigger transcription
  • Must handle edge cases: key held during agent run, key released before recorder initializes, etc.
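The press/release flow and the edge cases listed above can be sketched as a small state machine. This is a minimal illustration, not code from the hermes repository: `HoldToRecordController`, its callbacks (`start`, `stop_and_transcribe`, `is_agent_busy`), and the state names are all hypothetical, and the sketch assumes press and release events are already being delivered by some input layer.

```python
import threading
from typing import Callable


class HoldToRecordController:
    """Hypothetical press/release state machine (not from the hermes codebase)."""

    def __init__(
        self,
        start: Callable[[], None],
        stop_and_transcribe: Callable[[], None],
        is_agent_busy: Callable[[], bool] = lambda: False,
    ) -> None:
        self._start = start
        self._stop = stop_and_transcribe
        self._is_agent_busy = is_agent_busy
        self._lock = threading.Lock()
        self._state = "idle"  # idle | starting | recording
        self._release_pending = False

    def on_press(self) -> bool:
        """Start recording; return False if the press was ignored."""
        with self._lock:
            # Ignore auto-repeat presses and presses during an agent run.
            if self._state != "idle" or self._is_agent_busy():
                return False
            self._state = "starting"
            self._release_pending = False
        try:
            self._start()  # may be slow (device open), so run outside the lock
        except Exception:
            with self._lock:
                self._state = "idle"
            raise
        with self._lock:
            self._state = "recording"
            stop_now = self._release_pending
        if stop_now:
            # The key was released while the recorder was still initializing.
            self._finish()
        return True

    def on_release(self) -> None:
        """Stop recording, or remember the release if the recorder is not ready."""
        with self._lock:
            if self._state == "starting":
                self._release_pending = True
                return
            if self._state != "recording":
                return
        self._finish()

    def _finish(self) -> None:
        with self._lock:
            if self._state != "recording":
                return  # already stopped by another caller
            self._state = "idle"
        self._stop()
```

Deferring the stop until initialization finishes avoids losing a release that arrives early; whether a very short hold should be transcribed or discarded is a separate product decision that this sketch leaves open.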

TUI (Ink/React)

  • Add onKeyDown / onKeyUp handlers for the record key
  • Mirror CLI behavior: down = start, up = stop + transcribe

Config

voice:
  enabled: true
  record_key: "ctrl+b"      # existing
  record_mode: "toggle"     # new: "toggle" | "hold"

Technical Considerations

prompt_toolkit Limitation

prompt_toolkit's standard @kb.add(key) API only fires on key press, not release. Implementing hold-to-record requires one of:

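  1. Raw input mode: Use prompt_toolkit.input.vt100_parser or lower-level stdin to capture KeyPress/KeyRelease sequences directly. Legacy VT100-style input reports key presses only, so release events depend on the terminal supporting an extended scheme such as the kitty keyboard protocol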
  2. Timer heuristic: On first press, start recording; if no second press within ~500ms, treat as "hold" mode and stop on next release (complex, error-prone)
  3. Separate keybinding: Use a modifier key (e.g. ctrl+b = toggle, ctrl+shift+b = hold) — avoids parser changes but adds cognitive load, and legacy terminal input often cannot distinguish ctrl+shift+b from ctrl+b

Option 1 is cleanest but requires testing across terminals (macOS Terminal, iTerm2, VS Code terminal, Windows Terminal).
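To make Option 1 more concrete, here is a minimal sketch of decoding press and release events, assuming the terminal supports the kitty keyboard protocol with event-type reporting enabled. The names `KeyEvent` and `parse_key_event` are hypothetical, and nothing here is wired into prompt_toolkit; it only shows the shape of the escape sequences a raw-input layer would need to recognize.

```python
import re
from typing import NamedTuple, Optional

# Kitty keyboard protocol: push flags 1 (disambiguate) | 2 (report event types),
# and pop them again on exit. Terminals without support ignore these sequences.
ENABLE_EVENT_TYPES = "\x1b[>3u"
DISABLE_EVENT_TYPES = "\x1b[<u"

# CSI key-code[:alternates] ; modifiers[:event-type] ; text u
_CSI_U = re.compile(r"\x1b\[(\d+)(?::\d*)*(?:;(\d+)(?::(\d+))?)?(?:;[\d:]*)?u")
_EVENT_NAMES = {1: "press", 2: "repeat", 3: "release"}


class KeyEvent(NamedTuple):
    key: str
    ctrl: bool
    shift: bool
    event: str  # "press" | "repeat" | "release"


def parse_key_event(seq: str) -> Optional[KeyEvent]:
    """Decode one CSI-u sequence; return None if it is not one."""
    m = _CSI_U.fullmatch(seq)
    if m is None:
        return None
    # Modifiers are encoded as 1 + bitmask (shift=1, alt=2, ctrl=4).
    mods = int(m.group(2) or 1) - 1
    event = _EVENT_NAMES.get(int(m.group(3) or 1))
    if event is None:
        return None
    return KeyEvent(chr(int(m.group(1))), bool(mods & 4), bool(mods & 1), event)


if __name__ == "__main__":
    print(parse_key_event("\x1b[98;5u"))    # ctrl+b press
    print(parse_key_event("\x1b[98;5:3u"))  # ctrl+b release
```

A terminal that does not implement the protocol will keep sending the legacy single byte for ctrl+b, so a fallback to toggle mode would still be needed there.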

TUI (Ink)

Ink (React for the terminal) provides a useInput hook with an isActive option. Adding onKeyDown/onKeyUp handling may require:

  • Extending the TUI's input handling layer (ui-tui/src/)
  • Or using a global key listener with press/release tracking

Backward Compatibility

  • Default remains toggle — no breaking change
  • record_mode is optional; omitted = toggle
  • Both CLI and TUI must respect the setting
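The backward-compatibility rules above could be captured in one helper shared by the CLI and the TUI gateway. This is a sketch under assumptions: `resolve_record_mode` and the constants are hypothetical names, the config is assumed to be an already-parsed mapping, and rejecting unknown values (rather than silently falling back to toggle) is one possible design choice.

```python
from typing import Any, Mapping

VALID_RECORD_MODES = ("toggle", "hold")
DEFAULT_RECORD_MODE = "toggle"


def resolve_record_mode(config: Mapping[str, Any]) -> str:
    """Return the effective voice.record_mode, defaulting to "toggle"."""
    voice = config.get("voice") or {}
    mode = voice.get("record_mode")
    if mode is None:
        # Omitted setting keeps the existing behavior.
        return DEFAULT_RECORD_MODE
    mode = str(mode).strip().lower()
    if mode not in VALID_RECORD_MODES:
        raise ValueError(
            f"voice.record_mode must be one of {VALID_RECORD_MODES}, got {mode!r}"
        )
    return mode


if __name__ == "__main__":
    print(resolve_record_mode({"voice": {"enabled": True}}))
    print(resolve_record_mode({"voice": {"record_mode": "hold"}}))
```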

Files Likely to Change

| File | Change |
| --- | --- |
| cli.py | handle_voice_record — add release detection, respect record_mode |
| hermes_cli/config.py | Add record_mode to default config / schema |
| hermes_cli/voice.py | Add helper for mode detection |
| tui_gateway/server.py | Add voice.hold_start / voice.hold_stop JSON-RPC methods |
| ui-tui/src/ | Add key release handlers for record key |
| website/docs/user-guide/features/voice-mode.md | Document new mode |
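For the proposed voice.hold_start / voice.hold_stop methods, a rough sketch of a JSON-RPC 2.0 dispatcher is shown below. The actual structure of tui_gateway/server.py is not described in this issue, so `make_voice_dispatcher` and its callbacks are purely illustrative; only the two method names come from the proposal.

```python
import json
from typing import Any, Callable


def make_voice_dispatcher(
    start: Callable[[], Any], stop: Callable[[], Any]
) -> Callable[[str], str]:
    """Build a dispatcher for the two proposed hold-mode methods (hypothetical)."""
    handlers = {"voice.hold_start": start, "voice.hold_stop": stop}

    def dispatch(raw: str) -> str:
        request = json.loads(raw)
        response = {"jsonrpc": "2.0", "id": request.get("id")}
        handler = handlers.get(request.get("method"))
        if handler is None:
            # -32601 is the JSON-RPC 2.0 "Method not found" error code.
            response["error"] = {"code": -32601, "message": "Method not found"}
        else:
            response["result"] = handler()
        return json.dumps(response)

    return dispatch


if __name__ == "__main__":
    dispatch = make_voice_dispatcher(
        lambda: {"recording": True}, lambda: {"recording": False}
    )
    print(dispatch('{"jsonrpc": "2.0", "id": 1, "method": "voice.hold_start"}'))
```

Keeping voice.toggle untouched and adding the two methods alongside it would let older TUI builds continue to work against a newer gateway.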

Labels

Suggested: type/feature, comp/cli, comp/tui, tool/voice

Priority

P3 — cosmetic, nice to have. Current toggle mode works; this is a UX enhancement.


Submitted by: @vokasug
