hermes - Feature: live meeting voice bridge via Vexa /speak

Feature description

Add a first-class live meeting voice bridge so a Hermes agent can participate in an online meeting as a real-time voice participant, not only as a post-meeting summarizer or text backchannel.

Target behavior:

  • The agent joins or attaches to an active Teams/Google Meet/Zoom meeting through a meeting-bot runtime such as Vexa.
  • Hermes receives live transcript/audio events from the meeting.
  • The user can explicitly ask the agent to speak, or a configured policy can allow limited autonomous interventions.
  • Hermes generates a short answer, renders TTS, and injects it into the bot microphone so meeting participants hear the agent in the call.
  • The agent can also optionally send meeting chat messages, but voice is the main missing path.

Motivation

Hermes already has:

  • messaging gateway + tools;
  • TTS providers;
  • Teams meeting pipeline for post-meeting summaries/transcripts;
  • live meeting/backchannel patterns;
  • external meeting-bot candidates such as Vexa, which exposes /speak, chat, screen, and transcript APIs.

The gap is an official, safe integration layer that turns these pieces into an operator-facing workflow:

live meeting transcript → Hermes reasoning/policy → TTS → meeting bot microphone injection.

Use case: an agent should be able to join a meeting as a named assistant, listen, and speak only when authorized or when a strict policy permits it.
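
The pipeline above can be sketched as a single step function with the three stages passed in as callables. This is only an illustration of the data flow: `TranscriptEvent`, `run_pipeline_step`, and the `decide`/`synthesize`/`inject` parameters are hypothetical names, not existing Hermes or Vexa APIs.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TranscriptEvent:
    """One live transcript segment (hypothetical shape)."""
    speaker: str
    text: str


def run_pipeline_step(
    event: TranscriptEvent,
    decide: Callable[[TranscriptEvent], Optional[str]],
    synthesize: Callable[[str], bytes],
    inject: Callable[[bytes], bool],
) -> bool:
    """Process one transcript event; return True only if audio was injected."""
    # Reasoning/policy stage: None means the agent stays silent.
    reply = decide(event)
    if reply is None:
        return False
    # TTS stage: turn the reply text into audio bytes.
    audio = synthesize(reply)
    # Injection stage: hand the audio to the meeting bot microphone.
    return inject(audio)
```

Keeping each stage behind a callable would let the reasoning policy, the TTS provider, and the meeting-bot backend be swapped or tested independently.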

Proposed solution

Add a meeting voice bridge plugin/toolset, initially backed by Vexa because it is open-source and already has interactive meeting controls.

Suggested tools:

  • meeting_join(platform, meeting_url | native_meeting_id, mode="observer|voice")
  • meeting_status(meeting_id)
  • meeting_transcript(meeting_id, since=None)
  • meeting_say(meeting_id, text, voice=None, provider=None)
  • meeting_chat_send(meeting_id, text)
  • meeting_leave(meeting_id)
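
As a rough sketch, the suggested tools could be expressed as a Python interface plus an in-memory stand-in for tests. The return types, the `meeting_ref` parameter (standing in for `meeting_url | native_meeting_id`), the `FakeBridge` class, and the rule that observer mode refuses speech are all assumptions made for illustration; the issue does not specify them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol


class MeetingBridge(Protocol):
    """Hypothetical interface mirroring the suggested tool names."""

    def meeting_join(self, platform: str, meeting_ref: str, mode: str = "observer") -> str: ...
    def meeting_status(self, meeting_id: str) -> str: ...
    def meeting_transcript(self, meeting_id: str, since: Optional[float] = None) -> List[dict]: ...
    def meeting_say(self, meeting_id: str, text: str,
                    voice: Optional[str] = None, provider: Optional[str] = None) -> bool: ...
    def meeting_chat_send(self, meeting_id: str, text: str) -> bool: ...
    def meeting_leave(self, meeting_id: str) -> None: ...


@dataclass
class FakeBridge:
    """In-memory stand-in for tests; it talks to no real meeting."""
    modes: Dict[str, str] = field(default_factory=dict)
    spoken: List[str] = field(default_factory=list)

    def meeting_join(self, platform: str, meeting_ref: str, mode: str = "observer") -> str:
        if mode not in ("observer", "voice"):
            raise ValueError(f"unsupported mode: {mode}")
        meeting_id = f"{platform}:{meeting_ref}"
        self.modes[meeting_id] = mode
        return meeting_id

    def meeting_status(self, meeting_id: str) -> str:
        return "joined" if meeting_id in self.modes else "unknown"

    def meeting_transcript(self, meeting_id: str, since: Optional[float] = None) -> List[dict]:
        return []  # a real backend would return transcript segments

    def meeting_say(self, meeting_id: str, text: str,
                    voice: Optional[str] = None, provider: Optional[str] = None) -> bool:
        # Assumed rule: only meetings joined in voice mode may speak.
        if self.modes.get(meeting_id) != "voice":
            return False
        self.spoken.append(text)
        return True

    def meeting_chat_send(self, meeting_id: str, text: str) -> bool:
        return meeting_id in self.modes

    def meeting_leave(self, meeting_id: str) -> None:
        self.modes.pop(meeting_id, None)
```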

Suggested config:

meeting_voice:
  provider: vexa
  default_mode: observer
  require_explicit_speak_approval: true
  max_utterance_seconds: 20
  tts_provider: edge  # or openai/minimax/elevenlabs/local
  platforms:
    teams:
      enabled: true
    google_meet:
      enabled: true
    zoom:
      enabled: true
  vexa:
    base_url: http://127.0.0.1:18056
    api_key_env: VEXA_API_KEY
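
One way to consume this config without hardcoding secrets is to store only the name of the environment variable and resolve the key at startup. The sketch below assumes the YAML has already been parsed into a plain dict; `resolve_settings` and the shape of its return value are hypothetical.

```python
import os
from typing import Mapping

# Safe defaults taken from the suggested config.
DEFAULTS = {
    "provider": "vexa",
    "default_mode": "observer",
    "require_explicit_speak_approval": True,
    "max_utterance_seconds": 20,
}


def resolve_settings(config: dict, environ: Mapping[str, str] = os.environ) -> dict:
    """Merge defaults with the meeting_voice section and resolve the API key."""
    section = {**DEFAULTS, **config.get("meeting_voice", {})}
    vexa = section.get("vexa", {})
    # The config holds only the variable name; the secret stays in the environment.
    env_name = vexa.get("api_key_env", "VEXA_API_KEY")
    api_key = environ.get(env_name)
    if not api_key:
        raise RuntimeError(f"environment variable {env_name} is not set")
    enabled = [
        name
        for name, opts in section.get("platforms", {}).items()
        if opts.get("enabled")
    ]
    return {
        "base_url": vexa.get("base_url"),
        "api_key": api_key,
        "default_mode": section["default_mode"],
        "require_explicit_speak_approval": section["require_explicit_speak_approval"],
        "max_utterance_seconds": section["max_utterance_seconds"],
        "enabled_platforms": enabled,
    }
```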

Safety / governance requirements

This should default to safe behavior:

  • Observer-only by default.
  • Speaking into a meeting is external/reputational output; require explicit user approval unless a profile explicitly opts into autonomous speech.
  • Keep max utterance length short.
  • Log every meeting_say with timestamp, meeting ID, text, and triggering user/policy.
  • Support interruption/cancel when the user or another participant starts speaking.
  • Allow profile-level policies like backchannel_only, chat_only, voice_on_explicit_command, autonomous_voice_allowed.
  • Do not commit to commercial scope, pricing, deadlines, legal positions, or third-party actions through autonomous speech unless explicitly authorized.
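
A minimal sketch of how these requirements might translate into a speech gate and an audit record follows. The policy names come from the list above; `may_speak`, `fits_utterance_limit`, `audit_record`, and the 150 words-per-minute speaking rate are assumptions for illustration only.

```python
import time
from dataclasses import dataclass
from typing import Optional

POLICIES = {
    "backchannel_only",
    "chat_only",
    "voice_on_explicit_command",
    "autonomous_voice_allowed",
}


@dataclass
class SpeakDecision:
    allowed: bool
    reason: str


def may_speak(policy: str, approved_by: Optional[str] = None) -> SpeakDecision:
    """Decide whether voice output is permitted under a profile policy."""
    if policy not in POLICIES:
        return SpeakDecision(False, "unknown policy")  # fail closed
    if policy in ("backchannel_only", "chat_only"):
        return SpeakDecision(False, f"policy {policy} forbids voice")
    if policy == "voice_on_explicit_command":
        if approved_by:
            return SpeakDecision(True, f"approved by {approved_by}")
        return SpeakDecision(False, "explicit approval required")
    return SpeakDecision(True, "autonomous voice allowed by policy")


def fits_utterance_limit(text: str, max_seconds: float = 20,
                         words_per_minute: int = 150) -> bool:
    """Estimate spoken duration from word count (assumed speaking rate)."""
    estimated_seconds = len(text.split()) * 60 / words_per_minute
    return estimated_seconds <= max_seconds


def audit_record(meeting_id: str, text: str, trigger: str,
                 now: Optional[float] = None) -> dict:
    """Build the log entry for one meeting_say call."""
    return {
        "timestamp": time.time() if now is None else now,
        "meeting_id": meeting_id,
        "text": text,
        "trigger": trigger,  # the user or policy that caused the speech
    }
```

Failing closed on an unknown policy keeps a misconfigured profile in observer behavior rather than letting it speak.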

Vexa integration notes

Relevant Vexa capabilities observed:

  • POST /bots can create bots for Teams/Meet/Zoom.
  • WebSocket transcript stream exists.
  • Interactive endpoints exist or are documented:
    • POST /bots/{platform}/{native_meeting_id}/speak
    • DELETE /bots/{platform}/{native_meeting_id}/speak
    • chat read/write
    • screen/avatar controls
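
For illustration, the two `/speak` endpoints could be wrapped with the standard library as shown below. Only the paths and HTTP methods come from the notes above; the JSON body `{"text": ...}` and the `X-API-Key` header are assumptions that should be checked against the Vexa documentation. The helpers only build request objects and send nothing.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def speak_url(base_url: str, platform: str, native_meeting_id: str) -> str:
    """Build the /bots/{platform}/{native_meeting_id}/speak URL."""
    return (
        f"{base_url.rstrip('/')}/bots/"
        f"{quote(platform, safe='')}/{quote(native_meeting_id, safe='')}/speak"
    )


def build_speak_request(base_url: str, api_key: str, platform: str,
                        native_meeting_id: str, text: str) -> Request:
    """POST request asking the bot to speak (body shape is an assumption)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return Request(
        speak_url(base_url, platform, native_meeting_id),
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def build_cancel_request(base_url: str, api_key: str, platform: str,
                         native_meeting_id: str) -> Request:
    """DELETE request to interrupt the current speech."""
    return Request(
        speak_url(base_url, platform, native_meeting_id),
        method="DELETE",
        headers={"X-API-Key": api_key},
    )
```

Passing a built request to `urllib.request.urlopen` would perform the call; separating construction from sending keeps the URL and payload logic testable without a running Vexa instance.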

Related Vexa issues/docs:

  • Vexa issue #120 documents a meeting interaction interface with /speak, chat, and screen sharing.
  • Vexa issue #333 requests external AI agent integration via agent URL / bot camera+mic.

Acceptance criteria

  1. Local/self-hosted Vexa can be configured from Hermes without hardcoding secrets.
  2. Hermes can join a test meeting in observer mode and stream transcript/backchannel.
  3. meeting_say causes participants to hear the agent through the meeting bot microphone.
  4. The speak path returns success/failure based on real playback status, not just command enqueue.
  5. A user can interrupt/cancel current speech.
  6. The integration works at least for Teams in a live test; Meet/Zoom can follow.
  7. All meeting speech events are auditable.
  8. Documentation explains the difference between post-meeting transcript pipelines and live meeting voice participation.
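
Criteria 4 and 5 imply that the speak path tracks playback through to a terminal state instead of reporting success at enqueue time. A small sketch of that rule, with a hypothetical `Playback` enum and `speak_succeeded` helper (the state names are not taken from Vexa), might look like this:

```python
from enum import Enum
from typing import Iterable


class Playback(Enum):
    ENQUEUED = "enqueued"
    STARTED = "started"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


TERMINAL = {Playback.COMPLETED, Playback.FAILED, Playback.CANCELLED}


def speak_succeeded(states: Iterable[Playback]) -> bool:
    """Return True only if playback reached COMPLETED.

    Enqueue or start alone is not success, and a cancel or failure is
    final even if later states arrive.
    """
    final = Playback.ENQUEUED
    for state in states:
        final = state
        if state in TERMINAL:
            break
    return final is Playback.COMPLETED
```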

Alternatives considered

  • Continue using only the Teams Graph transcript pipeline: good for post-meeting summaries, but it cannot speak live.
  • Use Telegram/Slack voice backchannel only: safe and useful, but not a true meeting participant.
  • Build a native Graph Communications bot from scratch: powerful, but much heavier than integrating an existing meeting-bot runtime first.
  • Browser-only automation: possible, but fragile without a dedicated audio bridge and bot lifecycle API.
