openclaw: Google Meet: support reliable headless agent join with audio/transcription health checks

openclaw/openclaw#72478, fetched 2026-04-27 05:29:54

We tested the OpenClaw Google Meet integration on macOS with OpenClaw 2026.4.24 and were able to get partial/visual meeting presence, but not a reliable realtime audio participant. This issue is both a bug report from the failed test and a product request: please move the Google Meet integration toward a supported headless agent join path, with listen/transcribe as the minimum useful mode and bidirectional voice as the target.

Why this matters

The desired workflow is: invite or ask an OpenClaw agent to attend a Google Meet, then have the agent listen, transcribe/summarize, and optionally answer questions in the meeting (e.g. someone says "hey Lando" and the agent responds). This would be more valuable than a passive notes product because the agent already has workspace/project context and can be interactive.

In our case the agent has its own Google Workspace email identity (lando@...) and can be invited to calendar events. That may not be true for all OpenClaw agents, so the integration likely needs to support multiple identity/join modes:

  • Workspace email / calendar-invited agent identity
  • Guest display-name join with host lobby admission
  • Possibly delegated/service-account style calendar discovery where available
  • Explicit meeting URL join when no calendar invite exists
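
The four identity/join modes listed above could be resolved from a single join request. The following is a minimal sketch under stated assumptions: `JoinMode`, `JoinRequest`, `select_join_mode`, and all of their fields are hypothetical names for illustration and are not part of the OpenClaw API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JoinMode(Enum):
    # Hypothetical labels mirroring the four modes listed in the issue.
    CALENDAR_INVITED = "workspace email / calendar-invited identity"
    DELEGATED_DISCOVERY = "delegated or service-account calendar discovery"
    GUEST_DISPLAY_NAME = "guest display name, host lobby admission"
    EXPLICIT_URL = "explicit meeting URL, no calendar invite"


@dataclass
class JoinRequest:
    # Every field is an assumption about what a join request might carry.
    meeting_url: Optional[str] = None
    agent_email: Optional[str] = None
    calendar_event_id: Optional[str] = None
    delegated_calendar: bool = False
    guest_display_name: Optional[str] = None


def select_join_mode(req: JoinRequest) -> JoinMode:
    """Pick the most specific join mode the request can support."""
    if req.calendar_event_id and req.agent_email:
        return JoinMode.CALENDAR_INVITED
    if req.calendar_event_id and req.delegated_calendar:
        return JoinMode.DELEGATED_DISCOVERY
    if req.meeting_url and req.guest_display_name:
        return JoinMode.GUEST_DISPLAY_NAME
    if req.meeting_url:
        return JoinMode.EXPLICIT_URL
    raise ValueError("join request needs a calendar event or a meeting URL")
```

Resolving the mode up front lets the integration report which identity it is about to use before any browser or audio work starts.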

Environment tested

  • OpenClaw: 2026.4.24
  • Host: Mac mini running macOS
  • Browser path: Chrome / Google Meet plugin, non-dial-in requirement
  • Audio bridge tools available/attempted: BlackHole 2ch and SoX
  • Desired transport: Chrome/Meet audio, not phone dial-in/Twilio
  • Realtime provider reported connected during one attempt

What happened

We observed two different states that can be confused with success:

  1. An openclaw googlemeet join session reported Chrome/realtime active and provider readiness, but audio health stayed idle:

    • audioInputActive=false
    • audioOutputActive=false
    • lastInputBytes=0
    • lastOutputBytes=0
    • The process was later killed.
  2. A later direct/manual Chrome flow reached Google Meet lobby/visual presence as the agent display identity, but at the time the human tested audio the plugin had no active session (openclaw googlemeet status showed no sessions). So the browser/lobby presence was not an active realtime audio bridge.
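
The two misleading states above can be told apart mechanically. This is a sketch only: the field names mirror the counters quoted in the report (audioInputActive, audioOutputActive, lastInputBytes, lastOutputBytes), but the `AudioHealth` type, the `classify_bridge` function, and the returned labels are assumptions, not actual OpenClaw output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioHealth:
    # Python-style names for the counters quoted in the report.
    audio_input_active: bool
    audio_output_active: bool
    last_input_bytes: int
    last_output_bytes: int


def classify_bridge(provider_ready: bool,
                    active_sessions: int,
                    health: Optional[AudioHealth]) -> str:
    """Return a label that never treats readiness alone as success."""
    if active_sessions == 0 or health is None:
        # State 2 in the report: lobby presence with no plugin session.
        return "no-session"
    if not provider_ready:
        return "provider-not-ready"
    hearing = health.audio_input_active and health.last_input_bytes > 0
    speaking = health.audio_output_active and health.last_output_bytes > 0
    if hearing and speaking:
        return "bidirectional"
    if hearing:
        return "listen-only"
    if speaking:
        return "speak-only"
    # State 1 in the report: provider ready, but no audio has moved.
    return "idle"
```

With the values from the first attempt (both flags false, both byte counters zero) this returns "idle" even though the provider reported ready.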

We also saw audio routing mismatch/brittleness:

  • Meet microphone showed BlackHole 2ch.
  • Meet speaker output remained Mac mini Speakers / built-in output.
  • System default input was BlackHole; output was Mac mini Speakers.
  • This likely prevented a complete bidirectional audio path: human audio did not reach realtime input, and generated audio had no clear route back into Meet.
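
A routing check along the following lines might have flagged the mismatch before the test call. It is a sketch under stated assumptions: it assumes the bridge can only capture or inject audio through virtual devices, it looks only at the Meet-level mic and speaker settings, and `Routing` and `routing_problems` are hypothetical names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Routing:
    meet_microphone: str  # device Meet records from
    meet_speaker: str     # device Meet plays meeting audio to


def routing_problems(routing: Routing,
                     virtual_devices: FrozenSet[str]) -> List[str]:
    """List reasons the audio path cannot be complete in both directions."""
    problems: List[str] = []
    if routing.meet_speaker not in virtual_devices:
        # Assumption: meeting audio must land on a virtual device
        # for the realtime bridge to capture it.
        problems.append(
            f"Meet speaker '{routing.meet_speaker}' is not a virtual device; "
            "meeting audio cannot reach realtime input"
        )
    if routing.meet_microphone not in virtual_devices:
        # Assumption: generated audio must be injected through a
        # virtual device that Meet uses as its microphone.
        problems.append(
            f"Meet microphone '{routing.meet_microphone}' is not a virtual "
            "device; generated audio cannot reach the meeting"
        )
    return problems
```

For the observed setup (microphone BlackHole 2ch, speaker Mac mini Speakers, with BlackHole 2ch as the only virtual device) the function reports one problem, on the speaker side.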

Browser automation/recovery was also brittle in this setup because some Playwright/browser operations were unavailable/unsupported in the gateway build (snapshot, screenshot, navigate, act:evaluate), making it hard to reliably inspect or recover the Meet tab/session.

Expected behavior

A successful Meet integration should have an explicit, inspectable lifecycle such as:

  1. Join requested for a meeting URL or calendar event.
  2. Agent identity selected (invited Google Workspace user vs guest display name).
  3. Lobby/admission state reported separately from "joined" state.
  4. Once admitted, realtime bridge owns the meeting session.
  5. Health reports actual audio movement, not just provider/browser readiness:
    • input bytes/levels from Meet into realtime
    • output bytes/levels from realtime back into Meet
    • transcript/live captions if listen-only mode is enabled
  6. If audio bridge is unavailable, the command should fail clearly with remediation steps rather than appearing joined-but-deaf/mute.
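
The lifecycle above maps naturally onto a small state machine that rejects shortcuts such as going from lobby straight to an active bridge. The sketch below is illustrative only: `MeetState`, `MeetSession`, and the transition table are hypothetical, and the direct transition from identity selection to admission assumes invited users may skip the lobby.

```python
from enum import Enum, auto
from typing import Dict, List, Set, Tuple


class MeetState(Enum):
    JOIN_REQUESTED = auto()
    IDENTITY_SELECTED = auto()
    LOBBY_WAITING = auto()
    ADMITTED = auto()
    BRIDGE_ACTIVE = auto()
    FAILED = auto()


# Allowed transitions; FAILED is reachable from every live state.
ALLOWED: Dict[MeetState, Set[MeetState]] = {
    MeetState.JOIN_REQUESTED: {MeetState.IDENTITY_SELECTED, MeetState.FAILED},
    MeetState.IDENTITY_SELECTED: {MeetState.LOBBY_WAITING, MeetState.ADMITTED,
                                  MeetState.FAILED},
    MeetState.LOBBY_WAITING: {MeetState.ADMITTED, MeetState.FAILED},
    MeetState.ADMITTED: {MeetState.BRIDGE_ACTIVE, MeetState.FAILED},
    MeetState.BRIDGE_ACTIVE: {MeetState.FAILED},
    MeetState.FAILED: set(),
}


class MeetSession:
    def __init__(self) -> None:
        self.state = MeetState.JOIN_REQUESTED
        # Inspectable record of (state, note) pairs.
        self.history: List[Tuple[MeetState, str]] = [(self.state, "")]

    def advance(self, new_state: MeetState, note: str = "") -> None:
        """Move to new_state, or raise if the lifecycle forbids it."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
        self.history.append((new_state, note))
```

A failure with remediation text, as step 6 asks for, would then be recorded as `session.advance(MeetState.FAILED, "audio bridge unavailable: ...")` rather than leaving the session looking joined.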

Recommendations

Product direction

Please add/support a headless Meet agent mode that does not depend on visible browser automation, physical speakers, or a hand-built BlackHole/SoX route when possible.

Minimum useful target:

  • Join meeting as an agent identity
  • Listen/transcribe reliably
  • Produce meeting summary/action items afterward

Full target:

  • Bidirectional audio using realtime voice
  • Wake/attention phrase support (e.g. "hey Lando")
  • Ability to answer meeting questions using the agent's OpenClaw context

Identity / meeting-entry model

Please document and support the expected way an agent gets into a meeting:

  • Is each agent expected to have a Google Workspace mailbox and calendar invite?
  • Can an agent join as a guest display name only?
  • How should lobby admission be surfaced?
  • Should calendar invites to the agent email automatically create a joinable session?
  • What permissions/OAuth scopes are required for invited-user mode?

Audio / health model

Please separate these states in status output:

  • browser tab opened
  • lobby waiting
  • admitted to meeting
  • realtime provider connected
  • audio input active from meeting
  • audio output active to meeting
  • transcript/caption stream active

Provider connected + browser present is not enough; status should show whether the bridge is actually hearing and speaking.
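
One possible shape for such status output is one independent flag per state, rendered line by line. This is a sketch, not the current output of openclaw googlemeet status: `MeetStatus`, `render_status`, and `is_hearing_and_speaking` are hypothetical names.

```python
from dataclasses import dataclass, fields


@dataclass
class MeetStatus:
    # One flag per state requested in the issue, in the same order.
    browser_tab_opened: bool = False
    lobby_waiting: bool = False
    admitted_to_meeting: bool = False
    realtime_provider_connected: bool = False
    audio_input_active: bool = False
    audio_output_active: bool = False
    transcript_stream_active: bool = False


def render_status(status: MeetStatus) -> str:
    """Render each flag on its own line so no state hides another."""
    lines = []
    for f in fields(status):
        label = f.name.replace("_", " ")
        value = "yes" if getattr(status, f.name) else "no"
        lines.append(f"{label:<30}{value}")
    return "\n".join(lines)


def is_hearing_and_speaking(status: MeetStatus) -> bool:
    """True only when the agent is admitted and audio moves both ways."""
    return (status.admitted_to_meeting
            and status.audio_input_active
            and status.audio_output_active)
```

A session with only the browser tab open and the provider connected renders two "yes" lines and five "no" lines, and `is_hearing_and_speaking` returns False for it.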

Failure handling

If Chrome transport requires a virtual audio device today, please provide:

  • A supported setup guide for macOS
  • Required BlackHole device/channel configuration
  • Required system input/output settings
  • Required Google Meet mic/speaker settings
  • A built-in preflight check that confirms both directions before joining or before declaring success
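
A preflight of this kind could poll the bridge's byte counters for a short window and refuse to declare success unless they grow. The sketch below assumes a hypothetical `read_counters` callable returning (input bytes, output bytes); `preflight` and `PreflightError` are also illustrative names, and how a test signal is pushed through the bridge is left out.

```python
import time
from typing import Callable, List, Tuple


class PreflightError(RuntimeError):
    """Raised when audio does not move in a required direction."""


def preflight(read_counters: Callable[[], Tuple[int, int]],
              window_s: float = 2.0,
              polls: int = 4,
              require_output: bool = True,
              sleep: Callable[[float], None] = time.sleep) -> None:
    """Pass only if byte counters grow during the polling window."""
    start_in, start_out = read_counters()
    in_ok = out_ok = False
    for _ in range(polls):
        sleep(window_s / polls)
        cur_in, cur_out = read_counters()
        in_ok = in_ok or cur_in > start_in
        out_ok = out_ok or cur_out > start_out
        if in_ok and (out_ok or not require_output):
            return
    missing: List[str] = []
    if not in_ok:
        missing.append("no input from Meet into realtime")
    if require_output and not out_ok:
        missing.append("no output from realtime back into Meet")
    raise PreflightError(
        "; ".join(missing)
        + ". Check the virtual audio device and Meet mic/speaker settings."
    )
```

Passing `require_output=False` would cover the listen/transcribe-only mode, where only the input direction has to be confirmed.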

Even better, if there is a way to capture/render Meet audio without OS-level virtual devices, that should be the default.

User impact

Right now it is easy to mistake visual/lobby presence for a working agent participant. The practical result is an agent that appears to have joined but is deaf/mute. For an executive assistant use case, this is worse than failing early because the human expects the assistant to be present in the meeting.

A robust headless/listen-first Meet integration would be a major improvement over passive meeting note tools because OpenClaw agents can bring project memory, calendar/email context, and interactive follow-up into the meeting.

Extent analysis

TL;DR

Implement a headless Google Meet agent mode to enable reliable, inspectable audio participation without relying on visible browser automation or physical speakers.

Guidance

  • While Chrome transport still requires a virtual audio device, follow or write a supported macOS setup guide covering BlackHole device/channel configuration, system input/output settings, and Google Meet mic/speaker settings.
  • Separate status output states to clearly indicate lobby waiting, admitted to meeting, realtime provider connected, and audio input/output activity.
  • Develop a preflight check to confirm bidirectional audio before joining or declaring success.
  • Consider alternative audio capture/rendering methods that do not require OS-level virtual devices.

Example

The issue itself includes no code snippet; it asks for a product-level change rather than a specific patch.

Notes

The current implementation relies on brittle browser automation and audio routing, which can lead to a "deaf/mute" agent presence. A headless agent mode with listen/transcribe capabilities would significantly improve the user experience.

Recommendation

Prioritize a headless Meet agent mode focused on reliable audio participation, starting with listen/transcribe. Until that exists, treat a join as successful only when audio health shows actual audio movement, not just browser presence or provider readiness.
