openclaw: [Feature] Support Gemini Live / realtime audio providers for Discord voice channels

GitHub stats: openclaw/openclaw#72891, fetched 2026-04-28 06:30:46
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Summary

Add support for using Gemini Live / realtime audio providers as the backend for Discord voice channel conversations.

Today, Discord voice appears to use a text-mediated pipeline:

Discord voice audio -> STT -> text LLM response via channels.discord.voice.model -> TTS -> Discord voice playback

The new channels.discord.voice.model config is useful for choosing the text response model, but it does not appear to select an end-to-end realtime audio backend like Gemini Live.

Motivation

OpenClaw now has realtime voice support in adjacent surfaces:

  • Voice Call / realtime mode
  • Google Meet audio bridges
  • Talk / realtime voice loops
  • Gemini Live as a realtime voice provider for backend Voice Call and Google Meet audio bridges

It would be useful for Discord voice channels to be able to use the same realtime audio provider path, rather than always going through separate STT -> text LLM -> TTS stages.

This could reduce latency and make Discord voice channel interaction feel closer to native realtime voice systems.

Proposed behavior

Provide a Discord voice configuration path that can select a realtime audio provider/session, for example Gemini Live, instead of only selecting the text LLM used after STT.

The exact config shape may differ, but conceptually this would be separate from channels.discord.voice.model, which should remain the text LLM override for the current pipeline.

Potential shape:

{
  "channels": {
    "discord": {
      "voice": {
        "mode": "realtime",
        "realtime": {
          "provider": "google",
          "model": "gemini-2.5-flash-native-audio-preview"
        }
      }
    }
  }
}
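
As a rough illustration of how the proposed shape could sit beside the existing model override, the sketch below resolves a voice config into one of two backends. Every name here (DiscordVoiceConfig, resolveVoiceBackend, the "pipeline" default, the mode and realtime fields) is an assumption taken from the potential shape above, not a confirmed OpenClaw schema or API.

```typescript
// Hypothetical types: field names mirror the shape proposed in the issue,
// not a confirmed OpenClaw schema.
interface RealtimeVoiceConfig {
  provider: string; // e.g. "google"
  model: string; // e.g. "gemini-2.5-flash-native-audio-preview"
}

interface DiscordVoiceConfig {
  enabled?: boolean;
  model?: string; // existing text LLM override for the STT -> LLM -> TTS pipeline
  mode?: "pipeline" | "realtime"; // hypothetical; "pipeline" is assumed as the default
  realtime?: RealtimeVoiceConfig; // hypothetical
}

type VoiceBackend =
  | { kind: "pipeline"; textModel: string | undefined }
  | { kind: "realtime"; provider: string; model: string };

// Decide which backend a voice config selects. The text model override is
// only used by the pipeline backend and is ignored in realtime mode.
function resolveVoiceBackend(voice: DiscordVoiceConfig): VoiceBackend {
  if (voice.mode === "realtime") {
    if (!voice.realtime) {
      throw new Error('voice.mode is "realtime" but voice.realtime is missing');
    }
    return {
      kind: "realtime",
      provider: voice.realtime.provider,
      model: voice.realtime.model,
    };
  }
  return { kind: "pipeline", textModel: voice.model };
}
```

Keeping the two branches separate means channels.discord.voice.model keeps its current meaning, and a config that never sets mode behaves as it does today.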

Current understanding

From the current schema, channels.discord.voice includes:

  • enabled
  • model
  • autoJoin
  • DAVE encryption settings
  • tts

The model field is documented as an optional LLM override for Discord voice channel responses, e.g. openai/gpt-5.4-mini, while STT and TTS remain on their existing media settings.

That suggests Gemini Live-style realtime audio is not currently configurable for Discord voice channels.

Related issues

Related, but not exact duplicates:

  • #60093: voice-call plugin support for Google Gemini Live as an end-to-end audio provider
  • #45561: native Gemini Live API support
  • #7200: real-time voice conversation support
  • #71262: voice-call realtime mode should expose gateway agent tools

Those issues cover Gemini Live / realtime voice generally, especially Voice Call. This request is specifically for Discord voice channels.

Open questions

  • Should Discord voice realtime mode reuse the same realtime voice bridge/session runtime as Voice Call and Google Meet?
  • How should Discord voice realtime sessions preserve existing Discord channel context, permissions, and agent routing?
  • Should realtime mode still emit transcripts for logs, moderation, memory, and thread context when the provider is audio-native?
  • How should tool calls and agent consults be bridged so Discord realtime voice does not lose the normal OpenClaw agent capabilities?
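
One way to picture the transcript and tool-call questions is a small recorder that sits between the realtime session and the rest of the agent. This is only a sketch under stated assumptions: the RealtimeEvent shape and the RealtimeSessionRecorder class are hypothetical, and real realtime providers expose different event APIs.

```typescript
// Hypothetical event shape for an audio-native realtime session.
type RealtimeEvent =
  | { type: "audio"; pcm: Uint8Array }
  | { type: "transcript"; speaker: "user" | "assistant"; text: string }
  | { type: "toolCall"; name: string; args: Record<string, unknown> };

interface TranscriptEntry {
  speaker: "user" | "assistant";
  text: string;
}

// Collects transcripts and tool-call requests while audio flows through,
// so logs, moderation, memory, and thread context can still consume text.
class RealtimeSessionRecorder {
  readonly transcript: TranscriptEntry[] = [];
  readonly pendingToolCalls: string[] = [];
  private audioBytes = 0;

  handle(event: RealtimeEvent): void {
    switch (event.type) {
      case "audio":
        // Audio would be forwarded to Discord playback; here we only count bytes.
        this.audioBytes += event.pcm.byteLength;
        break;
      case "transcript":
        this.transcript.push({ speaker: event.speaker, text: event.text });
        break;
      case "toolCall":
        // A real bridge would hand this to the agent's tool runtime.
        this.pendingToolCalls.push(event.name);
        break;
    }
  }

  get bytesForwarded(): number {
    return this.audioBytes;
  }
}
```

Whether the provider actually emits transcript events alongside audio is one of the open questions above; the sketch only shows where such events would be captured if it does.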

Extended analysis

TL;DR

To add support for Gemini Live as a realtime audio provider for Discord voice channels, introduce a new configuration option to select a realtime audio backend.

Guidance

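  • Introduce a new configuration field under channels.discord.voice for selecting a realtime audio provider such as Gemini Live. This could be a flat field, e.g. realtimeProvider, or the nested realtime.provider / realtime.model pair proposed in the issue. Either way, keep it separate from channels.discord.voice.model, which remains the text LLM override for the current pipeline.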
  • Consider reusing the same realtime voice bridge/session runtime as Voice Call and Google Meet for Discord voice realtime mode.
  • Investigate how to preserve existing Discord channel context, permissions, and agent routing in Discord voice realtime sessions.
  • Determine whether realtime mode should still emit transcripts for logs, moderation, memory, and thread context when using an audio-native provider.

Example

{
  "channels": {
    "discord": {
      "voice": {
        "mode": "realtime",
        "realtimeProvider": "gemini-live"
      }
    }
  }
}
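
The flat realtimeProvider field in this example differs from the nested realtime block proposed in the issue. If both shapes were ever accepted, a loader could normalize them to one form, as sketched below. The names VoiceConfigInput and normalizeRealtime are hypothetical, the rule that the nested block takes precedence is an assumption, and provider values are passed through unchanged because no mapping between identifiers such as "gemini-live" and "google" is defined anywhere in the issue.

```typescript
// Hypothetical input type covering both suggested shapes.
interface NestedRealtime {
  provider: string;
  model?: string;
}

interface VoiceConfigInput {
  mode?: string;
  realtimeProvider?: string; // flat shape from the example above
  realtime?: NestedRealtime; // nested shape from the issue
}

// Returns the realtime settings in nested form, or undefined when the
// config does not select realtime mode.
function normalizeRealtime(voice: VoiceConfigInput): NestedRealtime | undefined {
  if (voice.mode !== "realtime") {
    return undefined;
  }
  if (voice.realtime) {
    return voice.realtime; // assumption: the nested block wins if both are set
  }
  if (voice.realtimeProvider) {
    return { provider: voice.realtimeProvider };
  }
  throw new Error("realtime mode selected but no provider configured");
}
```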

Notes

The implementation details may vary depending on the specific requirements and constraints of the Discord voice channel integration. The proposed configuration shape and field names are suggestions and may need to be adjusted.

Recommendation

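This is a feature request rather than a bug, so there is no fix to apply against the current schema. The recommended direction is to introduce a new configuration option that selects a realtime audio provider for Discord voice, kept separate from channels.discord.voice.model so the existing STT -> text LLM -> TTS pipeline continues to work unchanged.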
