openclaw - feat(realtime): support OpenAI Realtime GA (gpt-realtime-2) — needs new session schema + GA-aware transport [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#80196 (fetched 2026-05-11 03:17:48)
Comments: 1 · Participants: 2 · Timeline events: 8 · Reactions: 2
Timeline (top): mentioned ×3, subscribed ×3, closed ×1, commented ×1


Summary

OpenClaw's voice-call / realtime plugin is hardcoded to OpenAI's beta Realtime protocol (OpenAI-Beta: realtime=v1 header + beta session schema). OpenAI shipped the GA Realtime API around May 2026 with gpt-realtime-2, which is the recommended production model for agentic real-time voice (tool calling, reasoning, lowest latency). It cannot be used with OpenClaw today without local source patches.

This blocks the cleanest path for production agent voice — phone calls (Twilio bridge), Google Meet, and Talk mode — on the model OpenAI explicitly built for agents.

Current behavior (OpenClaw 2026.5.7, verified 2026-05-09)

/opt/homebrew/lib/node_modules/openclaw/dist/realtime-voice-provider-Bs3Q4Qlt.js sends:

  1. WebSocket connection header: OpenAI-Beta: realtime=v1
  2. Session payload in the beta schema (no session.type, beta-only field shape)

When pointed at gpt-realtime-2 (or any GA model), the server returns errors. There are two distinct gates:

Gate 1 — Header (surmountable)

Removing OpenAI-Beta: realtime=v1 correctly routes to the GA endpoint. Verified — the GA "this header is for beta" rejection disappears.
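A minimal sketch of the Gate 1 fix, assuming the transport assembles its WebSocket upgrade headers as a plain object. buildRealtimeHeaders and the "beta"/"ga" mode strings are hypothetical names for illustration, not OpenClaw's actual code; only the bearer Authorization header and OpenAI-Beta: realtime=v1 come from OpenAI's protocol.

```javascript
// Hypothetical helper: build WebSocket upgrade headers for the OpenAI Realtime endpoint.
// mode is "beta" (legacy preview protocol) or "ga" (GA protocol, e.g. gpt-realtime-2).
function buildRealtimeHeaders(apiKey, mode) {
  if (mode !== "beta" && mode !== "ga") {
    throw new Error(`unknown realtime api mode: ${mode}`);
  }
  const headers = { Authorization: `Bearer ${apiKey}` };
  // Only the beta protocol expects this header; sending it to a GA model
  // is what produces the beta-header rejection.
  if (mode === "beta") {
    headers["OpenAI-Beta"] = "realtime=v1";
  }
  return headers;
}
```

With the ws package, the result could be passed as the headers option, e.g. new WebSocket("wss://api.openai.com/v1/realtime?model=gpt-realtime-2", { headers: buildRealtimeHeaders(key, "ga") }). On its own this only clears Gate 1.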

Gate 2 — Schema (the real blocker)

GA requires session.type (and other GA-only fields) in the session.update payload:

Missing required parameter: 'session.type'

This is a plugin-code change, not a header swap. The plugin emits the legacy beta session shape and gets rejected by the GA endpoint.
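A sketch of what emitting both shapes side by side could look like. The field names are assumptions based on OpenAI's published beta schema (flat voice, input_audio_format, output_audio_format) and GA schema (session.type: "realtime", output_modalities, audio settings nested under audio.input and audio.output); check them against the current Realtime API reference before relying on them. buildSessionUpdate and GA_AUDIO_FORMATS are hypothetical names.

```javascript
// Assumed mapping from beta audio format strings to GA format objects.
const GA_AUDIO_FORMATS = {
  pcm16: { type: "audio/pcm", rate: 24000 },
  g711_ulaw: { type: "audio/pcmu" },
  g711_alaw: { type: "audio/pcma" },
};

// Hypothetical builder for the session.update client event in either protocol.
function buildSessionUpdate(mode, opts) {
  const { instructions, voice, audioFormat = "pcm16", tools = [] } = opts;
  if (mode === "beta") {
    // Legacy beta shape: flat fields and no session.type.
    return {
      type: "session.update",
      session: {
        modalities: ["text", "audio"],
        instructions,
        voice,
        input_audio_format: audioFormat,
        output_audio_format: audioFormat,
        turn_detection: { type: "server_vad" },
        tools,
      },
    };
  }
  if (mode === "ga") {
    const format = GA_AUDIO_FORMATS[audioFormat];
    if (!format) throw new Error(`unsupported audio format: ${audioFormat}`);
    // GA shape: session.type is required; audio config is nested per direction.
    return {
      type: "session.update",
      session: {
        type: "realtime",
        output_modalities: ["audio"],
        instructions,
        audio: {
          input: { format, turn_detection: { type: "server_vad" } },
          output: { format, voice },
        },
        tools,
      },
    };
  }
  throw new Error(`unknown realtime api mode: ${mode}`);
}
```

For the Twilio bridge, which carries 8 kHz μ-law audio, audioFormat: "g711_ulaw" would map to { type: "audio/pcmu" } on the GA path under this assumed mapping.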

Why this matters

gpt-realtime-2 (GA) is OpenAI's purpose-built agentic real-time model:

  • Native tool/function calling in the realtime loop
  • Reasoning during voice turns
  • Lowest latency of any agentic voice model currently available
  • Production-stable (out of beta)

For anyone building agent-driven phone/Meet/voice infra on OpenClaw (which is a major use case — see existing issues #71195, #72891, #73019, #76952), the right model is gpt-realtime-2. Today we have to fork the plugin or build a standalone bridge to use it.

Reproduction

  1. OpenClaw 2026.5.7 fresh install
  2. Configure voice-call plugin with realtime.providers.openai.model: gpt-realtime-2
  3. Initiate a call (e.g. openclaw voicecall smoke --mode conversation --yes, or any other realtime trigger)
  4. WebSocket opens to OpenAI but fails on session.update with Missing required parameter: 'session.type'

Patching the file to drop the beta header fixes Gate 1; Gate 2 remains.
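Until GA support lands, a hypothetical diagnostic like the one below could turn the Gate 2 rejection into an actionable log line instead of a silently failed call. It assumes the server reports the failure as a JSON event with type: "error" and an error object carrying message (and possibly param); diagnoseRealtimeError is an illustrative name, and matching on the message text is a fallback that may break if OpenAI rewords it.

```javascript
// Hypothetical diagnostic: recognize the Gate 2 rejection in a raw server message.
// Returns a hint string, or null if the message is not this particular error.
function diagnoseRealtimeError(rawMessage) {
  let event;
  try {
    event = JSON.parse(rawMessage);
  } catch {
    return null; // not a JSON server event
  }
  if (!event || event.type !== "error" || !event.error) return null;
  const { message = "", param } = event.error;
  // Prefer the structured param if present; fall back to the message text.
  if (param === "session.type" || message.includes("'session.type'")) {
    return "Server expects the GA session schema, but session.update was sent in the beta shape.";
  }
  return null;
}
```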

Requested change

Add GA Realtime protocol support alongside the existing beta path:

  1. Conditional header: Don't send OpenAI-Beta: realtime=v1 when targeting GA models (gpt-realtime-2 and successors).
  2. GA session schema: Emit session.type and any other GA-required fields when realtime.api: "ga" (or auto-detected from model id).
  3. Config switch: Allow per-provider config to select beta vs GA, e.g.:
    "realtime": {
      "providers": {
        "openai": {
          "model": "gpt-realtime-2",
          "api": "ga"
        }
      }
    }
  4. Default for new GA model ids: auto-route gpt-realtime-* (anything not the legacy beta gpt-4o-realtime-preview-*) through the GA path.
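Items 3 and 4 above could be combined into a single resolver: an explicit api value wins, otherwise the mode is inferred from the model id. This is a sketch under the proposal's assumptions; resolveRealtimeApi is a hypothetical name, and falling back to "beta" for unrecognized ids (to preserve today's behavior) is a design choice, not something the issue specifies.

```javascript
// Hypothetical resolver for the proposed realtime.providers.openai.api switch.
function resolveRealtimeApi(providerConfig) {
  const { model = "", api } = providerConfig;
  // Explicit config always wins.
  if (api !== undefined) {
    if (api !== "beta" && api !== "ga") {
      throw new Error(`realtime.providers.openai.api must be "beta" or "ga", got ${api}`);
    }
    return api;
  }
  // Legacy preview ids (gpt-4o-realtime-preview-*, gpt-4o-mini-realtime-preview-*) stay on beta.
  if (/^gpt-4o(-mini)?-realtime-preview/.test(model)) {
    return "beta";
  }
  // gpt-realtime and any gpt-realtime-* id (e.g. gpt-realtime-2) take the GA path.
  if (model === "gpt-realtime" || model.startsWith("gpt-realtime-")) {
    return "ga";
  }
  // Unrecognized ids keep the current (beta) behavior.
  return "beta";
}
```

Keeping the explicit switch alongside auto-detection lets users pin a model to either path if OpenAI's naming diverges from the gpt-realtime-* pattern.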

Scope

  • Affects: realtime-voice-provider-*.js (WebSocket transport + session payload)
  • Doesn't affect: voice-call plugin's Twilio bridge layer, Google Meet integration, or any non-OpenAI realtime provider (Gemini Live, etc.)

Workaround we're using today

For gpt-realtime-2 we either:

  • (a) Patch the plugin manually (brittle — gets overwritten on every npm update -g openclaw)
  • (b) Fall back to gpt-realtime-1.5 on the beta path (works, but not the model we want for production)
  • (c) Bypass OpenClaw's voice plugin entirely and run a standalone Python bridge against OpenAI Realtime GA (clean but decouples from OpenClaw)

Option (c) is what we'll likely do for our agentic voice infra in the meantime, but native OpenClaw support is the right long-term answer — especially because so much of OpenClaw's voice value (Twilio bridge, Meet integration, Talk mode, transcription, agent routing) sits on top of the realtime transport.

Related issues

  • #71195 — OpenAI Realtime path for Talk Mode (this proposal would unblock the GA model for that work)
  • #72891 — Gemini Live in Discord voice (parallel real-time provider work)
  • #73019 — xAI Realtime via OpenAI-protocol adapter (also benefits from a clean GA-aware adapter layer)
  • #76952 — Realtime Talk docs/UX

Happy to contribute a PR if it'd help — the change is fairly localized to the realtime-voice-provider transport layer.


Environment

  • OpenClaw: 2026.5.7
  • Platform: macOS Darwin 25.4.0 arm64 (Mac Studio M4 Max)
  • Node: v22.22.0
  • Verified file: /opt/homebrew/lib/node_modules/openclaw/dist/realtime-voice-provider-Bs3Q4Qlt.js
