openclaw - ✅(Solved) Fix Proposal: Add xAI Realtime Voice Agent support to /voiceclaw/realtime via shared OpenAI-Realtime-protocol adapter [1 pull requests, 1 participants]

openclaw · 2026-04-27 20:22:30


openclaw/openclaw#73019•Fetched 2026-04-28 06:28:32


Author

matthewtran172


Add support for the xAI Realtime Voice Agent (grok-voice-think-fast-1.0) to OpenClaw's /voiceclaw/realtime gateway surface, implemented as a shared OpenAIRealtimeProtocolAdapter that can also serve OpenAI Realtime (issue #71195) when added.

xAI's Voice Agent API is OpenAI-Realtime-protocol-compatible (verbatim from xAI docs: "compatible with the OpenAI Realtime API. Most OpenAI client libraries and SDKs work with the xAI endpoint by changing the base URL"), so a single protocol adapter can serve both providers via parameterization.



Fix Action

Proposed fix (PR open, not merged at fetch time)

Proposed in PR: feat(voiceclaw-realtime): add xAI Realtime Voice Agent provider (closes #73019) (https://github.com/openclaw/openclaw/pull/73032)

PR fix notes

PR #73032: feat(voiceclaw-realtime): add xAI Realtime Voice Agent provider (closes #73019)

Repository: openclaw/openclaw
Author: matthewtran172
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/73032

Description (problem / solution / changelog)

Tracks #73019.

This is a draft PR opening the design proposed in #73019 for review. Adds support for xAI's Voice Agent API (grok-voice-think-fast-1.0) as a /voiceclaw/realtime provider — the same surface that ships gemini-live.ts today. xAI's Realtime API is documented as OpenAI-Realtime-protocol-compatible (xAI docs); this PR ships the xAI specialization first.

Surface scope

This PR targets /voiceclaw/realtime (gateway endpoint, VoiceClawRealtimeAdapter interface) — the surface for external WebSocket clients. It does not target the separate talk-realtime-relay.ts / src/realtime-voice/ provider-plugin surface used by OpenClaw's browser Talk UI, macOS Talk app, voice-call telephony, and Google Meet integration. Those use a different interface (RealtimeVoiceProviderPlugin) and are wired via api.registerRealtimeVoiceProvider(...) in extension index.ts files (e.g., extensions/openai/realtime-voice-provider.ts).

Related but not addressed by this PR:

#71195 — adds OpenAI Realtime to native macOS Talk via the provider-plugin surface.
#12911 — adds Grok voice to /chat web Talk via the provider-plugin surface (would benefit from a parallel extensions/xai/realtime-voice-provider.ts follow-up; happy to scope that separately if maintainers want it).

If maintainers prefer a unified shape across surfaces, I'm open to either (a) extracting a shared OpenAI-Realtime-protocol base used by both xai-realtime.ts here AND a future extensions/xai/realtime-voice-provider.ts, or (b) keeping the two surfaces independent as today.

What this PR does

- Adds provider: "xai" as a /voiceclaw/realtime option in the existing VoiceClawSessionConfigEvent.provider union.
- Adds src/gateway/voiceclaw-realtime/xai-realtime.ts implementing VoiceClawRealtimeAdapter against the OpenAI-Realtime wire protocol with xAI's documented deltas:
  - response.text.delta (xAI) instead of response.output_text.delta.
  - Single conversation.item.input_audio_transcription.completed user-transcription event; the adapter synthesizes OpenClaw's transcript.delta + transcript.done pair.
- Extends session.ts with provider-aware dispatch:
  - resolveProvider(config.provider): returns "xai" or "gemini" (back-compat default).
  - defaultVoiceFor(provider): "ara" for xAI, "Zephyr" for Gemini.
  - requiredApiKeyEnvFor(provider): XAI_API_KEY or GEMINI_API_KEY.
  - createDefaultAdapterFactory(): instantiates VoiceClawXaiRealtimeAdapter or VoiceClawGeminiLiveAdapter based on the resolved provider.
- Default voice for xAI is ara; all five xAI voices (eve, ara, rex, sal, leo) are supported with case-insensitive resolution and per-session override.
- Updates docs/gateway/index.md (provider option) and docs/providers/xai.md (realtime row in the feature-coverage table; cross-link from "Known limits").
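The provider-aware dispatch listed above can be pictured with a small sketch. The three helper names come from the PR description; their bodies, the `RealtimeProvider` type, and the `adapterNameFor` stand-in are assumptions made for illustration, and the PR does not say how values other than "xai" (for example "openai") are resolved, so this sketch simply treats them as the back-compat default.

```typescript
// Hypothetical sketch of provider-aware dispatch; the real session.ts may differ.
type RealtimeProvider = "xai" | "gemini";

// Anything other than "xai" (including undefined) falls back to Gemini.
// Assumption: the PR only states that "gemini" is the back-compat default.
function resolveProvider(requested: string | undefined): RealtimeProvider {
  return requested === "xai" ? "xai" : "gemini";
}

function defaultVoiceFor(provider: RealtimeProvider): string {
  return provider === "xai" ? "ara" : "Zephyr";
}

function requiredApiKeyEnvFor(provider: RealtimeProvider): string {
  return provider === "xai" ? "XAI_API_KEY" : "GEMINI_API_KEY";
}

// Stand-in for createDefaultAdapterFactory(): returns the adapter class name
// instead of constructing an adapter, so the sketch stays self-contained.
function adapterNameFor(provider: RealtimeProvider): string {
  return provider === "xai"
    ? "VoiceClawXaiRealtimeAdapter"
    : "VoiceClawGeminiLiveAdapter";
}
```

Keeping the three lookups as pure functions of the resolved provider means the factory only has to branch once, which is one way to leave the Gemini path untouched.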

What this PR does not do

Does not modify gemini-live.ts (Gemini behavior preserved byte-for-byte).
Does not modify extensions/voice-call/, extensions/openai/realtime-voice-provider.ts, or extensions/xai/ batch/streaming-STT files (separate surfaces).
Does not change gateway auth, sandboxing, or the WS control protocol.
Does not add any new HTTP endpoint.
Does not add OpenAI Realtime support to /voiceclaw/realtime (the type union already accepts "openai"; the adapter is intentionally a separate, smaller PR).
Does not add an xAI RealtimeVoiceProviderPlugin for the provider-plugin surface (surface B). Happy to follow up if maintainers want it.

Security

XAI_API_KEY is read from the gateway process environment at upstream-open time; never logged, never reflected in client errors, never persisted on the adapter instance.
Sanitization defends against Bearer ... token reflection and xai-... key reflection in upstream error messages.
Test asserts XAI_API_KEY value never appears in log lines or in JSON.stringify(adapter).
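A minimal sketch of the sanitization described above, assuming upstream error text is scrubbed with two regular expressions before it reaches logs or clients. The function name `sanitizeUpstreamError` and the exact character classes are hypothetical; only the `Bearer ...` and `xai-...` patterns and the `Bearer ***` output are taken from the PR text.

```typescript
// Hypothetical sanitizer; the PR's actual implementation is not shown in this page.
function sanitizeUpstreamError(message: string): string {
  return message
    // Mask any bearer token, whatever its format.
    .replace(/Bearer\s+[^\s"',;]+/gi, "Bearer ***")
    // Mask bare keys that use the xai- prefix (character class is an assumption).
    .replace(/\bxai-[A-Za-z0-9_-]+/g, "xai-***");
}
```

For example, `sanitizeUpstreamError("Invalid token: Bearer xai-supersecretvalue123")` yields `"Invalid token: Bearer ***"`, so the raw value never leaves the adapter.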

Tests

Mocked WebSocket only; no live xAI calls in CI; no XAI_API_KEY value required to run tests (a fake TEST_KEY_NOT_REAL is used in tests where adapter construction is exercised).
Coverage:
- Voice helpers: default = ara, all five voices accepted case-insensitively, unknown voice falls back to ara, isValidXaiVoice correctness.
- Audio passthrough: response.output_audio.delta and OpenAI-style response.audio.delta both forward as OpenClaw audio.delta.
- Transcript: xAI response.text.delta → OpenClaw assistant transcript.delta. Single user-transcription completed event → synthesized turn.started + transcript.delta + transcript.done.
- Barge-in: input_audio_buffer.speech_started mid-assistant-response → finalizes pending assistant text with ellipsis + emits turn.started (user role).
- Function calls: response.function_call_arguments.done → tool.call. Tool result returned via conversation.item.create (with function_call_output) + response.create.
- Audio frame forwarding: audio.append → input_audio_buffer.append.
- Turn end: response.done → flush transcripts + turn.ended.
- Usage metrics: response.usage translated to OpenClaw usage.metrics.
- Error sanitization: xAI error reflecting Bearer xai-supersecretvalue123 → message contains Bearer ***, never the raw value.
- Unknown event types ignored without throwing.
- session.update payload includes resolved voice (ara by default), modalities ["text", "audio"], pcm16 audio formats, server_vad turn detection.
- Function tools serialized into session.update when registered.
- Adapter constructible without XAI_API_KEY set (key is only required at openUpstream).
- Adapter factory dispatches by provider: "xai" → VoiceClawXaiRealtimeAdapter; "gemini" and undefined → VoiceClawGeminiLiveAdapter.
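The voice-helper behaviour in the coverage list (default ara, case-insensitive acceptance, fallback for unknown voices) could look roughly like the sketch below. `isValidXaiVoice` is named in the PR; `resolveXaiVoice`, the constants, and the decision to make `isValidXaiVoice` itself case-sensitive are assumptions.

```typescript
// Hypothetical voice helpers mirroring the behaviour the tests describe.
const XAI_VOICES = ["eve", "ara", "rex", "sal", "leo"] as const;
type XaiVoice = (typeof XAI_VOICES)[number];
const DEFAULT_XAI_VOICE: XaiVoice = "ara";

// Exact-match check; callers normalise case first (assumption).
function isValidXaiVoice(candidate: string): candidate is XaiVoice {
  return (XAI_VOICES as readonly string[]).indexOf(candidate) !== -1;
}

// Case-insensitive resolution with fallback to the default voice.
function resolveXaiVoice(requested: string | undefined): XaiVoice {
  const normalized = (requested ?? "").trim().toLowerCase();
  return isValidXaiVoice(normalized) ? normalized : DEFAULT_XAI_VOICE;
}
```

Note that this follows the PR's "unknown voice falls back to ara" behaviour rather than the issue's earlier idea of rejecting invalid voice IDs.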

Local gates run on this branch

pnpm exec vitest run src/gateway/voiceclaw-realtime/   →  120 tests passed (15 files)
pnpm exec oxlint src/gateway/voiceclaw-realtime/       →  0 warnings, 0 errors
pnpm tsgo:core                                          →  clean
pnpm tsgo:core:test                                     →  clean

No live calls, no openclaw onboard, no installs/upgrades beyond pnpm install --frozen-lockfile.

SAGE is not changed

This PR is OpenClaw upstream only. The downstream SAGE project mentioned in #73019 will integrate against this provider once it ships; no SAGE code is included here.

Design notes for review

I'd appreciate maintainer input on the Questions for maintainers section in #73019, particularly:

Single-shared-base vs separate-adapters preference.
Ship order — happy to bundle OpenAI Realtime on /voiceclaw/realtime in this PR if that's preferred, or keep PR small as proposed.
Plugin-SDK extension vs core gateway code (the gemini-live.ts precedent suggests core; happy to move).
Whether to also follow up with an extensions/xai/realtime-voice-provider.ts for the surface-B plugin path (would close #12911).

The adapter avoids modifying gemini-live.ts to keep the Gemini path byte-stable. If maintainers prefer a shared base in this PR, I can refactor.

🤖 Generated with Claude Code

Changed files

docs/gateway/index.md (modified, +26/-8)
docs/providers/xai.md (modified, +11/-4)
src/gateway/voiceclaw-realtime/session.test.ts (modified, +44/-1)
src/gateway/voiceclaw-realtime/session.ts (modified, +52/-8)
src/gateway/voiceclaw-realtime/types.ts (modified, +1/-1)
src/gateway/voiceclaw-realtime/xai-realtime.test.ts (added, +623/-0)
src/gateway/voiceclaw-realtime/xai-realtime.ts (added, +780/-0)


Summary

Motivation

xAI Realtime is GA with grok-voice-think-fast-1.0 as the flagship "think fast" voice model. The Voice Agent API is documented at https://docs.x.ai/developers/model-capabilities/audio/voice-agent and uses the OpenAI-Realtime protocol with minor wire deltas.
OpenClaw already has the right surface — /voiceclaw/realtime (added in #70938), the VoiceClawRealtimeAdapter interface, and the provider field in VoiceClawSessionConfigEvent types — but the only wired adapter today is gemini-live.ts.
Shared work with #71195 — issue #71195 requests OpenAI Realtime in /voiceclaw/realtime. Because xAI's API is OpenAI-Realtime-protocol-compatible, a shared OpenAIRealtimeProtocolAdapter base class would address both providers with one body of code, parameterized by URL / model defaults / voices / a small event-name override table.
Community signal — issue #12911 also requests Grok voice integration (in the web chat / Live Voice Mode). This proposal addresses the gateway-level capability that would unblock that and similar use cases.
Real adopter — a downstream private-AI-operator project is targeting /voiceclaw/realtime with provider: "xai" as its primary realtime voice route, preserving the credential boundary by keeping the xAI key inside OpenClaw's gateway environment. Downstream adopters are interested in driving this if maintainers are open to the design.

If maintainers prefer a smaller first PR, the initial implementation can target xAI only, with the shared adapter structured so OpenAI Realtime support can follow without duplicating protocol code.

Proposed design

New files

- src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts: shared OpenAIRealtimeProtocolAdapter base class implementing VoiceClawRealtimeAdapter against the OpenAI-Realtime wire protocol. Parameterized by:
  - Default WebSocket base URL
  - Default model ID
  - Available voices
  - Event-name override table (xAI's response.text.delta vs OpenAI's response.output_text.delta; xAI's user-transcription event shape vs OpenAI's split events)
- src/gateway/voiceclaw-realtime/xai-realtime.ts: VoiceClawXaiRealtimeAdapter extends the shared base with xAI-specific defaults:
  - Base URL: wss://api.x.ai/v1/realtime?model=<model-id>
  - Default model: grok-voice-think-fast-1.0
  - Available voices: eve, ara, rex, sal, leo (all five supported; per-session override via session.config.voice)
  - Event-name overrides per the xAI docs
Modified files

src/gateway/voiceclaw-realtime/types.ts — add "xai" to VoiceClawSessionConfigEvent.provider union (one-line change).
src/gateway/voiceclaw-realtime/session.ts — extend the adapter factory so provider === "xai" instantiates VoiceClawXaiRealtimeAdapter.
docs/gateway/index.md — document xAI as a /voiceclaw/realtime brain provider; required env var XAI_API_KEY.
docs/providers/xai.md — note realtime full-duplex support; cross-link to gateway index.

Configuration shape

Operator config:

{
  "voiceclaw-realtime": {
    "default_provider": "xai",
    "xai": {
      "model": "grok-voice-think-fast-1.0",
      "voice": "ara",
      "audio_format": "audio/pcm",
      "sample_rate_hz": 24000,
      "vad": {
        "type": "server_vad",
        "threshold": 0.5,
        "silence_duration_ms": 500
      }
    }
  }
}

Plus env: XAI_API_KEY set in the gateway process environment. Optional follow-up: if maintainers prefer file-based secret loading, support an XAI_API_KEY_FILE pattern.

Client session.config first message (per-session voice override is supported — the example shows ara but any of the five xAI voices is valid):

{
  "type": "session.config",
  "provider": "xai",
  "model": "grok-voice-think-fast-1.0",
  "voice": "ara",
  "instructionsOverride": "...",
  "conversationHistory": [...],
  "watchdog": "enabled"
}

The adapter falls back to the operator-config default voice when session.config.voice is omitted, and falls back to a built-in adapter default (ara) when no operator config is present.
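The fallback order in the paragraph above (session voice, then operator config, then the built-in default) can be written as a short precedence chain. The `OperatorXaiConfig` type and `pickSessionVoice` name are hypothetical, and the sketch assumes an omitted value is represented as `undefined`.

```typescript
// Hypothetical precedence helper; only the ordering is taken from the proposal.
interface OperatorXaiConfig {
  voice?: string;
}

const BUILT_IN_XAI_VOICE = "ara";

function pickSessionVoice(
  sessionVoice: string | undefined,
  operatorConfig: OperatorXaiConfig | undefined,
): string {
  // 1. per-session override, 2. operator-config default, 3. built-in default.
  return sessionVoice ?? operatorConfig?.voice ?? BUILT_IN_XAI_VOICE;
}
```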

Endpoint and auth — unchanged

Path stays /voiceclaw/realtime.
Gateway auth (Bearer token in first session.config message) stays as-is.
No new HTTP endpoint added.
No change to gemini-live.ts, extensions/voice-call/, extensions/openai/realtime-voice-provider.ts, or extensions/xai/ batch/streaming-STT files.

Event mapping

OpenClaw → xAI (client → server)

| OpenClaw event | xAI event | Notes |
| --- | --- | --- |
| `session.config` | `session.update` | Translate fields: voice, model, instructions (from `instructionsOverride`), tools (synthesized from OpenClaw tool catalog), audio format, `turn_detection` |
| `audio.append` `{data: <base64 PCM16>}` | `input_audio_buffer.append` `{audio: <base64 PCM16>}` | 1:1 audio frame passthrough; xAI default 24 kHz mono PCM16 |
| `audio.commit` | `input_audio_buffer.commit` | Used only when server VAD disabled |
| `frame.append` (image/video) | n/a | xAI Voice Agent is audio-only at GA; drop with logged warning |
| `response.create` | `response.create` | 1:1 |
| `response.cancel` | (defensive cancellation handling) | xAI does not document a cancel event; best-effort impl, flagged as needs-vendor-verification |
| `tool.result` `{callId, output}` | `conversation.item.create` `{type: "function_call_output", call_id, output}` + `response.create` | Two-event sequence to resume generation after tool output |
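As a rough illustration of the client-to-server mapping, the sketch below translates a few OpenClaw events into upstream messages. The type names and `toUpstreamMessages` are hypothetical, `output` is assumed to be a string, and nesting the function-call output under an `item` field follows the OpenAI Realtime convention, which is an assumption here because the table only shows the shorthand.

```typescript
// Hypothetical subset of OpenClaw client events covered by this sketch.
type GatewayClientEvent =
  | { type: "audio.append"; data: string }
  | { type: "audio.commit" }
  | { type: "response.create" }
  | { type: "tool.result"; callId: string; output: string };

type UpstreamMessage = Record<string, unknown>;

// Returns the upstream messages to send, in order, for one client event.
function toUpstreamMessages(ev: GatewayClientEvent): UpstreamMessage[] {
  switch (ev.type) {
    case "audio.append":
      // 1:1 passthrough; only the field name changes (data -> audio).
      return [{ type: "input_audio_buffer.append", audio: ev.data }];
    case "audio.commit":
      return [{ type: "input_audio_buffer.commit" }];
    case "response.create":
      return [{ type: "response.create" }];
    case "tool.result":
      // Two-event sequence: hand back the tool output, then resume generation.
      return [
        {
          type: "conversation.item.create",
          item: { type: "function_call_output", call_id: ev.callId, output: ev.output },
        },
        { type: "response.create" },
      ];
  }
}
```

Returning an array keeps the one-to-many `tool.result` case uniform with the one-to-one cases, so the caller can always send the messages in order.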

xAI → OpenClaw (server → client)

| xAI event | OpenClaw event | Notes |
| --- | --- | --- |
| `session.created` / `session.updated` | `session.ready` `{sessionId}` | Emit once on first `session.created` |
| `response.output_audio.delta` | `audio.delta` | 1:1 audio output passthrough |
| `response.text.delta` (xAI rename from OpenAI's `response.output_text.delta`) | `transcript.delta` `{role: "assistant"}` | Shared base adapter handles both via configurable event-name table |
| `conversation.item.input_audio_transcription.completed` (xAI user transcription) | `transcript.delta` + `transcript.done` `{role: "user"}` | xAI appears to emit a single `completed` user-transcription event rather than separate delta/final transcript events. The adapter can synthesize OpenClaw's `transcript.delta` + `transcript.done` pair from that `completed` event. |
| `response.function_call_arguments.done` | `tool.call` `{callId, name, args}` | Adapter may send an immediate placeholder/tool-status response if supported by xAI / OpenAI-Realtime semantics; otherwise it should wait for the actual `tool.result` before sending `function_call_output` + `response.create`. |
| `response.done` | `turn.ended` | After assistant turn completes |
| `input_audio_buffer.speech_started` | `turn.started` (role=user) | Supports barge-in display in client UI |
| `error` | `error` | 1:1 with sanitization; never reflect API key |
| `rate_limits.updated` | `usage.metrics` | Translate to OpenClaw shape |
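The transcript synthesis row can be sketched as a function that turns the single `completed` event into the delta/done pair. The `transcript` field on the upstream event mirrors the OpenAI Realtime shape and is assumed to hold for xAI; the `text` field and the type names on the OpenClaw side are hypothetical.

```typescript
// Assumed upstream shape (mirrors OpenAI Realtime; not confirmed for xAI here).
interface UserTranscriptionCompleted {
  type: "conversation.item.input_audio_transcription.completed";
  transcript: string;
}

// Hypothetical OpenClaw-side transcript events.
type UserTranscriptEvent =
  | { type: "transcript.delta"; role: "user"; text: string }
  | { type: "transcript.done"; role: "user"; text: string };

// One upstream event becomes a delta carrying the full text, then a done marker.
function synthesizeUserTranscript(ev: UserTranscriptionCompleted): UserTranscriptEvent[] {
  return [
    { type: "transcript.delta", role: "user", text: ev.transcript },
    { type: "transcript.done", role: "user", text: ev.transcript },
  ];
}
```

Emitting both events keeps clients that only listen for deltas, or only for the final transcript, working without provider-specific branches.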

Audio format

Default in: audio/pcm Linear16 little-endian, 24 kHz mono (matches xAI default).
Default out: same.
Other formats supported by xAI (pcmu, pcma, lower sample rates): pass-through configurable; not used by default.
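For the default format above, frame duration follows directly from the byte count: 16-bit mono audio at 24 kHz is 48,000 bytes per second. The helper below is a small sketch of that arithmetic for a base64-encoded frame; `pcm16DurationMs` is a hypothetical name and the input is assumed to be standard padded base64.

```typescript
// Duration in milliseconds of a base64-encoded PCM16 frame.
// Assumes standard padded base64 and 2 bytes per sample per channel.
function pcm16DurationMs(
  base64Audio: string,
  sampleRateHz: number = 24000,
  channels: number = 1,
): number {
  const len = base64Audio.length;
  if (len === 0) return 0;
  let padding = 0;
  if (base64Audio.charAt(len - 1) === "=") padding++;
  if (len > 1 && base64Audio.charAt(len - 2) === "=") padding++;
  // Every 4 base64 characters decode to 3 bytes, minus padding.
  const byteLength = (len / 4) * 3 - padding;
  const samplesPerChannel = byteLength / (2 * channels);
  return (samplesPerChannel * 1000) / sampleRateHz;
}
```

A 640-character base64 frame decodes to 480 bytes, which is 240 samples, or 10 ms at 24 kHz.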

VAD / barge-in

Server-side VAD (turn_detection: {type: "server_vad", threshold, silence_duration_ms, prefix_padding_ms}) is the recommended default. xAI handles speech detection, end-of-turn, and barge-in. On barge-in mid-response, the adapter emits turn.ended (assistant role, with interrupted: true) before the next turn.started (user role).
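The barge-in ordering described above (an interrupted `turn.ended` for the assistant before the user's `turn.started`) can be sketched as a small state tracker. `BargeInTracker` and its method names are hypothetical; only the emitted event names, roles, and the `interrupted` flag come from the proposal.

```typescript
type TurnEvent =
  | { type: "turn.ended"; role: "assistant"; interrupted: boolean }
  | { type: "turn.started"; role: "user" };

// Hypothetical tracker: remembers whether the assistant is mid-response.
class BargeInTracker {
  private assistantSpeaking = false;

  // Call when any assistant audio or text delta arrives.
  onAssistantOutput(): void {
    this.assistantSpeaking = true;
  }

  // Call on response.done: a normal, uninterrupted end of turn.
  onResponseDone(): TurnEvent[] {
    if (!this.assistantSpeaking) return [];
    this.assistantSpeaking = false;
    return [{ type: "turn.ended", role: "assistant", interrupted: false }];
  }

  // Call on input_audio_buffer.speech_started.
  onSpeechStarted(): TurnEvent[] {
    const out: TurnEvent[] = [];
    if (this.assistantSpeaking) {
      // Barge-in: close the assistant turn first, flagged as interrupted.
      out.push({ type: "turn.ended", role: "assistant", interrupted: true });
      this.assistantSpeaking = false;
    }
    out.push({ type: "turn.started", role: "user" });
    return out;
  }
}
```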

Reconnect / resume

xAI does not document explicit resume tokens. Defensive implementation:

On WS close (codes 1001/1006/1007/1011/1012/1013): auto-reconnect up to 2 attempts with 500 ms backoff, mirroring the gemini-live.ts pattern.
On reconnect: open new WS, send fresh session.update, re-prime conversation history via conversation.item.create per historical message.
On 120-min hard cap (xAI's documented session limit): emit session.rotating → close upstream → open new upstream → emit session.rotated. Operator UX continues seamlessly.
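The retry rule in the first step can be expressed as a pure decision function. This is a sketch under stated assumptions: `nextReconnectDelayMs` is a hypothetical name, the backoff is treated as a fixed 500 ms rather than a growing delay, and the actual timer and socket handling are left to the caller.

```typescript
// Close codes the proposal lists as worth retrying.
const RETRYABLE_CLOSE_CODES: readonly number[] = [1001, 1006, 1007, 1011, 1012, 1013];
const MAX_RECONNECT_ATTEMPTS = 2;
const RECONNECT_BACKOFF_MS = 500; // assumed fixed, not exponential

// Returns the delay before the next attempt, or null when the adapter should give up.
function nextReconnectDelayMs(closeCode: number, attemptsSoFar: number): number | null {
  if (RETRYABLE_CLOSE_CODES.indexOf(closeCode) === -1) return null;
  if (attemptsSoFar >= MAX_RECONNECT_ATTEMPTS) return null;
  return RECONNECT_BACKOFF_MS;
}
```

Separating the decision from the timer makes the policy easy to cover with mocked-WebSocket tests, for example a forced close with code 1006.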

Open question for maintainers: does OpenClaw have a preferred reconnect/resume contract for /voiceclaw/realtime adapters, or should each adapter handle it as gemini-live.ts does?

Tools / function calling

OpenClaw tool catalog is exposed to xAI as function tools in session.update. On response.function_call_arguments.done:

Adapter emits tool.call to client with callId.
Adapter may send an immediate placeholder/tool-status response if supported by xAI / OpenAI-Realtime semantics; otherwise it should wait for the actual tool.result before sending function_call_output + response.create.
Client (the OpenClaw caller) executes the tool with policy/approval gates and returns via tool.result.
Adapter sends conversation.item.create with function_call_output to xAI.
Adapter sends response.create to resume generation.

Parallel function calls (xAI may emit multiple done events before audio) are handled by tracking all callIds and resolving all before response.create.
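Tracking parallel calls can be as simple as a set of outstanding call IDs that signals when the last one resolves. The `PendingToolCalls` class below is a hypothetical sketch of that bookkeeping; it does not send anything itself, it only tells the caller when `response.create` is due.

```typescript
// Hypothetical tracker for tool calls awaiting a tool.result.
class PendingToolCalls {
  private pending: Record<string, true> = {};

  // Record a call announced by response.function_call_arguments.done.
  register(callId: string): void {
    this.pending[callId] = true;
  }

  // Mark a call resolved. Returns true only when it was the last outstanding
  // call, meaning the adapter may now send response.create.
  resolve(callId: string): boolean {
    if (!Object.prototype.hasOwnProperty.call(this.pending, callId)) return false;
    delete this.pending[callId];
    return Object.keys(this.pending).length === 0;
  }
}
```

With two registered calls, the first `resolve` returns false and the second returns true, so generation resumes exactly once.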

Files likely touched

src/gateway/voiceclaw-realtime/types.ts — add "xai" to provider union
src/gateway/voiceclaw-realtime/session.ts — extend adapter factory
src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts — NEW shared base
src/gateway/voiceclaw-realtime/xai-realtime.ts — NEW xAI specialization
src/gateway/voiceclaw-realtime/openai-realtime-protocol.test.ts — NEW shared protocol tests
src/gateway/voiceclaw-realtime/xai-realtime.test.ts — NEW xAI-specific tests
src/gateway/voiceclaw-realtime/session.test.ts — add "xai" provider-selection test
docs/gateway/index.md — document xAI brain provider
docs/providers/xai.md — note realtime support

Estimated code volume: shared base ~500-800 lines; xAI specialization ~150-250 lines; types/factory updates ~30-50 lines; tests ~400-600 lines; docs ~100-200 lines.

Hard non-changes: no modification to gemini-live.ts, extensions/voice-call/, extensions/openai/realtime-voice-provider.ts, or extensions/xai/ batch/streaming-STT files. No change to gateway auth, sandboxing, or the WS control protocol. No new HTTP endpoints.

Tests

Mocked WebSocket only. No live xAI in CI. No XAI_API_KEY required to run tests.
Audio passthrough — base64 PCM16 frames pass through unmodified.
Transcript synthesis — verify adapter emits both transcript.delta AND transcript.done from xAI's single user-transcription completed event.
Function-call flow — verify tool.call emission, tool.result acceptance, conversation.item.create + response.create forwarding; parallel calls handled correctly.
Barge-in — inject input_audio_buffer.speech_started mid-response; verify turn.ended (interrupted) + turn.started (user).
Reconnect — force WS close 1006; verify retry with conversation history re-priming.
Session rotation — simulate 120-min cap; verify session.rotating / session.rotated.
Error sanitization — xAI returns error reflecting key; verify sanitized error never contains key.
Provider selection — config.provider === "xai" returns VoiceClawXaiRealtimeAdapter from session.ts factory.
Voice selection — verify session.config.voice overrides operator-config default; verify all five xAI voices (eve, ara, rex, sal, leo) are accepted; verify invalid voice IDs are rejected with sanitized error.
Secret never logged — inject mock logger; assert XAI_API_KEY value never appears in any log line.

Optional --live smoke against real xAI WSS would be operator-only / opt-in via explicit env flag, never run in CI.

Security

XAI_API_KEY stays inside the OpenClaw gateway process environment.
Never logged in any form (value, length, prefix, hash).
Never reflected in errors returned to clients.
No client-side xAI key — clients send the gateway Bearer token only, exactly as today.
No new public endpoint added.
Existing gateway auth (Bearer in first session.config message; per docs/gateway/index.md) remains unchanged.

Questions for maintainers

Surface choice — is /voiceclaw/realtime the right surface for xAI Realtime, or do maintainers prefer this go through a different path (e.g., the voice-call plugin's realtime providers, a new plugin-SDK extension)?
Adapter shape — would maintainers prefer a single shared OpenAIRealtimeProtocolAdapter base (parameterized for both xAI and OpenAI) or two provider-specific adapters with duplicated protocol logic?
Ship order — should xAI and OpenAI Realtime (issue #71195) ship in one combined PR, or xAI first as a smaller PR followed by OpenAI as a second PR? Either is workable; smaller PR is easier to review.
Code organization — should this be core gateway code (in src/gateway/voiceclaw-realtime/) or extracted via the plugin SDK? The gemini-live.ts precedent suggests core gateway code, but happy to follow maintainer preference.
Test/mocking style — any preferred WebSocket mocking pattern for voiceclaw-realtime adapters? gemini-live.test.ts is the natural reference; would happily follow the same style.
Roadmap overlap — does this overlap with planned work tied to #71195 or #12911? If maintainers are already in flight on #71195, we'd be glad to coordinate or contribute to that effort instead of opening a parallel PR.
Reconnect contract — does OpenClaw have a preferred reconnect/resume contract for /voiceclaw/realtime adapters, or is per-adapter best-effort (like gemini-live.ts) the expected pattern?

References

xAI Voice Agent docs: https://docs.x.ai/developers/model-capabilities/audio/voice-agent
xAI Voice Agent API model/pricing: https://docs.x.ai/developers/models/voice-agent-api
xAI release notes: https://docs.x.ai/developers/release-notes
xAI flagship voice model announcement: https://x.ai/news/grok-voice-think-fast-1
xAI Voice Agent API launch: https://x.ai/news/grok-voice-agent-api
xAI cookbook (voice examples): https://github.com/xai-org/xai-cookbook/tree/main/voice-examples
OpenClaw /voiceclaw/realtime types: src/gateway/voiceclaw-realtime/types.ts
OpenClaw /voiceclaw/realtime session: src/gateway/voiceclaw-realtime/session.ts
OpenClaw Gemini adapter (reference shape): src/gateway/voiceclaw-realtime/gemini-live.ts
OpenClaw gateway docs: https://docs.openclaw.ai/gateway
OpenClaw xAI provider: https://docs.openclaw.ai/providers/xai
OpenClaw issue #71195 — OpenAI Realtime in Talk Mode / /voiceclaw/realtime
OpenClaw issue #12911 — Live Voice Mode in /chat UI (Grok Voice Loop)
OpenClaw PR #70938 — origin of /voiceclaw/realtime

What we're requesting

Maintainer feedback on the design before opening a draft PR. Downstream adopters are prepared to drive the implementation but want to align on surface, adapter shape, and ship order before writing code so we don't waste review cycles.

Filed by an OpenClaw downstream adopter. Happy to attend a sync, drop in on a community call, or coordinate with anyone already working on #71195. Please flag any roadmap or design context we may be missing.

extent analysis

TL;DR

Implement a shared OpenAIRealtimeProtocolAdapter base class to support both xAI and OpenAI Realtime protocols in the OpenClaw /voiceclaw/realtime gateway surface.

Guidance

Create a new file src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts for the shared OpenAIRealtimeProtocolAdapter base class.
Extend the adapter factory in src/gateway/voiceclaw-realtime/session.ts to instantiate VoiceClawXaiRealtimeAdapter when provider === "xai".
Add xAI-specific configuration to src/gateway/voiceclaw-realtime/types.ts and docs/gateway/index.md.
Implement event mapping between OpenClaw and xAI events, including audio passthrough, transcript synthesis, and function-call flow.
Write tests for the new adapter, including mocked WebSocket tests and audio passthrough verification.

Example

// src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts
export class OpenAIRealtimeProtocolAdapter {
  // ...
}

// src/gateway/voiceclaw-realtime/xai-realtime.ts
export class VoiceClawXaiRealtimeAdapter extends OpenAIRealtimeProtocolAdapter {
  // xAI-specific defaults and event-name overrides
}

Notes

The implementation should follow the proposed design and event mapping outlined in the issue body.
The shared adapter base class should be parameterized to support both xAI and OpenAI Realtime protocols.
The xAI-specific adapter should extend the shared base class and provide xAI-specific defaults and event-name overrides.

Recommendation

Apply the proposed design and implementation to support xAI Realtime in the OpenClaw /voiceclaw/realtime gateway surface, with the option to add OpenAI
