openclaw - ✅(Solved) Fix Proposal: Add xAI Realtime Voice Agent support to /voiceclaw/realtime via shared OpenAI-Realtime-protocol adapter [1 pull request, 1 participant]

GitHub stats: openclaw/openclaw#73019, fetched 2026-04-28 06:28:32
Comments: 0 · Participants: 1 · Timeline: 2 (cross-referenced ×1, referenced ×1) · Reactions: 0

Add support for the xAI Realtime Voice Agent (grok-voice-think-fast-1.0) to OpenClaw's /voiceclaw/realtime gateway surface, implemented as a shared OpenAIRealtimeProtocolAdapter that can also serve OpenAI Realtime (issue #71195) when added.

xAI's Voice Agent API is OpenAI-Realtime-protocol-compatible (verbatim from xAI docs: "compatible with the OpenAI Realtime API. Most OpenAI client libraries and SDKs work with the xAI endpoint by changing the base URL"), so a single protocol adapter can serve both providers via parameterization.

Fix Action

Fixed

PR fix notes

PR #73032: feat(voiceclaw-realtime): add xAI Realtime Voice Agent provider (closes #73019)

Description (problem / solution / changelog)

Tracks #73019.

This is a draft PR opening the design proposed in #73019 for review. Adds support for xAI's Voice Agent API (grok-voice-think-fast-1.0) as a /voiceclaw/realtime provider — the same surface that ships gemini-live.ts today. xAI's Realtime API is documented as OpenAI-Realtime-protocol-compatible (xAI docs); this PR ships the xAI specialization first.

Surface scope

This PR targets /voiceclaw/realtime (gateway endpoint, VoiceClawRealtimeAdapter interface) — the surface for external WebSocket clients. It does not target the separate talk-realtime-relay.ts / src/realtime-voice/ provider-plugin surface used by OpenClaw's browser Talk UI, macOS Talk app, voice-call telephony, and Google Meet integration. Those use a different interface (RealtimeVoiceProviderPlugin) and are wired via api.registerRealtimeVoiceProvider(...) in extension index.ts files (e.g., extensions/openai/realtime-voice-provider.ts).

Related but not addressed by this PR:

  • #71195 — adds OpenAI Realtime to native macOS Talk via the provider-plugin surface.
  • #12911 — adds Grok voice to /chat web Talk via the provider-plugin surface (would benefit from a parallel extensions/xai/realtime-voice-provider.ts follow-up; happy to scope that separately if maintainers want it).

If maintainers prefer a unified shape across surfaces, I'm open to either (a) extracting a shared OpenAI-Realtime-protocol base used by both xai-realtime.ts here AND a future extensions/xai/realtime-voice-provider.ts, or (b) keeping the two surfaces independent as today.

What this PR does

  • Adds provider: "xai" as a /voiceclaw/realtime option in the existing VoiceClawSessionConfigEvent.provider union.
  • Adds src/gateway/voiceclaw-realtime/xai-realtime.ts implementing VoiceClawRealtimeAdapter against the OpenAI-Realtime wire protocol with xAI's documented deltas:
    • response.text.delta (xAI) instead of response.output_text.delta.
    • Single conversation.item.input_audio_transcription.completed user-transcription event; the adapter synthesizes OpenClaw's transcript.delta + transcript.done pair.
  • Extends session.ts with provider-aware dispatch:
    • resolveProvider(config.provider) — returns "xai" or "gemini" (back-compat default).
    • defaultVoiceFor(provider) → "ara" for xAI, "Zephyr" for Gemini.
    • requiredApiKeyEnvFor(provider) → XAI_API_KEY or GEMINI_API_KEY.
    • createDefaultAdapterFactory() — instantiates VoiceClawXaiRealtimeAdapter or VoiceClawGeminiLiveAdapter based on the resolved provider.
  • Default voice for xAI is ara; all five xAI voices (eve, ara, rex, sal, leo) are supported with case-insensitive resolution and per-session override.
  • Updates docs/gateway/index.md (provider option) and docs/providers/xai.md (realtime row in the feature-coverage table; cross-link from "Known limits").
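
The provider-aware dispatch described above can be sketched as follows. The helper names come from the PR description; the bodies are assumptions written for illustration and are not taken from the actual session.ts. The environment is passed in as a plain object (a hypothetical choice) so the sketch stays free of Node-specific globals.

```typescript
// Sketch only: helper names are from the PR text, bodies are assumed.
type VoiceClawProvider = "xai" | "gemini";

// Anything other than "xai" resolves to Gemini, the back-compat default.
function resolveProvider(provider?: string): VoiceClawProvider {
  return provider === "xai" ? "xai" : "gemini";
}

function defaultVoiceFor(provider: VoiceClawProvider): string {
  return provider === "xai" ? "ara" : "Zephyr";
}

function requiredApiKeyEnvFor(provider: VoiceClawProvider): string {
  return provider === "xai" ? "XAI_API_KEY" : "GEMINI_API_KEY";
}

// Hypothetical helper: reads the key only when the upstream is opened,
// so an adapter can be constructed without the key being set.
function readApiKeyAtOpen(
  provider: VoiceClawProvider,
  env: Record<string, string | undefined>,
): string {
  const name = requiredApiKeyEnvFor(provider);
  const value = env[name];
  if (!value) {
    // The error names the variable but never includes a key value.
    throw new Error("Missing required environment variable " + name);
  }
  return value;
}
```

Keeping the key lookup out of the constructor matches the test item "Adapter constructible without XAI_API_KEY set", though how the real adapter achieves that is not shown in the PR text.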

What this PR does not do

  • Does not modify gemini-live.ts (Gemini behavior preserved byte-for-byte).
  • Does not modify extensions/voice-call/, extensions/openai/realtime-voice-provider.ts, or extensions/xai/ batch/streaming-STT files (separate surfaces).
  • Does not change gateway auth, sandboxing, or the WS control protocol.
  • Does not add any new HTTP endpoint.
  • Does not add OpenAI Realtime support to /voiceclaw/realtime (the type union already accepts "openai"; the adapter is intentionally a separate, smaller PR).
  • Does not add an xAI RealtimeVoiceProviderPlugin for the provider-plugin surface (surface B). Happy to follow up if maintainers want it.

Security

  • XAI_API_KEY is read from the gateway process environment at upstream-open time; never logged, never reflected in client errors, never persisted on the adapter instance.
  • Sanitization defends against Bearer ... token reflection and xai-... key reflection in upstream error messages.
  • Test asserts XAI_API_KEY value never appears in log lines or in JSON.stringify(adapter).
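
A minimal sketch of the sanitization idea, assuming a hypothetical function name and hypothetical redaction patterns. The PR states only that Bearer tokens and xai-prefixed keys are scrubbed from upstream error messages; the exact rules in xai-realtime.ts may differ.

```typescript
// Hypothetical sanitizer: the name and the regular expressions are assumptions.
function sanitizeUpstreamError(message: string, apiKey?: string): string {
  let out = message;
  // Redact the exact key first, when it is known at call time.
  if (apiKey && apiKey.length > 0) {
    out = out.split(apiKey).join("***");
  }
  // Redact any Bearer credential regardless of its value.
  out = out.replace(/Bearer\s+[^\s"',;]+/gi, "Bearer ***");
  // Redact anything that still looks like an xAI key (assumed "xai-" prefix).
  out = out.replace(/\bxai-[A-Za-z0-9_-]+/g, "***");
  return out;
}
```

Applying the literal-key replacement before the pattern-based passes means a key with an unexpected shape is still removed when the gateway knows its value.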

Tests

  • Mocked WebSocket only; no live xAI calls in CI; no XAI_API_KEY value required to run tests (a fake TEST_KEY_NOT_REAL is used in tests where adapter construction is exercised).
  • Coverage:
    • Voice helpers: default = ara, all five voices accepted case-insensitively, unknown voice falls back to ara, isValidXaiVoice correctness.
    • Audio passthrough: response.output_audio.delta and OpenAI-style response.audio.delta both forward as OpenClaw audio.delta.
    • Transcript: xAI response.text.delta → OpenClaw assistant transcript.delta. Single user-transcription completed event → synthesized turn.started + transcript.delta + transcript.done.
    • Barge-in: input_audio_buffer.speech_started mid-assistant-response → finalizes pending assistant text with ellipsis + emits turn.started (user role).
    • Function calls: response.function_call_arguments.done → tool.call. Tool result returned via conversation.item.create (with function_call_output) + response.create.
    • Audio frame forwarding: audio.append → input_audio_buffer.append.
    • Turn end: response.done → flush transcripts + turn.ended.
    • Usage metrics: response.usage translated to OpenClaw usage.metrics.
    • Error sanitization: xAI error reflecting Bearer xai-supersecretvalue123 → message contains Bearer ***, never the raw value.
    • Unknown event types ignored without throwing.
    • session.update payload includes resolved voice (ara by default), modalities ["text", "audio"], pcm16 audio formats, server_vad turn detection.
    • Function tools serialized into session.update when registered.
    • Adapter constructible without XAI_API_KEY set (key is only required at openUpstream).
    • Adapter factory dispatches by provider: "xai" → VoiceClawXaiRealtimeAdapter; "gemini" and undefined → VoiceClawGeminiLiveAdapter.

Local gates run on this branch

pnpm exec vitest run src/gateway/voiceclaw-realtime/   →  120 tests passed (15 files)
pnpm exec oxlint src/gateway/voiceclaw-realtime/       →  0 warnings, 0 errors
pnpm tsgo:core                                          →  clean
pnpm tsgo:core:test                                     →  clean

No live calls, no openclaw onboard, no installs/upgrades beyond pnpm install --frozen-lockfile.

SAGE is not changed

This PR is OpenClaw upstream only. The downstream SAGE project mentioned in #73019 will integrate against this provider once it ships; no SAGE code is included here.

Design notes for review

I'd appreciate maintainer input on the Questions for maintainers section in #73019, particularly:

  • Single-shared-base vs separate-adapters preference.
  • Ship order — happy to bundle OpenAI Realtime on /voiceclaw/realtime in this PR if that's preferred, or keep PR small as proposed.
  • Plugin-SDK extension vs core gateway code (the gemini-live.ts precedent suggests core; happy to move).
  • Whether to also follow up with an extensions/xai/realtime-voice-provider.ts for the surface-B plugin path (would close #12911).

The adapter avoids modifying gemini-live.ts to keep the Gemini path byte-stable. If maintainers prefer a shared base in this PR, I can refactor.

🤖 Generated with Claude Code

Changed files

  • docs/gateway/index.md (modified, +26/-8)
  • docs/providers/xai.md (modified, +11/-4)
  • src/gateway/voiceclaw-realtime/session.test.ts (modified, +44/-1)
  • src/gateway/voiceclaw-realtime/session.ts (modified, +52/-8)
  • src/gateway/voiceclaw-realtime/types.ts (modified, +1/-1)
  • src/gateway/voiceclaw-realtime/xai-realtime.test.ts (added, +623/-0)
  • src/gateway/voiceclaw-realtime/xai-realtime.ts (added, +780/-0)


Motivation

  • xAI Realtime is GA with grok-voice-think-fast-1.0 as the flagship "think fast" voice model. The Voice Agent API is documented at https://docs.x.ai/developers/model-capabilities/audio/voice-agent and uses the OpenAI-Realtime protocol with minor wire deltas.
  • OpenClaw already has the right surface: /voiceclaw/realtime (added in #70938), the VoiceClawRealtimeAdapter interface, and the provider field in VoiceClawSessionConfigEvent types. The only adapter wired today, however, is gemini-live.ts.
  • Shared work with #71195 — issue #71195 requests OpenAI Realtime in /voiceclaw/realtime. Because xAI's API is OpenAI-Realtime-protocol-compatible, a shared OpenAIRealtimeProtocolAdapter base class would address both providers with one body of code, parameterized by URL / model defaults / voices / a small event-name override table.
  • Community signal — issue #12911 also requests Grok voice integration (in the web chat / Live Voice Mode). This proposal addresses the gateway-level capability that would unblock that and similar use cases.
  • Real adopter — a downstream private-AI-operator project is targeting /voiceclaw/realtime with provider: "xai" as its primary realtime voice route, preserving the credential boundary by keeping the xAI key inside OpenClaw's gateway environment. Downstream adopters are interested in driving this if maintainers are open to the design.

If maintainers prefer a smaller first PR, the initial implementation can target xAI only, with the shared adapter structured so OpenAI Realtime support can follow without duplicating protocol code.

Proposed design

New files

  • src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts — shared OpenAIRealtimeProtocolAdapter base class implementing VoiceClawRealtimeAdapter against the OpenAI-Realtime wire protocol. Parameterized by:
    • Default WebSocket base URL
    • Default model ID
    • Available voices
    • Event-name override table (xAI's response.text.delta vs OpenAI's response.output_text.delta; xAI's user-transcription event shape vs OpenAI's split events)
  • src/gateway/voiceclaw-realtime/xai-realtime.ts: VoiceClawXaiRealtimeAdapter extends the shared base with xAI-specific defaults:
    • Base URL: wss://api.x.ai/v1/realtime?model=<model-id>
    • Default model: grok-voice-think-fast-1.0
    • Available voices: eve, ara, rex, sal, leo (all five supported; per-session override via session.config.voice)
    • Event-name overrides per the xAI docs

Modified files

  • src/gateway/voiceclaw-realtime/types.ts — add "xai" to VoiceClawSessionConfigEvent.provider union (one-line change).
  • src/gateway/voiceclaw-realtime/session.ts — extend the adapter factory so provider === "xai" instantiates VoiceClawXaiRealtimeAdapter.
  • docs/gateway/index.md — document xAI as a /voiceclaw/realtime brain provider; required env var XAI_API_KEY.
  • docs/providers/xai.md — note realtime full-duplex support; cross-link to gateway index.

Configuration shape

Operator config:

{
  "voiceclaw-realtime": {
    "default_provider": "xai",
    "xai": {
      "model": "grok-voice-think-fast-1.0",
      "voice": "ara",
      "audio_format": "audio/pcm",
      "sample_rate_hz": 24000,
      "vad": {
        "type": "server_vad",
        "threshold": 0.5,
        "silence_duration_ms": 500
      }
    }
  }
}

Plus env: XAI_API_KEY set in the gateway process environment. Optional follow-up: if maintainers prefer file-based secret loading, support an XAI_API_KEY_FILE pattern.

Client session.config first message (per-session voice override is supported — the example shows ara but any of the five xAI voices is valid):

{
  "type": "session.config",
  "provider": "xai",
  "model": "grok-voice-think-fast-1.0",
  "voice": "ara",
  "instructionsOverride": "...",
  "conversationHistory": [...],
  "watchdog": "enabled"
}

The adapter falls back to the operator-config default voice when session.config.voice is omitted, and falls back to a built-in adapter default (ara) when no operator config is present.
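
The fallback chain in the paragraph above can be sketched as below. The function and type names are hypothetical, and the rejection branch follows this proposal's test plan (invalid voice IDs are rejected); an implementation could instead fall back to the default voice.

```typescript
// Sketch under stated assumptions; names are illustrative only.
const XAI_VOICE_IDS = ["eve", "ara", "rex", "sal", "leo"] as const;
type XaiVoiceId = (typeof XAI_VOICE_IDS)[number];
const BUILT_IN_DEFAULT_VOICE: XaiVoiceId = "ara";

type VoiceResolution =
  | { ok: true; voice: XaiVoiceId }
  | { ok: false; error: string };

// Precedence: session.config.voice, then operator config, then built-in default.
function resolveSessionVoice(sessionVoice?: string, operatorVoice?: string): VoiceResolution {
  const requested = sessionVoice ?? operatorVoice ?? BUILT_IN_DEFAULT_VOICE;
  const normalized = requested.trim().toLowerCase();
  for (const id of XAI_VOICE_IDS) {
    if (id === normalized) return { ok: true, voice: id };
  }
  // Keep the error generic rather than echoing the raw input back.
  return { ok: false, error: "unsupported voice" };
}
```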

Endpoint and auth — unchanged

  • Path stays /voiceclaw/realtime.
  • Gateway auth (Bearer token in first session.config message) stays as-is.
  • No new HTTP endpoint added.
  • No change to gemini-live.ts, extensions/voice-call/, extensions/openai/realtime-voice-provider.ts, or extensions/xai/ batch/streaming-STT files.

Event mapping

OpenClaw → xAI (client → server)

| OpenClaw event | xAI event | Notes |
| --- | --- | --- |
| session.config | session.update | Translate fields: voice, model, instructions (from instructionsOverride), tools (synthesized from OpenClaw tool catalog), audio format, turn_detection |
| audio.append {data: <base64 PCM16>} | input_audio_buffer.append {audio: <base64 PCM16>} | 1:1 audio frame passthrough; xAI default 24 kHz mono PCM16 |
| audio.commit | input_audio_buffer.commit | Used only when server VAD disabled |
| frame.append (image/video) | n/a | xAI Voice Agent is audio-only at GA; drop with logged warning |
| response.create | response.create | 1:1 |
| response.cancel | (defensive cancellation handling) | xAI does not document a cancel event; best-effort impl, flagged as needs-vendor-verification |
| tool.result {callId, output} | conversation.item.create {type: "function_call_output", call_id, output} + response.create | Two-event sequence to resume generation after tool output |
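
A rough sketch of the client-to-upstream direction, covering a few rows of the mapping. The type and function names are hypothetical. Nesting the function_call_output fields under an item key follows the OpenAI Realtime convention and is an assumption here, since the mapping lists the fields without showing the envelope.

```typescript
// Assumed, simplified shapes; real OpenClaw events carry more fields.
interface OutboundClientEvent {
  type: string;
  data?: string;
  callId?: string;
  output?: string;
}
type UpstreamMessage = Record<string, unknown>;

function translateClientEvent(ev: OutboundClientEvent): UpstreamMessage[] {
  switch (ev.type) {
    case "audio.append":
      // Base64 PCM16 payload is forwarded unchanged under a different key.
      return [{ type: "input_audio_buffer.append", audio: ev.data ?? "" }];
    case "audio.commit":
      return [{ type: "input_audio_buffer.commit" }];
    case "response.create":
      return [{ type: "response.create" }];
    case "tool.result":
      // Two messages: deliver the tool output, then ask the model to resume.
      return [
        {
          type: "conversation.item.create",
          item: { type: "function_call_output", call_id: ev.callId ?? "", output: ev.output ?? "" },
        },
        { type: "response.create" },
      ];
    default:
      // frame.append and unknown events produce no upstream traffic here;
      // a real adapter would log a warning for dropped frames.
      return [];
  }
}
```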

xAI → OpenClaw (server → client)

| xAI event | OpenClaw event | Notes |
| --- | --- | --- |
| session.created / session.updated | session.ready {sessionId} | Emit once on first session.created |
| response.output_audio.delta | audio.delta | 1:1 audio output passthrough |
| response.text.delta (xAI rename from OpenAI's response.output_text.delta) | transcript.delta {role: "assistant"} | Shared base adapter handles both via configurable event-name table |
| conversation.item.input_audio_transcription.completed (xAI user transcription) | transcript.delta + transcript.done {role: "user"} | xAI appears to emit a single completed user-transcription event rather than separate delta/final transcript events. The adapter can synthesize OpenClaw's transcript.delta + transcript.done pair from that completed event. |
| response.function_call_arguments.done | tool.call {callId, name, args} | Adapter may send an immediate placeholder/tool-status response if supported by xAI / OpenAI-Realtime semantics; otherwise it should wait for the actual tool.result before sending function_call_output + response.create. |
| response.done | turn.ended | After assistant turn completes |
| input_audio_buffer.speech_started | turn.started (role=user) | Supports barge-in display in client UI |
| error | error | 1:1 with sanitization; never reflect the API key |
| rate_limits.updated | usage.metrics | Translate to OpenClaw shape |
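
The server-to-client direction, including the transcript synthesis step, might look roughly like this. All type and function names are hypothetical, and the payload field names (delta, transcript) are assumed to follow the OpenAI Realtime event shapes.

```typescript
// Assumed, simplified shapes for illustration.
interface UpstreamEvent {
  type: string;
  delta?: string;
  transcript?: string;
}
interface ClientEvent {
  type: string;
  role?: "user" | "assistant";
  text?: string;
  data?: string;
}

function translateUpstreamEvent(ev: UpstreamEvent): ClientEvent[] {
  switch (ev.type) {
    case "response.output_audio.delta":
      return [{ type: "audio.delta", data: ev.delta ?? "" }];
    case "response.text.delta": // xAI name
    case "response.output_text.delta": // OpenAI name
      return [{ type: "transcript.delta", role: "assistant", text: ev.delta ?? "" }];
    case "conversation.item.input_audio_transcription.completed": {
      // One completed event becomes a delta + done pair for the client.
      const text = ev.transcript ?? "";
      return [
        { type: "transcript.delta", role: "user", text },
        { type: "transcript.done", role: "user", text },
      ];
    }
    case "input_audio_buffer.speech_started":
      return [{ type: "turn.started", role: "user" }];
    case "response.done":
      return [{ type: "turn.ended" }];
    default:
      // Unknown event types are ignored without throwing.
      return [];
  }
}
```

Returning an array from every branch lets one-to-many mappings such as the transcript pair share a code path with the 1:1 cases.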

Audio format

  • Default in: audio/pcm Linear16 little-endian, 24 kHz mono (matches xAI default).
  • Default out: same.
  • Other formats supported by xAI (pcmu, pcma, lower sample rates): pass-through configurable; not used by default.
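
For sizing buffers, the default format works out as follows. The 20 ms frame length is an assumption chosen for the arithmetic and is not specified anywhere in this proposal; the helper names are hypothetical.

```typescript
// PCM16 uses 2 bytes per sample per channel.
function pcm16FrameBytes(sampleRateHz: number, frameMs: number, channels: number = 1): number {
  const samplesPerChannel = Math.round((sampleRateHz * frameMs) / 1000);
  return samplesPerChannel * channels * 2;
}

// Base64 encodes every 3 input bytes as 4 output characters, padded.
function base64Length(byteCount: number): number {
  return Math.ceil(byteCount / 3) * 4;
}
```

At 24 kHz mono, a 20 ms frame holds 480 samples, which is 960 bytes of PCM16 and 1280 characters once base64-encoded for audio.append.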

VAD / barge-in

Server-side VAD (turn_detection: {type: "server_vad", threshold, silence_duration_ms, prefix_padding_ms}) is the recommended default. xAI handles speech detection, end-of-turn, and barge-in. On barge-in mid-response, the adapter emits turn.ended (assistant role, with interrupted: true) before the next turn.started (user role).

Reconnect / resume

xAI does not document explicit resume tokens. Defensive implementation:

  • On WS close (codes 1001/1006/1007/1011/1012/1013): auto-reconnect up to 2 attempts with 500 ms backoff, mirroring the gemini-live.ts pattern.
  • On reconnect: open new WS, send fresh session.update, re-prime conversation history via conversation.item.create per historical message.
  • On 120-min hard cap (xAI's documented session limit): emit session.rotating → close upstream → open new upstream → emit session.rotated. Operator UX continues seamlessly.
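
The retry and rotation rules above reduce to two small decisions, sketched here as pure functions. The names and the one-minute safety margin before the cap are assumptions; the close codes, attempt limit, backoff, and 120-minute cap are taken from the list above.

```typescript
const RETRYABLE_CLOSE_CODES: readonly number[] = [1001, 1006, 1007, 1011, 1012, 1013];
const MAX_RECONNECT_ATTEMPTS = 2;
const RECONNECT_BACKOFF_MS = 500;
const SESSION_HARD_CAP_MS = 120 * 60 * 1000;

type ReconnectDecision = { retry: true; delayMs: number } | { retry: false };

// attemptsSoFar counts reconnects already made for this session.
function decideReconnect(closeCode: number, attemptsSoFar: number): ReconnectDecision {
  const retryable = RETRYABLE_CLOSE_CODES.indexOf(closeCode) !== -1;
  if (!retryable || attemptsSoFar >= MAX_RECONNECT_ATTEMPTS) {
    return { retry: false };
  }
  return { retry: true, delayMs: RECONNECT_BACKOFF_MS };
}

// Rotate slightly before the cap so the swap is not racing the upstream close.
function shouldRotateSession(openedAtMs: number, nowMs: number, safetyMarginMs: number = 60000): boolean {
  return nowMs - openedAtMs >= SESSION_HARD_CAP_MS - safetyMarginMs;
}
```

Keeping these as pure functions makes the reconnect and rotation tests independent of real timers and sockets.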

Open question for maintainers: does OpenClaw have a preferred reconnect/resume contract for /voiceclaw/realtime adapters, or should each adapter handle it as gemini-live.ts does?

Tools / function calling

OpenClaw tool catalog is exposed to xAI as function tools in session.update. On response.function_call_arguments.done:

  1. Adapter emits tool.call to client with callId.
  2. Adapter may send an immediate placeholder/tool-status response if supported by xAI / OpenAI-Realtime semantics; otherwise it should wait for the actual tool.result before sending function_call_output + response.create.
  3. Client (the OpenClaw caller) executes the tool with policy/approval gates and returns via tool.result.
  4. Adapter sends conversation.item.create with function_call_output to xAI.
  5. Adapter sends response.create to resume generation.

Parallel function calls (xAI may emit multiple done events before audio) are handled by tracking all callIds and resolving all before response.create.
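
Tracking parallel calls can be as simple as a list of outstanding call IDs. The class below is a hypothetical sketch of that bookkeeping, not the adapter's actual structure.

```typescript
// Hypothetical tracker for calls announced by response.function_call_arguments.done.
class PendingToolCalls {
  private pending: string[] = [];

  // Record a call the model has requested; duplicates are ignored.
  add(callId: string): void {
    if (this.pending.indexOf(callId) === -1) {
      this.pending.push(callId);
    }
  }

  // Mark a call as answered. Returns true only when this result was the
  // last outstanding one, which is the moment to send response.create.
  resolve(callId: string): boolean {
    const before = this.pending.length;
    this.pending = this.pending.filter((id) => id !== callId);
    const removed = this.pending.length < before;
    return removed && this.pending.length === 0;
  }
}
```

Each tool.result would still trigger its own conversation.item.create; only the final response.create is deferred until resolve returns true.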

Files likely touched

  • src/gateway/voiceclaw-realtime/types.ts — add "xai" to provider union
  • src/gateway/voiceclaw-realtime/session.ts — extend adapter factory
  • src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts — NEW shared base
  • src/gateway/voiceclaw-realtime/xai-realtime.ts — NEW xAI specialization
  • src/gateway/voiceclaw-realtime/openai-realtime-protocol.test.ts — NEW shared protocol tests
  • src/gateway/voiceclaw-realtime/xai-realtime.test.ts — NEW xAI-specific tests
  • src/gateway/voiceclaw-realtime/session.test.ts — add "xai" provider-selection test
  • docs/gateway/index.md — document xAI brain provider
  • docs/providers/xai.md — note realtime support

Estimated code volume: shared base ~500-800 lines; xAI specialization ~150-250 lines; types/factory updates ~30-50 lines; tests ~400-600 lines; docs ~100-200 lines.

Hard non-changes: no modification to gemini-live.ts, extensions/voice-call/, extensions/openai/realtime-voice-provider.ts, or extensions/xai/ batch/streaming-STT files. No change to gateway auth, sandboxing, or the WS control protocol. No new HTTP endpoints.

Tests

  • Mocked WebSocket only. No live xAI in CI. No XAI_API_KEY required to run tests.
  • Audio passthrough — base64 PCM16 frames pass through unmodified.
  • Transcript synthesis — verify adapter emits both transcript.delta AND transcript.done from xAI's single user-transcription completed event.
  • Function-call flow — verify tool.call emission, tool.result acceptance, conversation.item.create + response.create forwarding; parallel calls handled correctly.
  • Barge-in — inject input_audio_buffer.speech_started mid-response; verify turn.ended (interrupted) + turn.started (user).
  • Reconnect — force WS close 1006; verify retry with conversation history re-priming.
  • Session rotation — simulate 120-min cap; verify session.rotating / session.rotated.
  • Error sanitization — xAI returns error reflecting key; verify sanitized error never contains key.
  • Provider selection: config.provider === "xai" returns VoiceClawXaiRealtimeAdapter from session.ts factory.
  • Voice selection — verify session.config.voice overrides operator-config default; verify all five xAI voices (eve, ara, rex, sal, leo) are accepted; verify invalid voice IDs are rejected with sanitized error.
  • Secret never logged — inject mock logger; assert XAI_API_KEY value never appears in any log line.

Optional --live smoke against real xAI WSS would be operator-only / opt-in via explicit env flag, never run in CI.

Security

  • XAI_API_KEY stays inside the OpenClaw gateway process environment.
  • Never logged in any form (value, length, prefix, hash).
  • Never reflected in errors returned to clients.
  • No client-side xAI key — clients send the gateway Bearer token only, exactly as today.
  • No new public endpoint added.
  • Existing gateway auth (Bearer in first session.config message; per docs/gateway/index.md) remains unchanged.

Questions for maintainers

  1. Surface choice — is /voiceclaw/realtime the right surface for xAI Realtime, or do maintainers prefer this go through a different path (e.g., the voice-call plugin's realtime providers, a new plugin-SDK extension)?
  2. Adapter shape — would maintainers prefer a single shared OpenAIRealtimeProtocolAdapter base (parameterized for both xAI and OpenAI) or two provider-specific adapters with duplicated protocol logic?
  3. Ship order — should xAI and OpenAI Realtime (issue #71195) ship in one combined PR, or xAI first as a smaller PR followed by OpenAI as a second PR? Either is workable; smaller PR is easier to review.
  4. Code organization — should this be core gateway code (in src/gateway/voiceclaw-realtime/) or extracted via the plugin SDK? The gemini-live.ts precedent suggests core gateway code, but happy to follow maintainer preference.
  5. Test/mocking style — any preferred WebSocket mocking pattern for voiceclaw-realtime adapters? gemini-live.test.ts is the natural reference; would happily follow the same style.
  6. Roadmap overlap — does this overlap with planned work tied to #71195 or #12911? If maintainers are already in flight on #71195, we'd be glad to coordinate or contribute to that effort instead of opening a parallel PR.
  7. Reconnect contract — does OpenClaw have a preferred reconnect/resume contract for /voiceclaw/realtime adapters, or is per-adapter best-effort (like gemini-live.ts) the expected pattern?

What we're requesting

Maintainer feedback on the design before opening a draft PR. Downstream adopters are prepared to drive the implementation but want to align on surface, adapter shape, and ship order before writing code so we don't waste review cycles.


Filed by an OpenClaw downstream adopter. Happy to attend a sync, drop in on a community call, or coordinate with anyone already working on #71195. Please flag any roadmap or design context we may be missing.

extent analysis

TL;DR

Implement a shared OpenAIRealtimeProtocolAdapter base class to support both xAI and OpenAI Realtime protocols in the OpenClaw /voiceclaw/realtime gateway surface.

Guidance

  1. Create a new file src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts for the shared OpenAIRealtimeProtocolAdapter base class.
  2. Extend the adapter factory in src/gateway/voiceclaw-realtime/session.ts to instantiate VoiceClawXaiRealtimeAdapter when provider === "xai".
  3. Add xAI-specific configuration to src/gateway/voiceclaw-realtime/types.ts and docs/gateway/index.md.
  4. Implement event mapping between OpenClaw and xAI events, including audio passthrough, transcript synthesis, and function-call flow.
  5. Write tests for the new adapter, including mocked WebSocket tests and audio passthrough verification.

Example

// src/gateway/voiceclaw-realtime/openai-realtime-protocol.ts
export class OpenAIRealtimeProtocolAdapter {
  // ...
}

// src/gateway/voiceclaw-realtime/xai-realtime.ts
export class VoiceClawXaiRealtimeAdapter extends OpenAIRealtimeProtocolAdapter {
  // xAI-specific defaults and event-name overrides
}

Notes

  • The implementation should follow the proposed design and event mapping outlined in the issue body.
  • The shared adapter base class should be parameterized to support both xAI and OpenAI Realtime protocols.
  • The xAI-specific adapter should extend the shared base class and provide xAI-specific defaults and event-name overrides.

Recommendation

Apply the proposed design and implementation to support xAI Realtime in the OpenClaw /voiceclaw/realtime gateway surface, with the option to add OpenAI
