openclaw - Feature request: native xAI Grok Voice Agent API support in voice-call realtime





Summary

xAI shipped the Grok Voice Agent API in December 2025 (announcement: https://x.ai/news/grok-voice-agent-api). It is positioned as OpenAI-Realtime-API-compatible and is currently the cheapest production-grade speech-to-speech option:

Provider                                      Approx. USD/min
OpenAI Realtime (gpt-realtime-1.5)            ~$0.30
xAI Grok Voice (grok-voice-think-fast-1.0)    ~$0.05

For Twilio-based voice-agent use cases (appointment booking, IVR navigation, support automation), this is roughly a 6× cost reduction. It would be great to have first-class support in voice-call alongside the existing OpenAI provider.
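The 6× figure is just the ratio of the two approximate per-minute rates in the table. Below is a rough sketch for sizing spend at a given call volume. The rate map and `estimateCents` are illustrative names. The rates are the table's approximations, not official pricing.

```javascript
// Approximate per-minute rates from the table, in US cents (integer math avoids
// floating-point rounding). These are rough figures, not authoritative pricing.
const RATE_CENTS_PER_MIN = {
  "openai:gpt-realtime-1.5": 30,
  "xai:grok-voice-think-fast-1.0": 5,
};

// Estimate spend in whole cents for a number of call minutes.
function estimateCents(providerModel, minutes) {
  const rate = RATE_CENTS_PER_MIN[providerModel];
  if (rate === undefined) throw new Error(`unknown rate for ${providerModel}`);
  return rate * minutes;
}

const minutes = 10_000;
const openai = estimateCents("openai:gpt-realtime-1.5", minutes);
const xai = estimateCents("xai:grok-voice-think-fast-1.0", minutes);
console.log(`OpenAI: $${openai / 100}, xAI: $${xai / 100}, ratio: ${openai / xai}x`);
// prints "OpenAI: $3000, xAI: $500, ratio: 6x"
```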

Why the existing OpenAI provider doesn't "just work" against api.x.ai

xAI's Realtime API is mostly compatible with OpenAI's, but has documented differences (see https://docs.x.ai/developers/model-capabilities/audio/voice-agent). Pointing the OpenAI provider at api.x.ai via azureEndpoint produces static audio plus tool-call oddities. The blockers I hit:

  1. Audio format schema: the bridge sends the legacy flat schema (input_audio_format: "g711_ulaw", output_audio_format: "g711_ulaw") inside session.update. xAI silently ignores those fields and defaults to PCM at 24 kHz. Twilio sends and expects μ-law at 8 kHz, so audio is decoded and produced with the wrong encoding and sample rate, which results in static in both directions. xAI requires the new nested schema:
    "audio": {
      "input":  { "format": { "type": "audio/pcmu" }, "turn_detection": { ... } },
      "output": { "format": { "type": "audio/pcmu" }, "voice": "ara" }
    }
  2. Unsupported conversation.item.truncate: xAI doesn't implement this client event. The bridge currently sends it during barge-in handling, which logs "Audio content of Xms is already shorter than Yms" errors. On xAI, response.cancel alone is sufficient.
  3. OpenAI-Beta header: xAI's wss://api.x.ai/v1/realtime endpoint doesn't expect this header, so it should not be sent.
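Blocker 1 can be made concrete with a minimal sketch that translates the legacy flat fields into the nested `audio` object. The sketch assumes the legacy session carries `voice` and `turn_detection` at the top level, as in OpenAI's beta Realtime schema. It covers only the two format mappings used in the reference patch. `toNestedAudio` is a hypothetical helper, not part of voice-call.

```javascript
// Hypothetical helper: legacy flat audio fields -> nested `audio` object.
// Only the mappings used in the reference patch are covered; others throw.
const LEGACY_TO_MIME = {
  pcm16: "audio/pcm",
  g711_ulaw: "audio/pcmu",
};

function toNestedAudio(legacy) {
  const inFmt = LEGACY_TO_MIME[legacy.input_audio_format];
  const outFmt = LEGACY_TO_MIME[legacy.output_audio_format];
  if (!inFmt || !outFmt) {
    throw new Error(
      `unsupported audio format: ${legacy.input_audio_format} / ${legacy.output_audio_format}`,
    );
  }
  const input = { format: { type: inFmt } };
  // turn_detection moves under audio.input in the nested schema.
  if (legacy.turn_detection) input.turn_detection = legacy.turn_detection;
  const output = { format: { type: outFmt } };
  // voice moves under audio.output.
  if (legacy.voice) output.voice = legacy.voice;
  return { audio: { input, output } };
}

// Example: the fields a Twilio (μ-law @ 8 kHz) session currently sends.
const nested = toNestedAudio({
  input_audio_format: "g711_ulaw",
  output_audio_format: "g711_ulaw",
  turn_detection: { type: "server_vad" },
  voice: "ara",
});
console.log(JSON.stringify(nested));
```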

The URL (wss://api.x.ai/v1/realtime?model=...) and auth (Authorization: Bearer xai-...) work as-is — only the in-session schema and barge-in flow needed adjustment.
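Only the URL and bearer token are needed, so the connection parameters can be built without any OpenAI-specific headers. The sketch below uses the WHATWG `URL` class instead of the reference patch's string replacement. `xaiConnectionParams`, its default endpoint, and its default model are assumptions for illustration.

```javascript
// Hypothetical builder for the xAI realtime WebSocket URL and headers.
// Only Authorization is sent; the OpenAI-Beta header is deliberately omitted.
function xaiConnectionParams({ endpoint = "https://api.x.ai", apiKey, model }) {
  if (!apiKey) throw new Error("missing xAI API key");
  const url = new URL(endpoint);
  // http(s) -> ws(s); ws/wss endpoints are left as-is.
  if (url.protocol === "https:") url.protocol = "wss:";
  else if (url.protocol === "http:") url.protocol = "ws:";
  url.pathname = url.pathname.replace(/\/$/, "") + "/v1/realtime";
  url.searchParams.set("model", model ?? "grok-voice-think-fast-1.0");
  return { url: url.toString(), headers: { Authorization: `Bearer ${apiKey}` } };
}

console.log(xaiConnectionParams({ apiKey: "xai-..." }));
```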

Proposed solution

Two reasonable options, in increasing order of cleanliness:

Option A — minimal: branch on host inside the OpenAI bridge

Detect xAI hosts (api.x.ai, *.x.ai) inside OpenAIRealtimeVoiceBridge.resolveConnectionParams / sendSessionUpdate / handleBargeIn and emit the xAI-flavored variant. ~50 lines of conditional code in dist/realtime-voice-provider-*.js (works today as a local patch, see "Reference patch" below).
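For Option A, the host check is the part most worth hardening. This standalone sketch is a stricter variant of the reference patch's check, not the patch itself. It compares `URL.hostname`, which excludes any port, against `x.ai` and the `.x.ai` suffix, so look-alike hosts fail. Unlike the patch, it returns false for strings that are not absolute URLs instead of falling back to a regex.

```javascript
// Stricter xAI host detection: exact apex or ".x.ai" suffix on the hostname,
// so "evilx.ai" and "x.ai.example.com" are rejected.
function isXaiEndpoint(endpoint) {
  if (typeof endpoint !== "string" || endpoint === "") return false;
  let hostname;
  try {
    hostname = new URL(endpoint).hostname.toLowerCase();
  } catch {
    return false; // not an absolute URL: don't guess
  }
  return hostname === "x.ai" || hostname.endsWith(".x.ai");
}

for (const e of ["wss://api.x.ai", "https://api.x.ai:8443", "https://evilx.ai", "https://x.ai.example.com", "api.x.ai"]) {
  console.log(e, isXaiEndpoint(e));
}
```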

Option B — clean: dedicated xAI realtime voice provider (recommended)

Add a new provider plugin under extensions/xai/realtime-voice-provider.{ts,js} paralleling extensions/openai/realtime-voice-provider.js, register realtimeVoiceProviders: ["xai"] in extensions/xai/openclaw.plugin.json, and let users select via:

"realtime": {
  "enabled": true,
  "provider": "xai",
  "providers": {
    "xai": {
      "apiKey": "xai-...",
      "model": "grok-voice-think-fast-1.0",
      "voice": "ara"
    }
  }
}

The xai extension already declares realtimeTranscriptionProviders: ["xai"], so adding realtimeVoiceProviders is a natural extension.
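A plugin implementing Option B would need to pick the active provider out of that config and fill in defaults. The minimal sketch below assumes the exact shape of the JSON above. `resolveRealtimeProvider` is hypothetical and is not OpenClaw's plugin API. The defaults simply mirror the reference patch.

```javascript
// Hypothetical resolver for the Option B config shape; not OpenClaw's real API.
const XAI_DEFAULTS = { model: "grok-voice-think-fast-1.0", voice: "ara" };

function resolveRealtimeProvider(config) {
  const rt = config?.realtime;
  if (!rt?.enabled) return null; // realtime voice disabled
  const id = rt.provider;
  const settings = rt.providers?.[id];
  if (!settings) throw new Error(`no settings for realtime provider "${id}"`);
  if (id !== "xai") return { id, ...settings };
  if (!settings.apiKey) throw new Error("realtime.providers.xai.apiKey is required");
  // User-supplied settings override the defaults.
  return { id, ...XAI_DEFAULTS, ...settings };
}

const resolved = resolveRealtimeProvider({
  realtime: {
    enabled: true,
    provider: "xai",
    providers: { xai: { apiKey: "xai-...", voice: "eve" } },
  },
});
console.log(resolved);
```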

Reference patch (Option A, what I'm running locally)

@@ realtime-voice-provider (OpenAIRealtimeVoiceBridge)
+function isXaiEndpoint(endpoint) {
+  if (!endpoint || typeof endpoint !== "string") return false;
+  try {
+    const host = new URL(endpoint).host.toLowerCase();
+    return host === "api.x.ai" || host === "api.grok.x.ai" || host.endsWith(".x.ai");
+  } catch {
+    return /(^|\.)x\.ai($|\/)/.test(endpoint.toLowerCase());
+  }
+}

 constructor(config) {
   ...
+  this.isXai = isXaiEndpoint(config.azureEndpoint);
 }

 resolveConnectionParams() {
+  if (this.isXai) {
+    const base = this.config.azureEndpoint.replace(/\/$/, "")
+      .replace(/^http(s?):/, (_, s) => `ws${s}:`);
+    const url = `${base}/v1/realtime?model=${encodeURIComponent(this.config.model ?? "grok-voice-think-fast-1.0")}`;
+    return { url, headers: { Authorization: `Bearer ${this.config.apiKey}` } };
+  }
   ...existing OpenAI / Azure paths...
 }

 sendSessionUpdate() {
+  if (this.isXai) {
+    const fmt = this.resolveRealtimeAudioFormat() === "pcm16" ? "audio/pcm" : "audio/pcmu";
+    this.sendEvent({
+      type: "session.update",
+      session: {
+        type: "realtime",
+        model: this.config.model ?? "grok-voice-think-fast-1.0",
+        instructions: this.config.instructions,
+        audio: {
+          input:  { format: { type: fmt }, turn_detection: { type: "server_vad", ... } },
+          output: { format: { type: fmt }, voice: this.config.voice ?? "ara" }
+        },
+        ...(this.config.tools?.length ? { tools: this.config.tools, tool_choice: "auto" } : {})
+      }
+    });
+    return;
+  }
   ...existing OpenAI session.update...
 }

 handleBargeIn(options) {
   ...
   if (shouldInterruptProvider) {
+    if (!this.isXai) {
       this.sendEvent({ type: "conversation.item.truncate", ... });
+    }
     this.config.onClearAudio();
     ...
   }
 }
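The barge-in change boils down to which client events go out when the caller interrupts. The sketch below expresses that decision as a pure function. It assumes, as blocker 2 implies, that the bridge sends `response.cancel` on barge-in. The `conversation.item.truncate` fields follow OpenAI's Realtime event. `bargeInEvents`, its parameters, and `content_index: 0` are illustrative assumptions.

```javascript
// Hypothetical pure function: which client events to send on barge-in.
// OpenAI: cancel the response, then truncate the assistant item to what the
// caller actually heard. xAI: cancel only, since truncate is unsupported there.
function bargeInEvents({ isXai, itemId, audioEndMs }) {
  const events = [{ type: "response.cancel" }];
  if (!isXai && itemId) {
    events.push({
      type: "conversation.item.truncate",
      item_id: itemId,
      content_index: 0, // assumption: audio is the item's first content part
      audio_end_ms: Math.max(0, Math.floor(audioEndMs ?? 0)),
    });
  }
  return events;
}

console.log(bargeInEvents({ isXai: true, itemId: "item_123", audioEndMs: 1840 }));
console.log(bargeInEvents({ isXai: false, itemId: "item_123", audioEndMs: 1840 }));
```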

Separately, two realtime tool handlers were missing from @openclaw/voice-call. They would be useful for any provider, but they were necessary for Grok to actually do anything mid-call (IVR navigation, hanging up):

@@ runtime-entry (after agent_consult registration)
+ realtimeHandler.registerToolHandler("send_dtmf", async (args, callId) => {
+   const digits = (args?.digits ?? "").toString().trim()
+     .replace(/[^0-9*#A-Da-dwW]/g, "");
+   if (!digits) return { error: "no valid DTMF digits" };
+   const result = await manager.sendDtmf(callId, digits);
+   return result?.success ? { success: true, digits } : { error: result?.error || "dtmf failed" };
+ });
+ realtimeHandler.registerToolHandler("end_call", async (_args, callId) => {
+   const call = manager.getCall(callId);
+   if (!call) return { error: `Call "${callId}" not found` };
+   await provider.hangupCall({ callId, providerCallId: call.providerCallId, reason: "completed" });
+   return { success: true };
+ });
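The `send_dtmf` logic can be exercised outside a live call by pulling the sanitizer out and injecting the call manager. In this sketch, `sanitizeDtmf` and `makeSendDtmfHandler` are hypothetical names. The fake manager assumes only what the patch itself uses: an async `sendDtmf(callId, digits)` that resolves to `{ success, error }`.

```javascript
// Hypothetical helper: normalize model-supplied digits the same way the patch does.
// Keeps 0-9, *, #, A-D (either case) and w/W (pause characters); drops the rest.
function sanitizeDtmf(raw) {
  return String(raw ?? "").trim().replace(/[^0-9*#A-Da-dwW]/g, "");
}

// Hypothetical factory: builds the send_dtmf tool handler around an injected
// call manager, so the logic can be tested without a live call.
function makeSendDtmfHandler(manager) {
  return async (args, callId) => {
    const digits = sanitizeDtmf(args?.digits);
    if (!digits) return { error: "no valid DTMF digits" };
    const result = await manager.sendDtmf(callId, digits);
    return result?.success
      ? { success: true, digits }
      : { error: result?.error || "dtmf failed" };
  };
}

// Usage with a fake manager that records what it was asked to send.
const sent = [];
const fakeManager = {
  async sendDtmf(callId, digits) {
    sent.push({ callId, digits });
    return { success: true };
  },
};
const sendDtmf = makeSendDtmfHandler(fakeManager);
sendDtmf({ digits: " 1-2 # w3 " }, "call-1").then((result) => {
  console.log(result); // { success: true, digits: '12#w3' }
});
```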

With these handlers registered, send_dtmf and end_call can be declared in realtime.tools. Grok then calls send_dtmf to navigate IVRs and end_call to hang up once the goal is achieved.
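For the model to call these handlers, matching tool definitions have to be declared. The sketch below shows what the `realtime.tools` entries might look like, written in OpenAI's Realtime function-tool shape. Two points are assumptions to check against both APIs' docs: whether voice-call forwards these entries unchanged, and whether xAI accepts exactly this shape.

```javascript
// Hypothetical realtime.tools entries for the two handlers, in the OpenAI
// Realtime function-tool shape: { type, name, description, parameters }.
const realtimeTools = [
  {
    type: "function",
    name: "send_dtmf",
    description: "Send DTMF keypresses on the active call, e.g. to navigate an IVR menu.",
    parameters: {
      type: "object",
      properties: {
        digits: {
          type: "string",
          description: "Keys to press: 0-9, *, #, A-D; w/W for a pause.",
        },
      },
      required: ["digits"],
    },
  },
  {
    type: "function",
    name: "end_call",
    description: "Hang up the call once the caller's goal has been achieved.",
    parameters: { type: "object", properties: {} },
  },
];

console.log(JSON.stringify(realtimeTools, null, 2));
```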

Tested against

  • OpenClaw 2026.5.7 on macOS
  • @openclaw/voice-call 2026.5.7
  • Twilio Programmable Voice (μ-law @ 8 kHz Media Streams)
  • xAI model: grok-voice-think-fast-1.0, voices ara and eve confirmed working

Happy to open a PR for Option B if there's interest and it's not already in flight.
