openclaw - 💡(How to fix) Fix Add non-blocking realtime relay speech injection for async channel completions

openclaw · 2026-05-20 16:55:52

Please consider adding an official Gateway method for speaking a short text message through an existing realtime relay session, using the same voice/provider/session that the user is already hearing.

Suggested shape:

talk.session.speak({
  sessionId: string,
  text: string
}) => { ok: true }

Internally this can call the realtime bridge sendUserMessage(text) (or an equivalent provider-neutral method) and emit a safe talk event such as completion.speech.requested.
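As a rough illustration of that internal path, here is a minimal TypeScript sketch. Only `sendUserMessage` and the event name `completion.speech.requested` come from the proposal. The `RealtimeBridge` interface, the session map, the `emit` callback, and the failure variant of the result are assumptions made for this example, not OpenClaw's actual internals.

```typescript
// Hypothetical bridge shape: only sendUserMessage is named in the proposal.
interface RealtimeBridge {
  sendUserMessage(text: string): void;
}

// Hypothetical event shape: it deliberately has no field for the spoken text.
interface TalkEvent {
  type: string;
  sessionId: string;
}

// The proposal only specifies { ok: true }; the failure variant is an assumption.
type SpeakResult = { ok: true } | { ok: false; error: string };

// Sketch of a talk.session.speak handler: find the active relay session,
// hand the text to its bridge, then emit an event that carries no text.
function speakThroughSession(
  sessions: Map<string, RealtimeBridge>,
  emit: (event: TalkEvent) => void,
  params: { sessionId: string; text: string },
): SpeakResult {
  const bridge = sessions.get(params.sessionId);
  if (bridge === undefined) {
    return { ok: false, error: "unknown or inactive realtime session" };
  }
  bridge.sendUserMessage(params.text);
  emit({ type: "completion.speech.requested", sessionId: params.sessionId });
  return { ok: true };
}
```

Passing the session map and `emit` as arguments keeps the sketch independent of any particular Gateway wiring.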

Fix / Workaround

In a local OpenClaw 2026.5.18 deployment, a small patch adding this method allowed a browser voice channel to speak async completion updates through the active openai/gpt-realtime relay voice instead of falling back to browser speechSynthesis or a separate TTS provider.

Problem

Voice/web channels that use OpenClaw as the execution core can receive async or deferred task completions after the original realtime tool call has already returned.

Today there are two imperfect options:

1. Keep the realtime tool call open with willContinue: true and later submit the final result.
   - This preserves the realtime voice.
   - But it can block or delay subsequent response.create work while the long task is still running, which hurts natural conversation.
2. Close the tool call, track the long task externally, and speak the final completion outside the realtime session.
   - This keeps the conversation available.
   - But the channel has to use browser TTS or a separate speech provider, so the voice often sounds different from the active OpenClaw realtime voice.

For voice channels, the ideal behavior is:

- user asks naturally;
- OpenClaw accepts or starts long-running work;
- the realtime conversation remains available;
- when OpenClaw finishes, the channel can ask the same realtime session to speak the final safe user-facing summary;
- no raw task IDs, queue details, tool payloads, or technical errors are exposed to the user.
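One way a channel could enforce the last point is to build the spoken text only from fields meant for the user. The TypeScript sketch below is illustrative: the `CompletionRecord` shape, its field names, the fallback phrases, and the 280-character bound are all assumptions, not part of OpenClaw.

```typescript
// Hypothetical internal record for a finished background task.
interface CompletionRecord {
  taskId: string; // internal only, never spoken
  status: "succeeded" | "failed";
  userSummary?: string; // short user-facing text prepared upstream
  errorDetail?: string; // internal only, never spoken
}

// Assumed upper bound on spoken text length.
const MAX_SPOKEN_CHARS = 280;

// Build the text to speak from user-facing fields only, so task IDs and
// technical error details cannot leak into the voice channel.
function toSpokenSummary(record: CompletionRecord): string {
  if (record.status === "failed") {
    return "Sorry, I could not finish that request.";
  }
  const summary = (record.userSummary ?? "").trim();
  if (summary.length === 0) {
    return "Done.";
  }
  return summary.length > MAX_SPOKEN_CHARS
    ? summary.slice(0, MAX_SPOKEN_CHARS).trimEnd()
    : summary;
}
```

Selecting fields from a structured record tends to be safer than filtering identifiers out of free text after the fact.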

Use Case

A browser voice channel backed by OpenClaw Realtime asks OpenClaw to inspect or update a GitHub Project card. The operation may finish after the initial realtime handoff window.

The channel should be able to deliver the final result like:

Done, I moved card 12 to Cancelado.

using the same realtime voice already active in the conversation, without keeping the original tool call open and without interrupting later user turns.
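The channel side of this use case might look like the following TypeScript sketch. `SpeakFn` stands in for a client call to the proposed `talk.session.speak` method, and `DeferredCompletionChannel` with its method names is hypothetical. A real client call would probably be asynchronous; it is kept synchronous here for brevity.

```typescript
// Hypothetical client call for the proposed talk.session.speak method.
type SpeakFn = (request: { sessionId: string; text: string }) => void;

class DeferredCompletionChannel {
  // Maps an internal task key to the realtime session that requested it.
  private pending = new Map<string, string>();
  private speak: SpeakFn;

  constructor(speak: SpeakFn) {
    this.speak = speak;
  }

  // Called when the realtime tool call arrives: record the task and return
  // immediately so the tool call can close and the conversation stays free.
  acceptTask(taskKey: string, sessionId: string): { accepted: true } {
    this.pending.set(taskKey, sessionId);
    return { accepted: true };
  }

  // Called later when the background work finishes: ask the same session
  // to speak the summary. The task key itself is never spoken.
  completeTask(taskKey: string, spokenSummary: string): boolean {
    const sessionId = this.pending.get(taskKey);
    if (sessionId === undefined) {
      return false;
    }
    this.pending.delete(taskKey);
    this.speak({ sessionId, text: spokenSummary });
    return true;
  }
}
```

With this shape, `acceptTask` returns before the GitHub Project update finishes, and `completeTask("card-update", "Done, I moved card 12 to Cancelado.")` later delivers the result through the session that made the request.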

Why `talk.speak` is not enough

talk.speak is useful as a general TTS method, but it may use a separate speech provider or voice profile. For realtime voice UX, channel completion delivery should preferably reuse the active realtime session itself.

Proposed Behavior

- Add an official descriptor and handler for talk.session.speak or similar.
- Limit it to realtime relay sessions.
- Require operator.write, same family as talk.session.submitToolResult.
- Validate sessionId and bounded text.
- Send the text into the active realtime bridge as a message to be spoken by the current voice.
- Emit a talk event with metadata only, not the full user text, to avoid logging sensitive completion content.
- Do not require a pending tool call.
- Do not block normal user conversation while background work is pending.
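The validation and logging points above could be sketched in TypeScript as follows. Only `operator.write` and the event name `completion.speech.requested` appear in this proposal. The `CallerContext` and `SessionInfo` shapes, the treatment of `operator.write` as a scope string, and the 500-character bound are assumptions for illustration.

```typescript
// Assumed bound; the proposal only says the text should be "bounded".
const MAX_TEXT_LENGTH = 500;

interface SpeakRequest {
  sessionId: string;
  text: string;
}

// Hypothetical caller and session descriptions.
interface CallerContext {
  scopes: string[];
}
interface SessionInfo {
  kind: "realtime-relay" | "other";
}

type Validation = { ok: true } | { ok: false; reason: string };

// Check authorization, session type, and text bounds before anything is spoken.
function validateSpeakRequest(
  req: SpeakRequest,
  caller: CallerContext,
  session: SessionInfo | undefined,
): Validation {
  if (!caller.scopes.includes("operator.write")) {
    return { ok: false, reason: "operator.write is required" };
  }
  if (req.sessionId.trim().length === 0) {
    return { ok: false, reason: "sessionId is required" };
  }
  if (session === undefined || session.kind !== "realtime-relay") {
    return { ok: false, reason: "not an active realtime relay session" };
  }
  const length = req.text.trim().length;
  if (length === 0 || length > MAX_TEXT_LENGTH) {
    return { ok: false, reason: "text must be non-empty and bounded" };
  }
  return { ok: true };
}

// Build a metadata-only event: it records the text length, never the text.
function buildSpeechEvent(req: SpeakRequest) {
  return {
    type: "completion.speech.requested",
    sessionId: req.sessionId,
    textLength: req.text.length,
  };
}
```

Neither function looks at pending tool calls, which matches the requirement that the method work without one.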

Notes from local validation

The key design constraint is that this method should be channel-delivery only. It should not decide actions, route commands, inspect projects, or become a separate agent path.
