openclaw - 💡(How to fix) Fix [Bug]: message-tool Telegram replies skip auto=inbound TTS because inboundAudio is not passed

Fix Action

Fix / Workaround

Local code inspection points to a missing argument in the message-action send path rather than a provider failure:

maybeApplyTtsToMessageActionSendPayload() forwards params.inboundAudio into maybeApplyTtsToPayload().
handleSendAction() calls maybeApplyTtsToMessageActionSendPayload() without passing inboundAudio.
maybeApplyTtsToPayload() returns early for autoMode === "inbound" unless params.inboundAudio === true.
The normal final-reply dispatch path does compute const inboundAudio = isInboundAudioContext(ctx) and passes it into maybeApplyTtsToReplyPayload().

Consequence: OpenClaw silently falls back to text-only replies even though messages.tts.auto: "inbound" is configured and the TTS provider works. Users lose the expected voice-reply behavior, and agents may add manual voice-file workarounds that should not be necessary.

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

In a Telegram direct chat with messages.tts.auto: "inbound", visible replies sent through the message tool after an inbound voice message are delivered as text-only because the message-action TTS path does not receive the inbound-audio flag.

Steps to reproduce

1. Run OpenClaw 2026.5.20 with Telegram configured and a working TTS provider.
2. Configure TTS with messages.tts.auto: "inbound" and provider microsoft (observed with ru-RU-DmitryNeural; likely provider-independent, as the skip happens before synthesis).
3. Use a Telegram direct / message-tool-only agent context where visible channel output is sent through message(action="send", message="...").
4. Send a Telegram voice message to the agent.
5. From the assistant turn, send a visible reply with message(action="send", message="...").
6. Observe that Telegram receives only the text message and no voice/audio attachment.
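
For reference, the TTS settings from the second step might look like the object below. This is only a sketch: it assumes the dotted keys quoted in this report map onto nested objects, and both `ttsConfigSketch` and `readPath` are hypothetical names used for illustration, not part of OpenClaw.

```javascript
// Assumed nested shape for the three dotted keys quoted in this report.
// The real config file format and any additional keys are not shown here.
const ttsConfigSketch = {
  messages: {
    tts: {
      auto: "inbound",
      provider: "microsoft",
      providers: {
        microsoft: { voice: "ru-RU-DmitryNeural" },
      },
    },
  },
};

// Hypothetical helper: resolve a dotted path such as "messages.tts.auto".
function readPath(obj, path) {
  return path
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

console.log(readPath(ttsConfigSketch, "messages.tts.auto")); // inbound
console.log(readPath(ttsConfigSketch, "messages.tts.provider")); // microsoft
```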

Expected behavior

Because the inbound Telegram turn contains audio and messages.tts.auto is inbound, the message-tool send path should apply the configured TTS transform before Telegram delivery, or should receive enough inbound context to make the same decision as the normal final-reply path. The user should receive a voice/audio reply, not text only.

Actual behavior

The user receives only text. This was observed repeatedly for Telegram voice-message turns in the same direct chat. Setting asVoice: true on a text-only message(action="send", message=..., asVoice=true) call also delivers text only; it does not synthesize TTS from text.

Local code inspection explains the skip:

maybeApplyTtsToMessageActionSendPayload() forwards params.inboundAudio into maybeApplyTtsToPayload().
handleSendAction() calls maybeApplyTtsToMessageActionSendPayload() without passing inboundAudio.
maybeApplyTtsToPayload() returns early for autoMode === "inbound" unless params.inboundAudio === true.
The normal final-reply dispatch path does compute const inboundAudio = isInboundAudioContext(ctx) and passes it into maybeApplyTtsToReplyPayload().
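
The interaction described above can be reproduced in isolation. The sketch below is not OpenClaw code: only the `autoMode === "inbound" && params.inboundAudio !== true` check is taken from the report, while `applyTtsGate`, `messageActionSend`, and the `ttsApplied` marker are hypothetical stand-ins for the real helpers and for actual synthesis.

```javascript
// Simplified stand-in for the auto-mode gate in maybeApplyTtsToPayload().
function applyTtsGate(params) {
  const nextPayload = params.payload;
  const autoMode = params.ttsAuto;
  // Check quoted in the report: anything other than a literal `true` skips TTS,
  // so an omitted (undefined) flag behaves the same as `false`.
  if (autoMode === "inbound" && params.inboundAudio !== true) return nextPayload;
  // Hypothetical marker standing in for real speech synthesis.
  return { ...nextPayload, ttsApplied: true };
}

// Stand-in shaped like the message-action helper: it forwards whatever
// inboundAudio value it received, including undefined.
function messageActionSend(params) {
  return applyTtsGate({
    payload: params.payload,
    ttsAuto: "inbound",
    inboundAudio: params.inboundAudio,
  });
}

// Caller shape from the report: no inboundAudio key at all.
const skipped = messageActionSend({ payload: { text: "hello" } });
// Same call with the flag supplied.
const applied = messageActionSend({ payload: { text: "hello" }, inboundAudio: true });

console.log(skipped.ttsApplied === true); // false
console.log(applied.ttsApplied === true); // true
```

The helper itself is wired correctly; the text-only result comes entirely from the caller leaving the flag undefined.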

OpenClaw version

OpenClaw 2026.5.20 (e510042)

Operating system

Ubuntu 24.04.4 LTS

Install method

npm global

Model

OpenAI Codex runtime / Codex-backed Telegram direct agent; exact model label not exposed to the agent by the current runtime (runtime surfaced NOT_ENOUGH_INFO for the exact label).

Provider / routing chain

Telegram inbound voice -> OpenClaw Gateway -> Codex/OpenAI agent runtime -> message tool visible send -> Telegram outbound.

TTS config: messages.tts.auto: "inbound", provider microsoft, voice ru-RU-DmitryNeural, output format audio-24khz-48kbitrate-mono-mp3.

Additional provider/model setup details

The configured Microsoft TTS provider was verified separately by explicit conversion and produced audio. The observed failure is not provider synthesis failure; the message-tool path appears to suppress TTS before synthesis because inboundAudio is absent.

Logs, screenshots, and evidence

OpenClaw version:
OpenClaw 2026.5.20 (e510042)

Relevant config, redacted:
messages.tts.auto = "inbound"
messages.tts.provider = "microsoft"
messages.tts.providers.microsoft.voice = "ru-RU-DmitryNeural"

Installed dist/extensions/speech-core/runtime-api.js:

if (autoMode === "inbound" && params.inboundAudio !== true) return nextPayload;

Installed dist/message-action-runner-*.js:

return await maybeApplyTtsToPayload({
  payload: params.payload,
  cfg: params.cfg,
  channel: params.channel,
  kind: "final",
  inboundAudio: params.inboundAudio,
  ttsAuto,
  agentId: params.agentId,
  accountId: params.accountId ?? void 0
});

But handleSendAction() calls the helper without inboundAudio:

const ttsPayload = await maybeApplyTtsToMessageActionSendPayload({
  payload: sendPayload.payload,
  cfg,
  channel,
  accountId,
  agentId,
  sessionKey: input.sessionKey,
  dryRun
});

The normal final-reply dispatch path does carry the flag:

const inboundAudio = isInboundAudioContext(ctx);
...
maybeApplyTtsToReplyPayload({
  ...
  inboundAudio,
  ...
});

I also inspected the current npm latest, openclaw@2026.5.27; the same relevant message-action caller shape appears to still be present there, although I did not run the full runtime on 2026.5.27.

Impact and severity

Affected: Telegram direct users who rely on voice replies when incoming user turns are voice messages and the runtime uses the message tool for visible channel delivery.

Severity: High for voice-first Telegram workflows.

Frequency: Repeatedly observed in the affected direct chat after inbound voice messages.

Additional information

Related but not exact:

#87466 is a broader Telegram voice-delivery instability umbrella.
#83636 covers dynamic TTS tool media suppressed in message_tool_only contexts.
#81598 / PR #83543 fixed the broader message-tool TTS hook, but this appears to be a narrower residual bug: the message-action send path can pass inboundAudio, but its caller does not supply it.

Suggested fix direction:

Carry the inbound-audio context into handleSendAction() / maybeApplyTtsToMessageActionSendPayload() when the current source turn contains audio, or otherwise make message-tool sends in auto: "inbound" contexts use the same inbound audio detection as the normal final-reply delivery path.
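
A minimal sketch of that direction follows. It is not a patch against the real source: `isInboundAudioContextSketch`, `ttsHelperSketch`, `handleSendActionSketch`, the `sourceTurn` input, and the `mediaTypes` field are all hypothetical, and the sketch assumes the send handler can reach the source turn's context at all, which this report does not confirm.

```javascript
// Hypothetical stand-in for isInboundAudioContext(ctx). The real detection
// logic is not shown in this report; `mediaTypes` is an assumed field.
function isInboundAudioContextSketch(ctx) {
  return (
    Array.isArray(ctx?.mediaTypes) &&
    ctx.mediaTypes.some((t) => typeof t === "string" && t.startsWith("audio/"))
  );
}

// Stand-in for the message-action TTS helper, using the gate quoted in the report.
function ttsHelperSketch(params) {
  if (params.ttsAuto === "inbound" && params.inboundAudio !== true) {
    return params.payload;
  }
  // Hypothetical marker standing in for real speech synthesis.
  return { ...params.payload, ttsApplied: true };
}

// Send handler after the suggested change: derive the flag once from the
// source turn and pass it to the helper instead of omitting it.
function handleSendActionSketch(input) {
  const inboundAudio = isInboundAudioContextSketch(input.sourceTurn);
  return ttsHelperSketch({
    payload: input.payload,
    ttsAuto: input.ttsAuto,
    inboundAudio,
  });
}

const voiceReply = handleSendActionSketch({
  payload: { text: "hi" },
  ttsAuto: "inbound",
  sourceTurn: { mediaTypes: ["audio/ogg"] },
});
const textReply = handleSendActionSketch({
  payload: { text: "hi" },
  ttsAuto: "inbound",
  sourceTurn: { mediaTypes: [] },
});

console.log(voiceReply.ttsApplied === true); // true
console.log(textReply.ttsApplied === true); // false
```

Reusing one detection function for both the final-reply path and the message-tool path would keep the two delivery routes from drifting apart again.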

Last known good version: NOT_ENOUGH_INFO

First known bad version: OpenClaw 2026.5.20 (e510042)
