openclaw - [Bug]: message-tool Telegram replies skip auto=inbound TTS because inboundAudio is not passed

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

In a Telegram direct chat with messages.tts.auto: "inbound", visible replies sent through the message tool after an inbound voice message are delivered as text-only because the message-action TTS path does not receive the inbound-audio flag.

Steps to reproduce

  1. Run OpenClaw 2026.5.20 with Telegram configured and a working TTS provider.
  2. Configure TTS with messages.tts.auto: "inbound" and provider microsoft (observed with voice ru-RU-DmitryNeural; likely provider-independent, since the skip happens before synthesis).
  3. Use a Telegram direct / message-tool-only agent context where visible channel output is sent through message(action="send", message="...").
  4. Send a Telegram voice message to the agent.
  5. From the assistant turn, send a visible reply with message(action="send", message="...").
  6. Observe that Telegram receives only the text message and no voice/audio attachment.

Expected behavior

Because the inbound Telegram turn contains audio and messages.tts.auto is inbound, the message-tool send path should apply the configured TTS transform before Telegram delivery, or should receive enough inbound context to make the same decision as the normal final-reply path. The user should receive a voice/audio reply, not text only.

Actual behavior

The user receives only text. This was observed repeatedly for Telegram voice-message turns in the same direct chat. Setting asVoice: true on a text-only message(action="send", message=..., asVoice=true) call also delivers text only; it does not synthesize TTS from text.

Local code inspection explains the skip:

  • maybeApplyTtsToMessageActionSendPayload() forwards params.inboundAudio into maybeApplyTtsToPayload().
  • handleSendAction() calls maybeApplyTtsToMessageActionSendPayload() without passing inboundAudio.
  • maybeApplyTtsToPayload() returns early for autoMode === "inbound" unless params.inboundAudio === true.
  • The normal final-reply dispatch path does compute const inboundAudio = isInboundAudioContext(ctx) and passes it into maybeApplyTtsToReplyPayload().
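
The four observations above can be condensed into a small TypeScript model. This is a minimal sketch, not OpenClaw's code: applyTtsGate is a hypothetical stand-in for maybeApplyTtsToPayload(), the Payload shape and its audioAttached field are invented for illustration, and synthesis is stubbed out. Only the early-return condition is taken from the installed bundle quoted in this report.

```typescript
// Hypothetical payload shape; the real payload is not shown in the report.
interface Payload {
  text: string;
  audioAttached: boolean;
}

interface GateParams {
  payload: Payload;
  autoMode: string;
  inboundAudio?: boolean;
}

// Stand-in for maybeApplyTtsToPayload(). Only the early return mirrors the report.
function applyTtsGate(params: GateParams): Payload {
  const nextPayload = params.payload;
  if (params.autoMode === "inbound" && params.inboundAudio !== true) return nextPayload;
  // Synthesis is stubbed: just mark the payload as carrying audio.
  return { ...nextPayload, audioAttached: true };
}

const basePayload: Payload = { text: "hello", audioAttached: false };

// Message-tool path as reported: inboundAudio is never supplied, so it is undefined.
const viaMessageTool = applyTtsGate({ payload: basePayload, autoMode: "inbound" });

// Final-reply path as reported: the flag is computed and passed along.
const viaFinalReply = applyTtsGate({
  payload: basePayload,
  autoMode: "inbound",
  inboundAudio: true,
});

console.log(viaMessageTool.audioAttached, viaFinalReply.audioAttached); // false true
```

Because the check is a strict comparison against true, an omitted flag behaves the same as an explicit false, which matches the silent text-only fallback described here.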

OpenClaw version

OpenClaw 2026.5.20 (e510042)

Operating system

Ubuntu 24.04.4 LTS

Install method

npm global

Model

OpenAI Codex runtime / Codex-backed Telegram direct agent; exact model label not exposed to the agent by the current runtime (runtime surfaced NOT_ENOUGH_INFO for the exact label).

Provider / routing chain

Telegram inbound voice -> OpenClaw Gateway -> Codex/OpenAI agent runtime -> message tool visible send -> Telegram outbound.

TTS config: messages.tts.auto: "inbound", provider microsoft, voice ru-RU-DmitryNeural, output format audio-24khz-48kbitrate-mono-mp3.
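
For orientation, the dotted keys quoted in this report map to a nested structure roughly like the one below. This is a sketch only: it is written as a TypeScript object literal for illustration, the actual on-disk config format is not shown in the report, and the output-format setting is left out because the report does not name its key.

```typescript
// Nested view of the dotted keys from the report (illustrative, not a full config).
const ttsConfig = {
  messages: {
    tts: {
      auto: "inbound", // TTS only when the inbound turn carried audio
      provider: "microsoft",
      providers: {
        microsoft: { voice: "ru-RU-DmitryNeural" },
      },
    },
  },
};

console.log(ttsConfig.messages.tts.auto);
```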

Additional provider/model setup details

The configured Microsoft TTS provider was verified separately by explicit conversion and produced audio. The observed failure is not provider synthesis failure; the message-tool path appears to suppress TTS before synthesis because inboundAudio is absent.

Logs, screenshots, and evidence

OpenClaw version:
OpenClaw 2026.5.20 (e510042)

Relevant config, redacted:
messages.tts.auto = "inbound"
messages.tts.provider = "microsoft"
messages.tts.providers.microsoft.voice = "ru-RU-DmitryNeural"

Installed dist/extensions/speech-core/runtime-api.js:

if (autoMode === "inbound" && params.inboundAudio !== true) return nextPayload;

Installed dist/message-action-runner-*.js:

return await maybeApplyTtsToPayload({
  payload: params.payload,
  cfg: params.cfg,
  channel: params.channel,
  kind: "final",
  inboundAudio: params.inboundAudio,
  ttsAuto,
  agentId: params.agentId,
  accountId: params.accountId ?? void 0
});

But handleSendAction() calls the helper without inboundAudio:

const ttsPayload = await maybeApplyTtsToMessageActionSendPayload({
  payload: sendPayload.payload,
  cfg,
  channel,
  accountId,
  agentId,
  sessionKey: input.sessionKey,
  dryRun
});

The normal final-reply dispatch path does carry the flag:

const inboundAudio = isInboundAudioContext(ctx);
...
maybeApplyTtsToReplyPayload({
  ...
  inboundAudio,
  ...
});

I also inspected the current npm latest, openclaw@2026.5.27; the same relevant message-action caller shape appears to still be present there, although I did not run the full runtime on 2026.5.27.

Impact and severity

Affected: Telegram direct users who rely on voice replies when incoming user turns are voice messages and the runtime uses the message tool for visible channel delivery.

Severity: High for voice-first Telegram workflows.

Frequency: Repeatedly observed in the affected direct chat after inbound voice messages.

Consequence: OpenClaw silently falls back to text-only replies even though messages.tts.auto: "inbound" is configured and the TTS provider works. Users lose the expected voice-reply behavior, and agents may add manual voice-file workarounds that should not be necessary.

Additional information

Related but not exact:

  • #87466 is a broader Telegram voice-delivery instability umbrella.
  • #83636 covers dynamic TTS tool media suppressed in message_tool_only contexts.
  • #81598 / PR #83543 fixed the broader message-tool TTS hook, but this appears to be a narrower residual bug: the message-action send path can pass inboundAudio, but its caller does not supply it.

Suggested fix direction:

Carry the inbound-audio context into handleSendAction() / maybeApplyTtsToMessageActionSendPayload() when the current source turn contains audio, or otherwise make message-tool sends in auto: "inbound" contexts use the same inbound audio detection as the normal final-reply delivery path.
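
One possible reading of this direction, sketched in TypeScript under stated assumptions. The names SendActionInput, sourceTurnHasAudio, TtsSendParams, and buildTtsSendParams are hypothetical and do not come from OpenClaw. The real handleSendAction() passes more fields, and the real detection lives in isInboundAudioContext(ctx), whose signature is not shown in this report.

```typescript
// Hypothetical input shape for the send action.
interface SendActionInput {
  sessionKey: string;
  // Assumption: some marker that the source turn carried audio is available here.
  sourceTurnHasAudio?: boolean;
}

// Hypothetical subset of the arguments handed to the message-action TTS helper.
interface TtsSendParams {
  payload: { text: string };
  sessionKey: string;
  inboundAudio: boolean;
}

// Builds the helper arguments and always sets inboundAudio explicitly,
// instead of leaving it undefined as in the reported caller.
function buildTtsSendParams(input: SendActionInput, payload: { text: string }): TtsSendParams {
  return {
    payload,
    sessionKey: input.sessionKey,
    inboundAudio: input.sourceTurnHasAudio === true,
  };
}

const voiceTurnParams = buildTtsSendParams(
  { sessionKey: "s1", sourceTurnHasAudio: true },
  { text: "hi" },
);
const textTurnParams = buildTtsSendParams({ sessionKey: "s1" }, { text: "hi" });

console.log(voiceTurnParams.inboundAudio, textTurnParams.inboundAudio); // true false
```

Defaulting to false when the marker is missing keeps text-only turns on their current behavior, so only turns known to contain audio would pass the auto: "inbound" gate.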

Last known good version: NOT_ENOUGH_INFO

First known bad version: OpenClaw 2026.5.20 (e510042)
