openclaw - ✅(Solved) Fix [Bug]: Signal voice-note replies skip auto TTS when messages.tts.auto = "inbound" [1 pull requests, 1 participants]

ScientificProgrammer · 2026-04-27T23:43:32Z



GitHub stats

openclaw/openclaw#73091 • Fetched 2026-04-28 06:27:37

On OpenClaw 2026.4.23, Signal direct-chat voice notes are processed and transcribed successfully, but messages.tts.auto = "inbound" does not trigger a voice reply; the assistant replies in plain text and the gateway log shows no tts: attempt for that DM reply.

Error Message

Consequence: Voice-first workflows on Signal silently fall back to text, with no error to surface the misconfiguration.

Root Cause

#65951 — "[Bug]: Telegram DM voice-note replies skip auto TTS on 2026.4.11 when transcript replaces <media:audio> and inboundAudio becomes false" — same mechanism on Telegram, closed by @steipete with "fixed on current main." The fix landed in commit 6a67f6556885d376aca2aa1283b540bf485416c5 (fix(voice): reuse preflight transcripts across channels), which touched bluebubbles, discord, feishu, telegram, whatsapp, and core media-understanding — but did not touch the signal extension. OpenClaw's per-channel architecture (each channel handler builds its own inbound context) means the fix has to be applied per-channel, and Signal hasn't received it yet.
#64328 — "fix(qqbot): set MediaType 'audio' for voice attachments to enable TTS…" — same root cause, same per-channel fix pattern, applied to qqbot.
#62222 — "[codex] fix inbound TTS detection for audio attachments" — earlier general fix.
#26493 — "[Feature]: add runtime support for remaining /tts auto modes (inbound, tagged)" — documents that inbound is intended to be voice-reply-to-voice and is fully implemented in the auto-reply path; only the runtime command surface lacks the toggle.

Fix Action

Fix / Workaround

openclaw -> google (gemini-2.5-pro). Provider chain is not implicated; LLM is producing text correctly. The break is between LLM output and TTS gating in the dispatcher.

So the agent-side body satisfies AUDIO_HEADER_RE = /^\[Audio\b/i and contains <media:audio>. But isInboundAudioContext() at src/auto-reply/reply/dispatch-from-config.ts:146-180 still returns false at the gating point — neither the MediaType/MediaTypes path nor the body-regex path matches at that call site for this Signal turn.

I'm happy to gather further runtime evidence (e.g. instrumenting the dispatcher to log resolved ctx.MediaType for the failing turn) if that would help triage.
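
The two gate paths described above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration only: the regex is the one quoted in the report, but the function name `isInboundAudioSketch`, the context shape, and the `<media:audio>` body check are assumptions, not the actual code in `dispatch-from-config.ts`.

```typescript
// Hypothetical reconstruction for illustration; not the actual OpenClaw source.
interface InboundCtxSketch {
  MediaType?: string;
  MediaTypes?: string[];
  Body?: string;
}

// Regex quoted in the issue report.
const AUDIO_HEADER_RE = /^\[Audio\b/i;

// True when a MIME string is some audio/* type.
function isAudioMime(mime: string | undefined): boolean {
  return typeof mime === "string" && /^audio\//i.test(mime.trim());
}

function isInboundAudioSketch(ctx: InboundCtxSketch): boolean {
  // Path 1: a declared media type is audio/*.
  if (isAudioMime(ctx.MediaType)) return true;
  if ((ctx.MediaTypes ?? []).some((mime) => isAudioMime(mime))) return true;
  // Path 2 (assumed): the body starts with an [Audio header
  // or still carries the <media:audio> placeholder.
  const body = (ctx.Body ?? "").trim();
  return AUDIO_HEADER_RE.test(body) || /<media:audio>/.test(body);
}
```

Under these assumptions, a context with `MediaType: "application/octet-stream"` and a body that holds only the transcript fails both paths, which would match the text-only reply seen on Signal.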

PR fix notes

PR #73158: fix(signal): detect generic voice-note attachments

Repository: openclaw/openclaw
Author: neeravmakwana
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/73158

Description (problem / solution / changelog)

Summary

Fixes #73091 by preserving byte-sniffed audio MIME types when Signal downloads voice-note attachments that were declared as application/octet-stream.
Keeps verified Signal voice notes marked as audio/* so messages.tts.auto: "inbound" reaches the existing inbound-audio TTS gate.
Adds Signal regression coverage for both sniffed generic voice attachments and audio-looking filenames that remain unverified.

Root cause

Signal attachments can arrive with application/octet-stream. The previous Signal download path preserved that generic declared MIME even after fetching the attachment bytes, so the reply dispatcher later computed inboundAudio = false before media understanding could transcribe the audio.

Why this is safe

The fix uses the existing shared detectMime helper on the downloaded bytes and declared header MIME, then passes that verified content type through the existing Signal inbound context. Non-generic MIME values continue through the same shared detector path, and audio-looking filenames no longer upgrade generic attachments to audio/* by themselves.

Security/runtime controls are unchanged: Signal allowlist/DM policy, mention gating, attachment fetch limits, media storage, media understanding, and the TTS auto: "inbound" runtime gate all remain enforced by the existing runtime paths.
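
A minimal sketch of the idea in the PR description, under stated assumptions: the real fix calls OpenClaw's shared detectMime helper, whose signature is not shown on this page, so `sniffAudioMime` and `resolveContentType` below are hypothetical stand-ins, and the magic-byte table is a small illustrative subset. The filename is deliberately not an input, mirroring the note that audio-looking filenames do not upgrade generic attachments by themselves.

```typescript
// Hypothetical stand-ins; the real fix uses the shared detectMime helper.
const GENERIC_MIME = "application/octet-stream";

// True when `bytes` contains the ASCII `text` at `offset`.
function hasAscii(bytes: Uint8Array, offset: number, text: string): boolean {
  if (bytes.length < offset + text.length) return false;
  for (let i = 0; i < text.length; i++) {
    if (bytes[offset + i] !== text.charCodeAt(i)) return false;
  }
  return true;
}

// Illustrative subset of audio signatures. An Ogg container can also hold
// video; treating it as audio here is a simplification for the sketch.
function sniffAudioMime(bytes: Uint8Array): string | undefined {
  if (hasAscii(bytes, 0, "OggS")) return "audio/ogg";
  if (hasAscii(bytes, 0, "RIFF") && hasAscii(bytes, 8, "WAVE")) return "audio/wav";
  if (hasAscii(bytes, 0, "fLaC")) return "audio/flac";
  if (hasAscii(bytes, 0, "ID3")) return "audio/mpeg";
  return undefined;
}

// Assumed policy: keep a specific declared type; otherwise prefer the
// byte-sniffed type; otherwise stay generic.
function resolveContentType(declared: string | undefined, bytes: Uint8Array): string {
  const normalized = declared?.trim().toLowerCase();
  if (normalized && normalized !== GENERIC_MIME) return normalized;
  return sniffAudioMime(bytes) ?? GENERIC_MIME;
}
```

With this policy, a voice note declared as `application/octet-stream` whose bytes begin with `OggS` resolves to `audio/ogg`, while an unverifiable attachment stays generic no matter what its filename looks like.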

Tests

pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/signal/src/monitor.ts extensions/signal/src/monitor/event-handler.ts extensions/signal/src/monitor/event-handler.inbound-context.test.ts
pnpm test extensions/signal/src/monitor/event-handler.inbound-context.test.ts -- --reporter=verbose
git diff --check
pnpm check:changed

Out of scope

No changes to TTS provider selection, TTS auto-mode policy, or outbound audio generation.
No changes to Signal attachment download limits or Signal access policy.
No broad media MIME policy changes outside the Signal inbound download path.

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/signal/src/monitor.ts (modified, +4/-3)
extensions/signal/src/monitor/event-handler.inbound-context.test.ts (modified, +40/-4)

Code Example

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "provider": "elevenlabs",
      "providers": { "elevenlabs": { "apiKey": "<redacted>", ... } }
    }
  }
}

---

# Test window: 2026-04-27 17:23-17:25 CDT (22:23-22:25 UTC)
# Filtered for tts/elevenlabs/inbound/audio/attachment/signal — only deliveries appear, zero TTS attempts.

2026-04-27T22:23:32.998Z gateway/channels/signal: delivered reply to +19195248595
2026-04-27T22:24:49.265Z gateway/channels/signal: delivered reply to +19195248595

---

[user] [Audio]
User text:
[Signal Eric Milgram id:+19195248595 +18s Mon 2026-04-27 16:53 CDT] <media:audio>
Transcript:
Beta, if you can receive this message, please report with a status report. List the health of your gateway.

Bug type

Behavior bug (incorrect output/state without crash)

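Beta release blocker

No

Summary

On OpenClaw 2026.4.23, Signal direct-chat voice notes are processed and transcribed successfully, but messages.tts.auto = "inbound" does not trigger a voice reply; the assistant replies in plain text and the gateway log shows no tts: attempt for that DM reply.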

Steps to reproduce

1. Configure messages.tts.auto = "inbound" (and a matching prefs override ~/.openclaw/settings/tts.json tts.auto = "inbound") with a working ElevenLabs provider on 2026.4.23.
2. Restart the gateway. openclaw doctor reports Signal ok.
3. From a paired Signal DM, send the bot a voice note.
4. Observe: voice arrives, transcription works (visible to the agent), plain-text reply is delivered, no tts: lines in /tmp/openclaw/openclaw-YYYY-MM-DD.log for that turn.

Expected behavior

messages.tts.auto = "inbound" should produce a voice reply when the inbound is a voice note, per the maintainer's confirmation in #65951 (Telegram analog) that inbound mode means "voice-reply-to-voice." This is also the documented behavior in #26493.
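
The expected "voice-reply-to-voice" semantics can be written down as a small decision function. This is a sketch, not OpenClaw's implementation: `AutoModeSketch` and `shouldSynthesizeReply` are hypothetical names, and only the two modes whose behavior is described in this report are modeled (the tagged mode mentioned in #26493 is left out because its rules are not covered here).

```typescript
// Hypothetical names; only the modes described in this report are modeled.
type AutoModeSketch = "always" | "inbound";

// Decide whether a reply should be synthesized to speech.
function shouldSynthesizeReply(mode: AutoModeSketch, inboundAudio: boolean): boolean {
  // "always": every reply is spoken, whatever the inbound message was.
  if (mode === "always") return true;
  // "inbound": speak only when the inbound message was itself audio.
  return inboundAudio;
}
```

In this model the bug is not in the decision itself: if inboundAudio is computed as false for a real voice note, the "inbound" mode correctly declines to speak.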

Actual behavior

Inbound voice note is received and transcribed.
Assistant generates a plain-text reply.
Reply is delivered as text only.
Gateway log contains no TTS attempt entries for the turn (no tts, elevenlabs, audioPath, or setLastTtsAttempt lines).
~/.openclaw/media/outbound/ does not get a fresh mp3 from this turn.
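
To check a gateway log for the markers listed above, a small filter along these lines may help. It is a sketch: `findTtsLines` is a hypothetical helper, and the marker list is taken from the report rather than from any documented log format.

```typescript
// Markers named in the report; matching is case-insensitive.
const TTS_MARKERS = /tts|elevenlabs|audioPath|setLastTtsAttempt/i;

// Return every log line that mentions one of the TTS markers.
function findTtsLines(logText: string): string[] {
  return logText.split(/\r?\n/).filter((line) => TTS_MARKERS.test(line));
}
```

Reading /tmp/openclaw/openclaw-YYYY-MM-DD.log into a string and passing it to `findTtsLines` should return an empty array for a turn that never reached TTS.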

OpenClaw version

2026.4.23

Operating system

Linux Mint 21.3 "Virginia" (kernel 6.8.0-110-generic, x86_64)

Install method

npm global via nvm (/home/eric/.nvm/versions/node/v24.6.0/lib/node_modules/openclaw)

Model

google/gemini-2.5-pro

Provider / routing chain

openclaw -> google (gemini-2.5-pro). Provider chain is not implicated; LLM is producing text correctly. The break is between LLM output and TTS gating in the dispatcher.

Additional provider/model setup details

TTS provider: elevenlabs (voiceId SAz9YHcvj6GT2YYXdXww, model eleven_multilingual_v2).

TTS config:

{
  "messages": {
    "tts": {
      "auto": "inbound",
      "provider": "elevenlabs",
      "providers": { "elevenlabs": { "apiKey": "<redacted>", ... } }
    }
  }
}

Prefs file ~/.openclaw/settings/tts.json set to tts.auto = "inbound" to ensure prefs do not override the config.
ElevenLabs is known-working: with auto: "always", the same install successfully generated ~/.openclaw/media/outbound/voice-*.mp3 files in prior runs.

Logs, screenshots, and evidence

# Test window: 2026-04-27 17:23-17:25 CDT (22:23-22:25 UTC)
# Filtered for tts/elevenlabs/inbound/audio/attachment/signal — only deliveries appear, zero TTS attempts.

2026-04-27T22:23:32.998Z gateway/channels/signal: delivered reply to +19195248595
2026-04-27T22:24:49.265Z gateway/channels/signal: delivered reply to +19195248595

The agent session JSONL for the same window shows the inbound user message body did include [Audio] and <media:audio>:

[user] [Audio]
User text:
[Signal Eric Milgram id:+19195248595 +18s Mon 2026-04-27 16:53 CDT] <media:audio>
Transcript:
Beta, if you can receive this message, please report with a status report. List the health of your gateway.

Impact and severity

Affected: All Signal users running messages.tts.auto = "inbound" who expect voice-in → voice-out (the documented behavior).
Severity: Medium — inbound mode is the natural default for a voice-first assistant on Signal. Users either lose voice replies (current state) or are forced to auto: "always" and accept TTS on text replies too (wastes ElevenLabs spend on Google Chat / mixed-channel deployments).
Frequency: Reproducible 100% in our environment over 2 voice-note tests.
Consequence: Voice-first workflows on Signal silently fall back to text, with no error to surface the misconfiguration.

Additional information

This is analogous to the recently-closed:

#65951 — "[Bug]: Telegram DM voice-note replies skip auto TTS on 2026.4.11 when transcript replaces <media:audio> and inboundAudio becomes false" — same mechanism on Telegram, closed by @steipete with "fixed on current main." The fix landed in commit 6a67f6556885d376aca2aa1283b540bf485416c5 (fix(voice): reuse preflight transcripts across channels), which touched bluebubbles, discord, feishu, telegram, whatsapp, and core media-understanding — but did not touch the signal extension. OpenClaw's per-channel architecture (each channel handler builds its own inbound context) means the fix has to be applied per-channel, and Signal hasn't received it yet.
#64328 — "fix(qqbot): set MediaType 'audio' for voice attachments to enable TTS…" — same root cause, same per-channel fix pattern, applied to qqbot.
#62222 — "[codex] fix inbound TTS detection for audio attachments" — earlier general fix.
#26493 — "[Feature]: add runtime support for remaining /tts auto modes (inbound, tagged)" — documents that inbound is intended to be voice-reply-to-voice and is fully implemented in the auto-reply path; only the runtime command surface lacks the toggle.

The Signal-specific entry point is extensions/signal/src/monitor/event-handler.ts, which already passes MediaType / MediaTypes from attachment.contentType (lines 224-228, around the finalizeInboundContext call). However, no Signal counterpart to Telegram's MediaTranscribedIndexes plumbing exists, and we have not been able to confirm whether signal-cli always reports voice attachments with an audio/* contentType — that's a candidate root cause for MediaType arriving as application/octet-stream (the fallback in event-handler.ts:799), which would silently disqualify the inbound from isInboundAudioContext.

A PR will follow that mirrors the cross-channel pattern from 6a67f6556885d376aca2aa1283b540bf485416c5 for the signal extension.

I'm happy to gather further runtime evidence (e.g. instrumenting the dispatcher to log resolved ctx.MediaType for the failing turn) if that would help triage.

extent analysis

TL;DR

The most likely fix for the issue is to update the Signal extension to correctly handle voice note attachments and set the MediaType to audio, similar to the fix applied to other channels in commit 6a67f6556885d376aca2aa1283b540bf485416c5.

Guidance

Investigate the extensions/signal/src/monitor/event-handler.ts file to understand how MediaType is set for voice note attachments and why it might be arriving as application/octet-stream instead of audio/*.
Verify that signal-cli always reports voice attachments with an audio/* contentType to determine if this is a root cause of the issue.
Consider instrumenting the dispatcher to log resolved ctx.MediaType for the failing turn to gather further runtime evidence.
Review the cross-channel pattern from commit 6a67f6556885d376aca2aa1283b540bf485416c5 and apply a similar fix to the signal extension.
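
The instrumentation step in the guidance above could look roughly like this. It is a sketch under assumptions: `describeGateInputs` and its field names are hypothetical, and where the real dispatcher would emit the line is not shown on this page.

```typescript
// Hypothetical shape of the fields the gate is said to inspect.
interface GateInputsSketch {
  MediaType?: string;
  MediaTypes?: string[];
  Body?: string;
}

// Build one JSON log line describing what the gate saw for a turn.
// The body is truncated so the log does not carry the full message text.
function describeGateInputs(channel: string, ctx: GateInputsSketch): string {
  const bodyHead = (ctx.Body ?? "").slice(0, 40).replace(/\s+/g, " ");
  return JSON.stringify({
    channel,
    mediaType: ctx.MediaType ?? null,
    mediaTypes: ctx.MediaTypes ?? [],
    bodyHead,
  });
}
```

Emitting this line just before the gate runs for a failing Signal turn would show whether MediaType arrives as application/octet-stream or as an audio/* type.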

Example

No code snippet is provided as the issue requires a deeper understanding of the OpenClaw architecture and the Signal extension.

Notes

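The issue is specific to the Signal extension and requires a per-channel fix. According to PR #73158, Signal voice notes can arrive declared as application/octet-stream, and the previous Signal download path kept that generic type even after fetching the bytes, so the reply dispatcher computed inboundAudio = false.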

Recommendation

Until the Signal fix lands (PR #73158 is listed above as open and not merged), set messages.tts.auto to "always" as a workaround. Voice notes then get voice replies, but text replies are also sent through TTS, which adds provider cost.
