openclaw - ✅(Solved) Fix [Bug]: Signal voice-note replies skip auto TTS when messages.tts.auto = "inbound" [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#73091 · Fetched 2026-04-28 06:27:37
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): cross-referenced ×1, mentioned ×1, subscribed ×1

On OpenClaw 2026.4.23, Signal direct-chat voice notes are processed and transcribed successfully, but messages.tts.auto = "inbound" does not trigger a voice reply; the assistant replies in plain text and the gateway log shows no tts: attempt for that DM reply.

PR fix notes

PR #73158: fix(signal): detect generic voice-note attachments

Description (problem / solution / changelog)

Summary

  • Fixes #73091 by preserving byte-sniffed audio MIME types when Signal downloads voice-note attachments that were declared as application/octet-stream.
  • Keeps verified Signal voice notes marked as audio/* so messages.tts.auto: "inbound" reaches the existing inbound-audio TTS gate.
  • Adds Signal regression coverage for both sniffed generic voice attachments and audio-looking filenames that remain unverified.

Root cause

Signal attachments can arrive with application/octet-stream. The previous Signal download path preserved that generic declared MIME even after fetching the attachment bytes, so the reply dispatcher later computed inboundAudio = false before media understanding could transcribe the audio.

Why this is safe

The fix uses the existing shared detectMime helper on the downloaded bytes and declared header MIME, then passes that verified content type through the existing Signal inbound context. Non-generic MIME values continue through the same shared detector path, and audio-looking filenames no longer upgrade generic attachments to audio/* by themselves.
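
As a rough illustration of the idea described above, the sketch below prefers a MIME type sniffed from the downloaded bytes over the declared one and never consults the filename. It is not OpenClaw's detectMime helper: `hasMagic`, `sniffAudioMime`, and `resolveContentType` are hypothetical names, and the handful of magic numbers checked is an assumption chosen for brevity.

```typescript
// Hypothetical sketch, NOT the project's detectMime helper.
// It only illustrates "trust sniffed bytes over a generic declared MIME".

const GENERIC_MIME = "application/octet-stream";

/** True when `bytes` contains the ASCII string `magic` starting at `offset`. */
function hasMagic(bytes: Uint8Array, magic: string, offset = 0): boolean {
  if (bytes.length < offset + magic.length) return false;
  for (let i = 0; i < magic.length; i++) {
    if (bytes[offset + i] !== magic.charCodeAt(i)) return false;
  }
  return true;
}

/** Recognise a few common audio containers from their leading bytes. */
function sniffAudioMime(bytes: Uint8Array): string | undefined {
  // Simplification: an Ogg container can also carry video.
  if (hasMagic(bytes, "OggS")) return "audio/ogg";
  if (hasMagic(bytes, "fLaC")) return "audio/flac";
  if (hasMagic(bytes, "ID3")) return "audio/mpeg";
  if (hasMagic(bytes, "RIFF") && hasMagic(bytes, "WAVE", 8)) return "audio/wav";
  return undefined;
}

/**
 * Resolve the content type for a downloaded attachment.
 * The filename is deliberately not a parameter: an audio-looking name
 * alone must not upgrade a generic attachment to audio/*.
 */
function resolveContentType(bytes: Uint8Array, declared?: string): string {
  const sniffed = sniffAudioMime(bytes);
  if (sniffed) return sniffed;
  return declared ?? GENERIC_MIME;
}
```

With this shape, an attachment declared as application/octet-stream whose bytes start with a known audio signature resolves to audio/*, while unrecognised bytes keep whatever MIME was declared.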

Security/runtime controls are unchanged: Signal allowlist/DM policy, mention gating, attachment fetch limits, media storage, media understanding, and the TTS auto: "inbound" runtime gate all remain enforced by the existing runtime paths.

Tests

  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md extensions/signal/src/monitor.ts extensions/signal/src/monitor/event-handler.ts extensions/signal/src/monitor/event-handler.inbound-context.test.ts
  • pnpm test extensions/signal/src/monitor/event-handler.inbound-context.test.ts -- --reporter=verbose
  • git diff --check
  • pnpm check:changed

Out of scope

  • No changes to TTS provider selection, TTS auto-mode policy, or outbound audio generation.
  • No changes to Signal attachment download limits or Signal access policy.
  • No broad media MIME policy changes outside the Signal inbound download path.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/signal/src/monitor.ts (modified, +4/-3)
  • extensions/signal/src/monitor/event-handler.inbound-context.test.ts (modified, +40/-4)

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

On OpenClaw 2026.4.23, Signal direct-chat voice notes are processed and transcribed successfully, but messages.tts.auto = "inbound" does not trigger a voice reply; the assistant replies in plain text and the gateway log shows no tts: attempt for that DM reply.

Steps to reproduce

  1. Configure messages.tts.auto = "inbound" (and a matching prefs override ~/.openclaw/settings/tts.json tts.auto = "inbound") with a working ElevenLabs provider on 2026.4.23.
  2. Restart the gateway. openclaw doctor reports Signal ok.
  3. From a paired Signal DM, send the bot a voice note.
  4. Observe: voice arrives, transcription works (visible to the agent), plain-text reply is delivered, no tts: lines in /tmp/openclaw/openclaw-YYYY-MM-DD.log for that turn.

Expected behavior

messages.tts.auto = "inbound" should produce a voice reply when the inbound is a voice note, per the maintainer's confirmation in #65951 (Telegram analog) that inbound mode means "voice-reply-to-voice." This is also the documented behavior in #26493.
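
To make the expected semantics concrete, here is a minimal sketch of how an auto-mode gate could behave. The mode names "always", "inbound", and "tagged" appear in this report; the "off" value, the `TurnSketch` shape, and the `shouldAttemptTts` function are assumptions for illustration, not OpenClaw's actual types or API.

```typescript
// Hypothetical sketch of the auto-TTS decision; not the project's real code.
// "off" and the meaning of "tagged" are assumptions made for illustration.
type TtsAutoModeSketch = "off" | "always" | "inbound" | "tagged";

interface TurnSketch {
  /** Whether the inbound message was detected as audio (a voice note). */
  inboundAudio: boolean;
  /** Whether the reply explicitly asked for speech (assumed "tagged" semantics). */
  replyTagged: boolean;
}

function shouldAttemptTts(mode: TtsAutoModeSketch, turn: TurnSketch): boolean {
  switch (mode) {
    case "always":
      return true; // speak every reply
    case "inbound":
      return turn.inboundAudio; // voice-reply-to-voice only
    case "tagged":
      return turn.replyTagged;
    case "off":
      return false;
  }
}
```

Under this model, the bug reported here corresponds to `inboundAudio` being computed as false for a Signal voice note, so the "inbound" branch never fires even though the configuration is correct.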

Actual behavior

  • Inbound voice note is received and transcribed.
  • Assistant generates a plain-text reply.
  • Reply is delivered as text only.
  • Gateway log contains no TTS attempt entries for the turn (no tts, elevenlabs, audioPath, or setLastTtsAttempt lines).
  • ~/.openclaw/media/outbound/ does not get a fresh mp3 from this turn.

OpenClaw version

2026.4.23

Operating system

Linux Mint 21.3 "Virginia" (kernel 6.8.0-110-generic, x86_64)

Install method

npm global via nvm (/home/eric/.nvm/versions/node/v24.6.0/lib/node_modules/openclaw)

Model

google/gemini-2.5-pro

Provider / routing chain

openclaw -> google (gemini-2.5-pro). Provider chain is not implicated; LLM is producing text correctly. The break is between LLM output and TTS gating in the dispatcher.

Additional provider/model setup details

  • TTS provider: elevenlabs (voiceId SAz9YHcvj6GT2YYXdXww, model eleven_multilingual_v2).
  • TTS config:
    {
      "messages": {
        "tts": {
          "auto": "inbound",
          "provider": "elevenlabs",
          "providers": { "elevenlabs": { "apiKey": "<redacted>", ... } }
        }
      }
    }
  • Prefs file ~/.openclaw/settings/tts.json set to tts.auto = "inbound" to ensure prefs do not override the config.
  • ElevenLabs is known-working: with auto: "always", the same install successfully generated ~/.openclaw/media/outbound/voice-*.mp3 files in prior runs.

Logs, screenshots, and evidence

# Test window: 2026-04-27 17:23-17:25 CDT (22:23-22:25 UTC)
# Filtered for tts/elevenlabs/inbound/audio/attachment/signal — only deliveries appear, zero TTS attempts.

2026-04-27T22:23:32.998Z gateway/channels/signal: delivered reply to +19195248595
2026-04-27T22:24:49.265Z gateway/channels/signal: delivered reply to +19195248595

The agent session JSONL for the same window shows the inbound user message body did include [Audio] and <media:audio>:

[user] [Audio]
User text:
[Signal Eric Milgram id:+19195248595 +18s Mon 2026-04-27 16:53 CDT] <media:audio>
Transcript:
Beta, if you can receive this message, please report with a status report. List the health of your gateway.

So the agent-side body satisfies AUDIO_HEADER_RE = /^\[Audio\b/i and contains <media:audio>. But isInboundAudioContext() at src/auto-reply/reply/dispatch-from-config.ts:146-180 still returns false at the gating point — neither the MediaType/MediaTypes path nor the body-regex path matches at that call site for this Signal turn.
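
The two detection paths named above can be sketched as follows. This is a hypothetical reconstruction, not the contents of dispatch-from-config.ts: the field names MediaType and MediaTypes and the AUDIO_HEADER_RE pattern are quoted from this report, while the `Body` field, the `<media:audio>` placeholder check, and the function itself are assumptions.

```typescript
// Hypothetical reconstruction of an inbound-audio gate; the real
// isInboundAudioContext() may differ in both inputs and logic.
interface InboundContextSketch {
  MediaType?: string;
  MediaTypes?: string[];
  Body?: string; // assumed field name for the message body
}

const AUDIO_HEADER_RE = /^\[Audio\b/i; // pattern quoted in the report
const AUDIO_PLACEHOLDER = "<media:audio>";

function isAudioType(value: string | undefined): boolean {
  if (!value) return false;
  const normalized = value.trim().toLowerCase();
  // Accept both a bare "audio" marker and full audio/* MIME types.
  return normalized === "audio" || normalized.startsWith("audio/");
}

function isInboundAudioSketch(ctx: InboundContextSketch): boolean {
  // Path 1: any declared media type is audio.
  const types = [ctx.MediaType, ...(ctx.MediaTypes ?? [])];
  if (types.some(isAudioType)) return true;

  // Path 2: the body still carries an audio header or placeholder.
  const body = (ctx.Body ?? "").trimStart();
  return AUDIO_HEADER_RE.test(body) || body.includes(AUDIO_PLACEHOLDER);
}
```

In this model, a context whose MediaType is application/octet-stream and whose body has already been replaced by the bare transcript fails both paths, which matches the symptom described in this report.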

Impact and severity

  • Affected: All Signal users running messages.tts.auto = "inbound" who expect voice-in → voice-out (the documented behavior).
  • Severity: Medium — inbound mode is the natural default for a voice-first assistant on Signal. Users either lose voice replies (current state) or are forced to auto: "always" and accept TTS on text replies too (wastes ElevenLabs spend on Google Chat / mixed-channel deployments).
  • Frequency: Reproducible 100% in our environment over 2 voice-note tests.
  • Consequence: Voice-first workflows on Signal silently fall back to text, with no error to surface the misconfiguration.

Additional information

This is analogous to the recently-closed:

  • #65951 "[Bug]: Telegram DM voice-note replies skip auto TTS on 2026.4.11 when transcript replaces <media:audio> and inboundAudio becomes false": same mechanism on Telegram, closed by @steipete with "fixed on current main." The fix landed in commit 6a67f6556885d376aca2aa1283b540bf485416c5 (fix(voice): reuse preflight transcripts across channels), which touched bluebubbles, discord, feishu, telegram, whatsapp, and core media-understanding, but did not touch the signal extension. OpenClaw's per-channel architecture (each channel handler builds its own inbound context) means the fix has to be applied per-channel, and Signal hasn't received it yet.
  • #64328 "fix(qqbot): set MediaType 'audio' for voice attachments to enable TTS…": same root cause, same per-channel fix pattern, applied to qqbot.
  • #62222 "[codex] fix inbound TTS detection for audio attachments": earlier general fix.
  • #26493 "[Feature]: add runtime support for remaining /tts auto modes (inbound, tagged)": documents that inbound is intended to be voice-reply-to-voice and is fully implemented in the auto-reply path; only the runtime command surface lacks the toggle.

The Signal-specific entry point is extensions/signal/src/monitor/event-handler.ts, which already passes MediaType / MediaTypes from attachment.contentType (lines 224-228, around the finalizeInboundContext call). However, no Signal counterpart to Telegram's MediaTranscribedIndexes plumbing exists, and we have not been able to confirm whether signal-cli always reports voice attachments with an audio/* contentType. If it does not, MediaType would arrive as application/octet-stream (the fallback in event-handler.ts:799), which would silently disqualify the inbound from isInboundAudioContext. That is a candidate root cause.

A PR will follow that mirrors the cross-channel pattern from 6a67f6556885d376aca2aa1283b540bf485416c5 for the signal extension.

I'm happy to gather further runtime evidence (e.g. instrumenting the dispatcher to log resolved ctx.MediaType for the failing turn) if that would help triage.
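
One way the instrumentation mentioned above might look is a small formatter that summarises the gate inputs for a single turn. Everything here is hypothetical: `GateContextSketch`, `describeGateInputs`, the `Body` field, and the "tts-gate-debug:" prefix are illustrative names, not existing OpenClaw log lines or APIs.

```typescript
// Hypothetical debug helper; names and log format are assumptions.
interface GateContextSketch {
  MediaType?: string;
  MediaTypes?: string[];
  Body?: string; // assumed field name for the message body
}

const AUDIO_HEADER_RE = /^\[Audio\b/i; // pattern quoted in the report

/** Build a one-line summary of the inputs an inbound-audio gate would see. */
function describeGateInputs(ctx: GateContextSketch): string {
  const body = ctx.Body ?? "";
  return [
    "tts-gate-debug:",
    `MediaType=${ctx.MediaType ?? "(unset)"}`,
    `MediaTypes=${JSON.stringify(ctx.MediaTypes ?? [])}`,
    // Only a boolean is logged, so message text stays out of the log.
    `bodyHasAudioHeader=${AUDIO_HEADER_RE.test(body.trimStart())}`,
  ].join(" ");
}
```

Emitting such a line immediately before the gate is evaluated would show whether the failing turn carries a generic MIME type, a stripped body, or both.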

Extended analysis

TL;DR

The most likely fix for the issue is to update the Signal extension to correctly handle voice note attachments and set the MediaType to audio, similar to the fix applied to other channels in commit 6a67f6556885d376aca2aa1283b540bf485416c5.

Guidance

  • Investigate the extensions/signal/src/monitor/event-handler.ts file to understand how MediaType is set for voice note attachments and why it might be arriving as application/octet-stream instead of audio/*.
  • Verify that signal-cli always reports voice attachments with an audio/* contentType to determine if this is a root cause of the issue.
  • Consider instrumenting the dispatcher to log resolved ctx.MediaType for the failing turn to gather further runtime evidence.
  • Review the cross-channel pattern from commit 6a67f6556885d376aca2aa1283b540bf485416c5 and apply a similar fix to the signal extension.

Example

No code snippet is provided as the issue requires a deeper understanding of the OpenClaw architecture and the Signal extension.

Notes

The issue is specific to the Signal extension and requires a per-channel fix. The provided information suggests that the root cause is related to how voice note attachments are handled and the MediaType is set.

Recommendation

Apply a workaround by setting messages.tts.auto to "always" until a proper fix is implemented for the Signal extension. This will ensure that voice replies are generated for voice notes, but it may also generate unnecessary TTS for text replies.
