openclaw - ✅(Solved) Fix BlueBubbles native iOS voice-memo delivery broken end-to-end with ElevenLabs (and other non-Azure TTS providers) [3 pull requests, 3 comments, 2 participants]

omarshahine · 2026-04-27T02:09:22Z


openclaw/openclaw#72506 (fetched 2026-04-28 06:35:07)


Sending TTS audio to a BlueBubbles iMessage chat using the bundled tts agent tool (or tts.convert RPC) currently always renders as a plain audio attachment in iMessage, never as a native iOS voice memo (the bubble with the waveform / scrubber UI). Two distinct upstream gaps in the same pipeline are conspiring to make this delivery mode unreachable for any non-Azure TTS provider, even though every individual link in the chain otherwise works.

Error Message

"BlueBubbles voice messages require audio media (mp3 or caf)."
"BlueBubbles voice messages require mp3 or caf audio (convert before sending)."

Root Cause

Overriding outputFormat: "mp3_44100_128" to coax MP3 out of ElevenLabs doesn't fix it either, because fileExtension is hardcoded to .opus whenever target === "voice-note", regardless of the actual format. BlueBubbles would then receive an .opus filename with MP3 bytes, so voiceInfo.isMp3 (derived from the filename) would be false.

Fix Action

Fix / Workaround

{
  messages: {
    tts: {
      provider: "elevenlabs",
      providers: {
        elevenlabs: {
          apiKey: "<literal sk_… key (workaround for #72496)>",
          voiceId: "<voice-id>",
          model: "eleven_v3",
          outputFormat: "mp3_44100_128"
        }
      }
    }
  }
}

PR fix notes

PR #72564: fix(tts): pick file extension from output format and expose target

Repository: openclaw/openclaw
Author: volcano303
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/72564

Description (problem / solution / changelog)

Fixes #72506

Problem

Two gaps in the TTS-to-BlueBubbles voice-memo pipeline:

1. ElevenLabs file extension didn't match the resolved output format. The provider hardcoded fileExtension: req.target === "voice-note" ? ".opus" : ".mp3", so when a caller (or the channel mapping) overrode outputFormat (e.g. requesting mp3_44100_128 for BlueBubbles voice memos, which reject opus), the audio bytes were mp3 but the file landed as .opus, and BlueBubbles refused it as a voice memo.
2. No way to request a synthesis target from the agent or RPC. Both the bundled tts agent tool and the tts.convert gateway RPC only accepted channel; there was no parameter to override the channel-derived audio-file ↔ voice-note decision.

Fix

Add deriveElevenLabsFileExtension(outputFormat) and use it in the ElevenLabs synthesize() return so the on-disk extension follows the actual codec (mp3_* → .mp3, opus_* → .opus, flac_* → .flac, pcm_*/ulaw_* → .wav, unknown → .mp3).
Add an optional target?: TtsSpeechTarget to TtsRequestParams and thread it through synthesizeSpeech / textToSpeech. When omitted, the channel-derived default is unchanged.
Expose target on the tts agent tool schema (with a ToolInputError for invalid values) and on the tts.convert RPC handler (with an INVALID_REQUEST response for invalid values).
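The extension rule in the first bullet is small enough to sketch in full. This is a minimal TypeScript sketch written from the PR description, not the code from PR #72564; it assumes ElevenLabs output formats follow the codec_samplerate[_bitrate] shape seen above (mp3_44100_128, opus_48000_64) and that codec matching is case-insensitive, as the PR's uppercase test case suggests.

```typescript
// Sketch of the codec-to-extension rule described in PR #72564.
// Not the merged implementation; the mapping table follows the PR text.
function deriveElevenLabsFileExtension(outputFormat: string | undefined): string {
  // Formats look like "mp3_44100_128" or "opus_48000_64";
  // the codec is the segment before the first underscore.
  const codec = (outputFormat ?? "").trim().toLowerCase().split("_")[0];
  switch (codec) {
    case "mp3":
      return ".mp3";
    case "opus":
      return ".opus";
    case "flac":
      return ".flac";
    case "pcm":
    case "ulaw":
      return ".wav";
    default:
      // Unknown or missing codec: fall back to the provider's historical default.
      return ".mp3";
  }
}
```

With this rule, a voice-note request that pins outputFormat: "mp3_44100_128" lands on disk as .mp3, which is what the BlueBubbles filename check needs.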

No behavior change when callers don't set target and outputs use the default codec — both paths continue producing the same bytes and extension.
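The optional target can be illustrated with the sketch below. TtsSpeechTarget and its two values come from the PR; parseTtsTarget, resolveSpeechTarget and the channelDefault argument are hypothetical names, and a plain Error stands in for the ToolInputError / INVALID_REQUEST responses the PR describes. It shows the two properties the new tests cover: an explicit target is forwarded, and an omitted one leaves the channel-derived default untouched.

```typescript
type TtsSpeechTarget = "audio-file" | "voice-note";

const VALID_TARGETS: readonly TtsSpeechTarget[] = ["audio-file", "voice-note"];

// Hypothetical validator: undefined/null means "not provided";
// anything else must be one of the known targets.
function parseTtsTarget(raw: unknown): TtsSpeechTarget | undefined {
  if (raw === undefined || raw === null) return undefined;
  if (typeof raw === "string" && (VALID_TARGETS as readonly string[]).includes(raw)) {
    return raw as TtsSpeechTarget;
  }
  // The real tool throws ToolInputError and the RPC answers INVALID_REQUEST.
  throw new Error(`invalid target: ${String(raw)} (expected ${VALID_TARGETS.join(" | ")})`);
}

// Hypothetical resolution step: an explicit target overrides the channel-derived default.
function resolveSpeechTarget(
  explicit: TtsSpeechTarget | undefined,
  channelDefault: TtsSpeechTarget,
): TtsSpeechTarget {
  return explicit ?? channelDefault;
}
```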

Test plan

pnpm exec vitest run src/agents/tools/tts-tool.test.ts — 14/14 (3 new: target forwarded, invalid target rejected, optional target preserves channel-derived default)
New deriveElevenLabsFileExtension cases in extensions/elevenlabs/speech-provider.test.ts (mp3/opus/flac/pcm/ulaw + uppercase + unknown-codec fallback)
pnpm tsgo:core:test — clean
pnpm lint — clean
pnpm check:architecture — 0 cycles
Manual: BlueBubbles agent + ElevenLabs TTS, confirm voice-memo waveform UI on iOS

Changed files

extensions/elevenlabs/speech-provider.test.ts (modified, +25/-1)
extensions/elevenlabs/speech-provider.ts (modified, +24/-1)
extensions/speech-core/src/tts.ts (modified, +6/-1)
src/agents/tools/tts-tool.test.ts (modified, +45/-0)
src/agents/tools/tts-tool.ts (modified, +19/-0)
src/gateway/server-methods/tts.ts (modified, +17/-0)
src/plugin-sdk/tts-runtime.types.ts (modified, +5/-0)

PR #72586: fix(tts): pre-transcode synthesized audio to channel-preferred container before voice-memo delivery

Repository: openclaw/openclaw
Author: omarshahine
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/72586

Description (problem / solution / changelog)

Summary

Fixes #72506. After end-to-end testing on a real macOS + BlueBubbles + ElevenLabs stack, voice-memo replies from agents now render as native iMessage voice-memo bubbles (waveform UI, real duration) instead of plain file attachments.

The fix is a small, opt-in channel capability (tts.voice.preferAudioFileFormat) plus a macOS afconvert-backed pre-transcode in the speech-core pipeline. BlueBubbles declares preferAudioFileFormat: "caf" and the speech-core layer transcodes synthesized MP3 to opus-in-CAF before handing the file to the channel. Other channels are unaffected.

Diagnostic journey

The discovery process iterated through three CAF flavors. The descriptor block matters at every hop along OpenClaw → BlueBubbles server → Messages.app private API → iMessage:

| Pre-encoded CAF flavor | BlueBubbles' internal CAF→MP3 conversion | iMessage rendering |
|---|---|---|
| (no fix; MP3 + isAudioMessage) | Renames to .caf, conversion fails (race) | Plain audio attachment |
| PCM int16 @ 44.1 kHz | Conversion fails | Voice-memo bubble, 0 s duration |
| AAC @ 22.05 kHz mono | Conversion succeeds → silent downgrade | Plain audio attachment |
| Opus @ 24 kHz mono | n/a (passes through) | Native voice memo, real duration + waveform |

What unlocked it was inspecting an Apple-recorded voice memo: the native "Audio Message.caf" file that Apple's Messages.app produces when the user holds the mic button. Its descriptor is exactly 1 ch, 24000 Hz, opus, 480 frames/packet, and afconvert -f caff -d opus@24000 -c 1 produces a byte-identical container. iMessage uses that descriptor block as its native voice-memo recognizer; anything else gets downgraded somewhere along the path.

The AAC row in particular was the surprising one: BlueBubbles' internal CAF→MP3 conversion succeeded against AAC-CAF, and BlueBubbles' code path then sent the converted MP3 as audio/mp3 instead of forwarding the original CAF, silently downgrading from voice-memo bubble to plain attachment. PCM-CAF tripped the same conversion logic in the failure direction, which (counter-intuitively) made BlueBubbles fall back to forwarding the CAF — getting most of the way to a voice memo, except iMessage couldn't compute a duration from raw-PCM CAF, so the bubble showed 0 s.

A second, independent gap surfaced along the way: OpenClaw's auto-reply host-local-media validator uses the bundled file-type library to verify outbound buffers, and file-type v22 has no native CAF detector. Without the magic-byte fallback below, the validator drops the pre-transcoded buffer as an unknown binary blob and the agent ends up sending "⚠️ Media failed." instead of the voice memo. Adding a four-byte caff magic sniff in src/media/mime.ts returns audio/x-caf, which the validator already classifies as audio.
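A sniff of this kind can be sketched as follows. CAF files begin with the ASCII file type "caff" (bytes 63 61 66 66), per Apple's Core Audio Format specification; sniffCafMime is a hypothetical standalone helper, since the real fallback lives inside sniffMime in src/media/mime.ts and its signature is not shown here.

```typescript
// Hypothetical helper illustrating the CAF magic-byte fallback.
// A Core Audio Format file header starts with the ASCII bytes "caff".
const CAF_MAGIC = [0x63, 0x61, 0x66, 0x66]; // "c" "a" "f" "f"

function sniffCafMime(bytes: Uint8Array): "audio/x-caf" | undefined {
  if (bytes.length < CAF_MAGIC.length) return undefined;
  for (let i = 0; i < CAF_MAGIC.length; i++) {
    if (bytes[i] !== CAF_MAGIC[i]) return undefined;
  }
  return "audio/x-caf";
}
```

In a validator this would run only after file-type returns nothing, so formats file-type already recognizes keep their existing MIME types.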

Pipeline pieces

src/channels/plugins/types.core.ts — extend ChannelTtsVoiceDeliveryCapabilities with optional preferAudioFileFormat?: string. Doc comment explains the intent.
extensions/speech-core/src/audio-transcode.ts (new) — transcodeAudioBuffer helper. macOS-only afconvert path; quietly returns undefined on any unsupported pair, missing platform, or process failure. Ships the MP3→CAF recipe used by BlueBubbles voice memos (-f caff -d opus@24000 -c 1) and a CAF→m4a fallback for symmetry with what BlueBubbles itself attempts.
extensions/speech-core/src/tts.ts — call the helper between synthesis and file-write inside textToSpeech. When transcoded, swap audioBuffer / fileExtension / outputFormat and use the new values for both the on-disk path and the shouldDeliverTtsAsVoice decision so the resulting audioAsVoice flag reflects the actual file shape that lands on the channel.
extensions/bluebubbles/src/channel-shared.ts — declare preferAudioFileFormat: "caf" on BlueBubbles capabilities, with a comment pointing at the Messages.app voice-memo descriptor so future readers know what the format choice protects.
src/media/mime.ts — add audio/x-caf → .caf to EXT_BY_MIME, plus a small caff-magic-bytes fallback in sniffMime so host-local validators recognize CAF as audio when file-type doesn't.
Tests:
- extensions/speech-core/src/audio-transcode.test.ts (new) — covers the no-op cases (matching extensions, unsupported recipe, empty source) and platform-portable assertion that off-Darwin always returns undefined without invoking the binary.
- src/media/mime.test.ts — adds two regression cases for the CAF magic-byte sniff (with and without a corroborating filename).
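The ordering described for tts.ts can be sketched as below. Everything here (SynthesizedAudio, Transcode, applyPreferredContainer) is hypothetical and synchronous for brevity; the sketch only illustrates that the channel preference is consulted after synthesis, and that the returned buffer, extension and output format are the values every later step (the on-disk path and the shouldDeliverTtsAsVoice decision) should read.

```typescript
// Hypothetical shapes; the real textToSpeech internals are not shown in the PR text.
interface SynthesizedAudio {
  audioBuffer: Uint8Array;
  fileExtension: string; // e.g. ".mp3"
  outputFormat: string; // e.g. "mp3_44100_128"
}

// Returns the converted bytes, or undefined when the pair is unsupported.
type Transcode = (buffer: Uint8Array, fromExt: string, toExt: string) => Uint8Array | undefined;

// Swap in the transcoded buffer when the channel prefers another container;
// keep the original untouched when the transcoder declines.
function applyPreferredContainer(
  audio: SynthesizedAudio,
  preferAudioFileFormat: string | undefined,
  transcode: Transcode,
): SynthesizedAudio {
  if (!preferAudioFileFormat) return audio; // channel declares no preference
  const targetExt = `.${preferAudioFileFormat.replace(/^\./, "")}`;
  if (targetExt === audio.fileExtension) return audio; // already in place
  const converted = transcode(audio.audioBuffer, audio.fileExtension, targetExt);
  if (!converted) return audio;
  return {
    audioBuffer: converted,
    fileExtension: targetExt,
    // Placeholder label: the PR does not say which outputFormat value is used.
    outputFormat: preferAudioFileFormat,
  };
}
```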

Behavior matrix

| Host platform | Channel `preferAudioFileFormat` | Source format | Result |
|---|---|---|---|
| macOS | `caf` | mp3 | Pre-transcoded to opus-in-CAF; uploaded with `isAudioMessage=true`; renders as native voice-memo bubble in iMessage |
| macOS | unset (other channels) | any | Unchanged behavior |
| Linux/Windows | `caf` | mp3 | `transcodeAudioBuffer` returns `undefined`; original MP3 buffer preserved (BlueBubbles is macOS-only anyway) |
| any | matches source already | any | Helper returns `undefined`; no extra work |
| any | recipe not implemented | any | Helper returns `undefined`; original buffer preserved |
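The behavior matrix maps onto a helper shaped roughly like this. It is a sketch, not the code in extensions/speech-core/src/audio-transcode.ts: the recipe table, the planTranscode split and the injectable platform parameter (added so the off-Darwin row is testable anywhere) are assumptions. The afconvert flags are the ones quoted in the PR; the CAF to m4a fallback is left out because its flags are not given.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const execFileAsync = promisify(execFile);

// Hypothetical recipe table keyed by "from->to". Only the MP3 -> opus-in-CAF
// recipe quoted in the PR is included; every other pair counts as "not implemented".
const RECIPES: Record<string, string[]> = {
  "mp3->caf": ["-f", "caff", "-d", "opus@24000", "-c", "1"],
};

const bare = (ext: string): string => ext.replace(/^\./, "").toLowerCase();

// Pure decision step mirroring the behavior matrix: afconvert flags, or
// undefined when the helper should leave the original buffer alone.
function planTranscode(platform: string, fromExt: string, toExt: string): string[] | undefined {
  if (platform !== "darwin") return undefined; // afconvert is a macOS tool
  const from = bare(fromExt);
  const to = bare(toExt);
  if (from === to) return undefined; // already in the preferred container
  return RECIPES[`${from}->${to}`]; // undefined when no recipe exists
}

// Sketch of the helper: temp file in, afconvert, temp file out.
// Any failure (missing binary, bad input, I/O error) yields undefined.
async function transcodeAudioBuffer(
  source: Buffer,
  fromExt: string,
  toExt: string,
  platform: string = process.platform,
): Promise<Buffer | undefined> {
  if (source.length === 0) return undefined;
  const flags = planTranscode(platform, fromExt, toExt);
  if (!flags) return undefined;
  let dir: string | undefined;
  try {
    dir = await mkdtemp(join(tmpdir(), "tts-transcode-"));
    const input = join(dir, `input.${bare(fromExt)}`);
    const output = join(dir, `output.${bare(toExt)}`);
    await writeFile(input, source);
    await execFileAsync("afconvert", [...flags, input, output]);
    return await readFile(output);
  } catch {
    return undefined;
  } finally {
    if (dir) await rm(dir, { recursive: true, force: true }).catch(() => undefined);
  }
}
```

Keeping the decision pure means the no-op rows of the matrix can be unit-tested on any OS without spawning a process.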

Tests

pnpm exec vitest run src/media/mime.test.ts extensions/speech-core/src/audio-transcode.test.ts — 63/63 pass (includes existing tests; new cases for CAF sniff + transcode no-op paths).
pnpm exec tsc --noEmit -p tsconfig.json clean.
End-to-end manual on macOS Apple Silicon + BlueBubbles + ElevenLabs: [[tts:...]] directive in agent reply → native iMessage voice-memo bubble with real duration and waveform.

Test plan

Unit tests pass on macOS Apple Silicon
TypeScript checks pass
E2E: real device renders the result as a native voice-memo bubble
Reviewer with a BlueBubbles + macOS setup: send any TTS-tagged reply through any agent and confirm voice-memo bubble UI
Reviewer on Linux: confirm non-Darwin path returns the unchanged MP3 buffer (no regression for other channels)
Reviewer with Discord/Slack/Telegram TTS: confirm those channels continue to receive their existing format (no preferAudioFileFormat declared, no pre-transcode)

🤖 Generated with Claude Code

Changed files

CHANGELOG.md (modified, +4/-0)
extensions/bluebubbles/src/channel-shared.ts (modified, +7/-0)
extensions/speech-core/src/audio-transcode.test.ts (added, +64/-0)
extensions/speech-core/src/audio-transcode.ts (added, +134/-0)
extensions/speech-core/src/tts.test.ts (modified, +79/-0)
extensions/speech-core/src/tts.ts (modified, +67/-5)
src/channels/plugins/types.core.ts (modified, +19/-0)
src/media/mime.test.ts (modified, +17/-0)
src/media/mime.ts (modified, +18/-2)

PR #73111: fix(gateway): strip SecretRef apiKey from messages.tts.providers before talk.config hands it to speech providers

Repository: openclaw/openclaw
Author: omarshahine
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/73111

Description (problem / solution / changelog)

Summary

Closes the gap left by #72496 on the parallel messages.tts.providers.<id>.apiKey site. After #72496 landed, talk.config still throws unresolved SecretRef whenever an operator pins their TTS apiKey as a SecretRef on the messages.tts side — visible in production with the exact same user-facing symptom #72496 was filed for (iOS / macOS / Control UI Talk overlays silently fall back to local AVSpeechSynthesizer because the discovery handshake errors out).

Reproduction on a build that contains 8ce4f8fc84:

$ openclaw gateway call talk.config
Gateway call failed: GatewayClientRequestError: Error: messages.tts.providers.elevenlabs.apiKey: unresolved SecretRef "file:secrets:/skills/elevenlabs". Resolve this command against an active gateway runtime snapshot before reading it.

The throwing call site is the strict-resolver normalizeResolvedSecretInputString invocation inside (e.g.) extensions/elevenlabs/speech-provider.ts::normalizeElevenLabsProviderConfig, which reads raw?.apiKey straight off baseTtsConfig.providers.elevenlabs and calls the strict normalizer on it — exactly the same shape as the bug #72496 fixed for talk.providers.

Detailed write-up at #73109.

Fix

Mirror #72496's approach. Add stripUnresolvedSecretApiKeysFromBaseTtsProviders in src/gateway/server-methods/talk.ts, walking each entry of baseTtsConfig.providers and applying the existing stripUnresolvedSecretApiKey helper. Apply at the resolveTalkResponseFromConfig call site so the base TTS config handed down to speechProvider.resolveTalkConfig({ baseTtsConfig }) no longer carries unresolved SecretRef wrappers on apiKey.

The strip is conservative — it only mutates when at least one provider entry's apiKey was a non-string, non-undefined value (i.e. a SecretRef-shaped object). All other entries pass through unchanged, including ones that already carry resolved string keys.
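The conservative strip can be sketched like this. The function name is taken from the PR, but the types and the inline logic are assumptions (the PR reuses an existing stripUnresolvedSecretApiKey helper whose signature is not shown). The key property is the one the reviewer checklist asks about: when no entry holds a SecretRef-shaped apiKey, the original base object is returned unchanged.

```typescript
// Hypothetical shapes for illustration; the real OpenClaw config types are richer.
type ProviderEntry = { apiKey?: unknown; [key: string]: unknown };
type BaseTtsConfig = { providers?: Record<string, ProviderEntry>; [key: string]: unknown };

// Drop apiKey values that are neither strings nor undefined (i.e. SecretRef-shaped
// objects). Returns the original reference when nothing needed stripping.
function stripUnresolvedSecretApiKeysFromBaseTtsProviders(base: BaseTtsConfig): BaseTtsConfig {
  const providers = base.providers;
  if (!providers) return base;
  let mutated = false;
  const next: Record<string, ProviderEntry> = {};
  for (const [id, entry] of Object.entries(providers)) {
    const key = entry?.apiKey;
    if (key !== undefined && typeof key !== "string") {
      // Copy the entry without apiKey; the input object is never modified.
      const { apiKey: _unresolved, ...rest } = entry;
      next[id] = rest;
      mutated = true;
    } else {
      next[id] = entry;
    }
  }
  return mutated ? { ...base, providers: next } : base;
}
```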

Files

src/gateway/server-methods/talk.ts — new stripUnresolvedSecretApiKeysFromBaseTtsProviders(base) helper plus the call at the existing resolveTalkResponseFromConfig site (line 376), upstream of speechProvider.resolveTalkConfig({ baseTtsConfig }). Doc comment cross-links #72496 so the relationship between the two patches is visible at the seam.
src/gateway/server.talk-config.test.ts — new it("does not throw when SecretRef apiKey on messages.tts.providers flows through a strict provider resolver", ...) regression. Mirrors #72496's strict-resolver fixture but configures the SecretRef on messages.tts.providers.<id>.apiKey instead of talk.providers.<id>.apiKey. Verified the test fails on the parent commit (the production-reported error) and passes on this branch.

Test plan

pnpm exec vitest run src/gateway/server.talk-config.test.ts — 10/10 pass on this branch (including the new regression)
Confirmed the new test fails on origin/main with the exact production-reported unresolved SecretRef "file:secrets:/skills/elevenlabs" error
Manual: a gateway running this build with messages.tts.providers.elevenlabs.apiKey: { source, provider, id } SecretRef returns a clean talk.config response (provider: "elevenlabs", full resolved config, redacted apiKey for read-scope callers)
Reviewer: verify the strip pattern doesn't accidentally mutate plain-string apiKeys (the helper short-circuits when mutated === false, returning the original base reference)
Reviewer: confirm there are no other strict resolvers reading apiKey out of baseTtsConfig.providers via a different shape that would need an additional strip pass

Related

#72496 — fixed the talk.providers side; this PR closes the parallel messages.tts.providers gap.
#72506 — separate BlueBubbles voice-memo issue; not affected by either patch.

🤖 Generated with Claude Code

Changed files

CHANGELOG.md (modified, +1/-0)
src/gateway/server-methods/talk.ts (modified, +55/-1)
src/gateway/server.talk-config.test.ts (modified, +167/-0)


Summary

Pipeline (what should happen)

For native voice-memo rendering, the chain must complete:

TTS provider returns voiceCompatible: true for the synthesized clip.
The bundled tts agent tool sets details.media.audioAsVoice = true based on result.audioAsVoice || result.voiceCompatible (src/agents/tools/tts-tool.ts:97).
The reply-delivery layer propagates audioAsVoice through to the BlueBubbles channel monitor (extensions/bluebubbles/src/monitor-processing.ts:1689 reads payload.audioAsVoice === true into asVoice).
extensions/bluebubbles/src/attachments.ts:134-188 flips wantsVoice = true and adds the isAudioMessage=true form field on the upload.
The BlueBubbles server converts MP3 → CAF and posts via the private API as a native iMessage voice memo.

Where the chain breaks

Gap 1 — `target=voice-note` is never set when delivering TTS to BlueBubbles

extensions/elevenlabs/speech-provider.ts:514 only marks voiceCompatible: true when req.target === "voice-note". But there's no path that sets target = "voice-note" automatically based on the destination channel:

tts.convert RPC handler (src/gateway/server-methods/tts.ts:92-144) does not accept a target param. It calls textToSpeech({ text, cfg, channel, overrides, disableFallback }) — the channel is forwarded, but I cannot find any branch in the runtime that maps channel === "bluebubbles" → target = "voice-note".
The bundled tts agent tool (src/agents/tools/tts-tool.ts) likewise has no target param in TtsToolSchema and does not set one explicitly.
Adding [[audio_as_voice]] to the input text (or passing "target": "voice-note" directly to tts.convert) does not cause the synthesis to flip — voiceCompatible stays false (verified on v2026.4.24, see repro below).

Gap 2 — ElevenLabs returns opus for voice-note target, but BlueBubbles rejects opus

Even if Gap 1 were closed, extensions/elevenlabs/speech-provider.ts:469-513 defaults to opus_48000_64 with file extension .opus whenever req.target === \"voice-note\":

const outputFormat =
  trimToUndefined(overrides.outputFormat) ??
  (req.target === "voice-note" ? "opus_48000_64" : "mp3_44100_128");
// ...
fileExtension: req.target === "voice-note" ? ".opus" : ".mp3",
voiceCompatible: req.target === "voice-note",

But extensions/bluebubbles/src/attachments.ts:170-188 requires MP3 or CAF for isAudioMessage=true and explicitly rejects opus:

if (isAudioMessage) {
  const voiceInfo = resolveVoiceInfo(filename, contentType);
  if (!voiceInfo.isAudio) { throw new Error("BlueBubbles voice messages require audio media (mp3 or caf)."); }
  if (voiceInfo.isMp3) { /* ok */ }
  else if (voiceInfo.isCaf) { /* ok */ }
  else { throw new Error("BlueBubbles voice messages require mp3 or caf audio (convert before sending)."); }
}
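To see why an .opus filename carrying MP3 bytes trips the second error, here is a hypothetical re-creation of the check. The real resolveVoiceInfo is not shown in the issue; this sketch assumes, as the root-cause note states, that isMp3 and isCaf come from the filename, with the content type used only to decide whether the file is audio at all.

```typescript
// Hypothetical re-creation; the real resolveVoiceInfo may differ.
interface VoiceInfo {
  isAudio: boolean;
  isMp3: boolean;
  isCaf: boolean;
}

function resolveVoiceInfoSketch(filename: string, contentType?: string): VoiceInfo {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  const type = (contentType ?? "").toLowerCase();
  // Container flags come from the filename only (the assumption above).
  const isMp3 = ext === "mp3";
  const isCaf = ext === "caf";
  const isAudio = type.startsWith("audio/") || ["mp3", "caf", "opus", "m4a", "wav"].includes(ext);
  return { isAudio, isMp3, isCaf };
}

// Mirrors the guard quoted above, returning the error message instead of throwing.
function voiceMemoRejection(filename: string, contentType?: string): string | undefined {
  const info = resolveVoiceInfoSketch(filename, contentType);
  if (!info.isAudio) return "BlueBubbles voice messages require audio media (mp3 or caf).";
  if (info.isMp3 || info.isCaf) return undefined;
  return "BlueBubbles voice messages require mp3 or caf audio (convert before sending).";
}
```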

Net effect

There is no provider+channel combination today (other than possibly Azure Speech, which has explicit voiceNoteOutputFormat config) that can produce a TTS clip BlueBubbles will accept as a native voice memo. The isAudioMessage/asVoice plumbing on the BlueBubbles side is fully wired and works (extensions/bluebubbles/src/actions.ts:448 accepts an explicit asVoice param on direct attachment posts) — but the agent-facing surfaces (tts tool, tts.convert, auto-reply delivery) cannot reach it for synthesized speech.

Reproduction

Environment:

OpenClaw v2026.4.24 (file-backed secrets, macOS LaunchAgent, BlueBubbles bundled channel)
BlueBubbles server with private API enabled (verified separately — asVoice works for non-TTS attachments via bluebubbles_send_attachment with asVoice: true)

Config:

{
  messages: {
    tts: {
      provider: "elevenlabs",
      providers: {
        elevenlabs: {
          apiKey: "<literal sk_… key (workaround for #72496)>",
          voiceId: "<voice-id>",
          model: "eleven_v3",
          outputFormat: "mp3_44100_128"
        }
      }
    }
  }
}

Tests (all return voiceCompatible: false):

openclaw gateway call tts.convert --params '{"text":"hi"}'
openclaw gateway call tts.convert --params '{"text":"hi","channel":"bluebubbles"}'
openclaw gateway call tts.convert --params '{"text":"hi","channel":"bluebubbles","target":"voice-note"}'
openclaw gateway call tts.convert --params '{"text":"[[audio_as_voice]] hi","channel":"bluebubbles"}'

Same result via the agent-facing tts tool: the tool result details show provider: "elevenlabs", details.media carries no audioAsVoice flag, and BlueBubbles renders a generic audio attachment instead of a native voice memo.

Suggested fix

Two complementary changes that together unblock the pipeline:

Auto-target voice-note for voice-capable channels (or expose target on the agent surface). When textToSpeech({ channel }) is called with a channel whose downstream supports voice-memo rendering (BlueBubbles, WhatsApp, Telegram voice notes, etc.), set target = "voice-note" by default. Alternatively/additionally, expose target as a parameter on tts.convert and the bundled tts agent tool's input schema so callers can opt in explicitly. Also consider honoring [[audio_as_voice]] reply directives at the synthesis stage (today they only affect downstream delivery).
Honor outputFormat override for voice-note in ElevenLabs (and friends), and align fileExtension. In extensions/elevenlabs/speech-provider.ts:469-513, derive fileExtension from the resolved outputFormat rather than hardcoding .opus for voice-note. That lets users pin outputFormat: "mp3_44100_128" and have ElevenLabs return MP3 with .mp3 extension while still marking voiceCompatible: true. (Optional: add a sibling voiceNoteOutputFormat config field matching the Azure provider's pattern, for symmetry.)

Both changes are relatively contained. Either one alone is insufficient — closing Gap 1 only routes us into the opus-rejection trap; closing Gap 2 only is unreachable without Gap 1.
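The first suggestion (auto-targeting, plus an explicit override and directive support) could look roughly like this. The channel ids in VOICE_MEMO_CHANNELS and both function names are hypothetical; the issue only names BlueBubbles, WhatsApp and Telegram voice notes as examples. As the paragraph above notes, this alone would just route ElevenLabs into the opus default, so it has to ship alongside the extension fix.

```typescript
type TtsSpeechTarget = "audio-file" | "voice-note";

// Hypothetical channel ids for channels that can render native voice memos.
const VOICE_MEMO_CHANNELS = new Set(["bluebubbles", "whatsapp", "telegram"]);

function defaultTargetForChannel(channel: string | undefined): TtsSpeechTarget {
  return channel && VOICE_MEMO_CHANNELS.has(channel.toLowerCase()) ? "voice-note" : "audio-file";
}

// Precedence: explicit target, then an [[audio_as_voice]] directive, then the channel default.
function resolveTarget(text: string, channel?: string, explicit?: TtsSpeechTarget): TtsSpeechTarget {
  if (explicit) return explicit;
  if (text.includes("[[audio_as_voice]]")) return "voice-note";
  return defaultTargetForChannel(channel);
}
```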

Related

#72496 — same bug family for talk.config SecretRef redaction, also blocking iOS/macOS Talk Mode end-to-end.
#68690 — umbrella SecretRef coverage gaps; explicitly lists messages.tts.providers.<id>.apiKey siblings as broken (compounds this issue when secrets are stored as SecretRefs).

No PII

All voice IDs, key material, file paths, and account-specific identifiers are placeholders. Reproduces on a clean LaunchAgent install with any ElevenLabs voice and a BlueBubbles server with the private API enabled.

extent analysis

TL;DR

To fix the issue of TTS audio not rendering as native iOS voice memos in BlueBubbles iMessage chats, two changes are needed: auto-targeting voice-note for voice-capable channels and honoring output format overrides for voice-note in ElevenLabs.

Guidance

Modify the textToSpeech function: When called with a channel that supports voice-memo rendering, set target = "voice-note" by default to ensure voiceCompatible: true is returned.
Expose target as a parameter: Add target as a parameter on tts.convert and the bundled tts agent tool's input schema, allowing callers to opt-in explicitly.
Update ElevenLabs speech provider: Derive fileExtension from the resolved outputFormat instead of hardcoding .opus for voice-note, enabling users to override outputFormat and receive MP3 with a .mp3 extension.
Verify the fix: Test the changes using the provided reproduction steps and ensure that TTS audio is rendered as native voice memos in BlueBubbles iMessage chats.

Example

No code snippet is provided as the issue requires changes to the existing codebase, and the exact implementation details are not specified.

Notes

The suggested fix requires modifications to the textToSpeech function and the ElevenLabs speech provider. The changes are relatively contained, but careful testing is necessary to ensure that the fix works as expected and does not introduce new issues.

Recommendation

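Implement the two complementary changes from the issue: auto-target voice-note for voice-capable channels (or expose target explicitly) and derive the ElevenLabs file extension from the resolved output format. PR #72564 implements both but is still open. The change that actually merged, PR #72586, takes a different route: BlueBubbles declares preferAudioFileFormat: "caf", and on macOS speech-core pre-transcodes the synthesized MP3 to opus-in-CAF with afconvert, which is what makes agent replies render as native iOS voice memos.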
