openclaw - ✅(Solved) Fix [Feature]: 1: Feishu voice bubble — MP3/WAV sent as file instead of voice bubble、2: Per-agent TTS voice override for Microsoft Edge TTS [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#56231 (fetched 2026-04-08 01:43:14)
Comments: 0
Participants: 1
Timeline: 2
Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1
Root Cause

Gap | Impact | Suggested Fix
No microsoft_voice in parseTtsDirectives() | Can't override Microsoft TTS voice per-message | Add microsoft_voice key
No per-agent TTS config in agents.list[] | All agents sound the same | Add tts field to agent config
Cron announce skips TTS tag parsing | [[tts:...]] appears as literal text in group | Process TTS tags in announce delivery pipeline

Fix Action

Workaround

The current workaround is to manually patch the compiled send-*.js file to add ffmpeg conversion. The patch breaks on every OpenClaw update.



PR fix notes

PR #62573: [codex] Support per-agent TTS voice overrides

Description (problem / solution / changelog)

Summary

  • add agents.list[].tts to the config surface and refresh the generated config baselines
  • deep-merge per-agent TTS overrides over global messages.tts
  • use the active agent's TTS config in reply dispatch, ACP dispatch, /tts, status output, system-prompt hints, and the TTS tool

Why

Per-agent TTS voice configuration is a recurring request, but current main rejects agents.list[].tts during config validation, so agents cannot keep distinct voices or providers without changing the global TTS settings.

Impact

Agents can now keep different TTS voices/providers while still inheriting any unset values from messages.tts.

This intentionally keeps the voice-call telephony path unchanged because that runtime does not currently carry agent context; this PR stays scoped to agent/session reply flows that already know which agent is active.
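
The inheritance described above (per-agent values win, unset values fall back to messages.tts) can be illustrated with a generic deep merge. This is a minimal sketch, not the PR's actual code: deepMerge and isPlainObject are hypothetical helpers, and the provider and rate fields are placeholder settings that do not appear in the issue.

```typescript
type PlainObject = { [key: string]: unknown };

function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Returns a new object. Values from `override` win, nested objects are merged
// recursively, and keys that are undefined in `override` keep the `base` value.
function deepMerge(base: PlainObject, override: PlainObject): PlainObject {
  const result: PlainObject = { ...base };
  for (const [key, value] of Object.entries(override)) {
    if (value === undefined) continue;
    const existing = result[key];
    result[key] =
      isPlainObject(existing) && isPlainObject(value)
        ? deepMerge(existing, value)
        : value;
  }
  return result;
}

// Stand-in for the global messages.tts block (provider and rate are assumed fields).
const globalTts = {
  provider: "microsoft",
  microsoft: { voice: "zh-TW-HsiaoChenNeural", rate: "+0%" },
};

// Stand-in for agents.list[].tts on the "fanqie" agent.
const fanqieTts = { microsoft: { voice: "zh-CN-XiaoxiaoNeural" } };

// The agent keeps its own voice and inherits everything it did not set.
const effective = deepMerge(globalTts, fanqieTts);
console.log(JSON.stringify(effective));
```

With a merge like this, an agent entry only needs to list the settings that differ from the global block.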

Root Cause

The config schema and types only recognized global messages.tts, and the TTS runtime always resolved settings from that global block. That meant per-agent overrides failed validation up front and could never take effect anywhere in the runtime.

Validation

  • pnpm test -- src/config/config.agent-tts-validation.test.ts src/agents/agent-scope.test.ts src/agents/tools/tts-tool.test.ts src/tts/tts.test.ts src/auto-reply/status.test.ts
  • pnpm build
  • pnpm check

Fixes #11483
Related: #17516, #20451, #20565, #56231, #56701

Changed files

  • extensions/speech-core/src/tts.test.ts (added, +45/-0)
  • extensions/speech-core/src/tts.ts (modified, +54/-9)
  • src/agents/agent-scope.test.ts (modified, +25/-0)
  • src/agents/agent-scope.ts (modified, +2/-0)
  • src/agents/cli-runner/helpers.ts (modified, +3/-1)
  • src/agents/openclaw-tools.ts (modified, +2/-0)
  • src/agents/pi-embedded-runner/compact.ts (modified, +3/-1)
  • src/agents/pi-embedded-runner/run/attempt.ts (modified, +3/-1)
  • src/agents/tools/tts-tool.test.ts (modified, +17/-0)
  • src/agents/tools/tts-tool.ts (modified, +2/-0)
  • src/auto-reply/reply/commands-system-prompt.ts (modified, +1/-1)
  • src/auto-reply/reply/commands-tts.ts (modified, +2/-1)
  • src/auto-reply/reply/dispatch-acp-delivery.ts (modified, +6/-1)
  • src/auto-reply/reply/dispatch-acp.ts (modified, +12/-8)
  • src/auto-reply/reply/dispatch-from-config.ts (modified, +4/-1)
  • src/auto-reply/status.test.ts (modified, +29/-0)
  • src/auto-reply/status.ts (modified, +3/-1)
  • src/config/config.agent-tts-validation.test.ts (added, +26/-0)
  • src/config/schema.base.generated.ts (modified, +362/-0)
  • src/config/types.agents.ts (modified, +3/-0)
  • src/config/zod-schema.agent-runtime.ts (modified, +2/-0)
  • src/tts/status-config.test.ts (modified, +33/-0)
  • src/tts/status-config.ts (modified, +3/-1)
  • src/tts/tts-config.ts (modified, +32/-3)
  • src/tts/tts.test.ts (modified, +13/-0)

Code Example

ffmpeg -y -i input.mp3 -ac 1 -b:a 32k -application voip output.opus

---

// In sendMediaFeishu(), after buffer is loaded, before routing:
const ext = path.extname(name).toLowerCase();
const audioFormat = detectAudioFormat(buffer); // check magic bytes
if (['.wav', '.mp3'].includes(ext) && audioFormat !== 'unknown') {
  buffer = await convertToOpus(buffer, audioFormat); // ffmpeg conversion
  name = path.basename(name, ext) + '.opus';
  contentType = 'audio/opus';
}

---

{
  agents: {
    list: [
      {
        id: "main",
        tts: { microsoft: { voice: "zh-TW-HsiaoChenNeural" } }
      },
      {
        id: "fanqie",
        tts: { microsoft: { voice: "zh-CN-XiaoxiaoNeural" } }
      }
    ]
  }
}

---

[[tts:microsoft_voice=zh-CN-XiaoxiaoNeural]]


Summary

Issue 1: Feishu voice bubble — MP3/WAV files sent as file attachment instead of voice bubble

Description

When OpenClaw sends TTS-generated MP3 files to Feishu, they are delivered as file attachments instead of voice bubbles (audio messages). Feishu requires Opus format (audio msg_type) to render voice bubbles natively.

Current Behavior

  1. TTS provider (Microsoft Edge TTS) generates an MP3 file
  2. sendMediaFeishu() in the Feishu extension detects the file extension
  3. .mp3 is not in the Opus/OGG match list, so it falls through to fileType: "stream" and msgType: "file"
  4. Feishu displays it as a downloadable file, not a playable voice bubble
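
The fall-through in step 3 can be illustrated with a small extension-based router. This is a sketch under stated assumptions, not the code of resolveFeishuOutboundMediaKind(): routeByExtension is a hypothetical name, and the fileType value "opus" for the audio branch is assumed (the issue only names msgType: "audio").

```typescript
type FeishuRoute = { fileType: string; msgType: string };

// Hypothetical stand-in for the extension-based routing described above.
function routeByExtension(fileName: string): FeishuRoute {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();

  // Only Opus/OGG names take the audio branch (fileType "opus" is an assumption).
  if (ext === ".opus" || ext === ".ogg") {
    return { fileType: "opus", msgType: "audio" };
  }

  // Everything else, including .mp3 and .wav, falls through to a generic file.
  return { fileType: "stream", msgType: "file" };
}

console.log(routeByExtension("reply.mp3"));
console.log(routeByExtension("reply.opus"));
```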

Expected Behavior

MP3 and WAV files sent through the Feishu channel should be automatically converted to Opus (using ffmpeg) and sent as msgType: "audio", so they render as native voice bubbles in the Feishu client.

Suggested Implementation

In extensions/feishu/src/media.ts (or the compiled equivalent), before resolveFeishuOutboundMediaKind() is called in sendMediaFeishu():

  1. Detect audio format from buffer magic bytes (MP3: an ID3 header, or a frame sync of 0xFF followed by a byte whose top three bits are set, mask 0xE0; WAV: RIFF header)
  2. If the file is MP3 or WAV, convert to Opus via ffmpeg:
    ffmpeg -y -i input.mp3 -ac 1 -b:a 32k -application voip output.opus
  3. Update the filename to .opus and contentType to audio/opus
  4. Then resolveFeishuOutboundMediaKind() will correctly route it as msgType: "audio"

Minimal code sketch

// In sendMediaFeishu(), after buffer is loaded, before routing:
const ext = path.extname(name).toLowerCase();
const audioFormat = detectAudioFormat(buffer); // check magic bytes
if (['.wav', '.mp3'].includes(ext) && audioFormat !== 'unknown') {
  buffer = await convertToOpus(buffer, audioFormat); // ffmpeg conversion
  name = path.basename(name, ext) + '.opus';
  contentType = 'audio/opus';
}
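
The detectAudioFormat() call in the sketch above is not defined in the issue. One possible shape for it, assuming it only needs to tell MP3 and WAV apart from everything else, is shown below. The AudioFormat type and the ascii helper are hypothetical names, and the frame-sync check is a heuristic that can also match other MPEG audio streams.

```typescript
type AudioFormat = "mp3" | "wav" | "unknown";

// Decodes a byte range as ASCII so it can be compared against a tag string.
function ascii(bytes: Uint8Array, start: number, end: number): string {
  return String.fromCharCode(...bytes.subarray(start, end));
}

// Accepts a Node.js Buffer as well, since Buffer extends Uint8Array.
function detectAudioFormat(buffer: Uint8Array): AudioFormat {
  // WAV: "RIFF" at offset 0 and "WAVE" at offset 8.
  if (
    buffer.length >= 12 &&
    ascii(buffer, 0, 4) === "RIFF" &&
    ascii(buffer, 8, 12) === "WAVE"
  ) {
    return "wav";
  }

  // MP3 that starts with an ID3v2 tag.
  if (buffer.length >= 3 && ascii(buffer, 0, 3) === "ID3") {
    return "mp3";
  }

  // Bare MPEG audio frame: 0xFF followed by a byte with its top three bits set.
  if (buffer.length >= 2 && buffer[0] === 0xff && (buffer[1] & 0xe0) === 0xe0) {
    return "mp3";
  }

  return "unknown";
}
```

Checking the bytes rather than trusting the extension avoids handing ffmpeg a file that was merely named .mp3 or .wav.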

Environment

  • OpenClaw: v2026.3.24
  • Channel: Feishu
  • TTS Provider: Microsoft Edge TTS (outputs MP3)
  • OS: Linux (Debian 13, ffmpeg 7.1.3 available)

Workaround

The current workaround is to manually patch the compiled send-*.js file to add ffmpeg conversion. The patch breaks on every OpenClaw update.


Issue 2: Per-agent TTS voice override for Microsoft Edge TTS

Description

In a multi-agent setup, different agents should be able to use different TTS voices to be distinguishable. Currently, [[tts:...]] directives only support OpenAI and ElevenLabs voice overrides — Microsoft Edge TTS voice cannot be overridden per-agent or per-message.

Current Behavior

  1. Global TTS config sets messages.tts.microsoft.voice: "zh-TW-HsiaoChenNeural"
  2. Agent "fanqie" (番茄) tries to use a different voice via [[tts:zh-CN-XiaoxiaoNeural]]
  3. parseTtsDirectives() parses [[tts:key=value]] format only — bare voice names without = are skipped (if (eqIndex === -1) continue;)
  4. Even with correct key=value format, only voice (OpenAI) and voiceId (ElevenLabs) keys are recognized — there is no microsoft_voice key
  5. Result: all agents use the same Microsoft TTS voice, making them indistinguishable

Expected Behavior

Option A: Per-agent TTS config (preferred)

Allow agents.list[] entries to include TTS overrides:

{
  agents: {
    list: [
      {
        id: "main",
        tts: { microsoft: { voice: "zh-TW-HsiaoChenNeural" } }
      },
      {
        id: "fanqie",
        tts: { microsoft: { voice: "zh-CN-XiaoxiaoNeural" } }
      }
    ]
  }
}

Option B: [[tts:...]] directive support for Microsoft voice

Add a microsoft_voice key to parseTtsDirectives() (or extend the existing voice key to be provider-aware):

[[tts:microsoft_voice=zh-CN-XiaoxiaoNeural]]

This would allow agents to override the Microsoft voice inline, same as OpenAI/ElevenLabs voices today.
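
To make Option B concrete, here is a minimal sketch of a directive parser that follows the behavior described in this issue and adds the proposed microsoft_voice key. It is not the actual parseTtsDirectives() implementation: the function name, the TtsOverrides shape, whitespace-separated key=value pairs, and stripping the tag from the returned text are all assumptions.

```typescript
type TtsOverrides = {
  openaiVoice?: string;
  elevenlabsVoiceId?: string;
  microsoftVoice?: string;
};

// Hypothetical parser: collects overrides from [[tts:...]] tags and removes the tags.
function parseTtsDirectivesSketch(text: string): {
  cleanedText: string;
  overrides: TtsOverrides;
} {
  const overrides: TtsOverrides = {};
  const cleanedText = text
    .replace(/\[\[tts:([^\]]*)\]\]/g, (_match: string, body: string) => {
      for (const token of body.trim().split(/\s+/)) {
        const eqIndex = token.indexOf("=");
        // Bare values such as "zh-CN-XiaoxiaoNeural" carry no key and are skipped.
        if (eqIndex === -1) continue;
        const key = token.slice(0, eqIndex);
        const value = token.slice(eqIndex + 1);
        if (key === "voice") overrides.openaiVoice = value;
        else if (key === "voiceId") overrides.elevenlabsVoiceId = value;
        else if (key === "microsoft_voice") overrides.microsoftVoice = value; // proposed key
      }
      return ""; // drop the tag from the visible text
    })
    .trim();
  return { cleanedText, overrides };
}

const parsed = parseTtsDirectivesSketch(
  "[[tts:microsoft_voice=zh-CN-XiaoxiaoNeural]] Hello",
);
console.log(parsed.overrides.microsoftVoice, parsed.cleanedText);
```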

Option C: Both

Per-agent config as the default, with [[tts:...]] as a per-message override.

Additional Context

Use Case

Multi-agent team where each agent has a distinct persona:

  • 小师妹 (main): zh-TW-HsiaoChenNeural — sweet Taiwanese accent
  • 番茄 (fanqie): zh-CN-XiaoxiaoNeural — warm news-anchor voice

Cron delivery TTS gap

Related issue: when a cron job uses --announce delivery to a Feishu group, [[tts:...]] tags in the agent's response are not parsed — they appear as literal text in the message. The TTS post-processing pipeline seems to be skipped for cron announce delivery.

This means even if [[tts:...]] supported Microsoft voices, cron-delivered messages to Feishu groups would still not get voice output. The per-agent config approach (Option A) would solve this more cleanly since it doesn't depend on inline tags.

Environment

  • OpenClaw: v2026.3.24
  • TTS Provider: Microsoft Edge TTS
  • Multi-agent setup with Feishu channel
  • Agents: main (Claude Opus 4.6) + fanqie (GLM-5-Turbo)

Summary

Gap | Impact | Suggested Fix
No microsoft_voice in parseTtsDirectives() | Can't override Microsoft TTS voice per-message | Add microsoft_voice key
No per-agent TTS config in agents.list[] | All agents sound the same | Add tts field to agent config
Cron announce skips TTS tag parsing | [[tts:...]] appears as literal text in group | Process TTS tags in announce delivery pipeline

Problem to solve

Issue 1: Feishu voice bubble — MP3/WAV sent as file instead of voice bubble
Issue 2: Per-agent TTS voice override for Microsoft Edge TTS

Alternatives considered

No response

Impact

Feishu voice bubble — MP3/WAV sent as file instead of voice bubble
Per-agent TTS voice override for Microsoft Edge TTS

Evidence/examples

No response

Additional information

No response

extent analysis

Fix Plan

To fix Issue 1 and Issue 2, follow these steps:

Issue 1: Feishu Voice Bubble

  1. Detect audio format: Check the magic bytes of the buffer to detect if it's an MP3 or WAV file.
  2. Convert to Opus: Use ffmpeg to convert the MP3 or WAV file to Opus format.
  3. Update filename and content type: Update the filename to .opus and set the content type to audio/opus.

Example code:

const ext = path.extname(name).toLowerCase();
const audioFormat = detectAudioFormat(buffer); // check magic bytes
if (['.wav', '.mp3'].includes(ext) && audioFormat !== 'unknown') {
  buffer = await convertToOpus(buffer, audioFormat); // ffmpeg conversion
  name = path.basename(name, ext) + '.opus';
  contentType = 'audio/opus';
}
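
The convertToOpus() call above is likewise left undefined. A minimal sketch follows, assuming an ffmpeg binary built with libopus is on PATH and that piping through stdin/stdout is acceptable. The function names and the single-argument signature are hypothetical, and -c:a libopus and -f opus are added to the command from the issue so the encoder and container are explicit when writing to a pipe.

```typescript
import { spawn } from "node:child_process";

// Same settings as the ffmpeg command in the issue (mono, 32 kbit/s, voip profile),
// reading from stdin and writing an Ogg Opus stream to stdout.
function opusArgs(): string[] {
  return [
    "-y",
    "-i", "pipe:0",
    "-ac", "1",
    "-c:a", "libopus",
    "-b:a", "32k",
    "-application", "voip",
    "-f", "opus",
    "pipe:1",
  ];
}

// Hypothetical helper: resolves with the converted bytes, rejects if ffmpeg fails.
function convertToOpus(input: Buffer, ffmpegPath = "ffmpeg"): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const child = spawn(ffmpegPath, opusArgs());
    const out: Buffer[] = [];
    const err: Buffer[] = [];

    child.stdout.on("data", (chunk: Buffer) => out.push(chunk));
    child.stderr.on("data", (chunk: Buffer) => err.push(chunk));

    // Fires when the binary cannot be started (for example, ffmpeg is not installed).
    child.on("error", reject);

    // Ignore write errors if ffmpeg exits early; the exit code below reports the failure.
    child.stdin.on("error", () => {});

    child.on("close", (code) => {
      if (code === 0) {
        resolve(Buffer.concat(out));
      } else {
        const detail = Buffer.concat(err).toString();
        reject(new Error(`ffmpeg exited with code ${code}: ${detail}`));
      }
    });

    child.stdin.end(input);
  });
}
```

A caller would use it as `buffer = await convertToOpus(buffer)` and, on rejection, could fall back to sending the original file unchanged.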

Issue 2: Per-agent TTS Voice Override

  1. Add per-agent TTS config: Allow agents.list[] entries to include TTS overrides.
  2. Add microsoft_voice key: Add a microsoft_voice key to parseTtsDirectives() to support Microsoft voice overrides.

Example code:

{
  agents: {
    list: [
      {
        id: "main",
        tts: { microsoft: { voice: "zh-TW-HsiaoChenNeural" } }
      },
      {
        id: "fanqie",
        tts: { microsoft: { voice: "zh-CN-XiaoxiaoNeural" } }
      }
    ]
  }
}

Verification

To verify the fixes, test the following scenarios:

  • Send an MP3 or WAV file to Feishu and check if it's rendered as a voice bubble.
  • Configure different TTS voices for each agent and check if they sound distinct.

Extra Tips

  • Make sure to update the parseTtsDirectives() function to support the microsoft_voice key.
  • Consider a fallback for audio that cannot be detected or converted, for example sending the original file unchanged as msgType: "file".
  • Test several inputs: an MP3 file, a WAV file, a file that is already Opus, and a cron --announce delivery to a Feishu group.
