openclaw - 💡(How to fix) Fix [Bug] Edge TTS reads SSML/XML markup as text instead of speaking content [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#76124 · Fetched 2026-05-03 04:42:05
Comments: 1 · Participants: 2 · Timeline: 2 · Reactions: 2
Timeline (top): commented ×1, unsubscribed ×1

Root Cause

Inspecting the ttsPromise method in edge-tts-D1NvSesj.js shows that the SSML is sent as a WebSocket text frame:

_wsConnect.send(`X-RequestId:...\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n
  <speak version="1.0" ...>
    <voice name="${this.voice}">
      <prosody rate="${this.rate}" ...>
        ${escapeXml(text)}
      </prosody>
    </voice>
  </speak>`);

The Edge TTS WebSocket server does not appear to honor the SSML content type. It seems to synthesize the entire message as plain text, so the XML markup is read aloud.

Describe the bug

When Edge TTS produces a voice message, it reads the SSML/XML markup aloud as text instead of speaking only the text content inside the markup.

Steps to reproduce

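  1. Configure the Feishu channel with the Edge TTS provider
  2. In a Feishu group, @-mention the bot and send a text message
  3. The bot replies with a voice bubble
  4. Play the voice message → what you hear is the SSML source, not the reply text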

Expected behavior

Only the text content inside the SSML <speak> tag is read aloud

Actual behavior

The complete SSML/XML markup is read aloud. Whisper transcription of the first 15 seconds (verbatim; 等于 is Chinese for "equals"):

[00:00] Speak version 等于...
[00:01] XMLS 等于 HTTP...
[00:07] Synthesis XML
[00:10] Lang 等于
[00:11] THCN Voice Name 等于 THCN 用心柔柔Prostory

What is actually being read is:

<speak version="1.0" xmlns="http://www.w3.org/..." 
      xmlns:mstts="https://www.w3.org/..." 
      xml:lang="zh-CN">
  <voice name="zh-CN-XiaoxiaoNeural">
    <prosody rate="..." pitch="..." volume="...">
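
The transcript above suggests a cheap regression check: transcribe the generated audio and look for markup vocabulary. The following is a minimal sketch, assuming a transcript string is already available (for example from Whisper). The function name looksLikeSpokenMarkup, the hint patterns, and the threshold are hypothetical and chosen only to match the transcript in this issue.

```javascript
// Hypothetical heuristic: flags a transcript in which SSML/XML markup
// was apparently read aloud. The patterns are guesses based on the
// Whisper output quoted in this issue, not an exhaustive list.
const MARKUP_HINTS = [
  /speak\s+version/i,
  /\bxml/i,
  /voice\s+name/i,
  /synthesis/i,
  /\blang\b/i,
];

function looksLikeSpokenMarkup(transcript, minHits = 2) {
  // Count how many distinct hint patterns occur in the transcript.
  const hits = MARKUP_HINTS.filter((re) => re.test(transcript)).length;
  return hits >= minHits;
}

// Transcript lines quoted in the issue, joined into one string.
const reported = [
  "Speak version 等于...",
  "XMLS 等于 HTTP...",
  "Synthesis XML",
  "Lang 等于",
  "THCN Voice Name 等于 THCN 用心柔柔Prostory",
].join(" ");

console.log(looksLikeSpokenMarkup(reported));
console.log(looksLikeSpokenMarkup("你好,今天天气不错"));
```

Requiring at least two distinct hints keeps an ordinary reply that happens to mention "XML" once from being flagged.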

Environment

  • OpenClaw: 2026.4.29 (a448042)
  • Channel: feishu (websocket)
  • TTS provider: Microsoft Edge (edge)
  • OS: Windows 10

Root cause analysis

Inspecting the ttsPromise method in edge-tts-D1NvSesj.js shows that the SSML is sent as a WebSocket text frame:

_wsConnect.send(`X-RequestId:...\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n
  <speak version="1.0" ...>
    <voice name="${this.voice}">
      <prosody rate="${this.rate}" ...>
        ${escapeXml(text)}
      </prosody>
    </voice>
  </speak>`);

The Edge TTS WebSocket server does not appear to honor the SSML content type. It seems to synthesize the entire message as plain text, so the XML markup is read aloud.
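
To make the frame layout easier to inspect, here is a minimal sketch of how such a text frame could be assembled outside the bundle. Only the three header names and the use of escapeXml come from the snippet above. The function buildSsmlFrame, its parameters, the escapeXml implementation, and the <speak> attributes are assumptions for illustration, since the real attributes are elided in the issue.

```javascript
// Assumed implementation of escapeXml; the bundled one is not shown in the issue.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: builds the header block and the SSML body separately
// so that each part can be logged and checked before it is sent.
function buildSsmlFrame({ requestId, voice, rate, text }) {
  const headers = [
    `X-RequestId:${requestId}`,
    "Content-Type:application/ssml+xml",
    "Path:ssml",
  ].join("\r\n");

  // The xmlns value is the standard W3C SSML namespace; xml:lang is an assumption.
  const ssml =
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-CN">` +
    `<voice name="${escapeXml(voice)}">` +
    `<prosody rate="${escapeXml(rate)}">${escapeXml(text)}</prosody>` +
    `</voice></speak>`;

  // A blank line (\r\n\r\n) separates the headers from the body.
  return `${headers}\r\n\r\n${ssml}`;
}

const frame = buildSsmlFrame({
  requestId: "example-request-id", // placeholder value
  voice: "zh-CN-XiaoxiaoNeural",
  rate: "+0%",
  text: "1 < 2 & 3 > 2",
});
console.log(frame);
```

Logging the frame right before the send call shows whether the header block and the body are separated as intended and whether only the user text, not the envelope, has been escaped.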

Additional context

This issue makes text-to-speech in Feishu group chats completely unusable.

extent analysis

TL;DR

The Edge TTS WebSocket service appears not to honor the application/ssml+xml content type, so the whole SSML document, tags included, is synthesized as plain text instead of only the intended reply.

Guidance

  • Verify that the Content-Type header is set to application/ssml+xml in the WebSocket request to ensure the Edge TTS server recognizes the SSML format.
  • Check the Edge TTS server documentation to see if there are any specific requirements or limitations for handling SSML content.
  • Consider modifying the ttsPromise method to validate the response from the Edge TTS server to ensure it is correctly parsing the SSML content.
  • Test the Edge TTS provider with a simple SSML example to isolate the issue and determine if it is specific to the current implementation.
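
The first and last checks in the list above can be sketched as a small validator that runs on the exact string passed to the send call. The names parseFrame and checkSsmlFrame are hypothetical. The expected header values are the ones shown in the issue's snippet.

```javascript
// Hypothetical helper: splits a text frame into its headers and body.
function parseFrame(frame) {
  const sep = frame.indexOf("\r\n\r\n");
  if (sep === -1) throw new Error("no header/body separator found");
  const headers = {};
  for (const line of frame.slice(0, sep).split("\r\n")) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed header lines
    const name = line.slice(0, colon).trim().toLowerCase();
    headers[name] = line.slice(colon + 1).trim();
  }
  return { headers, body: frame.slice(sep + 4) };
}

// Returns a list of human-readable problems; an empty list means the
// frame matches the layout shown in the issue.
function checkSsmlFrame(frame) {
  const problems = [];
  const { headers, body } = parseFrame(frame);
  if (headers["content-type"] !== "application/ssml+xml") {
    problems.push(`unexpected Content-Type: ${headers["content-type"]}`);
  }
  if (headers["path"] !== "ssml") {
    problems.push(`unexpected Path: ${headers["path"]}`);
  }
  if (!body.trimStart().startsWith("<speak")) {
    problems.push("body does not start with <speak");
  }
  return problems;
}

const good =
  "X-RequestId:abc\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n" +
  '<speak version="1.0"><voice name="zh-CN-XiaoxiaoNeural">Hello</voice></speak>';
const bad = "X-RequestId:abc\r\nContent-Type:text/plain\r\nPath:ssml\r\n\r\nHello";

console.log(checkSsmlFrame(good));
console.log(checkSsmlFrame(bad));
```

A frame that passes these checks but still produces spoken markup would point away from the header layout and toward something else in the exchange with the server.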

Example

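// Minimal SSML request for isolating the problem.
// The xmlns value is the standard W3C SSML namespace; "+0%" and "+0Hz"
// leave rate, pitch, and volume at the voice defaults.
const ssml = `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-CN">
  <voice name="zh-CN-XiaoxiaoNeural">
    <prosody rate="+0%" pitch="+0Hz" volume="+0%">
      Hello, world!
    </prosody>
  </voice>
</speak>`;
// Header values are elided as in the issue; the body follows the blank line directly.
_wsConnect.send(`X-RequestId:...\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n${ssml}`);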

Notes

The issue may be specific to the Edge TTS provider or the current implementation, and further investigation is needed to determine the root cause.

Recommendation

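As a stopgap, modify the ttsPromise method to strip or escape the XML tags before the request is sent to the Edge TTS server, until the provider handles the application/ssml+xml content type correctly. This workaround is untested: the issue does not confirm that the endpoint accepts a payload without the SSML envelope.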
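
The "remove the XML tags" part of this recommendation could look like the sketch below. It only recovers the plain text from an SSML string. The function name extractSpokenText is hypothetical, and the tag-stripping regular expression is a heuristic that assumes attribute values contain no ">" character.

```javascript
// Hypothetical helper: reduces an SSML document to the text a listener
// should hear. Not a full XML parser; a heuristic for well-formed,
// machine-generated SSML like the envelope shown in this issue.
function extractSpokenText(ssml) {
  // Replace every tag with a space so adjacent text nodes stay separated.
  const withoutTags = ssml.replace(/<[^>]*>/g, " ");
  // Decode the five predefined XML entities; &amp; goes last so that
  // "&amp;lt;" becomes "&lt;" rather than "<".
  const decoded = withoutTags
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&");
  // Collapse the whitespace left behind by removed tags and indentation.
  return decoded.replace(/\s+/g, " ").trim();
}

const sample = `
  <speak version="1.0" xml:lang="zh-CN">
    <voice name="zh-CN-XiaoxiaoNeural">
      <prosody rate="+0%">
        你好 &amp; welcome
      </prosody>
    </voice>
  </speak>`;

console.log(extractSpokenText(sample));
```

Stripping the envelope also discards the voice and prosody settings, so this would be a temporary measure at best.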
