openclaw - 💡(How to fix) Fix [Bug] Edge TTS reads SSML/XML markup as text instead of speaking content [1 comments, 2 participants]

openclaw · 2026-05-02 15:06:04

GitHub stats

openclaw/openclaw#76124•Fetched 2026-05-03 04:42:05

Author

17329971

Participants

17329971

clawsweeper[bot]

Timeline (top)

commented ×1, unsubscribed ×1

Describe the bug

When sending voice messages, Edge TTS reads the SSML/XML markup aloud as text instead of speaking only the text content inside the markup.

Steps to reproduce

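Configure the Feishu channel + Edge TTS provider
In a Feishu group, @mention the bot and send a text message
The bot replies with a voice bubble
Play the audio → what you hear is the SSML source, not the reply text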

Expected behavior

Only the text content inside the SSML <speak> tag should be spoken.

Actual behavior

The complete SSML/XML markup was read aloud. Whisper transcription of the first 15 seconds (等于 means "equals", the spoken form of "="):

[00:00] Speak version 等于...
[00:01] XMLS 等于 HTTP...
[00:07] Synthesis XML
[00:10] Lang 等于
[00:11] THCN Voice Name 等于 THCN 用心柔柔Prostory

What was actually being read:

<speak version="1.0" xmlns="http://www.w3.org/..." 
      xmlns:mstts="https://www.w3.org/..." 
      xml:lang="zh-CN">
  <voice name="zh-CN-XiaoxiaoNeural">
    <prosody rate="..." pitch="..." volume="...">

Environment

OpenClaw: 2026.4.29 (a448042)
Channel: feishu (websocket)
TTS provider: Microsoft Edge (edge)
OS: Windows 10

Root cause analysis

I looked at the ttsPromise method in edge-tts-D1NvSesj.js; the SSML is sent as a WebSocket text frame:

_wsConnect.send(`X-RequestId:...\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n
  <speak version="1.0" ...>
    <voice name="${this.voice}">
      <prosody rate="${this.rate}" ...>
        ${escapeXml(text)}
      </prosody>
    </voice>
  </speak>`);

The Edge TTS WebSocket server does not seem to parse the SSML content-type correctly and synthesizes the entire message as plain text, so the XML markup is read aloud.
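To make the frame layout described above easier to inspect, here is a minimal sketch of how such a message could be assembled: header lines joined by `\r\n`, a blank line, then the SSML body. Only the header names, the `speak`/`voice`/`prosody` nesting, and the call to `escapeXml` come from the issue. The function `buildSsmlFrame`, its parameters, the body of `escapeXml`, and the request ID value are assumptions for illustration. The attributes elided with `...` in the issue are left out, so this is not a complete request.

```javascript
// Assumed implementation: the issue only shows that escapeXml(text) is called.
function escapeXml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical helper: builds "headers + blank line + SSML body".
function buildSsmlFrame({ requestId, voice, rate, text }) {
  const headers = [
    `X-RequestId:${requestId}`,
    "Content-Type:application/ssml+xml",
    "Path:ssml",
  ].join("\r\n");
  // Attributes that the issue elides with "..." are omitted here.
  const body =
    `<speak version="1.0">` +
    `<voice name="${voice}">` +
    `<prosody rate="${rate}">${escapeXml(text)}</prosody>` +
    `</voice></speak>`;
  return `${headers}\r\n\r\n${body}`;
}

const frame = buildSsmlFrame({
  requestId: "hypothetical-request-id", // placeholder, not a real ID
  voice: "zh-CN-XiaoxiaoNeural",
  rate: "+0%",
  text: "a < b & c",
});
console.log(frame);
```

Logging the frame right before `send` is a cheap way to confirm what actually goes over the socket.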

Additional context

此问题导致飞书群聊中的文字转语音功能完全不可用。

extent analysis

TL;DR

Edge TTS reads the SSML/XML markup aloud instead of the intended text. The reporter suspects that the Edge TTS WebSocket server is not honoring the SSML content-type, but this has not been confirmed.

Guidance

Verify that the Content-Type header is set to application/ssml+xml in the WebSocket request to ensure the Edge TTS server recognizes the SSML format.
Check the Edge TTS server documentation to see if there are any specific requirements or limitations for handling SSML content.
Consider modifying the ttsPromise method to validate the response from the Edge TTS server to ensure it is correctly parsing the SSML content.
Test the Edge TTS provider with a simple SSML example to isolate the issue and determine if it is specific to the current implementation.
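The first and third points above could be approximated with a small check on the outgoing text frame before it is sent. The sketch below splits a frame into headers and body and reports anything that looks off. `parseFrame` and `checkSsmlFrame` are hypothetical names, and the last check (escaped SSML tags inside the body) reflects one possible cause that the issue does not establish.

```javascript
// Split a "Header:Value\r\n...\r\n\r\nbody" text frame into headers and body.
function parseFrame(frame) {
  const sep = frame.indexOf("\r\n\r\n");
  if (sep === -1) throw new Error("frame has no header/body separator");
  const headers = {};
  for (const line of frame.slice(0, sep).split("\r\n")) {
    const colon = line.indexOf(":");
    if (colon === -1) continue; // skip malformed header lines
    headers[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  return { headers, body: frame.slice(sep + 4) };
}

// Hypothetical pre-send check: returns a list of problems (empty if none found).
function checkSsmlFrame(frame) {
  const { headers, body } = parseFrame(frame);
  const problems = [];
  if (headers["Content-Type"] !== "application/ssml+xml") {
    problems.push("unexpected Content-Type");
  }
  if (headers["Path"] !== "ssml") {
    problems.push("unexpected Path");
  }
  if (!body.trimStart().startsWith("<speak")) {
    problems.push("body does not start with <speak");
  }
  // Escaped tags suggest the input text was itself SSML before escaping.
  if (/&lt;\/?(speak|voice|prosody)\b/.test(body)) {
    problems.push("body contains escaped SSML tags");
  }
  return problems;
}

const sample =
  "X-RequestId:hypothetical-id\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n" +
  '<speak version="1.0"><voice name="zh-CN-XiaoxiaoNeural">' +
  '&lt;speak version="1.0"&gt;hi&lt;/speak&gt;</voice></speak>';
const problems = checkSsmlFrame(sample);
console.log(problems);
```

An empty result does not prove that the server parses the SSML correctly. It only rules out the most obvious client-side mistakes.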

Example

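// Example of a simple SSML request for isolating the problem.
// The xmlns value is the standard W3C SSML namespace, and the prosody
// values are neutral relative settings. _wsConnect and the X-RequestId
// value are elided, as in the issue.
const ssml = `
  <speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="zh-CN">
    <voice name="zh-CN-XiaoxiaoNeural">
      <prosody rate="+0%" pitch="+0Hz" volume="+0%">
        Hello, world!
      </prosody>
    </voice>
  </speak>
`.trim(); // drop the leading and trailing newlines around the document
_wsConnect.send(`X-RequestId:...\r\nContent-Type:application/ssml+xml\r\nPath:ssml\r\n\r\n${ssml}`);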

Notes

The issue may be specific to the Edge TTS provider or the current implementation, and further investigation is needed to determine the root cause.

Recommendation

As a workaround, modify the ttsPromise method to remove or escape XML tags in the input text before sending the request to the Edge TTS server, until the provider handles the SSML content-type correctly.
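One way to read this workaround is to strip markup from the text before it is wrapped in the SSML envelope. That only helps if the text handed to ttsPromise can already contain SSML, which the issue does not establish. The sketch below is a regex-based approximation, not an XML parser, and `stripMarkup` is a hypothetical name.

```javascript
// Hypothetical pre-processing step for the text passed to ttsPromise.
function stripMarkup(text) {
  return text
    .replace(/<\?xml[^>]*\?>/gi, " ")    // XML declaration, if present
    .replace(/<\/?[A-Za-z][^>]*>/g, " ") // opening, closing, and self-closing tags
    .replace(/\s+/g, " ")                // collapse the gaps left behind
    .trim();
}

const wrapped =
  '<speak version="1.0" xml:lang="zh-CN"><voice name="zh-CN-XiaoxiaoNeural">' +
  '<prosody rate="+0%">你好，世界</prosody></voice></speak>';

const spoken = stripMarkup(wrapped);
const untouched = stripMarkup("a < b");
console.log(spoken);
console.log(untouched);
```

The tag pattern requires a letter right after `<`, so a comparison such as `a < b` passes through unchanged. Attribute values that contain `>` would confuse it, which is one reason to treat this as a stopgap.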
