openclaw - ✅(Solved) Fix TTS `parseTtsDirectives` is markdown-blind: `[[tts:xxx]]` inside code spans / code blocks triggers auto TTS in `tagged` mode [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#68769 · Fetched 2026-04-19 15:07:50
Comments: 0 · Participants: 1 · Timeline events: 1 (cross-referenced ×1) · Reactions: 0

parseTtsDirectives in provider-error-utils-CyJAWFR1.js uses the regex /\[\[tts:([^\]]+)\]\]/gi, which matches any literal [[tts:xxx]] occurrence regardless of the surrounding markdown context. When an assistant writes about TTS directives (for example in a troubleshooting reply, in a table cell such as | TTS pipeline diagnosis | check `[[tts:text]]` ... |, or in an inline or fenced code block), the match unintentionally sets hasDirective=true. In auto: "tagged" mode the TTS pipeline then generates and delivers a voice message that the assistant never intended to produce.

PR fix notes

PR #68806: [AI-assisted] fix(tts): ignore literal directives inside markdown code

Description (problem / solution / changelog)

Summary

  • Problem: parseTtsDirectives treated literal [[tts:...]] tokens inside inline code spans and fenced code blocks as real directives.
  • Why it matters: in messages.tts.auto="tagged", troubleshooting or documentation replies could trigger unintended TTS synthesis and strip visible example text.
  • What changed: skip TTS directive parsing when the directive tag itself starts inside a markdown code region, and add regression tests for inline code, fenced code, mixed literal+real directives, and real text blocks that contain inline code.
  • What did NOT change (scope boundary): this does not address other TTS parser issues such as orphaned closing-tag leakage in #68553.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #68769
  • Related #68553
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the TTS directive regexes were markdown-blind and stripped any literal [[tts:...]] token even when that token lived inside inline or fenced code.
  • Missing detection / guardrail: no code-region check existed before applying the directive regexes.
  • Contributing context (if known): [[tts:text]]...[[/tts:text]] blocks can legitimately contain inline code, so the guardrail has to ignore directive tags inside code without suppressing real text blocks that merely contain code.
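
The distinction in the last bullet can be pictured with a small sketch. This is an illustration under assumptions, not the code from the PR: `findCodeRegions` and `tagStartsInCode` are hypothetical helpers, and the code-region pattern is simplified (single-backtick spans and fenced blocks only, no indented blocks).

```javascript
// Hypothetical helper: [start, end) offsets of fenced blocks and inline spans.
function findCodeRegions(text) {
  const pattern = /```[^\n]*\n[\s\S]*?\n```|`[^`\n]+`/g;
  return [...text.matchAll(pattern)].map((m) => [m.index, m.index + m[0].length]);
}

// The guardrail asks where the tag *starts*, not whether the match touches code.
function tagStartsInCode(regions, tagOffset) {
  return regions.some(([start, end]) => tagOffset >= start && tagOffset < end);
}

const literal = "Docs: `[[tts:text]]` opens a block.";
const realBlock = "[[tts:text]]Run `npm test` now[[/tts:text]]";

const literalInCode = tagStartsInCode(findCodeRegions(literal), literal.indexOf("[[tts:"));
const realInCode = tagStartsInCode(findCodeRegions(realBlock), realBlock.indexOf("[[tts:"));

console.log(literalInCode); // true: the tag sits inside the inline span
console.log(realInCode);    // false: the block merely contains inline code
```

Checking only the tag's start offset is what lets a real text block keep its inline code while a documented example stays literal.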

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/tts/directives.test.ts
  • Scenario the test should lock in: literal directive examples inside inline/fenced markdown code stay visible text and do not trigger TTS; real directives outside code still parse.
  • Why this is the smallest reliable guardrail: the bug lives entirely inside parseTtsDirectives, so a focused parser unit test exercises the broken branch directly without needing live provider/channel setup.
  • Existing test that already covers this (if any): none
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Literal TTS directive examples inside inline code or fenced code stay visible text and no longer trigger tagged-mode TTS.
  • Real TTS directives outside markdown code still behave as before, including [[tts:text]]...[[/tts:text]] blocks that contain inline code.
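
A rough sketch of these two behaviors, including the mixed literal+real case, might look like the following. Everything here is assumed for illustration: the names `stripOutsideCode` and `parseTtsDirectivesSketch`, the `ttsText` field, and the simplified code-region pattern are not taken from the PR. Only the two directive regexes come from the issue.

```javascript
// Sketch only: names, return shape, and the simplified code-region pattern
// are assumptions, not the implementation merged in the PR.
const CODE_REGION = /```[^\n]*\n[\s\S]*?\n```|`[^`\n]+`/g;

function stripOutsideCode(source, pattern, onMatch) {
  const regions = [...source.matchAll(CODE_REGION)].map((m) => [m.index, m.index + m[0].length]);
  const inCode = (offset) => regions.some(([s, e]) => offset >= s && offset < e);
  const re = new RegExp(pattern.source, "gi");
  let out = "";
  let last = 0;
  let m;
  while ((m = re.exec(source)) !== null) {
    if (inCode(m.index)) {
      re.lastIndex = m.index + 1; // literal tag: keep it and rescan just past it
      continue;
    }
    out += source.slice(last, m.index);
    last = m.index + m[0].length;
    onMatch(m);
  }
  return out + source.slice(last);
}

function parseTtsDirectivesSketch(text) {
  const result = { cleanedText: text, hasDirective: false, ttsText: undefined };
  result.cleanedText = stripOutsideCode(
    result.cleanedText,
    /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/gi,
    (m) => { result.hasDirective = true; result.ttsText = m[1].trim(); }
  );
  result.cleanedText = stripOutsideCode(
    result.cleanedText,
    /\[\[tts:([^\]]+)\]\]/gi,
    () => { result.hasDirective = true; }
  );
  return result;
}

const docsOnly = parseTtsDirectivesSketch("Wrap speech in `[[tts:text]]` tags.");
const mixed = parseTtsDirectivesSketch(
  "Syntax is `[[tts:text]]`. [[tts:text]]Run `npm test` now[[/tts:text]]"
);
console.log(docsOnly.hasDirective, docsOnly.cleanedText);
console.log(mixed.hasDirective, mixed.ttsText, mixed.cleanedText);
```

Rescanning from just past a skipped literal tag keeps a code-span example from swallowing the real block that follows it.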

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node v24.15.0 / pnpm v10.33.0
  • Model/provider: N/A
  • Integration/channel (if any): tagged TTS parser (messages.tts.auto="tagged")
  • Relevant config (redacted): messages.tts.auto="tagged"

Steps

  1. Put a literal TTS token inside markdown code, for example `[[tts:text]]` or a fenced block containing [[tts:text]]...[[/tts:text]].
  2. Run the message through parseTtsDirectives.
  3. Observe whether the parser marks hasDirective=true and strips the literal text.

Expected

  • Literal examples inside markdown code remain untouched and do not trigger TTS.
  • Real directives outside code still parse.

Actual

  • Before this patch, literal examples inside inline/fenced code set hasDirective=true and were stripped; mixed content also lost the inline-code literal token.
  • After this patch, only directives whose tag starts outside code are parsed.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • On a detached origin/main checkout, the new parser regressions failed for inline code, fenced code, and mixed literal+real directive boundaries.
    • On this branch, corepack pnpm test src/tts/directives.test.ts passed.
    • On this branch, corepack pnpm test src/shared/text/code-regions.test.ts passed.
    • On this branch, corepack pnpm build passed.
  • Edge cases checked:
    • real directive adjacent to inline code
    • real [[tts:text]]...[[/tts:text]] block containing inline code
  • What you did not verify:
    • live Telegram or provider delivery end-to-end; this patch stays in the parser unit.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

  • Risk: malformed tags that straddle code-boundary positions could still be preserved as literal text.
    • Mitigation: scope stays intentionally narrow, and the new tests lock the supported inline/fenced boundaries plus real text-block parsing with inline code.

Changed files

  • src/tts/directives.test.ts (modified, +59/-0)
  • src/tts/directives.ts (modified, +57/-8)


Summary

parseTtsDirectives in provider-error-utils-CyJAWFR1.js uses the regex /\[\[tts:([^\]]+)\]\]/gi, which matches any literal [[tts:xxx]] occurrence regardless of the surrounding markdown context. When an assistant writes about TTS directives (for example in a troubleshooting reply, in a table cell such as | TTS pipeline diagnosis | check `[[tts:text]]` ... |, or in an inline or fenced code block), the match unintentionally sets hasDirective=true. In auto: "tagged" mode the TTS pipeline then generates and delivers a voice message that the assistant never intended to produce.

Environment

  • OpenClaw 2026.4.15 (041266a)
  • Node 25.7.0, macOS 26.4.1 (Tahoe)
  • Config: messages.tts.auto = "tagged", provider elevenlabs
  • Assistant running as system-architect agent, Telegram channel, Anthropic claude-opus-4-7

Reproduction

With messages.tts.auto: "tagged" and a working ElevenLabs / MiniMax provider, have the assistant send a reply that contains the literal substring [[tts:text]] inside a code span or table cell, without any closing [[/tts:text]] tag, e.g.:

| 3 | TTS pipeline diagnosis | Check why `[[tts:text]]` leaves no processing trace at all in the gateway log |

or:

`messages.tts.auto = "tagged"` → requires a `[[tts:text]]` tag to trigger

Observed: hasDirective=true → TTS synthesis runs → voice note MP3 is delivered to the user.

Minimal Python reproducer (mirroring the JS regex):

import re

# Same pattern as the single-tag JS regex, compiled case-insensitively.
single_re = re.compile(r'\[\[tts:([^\]]+)\]\]', re.IGNORECASE)
msg = '`messages.tts.auto = "tagged"` → requires a `[[tts:text]]` tag to trigger'
print(single_re.findall(msg))  # -> ['text']  => hasDirective=True

Expected behavior

TTS directives should only trigger when they appear as active markup, not when they occur inside:

  • inline code spans (`[[tts:text]]`)
  • fenced code blocks (``` ... ```)
  • indented code blocks (4-space)
  • optionally: table cells containing code spans (or at least when inside code spans)

In other words, parseTtsDirectives should walk the text with basic markdown code-context awareness, or a pre-pass should strip/mask code spans before the directive regex runs.
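
The pre-pass idea could be sketched as below, assuming a detection-only mask that blanks code regions with same-length filler so offsets still line up with the original text. `maskCodeRegions` and `hasActiveTtsDirective` are hypothetical names. The pattern covers single-backtick spans and fenced blocks only, so indented code blocks are not handled here.

```javascript
// Hypothetical pre-pass: blank out code regions with same-length filler so
// offsets in the masked copy still line up with the original text.
function maskCodeRegions(text) {
  const codeRegion = /```[^\n]*\n[\s\S]*?\n```|`[^`\n]+`/g;
  return text.replace(codeRegion, (match) => match.replace(/[^\n]/g, " "));
}

// Detection runs on the masked copy; the original text is left untouched.
function hasActiveTtsDirective(text) {
  return /\[\[tts:([^\]]+)\]\]/i.test(maskCodeRegions(text));
}

const fence = "`".repeat(3); // builds a fence without writing one literally
const fencedDoc = `Example:\n${fence}text\n[[tts:text]]hello[[/tts:text]]\n${fence}\n`;

console.log(hasActiveTtsDirective("Needs a `[[tts:text]]` tag to trigger")); // false
console.log(hasActiveTtsDirective(fencedDoc));                               // false
console.log(hasActiveTtsDirective("[[tts:text]]hello[[/tts:text]]"));        // true
```

Masking only for detection avoids having to re-insert placeholders afterwards, at the cost of a second pass to strip the real directives.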

Actual behavior

All [[tts:xxx]] substrings in the reply text are treated as directives. An assistant authoring a documentation/debug/explanation reply that contains these tags as examples ends up emitting a voice note. In tagged mode this is effectively unavoidable without the assistant knowing to avoid the literal token forms entirely.

Impact

  • Unintended TTS spend (ElevenLabs / MiniMax usage).
  • Surprise voice notes in the chat, confusing users; the assistant in my case denied sending them for three turns (the path also has no Telegram log line — see sibling issue on Telegram plugin logging).
  • Self-reinforcing loop: explaining the bug in a reply triggers the bug.

Source references

dist/provider-error-utils-CyJAWFR1.js (v2026.4.15):

cleanedText = cleanedText.replace(
  /\[\[tts:text\]\]([\s\S]*?)\[\[\/tts:text\]\]/gi,
  (_match, inner) => { hasDirective = true; /* ... */ return ""; }
);
cleanedText = cleanedText.replace(
  /\[\[tts:([^\]]+)\]\]/gi,                // ← greedy, markdown-blind
  (_match, body) => { hasDirective = true; /* ... */ return ""; }
);

Caller maybeApplyTtsToPayload in dist/extensions/speech-core/runtime-api.js:

if (autoMode === "tagged" && !directives.hasDirective) return nextPayload;
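
To show why a false positive in the parser leads straight to synthesis, the gate can be reduced to a tiny sketch. `passesTaggedGate` is a hypothetical name, and the rest of the pipeline after the gate is not modelled.

```javascript
// Hypothetical reduction of the gate quoted above: in "tagged" mode the
// parser's hasDirective flag alone decides whether the payload moves on.
function passesTaggedGate(autoMode, directives) {
  if (autoMode === "tagged" && !directives.hasDirective) return false;
  return true; // the remaining pipeline steps are not modelled here
}

const blockedWithoutTag = passesTaggedGate("tagged", { hasDirective: false });
const allowedWithTag = passesTaggedGate("tagged", { hasDirective: true });

console.log(blockedWithoutTag); // false
console.log(allowedWithTag);    // true
```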

Proposed fix (sketch)

Before running the directive regexes, mask/strip markdown code regions:

  1. Fenced blocks: /```[^\n]*\n[\s\S]*?\n```/g
  2. Inline spans: / +[^`\n]++ /g and /`[^`\n]+`/g
  3. (Optionally) indented code blocks.

Replace matches with placeholders before applying the directive regexes, then re-insert at the end (or simply keep them in cleanedText and only mask for directive detection).

Alternatively, require fenced code blocks to be a hard boundary: directives inside fenced blocks are treated as literal text, directives outside behave as today.

Happy to open a PR if the team agrees on the scope.

Workaround used locally

Added a rule to the assistant's TOOLS.md forbidding literal [[tts:xxx]] in reply text when discussing TTS; the assistant uses alternative forms instead, such as `tts:text` tag or "double-square-bracket tts directive". This works but is fragile and agent-specific.

extent analysis

TL;DR

Mask markdown code regions before applying the TTS directive regex to prevent unintended voice note generation.

Guidance

  • Identify and mask fenced code blocks, inline code spans, and optionally indented code blocks in the text before running the TTS directive regex.
  • Replace masked code regions with placeholders to prevent them from being treated as directives.
  • Consider requiring fenced code blocks to be a hard boundary for directive detection.
  • Verify the fix by testing with examples that previously triggered unintended TTS synthesis, such as the provided Python reproducer.

Example

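// Stash code regions so the directive regex cannot see their contents.
// (A placeholder that embeds the match would still expose the tag.)
const stash = [];
const mask = (match) => `\u0000${stash.push(match) - 1}\u0000`;

// Mask fenced code blocks first, then inline code spans.
const fencedBlockRegex = /```[^\n]*\n[\s\S]*?\n```/g;
cleanedText = cleanedText.replace(fencedBlockRegex, mask);
const inlineSpanRegex = /`[^`\n]+`/g;
cleanedText = cleanedText.replace(inlineSpanRegex, mask);

// Apply the TTS directive regex to the masked text only.
const ttsDirectiveRegex = /\[\[tts:([^\]]+)\]\]/gi;
cleanedText = cleanedText.replace(ttsDirectiveRegex, (_match, body) => { hasDirective = true; /* ... */ return ""; });

// Restore the original code regions.
cleanedText = cleanedText.replace(/\u0000(\d+)\u0000/g, (_m, i) => stash[Number(i)]);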

Notes

The proposed fix requires careful consideration of edge cases, such as nested code blocks or directives within code spans. The workaround used locally, adding a rule to the assistant's TOOLS.md, is fragile and may not be effective in all scenarios.

Recommendation

Mask markdown code regions before running the TTS directive regex. This addresses the root cause, prevents unintended TTS synthesis, and is more robust than the local TOOLS.md workaround.
