openclaw - 💡(How to fix) Fix Track provider-aware automatic TTS emotion mapping from #75043 [1 pull requests]



Summary

Track and review the provider-aware automatic TTS emotion mapping proposed in #75043.

The PR adds an opt-in messages.tts.autoEmotion config that infers a conservative abstract emotion from synthesized text, then maps that abstract emotion to provider-native TTS controls at the speech provider boundary.

Related PR: #75043
Related issue: #67539

Problem

OpenClaw TTS providers expose expressiveness through different APIs:

  • OpenAI TTS can use model-specific instructions on supported models.
  • Microsoft and Azure Speech expose prosody controls such as rate, pitch, and volume.
  • ElevenLabs exposes voice settings.
  • Volcengine and Xiaomi expose provider-specific emotion/style surfaces.

Today users can configure provider-specific behavior manually, but there is no shared opt-in mechanism for lightweight, context-sensitive emotional variation when the user has not already selected an explicit provider emotion/style/prosody.

Proposal in #75043

#75043 implements a conservative shared autoEmotion layer in speech-core:

  1. extensions/speech-core/src/tts.ts infers an abstract emotion from the final synthesized text.
  2. Speech-core checks for explicit overrides first, including provider config, persona provider bindings, trusted request overrides, and allowed model directives.
  3. If no explicit emotion-equivalent setting is present, speech-core maps the abstract emotion into provider-native overrides.
  4. Provider adapters remain responsible for translating those overrides into the actual provider request.
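The four steps above can be sketched as a single decision function. This is a minimal TypeScript sketch, not the PR's code: `planSynthesis`, `SynthesisPlan`, and the injected `infer`/`mapToProvider` callbacks are hypothetical names, and provider overrides are modeled as an opaque key/value bag.

```typescript
// Hypothetical sketch of the four-step flow; none of these names come from #75043.
type AbstractEmotion = "neutral" | "happy" | "calm" | "sad";

// Provider-native overrides as an opaque bag; real adapters define their own shapes.
type ProviderOverrides = Record<string, unknown>;

interface SynthesisPlan {
  text: string;
  overrides: ProviderOverrides;
  source: "explicit" | "auto" | "none";
}

function planSynthesis(
  finalText: string,
  explicitOverrides: ProviderOverrides | undefined, // step 2 input
  infer: (text: string) => AbstractEmotion | undefined, // step 1
  mapToProvider: (emotion: AbstractEmotion) => ProviderOverrides, // step 3
): SynthesisPlan {
  // Step 2: explicit user/provider intent short-circuits auto emotion.
  if (explicitOverrides && Object.keys(explicitOverrides).length > 0) {
    return { text: finalText, overrides: explicitOverrides, source: "explicit" };
  }
  // Step 1: infer from the final text that will actually be spoken.
  const emotion = infer(finalText);
  if (emotion === undefined) {
    return { text: finalText, overrides: {}, source: "none" };
  }
  // Step 3: translate the abstract emotion into provider-native overrides.
  // Step 4 (not shown): the provider adapter turns these into the real request.
  return { text: finalText, overrides: mapToProvider(emotion), source: "auto" };
}
```

This sketch checks for explicit overrides before running inference. The outcome matches the listed order, and it avoids inferring an emotion that would be discarded anyway.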

The new public config is:

```json
{
  "messages": {
    "tts": {
      "autoEmotion": {
        "enabled": true,
        "fallback": "neutral",
        "allowed": ["happy", "calm", "sad"]
      }
    }
  }
}
```

The option is disabled by default. Existing TTS behavior should remain unchanged unless users enable messages.tts.autoEmotion.enabled.
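To make the `enabled`, `fallback`, and `allowed` semantics concrete, here is a minimal TypeScript sketch. It assumes `fallback` is used whenever nothing is detected or the detected emotion is not in `allowed`, and it substitutes a toy keyword heuristic for the PR's actual inference, which this issue does not describe. `AutoEmotionConfig`, `resolveAutoEmotion`, and the keyword table are hypothetical.

```typescript
// Hypothetical shape mirroring the documented messages.tts.autoEmotion keys.
// The emotion vocabulary is an assumption limited to the names in the example config.
type AbstractEmotion = "neutral" | "happy" | "calm" | "sad";

interface AutoEmotionConfig {
  enabled?: boolean;
  fallback?: AbstractEmotion;
  allowed?: AbstractEmotion[];
}

// Toy keyword heuristic standing in for the real (unspecified) inference.
const KEYWORDS: Record<Exclude<AbstractEmotion, "neutral">, string[]> = {
  happy: ["great", "congrats", "awesome", "glad"],
  sad: ["sorry", "unfortunately", "regret"],
  calm: ["relax", "take your time", "no rush"],
};

// Returns undefined when the feature is off, so callers keep today's behavior.
function resolveAutoEmotion(
  text: string,
  config: AutoEmotionConfig | undefined,
): AbstractEmotion | undefined {
  if (!config?.enabled) return undefined; // disabled by default
  const fallback = config.fallback ?? "neutral";
  const lower = text.toLowerCase();
  const entries = Object.entries(KEYWORDS) as [AbstractEmotion, string[]][];
  for (const [emotion, words] of entries) {
    if (words.some((w) => lower.includes(w))) {
      // Conservative: only use the inferred emotion if the user allowed it
      // (assumption: an omitted allow-list permits every emotion).
      const allowed = config.allowed;
      return allowed === undefined || allowed.includes(emotion) ? emotion : fallback;
    }
  }
  return fallback; // nothing detected
}
```

With the example config, "Great news, congrats!" would resolve to `happy`, while the same text with `allowed: ["calm"]` would drop to the `neutral` fallback.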

Precedence and safety model

The PR is designed so explicit user/provider intent wins over inferred behavior:

  • Persona provider bindings win over auto emotion.
  • Provider config wins over auto emotion.
  • Trusted request overrides win over auto emotion.
  • Allowed model-provided TTS directives win over auto emotion.
  • Auto emotion only fills the gap when no equivalent provider-specific emotion/style/prosody is already set.

This keeps the feature opt-in and avoids overriding intentionally configured voices or personas.
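The "fills the gap" rule can be sketched as a merge in which explicit sources are applied first and auto emotion only supplies keys nobody else set. This is a minimal TypeScript sketch under two assumptions: each override key is treated independently (the PR may instead treat a group such as rate/pitch/volume as one unit), and callers pass explicit sources in precedence order, e.g. persona binding, provider config, trusted request, model directive. The issue says all four beat auto emotion but does not rank them against each other. `mergeWithAutoEmotion` is a hypothetical name.

```typescript
type Overrides = Record<string, unknown>;

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// explicitSources: highest precedence first (relative order is an assumption).
function mergeWithAutoEmotion(
  explicitSources: Array<Overrides | undefined>,
  autoEmotion: Overrides | undefined,
): Overrides {
  const merged: Overrides = {};
  for (const source of explicitSources) {
    for (const [key, value] of Object.entries(source ?? {})) {
      // An earlier (higher-precedence) source keeps its value.
      if (value !== undefined && !hasOwn(merged, key)) merged[key] = value;
    }
  }
  for (const [key, value] of Object.entries(autoEmotion ?? {})) {
    // Auto emotion only fills keys that no explicit source already set.
    if (value !== undefined && !hasOwn(merged, key)) merged[key] = value;
  }
  return merged;
}
```

For example, a persona `style` survives even when auto emotion proposes a different `style`, but an auto-inferred `pitch` is still applied if no explicit source set one.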

Provider mapping shape

The PR maps inferred abstract emotions into provider-owned controls:

  • OpenAI: instructions
  • Microsoft: rate, pitch, volume
  • Azure Speech: rate, pitch, volume
  • ElevenLabs: voiceSettings
  • Volcengine: emotion
  • Xiaomi: style
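The provider list above can be sketched as one exhaustive switch that turns an abstract emotion into the named provider-owned control. This is a minimal TypeScript sketch only: the provider IDs, prosody percentages, stability numbers, and the OpenAI instruction wording are illustrative placeholders, not values taken from #75043 or from provider documentation, and the Volcengine/Xiaomi branches simply pass the abstract name through.

```typescript
// Hypothetical mapping table; all concrete values are placeholders.
type AbstractEmotion = "neutral" | "happy" | "calm" | "sad";
type ProviderId =
  | "openai" | "microsoft" | "azure-speech" | "elevenlabs" | "volcengine" | "xiaomi";

type Prosody = { rate: string; pitch: string; volume: string };

// Relative SSML-style prosody values (placeholders).
const PROSODY: Record<AbstractEmotion, Prosody> = {
  neutral: { rate: "+0%", pitch: "+0%", volume: "+0%" },
  happy: { rate: "+8%", pitch: "+5%", volume: "+5%" },
  calm: { rate: "-8%", pitch: "-2%", volume: "-5%" },
  sad: { rate: "-12%", pitch: "-6%", volume: "-8%" },
};

function mapEmotionToProvider(
  provider: ProviderId,
  emotion: AbstractEmotion,
): Record<string, unknown> {
  switch (provider) {
    case "openai":
      // Free-text instructions on models that support them.
      return { instructions: `Speak in a ${emotion} tone.` };
    case "microsoft":
    case "azure-speech":
      // Both providers share the rate/pitch/volume prosody shape.
      return { ...PROSODY[emotion] };
    case "elevenlabs":
      // Placeholder: slightly less stable delivery for non-neutral emotions.
      return { voiceSettings: { stability: emotion === "neutral" ? 0.5 : 0.35 } };
    case "volcengine":
      return { emotion }; // assumes the provider accepts a matching emotion name
    case "xiaomi":
      return { style: emotion }; // assumes the provider accepts a matching style name
  }
}
```

Under the precedence model in this issue, a mapping like this would only run when no explicit equivalent control is already set.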

The latest PR head also fixes the telephony path so OpenAI telephony preserves merged instructions, and Azure telephony forwards prosody overrides to azureSpeechTTS.

Public surface touched

#75043 updates the full public configuration and documentation surface, including:

  • src/config/types.tts.ts
  • src/config/zod-schema.core.ts
  • src/config/schema.help.ts
  • src/config/schema.labels.ts
  • src/plugin-sdk/config-contracts.ts
  • src/tts/tts-types.ts
  • docs/tools/tts.md
  • provider adapters/tests under extensions/
  • changelog entry for the user-facing config option

Relationship to #67539

#67539 asks for provider-specific TTS prompt hints so agents know which expressive syntax is valid for the active provider. That direction is complementary but different:

  • #67539 teaches the model/provider prompt what expressive syntax it may emit.
  • #75043 applies deterministic runtime mapping after text is selected, without requiring the model to emit provider-specific syntax.

The main product-boundary question is whether #75043 should land as a shared speech-core autoEmotion policy, whether #67539 should land first as a narrower provider-owned prompt-hint seam, or whether both should exist because they solve different parts of TTS expressiveness.

Current PR state

Latest checked PR head: 759367c7beeb74051512742cd07d3b7e70758014

Recent validation on #75043 included:

  • pnpm test extensions/openai/speech-provider.test.ts extensions/azure-speech/speech-provider.test.ts extensions/speech-core/src/tts.test.ts
  • pnpm tsgo:core
  • pnpm tsgo:extensions
  • pnpm check:test-types
  • pnpm lint
  • targeted oxfmt --check
  • git diff --check

GitHub CI was also green at the latest check, with no failing checks.

Open owner questions

  1. Should messages.tts.autoEmotion be accepted as a shared speech-core feature, or should provider-specific expressiveness stay provider-owned only?
  2. Is the current precedence model sufficient to protect personas and explicit provider configuration?
  3. Should the PR be kept as one coherent feature, or split into a smaller provider-hint seam first and a later auto-emotion follow-up?
  4. Do TTS/provider owners want updated real-behavior proof against the latest PR head before review, beyond the focused provider, unit, type, and lint validation already supplied?

Expected decision

If owners agree with the shared opt-in policy, #75043 is the implementation PR for this issue. If owners prefer the narrower prompt-hint direction from #67539 first, this issue can track reshaping #75043 into a smaller follow-up after the provider hint seam lands.
