openclaw - ✅(Solved) Fix [Feature]: Make TTS directive parsing honor explicit provider selection [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#60131 (fetched 2026-04-08 02:35:54)
Comments: 0
Participants: 1
Timeline: 2
Reactions: 0
Timeline (top): cross-referenced ×1, labeled ×1

Honor provider= in [[tts:...]] directives so only the selected speech provider consumes overlapping directive keys, and unsupported keys produce warnings instead of being silently dropped.

Root Cause

This is painful because the syntax looks valid, the directive is accepted, and the output is wrong in a way that is hard for users and maintainers to diagnose.

Proposed solution

Fix Action

Fix / Workaround

  • Parse provider= first within each [[tts:...]] directive and treat it as the active provider for the remaining tokens in that directive.
  • When an active provider is present, dispatch non-provider tokens only to that provider's parseDirectiveToken.
  • If a token is not recognized by the selected provider, preserve the directive cleanup behavior but emit a warning instead of silently dropping it.
  • Extend SpeechDirectiveTokenParseContext in src/tts/provider-types.ts to include the selected provider id, so providers can make explicit routing decisions when needed.

PR fix notes

PR #58607: feat(mistral): add Voxtral TTS support

Description (problem / solution / changelog)

Summary

This PR adds Voxtral TTS support through the mistral speech provider.

  • Reuses existing Mistral onboarding, auth-profile resolution, and base URL config so the same MISTRAL_API_KEY setup can power models, transcription, and TTS.
  • Built on the official @mistralai/mistralai v2 SDK (client.audio.speech.complete()) — no raw HTTP, no manual JSON-envelope parsing or base64 decode. Upgraded workspace-wide from 1.14.1 → 2.2.0 (pi-ai only uses chat.stream(), stable across v1→v2).
  • Added enabled flag to the OpenAI TTS provider — lets users keep OpenAI as a model provider while explicitly opting it out of the automatic TTS fallback order.
  • Documents setup/examples for Mistral/Voxtral TTS and adds focused provider/registration coverage.
  • What did NOT change (scope boundary): this stays focused on the Voxtral TTS provider path and does not change unrelated TTS providers.
  • Specifically, there is a pre-existing issue with speech directive parsing at the architectural level, which is not solved in this PR.

I have a followup branch ready for Voxtral STT, for which this PR is a prerequisite.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #58117
  • Related #60131
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: N/A
  • Missing detection / guardrail: N/A
  • Prior context (git blame, prior PR, issue, or refactor if known): N/A
  • Why this regressed now: N/A
  • If unknown, what was ruled out: N/A

Regression Test Plan (if applicable)

  • Coverage level that should have caught this: N/A
  • Target test or file: N/A
  • Scenario the test should lock in: N/A
  • Why this is the smallest reliable guardrail: N/A
  • Existing test that already covers this (if any): N/A
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

  • Adds mistral as a TTS provider using Voxtral TTS.
  • Lets the Mistral TTS path reuse existing Mistral auth/base URL resolution instead of needing a separate TTS-only setup path.
  • Adds Mistral/Voxtral TTS setup examples to the provider and TTS docs.
  • Adds messages.tts.providers.openai.enabled flag — set to false to disable OpenAI as an automatic TTS fallback while keeping OpenAI configured as a model provider.

Diagram

flowchart TD
    explicit["/tts provider mistral"] --> mistral
    autoselect["TTS auto-select<br/>(openai.enabled = false)"] --> mistral

    mistral[mistral speech provider] --> auth[resolve config and auth]
    auth --> sdk["@mistralai/mistralai v2 SDK<br/>client.audio.speech.complete()"]
    sdk --> api["POST /v1/audio/speech"]
    api --> audio["audio ✓"]
    api -- "error / timeout" --> fallback{"openai.enabled?"}
    fallback -- "true (default)" --> openai["OpenAI TTS fallback"]
    fallback -- "false" --> other["other providers in fallback order"]
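The fallback branch of the diagram can be illustrated with a small sketch. This is not the PR's implementation: the entry and config shapes, the resolveFallbackOrder helper, and the autoSelectOrder value for mistral are all hypothetical, and the sketch assumes an enabled flag of false simply removes a provider from the automatic fallback list.

```typescript
// Hypothetical shapes: the real provider registry and config types may differ.
type SpeechProviderEntry = { id: string; autoSelectOrder: number; configured: boolean };
type TtsProvidersConfig = Record<string, { enabled?: boolean } | undefined>;

// Return the ids to try after `failedProvider`, lowest autoSelectOrder first.
function resolveFallbackOrder(
  entries: SpeechProviderEntry[],
  config: TtsProvidersConfig,
  failedProvider?: string,
): string[] {
  return entries
    .filter((entry) => entry.configured)
    // Assumption: only an explicit `enabled: false` opts a provider out.
    .filter((entry) => config[entry.id]?.enabled !== false)
    .filter((entry) => entry.id !== failedProvider)
    .sort((a, b) => a.autoSelectOrder - b.autoSelectOrder)
    .map((entry) => entry.id);
}

const entries: SpeechProviderEntry[] = [
  { id: "elevenlabs", autoSelectOrder: 20, configured: true },
  { id: "openai", autoSelectOrder: 10, configured: true },
  { id: "mistral", autoSelectOrder: 30, configured: true }, // order value is hypothetical
];

// Default: OpenAI stays in the fallback order.
const withOpenAi = resolveFallbackOrder(entries, {}, "mistral");
// Opt-out: OpenAI is skipped, the remaining providers are tried in order.
const withoutOpenAi = resolveFallbackOrder(entries, { openai: { enabled: false } }, "mistral");
console.log(withOpenAi, withoutOpenAi);
```

Whether an explicit /tts provider openai selection should still work when the flag is false is not stated here, so the sketch only models the automatic path.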

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) Yes
  • New/changed network calls? (Yes/No) Yes
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: this adds a new Mistral TTS network path (via the official @mistralai/mistralai SDK) and reuses existing Mistral auth/profile resolution for that provider; focused plugin tests cover auth reuse, registration, and request shaping.

Repro + Verification

Environment

  • OS: Linux (local checkout)
  • Runtime/container: Node + npx pnpm
  • Model/provider: Mistral / Voxtral TTS
  • Integration/channel (if any): core TTS provider path
  • Relevant config (redacted): existing auth.profiles.mistral:* or models.providers.mistral plus messages.tts.provider = "mistral"

Steps

  1. Configure Mistral via onboarding or an auth profile using MISTRAL_API_KEY.
  2. Set messages.tts.provider to mistral or use /tts provider mistral.
  3. Request speech synthesis.

Expected

  • OpenClaw recognizes Mistral as a TTS provider and synthesizes audio through Voxtral TTS.

Actual

  • Prior to this branch, there was no Voxtral TTS support on the Mistral provider path.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Voxtral TTS in Telegram voice notes
  • Voxtral TTS in voice-call (with Twilio)
  • Using a real MISTRAL_API_KEY and configured voice id.
  • Edge cases checked: auth-profile reuse, provider base URL reuse, Mistral plugin registration, docs/examples for Voxtral TTS, SDK v2 usage (serverURL handling, error reformatting), and OpenAI enabled flag behaviour in auto-select.
  • What you did not verify: broader live coverage beyond one real key + voice id path (for example additional voices).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No

Risks and Mitigations

  • Risk: the Voxtral TTS provider could drift from manifest/runtime ownership expectations.
    • Mitigation: the PR adds Mistral plugin registration and contract coverage for the speech provider.
  • Risk: upgrading @mistralai/mistralai 1.14.1 → 2.2.0 workspace-wide could break @mariozechner/pi-ai model calls.
    • Mitigation: pi-ai only uses new Mistral() + chat.stream(), both unchanged in v2; verified in code review of pi-ai's dist/providers/mistral.js.

Changed files

  • docs/providers/mistral.md (modified, +34/-4)
  • docs/tools/tts.md (modified, +46/-7)
  • extensions/mistral/api.test.ts (modified, +68/-1)
  • extensions/mistral/index.ts (modified, +2/-0)
  • extensions/mistral/openclaw.plugin.json (modified, +1/-0)
  • extensions/mistral/package.json (modified, +3/-0)
  • extensions/mistral/speech-provider.test.ts (added, +505/-0)
  • extensions/mistral/speech-provider.ts (added, +465/-0)
  • extensions/openai/speech-provider.test.ts (modified, +25/-0)
  • extensions/openai/speech-provider.ts (modified, +17/-2)
  • package.json (modified, +1/-0)
  • pnpm-lock.yaml (modified, +19/-0)
  • src/tts/provider-registry.test.ts (modified, +15/-11)
  • test/helpers/plugins/plugin-registration-contract-cases.ts (modified, +1/-0)


Summary

Honor provider= in [[tts:...]] directives so only the selected speech provider consumes overlapping directive keys, and unsupported keys produce warnings instead of being silently dropped.

Problem to solve

Today, TTS directive parsing is provider-order driven, not provider-aware. In src/tts/directives.ts, the parser records provider= but still routes every other token by walking providers in autoSelectOrder and stopping at the first parseDirectiveToken that returns handled: true. That means overlapping keys can be consumed by the wrong provider before the explicitly requested provider ever sees them.

Current parser behavior:

if (key === "provider") {
  if (policy.allowProvider) {
    const providerId = rawValue.trim().toLowerCase();
    if (providerId) {
      overrides.provider = providerId;
    }
  }
  continue;
}

for (const provider of providers) {
  const parsed = provider.parseDirectiveToken?.({
    key,
    value: rawValue,
    policy,
    providerConfig: resolveDirectiveProviderConfig(provider, options),
    currentOverrides: overrides.providerOverrides?.[provider.id],
  });
  if (!parsed?.handled) {
    continue;
  }
  break;
}

A concrete example already exists with merged providers: [[tts:provider=elevenlabs model=...]]. Both OpenAI and ElevenLabs accept bare model, but OpenAI is ordered first. The result is that the directive can be consumed as an OpenAI override even though the user explicitly selected ElevenLabs. The requested ElevenLabs model is then ignored with no warning.
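The collision can be reproduced in isolation with a minimal model of the ordered walk. Everything below is a simplified stand-in rather than OpenClaw code: the MockProvider type and orderedWalk helper are hypothetical, and the mock OpenAI provider accepts any model value, whereas the real one also validates it with isValidOpenAIModel.

```typescript
// Simplified stand-ins for the real provider types (assumptions, not OpenClaw APIs).
type ParseResult = { handled: boolean; overrides?: Record<string, string> };
type MockProvider = {
  id: string;
  autoSelectOrder: number;
  parseDirectiveToken: (ctx: { key: string; value: string }) => ParseResult;
};

const openai: MockProvider = {
  id: "openai",
  autoSelectOrder: 10,
  // Simplification: accepts any value for the bare `model` key.
  parseDirectiveToken: ({ key, value }) =>
    key === "model" ? { handled: true, overrides: { model: value } } : { handled: false },
};

const elevenlabs: MockProvider = {
  id: "elevenlabs",
  autoSelectOrder: 20,
  parseDirectiveToken: ({ key, value }) =>
    key === "model" ? { handled: true, overrides: { modelId: value } } : { handled: false },
};

// Mirrors the current behavior: record provider=, then let the first handler win.
function orderedWalk(tokens: Array<[string, string]>, providers: MockProvider[]) {
  const sorted = [...providers].sort((a, b) => a.autoSelectOrder - b.autoSelectOrder);
  const result: { provider?: string; providerOverrides: Record<string, Record<string, string>> } = {
    providerOverrides: {},
  };
  for (const [key, value] of tokens) {
    if (key === "provider") {
      result.provider = value.trim().toLowerCase();
      continue;
    }
    for (const provider of sorted) {
      const parsed = provider.parseDirectiveToken({ key, value });
      if (!parsed.handled) continue;
      result.providerOverrides[provider.id] = {
        ...result.providerOverrides[provider.id],
        ...parsed.overrides,
      };
      break; // first provider in autoSelectOrder consumes the token
    }
  }
  return result;
}

const collision = orderedWalk(
  [["provider", "elevenlabs"], ["model", "eleven_v3"]],
  [elevenlabs, openai],
);
// provider is "elevenlabs", yet the model lands in the openai bucket.
console.log(collision);
```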

Overlapping provider keys today:

// extensions/openai/speech-provider.ts
case "model":
case "openai_model":
case "openaimodel":
  if (!ctx.policy.allowModelId) {
    return { handled: true };
  }
  if (!isValidOpenAIModel(ctx.value, baseUrl)) {
    return { handled: false };
  }
  return { handled: true, overrides: { model: ctx.value } };

// extensions/elevenlabs/speech-provider.ts
case "model":
case "modelid":
case "model_id":
case "elevenlabs_model":
case "elevenlabsmodel":
  if (!ctx.policy.allowModelId) {
    return { handled: true };
  }
  return {
    handled: true,
    overrides: { ...(ctx.currentOverrides ?? {}), modelId: ctx.value },
  };

Current provider order:

// extensions/openai/speech-provider.ts
autoSelectOrder: 10,

// extensions/elevenlabs/speech-provider.ts
autoSelectOrder: 20,

This is painful because the syntax looks valid, the directive is accepted, and the output is wrong in a way that is hard for users and maintainers to diagnose.

Proposed solution

Make directive parsing provider-aware at the shared parser layer rather than relying on each provider to avoid collisions on its own.

Desired behavior:

  • Parse provider= first within each [[tts:...]] directive and treat it as the active provider for the remaining tokens in that directive.
  • When an active provider is present, dispatch non-provider tokens only to that provider's parseDirectiveToken.
  • If a token is not recognized by the selected provider, preserve the directive cleanup behavior but emit a warning instead of silently dropping it.
  • Extend SpeechDirectiveTokenParseContext in src/tts/provider-types.ts to include the selected provider id, so providers can make explicit routing decisions when needed. For example:
export type SpeechDirectiveTokenParseContext = {
  key: string;
  value: string;
  policy: SpeechModelOverridePolicy;
  selectedProvider?: SpeechProviderId;
  providerConfig?: SpeechProviderConfig;
  currentOverrides?: SpeechProviderOverrides;
};
  • Keep backward compatibility for directives that do not specify provider=. For those, current fallback behavior can remain, but ambiguous shared keys should ideally move toward provider-prefixed aliases over time.

Expected outcome:

  • [[tts:provider=elevenlabs model=eleven_v3]] applies model only to ElevenLabs.
  • [[tts:provider=openai model=gpt-4o-mini-tts]] applies model only to OpenAI.
  • [[tts:provider=elevenlabs openai_model=gpt-4o-mini-tts]] produces a warning instead of silently disappearing into the wrong override bucket.
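The three expected outcomes above can be checked against a small provider-aware router. This is a sketch under stated assumptions, not the proposed patch: routeDirective, makeProvider, and the warnings array are hypothetical names, the providers are mocks that only know a few model keys, and an unknown provider id falls back to the ordered walk.

```typescript
// Hypothetical, simplified types; the real ones live in src/tts/provider-types.ts.
type Parsed = { handled: boolean; overrides?: Record<string, string> };
type Provider = {
  id: string;
  autoSelectOrder: number;
  parseDirectiveToken: (ctx: { key: string; value: string; selectedProvider?: string }) => Parsed;
};
type RouteResult = {
  provider?: string;
  providerOverrides: Record<string, Record<string, string>>;
  warnings: string[];
};

// Build a mock provider that maps its accepted keys onto one override field.
const makeProvider = (id: string, order: number, keys: string[], field: string): Provider => ({
  id,
  autoSelectOrder: order,
  parseDirectiveToken: ({ key, value }) =>
    keys.includes(key) ? { handled: true, overrides: { [field]: value } } : { handled: false },
});

const providers = [
  makeProvider("openai", 10, ["model", "openai_model"], "model"),
  makeProvider("elevenlabs", 20, ["model", "elevenlabs_model"], "modelId"),
];

function routeDirective(body: string, all: Provider[]): RouteResult {
  const tokens = body
    .trim()
    .split(/\s+/)
    .filter((t) => t.includes("="))
    .map((t) => [t.slice(0, t.indexOf("=")), t.slice(t.indexOf("=") + 1)] as const);
  const result: RouteResult = { providerOverrides: {}, warnings: [] };

  // Resolve provider= before any other token so its position does not matter.
  const selectedId = tokens.find(([k]) => k === "provider")?.[1].trim().toLowerCase();
  if (selectedId) result.provider = selectedId;
  const selected = all.find((p) => p.id === selectedId);
  const ordered = [...all].sort((a, b) => a.autoSelectOrder - b.autoSelectOrder);
  const candidates = selected ? [selected] : ordered; // no provider=: keep the old walk

  for (const [key, value] of tokens) {
    if (key === "provider") continue;
    let handled = false;
    for (const p of candidates) {
      const parsed = p.parseDirectiveToken({ key, value, selectedProvider: selected?.id });
      if (!parsed.handled) continue;
      result.providerOverrides[p.id] = { ...result.providerOverrides[p.id], ...parsed.overrides };
      handled = true;
      break;
    }
    if (!handled && selected) {
      result.warnings.push(`"${key}" is not supported by provider "${selected.id}"`);
    }
  }
  return result;
}

const a = routeDirective("provider=elevenlabs model=eleven_v3", providers);
const b = routeDirective("provider=openai model=gpt-4o-mini-tts", providers);
const c = routeDirective("provider=elevenlabs openai_model=gpt-4o-mini-tts", providers);
console.log(a, b, c);
```

Collecting warnings in the result, rather than logging them directly, is one possible design; it keeps the parser free of side effects and lets the caller decide how to surface them.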

Alternatives considered

Provider-local key cleanup: Remove or narrow overlapping bare aliases such as model on one provider. This reduces the immediate collision, but it does not fix the underlying parser behavior and pushes the same problem onto future providers or other shared keys.

Provider reordering: Moving ElevenLabs ahead of OpenAI would only swap which provider "wins" the collision. It is brittle and does not honor explicit user intent.

Provider-local inference: Asking each provider to infer whether it should handle a token is weaker because SpeechDirectiveTokenParseContext does not currently include the selected provider, so the shared parser is still making the real routing decision too early.
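For comparison, if the parse context did carry the selected provider, a provider-side guard might look like the sketch below. The types are simplified stand-ins, parseElevenLabsModelToken is a hypothetical function, and the guard only covers the shared bare key; it illustrates the alternative and does not replace routing in the shared parser.

```typescript
// Simplified stand-ins for the types in src/tts/provider-types.ts (assumptions).
type SpeechProviderId = string;
type ParseContext = {
  key: string;
  value: string;
  selectedProvider?: SpeechProviderId;
  currentOverrides?: Record<string, string>;
};
type ParseResult = { handled: boolean; overrides?: Record<string, string> };

const PROVIDER_ID: SpeechProviderId = "elevenlabs";
// Bare keys that other providers also accept.
const SHARED_KEYS = new Set(["model"]);
// Aliases that are unambiguous for this provider.
const OWNED_KEYS = new Set(["modelid", "model_id", "elevenlabs_model", "elevenlabsmodel"]);

function parseElevenLabsModelToken(ctx: ParseContext): ParseResult {
  const isShared = SHARED_KEYS.has(ctx.key);
  if (!isShared && !OWNED_KEYS.has(ctx.key)) {
    return { handled: false };
  }
  // Decline a shared key when the user explicitly selected another provider.
  if (isShared && ctx.selectedProvider && ctx.selectedProvider !== PROVIDER_ID) {
    return { handled: false };
  }
  return {
    handled: true,
    overrides: { ...(ctx.currentOverrides ?? {}), modelId: ctx.value },
  };
}

const declined = parseElevenLabsModelToken({ key: "model", value: "gpt-4o-mini-tts", selectedProvider: "openai" });
const accepted = parseElevenLabsModelToken({ key: "model", value: "eleven_v3", selectedProvider: "elevenlabs" });
const unselected = parseElevenLabsModelToken({ key: "model", value: "eleven_v3" });
console.log(declined, accepted, unselected);
```

Every provider would have to repeat this guard, which is the duplication the shared-parser approach avoids.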

Impact

Affected users/systems/channels: Anyone using TTS directives with explicit provider= selection, especially speech providers with overlapping key names such as OpenAI and ElevenLabs.

Severity: Annoying to workflow-blocking, depending on whether voice selection is required for the use case.

Frequency: Intermittent today, but likely to grow as more speech providers share common directive keys.

Consequence: Silent misconfiguration, incorrect voice/model selection, confusing behavior, extra debugging time, and lower trust in directive-based overrides.

Evidence/examples

Current parser behavior in src/tts/directives.ts:

if (key === "provider") {
  if (policy.allowProvider) {
    const providerId = rawValue.trim().toLowerCase();
    if (providerId) {
      overrides.provider = providerId;
    }
  }
  continue;
}

let handled = false;
for (const provider of providers) {
  const parsed = provider.parseDirectiveToken?.({
    key,
    value: rawValue,
    policy,
    providerConfig: resolveDirectiveProviderConfig(provider, options),
    currentOverrides: overrides.providerOverrides?.[provider.id],
  });
  if (!parsed?.handled) {
    continue;
  }
  handled = true;
  break;
}

Current parse context in src/tts/provider-types.ts does not include the selected provider:

export type SpeechDirectiveTokenParseContext = {
  key: string;
  value: string;
  policy: SpeechModelOverridePolicy;
  providerConfig?: SpeechProviderConfig;
  currentOverrides?: SpeechProviderOverrides;
};

OpenAI and ElevenLabs both accept bare model today:

// extensions/openai/speech-provider.ts
case "model":
case "openai_model":
case "openaimodel":
  return { handled: true, overrides: { model: ctx.value } };

// extensions/elevenlabs/speech-provider.ts
case "model":
case "modelid":
case "model_id":
case "elevenlabs_model":
case "elevenlabsmodel":
  return {
    handled: true,
    overrides: { ...(ctx.currentOverrides ?? {}), modelId: ctx.value },
  };

OpenAI currently wins the parser walk because its autoSelectOrder is lower:

// extensions/openai/speech-provider.ts
autoSelectOrder: 10,

// extensions/elevenlabs/speech-provider.ts
autoSelectOrder: 20,

Existing directive coverage in src/plugins/contracts/tts.contract.test.ts already exercises provider=elevenlabs, which is a natural place to add a regression test for provider-aware routing:

const input =
  "Hello [[tts:provider=elevenlabs voiceId=pMsXgVXv3BLzUgSXRplE stability=0.4 speed=1.1]] world\n\n" +
  "[[tts:text]](laughs) Read the song once more.[[/tts:text]]";
const result = parseTtsDirectives(input, policy);

expect(result.overrides.provider).toBe("elevenlabs");
expect(elevenlabsOverrides?.voiceId).toBe("pMsXgVXv3BLzUgSXRplE");

This issue was raised by Codex during review of PR #58607, feat(mistral): add Voxtral TTS support, but the underlying parser issue already exists today in merged OpenAI and ElevenLabs code.

Additional information

This should be treated as a parser capability request, not as a follow-up for any single provider. The main product requirement is: if a user explicitly says provider=..., OpenClaw should route directive tokens to that provider consistently and surface mismatches clearly instead of silently dropping or misapplying them.

extent analysis

TL;DR

To fix the issue, modify the TTS directive parser to prioritize the explicitly specified provider when parsing tokens, ensuring that only the selected provider consumes overlapping directive keys.

Guidance

  1. Update the parser logic: In src/tts/directives.ts, modify the parser to first check for the provider key and set the active provider. Then, for subsequent tokens, only dispatch them to the active provider's parseDirectiveToken method.
  2. Extend the parse context: Update SpeechDirectiveTokenParseContext in src/tts/provider-types.ts to include the selected provider ID, allowing providers to make explicit routing decisions.
  3. Handle unrecognized tokens: When an active provider is present, if a token is not recognized, preserve the directive cleanup behavior but emit a warning instead of silently dropping it.
  4. Test the changes: Add regression tests to src/plugins/contracts/tts.contract.test.ts to ensure the new parser behavior works correctly, especially for cases with overlapping keys like model.
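Steps 1 and 3 both depend on seeing the whole directive before dispatching any token. The sketch below shows one way to extract attribute-style directives, pull provider= out regardless of its position, and still strip the directive from the text. The extractDirectives helper and its regular expression are hypothetical; the real tokenizer in src/tts/directives.ts may differ.

```typescript
// Hypothetical helper, not the actual parser in src/tts/directives.ts.
type ExtractedDirective = {
  provider?: string;
  tokens: Array<{ key: string; value: string }>;
};

function extractDirectives(text: string): { cleaned: string; directives: ExtractedDirective[] } {
  const directives: ExtractedDirective[] = [];
  // Match attribute-style directives such as [[tts:provider=x model=y]].
  // Bodies without "=" (for example [[tts:text]]) are left untouched.
  const cleaned = text.replace(/\[\[tts:([^\]]*=[^\]]*)\]\]/g, (_match, body: string) => {
    const directive: ExtractedDirective = { tokens: [] };
    for (const raw of body.trim().split(/\s+/)) {
      const eq = raw.indexOf("=");
      if (eq <= 0) continue; // skip anything that is not key=value
      const key = raw.slice(0, eq);
      const value = raw.slice(eq + 1);
      if (key === "provider") {
        // Captured separately, so it applies to every token in this directive.
        directive.provider = value.trim().toLowerCase();
      } else {
        directive.tokens.push({ key, value });
      }
    }
    directives.push(directive);
    return ""; // cleanup: the directive never reaches the spoken text
  });
  return { cleaned, directives };
}

// provider= appears after model= and is still recognized as the selection.
const { cleaned, directives } = extractDirectives(
  "Hello [[tts:model=eleven_v3 provider=elevenlabs]] world",
);
console.log(cleaned, directives);
```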

Example

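// Sketch of updated parser logic in src/tts/directives.ts.
// Assumes `activeProvider` is declared (initially undefined) before the token
// loop of each [[tts:...]] directive. To honor provider= regardless of its
// position, the directive's tokens would need a first pass that resolves it.
if (key === "provider") {
  if (policy.allowProvider) {
    const providerId = rawValue.trim().toLowerCase();
    if (providerId) {
      overrides.provider = providerId;
      // Remember the explicitly selected provider for the remaining tokens.
      activeProvider = providers.find((p) => p.id === providerId);
    }
  }
  continue;
}

// With an explicit provider, dispatch only to it; otherwise keep the ordered walk.
const candidates = activeProvider ? [activeProvider] : providers;
let handled = false;
for (const provider of candidates) {
  const parsed = provider.parseDirectiveToken?.({
    key,
    value: rawValue,
    policy,
    selectedProvider: activeProvider?.id,
    providerConfig: resolveDirectiveProviderConfig(provider, options),
    currentOverrides: overrides.providerOverrides?.[provider.id],
  });
  if (!parsed?.handled) {
    continue;
  }
  handled = true;
  break;
}
if (!handled && activeProvider) {
  // Surface the mismatch instead of silently dropping the token.
  console.warn(`Unrecognized token ${key} for provider ${activeProvider.id}`);
}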

Notes

The proposed solution makes the parser provider-aware, which should stop overlapping keys from being consumed by the wrong provider. Directives that omit provider= are expected to keep the current ordered fallback, so regression tests should cover both paths: explicit selection with a shared key such as model, and no selection at all.

Recommendation

Apply the proposed fix by updating the parser logic and extending the parse context to include the selected provider ID. This approach honors the user's explicit provider selection and surfaces mismatches as warnings instead of dropping them silently.
