hermes - [Feature]: Add VoxCPM2 as an optional local TTS provider via external helper

NousResearch/hermes-agent#11688 (fetched 2026-04-18 05:59:19)

Problem or Use Case

Hermes already has a few built-in TTS providers, including local ones like NeuTTS, but there is no first-class path for a local voice-cloning provider that needs to run in a separate Python environment.

VoxCPM2 seems like a reasonable fit for that gap:

  • local inference
  • reference-audio voice cloning
  • no cloud dependency
  • good Chinese support

The wrinkle is that Hermes itself runs on Python 3.11 in many setups, while VoxCPM2 often wants a separate environment and dependency stack. So this is not just “add another import and another API key”.

Proposed Solution

Add optional first-class voxcpm support as a built-in TTS provider, but keep the runtime isolated behind an external helper process.

Rough shape:

  • Hermes main process keeps its current runtime and does not import voxcpm directly.
  • tools/tts_tool.py gets a new provider branch, voxcpm.
  • That provider shells out to a small helper script using a configurable external Python path.
  • The helper handles:
    • readiness probe
    • one synthesis request
    • reference-audio clone mode
    • optional design-prompt fallback when no reference is available
  • Hermes keeps its existing output flow after synthesis:
    • return MP3 / WAV normally
    • convert to Opus / voice-bubble formats where the current TTS pipeline already does that
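The provider side of that boundary could be sketched roughly as follows. This is a minimal illustration and not the actual `tools/tts_tool.py` code: the function names, the JSON-over-stdin protocol, and the request fields (`text`, `output_path`, `reference_audio`, `design_prompt`) are all assumptions made for the example.

```python
import json
import subprocess


class HelperError(RuntimeError):
    """Raised when the external helper fails or returns malformed output."""


def build_request(text, output_path, reference_audio=None, design_prompt=None):
    """Build one synthesis request (hypothetical field names).

    Reference audio wins over the design prompt, which is only a fallback.
    """
    request = {"text": text, "output_path": str(output_path)}
    if reference_audio:
        request["reference_audio"] = str(reference_audio)
    elif design_prompt:
        request["design_prompt"] = design_prompt
    return request


def run_helper(python_path, helper_script, request, timeout=120.0):
    """Send one JSON request on stdin and parse one JSON reply from stdout."""
    cmd = [str(python_path), str(helper_script)]
    try:
        proc = subprocess.run(
            cmd,
            input=json.dumps(request),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        raise HelperError(f"could not run helper: {exc}") from exc
    if proc.returncode != 0:
        detail = proc.stderr.strip() or proc.stdout.strip()
        raise HelperError(f"helper exited with {proc.returncode}: {detail}")
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise HelperError("helper did not return valid JSON") from exc
```

Because the main process only ever sees a path to an interpreter and a JSON reply, it never imports the VoxCPM2 stack itself. The file written to `output_path` would then go through the existing conversion and delivery flow unchanged.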

Important scope boundary for v1

I am not proposing the “hot worker” / persistent loaded model optimization in the first PR.

For a first upstreamable version, I think the right scope is:

  • provider dispatch
  • config keys
  • readiness/probe
  • synthesis helper boundary
  • targeted tests
  • basic docs
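
As an illustration of the "config keys" item, the provider settings might be validated along these lines. The key names (`python_path`, `helper_script`, `reference_audio`, `design_prompt`, `timeout_seconds`) are hypothetical and not existing Hermes configuration; the sketch only shows the kind of early validation a provider-level integration could do.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class VoxCPMConfig:
    """Hypothetical settings for the voxcpm provider."""

    python_path: str                       # interpreter of the separate environment
    helper_script: str                     # path to the helper script
    reference_audio: Optional[str] = None  # voice-cloning reference, if any
    design_prompt: Optional[str] = None    # fallback when no reference is set
    timeout_seconds: float = 120.0


def load_voxcpm_config(raw: Mapping[str, Any]) -> VoxCPMConfig:
    """Validate a raw mapping and return a typed config, or raise ValueError."""
    missing = [key for key in ("python_path", "helper_script") if not raw.get(key)]
    if missing:
        raise ValueError(f"missing voxcpm config keys: {', '.join(missing)}")
    timeout = float(raw.get("timeout_seconds", 120.0))
    if timeout <= 0:
        raise ValueError("timeout_seconds must be positive")
    return VoxCPMConfig(
        python_path=str(raw["python_path"]),
        helper_script=str(raw["helper_script"]),
        reference_audio=raw.get("reference_audio") or None,
        design_prompt=raw.get("design_prompt") or None,
        timeout_seconds=timeout,
    )
```

Failing at load time with a message that names the missing key keeps configuration errors out of the synthesis path.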

If the maintainers want the feature at all, performance work can come later.

Why this should be built-in instead of a skill

I did consider “just make it a skill”, but I think there is a real argument for provider-level support here:

  • it needs to plug into the existing text_to_speech tool path
  • it should participate in the normal platform delivery behavior
  • it should show up in setup / doctor / readiness checks
  • it needs provider-specific config and runtime validation

That feels closer to NeuTTS than to a niche one-off skill.
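
A readiness check of the kind setup or doctor could run might look like the sketch below. It assumes the package is importable under the module name `voxcpm` inside the external environment, which the issue does not confirm; the function name and the returned message format are likewise illustrative.

```python
import os
import subprocess
from pathlib import Path


def check_voxcpm_readiness(python_path, helper_script, module_name="voxcpm", timeout=30.0):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    python = Path(python_path)
    if not (python.is_file() and os.access(python, os.X_OK)):
        problems.append(f"external Python not found or not executable: {python}")
    if not Path(helper_script).is_file():
        problems.append(f"helper script not found: {helper_script}")
    if problems:
        return problems  # skip the subprocess probe when the paths are already wrong

    # Ask the external interpreter whether the module can be located,
    # without importing it (and its heavy dependencies) in this process.
    probe = (
        "import importlib.util, sys; "
        f"sys.exit(0 if importlib.util.find_spec({module_name!r}) else 1)"
    )
    try:
        result = subprocess.run(
            [str(python), "-c", probe], capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        problems.append(f"could not run external Python: {exc}")
        return problems
    if result.returncode != 0:
        problems.append(f"module {module_name!r} is not importable in {python}")
    return problems
```

The probe only locates the module rather than loading a model, so it stays cheap enough to run on every setup or doctor pass.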

Alternatives Considered

1. Keep this as a local patch only

Works for one machine, but makes the integration effectively private and hard to reuse.

2. Publish it as a skill instead of a provider

Possible, but it would sit outside the normal TTS provider flow and lose a lot of the existing Hermes delivery behavior.

3. Require Hermes to import VoxCPM2 directly

I would avoid this. It couples Hermes too tightly to a heavy and version-sensitive dependency stack.

The helper boundary is the main reason this feels upstreamable at all.
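
On the other side of that boundary, the helper script itself could be as small as the sketch below. The request fields and reply shape are assumptions, and the actual VoxCPM2 call is deliberately left as a placeholder because its API is not described in this issue.

```python
import json
import sys


def choose_mode(request):
    """Pick clone mode when reference audio is given, else the design-prompt fallback."""
    if request.get("reference_audio"):
        return "clone"
    if request.get("design_prompt"):
        return "design"
    return "default"


def handle_request(request, synthesize_fn):
    """Validate one request, run synthesis, and build a JSON-serializable reply."""
    text = str(request.get("text") or "").strip()
    if not text:
        return {"ok": False, "error": "empty text"}
    if not request.get("output_path"):
        return {"ok": False, "error": "missing output_path"}
    mode = choose_mode(request)
    try:
        synthesize_fn(request, mode)
    except Exception as exc:  # report any backend failure to the parent process
        return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
    return {"ok": True, "mode": mode, "output_path": request["output_path"]}


def voxcpm_synthesize(request, mode):
    """Placeholder: the real model call would live here, in the helper's own environment."""
    raise NotImplementedError("wire up the VoxCPM2 model call here")


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Read one JSON request, write one JSON reply, return a process exit code."""
    request = json.load(stdin)
    reply = handle_request(request, voxcpm_synthesize)
    json.dump(reply, stdout)
    return 0 if reply["ok"] else 1
```

A real script would end with `sys.exit(main())`. Passing the synthesis function in as an argument keeps `handle_request` testable without the model installed, which fits the "targeted tests" item in the v1 scope.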

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

N/A — this is a feature RFC, not a live bug report.

Extent Analysis

TL;DR

To add VoxCPM2 as a local voice-cloning TTS provider, give Hermes an optional first-class provider that delegates synthesis to an external helper process running in a separate Python environment.

Guidance

  • Implement a new provider branch voxcpm in tools/tts_tool.py that shells out to a small helper script using a configurable external Python path.
  • Ensure the helper script handles readiness probe, synthesis requests, reference-audio clone mode, and optional design-prompt fallback.
  • Keep the existing output flow after synthesis, returning MP3/WAV and converting to Opus/voice-bubble formats as needed.
  • Limit the first PR to the v1 scope: provider dispatch, config keys, readiness/probe, synthesis helper boundary, targeted tests, and basic docs.

Example

No code snippet is provided as the issue is a feature request and not a bug report.

Notes

The proposed solution aims to avoid tightly coupling Hermes to VoxCPM2's dependency stack by using an external helper process, making it a more upstreamable solution.

Recommendation

Implement the voxcpm provider behind an external helper process. This is a feature proposal and not a workaround for a bug; the helper boundary keeps the dependencies cleanly separated and avoids version conflicts.
