hermes - 💡(How to fix) Fix [Feature]: Add VoxCPM2 as an optional local TTS provider via external helper [1 participants]

Sapientropic · 2026-04-17T16:35:26Z



NousResearch/hermes-agent#11688•Fetched 2026-04-18 05:59:19




Problem or Use Case

Hermes already has a few built-in TTS providers, including local ones like NeuTTS, but there is no first-class path for a local voice-cloning provider that needs to run in a separate Python environment.

VoxCPM2 seems like a reasonable fit for that gap:

- local inference
- reference-audio voice cloning
- no cloud dependency
- good Chinese support

The wrinkle is that Hermes itself runs on Python 3.11 in many setups, while VoxCPM2 often wants a separate environment and dependency stack. So this is not just “add another import and another API key”.

Proposed Solution

Add **optional** first-class `voxcpm` support as a built-in TTS provider, but keep the runtime isolated behind an external helper process.

Rough shape:

- Hermes main process keeps its current runtime and does **not** import `voxcpm` directly.
- `tools/tts_tool.py` gets a new provider branch, `voxcpm`.
- That provider shells out to a small helper script using a configurable external Python path.
- The helper handles:
  - readiness probe
  - one synthesis request
  - reference-audio clone mode
  - optional design-prompt fallback when no reference is available
- Hermes keeps its existing output flow after synthesis:
  - return MP3 / WAV normally
  - convert to Opus / voice-bubble formats where the current TTS pipeline already does that
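To make the provider side of that shape concrete, here is a minimal sketch of how Hermes could shell out to the helper. Everything in it is an assumption for illustration, not existing Hermes code: the function names (`run_helper`, `synthesize_with_voxcpm`), the JSON-over-stdin/stdout protocol, and the request fields (`action`, `mode`, `reference_audio`, `design_prompt`, `output_path`) are all hypothetical.

```python
import json
import subprocess
from pathlib import Path
from typing import Optional


class HelperError(RuntimeError):
    """Raised when the external helper fails or returns malformed output."""


def run_helper(python_path: str, helper_script: str, request: dict,
               timeout: float = 120.0) -> dict:
    """Send one JSON request on stdin and parse one JSON reply from stdout."""
    try:
        proc = subprocess.run(
            [python_path, helper_script],   # external interpreter, not Hermes' own
            input=json.dumps(request),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        raise HelperError(f"could not run helper: {exc}") from exc
    if proc.returncode != 0:
        raise HelperError(
            f"helper exited with {proc.returncode}: {proc.stderr.strip()}"
        )
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        raise HelperError(f"helper returned non-JSON output: {proc.stdout[:200]!r}") from exc


def build_request(text: str, output_path: str,
                  reference_audio: Optional[str] = None,
                  design_prompt: Optional[str] = None) -> dict:
    """Pick clone mode when a reference exists, else the design-prompt fallback."""
    request = {"action": "synthesize", "text": text, "output_path": str(output_path)}
    if reference_audio:
        request["mode"] = "clone"
        request["reference_audio"] = str(reference_audio)
    else:
        request["mode"] = "design"
        request["design_prompt"] = design_prompt or ""
    return request


def synthesize_with_voxcpm(text: str, output_path: str, python_path: str,
                           helper_script: str,
                           reference_audio: Optional[str] = None,
                           design_prompt: Optional[str] = None) -> Path:
    """Run one synthesis request and return the path of the audio file written."""
    request = build_request(text, output_path, reference_audio, design_prompt)
    reply = run_helper(python_path, helper_script, request)
    if not reply.get("ok"):
        raise HelperError(reply.get("error", "unknown helper error"))
    return Path(reply["output_path"])
```

Because the provider only ever sees a file path coming back, the existing MP3 / WAV / Opus handling would not need to know that a second interpreter was involved.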

Important scope boundary for v1

I am not proposing the “hot worker” / persistent loaded model optimization in the first PR.

For a first upstreamable version, I think the right scope is:

- provider dispatch
- config keys
- readiness/probe
- synthesis helper boundary
- targeted tests
- basic docs

If the maintainers want the feature at all, performance work can come later.
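As an illustration of the readiness/probe item, the sketch below checks whether a module is importable in the *external* interpreter without loading any model. It is an assumption about how such a probe could work, not the actual Hermes doctor check; the function name `probe_external_module` and its return shape are hypothetical. It relies only on `importlib.util.find_spec`, which locates a top-level package without importing it.

```python
import shutil
import subprocess
from typing import Tuple

# One-liner executed inside the external interpreter. find_spec() only
# locates the package, so no model weights are loaded during the probe.
PROBE_SNIPPET = (
    "import importlib.util, sys; "
    "sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)"
)


def probe_external_module(python_path: str, module: str = "voxcpm",
                          timeout: float = 15.0) -> Tuple[bool, str]:
    """Return (ready, reason) for `module` in the interpreter at `python_path`."""
    resolved = shutil.which(python_path)
    if resolved is None:
        return False, f"interpreter not found: {python_path}"
    try:
        proc = subprocess.run(
            [resolved, "-c", PROBE_SNIPPET, module],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return False, f"probe timed out after {timeout}s"
    except OSError as exc:
        return False, f"could not start interpreter: {exc}"
    if proc.returncode != 0:
        return False, f"module {module!r} is not importable in {resolved}"
    return True, "ok"
```

A probe like this stays cheap enough to run during setup or doctor checks, which fits the v1 boundary of leaving the persistent-worker optimization for later.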

Why this should be built-in instead of a skill

I did consider “just make it a skill”, but I think there is a real argument for provider-level support here:

- it needs to plug into the existing `text_to_speech` tool path
- it should participate in the normal platform delivery behavior
- it should show up in setup / doctor / readiness checks
- it needs provider-specific config and runtime validation

That feels closer to NeuTTS than to a niche one-off skill.
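For the provider-specific config and runtime validation point, a rough sketch of what that could look like follows. The key names (`python_path`, `helper_script`, `reference_audio`, `design_prompt`, `timeout_seconds`) are hypothetical and chosen only for illustration; the issue does not specify the actual config keys.

```python
from dataclasses import dataclass, fields
from pathlib import Path
from typing import List, Optional


@dataclass
class VoxCPMConfig:
    """Hypothetical config block for the `voxcpm` provider."""
    python_path: str = ""                   # interpreter of the separate environment
    helper_script: str = ""                 # path to the helper script
    reference_audio: Optional[str] = None   # enables clone mode when set
    design_prompt: Optional[str] = None     # fallback when no reference is set
    timeout_seconds: float = 120.0

    @classmethod
    def from_dict(cls, raw: dict) -> "VoxCPMConfig":
        # Ignore unknown keys so older or newer config files still load.
        known = {f.name: raw[f.name] for f in fields(cls) if f.name in raw}
        return cls(**known)

    def problems(self) -> List[str]:
        """Return human-readable validation problems; empty list means usable."""
        found = []
        for key in ("python_path", "helper_script"):
            value = getattr(self, key)
            if not value:
                found.append(f"{key} is not set")
            elif not Path(value).is_file():
                found.append(f"{key} does not exist: {value}")
        if self.reference_audio and not Path(self.reference_audio).is_file():
            found.append(f"reference_audio does not exist: {self.reference_audio}")
        if self.timeout_seconds <= 0:
            found.append("timeout_seconds must be positive")
        return found
```

Returning a list of problems rather than raising on the first one would let a setup or doctor command report everything that is misconfigured in a single pass.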

Alternatives Considered

1. Keep this as a local patch only

Works for one machine, but makes the integration effectively private and hard to reuse.

2. Publish it as a skill instead of a provider

Possible, but it would sit outside the normal TTS provider flow and lose a lot of the existing Hermes delivery behavior.

3. Require Hermes to import VoxCPM2 directly

I would avoid this. It couples Hermes too tightly to a heavy and version-sensitive dependency stack.

The helper boundary is the main reason this feels upstreamable at all.
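The helper side of that boundary could look roughly like the sketch below. It is an illustration under stated assumptions: the JSON request/reply shape, the `handle_request` and `main` names, and the backend callable signature are all hypothetical, and the real VoxCPM2 call is deliberately left behind an injected `backend` function rather than guessed at.

```python
import importlib.util
import json
from typing import Callable, Optional, TextIO

# Hypothetical backend signature:
# (text, output_path, reference_audio, design_prompt) -> None
Backend = Callable[[str, str, Optional[str], Optional[str]], None]


def handle_request(request: dict, backend: Backend) -> dict:
    """Dispatch one request and always answer with a JSON-serializable dict."""
    if not isinstance(request, dict):
        return {"ok": False, "error": "request must be a JSON object"}
    action = request.get("action")
    if action == "probe":
        # Only checks that the package can be located; nothing is loaded.
        ready = importlib.util.find_spec("voxcpm") is not None
        return {"ok": ready, "error": None if ready else "voxcpm is not installed"}
    if action == "synthesize":
        text = str(request.get("text", "")).strip()
        output_path = request.get("output_path")
        if not text:
            return {"ok": False, "error": "empty text"}
        if not output_path:
            return {"ok": False, "error": "output_path is required"}
        try:
            backend(text, output_path,
                    request.get("reference_audio"), request.get("design_prompt"))
        except Exception as exc:  # report failures as data, not a traceback
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        return {"ok": True, "output_path": output_path}
    return {"ok": False, "error": f"unknown action: {action!r}"}


def main(stdin: TextIO, stdout: TextIO, backend: Backend) -> int:
    """Read one JSON request, write one JSON reply, exit 0 if a reply was written."""
    try:
        request = json.load(stdin)
    except json.JSONDecodeError as exc:
        reply = {"ok": False, "error": f"invalid JSON request: {exc}"}
    else:
        reply = handle_request(request, backend)
    json.dump(reply, stdout)
    return 0
```

In a real helper script, `main(sys.stdin, sys.stdout, backend)` would be called with a backend that imports `voxcpm` lazily inside the helper's own environment, so the Hermes process never touches that dependency stack.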

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Contribution

- [x] I'd like to implement this myself and submit a PR

Debug Report (optional)

N/A — this is a feature RFC, not a live bug report.

extent analysis

TL;DR

To add VoxCPM2 as a local voice-cloning TTS provider to Hermes, create an optional first-class provider that runs in a separate Python environment using an external helper process.

Guidance

- Implement a new provider branch `voxcpm` in `tools/tts_tool.py` that shells out to a small helper script using a configurable external Python path.
- Ensure the helper script handles the readiness probe, synthesis requests, reference-audio clone mode, and the optional design-prompt fallback.
- Keep the existing output flow after synthesis, returning MP3/WAV and converting to Opus/voice-bubble formats as needed.
- Limit the first PR to the v1 scope: provider dispatch, config keys, readiness/probe, synthesis helper boundary, targeted tests, and basic docs.

Example

No code snippet is provided as the issue is a feature request and not a bug report.

Notes

The proposed solution aims to avoid tightly coupling Hermes to VoxCPM2's dependency stack by using an external helper process, making it a more upstreamable solution.

Recommendation

Implement the proposed `voxcpm` provider behind an external helper process; this keeps the two dependency stacks separate and avoids version conflicts.
