hermes - 💡(How to fix) Fix [Feature]: Standardize inbound voice-message metadata for gateway adapters [1 participants]

hermes · 2026-04-17 16:35:25

NousResearch/hermes-agent#11687 • Fetched 2026-04-18 05:59:21

Author

Sapientropic

Problem or Use Case

Some gateway adapters can expose more than just “here is an audio file”.

For example, an inbound voice message may come with useful native metadata such as:

- transcript hint / built-in ASR text
- duration / playtime
- source format details
- platform-specific voice-message fields

Right now Hermes does not seem to have a small, explicit contract for carrying that information through normalization.

That makes follow-on work awkward:

- one adapter drops the metadata
- another adapter invents ad-hoc event attributes
- downstream logic has to special-case platforms instead of consuming one normalized shape

Proposed Solution

I think Hermes would benefit from a small core contract for inbound voice-message metadata.

Something along these lines:

- keep MessageEvent as the normalized gateway object
- add a stable place for adapter-specific / normalized inbound metadata
- for voice messages, normalize a minimal shared subset such as:
  - voice_text_hint
  - voice_duration_ms
  - maybe voice_source_format

That alone would already help with:

- STT fallback
- better prompt context for inbound voice turns
- future optional voice-processing integrations
- reducing adapter-specific hacks
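As a rough illustration of what normalizing to the shared subset could look like, here is a minimal TypeScript sketch. The two raw payload shapes (PlatformAVoice, PlatformBVoice), their field names, and both normalizer functions are hypothetical and invented for this example; they are not taken from Hermes or from any real platform API. Only the three voice_* field names come from the proposal.

```typescript
// Shared subset from the proposal. Every field is optional because
// not every platform exposes every piece of metadata.
interface VoiceMetadata {
  voice_text_hint?: string;
  voice_duration_ms?: number;
  voice_source_format?: string;
}

// Hypothetical raw payload of one platform (field names are assumptions).
interface PlatformAVoice {
  recognition?: string; // built-in ASR text
  playtimeSec?: number; // duration in seconds
  codec?: string;
}

// Hypothetical raw payload of another platform (field names are assumptions).
interface PlatformBVoice {
  transcript?: string | null;
  duration_ms?: number;
  mime_type?: string; // for example "audio/ogg"
}

// Map platform A's payload onto the shared shape; absent values stay absent.
function normalizePlatformA(raw: PlatformAVoice): VoiceMetadata {
  const meta: VoiceMetadata = {};
  const text = raw.recognition?.trim();
  if (text) meta.voice_text_hint = text;
  if (typeof raw.playtimeSec === "number") {
    meta.voice_duration_ms = Math.round(raw.playtimeSec * 1000);
  }
  if (raw.codec) meta.voice_source_format = raw.codec.toLowerCase();
  return meta;
}

// Map platform B's payload onto the same shared shape.
function normalizePlatformB(raw: PlatformBVoice): VoiceMetadata {
  const meta: VoiceMetadata = {};
  const text = raw.transcript?.trim();
  if (text) meta.voice_text_hint = text;
  if (typeof raw.duration_ms === "number") {
    meta.voice_duration_ms = raw.duration_ms;
  }
  if (raw.mime_type) {
    // Keep only the part after the last slash of the MIME type.
    const subtype = raw.mime_type.split("/").pop() ?? raw.mime_type;
    meta.voice_source_format = subtype.toLowerCase();
  }
  return meta;
}
```

With something like this in each adapter, downstream code would read the same three keys regardless of platform, and a missing key simply means the platform did not provide that value.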

What I am not proposing here

I am not proposing that Hermes should bundle a full SenseVoice / emotion-analysis / profile-persistence system.

Specifically out of scope for this discussion:

- bundled emotion classification
- user-state or profile persistence
- JSONL ledgers
- platform-specific mood heuristics
- Codeksei-specific syncing or memory workflows

Those are product-layer decisions. I do not think they belong in Hermes core by default.

Possible extension point

If maintainers think this is useful, a follow-up could be an optional voice-analysis hook that consumes:

- normalized inbound audio
- normalized voice metadata

But I would still keep that separate from the metadata contract itself.

My preference would be:

1. standardize the metadata contract first
2. only then decide whether Hermes wants any general analysis hook
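To make the separation concrete, here is one way an optional analysis hook could sit on top of the metadata contract without being part of it. This is only a sketch: VoiceAnalysisInput, VoiceAnalysisHook, and VoiceHookRegistry are hypothetical names invented for illustration, and nothing like them is confirmed to exist in Hermes.

```typescript
// The metadata contract itself (field names from the proposal).
interface VoiceMetadata {
  voice_text_hint?: string;
  voice_duration_ms?: number;
  voice_source_format?: string;
}

// Hypothetical hook input: normalized audio plus normalized metadata.
interface VoiceAnalysisInput {
  audio: Uint8Array;
  metadata: VoiceMetadata;
}

// A hook returns opaque annotations that core would not interpret.
type VoiceAnalysisHook = (input: VoiceAnalysisInput) => Record<string, unknown>;

interface RegisteredHook {
  name: string;
  hook: VoiceAnalysisHook;
}

// Hypothetical registry; the contract above works without it.
class VoiceHookRegistry {
  private hooks: RegisteredHook[] = [];

  register(name: string, hook: VoiceAnalysisHook): void {
    this.hooks.push({ name, hook });
  }

  // Run every registered hook. A hook that throws is skipped so that an
  // optional integration cannot break message handling.
  run(input: VoiceAnalysisInput): Record<string, Record<string, unknown>> {
    const results: Record<string, Record<string, unknown>> = {};
    for (const entry of this.hooks) {
      try {
        results[entry.name] = entry.hook(input);
      } catch {
        // Intentionally ignored: analysis is optional.
      }
    }
    return results;
  }
}
```

The point of the sketch is the dependency direction: the hook consumes VoiceMetadata, while VoiceMetadata knows nothing about hooks, so the contract can be standardized first and the hook decided later.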

Alternatives Considered

1. Keep everything adapter-specific

Lowest effort, but it does not scale well once multiple adapters expose different voice hints.

2. Bundle a full voice-analysis stack in core

I do not think this is the right first move. Too much policy, too much maintenance, too much product opinion.

3. Put everything in skills only

Skills are a good fit for higher-level workflows, but they still benefit from a stable normalized gateway contract underneath.

Feature Type

Gateway / messaging improvement

Scope

Medium (few files, < 300 lines)

Contribution

I'd like to implement this myself and submit a PR

Debug Report (optional)

N/A — design discussion only.

extent analysis

TL;DR

Introduce a small core contract for inbound voice-message metadata in Hermes to standardize the handling of adapter-specific metadata.

Guidance

- Define a stable place for adapter-specific/normalized inbound metadata within the MessageEvent object.
- Identify a minimal shared subset of metadata to normalize, such as voice_text_hint, voice_duration_ms, and voice_source_format.
- Consider adding an optional voice-analysis hook that consumes normalized inbound audio and voice metadata, but keep it separate from the metadata contract.
- Check the proposed contract against several adapters and against downstream consumers such as STT fallback and prompt context, so the shared subset is not shaped by a single platform.

Example

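// Proposed metadata contract (illustrative; not an existing Hermes type).
// Fields are optional because not every adapter exposes every value.
interface VoiceMessageMetadata {
  voice_text_hint?: string;     // transcript hint / built-in ASR text
  voice_duration_ms?: number;   // duration / playtime in milliseconds
  voice_source_format?: string; // source format details ("maybe" in the proposal)
}

// MessageEvent with a stable place for normalized inbound metadata
interface MessageEvent {
  // ... existing properties ...
  metadata?: VoiceMessageMetadata; // absent for messages without voice metadata
}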
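A consumer of such a contract might look like the sketch below, which covers two of the benefits named in the issue: STT fallback and prompt context for voice turns. It assumes one reading of "STT fallback" (use the platform's text hint when speech-to-text fails or returns nothing). The Transcriber type, resolveTranscript, and describeVoiceTurn are hypothetical names, and a real STT call would most likely be asynchronous; it is kept synchronous here for brevity.

```typescript
interface VoiceMetadata {
  voice_text_hint?: string;
  voice_duration_ms?: number;
  voice_source_format?: string;
}

// Hypothetical STT backend signature (an assumption, not a Hermes API).
type Transcriber = (audio: Uint8Array) => string;

interface ResolvedTranscript {
  text: string;
  source: "stt" | "hint" | "none";
}

// Try STT first; if it throws or yields nothing, fall back to the hint.
function resolveTranscript(
  audio: Uint8Array,
  meta: VoiceMetadata,
  stt: Transcriber
): ResolvedTranscript {
  try {
    const text = stt(audio).trim();
    if (text) return { text, source: "stt" };
  } catch {
    // STT failed; fall through to the platform hint.
  }
  const hint = meta.voice_text_hint?.trim();
  if (hint) return { text: hint, source: "hint" };
  return { text: "", source: "none" };
}

// Build a short context line for the prompt from whatever metadata exists.
function describeVoiceTurn(meta: VoiceMetadata): string {
  const parts: string[] = [];
  if (typeof meta.voice_duration_ms === "number") {
    parts.push((meta.voice_duration_ms / 1000).toFixed(1) + "s");
  }
  if (meta.voice_source_format) parts.push(meta.voice_source_format);
  return parts.length > 0
    ? "[voice message: " + parts.join(", ") + "]"
    : "[voice message]";
}
```

Neither function mentions a platform, which is the practical payoff of normalizing in the adapter rather than downstream.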

Notes

The proposal covers only the metadata contract, which is what keeps adapters and downstream logic consistent. Implementation details and possible extensions, such as the voice-analysis hook, are left for further discussion.

Recommendation

Adopt the proposal: introduce a small core contract for inbound voice-message metadata. It gives adapters one normalized shape to populate and lets downstream logic consume voice metadata without platform-specific cases.
