hermes - 💡(How to fix) Fix [Feature]: Standardize inbound voice-message metadata for gateway adapters [1 participants]

NousResearch/hermes-agent#11687 · Fetched 2026-04-18 05:59:21
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Code Example

N/A — design discussion only.

Problem or Use Case

Some gateway adapters can expose more than just “here is an audio file”.

For example, an inbound voice message may come with useful native metadata such as:

  • transcript hint / built-in ASR text
  • duration / playtime
  • source format details
  • platform-specific voice-message fields

Right now Hermes does not seem to have a small, explicit contract for carrying that information through normalization.

That makes follow-on work awkward:

  • one adapter drops the metadata
  • another adapter invents ad-hoc event attributes
  • downstream logic has to special-case platforms instead of consuming one normalized shape

Proposed Solution

I think Hermes would benefit from a small core contract for inbound voice-message metadata.

Something along these lines:

  • keep MessageEvent as the normalized gateway object
  • add a stable place for adapter-specific / normalized inbound metadata
  • for voice messages, normalize a minimal shared subset such as:
    • voice_text_hint
    • voice_duration_ms
    • maybe voice_source_format
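
As a rough illustration of the shared subset above, the following sketch maps one adapter's raw payload onto the three proposed fields. The `RawVoicePayload` shape and its field names (`asr_text`, `playtime_seconds`, `codec`) are hypothetical and do not come from Hermes or any real platform. Only `voice_text_hint`, `voice_duration_ms`, and `voice_source_format` come from the proposal.

```typescript
// Normalized subset proposed in the issue. Every field is optional here
// (an assumption) because not every platform exposes every hint.
interface VoiceMetadata {
  voice_text_hint?: string;
  voice_duration_ms?: number;
  voice_source_format?: string;
}

// Hypothetical raw payload from one adapter; field names are illustrative only.
interface RawVoicePayload {
  asr_text?: string | null;
  playtime_seconds?: number | null;
  codec?: string | null;
}

// Map the adapter payload onto the shared shape, omitting anything missing
// or unusable instead of inventing a default.
function normalizeVoiceMetadata(raw: RawVoicePayload): VoiceMetadata {
  const meta: VoiceMetadata = {};

  const hint = raw.asr_text?.trim();
  if (hint) meta.voice_text_hint = hint;

  const seconds = raw.playtime_seconds;
  if (typeof seconds === "number" && Number.isFinite(seconds) && seconds >= 0) {
    // The contract field is in milliseconds; this adapter reports seconds.
    meta.voice_duration_ms = Math.round(seconds * 1000);
  }

  const codec = raw.codec?.trim().toLowerCase();
  if (codec) meta.voice_source_format = codec;

  return meta;
}
```

Each adapter would own one such mapping, so downstream code only ever sees the normalized field names.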

That alone would already help with:

  • STT fallback
  • better prompt context for inbound voice turns
  • future optional voice-processing integrations
  • reducing adapter-specific hacks
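
To make the first two benefits concrete, here is a minimal sketch that assumes "STT fallback" means using the platform's transcript hint when Hermes' own speech-to-text step yields nothing. The names `resolveTranscript`, `describeVoiceTurn`, and `TranscriptSource` are hypothetical, as is the format of the prompt context string.

```typescript
interface VoiceMetadata {
  voice_text_hint?: string;
  voice_duration_ms?: number;
  voice_source_format?: string;
}

type TranscriptSource = "stt" | "platform_hint" | "none";

interface ResolvedTranscript {
  text: string;
  source: TranscriptSource;
}

// sttText is whatever the STT step produced, or null if it failed or was skipped.
function resolveTranscript(sttText: string | null, meta: VoiceMetadata): ResolvedTranscript {
  const stt = sttText?.trim();
  if (stt) return { text: stt, source: "stt" };

  // Fall back to the platform's built-in ASR text when one was provided.
  const hint = meta.voice_text_hint?.trim();
  if (hint) return { text: hint, source: "platform_hint" };

  return { text: "", source: "none" };
}

// Build a short context line that could accompany an inbound voice turn.
function describeVoiceTurn(meta: VoiceMetadata, transcript: ResolvedTranscript): string {
  const parts: string[] = ["inbound voice message"];
  if (meta.voice_duration_ms !== undefined) {
    parts.push(`duration ${(meta.voice_duration_ms / 1000).toFixed(1)}s`);
  }
  if (meta.voice_source_format) parts.push(`format ${meta.voice_source_format}`);
  parts.push(`transcript source: ${transcript.source}`);
  return `[${parts.join(", ")}]`;
}
```

Because both functions read only the normalized fields, neither needs to know which platform produced the message.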

What I am not proposing here

I am not proposing that Hermes should bundle a full SenseVoice / emotion-analysis / profile-persistence system.

Specifically out of scope for this discussion:

  • bundled emotion classification
  • user-state or profile persistence
  • JSONL ledgers
  • platform-specific mood heuristics
  • Codeksei-specific syncing or memory workflows

Those are product-layer decisions. I do not think they belong in Hermes core by default.

Possible extension point

If maintainers think this is useful, a follow-up could be an optional voice-analysis hook that consumes:

  • normalized inbound audio
  • normalized voice metadata

But I would still keep that separate from the metadata contract itself.
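
One possible shape for such a hook is sketched below, purely as an assumption about what a follow-up might look like. `VoiceAnalysisHook`, `VoiceAnalysisInput`, and `runVoiceHooks` are hypothetical names. The hook is synchronous here to keep the sketch short; a real one would likely be asynchronous.

```typescript
interface VoiceMetadata {
  voice_text_hint?: string;
  voice_duration_ms?: number;
  voice_source_format?: string;
}

// The two inputs named in the proposal: normalized audio and normalized metadata.
interface VoiceAnalysisInput {
  audio: Uint8Array;
  metadata: VoiceMetadata;
}

// Hypothetical hook shape; the result is an opaque bag owned by the hook.
interface VoiceAnalysisHook {
  name: string;
  analyze(input: VoiceAnalysisInput): Record<string, unknown>;
}

// Run every registered hook and collect results by hook name.
// A failing hook is skipped so it cannot break message handling.
function runVoiceHooks(
  hooks: VoiceAnalysisHook[],
  input: VoiceAnalysisInput,
): Record<string, Record<string, unknown>> {
  const results: Record<string, Record<string, unknown>> = {};
  for (const hook of hooks) {
    try {
      results[hook.name] = hook.analyze(input);
    } catch {
      // Intentionally ignored: analysis is optional by design in this sketch.
    }
  }
  return results;
}
```

The hook only consumes the contract and adds nothing to it, which matches the separation argued for above.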

My preference would be:

  1. standardize the metadata contract first
  2. only then decide whether Hermes wants any general analysis hook

Alternatives Considered

1. Keep everything adapter-specific

Lowest effort, but it does not scale well once multiple adapters expose different voice hints.

2. Bundle a full voice-analysis stack in core

I do not think this is the right first move. Too much policy, too much maintenance, too much product opinion.

3. Put everything in skills only

Skills are a good fit for higher-level workflows, but they still benefit from a stable normalized gateway contract underneath.

Feature Type

Gateway / messaging improvement

Scope

Medium (few files, < 300 lines)

Contribution

  • I'd like to implement this myself and submit a PR

Debug Report (optional)

N/A — design discussion only.

extent analysis

TL;DR

Introduce a small core contract for inbound voice-message metadata in Hermes to standardize the handling of adapter-specific metadata.

Guidance

  • Define a stable place for adapter-specific/normalized inbound metadata within the MessageEvent object.
  • Identify a minimal shared subset of metadata to normalize, such as voice_text_hint, voice_duration_ms, and voice_source_format.
  • Consider adding an optional voice-analysis hook that consumes normalized inbound audio and voice metadata, but keep it separate from the metadata contract.
  • Evaluate the proposed contract against the requirements of multiple adapters and downstream logic to ensure it meets the needs of various use cases.
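
One way to evaluate the contract against several adapters is a small conformance check run over each adapter's normalized output. The sketch below is illustrative only: `checkVoiceMetadata` is a hypothetical helper, and the rules it enforces (no empty hint, non-negative integer milliseconds) are assumptions, not requirements stated in the issue.

```typescript
interface VoiceMetadata {
  voice_text_hint?: string;
  voice_duration_ms?: number;
  voice_source_format?: string;
}

// Return a list of human-readable problems; an empty list means the
// metadata conforms to the assumed rules.
function checkVoiceMetadata(meta: VoiceMetadata): string[] {
  const problems: string[] = [];

  // Assumed rule: a missing hint is expressed by omitting the field.
  if (meta.voice_text_hint !== undefined && meta.voice_text_hint.trim() === "") {
    problems.push("voice_text_hint must be omitted rather than empty");
  }

  // Assumed rule: durations are whole, non-negative milliseconds.
  const duration = meta.voice_duration_ms;
  if (duration !== undefined && (!Number.isInteger(duration) || duration < 0)) {
    problems.push("voice_duration_ms must be a non-negative integer");
  }

  return problems;
}
```

Running the same check in each adapter's tests would show quickly whether one shape really fits all of them.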

Example

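// Proposed metadata contract. Fields are optional because not every
// adapter exposes every hint (the issue lists voice_source_format as "maybe").
interface VoiceMessageMetadata {
  voice_text_hint?: string;     // transcript hint / built-in ASR text
  voice_duration_ms?: number;   // duration in milliseconds
  voice_source_format?: string; // source format details
}

// Updated MessageEvent object with metadata contract
interface MessageEvent {
  // ... existing properties ...
  metadata?: VoiceMessageMetadata; // absent when no voice metadata is available
}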

Notes

The proposal covers only the metadata contract, which gives adapters and downstream logic one consistent shape to work with. Implementation details and possible extensions (e.g., a voice-analysis hook) still need discussion with the maintainers.

Recommendation

Introduce a small core contract for inbound voice-message metadata. This is a feature proposal rather than a workaround: it gives adapter-specific metadata one normalized shape, so downstream logic no longer has to special-case platforms.
