hermes - ✅(Solved) Fix feat: Include audio duration in Feishu voice message payload [1 pull request, 2 comments, 2 participants]

hermes, 2026-04-27 11:41:22

GitHub stats

NousResearch/hermes-agent#16524•Fetched 2026-04-28 06:52:48

Author

111-test-111

Participants

111-test-111

alt-glitch

Timeline (top)

labeled ×4, commented ×2, cross-referenced ×2, referenced ×1

Error Message

import subprocess, json

def _get_audio_duration_ms(file_path: str) -> int:
    """Extract audio duration in milliseconds using ffprobe."""
    try:
        result = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1", file_path],
            capture_output=True, text=True, timeout=10
        )
        return int(float(result.stdout.strip()) * 1000)
    except Exception:
        return 0

Root Cause

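The root cause is in _send_uploaded_file_message (feishu.py line 3892), which builds the audio payload as json.dumps({"file_key": file_key}) and so omits the duration field (in milliseconds) that the Feishu audio message API requires.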

Fix Action

Fixed

Fixed by PR: fix(feishu): include audio duration in voice message payload (#16524) (https://github.com/NousResearch/hermes-agent/pull/16736)

PR fix notes

PR #16736: fix(feishu): include audio duration in voice message payload (#16524)

Repository: NousResearch/hermes-agent
Author: Tranquil-Flow
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16736

Description (problem / solution / changelog)

What does this PR do?

Feishu's audio message API requires a positive duration field (in milliseconds) for uploaded audio to render as a playable voice bubble with a pre-play length indicator. Without it, the client falls back to a generic file attachment with a green music-note icon.

_send_uploaded_file_message in gateway/platforms/feishu.py was sending only {"file_key": ...} for the audio routing branch, so every voice message — TTS output, MEDIA:/path/to/audio.opus deliveries — landed as a plain file attachment.

This PR adds a best-effort duration probe that runs only on the audio branch, leaving image/video/document payloads unchanged.

Related Issue

Fixes #16524

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

gateway/platforms/feishu.py — new _get_audio_duration_ms() helper that probes mutagen (transitive dep via discord.py[voice]), then ffprobe on PATH, then a size-based heuristic, with a fixed minimum fallback so the field is always populated. Mirrors the mutagen+heuristic pattern already used for Discord native voice messages (gateway/platforms/discord.py:1396-1402).
gateway/platforms/feishu.py:_send_uploaded_file_message — only the audio routing branch now adds "duration" to the payload. Video, document, and image payloads are unchanged.
tests/gateway/test_feishu.py — updated test_send_voice_uploads_opus_and_sends_audio_message to assert a positive integer duration field; added test_send_video_does_not_inject_audio_duration_field regression for the video branch; added TestFeishuAudioDurationProbe covering real-file, missing-file, and size-heuristic monotonicity branches (the heuristic test forces both mutagen and ffprobe unavailable so the heuristic branch is actually exercised).
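
The probe order described above (mutagen, then ffprobe, then a size-based heuristic, then a fixed minimum) can be sketched roughly as follows. This is not the PR's code: the function name probe_audio_duration_ms, the MIN_DURATION_MS floor, and the ASSUMED_BYTES_PER_SECOND bitrate are hypothetical and chosen only for illustration.

```python
import os
import shutil
import subprocess

# Hypothetical constants; the values used in the PR are not shown on this page.
MIN_DURATION_MS = 1000            # assumed fixed minimum fallback
ASSUMED_BYTES_PER_SECOND = 4000   # assumed average bitrate of about 32 kbit/s


def probe_audio_duration_ms(file_path: str) -> int:
    """Best-effort audio duration in milliseconds; always a positive int."""
    # 1) mutagen, only if it is importable in this environment.
    try:
        import mutagen  # optional third-party dependency

        audio = mutagen.File(file_path)
        if audio is not None and audio.info is not None:
            ms = int(audio.info.length * 1000)
            if ms > 0:
                return ms
    except Exception:
        pass  # module missing, file unreadable, or format not recognised

    # 2) ffprobe, only if an executable is found on PATH.
    try:
        ffprobe = shutil.which("ffprobe")
        if ffprobe:
            result = subprocess.run(
                [ffprobe, "-v", "error", "-show_entries", "format=duration",
                 "-of", "default=noprint_wrappers=1:nokey=1", file_path],
                capture_output=True, text=True, timeout=10,
            )
            ms = int(float(result.stdout.strip()) * 1000)
            if ms > 0:
                return ms
    except Exception:
        pass  # empty or non-numeric output, timeout, or launch failure

    # 3) Size-based heuristic: larger files map to longer durations.
    try:
        size = os.path.getsize(file_path)
        return max(int(size / ASSUMED_BYTES_PER_SECOND * 1000), MIN_DURATION_MS)
    except OSError:
        pass  # file does not exist or cannot be stat'ed

    # 4) Fixed minimum, so the caller always gets a positive value.
    return MIN_DURATION_MS
```

Each probe is wrapped in its own try/except so that a missing optional dependency only skips that step. In this sketch the size heuristic is monotonic in file size and never drops below the floor.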

How to Test

Configure a Feishu adapter with TTS enabled (auxiliary.tts resolved to ElevenLabs / Azure / etc., or via text_to_speech tool).
Trigger a voice reply: ask a Feishu chat anything that produces audio output, or send MEDIA:/path/to/audio.opus.
Before this fix: message renders in Feishu as a generic file attachment with a music-note icon; clicking is required to see duration.
After this fix: message renders as a voice bubble with the duration displayed before play.

Automated:

pytest tests/gateway/test_feishu.py -q

Result on macOS 15.6.1 / Python 3.14.2: 156 passed, 37 skipped. The 37 skips are pre-existing (lark_oapi optional dep gating) and unaffected by this PR.

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform: macOS 15.6.1 (Python 3.14.2)

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A (N/A — internal helper, behaviour-only fix)
I've updated cli-config.yaml.example if I added/changed config keys — or N/A (N/A — no config keys touched)
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A (N/A)
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A (every probe is in a try/except; mutagen and ffprobe are both optional and the size-heuristic / fixed-minimum fallbacks always succeed; no platform-specific syscalls used)
I've updated tool descriptions/schemas if I changed tool behavior — or N/A (N/A — gateway adapter, not a tool)

Screenshots / Logs

$ pytest tests/gateway/test_feishu.py -q
.................................................................s..s.ss [ 37%]
.s..s....ssss.ssss.ss.s.sss....sssss.sssss......ss.ss..s......s....s.... [ 74%]
.................................................                        [100%]
156 passed, 37 skipped, 193 warnings in 3.32s

Changed files

gateway/platforms/feishu.py (modified, +65/-1)
tests/gateway/test_feishu.py (modified, +126/-1)

Code Example

payload = json.dumps({"file_key": file_key})

---

{"file_key": "xxx", "duration": 7854}

---

import subprocess, json

def _get_audio_duration_ms(file_path: str) -> int:
    """Extract audio duration in milliseconds using ffprobe."""
    try:
        result = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1", file_path],
            capture_output=True, text=True, timeout=10
        )
        return int(float(result.stdout.strip()) * 1000)
    except Exception:
        return 0

---

if resolved_message_type == "audio":
    duration_ms = _get_audio_duration_ms(file_path)
    payload = json.dumps({"file_key": file_key, "duration": duration_ms})
else:
    payload = json.dumps({"file_key": file_key})


Problem

When sending audio files via the Feishu gateway, the voice message is delivered as a file attachment instead of a playable voice bubble with duration displayed.

The root cause is in _send_uploaded_file_message (feishu.py line 3892):

payload = json.dumps({"file_key": file_key})

The Feishu audio message API requires a duration field (in milliseconds):

{"file_key": "xxx", "duration": 7854}

Without it, the client renders the audio as a generic file attachment (green music note icon) rather than a voice bubble with pre-play duration display.

Steps to Reproduce

Send a voice message from Hermes to a Feishu chat (e.g., via text_to_speech tool or MEDIA:/path/to/audio.opus)
The message arrives as a file attachment, not a voice bubble
Duration is only visible after clicking play

Proposed Fix

In _send_uploaded_file_message, when resolved_message_type == "audio", extract duration from the audio file using ffprobe (already available in the environment) and include it in the payload:

import subprocess, json

def _get_audio_duration_ms(file_path: str) -> int:
    """Extract audio duration in milliseconds using ffprobe."""
    try:
        result = subprocess.run(
            ["ffprobe", "-v", "error", "-show_entries", "format=duration",
             "-of", "default=noprint_wrappers=1:nokey=1", file_path],
            capture_output=True, text=True, timeout=10
        )
        return int(float(result.stdout.strip()) * 1000)
    except Exception:
        return 0

Then in the no-caption branch:

if resolved_message_type == "audio":
    duration_ms = _get_audio_duration_ms(file_path)
    payload = json.dumps({"file_key": file_key, "duration": duration_ms})
else:
    payload = json.dumps({"file_key": file_key})
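
The helper proposed above returns 0 when probing fails, while the PR notes on this page say Feishu expects a positive duration. A minimal sketch of a payload builder that guards against that case is shown below. The function name build_upload_payload and the choice to omit a non-positive duration are assumptions for illustration, not the adapter's actual code (the PR instead reports using a fixed minimum fallback so the field is always populated).

```python
import json


def build_upload_payload(message_type: str, file_key: str, duration_ms: int = 0) -> str:
    """Return the JSON content string for an uploaded-file message."""
    payload = {"file_key": file_key}
    # Only the audio branch carries a duration, and only when it is positive.
    if message_type == "audio" and duration_ms > 0:
        payload["duration"] = int(duration_ms)
    return json.dumps(payload)


# Usage: audio gets the extra field, other message types are unchanged.
audio_payload = build_upload_payload("audio", "xxx", 7854)
video_payload = build_upload_payload("video", "xxx", 7854)
```

Keeping the branch inside one small function makes it easy to assert that video, document, and image payloads never pick up the duration field.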

Environment

Hermes Agent: latest main
macOS Apple Silicon
Feishu bot API
Audio format: opus (also affects mp3, wav, etc.)

Additional Context

The caption branch (lines 3876-3887) uses a post message type with a media tag, which may handle duration differently, but the no-caption path (the common case for voice messages) definitely needs the duration field.

extent analysis

TL;DR

The most likely fix is to modify the _send_uploaded_file_message function to include the audio duration in the payload when sending audio files via the Feishu gateway.

Guidance

Extract the audio duration from the file using the provided _get_audio_duration_ms function and include it in the payload when resolved_message_type is "audio".
Update the payload construction to include the duration field, as shown in the proposed fix: payload = json.dumps({"file_key": file_key, "duration": duration_ms}).
Verify that the audio message is delivered as a playable voice bubble with duration displayed by sending a test voice message after applying the fix.
Ensure that the ffprobe command is available in the environment and functioning correctly to extract the audio duration.

Example

if resolved_message_type == "audio":
    duration_ms = _get_audio_duration_ms(file_path)
    payload = json.dumps({"file_key": file_key, "duration": duration_ms})

Notes

The proposed fix assumes that the ffprobe command is available and functioning correctly. If issues arise with audio duration extraction, verify that ffprobe is installed and configured properly.
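
One way to verify the ffprobe assumption before relying on it is a small availability check. This is a sketch only: the function name ffprobe_available is hypothetical, and it confirms just that an ffprobe executable on PATH starts and exits cleanly.

```python
import shutil
import subprocess


def ffprobe_available() -> bool:
    """Return True if an ffprobe executable is on PATH and runs successfully."""
    path = shutil.which("ffprobe")
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "-version"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0
```

If this returns False, the duration probe needs another source (or a fallback value), since the ffprobe-only helper would silently return 0.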

Recommendation

Apply the proposed workaround by modifying the _send_uploaded_file_message function to include the audio duration in the payload, as this directly addresses the identified root cause of the issue.

#api #prompt formatting #chain error #conversation history #tool integration
