hermes - ✅(Solved) Fix Inbound videos from all platforms (WeChat, etc.) are silently ignored — no transcription, no vision analysis [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#18204 · Fetched 2026-05-02 05:49:58
Comments: 0
Participants: 1
Timeline: 4
Reactions: 1
Timeline (top): labeled ×3, cross-referenced ×1

Root Cause

In gateway/run.py (around line 5093), the media processing loop only handles two types:

if event.media_urls:
    image_paths = []
    audio_paths = []
    for i, path in enumerate(event.media_urls):
        mtype = event.media_types[i] if i < len(event.media_types) else ""
        if mtype.startswith("image/") or event.message_type == MessageType.PHOTO:
            image_paths.append(path)
        if mtype.startswith("audio/") or event.message_type in (MessageType.VOICE, MessageType.AUDIO):
            audio_paths.append(path)
    # video/mp4 → NOT handled at all ❌

Fix Action

Fixed

PR fix notes

PR #18243: fix(gateway): process inbound video messages via ffmpeg extraction

Description (problem / solution / changelog)

Summary

Fixes #18204 — Inbound videos from all platforms (WeChat, Telegram, etc.) were silently ignored because the media processing loop in gateway/run.py only handled image/* and audio/* MIME types. video/* was completely skipped, resulting in empty prompts sent to the LLM API and HTTP 400 errors.

Changes

gateway/run.py

  • Media routing loop (~line 5113): Added video_paths collection alongside image_paths and audio_paths. Videos are detected by video/* MIME type or MessageType.VIDEO.
  • _extract_video_components(): New async method that uses ffmpeg to extract:
    • Audio track → WAV (16kHz mono PCM) for STT transcription
    • Up to 3 keyframes → JPEG for vision analysis (I-frame extraction first, falls back to fps=1/10 sampling)
    • Handles missing ffmpeg, timeouts, and errors gracefully
  • _enrich_message_with_video(): New async method that orchestrates video processing — extracts components, delegates to existing _enrich_message_with_transcription and _enrich_message_with_vision, and provides a fallback note when ffmpeg is unavailable. Cleans up temp files after processing.
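
The extraction step described above could look roughly like the sketch below. It is not the code from PR #18243: the helper names (`audio_cmd`, `keyframe_cmd`, `run_ffmpeg`), the 60-second timeout, and the output paths are assumptions made for illustration. The ffmpeg options themselves are standard ones (`-vn` drops the video stream, `pcm_s16le` at 16 kHz mono produces WAV suitable for STT, and the `select` filter keeps I-frames).

```python
import asyncio
import shutil
from typing import List, Optional


def audio_cmd(video: str, wav_out: str) -> List[str]:
    """ffmpeg arguments that write the audio track as 16 kHz mono PCM WAV."""
    return ["ffmpeg", "-y", "-i", video, "-vn",
            "-acodec", "pcm_s16le", "-ar", "16000", "-ac", "1", wav_out]


def keyframe_cmd(video: str, pattern: str, max_frames: int = 3,
                 iframes_only: bool = True) -> List[str]:
    """ffmpeg arguments that write up to max_frames JPEG files.

    iframes_only=True selects I-frames; False samples one frame every 10 s.
    """
    vf = "select='eq(pict_type,I)'" if iframes_only else "fps=1/10"
    cmd = ["ffmpeg", "-y", "-i", video, "-vf", vf]
    if iframes_only:
        cmd += ["-vsync", "vfr"]  # newer ffmpeg builds prefer -fps_mode vfr
    return cmd + ["-frames:v", str(max_frames), pattern]


async def run_ffmpeg(cmd: List[str], timeout: float = 60.0) -> Optional[bool]:
    """Return None if the binary is not installed, else True/False for success."""
    if shutil.which(cmd[0]) is None:
        return None
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.DEVNULL,
        stdout=asyncio.subprocess.DEVNULL,
        stderr=asyncio.subprocess.DEVNULL,
    )
    try:
        await asyncio.wait_for(proc.wait(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()  # do not leave a hung ffmpeg process behind
        await proc.wait()
        return False
    return proc.returncode == 0
```

A caller would try `keyframe_cmd(..., iframes_only=True)` first and retry with `iframes_only=False` if no frames were written, mirroring the fallback order the PR describes.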

tests/gateway/test_video_media_processing.py (new)

7 tests covering:

  • MIME type routing (video/mp4)
  • MessageType.VIDEO routing
  • Mixed media routing (image + video + audio)
  • Graceful handling when ffmpeg is missing
  • Timeout handling
  • Fallback note when extraction fails
  • Audio-only enrichment path
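
As an illustration of the routing cases in this list, the sketch below reimplements the loop as a standalone function and checks it with plain assertions. The `MessageType` enum and the `route_media` function are stand-ins written for this example. They are not the project's actual classes or the tests shipped in the PR.

```python
from enum import Enum
from typing import List, Tuple


class MessageType(Enum):  # stand-in for the gateway's enum (assumption)
    TEXT = "text"
    PHOTO = "photo"
    VOICE = "voice"
    AUDIO = "audio"
    VIDEO = "video"


def route_media(media_urls: List[str], media_types: List[str],
                message_type: MessageType) -> Tuple[List[str], List[str], List[str]]:
    """Split media paths into (image_paths, audio_paths, video_paths)."""
    image_paths, audio_paths, video_paths = [], [], []
    for i, path in enumerate(media_urls):
        # A missing MIME type is treated as an empty string, as in the original loop.
        mtype = media_types[i] if i < len(media_types) else ""
        if mtype.startswith("image/") or message_type == MessageType.PHOTO:
            image_paths.append(path)
        if mtype.startswith("audio/") or message_type in (MessageType.VOICE, MessageType.AUDIO):
            audio_paths.append(path)
        if mtype.startswith("video/") or message_type == MessageType.VIDEO:
            video_paths.append(path)
    return image_paths, audio_paths, video_paths


# MIME type routing
assert route_media(["a.mp4"], ["video/mp4"], MessageType.TEXT) == ([], [], ["a.mp4"])
# MessageType.VIDEO routing when no MIME type was recorded
assert route_media(["a.bin"], [], MessageType.VIDEO) == ([], [], ["a.bin"])
# Mixed media routing
assert route_media(
    ["i.png", "v.mp4", "s.ogg"],
    ["image/png", "video/mp4", "audio/ogg"],
    MessageType.TEXT,
) == (["i.png"], ["s.ogg"], ["v.mp4"])
```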

Testing

  • All 7 new tests pass
  • All 3836 existing gateway tests pass (1 pre-existing failure in test_teams.py unrelated to this change)

Notes

  • ffmpeg is an optional runtime dependency — videos degrade gracefully to a text note if ffmpeg is not installed
  • Temp files are cleaned up in a finally-like pattern via shutil.rmtree
  • No changes to platform adapters needed — they already download videos correctly
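
One way to get the cleanup and graceful-degradation behaviour described in these notes is sketched below. The function `describe_video`, its `extract` callback, and the wording of the fallback note are hypothetical. Only `shutil.which`, `tempfile.mkdtemp`, and `shutil.rmtree` are real standard-library calls.

```python
import shutil
import tempfile
from typing import Callable

# Hypothetical wording; the note used by the gateway may differ.
FALLBACK_NOTE = "[The user sent a video, but it could not be processed.]"


def describe_video(video_path: str, extract: Callable[[str, str], str],
                   ffmpeg_bin: str = "ffmpeg") -> str:
    """Run extract(video_path, workdir) in a temp dir that is always removed."""
    if shutil.which(ffmpeg_bin) is None:
        return FALLBACK_NOTE  # degrade to a text note instead of failing
    workdir = tempfile.mkdtemp(prefix="video_")
    try:
        return extract(video_path, workdir)
    except Exception:
        return FALLBACK_NOTE  # extraction errors also degrade to the note
    finally:
        # Runs on success, error, and early return alike.
        shutil.rmtree(workdir, ignore_errors=True)
```

Putting the `rmtree` call in `finally` means extracted WAV and JPEG files do not accumulate even when extraction raises.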

Changed files

  • gateway/run.py (modified, +160/-0)
  • tests/gateway/test_video_media_processing.py (added, +165/-0)

Bug Description

When a user sends a video message through any platform (WeChat, Telegram, etc.), the gateway downloads the video file successfully but then silently ignores it in the message processing pipeline. The video is not transcribed (audio extraction) and not analyzed (vision), resulting in an empty prompt being sent to the LLM API, which returns HTTP 400.

Steps to Reproduce

  1. Send a video message via WeChat (or any platform)
  2. Observe gateway log: inbound ... media=1 (video detected and downloaded)
  3. The video file is cached locally (e.g., cache/videos/)
  4. But in gateway/run.py, only image/* and audio/* media types are processed
  5. video/* is completely ignored → empty prompt → HTTP 400: "The prompt parameter was not received normally"

Root Cause

In gateway/run.py (around line 5093), the media processing loop only handles two types:

if event.media_urls:
    image_paths = []
    audio_paths = []
    for i, path in enumerate(event.media_urls):
        mtype = event.media_types[i] if i < len(event.media_types) else ""
        if mtype.startswith("image/") or event.message_type == MessageType.PHOTO:
            image_paths.append(path)
        if mtype.startswith("audio/") or event.message_type in (MessageType.VOICE, MessageType.AUDIO):
            audio_paths.append(path)
    # video/mp4 → NOT handled at all ❌

Expected Behavior

Inbound videos should be processed similarly to how images and audio are handled:

  1. Extract audio track via ffmpeg → run through STT (whisper) → append transcription to prompt
  2. Extract key frames via ffmpeg → run through vision analysis → append visual description to prompt
  3. This way the model receives meaningful content instead of an empty prompt
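
To make the expected outcome concrete, the sketch below shows one way a transcription and frame descriptions might be appended to the user's text so that the prompt is never empty. `build_video_prompt` and the bracketed labels are hypothetical and are not taken from the gateway code.

```python
from typing import List, Optional


def build_video_prompt(user_text: str, transcript: Optional[str],
                       frame_descriptions: List[str]) -> str:
    """Combine the user's text with whatever could be extracted from the video."""
    parts = [user_text.strip()] if user_text.strip() else []
    if transcript:
        parts.append(f"[Video audio transcription]: {transcript}")
    for n, desc in enumerate(frame_descriptions, start=1):
        parts.append(f"[Video frame {n}]: {desc}")
    if not transcript and not frame_descriptions:
        # Nothing could be extracted: still tell the model a video arrived.
        parts.append("[The user sent a video that could not be analyzed.]")
    return "\n".join(parts)
```

Even when both extraction paths fail, the function returns a non-empty string, which avoids the empty-prompt condition behind the HTTP 400 reported in this issue.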

Environment

  • Hermes Agent v0.12.0 (2026.4.30)
  • Platform: WeChat (Weixin), but affects all platforms
  • Python 3.11.15

Related

  • Telegram video caching was added in commit 9fdfb09ae (platform adapter level), but the gateway run.py processing is still missing
  • WeChat adapter already downloads videos successfully (_download_video in weixin.py)

extent analysis

TL;DR

The issue can be fixed by modifying gateway/run.py so that the media processing loop routes video/* media types and processes them the same way as images and audio.

Guidance

  • Modify the media processing loop in gateway/run.py to handle videos by adding a condition that checks mtype.startswith("video/") or MessageType.VIDEO.
  • Extract the audio track from the video using ffmpeg and run it through STT (whisper) to append the transcription to the prompt.
  • Extract key frames from the video using ffmpeg and run them through vision analysis to append a visual description to the prompt.
  • Verify the fix by sending a video message and checking the gateway log for successful processing and the LLM API response.

Example

if event.media_urls:
    image_paths = []
    audio_paths = []
    video_paths = []  # Add a list to store video paths
    for i, path in enumerate(event.media_urls):
        mtype = event.media_types[i] if i < len(event.media_types) else ""
        if mtype.startswith("image/") or event.message_type == MessageType.PHOTO:
            image_paths.append(path)
        if mtype.startswith("audio/") or event.message_type in (MessageType.VOICE, MessageType.AUDIO):
            audio_paths.append(path)
        if mtype.startswith("video/") or event.message_type == MessageType.VIDEO:  # Route videos by MIME type or message type
            video_paths.append(path)
    # Process video paths to extract audio and key frames

Notes

The example above assumes that ffmpeg is installed and that the STT (whisper) and vision analysis components are configured to accept the extracted audio and key frames. The merged fix (PR #18243) treats ffmpeg as an optional runtime dependency and falls back to a text note when it is missing.

Recommendation

Modify gateway/run.py to handle video/* media types, as PR #18243 does, so that the gateway processes videos and sends meaningful content to the LLM API instead of an empty prompt.
