hermes - [Bug]: `video_analyze` silently fails with Gemini vision provider: `video_url` content block dropped by `_extract_multimodal_parts`

Root Cause

In agent/gemini_native_adapter.py, _extract_multimodal_parts() (line 177) translates OpenAI-format content blocks into Gemini-native parts. It handles type: "text" and type: "image_url" correctly (converting the latter to inlineData) but has no branch for type: "video_url", so video content blocks are silently dropped. Gemini receives only the text prompt and responds by asking the user to provide a video.
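
The drop can be illustrated with a simplified stand-in for the translation step. This is a sketch, not the Hermes source: the function name extract_parts_without_video is hypothetical, and the text and image handling are reduced to the minimum needed to show a video_url block falling through without an error.

```python
import base64


def extract_parts_without_video(content):
    """Simplified stand-in for the pre-fix translation (not the Hermes source)."""
    parts = []
    for item in content:
        ptype = item.get("type")
        if ptype == "text":
            parts.append({"text": item.get("text", "")})
        elif ptype == "image_url":
            url = (item.get("image_url") or {}).get("url") or ""
            if not url.startswith("data:"):
                continue
            header, encoded = url.split(",", 1)
            mime = header.split(":", 1)[1].split(";", 1)[0]
            parts.append({"inlineData": {"mimeType": mime, "data": encoded}})
        # No branch for "video_url": the block matches nothing and is dropped.
    return parts


video_b64 = base64.b64encode(b"fake video bytes").decode("ascii")
content = [
    {"type": "text", "text": "Describe this video"},
    {"type": "video_url", "video_url": {"url": f"data:video/mp4;base64,{video_b64}"}},
]
parts = extract_parts_without_video(content)
print(parts)  # [{'text': 'Describe this video'}]
```

Only the text part survives, which matches the text-only request that Gemini ends up answering.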

Bug Description

video_analyze always returns a generic "you haven't provided a video" response when the vision provider is Gemini, regardless of whether the video file is valid, correctly formatted, and within size limits. No error is raised — the tool reports success: true, making the failure completely invisible to users.

Call chain:
video_analyze_tool (vision_tools.py:996)
  → async_call_llm(task="vision")
    → resolve_vision_provider_client() → GeminiNativeClient
      → build_gemini_request()
        → _build_gemini_contents()
          → _extract_multimodal_parts()   ← video_url silently dropped
            → Gemini API receives text-only → "you haven't provided a video"

Steps to Reproduce

1. Configure Gemini as the vision provider:

    # ~/.hermes/config.yaml
    auxiliary:
      vision:
        provider: gemini
        model: gemini-3.1-flash-lite

2. Create a small valid test video (any MP4 <1 MB works):

    ffmpeg -f lavfi -i testsrc=duration=3:size=320x240:rate=10 -f lavfi -i sine=frequency=440:duration=3 -shortest -c:v libx264 test.mp4

3. Run video_analyze on it:

    video_analyze test.mp4 with question "Describe this video"

4. Observe the response.

Expected Behavior

video_analyze should send the video to Gemini as an inlineData part (format: {"inlineData": {"mimeType": "video/mp4", "data": "<base64>"}}), which the Gemini API natively supports, and return an analysis of the actual video content.
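
As a rough illustration of the two shapes involved, the sketch below builds the OpenAI-style video_url block and the inlineData part it is expected to become, both from the same bytes. The helper names make_video_block and expected_inline_part are hypothetical and exist only for this example; the block layout follows the fields read by the proposed fix, and the part layout follows the format quoted above.

```python
import base64


def make_video_block(raw: bytes, mime: str = "video/mp4") -> dict:
    """OpenAI-style content block carrying the video as a base64 data URL."""
    encoded = base64.b64encode(raw).decode("ascii")
    return {"type": "video_url", "video_url": {"url": f"data:{mime};base64,{encoded}"}}


def expected_inline_part(raw: bytes, mime: str = "video/mp4") -> dict:
    """Gemini-native part that the translation is expected to emit."""
    encoded = base64.b64encode(raw).decode("ascii")
    return {"inlineData": {"mimeType": mime, "data": encoded}}


raw = b"\x00\x01video"  # placeholder bytes, not a real MP4
block = make_video_block(raw)
part = expected_inline_part(raw)
print(block["video_url"]["url"])
print(part["inlineData"]["mimeType"])
```

The payload after the comma in the data URL is the same base64 string that belongs in inlineData.data, so the translation only needs to split off the header and carry the MIME type across.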

Actual Behavior

The tool returns {"success": true} with a response like:

Since you haven't provided a specific video file or link, I can't watch it and describe it. Please provide a video link (e.g., YouTube) or upload a video file.

This happens with local files, YouTube URLs (which video_analyze_tool downloads), and files as small as 119 KB. The video data is silently discarded during the OpenAI-to-Gemini message translation, so the API never sees it and no error surfaces.

Affected Component

Tools (terminal, file ops, web, code execution, etc.)

Messaging Platform (if gateway-related)

No response

Debug Report

Operating System

Ubuntu 24.04 WSL2

Python Version

No response

Hermes Version

No response

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

Fix (included for reference)

Add a video_url branch in _extract_multimodal_parts() mirroring the existing image_url logic:

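        elif ptype == "video_url":
            # Only inline data: URLs are handled; any other URL is skipped.
            url = ((item.get("video_url") or {}).get("url") or "")
            if not isinstance(url, str) or not url.startswith("data:"):
                continue
            try:
                # "data:video/mp4;base64,<payload>" -> MIME type and payload
                header, encoded = url.split(",", 1)
                mime = header.split(":", 1)[1].split(";", 1)[0]
                raw = base64.b64decode(encoded)
            except Exception:
                # Malformed data URL or invalid base64: skip this block.
                continue
            # Re-encode the decoded bytes so the payload is normalized base64.
            parts.append(
                {
                    "inlineData": {
                        "mimeType": mime,
                        "data": base64.b64encode(raw).decode("ascii"),
                    }
                }
            )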
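
To exercise this logic outside the adapter, the branch can be lifted into a standalone function. The sketch below is an assumption-laden illustration rather than the patch itself: the name video_block_to_part is hypothetical, it returns None where the branch would `continue`, and it narrows the exception handling to ValueError and IndexError, which cover a missing comma and invalid base64 (binascii.Error is a subclass of ValueError).

```python
import base64


def video_block_to_part(item):
    """Return a Gemini inlineData part for a video_url block, or None if unusable."""
    url = (item.get("video_url") or {}).get("url") or ""
    if not isinstance(url, str) or not url.startswith("data:"):
        return None  # remote URLs and non-string values are not converted
    try:
        header, encoded = url.split(",", 1)
        mime = header.split(":", 1)[1].split(";", 1)[0]
        raw = base64.b64decode(encoded)
    except (ValueError, IndexError):
        return None  # malformed data URL or invalid base64
    return {
        "inlineData": {
            "mimeType": mime,
            "data": base64.b64encode(raw).decode("ascii"),
        }
    }


payload = base64.b64encode(b"abc").decode("ascii")
good = video_block_to_part(
    {"type": "video_url", "video_url": {"url": f"data:video/mp4;base64,{payload}"}}
)
remote = video_block_to_part(
    {"type": "video_url", "video_url": {"url": "https://example.com/clip.mp4"}}
)
no_comma = video_block_to_part(
    {"type": "video_url", "video_url": {"url": "data:video/mp4;base64"}}
)
print(good, remote, no_comma)
```

One consequence worth noting: a block whose URL is not a data: URL is still skipped without any error, so the fix relies on the caller inlining the video before the request is built.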

Environment

Commit: 839cdd1b0 (2026-05-08)
File: agent/gemini_native_adapter.py, function _extract_multimodal_parts (line 177)
Affected: all Gemini vision provider users, all video sizes and formats

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR
