hermes - 💡(How to fix) Fix [Bug]: `video_analyze` silently fails with Gemini vision provider — `video_url` content block dropped by `_extract_multimodal_parts`

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

elif ptype == "video_url": url = ((item.get("video_url") or {}).get("url") or "") if not isinstance(url, str) or not url.startswith("data:"): continue try: header, encoded = url.split(",", 1) mime = header.split(":", 1)[1].split(";", 1)[0] raw = base64.b64decode(encoded) except Exception: continue parts.append( { "inlineData": { "mimeType": mime, "data": base64.b64encode(raw).decode("ascii"), } } )

Root Cause

Root cause: agent/gemini_native_adapter.py_extract_multimodal_parts() (line 177) translates OpenAI-format content blocks into Gemini-native parts. It correctly handles type: "text" and type: "image_url" (converting the latter to inlineData), but has no branch for type: "video_url". Video content blocks are silently dropped. Gemini receives only the text prompt and responds by asking the user to provide a video.

Code Example

Call chain:
video_analyze_tool (vision_tools.py:996)
async_call_llm(task="vision")
resolve_vision_provider_client()GeminiNativeClient
build_gemini_request()
_build_gemini_contents()
_extract_multimodal_parts()   ← video_url silently dropped
Gemini API receives text-only → "you haven't provided a video"

---

# ~/.hermes/config.yaml
   auxiliary:
     vision:
       provider: gemini
       model: gemini-3.1-flash-lite

---

ffmpeg -f lavfi -i testsrc=duration=3:size=320x240:rate=10 -f lavfi -i sine=frequency=440:duration=3 -shortest -c:v libx264 test.mp4

---

video_analyze test.mp4 with question "Describe this video"

---

.

---



---

elif ptype == "video_url":
            url = ((item.get("video_url") or {}).get("url") or "")
            if not isinstance(url, str) or not url.startswith("data:"):
                continue
            try:
                header, encoded = url.split(",", 1)
                mime = header.split(":", 1)[1].split(";", 1)[0]
                raw = base64.b64decode(encoded)
            except Exception:
                continue
            parts.append(
                {
                    "inlineData": {
                        "mimeType": mime,
                        "data": base64.b64encode(raw).decode("ascii"),
                    }
                }
            )
RAW_BUFFERClick to expand / collapse

Bug Description

video_analyze always returns a generic "you haven't provided a video" response when the vision provider is Gemini, regardless of whether the video file is valid, correctly formatted, and within size limits. No error is raised — the tool reports success: true, making the failure completely invisible to users.

Root cause: agent/gemini_native_adapter.py_extract_multimodal_parts() (line 177) translates OpenAI-format content blocks into Gemini-native parts. It correctly handles type: "text" and type: "image_url" (converting the latter to inlineData), but has no branch for type: "video_url". Video content blocks are silently dropped. Gemini receives only the text prompt and responds by asking the user to provide a video.

Call chain:
video_analyze_tool (vision_tools.py:996)
  → async_call_llm(task="vision")
    → resolve_vision_provider_client() → GeminiNativeClient
      → build_gemini_request()
        → _build_gemini_contents()
          → _extract_multimodal_parts()   ← video_url silently dropped
            → Gemini API receives text-only → "you haven't provided a video"

Steps to Reproduce

  1. Configure Gemini as the vision provider:

    # ~/.hermes/config.yaml
    auxiliary:
      vision:
        provider: gemini
        model: gemini-3.1-flash-lite
  2. Create a small valid test video (any MP4 <1 MB works):

    ffmpeg -f lavfi -i testsrc=duration=3:size=320x240:rate=10 -f lavfi -i sine=frequency=440:duration=3 -shortest -c:v libx264 test.mp4
  3. Run video_analyze on it:

    video_analyze test.mp4 with question "Describe this video"
  4. Observe the response.

Expected Behavior

video_analyze should send the video to Gemini as an inlineData part (format: {"inlineData": {"mimeType": "video/mp4", "data": "<base64>"}}), which the Gemini API natively supports, and return an analysis of the actual video content.

Actual Behavior

The tool returns {"success": true} with a response like:

Since you haven't provided a specific video file or link, I can't watch it and describe it. Please provide a video link (e.g., YouTube) or upload a video file.

This happens with local files, YouTube URLs (which video_analyze_tool downloads), and files as small as 119 KB. The video data is silently discarded during the OpenAI-to-Gemini message translation, so the API never sees it and no error surfaces.

Affected Component

Tools (terminal, file ops, web, code execution, etc.)

Messaging Platform (if gateway-related)

No response

Debug Report

.

Operating System

Ubuntu 24.04 WSL2

Python Version

No response

Hermes Version

No response

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

Fix (included for reference)

Add a video_url branch in _extract_multimodal_parts() mirroring the existing image_url logic:

        elif ptype == "video_url":
            url = ((item.get("video_url") or {}).get("url") or "")
            if not isinstance(url, str) or not url.startswith("data:"):
                continue
            try:
                header, encoded = url.split(",", 1)
                mime = header.split(":", 1)[1].split(";", 1)[0]
                raw = base64.b64decode(encoded)
            except Exception:
                continue
            parts.append(
                {
                    "inlineData": {
                        "mimeType": mime,
                        "data": base64.b64encode(raw).decode("ascii"),
                    }
                }
            )

Environment

  • Commit: 839cdd1b0 (2026-05-08)
  • File: agent/gemini_native_adapter.py, function _extract_multimodal_parts (line 177)
  • Affected: all Gemini vision provider users, all video sizes and formats

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Bug]: `video_analyze` silently fails with Gemini vision provider — `video_url` content block dropped by `_extract_multimodal_parts`