gemini-cli: Support videoMetadata fps/startOffset/endOffset for local video inputs


What would you like to be added?

I would like Gemini CLI to expose Gemini API videoMetadata options for local video inputs handled through read_file, read_many_files, or the @file workflow.

Specifically, it would be useful to configure:

  • fps
  • startOffset
  • endOffset

Gemini CLI can already read local video files such as .mp4 / .mov as multimodal input, but there does not seem to be a user-facing way to attach video metadata to those video parts.

The Gemini API supports this kind of request metadata. For example, in Python:

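from google.genai import types

# `video` is assumed to be a file already uploaded through the File API.
video_part = types.Part(
    file_data=types.FileData(
        file_uri=video.uri,
        mime_type=video.mime_type,
    ),
    # Sampling rate and clip boundaries are sent as request metadata.
    video_metadata=types.VideoMetadata(
        fps=24,
        start_offset="1s",
        end_offset="4s",
    ),
)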

Possible CLI UX options could include one or more of the following.

CLI flags:

gemini --video-fps 24 --video-start 1s --video-end 4s -p "@clip.mp4 analyze this video"
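To illustrate how such flags could map onto the API fields, here is a minimal Python sketch. The flag names are the hypothetical ones from the example above and do not exist in Gemini CLI today. The offset format check (`1s`, `1.5s`) and the helper names are assumptions made for illustration only.

```python
import argparse
import re

# Hypothetical flags from the example above; Gemini CLI does not define them today.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="gemini-sketch")
    parser.add_argument("--video-fps", type=float, default=None)
    parser.add_argument("--video-start", default=None)
    parser.add_argument("--video-end", default=None)
    parser.add_argument("-p", "--prompt", default="")
    return parser

# Assumed offset format: seconds with an "s" suffix, e.g. "1s" or "1.5s".
_OFFSET = re.compile(r"^\d+(\.\d+)?s$")

def video_metadata_from_args(args: argparse.Namespace) -> dict:
    """Collect only the flags the user set, keyed by the API's camelCase names."""
    metadata = {}
    if args.video_fps is not None:
        if args.video_fps <= 0:
            raise ValueError("--video-fps must be positive")
        metadata["fps"] = args.video_fps
    for attr, key in (("video_start", "startOffset"), ("video_end", "endOffset")):
        value = getattr(args, attr)
        if value is not None:
            if not _OFFSET.match(value):
                raise ValueError(f"--{attr.replace('_', '-')} must look like '1s' or '1.5s'")
            metadata[key] = value
    return metadata

args = build_parser().parse_args(
    ["--video-fps", "24", "--video-start", "1s", "--video-end", "4s",
     "-p", "@clip.mp4 analyze this video"]
)
print(video_metadata_from_args(args))
```

Flags that are not passed produce no keys at all, so a run without any video flags yields an empty mapping and the request would stay unchanged.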

Config default:

{
  "media": {
    "video": {
      "fps": 24
    }
  }
}
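A config default would presumably need to combine with per-invocation values. The following Python sketch shows one possible precedence rule, where explicit overrides win over the settings file. The `media.video` shape is the hypothetical one from the example above, and `resolve_video_metadata` is an illustrative name, not an existing Gemini CLI function.

```python
import json

# Hypothetical settings shape copied from the example above.
SETTINGS_JSON = '{"media": {"video": {"fps": 24}}}'

ALLOWED_KEYS = ("fps", "startOffset", "endOffset")

def resolve_video_metadata(settings: dict, overrides: dict) -> dict:
    """Merge config defaults with per-invocation overrides; overrides win."""
    defaults = settings.get("media", {}).get("video", {})
    explicit = {k: v for k, v in overrides.items() if v is not None}
    merged = {**defaults, **explicit}
    # Drop anything that is not a videoMetadata field.
    return {k: v for k, v in merged.items() if k in ALLOWED_KEYS}

settings = json.loads(SETTINGS_JSON)
print(resolve_video_metadata(settings, {"startOffset": "1s", "endOffset": "4s"}))
print(resolve_video_metadata(settings, {"fps": 5}))
```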

Optional parameters on the file-reading tools would also work.

I am not attached to a specific UX. The main request is to make it possible for users to control the Gemini API videoMetadata fields when using local video files in Gemini CLI.

Why is this needed?

Some video analysis tasks are motion-sensitive, and the default sampling behavior can miss important events.

This matters for common workflows such as:

  • screen recording analysis
  • animation review
  • UI interaction recordings
  • short bug reproduction videos
  • tutorial or onboarding flow review
  • fast visual state changes
  • timestamp-specific video inspection

For example, the important moment in a video may happen within a few hundred milliseconds:

  • a drag starts
  • an object is released
  • a UI state changes
  • an error flashes briefly
  • a transition timing issue appears
  • a short visual event happens between sampled frames

If the video is sampled too sparsely, the model may miss the key event and produce a misleading analysis.
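The effect of sparse sampling can be made concrete with a little arithmetic. The sketch below assumes frames are taken at a uniform rate starting at t = 0, which is a simplification and not a statement about how the API actually selects frames. It counts how many sampled frames land inside a 300 ms event at a sparse rate (1 FPS) and at 24 FPS.

```python
import math

def sample_times(duration_s: float, fps: float) -> list[float]:
    """Timestamps of frames taken at a uniform rate, starting at t = 0 (assumed model)."""
    count = math.floor(duration_s * fps) + 1
    return [i / fps for i in range(count)]

def frames_inside(event_start_s: float, event_end_s: float,
                  duration_s: float, fps: float) -> int:
    """Number of sampled frames whose timestamp falls within the event window."""
    return sum(event_start_s <= t <= event_end_s
               for t in sample_times(duration_s, fps))

# Illustrative case: an error banner visible from 2.2s to 2.5s in a 5s clip.
for fps in (1, 24):
    print(f"{fps} FPS -> {frames_inside(2.2, 2.5, 5.0, fps)} frame(s) show the event")
```

Under this model no frame at 1 FPS falls inside the window, while 24 FPS captures several, which is the gap the requested `fps` control is meant to close.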

Prompting alone is not enough. A user can write “please analyze this at 24 FPS from 1s to 4s,” but that does not guarantee the API request actually uses those video sampling parameters. For these workflows, fps, startOffset, and endOffset need to be request metadata, not just natural-language instructions.

The Gemini API already supports custom video metadata, so exposing this in Gemini CLI would make local video analysis much more reliable for motion-sensitive use cases.

Additional context

Related issues that are adjacent to this request but not identical:

  • #3379: Agent thinks it cannot read videos, even though these are supported filetypes
  • #1691: Add youtube video link & manual upload a video
  • #16741: API Error 400 when tool read_file sends binary audio/video data

Those issues are about whether video input works or how binary media is passed through the CLI. This request is narrower: local video input exists, but users need control over video sampling quality and clip boundaries.

Relevant Gemini API documentation:

Potential implementation direction:

When Gemini CLI detects a video MIME type, it could attach optional videoMetadata to the generated video part:

{
  inlineData: {
    data: base64Data,
    mimeType,
  },
  videoMetadata: {
    fps,
    startOffset,
    endOffset,
  },
}

If video files are moved to File API uploads instead of inline data, the same metadata could be attached to the fileData part.

Backward compatibility can be preserved by omitting videoMetadata unless the user configures it.
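The direction above can be sketched in Python as a plain dictionary builder. It mirrors the field names used in this issue (`inlineData`, `fileData`, `videoMetadata`). `build_video_part` and its parameters are hypothetical and are not part of Gemini CLI. The key point is that `videoMetadata` is omitted entirely unless something was configured.

```python
import base64
from typing import Optional

def build_video_part(
    mime_type: str,
    data: Optional[bytes] = None,
    file_uri: Optional[str] = None,
    video_metadata: Optional[dict] = None,
) -> dict:
    """Build a request part for a video, attaching videoMetadata only when configured."""
    if not mime_type.startswith("video/"):
        raise ValueError("videoMetadata only applies to video MIME types")
    if file_uri is not None:
        # File API upload path: reference the uploaded file.
        part = {"fileData": {"fileUri": file_uri, "mimeType": mime_type}}
    elif data is not None:
        # Inline path: base64-encode the raw bytes.
        encoded = base64.b64encode(data).decode("ascii")
        part = {"inlineData": {"data": encoded, "mimeType": mime_type}}
    else:
        raise ValueError("either data or file_uri is required")
    # Keep only fields the user actually set, so the default request is unchanged.
    metadata = {k: v for k, v in (video_metadata or {}).items() if v is not None}
    if metadata:
        part["videoMetadata"] = metadata
    return part

print(build_video_part("video/mp4", data=b"\x00\x01"))
print(build_video_part("video/mp4", data=b"\x00\x01",
                       video_metadata={"fps": 24, "startOffset": "1s", "endOffset": "4s"}))
```

Because the unconfigured case returns exactly the part that would be sent today, existing behavior would not change for users who never set these options.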
