transformers - 💡(How to fix) Fix Sam3Video: CUDA out of memory [3 comments, 2 participants]

transformers • 2026-03-12 03:29:05


GitHub stats

huggingface/transformers#44617 • Fetched 2026-04-08 00:27:24

Author: middleknight

Participants: middleknight, Saad-Mallebhari

Timeline (top): commented ×3, labeled ×1, mentioned ×1, subscribed ×1


System Info

transformers 5.3.0, Python 3.10.12, torch 2.4.0+cu124

Tracking many targets at once (typically dozens) runs out of CUDA memory.

Who can help?

No response

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

import inspect

import torch
from transformers import Sam3VideoConfig, Sam3VideoModel, Sam3VideoProcessor
from transformers.video_utils import load_video


def print_memory(message=""):
    if not torch.cuda.is_available():
        print(f"[Line {inspect.currentframe().f_back.f_lineno}] {message} - CUDA not available")
        return

    caller_frame = inspect.currentframe().f_back
    line_no = caller_frame.f_lineno

    allocated = torch.cuda.memory_allocated() / 1024**2  # MB
    reserved = torch.cuda.memory_reserved() / 1024**2    # MB
    max_allocated = torch.cuda.max_memory_allocated() / 1024**2

    print(f"[Line {line_no:4d}] {message} - Allocated: {allocated}")


if __name__ == "__main__":
    sam_path = "./sam3/"
    video_path = "./reverse.mp4"
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Note: this config is never passed to from_pretrained below, so image_size has no effect
    config = Sam3VideoConfig.from_pretrained(sam_path)
    config.image_size = 1008
    model = Sam3VideoModel.from_pretrained(sam_path).to(device, dtype=torch.bfloat16)
    processor = Sam3VideoProcessor.from_pretrained(sam_path, size={"height": 1008, "width": 1008})

    video_frames, _ = load_video(video_path)
    frames_num = video_frames.shape[0]

    # Start the session from the first frame and keep the raw video on the CPU
    inference_session = processor.init_video_session(
        video=video_frames[0],
        inference_device=device,
        processing_device=device,
        video_storage_device="cpu",
        dtype=torch.bfloat16,
    )
    inference_session = processor.add_text_prompt(
        inference_session=inference_session,
        text="person",
    )

    # Append the remaining frames one at a time
    for idx in range(1, frames_num):
        processed_video = processor.video_processor(videos=video_frames[idx], device=device, return_tensors="pt")
        pixel_values_video = processed_video.pixel_values_videos[0]
        inference_session.add_new_frame(pixel_values_video)

    with torch.no_grad():
        total_model_outputs = model.propagate_in_video_iterator(
            inference_session=inference_session
        )

        for model_outputs in total_model_outputs:
            print_memory("test!!!")
            result = processor.postprocess_outputs(inference_session, model_outputs)
            print_memory("before total_model_outputs!!!")

Expected behavior

Observed memory log from the reporter (allocated CUDA memory in MB, printed at the top and bottom of each loop iteration):

[Line 77] test!!! - Allocated: 2696.203125
[Line 81] before total_model_outputs!!! - Allocated: 2838.58642578125
[Line 77] test!!! - Allocated: 2898.85205078125
[Line 81] before total_model_outputs!!! - Allocated: 2898.85205078125
[Line 77] test!!! - Allocated: 2958.947265625
[Line 81] before total_model_outputs!!! - Allocated: 2958.947265625
[Line 77] test!!! - Allocated: 3017.658203125
[Line 81] before total_model_outputs!!! - Allocated: 3017.658203125
[Line 77] test!!! - Allocated: 3077.51611328125
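As a rough check on this log, the sketch below parses the "[Line N] message - Allocated: X" entries, takes the median increase between consecutive top-of-loop readings (the median keeps the larger first-frame jump from skewing the estimate), and projects how many more frames fit under a memory budget if growth stays linear. The helper names and the 24 GB budget are illustrative assumptions, not part of the issue.

```python
import re
from statistics import median

# Matches entries such as "[Line 77] test!!! - Allocated: 2696.203125"
ENTRY = re.compile(r"\[Line\s+(\d+)\]\s+(.*?)\s+-\s+Allocated:\s+([\d.]+)")


def allocated_readings(log_text, marker):
    """Allocated-memory readings (MB) for every entry whose message equals `marker`."""
    return [float(mb) for _, msg, mb in ENTRY.findall(log_text) if msg == marker]


def per_frame_growth(readings):
    """Median increase between consecutive readings, in MB per frame."""
    deltas = [b - a for a, b in zip(readings, readings[1:])]
    return median(deltas)


def frames_until_budget(current_mb, growth_mb, budget_mb):
    """How many more frames fit under `budget_mb`, assuming linear growth."""
    if growth_mb <= 0:
        return None  # memory is not growing per frame
    return int((budget_mb - current_mb) // growth_mb)


REPORTED_LOG = """\
[Line 77] test!!! - Allocated: 2696.203125
[Line 81] before total_model_outputs!!! - Allocated: 2838.58642578125
[Line 77] test!!! - Allocated: 2898.85205078125
[Line 81] before total_model_outputs!!! - Allocated: 2898.85205078125
[Line 77] test!!! - Allocated: 2958.947265625
[Line 81] before total_model_outputs!!! - Allocated: 2958.947265625
[Line 77] test!!! - Allocated: 3017.658203125
[Line 81] before total_model_outputs!!! - Allocated: 3017.658203125
[Line 77] test!!! - Allocated: 3077.51611328125
"""

readings = allocated_readings(REPORTED_LOG, "test!!!")
growth = per_frame_growth(readings)
# Hypothetical 24 GB budget; substitute your GPU's usable memory
remaining = frames_until_budget(readings[-1], growth, budget_mb=24 * 1024)
print(f"~{growth:.1f} MB per frame, room for about {remaining} more frames")
```

With the reporter's readings this gives a median of about 60 MB per frame and, under the hypothetical 24 GB budget, room for only a few hundred more frames, so a longer video would plausibly exhaust memory.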

How do I fix it?

extent analysis

Fix Plan

1. Reduce Memory Allocation

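The log shows torch.cuda.memory_allocated() rising by about 60 MB on every frame yielded by propagate_in_video_iterator. Calling torch.cuda.empty_cache() after each iteration returns cached blocks that no tensor is using to the driver, which lowers reserved memory and can ease fragmentation. It cannot free memory held by live tensors, though, so it is worth trying as a mitigation but will not stop allocated memory from climbing on its own.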

2. Use Efficient Data Structures

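The inference_session object appears to keep per-frame state for every tracked object, so its footprint likely grows with both the number of frames processed and the number of objects (dozens, in this report). Switching to torch.cuda.FloatTensor would not help: it is the legacy float32 CUDA tensor type, so storing frames that way would put them on the GPU at twice the size of bfloat16 and increase GPU memory use. Keep raw frames on the CPU, as the reproduction already does with video_storage_device='cpu', and keep tensors in bfloat16.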
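To put the dtype point in numbers, here is a quick size calculation for one preprocessed frame. It assumes 3-channel frames resized to 1008×1008, matching the size passed to Sam3VideoProcessor in the reproduction; the function name is illustrative, and the figure covers only the raw frame, not the model's per-object state.

```python
def frame_bytes(height, width, channels=3, bytes_per_element=2):
    """Bytes for one frame tensor; bytes_per_element is 2 for bfloat16, 4 for float32."""
    return height * width * channels * bytes_per_element


MIB = 1024**2
bf16 = frame_bytes(1008, 1008, bytes_per_element=2)
fp32 = frame_bytes(1008, 1008, bytes_per_element=4)
print(f"bfloat16: {bf16 / MIB:.2f} MB per frame, float32: {fp32 / MIB:.2f} MB per frame")
```

Stored on the GPU, every 1,000 such frames would take several GB in bfloat16 and twice that in float32, which is why keeping the video on the CPU is worthwhile.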

3. Optimize Model Outputs

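The total_model_outputs variable does not hold every frame's outputs: propagate_in_video_iterator already returns an iterator that produces one frame's outputs at a time as the loop advances, which is why the log shows readings interleaved with each frame. What the loop controls is what it keeps: copy what you need from each result to the CPU and drop references to model_outputs and result before moving on. The steady growth therefore most likely comes from state kept inside inference_session rather than from the loop itself.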

Code Changes

import torch

# propagate_in_video_iterator yields one frame's outputs at a time,
# so iterate over it directly instead of collecting its results
with torch.no_grad():
    for model_outputs in model.propagate_in_video_iterator(
        inference_session=inference_session
    ):
        result = processor.postprocess_outputs(inference_session, model_outputs)

        # Copy whatever you keep (masks, boxes, object ids) to the CPU here,
        # so no GPU tensors from this frame outlive the iteration

        # Drop this frame's references before the next frame is computed
        del model_outputs, result

        # Return unused cached blocks to the driver; this lowers reserved memory
        # (what nvidia-smi shows) but cannot free tensors that are still referenced
        torch.cuda.empty_cache()

4. Monitor Memory Usage

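The print_memory helper from the reproduction prints torch.cuda.memory_allocated() at each call; it also computes memory_reserved() and max_memory_allocated() but does not print them. Printing reserved memory as well shows how much the caching allocator holds, which is closer to what nvidia-smi reports.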

print_memory("test!!!")
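A variant of print_memory that also reports the change since its previous call makes per-frame growth easy to spot. This is a sketch: the class name is hypothetical, and the memory reader is passed in as a zero-argument callable so the code can be tried without a GPU; in the reproduction you would pass torch.cuda.memory_allocated.

```python
import inspect


class MemoryGrowthMonitor:
    """Print current memory use and the change since the previous call."""

    def __init__(self, read_bytes):
        # read_bytes: zero-argument callable returning bytes in use
        self.read_bytes = read_bytes
        self.previous_mb = None
        self.history = []

    def __call__(self, message=""):
        current_mb = self.read_bytes() / 1024**2
        delta = 0.0 if self.previous_mb is None else current_mb - self.previous_mb
        line_no = inspect.currentframe().f_back.f_lineno  # caller's line, like print_memory
        print(f"[Line {line_no:4d}] {message} - Allocated: {current_mb:.1f} MB ({delta:+.1f} MB)")
        self.previous_mb = current_mb
        self.history.append(current_mb)
        return delta
```

In the reproduction, create monitor = MemoryGrowthMonitor(torch.cuda.memory_allocated) before the loop and call monitor("frame") once per iteration; a roughly constant positive delta on every frame indicates state being retained across frames.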

Verification

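To verify a fix, run the reproduction over the full video and check that torch.cuda.memory_allocated() stays roughly flat from frame to frame instead of climbing by about 60 MB per frame as in the log. nvidia-smi shows the process's total GPU usage, which tracks reserved rather than allocated memory, so its reading will not drop after tensors are freed unless the cache is also emptied.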
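To make this check concrete, one option is to fit a least-squares line to one allocated-memory reading per frame and compare its slope with a small tolerance. The function names, the 1 MB-per-frame tolerance, and skipping the first (warm-up) frame are assumptions chosen for illustration.

```python
def growth_slope(readings_mb):
    """Least-squares slope (MB per frame) of allocated memory against frame index."""
    n = len(readings_mb)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings_mb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(readings_mb))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def memory_is_bounded(readings_mb, tolerance_mb_per_frame=1.0, warmup=1):
    """True when allocation stops climbing once the first `warmup` frames are skipped."""
    return growth_slope(readings_mb[warmup:]) <= tolerance_mb_per_frame


reported = [2696.203125, 2898.85205078125, 2958.947265625, 3017.658203125, 3077.51611328125]
print(f"slope after warm-up: {growth_slope(reported[1:]):.1f} MB/frame")
```

On the reporter's five top-of-loop readings the fitted slope after the warm-up frame is just under 60 MB per frame, so the check fails, as it should for the behavior in the log.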

Extra Tips

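Clearing the CUDA cache after each iteration frees unused cached blocks, but it does not fix a leak: memory held by live tensors is only released once nothing references them.
Keep large per-frame data, such as video frames and any masks you save, on the CPU and in a compact dtype.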
