transformers - ✅(Solved) Fix Adding SAM3-LiteText [1 pull request, 7 comments, 3 participants]

GitHub stats
huggingface/transformers#44205 (fetched 2026-04-08 00:29:52)
Comments: 7
Participants: 3
Timeline: 28
Reactions: 0
Timeline (top): subscribed ×10, mentioned ×9, commented ×7, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #44320: Add SAM3-LiteText

Description (problem / solution / changelog)

What does this PR do?

This PR adds SAM3-LiteText: An Anatomical Study of the SAM3 Text Encoder for Efficient Vision-Language Segmentation.

Fixes #44205

Changed files

  • docs/source/en/_toctree.yml (modified, +2/-0)
  • docs/source/en/model_doc/sam3_lite_text.md (added, +129/-0)
  • progress.md (added, +103/-0)
  • src/transformers/cli/add_new_model_like.py (modified, +1/-0)
  • src/transformers/cli/serve.py (modified, +9/-9)
  • src/transformers/models/__init__.py (modified, +1/-0)
  • src/transformers/models/auto/configuration_auto.py (modified, +11/-0)
  • src/transformers/models/auto/modeling_auto.py (modified, +3/-0)
  • src/transformers/models/sam3_lite_text/__init__.py (added, +28/-0)
  • src/transformers/models/sam3_lite_text/configuration_sam3_lite_text.py (added, +555/-0)
  • src/transformers/models/sam3_lite_text/convert_sam3_lite_text_to_hf.py (added, +500/-0)
  • src/transformers/models/sam3_lite_text/modeling_sam3_lite_text.py (added, +2674/-0)
  • src/transformers/models/sam3_lite_text/modular_sam3_lite_text.py (added, +543/-0)
  • tests/models/sam3_lite_text/__init__.py (added, +0/-0)
  • tests/models/sam3_lite_text/test_modeling_sam3_lite_text.py (added, +1591/-0)
  • utils/check_repo.py (modified, +1/-0)

Model description

I would like to propose adding SAM3-LiteText. This model introduces a highly efficient, lightweight text-prompting capability to the SAM3 architecture. It offers excellent performance for text-guided segmentation tasks while maintaining a small computational footprint (params reduced by 80%), making it a fantastic candidate for transformers integration. @NielsRogge

The modular implementation should be relatively straightforward. The architecture builds on SAM3 and replaces its text encoder with MobileCLIP text encoders. It should be feasible to map its components to native transformers modules, reusing existing ViT and text encoder building blocks where possible.

Open source status

[x] The model implementation is available

[x] The model weights are available

Provide useful links for the implementation

authors: @SimonZeng7108

original repo: https://github.com/SimonZeng7108/efficientsam3/tree/sam3_litetext

weights: https://huggingface.co/Simon7108528/EfficientSAM3/tree/main/sam3_litetext

extent analysis

Fix Plan

Integrate SAM3-LiteText Model into Existing Architecture

Step 1: Clone the EfficientSAM3 Repository

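Clone the repository containing the SAM3-LiteText model implementation. The issue links the sam3_litetext branch, so check that branch out directly:

git clone --branch sam3_litetext https://github.com/SimonZeng7108/efficientsam3.git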

Step 2: Install Required Dependencies

Install the required dependencies to build and integrate the model:

pip install transformers torch

Step 3: Map Components to Native Transformers Modules

Map the SAM3-LiteText components to native transformers modules, reusing existing ViT and text encoder building blocks where possible.

Example code snippet:

import torch

# NOTE: transformers has no `ViTForSegmentation` or `MobileClipTextEncoder` class, so this
# step only loads the original checkpoint and inspects it to plan the weight mapping.
# 'sam3_litetext.pth' is a placeholder for the downloaded weights file.
checkpoint = torch.load('sam3_litetext.pth', map_location='cpu', weights_only=True)

# Assumption: some checkpoints nest the weights under a 'model' key; fall back to the top level.
state_dict = checkpoint.get('model', checkpoint)

# Count tensors per top-level prefix to see which blocks
# (for example the vision backbone and the text encoder) need a mapping.
prefix_counts = {}
for key in state_dict:
    prefix = key.split('.')[0]
    prefix_counts[prefix] = prefix_counts.get(prefix, 0) + 1

for prefix, count in sorted(prefix_counts.items()):
    print(f'{prefix}: {count} tensors')
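
Mapping components in practice mostly means renaming checkpoint keys so they match the target module names, which is presumably the job of the PR's convert_sam3_lite_text_to_hf.py. The following is a minimal sketch of that idea using only the standard library. The rename rules and key names are illustrative assumptions, not the real SAM3-LiteText key layout, and `remap_key` / `remap_state_dict` are hypothetical helper names.

```python
import re

# Hypothetical rename rules: (regex on the original key, replacement).
# These patterns are illustrative assumptions, not the real SAM3-LiteText key layout.
KEY_RULES = [
    (r"^text_encoder\.transformer\.(\d+)\.", r"text_encoder.layers.\1."),
    (r"^backbone\.", "vision_encoder."),
    (r"\.ln_(\d)\.", r".layer_norm\1."),
]


def remap_key(key: str) -> str:
    """Apply every rename rule in order and return the converted key."""
    for pattern, replacement in KEY_RULES:
        key = re.sub(pattern, replacement, key)
    return key


def remap_state_dict(state_dict: dict) -> dict:
    """Return a new dict with converted keys; values are passed through untouched."""
    converted = {}
    for old_key, value in state_dict.items():
        new_key = remap_key(old_key)
        if new_key in converted:
            # Two source keys collapsing onto one target key would silently drop weights.
            raise ValueError(f"two source keys map to {new_key!r}")
        converted[new_key] = value
    return converted
```

Keeping the rules in one ordered table makes the conversion easy to review, and the collision check guards against a rule that accidentally merges two distinct parameters.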

Step 4: Integrate the Model into the Existing Architecture

Integrate the SAM3-LiteText model into the existing architecture, replacing the original text encoder with the mobileclip text encoder.

Example code snippet:

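# Replace the original text encoder with the mobileclip text encoder.
# NOTE: `MobileClipTextEncoder` and 'mobileclip-text-encoder' are placeholder names, not a class
# or checkpoint shipped by transformers; substitute the text encoder from the original repository.
# `model` is the SAM3 model instance being adapted.
text_encoder = MobileClipTextEncoder.from_pretrained('mobileclip-text-encoder')
model.text_encoder = text_encoder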

Step 5: Test the Integrated Model

Test the integrated model to ensure it is working correctly:

# Test the integrated model
input
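
One inexpensive check before running any inference is to confirm that the converted checkpoint covers exactly the parameters the target model expects. The sketch below mirrors the missing and unexpected key lists that PyTorch's `load_state_dict(strict=False)` reports, using plain Python so it runs without a model. The helper names `compare_keys` and `assert_full_coverage` are hypothetical.

```python
def compare_keys(expected_keys, converted_keys):
    """Return (missing, unexpected) key lists, sorted for stable output."""
    expected, converted = set(expected_keys), set(converted_keys)
    missing = sorted(expected - converted)      # expected by the model, absent from the checkpoint
    unexpected = sorted(converted - expected)   # present in the checkpoint, unknown to the model
    return missing, unexpected


def assert_full_coverage(expected_keys, converted_keys):
    """Raise AssertionError if the two key sets differ in either direction."""
    missing, unexpected = compare_keys(expected_keys, converted_keys)
    if missing or unexpected:
        raise AssertionError(f"missing={missing}, unexpected={unexpected}")
```

In a real run the expected keys would typically come from `model.state_dict().keys()` on the target model and the converted keys from the remapped checkpoint. A numerical comparison of outputs against the original implementation would be the natural next step.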
