transformers - 💡(How to fix) Fix Feature request: Add EfficientViT-SAM (efficientvitsam) to Transformers [1 participants]

GitHub stats: huggingface/transformers#45175 (fetched 2026-04-08 02:22:14). Comments: 0 · Participants: 1 · Timeline events: 1 (labeled ×1) · Reactions: 0

Feature request

EfficientViT-SAM combines MIT’s EfficientViT encoder with the SAM-style prompt encoder and mask decoder. It offers a lighter, faster alternative to ViT-based SAM for interactive segmentation while staying close to the same prompting and mask-decoding workflow.
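The three-part split (image encoder, prompt encoder, mask decoder) maps naturally onto a nested configuration, much as Transformers' existing SamConfig nests vision, prompt-encoder and mask-decoder sub-configs. The sketch below is a minimal, stdlib-only illustration under stated assumptions. The class names, every field name and default, and the `efficientvitsam` model_type (borrowed from the issue title) are hypothetical and are not the upstream values. The only rule it enforces is that the image embedding, the prompt embeddings and the decoder share one width, since all three meet inside the decoder.

```python
from dataclasses import asdict, dataclass, field


# Hypothetical sub-configs: every field name and default below is an
# illustrative assumption, not a value taken from upstream EfficientViT-SAM.
@dataclass
class EncoderConfig:
    hidden_sizes: tuple = (32, 64, 128, 256, 512)  # per-stage widths (illustrative)
    image_size: int = 1024  # input resolution after preprocessing (illustrative)
    output_channels: int = 256  # channels of the image embedding fed to the decoder


@dataclass
class PromptEncoderConfig:
    hidden_size: int = 256  # dimension of point/box prompt embeddings


@dataclass
class MaskDecoderConfig:
    hidden_size: int = 256  # transformer width inside the mask decoder
    num_multimask_outputs: int = 3  # candidate masks returned per prompt


@dataclass
class EfficientViTSamConfig:
    model_type: str = "efficientvitsam"  # hypothetical, taken from the issue title
    vision_config: EncoderConfig = field(default_factory=EncoderConfig)
    prompt_encoder_config: PromptEncoderConfig = field(default_factory=PromptEncoderConfig)
    mask_decoder_config: MaskDecoderConfig = field(default_factory=MaskDecoderConfig)

    def __post_init__(self):
        # Image embeddings and prompt embeddings meet in the same decoder,
        # so their widths must agree.
        dims = {
            self.vision_config.output_channels,
            self.prompt_encoder_config.hidden_size,
            self.mask_decoder_config.hidden_size,
        }
        if len(dims) != 1:
            raise ValueError(f"Embedding widths must match, got {sorted(dims)}")

    def to_dict(self) -> dict:
        # Nested dataclasses become nested dicts, ready for JSON serialization.
        return asdict(self)
```

Keeping the sub-configs separate lets a lighter or heavier EfficientViT backbone be swapped in without touching the prompt encoder or decoder settings.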

Motivation

Today, users who want this stack in the Hugging Face ecosystem must rely on custom code or the upstream repo. First-class support in Transformers would:

  • Enable from_pretrained on official or community checkpoints (e.g. mit-han-lab/efficientvit-sam and converted .pt weights).
  • Provide a single API aligned with existing SAM-style models (AutoModel, processors, post_process_masks).
  • Improve discoverability and CI-tested parity with upstream preprocessing and architecture where feasible.
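Parity with upstream preprocessing largely depends on reproducing the resize step and the matching prompt-coordinate mapping exactly. The sketch below assumes EfficientViT-SAM keeps the original SAM "resize longest side" scheme, in which the longer image side is scaled to a fixed length and the result is rounded to the nearest pixel. That assumption must be verified against the upstream code. The target length is a parameter because it may differ between checkpoints. `preprocess_shape` and `scale_points` are hypothetical helper names. A post_process_masks-style step then runs the inverse mapping, taking low-resolution masks back to the original image size.

```python
def preprocess_shape(old_h: int, old_w: int, long_side: int) -> tuple:
    """Return (new_h, new_w) so the longer side equals long_side, aspect ratio kept."""
    scale = long_side / max(old_h, old_w)
    # Round to the nearest integer, as SAM's ResizeLongestSide transform does.
    return int(old_h * scale + 0.5), int(old_w * scale + 0.5)


def scale_points(points, old_hw, long_side):
    """Map (x, y) prompt points from original-image pixels to resized-image pixels."""
    old_h, old_w = old_hw
    new_h, new_w = preprocess_shape(old_h, old_w, long_side)
    # x scales with width, y with height; both use the rounded output size.
    return [(x * new_w / old_w, y * new_h / old_h) for x, y in points]
```

For example, a 480×640 image resized to a long side of 1024 becomes 768×1024, and the point (320, 240) maps to (512.0, 384.0).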

Your contribution

Add efficientvitsam model code: configuration, image processing, processor, and PyTorch modeling consistent with upstream efficientvit/applications/efficientvit_sam/. Register the model in auto classes and document it in the model hub docs. Include a conversion path from official checkpoints and tests (including optional slow parity checks against the upstream predictor where appropriate).
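A conversion path from official checkpoints typically amounts to renaming state-dict keys from upstream names to the new module names. The stdlib-only sketch below uses placeholder regex rules; the actual upstream and Transformers parameter names have to be read from the checkpoints, so every pattern shown is hypothetical. In a real script the values would be tensors loaded from the .pt file. Here they are plain Python objects. The design choice is to fail loudly on unmatched or colliding keys, so weights cannot be dropped without notice.

```python
import re

# Placeholder rename rules. The real upstream and Transformers parameter names
# must be read from the checkpoints themselves; these patterns are hypothetical.
RENAME_RULES = [
    (r"^image_encoder\.", "vision_encoder."),
    (r"^prompt_encoder\.", "prompt_encoder."),
    (r"^mask_decoder\.", "mask_decoder."),
]


def convert_state_dict(state_dict, rules=RENAME_RULES, strict=True):
    """Rename checkpoint keys with the first matching rule.

    Returns (converted, unmatched). With strict=True, any key that no rule
    matches raises KeyError instead of being silently dropped.
    """
    converted, unmatched = {}, []
    for key, value in state_dict.items():
        for pattern, repl in rules:
            new_key, n = re.subn(pattern, repl, key)
            if n:
                if new_key in converted:
                    raise KeyError(f"Two source keys map to {new_key!r}")
                converted[new_key] = value
                break
        else:  # no rule matched this key
            unmatched.append(key)
    if strict and unmatched:
        raise KeyError(f"No rename rule for: {unmatched}")
    return converted, unmatched
```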

Extended analysis

TL;DR

To add EfficientViT-SAM support to the Hugging Face ecosystem, implement the model code, configuration, and registration in the Transformers library.
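"Registration" means the Auto classes can resolve the model_type string stored in a checkpoint's config to the matching concrete classes. Inside Transformers, this mapping lives in the auto modules. External code can use AutoConfig.register and AutoModel.register instead. The toy registry below is a simplified, hypothetical stand-in that only illustrates the dispatch idea. It is not Transformers' implementation, and `efficientvitsam` as the model_type is an assumption taken from the issue title.

```python
from typing import Dict

# Toy stand-in for Auto-class dispatch; NOT Transformers' implementation.
_MODEL_REGISTRY: Dict[str, type] = {}


def register(model_type: str):
    """Class decorator that maps a config's model_type string to a model class."""
    def decorator(cls):
        if model_type in _MODEL_REGISTRY:
            raise ValueError(f"{model_type!r} is already registered")
        _MODEL_REGISTRY[model_type] = cls
        return cls
    return decorator


def auto_model_from_config(config: dict):
    """Instantiate the class registered for config['model_type']."""
    model_type = config["model_type"]
    if model_type not in _MODEL_REGISTRY:
        raise ValueError(f"Unrecognized model_type {model_type!r}")
    return _MODEL_REGISTRY[model_type](config)


@register("efficientvitsam")  # hypothetical model_type taken from the issue title
class EfficientViTSamModel:
    def __init__(self, config: dict):
        self.config = config
```

Rejecting duplicate registrations mirrors the practical concern that two model classes must never compete for the same model_type.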

Guidance

  • Implement the EfficientViT-SAM model code, including configuration, image processing, processor, and PyTorch modeling, consistent with the upstream repository.
  • Register the model in the auto classes to enable from_pretrained functionality for official and community checkpoints.
  • Document the model in the model hub docs to improve discoverability.
  • Include a conversion path from official checkpoints and add tests, such as parity checks against the upstream predictor, to ensure consistency.
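A parity check against the upstream predictor usually compares raw mask logits numerically, along with the binarized masks. The sketch below works on flattened outputs in plain Python and reports both the maximum absolute difference and the mask IoU. It binarizes at a logit threshold of 0.0, which is how the original SAM predictor thresholds masks. The tolerances (atol, min_iou) and the function names are hypothetical placeholders to be tuned per checkpoint.

```python
def max_abs_diff(a, b):
    """Largest element-wise |a - b| between two flattened outputs."""
    if len(a) != len(b):
        raise ValueError(f"Length mismatch: {len(a)} vs {len(b)}")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def mask_iou(a, b):
    """Intersection over union of two flattened boolean masks (1.0 if both empty)."""
    inter = sum(1 for x, y in zip(a, b) if x and y)
    union = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 if union == 0 else inter / union


def check_parity(ref_logits, new_logits, atol=1e-4, min_iou=0.99, threshold=0.0):
    """Compare reference (upstream) and ported mask logits.

    atol and min_iou are hypothetical tolerances; threshold=0.0 binarizes
    logits the way the original SAM predictor does.
    """
    diff = max_abs_diff(ref_logits, new_logits)
    iou = mask_iou([v > threshold for v in ref_logits],
                   [v > threshold for v in new_logits])
    return {"max_abs_diff": diff, "iou": iou,
            "passed": diff <= atol and iou >= min_iou}
```

Checking both metrics helps because small logit drift can flip pixels near the threshold, while a high IoU alone can hide a systematic offset in the logits.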

Notes

The implementation should align with the existing SAM-style models and follow the upstream repository's architecture and preprocessing where feasible.

Recommendation

No released version includes EfficientViT-SAM, so there is nothing to upgrade to. The way forward is to implement the EfficientViT-SAM model code and register it in Transformers.
