transformers - ✅(Solved) Fix auto_mappings.py references removed Sam3LiteText configs, breaking CI [2 pull requests, 4 comments, 2 participants]

artem-spector · 2026-04-23T09:51:06Z



GitHub stats

huggingface/transformers#45600 • Fetched 2026-04-24 05:51:22


Author

artem-spector

Participants

artem-spector

zucchini-nlp

Timeline (top)

commented ×4, closed ×1, cross-referenced ×1, labeled ×1

Error Message

Exception: Sam3LiteTextVisionConfig appears in CONFIG_MAPPING_NAMES but is not defined in the library.

Root Cause

PR #45535 removed Sam3LiteTextViTConfig and Sam3LiteTextVisionConfig from the modeling file but left two stale entries that still reference them in auto_mappings.py:
("sam3_vision_model", "Sam3LiteTextVisionConfig"),
("sam3_vit_model", "Sam3LiteTextViTConfig"),
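
The failure mode can be illustrated with a minimal sketch. This is not the actual code of utils/check_repo.py; the helper `find_stale_entries`, the stand-in `library` namespace, and the `example_model` entry are assumptions made up for illustration. Only the two Sam3LiteText entries and the error wording come from the report.

```python
from types import SimpleNamespace

# Stand-in for the auto mapping. The two Sam3LiteText entries are the ones
# quoted in the report; "example_model" is a hypothetical healthy entry.
CONFIG_MAPPING_NAMES = {
    "sam3_vision_model": "Sam3LiteTextVisionConfig",
    "sam3_vit_model": "Sam3LiteTextViTConfig",
    "example_model": "ExampleConfig",
}

# Hypothetical stand-in for the library namespace after the two config
# classes were removed: only ExampleConfig is still defined.
library = SimpleNamespace(ExampleConfig=type("ExampleConfig", (), {}))


def find_stale_entries(mapping, namespace):
    """Return (model_type, class_name) pairs whose class is missing from namespace."""
    return [
        (model_type, class_name)
        for model_type, class_name in mapping.items()
        if not hasattr(namespace, class_name)
    ]


stale = find_stale_entries(CONFIG_MAPPING_NAMES, library)
for _, class_name in stale:
    print(f"{class_name} appears in CONFIG_MAPPING_NAMES but is not defined in the library.")
```

Any mapping entry that names a class the library no longer exports is reported, which is why deleting the classes without deleting the entries breaks the check.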

Fix Action

Fixed

Fixed by PR: Add Granite 4.1 Vision (granite4_vision) (https://github.com/huggingface/transformers/pull/45597)

PR fix notes

PR #45597: Add Granite 4.1 Vision (granite4_vision)

Repository: huggingface/transformers
Author: artem-spector
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/45597

Description (problem / solution / changelog)

What does this PR do?

Adds built-in support for Granite 4.1 Vision (granite4_vision), IBM's multimodal vision-language model for enterprise document understanding.

Architecture highlights

- Vision encoder: SigLIP2 (google/siglip2-so400m-patch16-384), tiled 384×384 patches
- Window Q-Former projector: 4×4 patch windows compressed to 2×2 query tokens via cross-attention (downsample_rate="4/8")
- DeepStack feature injection: 8 vision-to-LLM injection points across two mechanisms:
  - LayerDeepstack: features from 4 vision encoder depths injected at 4 LLM layers (reversed order: deepest vision → earliest LLM)
  - SpatialDeepstack: deepest features split into 4 spatial offset groups (TL/TR/BL/BR), injected at 4 later LLM layers
- Language model: GraniteForCausalLM (3.5B) with a rank-256 LoRA adapter (same-repo, LM-only)
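
To make the projector's compression ratio concrete, here is a back-of-the-envelope token count for one tile. It is a sketch under assumptions not stated in the PR: that each 384×384 tile is cut into 16×16-pixel patches (as the encoder name suggests) and that the 4×4 windows are non-overlapping and tile the patch grid exactly.

```python
# Values taken from the architecture description above.
tile_size = 384        # pixels per tile side
patch_size = 16        # assumed from "patch16" in the encoder name
window = 4             # 4x4 patch windows
queries_per_side = 2   # each window is compressed to 2x2 query tokens

patches_per_side = tile_size // patch_size            # patches along one side
patches_per_tile = patches_per_side ** 2              # patches before projection
windows_per_tile = (patches_per_side // window) ** 2  # assumes non-overlapping windows
tokens_per_tile = windows_per_tile * queries_per_side ** 2

print(f"patches per tile: {patches_per_tile}")
print(f"tokens per tile after projection: {tokens_per_tile}")
print(f"compression factor: {patches_per_tile // tokens_per_tile}x")
```

Under these assumptions each window of 16 patches becomes 4 query tokens, so the projector reduces the per-tile sequence length by a factor of 4.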

Files added

| File | Purpose |
|---|---|
| `modular_granite4_vision.py` | Source of truth; inherits from LLaVA-Next, overrides novel components |
| `configuration_granite4_vision.py` | Config (generated) |
| `modeling_granite4_vision.py` | Model (generated) |
| `processing_granite4_vision.py` | Unified processor (generated) |
| `image_processing_granite4_vision.py` | Torchvision-based image processor |
| `image_processing_pil_granite4_vision.py` | PIL/NumPy image processor |
| `tests/models/granite4_vision/` | Modeling, image processing, and processor tests |
| `docs/source/en/model_doc/granite4_vision.md` | Model documentation |

Auto-registration

Config: auto-generated via configuration_granite4_vision.py model_type
Modeling: MODEL_MAPPING_NAMES + MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING_NAMES
Processing + image processing: registered in respective auto files

Tests

Unit tests pass locally (pytest tests/models/granite4_vision/ -x -q)
@slow integration tests load real checkpoint and assert outputs within tolerance
make style and make check-repo pass (3 remaining failures are pre-existing upstream issues: mlinter version mismatch and Sam3Lite incomplete model)

Before submitting

- [x] This PR is not a duplicate
- [x] I have read the [contributor guidelines](https://github.com/huggingface/transformers/blob/main/CONTRIBUTING.md)
- [x] The documentation reflects the changes
- [x] The tests pass

Related

vLLM built-in support: https://github.com/vllm-project/vllm/pull/40282 (merged)
Model: https://huggingface.co/ibm-granite/granite-vision-4.1-4b

Changed files

docs/source/en/_toctree.yml (modified, +2/-0)
docs/source/en/model_doc/granite4_vision.md (added, +206/-0)
src/transformers/conversion_mapping.py (modified, +6/-0)
src/transformers/models/__init__.py (modified, +1/-0)
src/transformers/models/auto/auto_mappings.py (modified, +2/-0)
src/transformers/models/auto/modeling_auto.py (modified, +2/-0)
src/transformers/models/auto/processing_auto.py (modified, +1/-0)
src/transformers/models/granite4_vision/__init__.py (added, +30/-0)
src/transformers/models/granite4_vision/configuration_granite4_vision.py (added, +116/-0)
src/transformers/models/granite4_vision/downsampling_granite4_vision.py (added, +155/-0)
src/transformers/models/granite4_vision/image_processing_granite4_vision.py (added, +244/-0)
src/transformers/models/granite4_vision/image_processing_pil_granite4_vision.py (added, +240/-0)
src/transformers/models/granite4_vision/modeling_granite4_vision.py (added, +882/-0)
src/transformers/models/granite4_vision/modular_granite4_vision.py (added, +734/-0)
src/transformers/models/granite4_vision/processing_granite4_vision.py (added, +238/-0)
src/transformers/utils/auto_docstring.py (modified, +4/-0)
tests/models/granite4_vision/__init__.py (added, +0/-0)
tests/models/granite4_vision/test_image_processing_granite4_vision.py (added, +253/-0)
tests/models/granite4_vision/test_modeling_granite4_vision.py (added, +272/-0)
tests/models/granite4_vision/test_processing_granite4_vision.py (added, +122/-0)
utils/check_repo.py (modified, +4/-0)


System Info

transformers version: 5.6.0.dev0
Platform: Linux-5.14.0-570.12.1.el9_6.x86_64-x86_64-with-glibc2.34
Python version: 3.11.15
Huggingface_hub version: 1.10.2
Safetensors version: 0.7.0
Accelerate version: 1.13.0
Accelerate config: not found
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.11.0+cu130 (CUDA)
Using distributed or parallel set-up in script?: <fill in>
Using GPU in script?: <fill in>
GPU type: NVIDIA H100 80GB HBM3

Who can help?

@yonigozlan

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

On current branch:

python utils/check_repo.py
Exception: Sam3LiteTextVisionConfig appears in CONFIG_MAPPING_NAMES but is not defined in the library.
Sam3LiteTextViTConfig appears in CONFIG_MAPPING_NAMES but is not defined in the library.

pytest tests/models/sam3_lite_text/
357 failures: AttributeError: module transformers has no attribute Sam3LiteTextViTConfig

Expected behavior

check_repo.py and sam3_lite_text tests should pass on main. The stale entries should be removed from auto_mappings.py.

extent analysis

TL;DR

Remove the stale entries for Sam3LiteTextVisionConfig and Sam3LiteTextViTConfig from auto_mappings.py to fix the issue.

Guidance

Identify the auto_mappings.py file and locate the stale entries for Sam3LiteTextVisionConfig and Sam3LiteTextViTConfig.
Remove the lines containing these stale entries to ensure consistency with the changes made in PR #45535.
Run python utils/check_repo.py and pytest tests/models/sam3_lite_text/ again to verify that the issue is resolved.
Review the code changes to ensure that no other stale entries or references to the removed configurations exist.
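
The removal step above can be sketched as a small text filter. The helper `drop_stale_entries` and the `sample` layout are hypothetical: the real auto_mappings.py may be structured differently or be regenerated by tooling, so treat this as an illustration of the idea rather than a patch.

```python
# Class names taken from the error messages in the report.
STALE_CLASSES = ("Sam3LiteTextVisionConfig", "Sam3LiteTextViTConfig")


def drop_stale_entries(source: str, stale=STALE_CLASSES) -> str:
    """Return source with every line that mentions a stale class name removed."""
    kept = [
        line
        for line in source.splitlines()
        if not any(name in line for name in stale)
    ]
    # Preserve a trailing newline if the input had one.
    return "\n".join(kept) + ("\n" if source.endswith("\n") else "")


# Assumed layout of the mapping; "example_model" is an illustrative entry.
sample = '''CONFIG_MAPPING_NAMES = OrderedDict(
    [
        ("sam3_vision_model", "Sam3LiteTextVisionConfig"),
        ("sam3_vit_model", "Sam3LiteTextViTConfig"),
        ("example_model", "ExampleConfig"),
    ]
)
'''

cleaned = drop_stale_entries(sample)
print(cleaned)
```

After applying the equivalent edit to the real file, rerunning python utils/check_repo.py and pytest tests/models/sam3_lite_text/ is still needed to confirm that no other references remain.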

Notes

This fix assumes that the removal of Sam3LiteTextVisionConfig and Sam3LiteTextViTConfig from the modeling file was intentional and that the stale entries in auto_mappings.py are the only remaining references to these configurations.

Recommendation

Apply workaround: Remove the stale entries from auto_mappings.py to resolve the issue, as the configurations have been removed from the modeling file.
