transformers - ✅(Solved) Fix PIL backend image processors incorrectly require torchvision in v5.4.0 [4 pull requests, 5 comments, 4 participants]

GitHub stats

huggingface/transformers#45042 (fetched 2026-04-08 01:35:58)

  • Comments: 5
  • Participants: 4
  • Timeline events: 20
  • Reactions: 0
  • Timeline (top): commented ×5, referenced ×4, cross-referenced ×3, mentioned ×3

Error Message

Traceback (most recent call last):
  File "/tmp/temp/app.py", line 3, in <module>
    processor = AutoProcessor.from_pretrained("google/gemma-3-12b-it")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/models/auto/processing_auto.py", line 424, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1421, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1550, in _get_arguments_from_pretrained
    sub_processor = auto_processor_class.from_pretrained(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py", line 751, in from_pretrained
    raise ValueError(
ValueError: Unrecognized image processor in google/gemma-3-12b-it.
Should have a image_processor_type key in its preprocessor_config.json of config.json, or one of the following model_type keys in its config.json: aimv2, aimv2_vision_model, align, altclip, aria, aya_vision, beit, bit, blip, blip-2, bridgetower, chameleon, chinese_clip, chmv2, clip, clipseg, cohere2_vision, colpali, colqwen2, conditional_detr, convnext, convnextv2, cvt, data2vec-vision, deepseek_vl, deepseek_vl_hybrid, deformable_detr, deit, depth_anything, depth_pro, detr, dinat, dinov2, dinov3_vit, donut-swin, dpt, edgetam, efficientloftr, efficientnet, emu3, eomt, eomt_dinov3, ernie4_5_vl_moe, flava, florence2, focalnet, fuyu, gemma3, gemma3n, git, glm46v, glm4v, glm_image, glpn, got_ocr2, grounding-dino, groupvit, hiera, idefics, idefics2, idefics3, ijepa, imagegpt, instructblip, internvl, janus, kosmos-2, kosmos-2.5, layoutlmv2, layoutlmv3, layoutxlm, levit, lfm2_vl, lightglue, lighton_ocr, llama4, llava, llava_next, llava_next_video, llava_onevision, lw_detr, mask2former, maskformer, metaclip_2, mgp-str, mistral3, mlcd, mllama, mm-grounding-dino, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, nougat, omdet-turbo, oneformer, ovis2, owlv2, owlvit, paddleocr_vl, paligemma, perceiver, perception_lm, phi4_multimodal, pi0, pix2struct, pixio, pixtral, poolformer, pp_chart2table, pp_doclayout_v2, pp_doclayout_v3, pp_lcnet, pp_ocrv5_mobile_det, pp_ocrv5_mobile_rec, pp_ocrv5_server_det, pp_ocrv5_server_rec, prompt_depth_anything, pvt, pvt_v2, qwen2_5_omni, qwen2_5_vl, qwen2_vl, qwen3_5, qwen3_5_moe, qwen3_omni_moe, qwen3_vl, regnet, resnet, rt_detr, sam, sam2, sam2_video, sam3, sam3_tracker, sam3_tracker_video, sam3_video, sam_hq, segformer, seggpt, shieldgemma2, siglip, siglip2, slanext, smolvlm, superglue, superpoint, swiftformer, swin, swin2sr, swinv2, t5gemma2, t5gemma2_encoder, table-transformer, textnet, timesformer, timm_wrapper, trocr, tvp, udop, upernet, uvdoc, video_llama_3, video_llava, videomae, vilt, vipllava, vit, vit_mae, vit_msn, vitmatte, vitpose, 
xclip, yolos, zoedepth

PR fix notes

PR #45045: [Bugfix] Remove incorrect torchvision requirement from PIL backend image processors

Description (problem / solution / changelog)

Isolate dependencies and make the PIL backend independent of the torchvision backend.

Fixes #45042

PR #45029 added @requires(backends=("vision", "torch", "torchvision")) to 67 PIL backend image_processing_pil_*.py files. This causes PIL backend classes to become dummy objects when torchvision is not installed, making AutoImageProcessor unable to find any working processor — even though the PIL backend's purpose is to work without torchvision.
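To illustrate the failure mode described above, here is a minimal, self-contained sketch of a backend-gating pattern. It is not the transformers implementation: `requires`, `resolve`, `AVAILABLE_BACKENDS`, and `DummyObject` are hypothetical stand-ins, chosen only to show how an over-broad requirement can turn an otherwise usable class into a placeholder.

```python
# Hypothetical, simplified model of backend gating; NOT transformers' real code.
AVAILABLE_BACKENDS = {"vision", "torch"}  # assumption: torchvision is not installed


class DummyObject:
    """Placeholder returned when a class's declared backends are missing."""

    def __init__(self, name, missing):
        self.name = name
        self.missing = missing


def requires(backends):
    """Attach the declared backend requirements to a class."""
    def decorator(cls):
        cls._backends = tuple(backends)
        return cls
    return decorator


def resolve(cls):
    """Return the class if all its backends are available, else a dummy."""
    missing = [b for b in cls._backends if b not in AVAILABLE_BACKENDS]
    return DummyObject(cls.__name__, missing) if missing else cls


@requires(backends=("vision", "torch", "torchvision"))  # over-broad requirement
class PilProcessorOverGated:
    pass


@requires(backends=("vision",))  # what a PIL-only processor needs in this sketch
class PilProcessorFixed:
    pass


broken = resolve(PilProcessorOverGated)
fixed = resolve(PilProcessorFixed)
print(type(broken).__name__, broken.missing)
print(fixed.__name__)
```

In this toy model, a lookup that skips dummies would find no usable processor for the over-gated class, which mirrors the "Unrecognized image processor" symptom reported in the issue.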

Fixes:

  • remove the duplicate Kwargs import (the PIL file imported Kwargs from the equivalent torchvision file)
  • fix modular to inline Kwargs when they are used by both files
  • use explicit imports where it makes sense
  • add explicit protection with requires(backends=("torch",)) only when a file actually needs it

Changed files

  • src/transformers/image_processing_backends.py (modified, +2/-2)
  • src/transformers/models/aria/image_processing_aria.py (modified, +2/-5)
  • src/transformers/models/aria/image_processing_pil_aria.py (modified, +23/-7)
  • src/transformers/models/aria/modeling_aria.py (modified, +1/-6)
  • src/transformers/models/aria/modular_aria.py (modified, +1/-9)
  • src/transformers/models/beit/image_processing_beit.py (modified, +5/-9)
  • src/transformers/models/beit/image_processing_pil_beit.py (modified, +18/-10)
  • src/transformers/models/bridgetower/image_processing_bridgetower.py (modified, +3/-8)
  • src/transformers/models/bridgetower/image_processing_pil_bridgetower.py (modified, +47/-6)
  • src/transformers/models/chameleon/image_processing_chameleon.py (modified, +3/-7)
  • src/transformers/models/chmv2/image_processing_chmv2.py (modified, +3/-9)
  • src/transformers/models/cohere2_vision/image_processing_cohere2_vision.py (modified, +2/-5)
  • src/transformers/models/cohere2_vision/modeling_cohere2_vision.py (modified, +1/-6)
  • src/transformers/models/conditional_detr/image_processing_conditional_detr.py (modified, +2/-5)
  • src/transformers/models/conditional_detr/image_processing_pil_conditional_detr.py (modified, +183/-22)
  • src/transformers/models/conditional_detr/modular_conditional_detr.py (modified, +18/-3)
  • src/transformers/models/convnext/image_processing_convnext.py (modified, +4/-9)
  • src/transformers/models/convnext/image_processing_pil_convnext.py (modified, +13/-6)
  • src/transformers/models/deepseek_vl/image_processing_pil_deepseek_vl.py (modified, +11/-4)
  • src/transformers/models/deepseek_vl/modular_deepseek_vl.py (modified, +11/-1)
  • src/transformers/models/deepseek_vl_hybrid/image_processing_deepseek_vl_hybrid.py (modified, +2/-2)
  • src/transformers/models/deepseek_vl_hybrid/image_processing_pil_deepseek_vl_hybrid.py (modified, +30/-10)
  • src/transformers/models/deepseek_vl_hybrid/modular_deepseek_vl_hybrid.py (modified, +6/-5)
  • src/transformers/models/deformable_detr/image_processing_deformable_detr.py (modified, +2/-5)
  • src/transformers/models/deformable_detr/image_processing_pil_deformable_detr.py (modified, +18/-5)
  • src/transformers/models/deformable_detr/modeling_deformable_detr.py (modified, +1/-6)
  • src/transformers/models/deformable_detr/modular_deformable_detr.py (modified, +17/-2)
  • src/transformers/models/depth_pro/image_processing_depth_pro.py (modified, +4/-10)
  • src/transformers/models/detr/image_processing_detr.py (modified, +1/-5)
  • src/transformers/models/detr/image_processing_pil_detr.py (modified, +183/-15)
  • src/transformers/models/dinov3_vit/image_processing_dinov3_vit.py (modified, +3/-8)
  • src/transformers/models/donut/image_processing_pil_donut.py (modified, +16/-6)
  • src/transformers/models/dpt/image_processing_dpt.py (modified, +3/-8)
  • src/transformers/models/dpt/image_processing_pil_dpt.py (modified, +73/-13)
  • src/transformers/models/dpt/modular_dpt.py (modified, +2/-3)
  • src/transformers/models/efficientloftr/image_processing_efficientloftr.py (modified, +8/-5)
  • src/transformers/models/efficientloftr/image_processing_pil_efficientloftr.py (modified, +65/-20)
  • src/transformers/models/efficientloftr/modular_efficientloftr.py (modified, +24/-7)
  • src/transformers/models/efficientnet/image_processing_efficientnet.py (modified, +4/-6)
  • src/transformers/models/efficientnet/image_processing_pil_efficientnet.py (modified, +15/-5)
  • src/transformers/models/eomt/image_processing_eomt.py (modified, +1/-5)
  • src/transformers/models/eomt/image_processing_pil_eomt.py (modified, +145/-17)
  • src/transformers/models/ernie4_5_vl_moe/image_processing_ernie4_5_vl_moe.py (modified, +2/-5)
  • src/transformers/models/ernie4_5_vl_moe/image_processing_pil_ernie4_5_vl_moe.py (modified, +49/-12)
  • src/transformers/models/ernie4_5_vl_moe/modeling_ernie4_5_vl_moe.py (modified, +1/-7)
  • src/transformers/models/ernie4_5_vl_moe/modular_ernie4_5_vl_moe.py (modified, +10/-10)
  • src/transformers/models/flava/image_processing_flava.py (modified, +1/-5)
  • src/transformers/models/flava/image_processing_pil_flava.py (modified, +195/-16)
  • src/transformers/models/fuyu/image_processing_fuyu.py (modified, +1/-5)
  • src/transformers/models/fuyu/image_processing_pil_fuyu.py (modified, +166/-10)
  • src/transformers/models/gemma3/image_processing_gemma3.py (modified, +1/-5)
  • src/transformers/models/gemma3/image_processing_pil_gemma3.py (modified, +21/-5)
  • src/transformers/models/glm46v/image_processing_glm46v.py (modified, +2/-5)
  • src/transformers/models/glm46v/image_processing_pil_glm46v.py (modified, +56/-5)
  • src/transformers/models/glm46v/video_processing_glm46v.py (modified, +2/-5)
  • src/transformers/models/glm4v/image_processing_glm4v.py (modified, +2/-5)
  • src/transformers/models/glm4v/image_processing_pil_glm4v.py (modified, +40/-4)
  • src/transformers/models/glm_image/image_processing_glm_image.py (modified, +4/-9)
  • src/transformers/models/glm_image/image_processing_pil_glm_image.py (modified, +66/-5)
  • src/transformers/models/glm_image/modeling_glm_image.py (modified, +2/-5)
  • src/transformers/models/glm_image/modular_glm_image.py (modified, +4/-4)
  • src/transformers/models/glm_image/processing_glm_image.py (modified, +4/-5)
  • src/transformers/models/glpn/image_processing_glpn.py (modified, +4/-4)
  • src/transformers/models/glpn/image_processing_pil_glpn.py (modified, +19/-10)
  • src/transformers/models/got_ocr2/image_processing_got_ocr2.py (modified, +1/-5)
  • src/transformers/models/got_ocr2/image_processing_pil_got_ocr2.py (modified, +100/-6)
  • src/transformers/models/grounding_dino/image_processing_grounding_dino.py (modified, +2/-5)
  • src/transformers/models/grounding_dino/image_processing_pil_grounding_dino.py (modified, +19/-12)
  • src/transformers/models/grounding_dino/modular_grounding_dino.py (modified, +17/-1)
  • src/transformers/models/idefics/image_processing_pil_idefics.py (modified, +29/-4)
  • src/transformers/models/idefics2/image_processing_idefics2.py (modified, +2/-3)
  • src/transformers/models/idefics2/image_processing_pil_idefics2.py (modified, +77/-11)
  • src/transformers/models/idefics3/image_processing_idefics3.py (modified, +2/-5)
  • src/transformers/models/idefics3/image_processing_pil_idefics3.py (modified, +149/-15)
  • src/transformers/models/imagegpt/image_processing_imagegpt.py (modified, +1/-5)
  • src/transformers/models/imagegpt/image_processing_pil_imagegpt.py (modified, +19/-6)
  • src/transformers/models/janus/image_processing_pil_janus.py (modified, +12/-4)
  • src/transformers/models/kosmos2_5/image_processing_pil_kosmos2_5.py (modified, +47/-5)
  • src/transformers/models/layoutlmv2/image_processing_layoutlmv2.py (modified, +2/-3)
  • src/transformers/models/layoutlmv2/image_processing_pil_layoutlmv2.py (modified, +92/-6)
  • src/transformers/models/layoutlmv3/image_processing_layoutlmv3.py (modified, +2/-3)
  • src/transformers/models/layoutlmv3/image_processing_pil_layoutlmv3.py (modified, +92/-5)
  • src/transformers/models/levit/image_processing_levit.py (modified, +4/-9)
  • src/transformers/models/levit/image_processing_pil_levit.py (modified, +1/-1)
  • src/transformers/models/lfm2_vl/image_processing_lfm2_vl.py (modified, +1/-4)
  • src/transformers/models/lightglue/image_processing_lightglue.py (modified, +2/-4)
  • src/transformers/models/lightglue/image_processing_pil_lightglue.py (modified, +58/-22)
  • src/transformers/models/lightglue/modeling_lightglue.py (modified, +1/-6)
  • src/transformers/models/lightglue/modular_lightglue.py (modified, +13/-8)
  • src/transformers/models/lighton_ocr/modeling_lighton_ocr.py (modified, +1/-6)
  • src/transformers/models/llama4/image_processing_llama4.py (modified, +2/-5)
  • src/transformers/models/llava/image_processing_llava.py (modified, +4/-6)
  • src/transformers/models/llava/image_processing_pil_llava.py (modified, +1/-1)
  • src/transformers/models/llava_next/image_processing_llava_next.py (modified, +4/-8)
  • src/transformers/models/llava_next/image_processing_pil_llava_next.py (modified, +14/-5)
  • src/transformers/models/llava_onevision/image_processing_llava_onevision.py (modified, +1/-1)
  • src/transformers/models/llava_onevision/image_processing_pil_llava_onevision.py (modified, +13/-5)
  • src/transformers/models/llava_onevision/modeling_llava_onevision.py (modified, +1/-5)
  • src/transformers/models/llava_onevision/modular_llava_onevision.py (modified, +14/-3)
  • src/transformers/models/mask2former/image_processing_mask2former.py (modified, +2/-4)

PR #3101: FIX Broken tests with torchao >= 0.15

Description (problem / solution / changelog)

Torchao made some API changes, which have to be reflected in the tests. Moreover, for this to pass, we also need transformers to make the corresponding adjustments:

https://github.com/huggingface/transformers/pull/44604

While working on this, I migrated the tests from unittest to pytest style.

Changed files

  • docs/source/developer_guides/quantization.md (modified, +2/-1)
  • examples/sequence_classification/LoRA-torchao-8bit-dynamic-activation.ipynb (modified, +2/-1)
  • examples/sequence_classification/LoRA-torchao-8bit.ipynb (modified, +2/-1)
  • src/peft/import_utils.py (modified, +1/-1)
  • src/peft/tuners/lora/torchao.py (modified, +5/-10)
  • tests/test_gpu_examples.py (modified, +44/-38)

PR #45060: Fix PIL backend fallback when torchvision is unavailable

Description (problem / solution / changelog)

This PR fixes a regression where PIL-based image/video processors were incorrectly treated as requiring torchvision.

As a result, AutoProcessor / AutoImageProcessor could fail in environments without torchvision, even though a valid PIL fallback exists.

What changed

  • Updated import-structure backend resolution in import_utils so PIL modules do not inherit torchvision requirements:
    • image_processing_pil_*
    • video_processing_pil_*
  • Added a normalization step to prevent torchvision from being attached to PIL-only modules during backend aggregation.
  • Added regression tests for:
    • PIL import-structure backend correctness
    • Auto-backend fallback to PIL when torchvision is unavailable
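The normalization step listed above can be sketched roughly as follows. This is an illustrative assumption about the idea, not the code from the PR: `normalize_backends` and `PIL_MODULE_PREFIXES` are hypothetical names, and the real change in import_utils may be structured differently.

```python
# Hypothetical sketch of the normalization idea; names are illustrative only.
PIL_MODULE_PREFIXES = ("image_processing_pil_", "video_processing_pil_")


def normalize_backends(module_name, backends):
    """Drop 'torchvision' from the backend set of PIL-only modules."""
    if module_name.startswith(PIL_MODULE_PREFIXES):
        return frozenset(b for b in backends if b != "torchvision")
    return frozenset(backends)


declared = {"vision", "torch", "torchvision"}
pil = normalize_backends("image_processing_pil_gemma3", declared)
tv = normalize_backends("image_processing_gemma3", declared)
print(sorted(pil))
print(sorted(tv))
```

Only modules whose names match the PIL prefixes lose the torchvision requirement; the torchvision-backed module keeps its full backend set.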

Fixes #45042

Tests

  • python -m pytest tests/utils/test_import_structure.py -k "pil_import_structure_does_not_require_torchvision" -q
  • python -m pytest tests/models/auto/test_image_processing_auto.py -k "auto_backend_falls_back_to_pil_when_torchvision_is_unavailable or backend_kwarg_pil" -q
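Regression tests like these need to simulate an environment where a package is missing. One generic way to do that with the standard library is sketched below, using a None entry in sys.modules to make an import fail. The helper names `module_unavailable` and `can_import` are hypothetical, and the tests in the PR may use a different mechanism (for example, patching the library's own availability checks, which can be cached).

```python
import importlib
import sys
from contextlib import contextmanager
from unittest import mock


@contextmanager
def module_unavailable(name):
    """Temporarily make `import name` fail, even if the package is installed."""
    # A None entry in sys.modules makes the import system raise ModuleNotFoundError.
    with mock.patch.dict(sys.modules, {name: None}):
        yield


def can_import(name):
    """Return True if `name` can be imported right now."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


# Demonstrated with a stdlib module so the sketch runs anywhere.
with module_unavailable("json"):
    hidden = can_import("json")
restored = can_import("json")
print(hidden, restored)
```

The same context manager could wrap a call under test with "torchvision" as the hidden name, provided the code under test checks availability at call time rather than at import time.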

Changed files

  • src/transformers/utils/import_utils.py (modified, +18/-3)
  • tests/models/auto/test_image_processing_auto.py (modified, +12/-0)
  • tests/utils/test_import_structure.py (modified, +24/-0)


System Info

  • transformers version: 5.4.0
  • Platform: Linux-6.8.0-1050-aws-x86_64-with-glibc2.35
  • Python version: 3.12.13
  • Huggingface_hub version: 1.8.0
  • Safetensors version: 0.7.0
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
  • Using distributed or parallel set-up in script?: <fill in>
  • Using GPU in script?: <fill in>
  • GPU type: NVIDIA L4

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/gemma-3-12b-it")

Expected behavior

After upgrading to transformers==5.4.0, AutoProcessor.from_pretrained fails for models like google/gemma-3-12b-it when torchvision is not installed:

Traceback (most recent call last):
  File "/tmp/temp/app.py", line 3, in <module>
    processor = AutoProcessor.from_pretrained("google/gemma-3-12b-it")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/models/auto/processing_auto.py", line 424, in from_pretrained
    return processor_class.from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1421, in from_pretrained
    args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/processing_utils.py", line 1550, in _get_arguments_from_pretrained
    sub_processor = auto_processor_class.from_pretrained(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/temp/.venv/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py", line 751, in from_pretrained
    raise ValueError(
ValueError: Unrecognized image processor in google/gemma-3-12b-it. Should have a `image_processor_type` key in its preprocessor_config.json of config.json, or one of the following `model_type` keys in its config.json: aimv2, aimv2_vision_model, align, altclip, aria, aya_vision, beit, bit, blip, blip-2, bridgetower, chameleon, chinese_clip, chmv2, clip, clipseg, cohere2_vision, colpali, colqwen2, conditional_detr, convnext, convnextv2, cvt, data2vec-vision, deepseek_vl, deepseek_vl_hybrid, deformable_detr, deit, depth_anything, depth_pro, detr, dinat, dinov2, dinov3_vit, donut-swin, dpt, edgetam, efficientloftr, efficientnet, emu3, eomt, eomt_dinov3, ernie4_5_vl_moe, flava, florence2, focalnet, fuyu, gemma3, gemma3n, git, glm46v, glm4v, glm_image, glpn, got_ocr2, grounding-dino, groupvit, hiera, idefics, idefics2, idefics3, ijepa, imagegpt, instructblip, internvl, janus, kosmos-2, kosmos-2.5, layoutlmv2, layoutlmv3, layoutxlm, levit, lfm2_vl, lightglue, lighton_ocr, llama4, llava, llava_next, llava_next_video, llava_onevision, lw_detr, mask2former, maskformer, metaclip_2, mgp-str, mistral3, mlcd, mllama, mm-grounding-dino, mobilenet_v1, mobilenet_v2, mobilevit, mobilevitv2, nougat, omdet-turbo, oneformer, ovis2, owlv2, owlvit, paddleocr_vl, paligemma, perceiver, perception_lm, phi4_multimodal, pi0, pix2struct, pixio, pixtral, poolformer, pp_chart2table, pp_doclayout_v2, pp_doclayout_v3, pp_lcnet, pp_ocrv5_mobile_det, pp_ocrv5_mobile_rec, pp_ocrv5_server_det, pp_ocrv5_server_rec, prompt_depth_anything, pvt, pvt_v2, qwen2_5_omni, qwen2_5_vl, qwen2_vl, qwen3_5, qwen3_5_moe, qwen3_omni_moe, qwen3_vl, regnet, resnet, rt_detr, sam, sam2, sam2_video, sam3, sam3_tracker, sam3_tracker_video, sam3_video, sam_hq, segformer, seggpt, shieldgemma2, siglip, siglip2, slanext, smolvlm, superglue, superpoint, swiftformer, swin, swin2sr, swinv2, t5gemma2, t5gemma2_encoder, table-transformer, textnet, timesformer, timm_wrapper, trocr, tvp, udop, upernet, uvdoc, video_llama_3, 
video_llava, videomae, vilt, vipllava, vit, vit_mae, vit_msn, vitmatte, vitpose, xclip, yolos, zoedepth

This worked in 5.3.0.

extent analysis

Fix Plan

The failure is a regression in transformers 5.4.0, not a missing dependency in your project: the PIL backend image processors were mistakenly marked as requiring torchvision (see PR #45045 and PR #45060 in the PR fix notes). Until you are on a release that contains the fix, either of the following workarounds avoids the error.

  • Install torchvision using pip:

pip install torchvision

  • Alternatively, install torchvision using conda:

conda install torchvision

  • If you use a requirements.txt file, add the following line:

torchvision

  • Or pin transformers to 5.3.0, the last version the reporter confirmed as working:

pip install "transformers==5.3.0"

After applying a workaround, run your script again to see whether the issue is resolved.

Verification

To verify that the fix worked, run the following code:

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("google/gemma-3-12b-it")
print(processor)

If the code runs without errors and prints the processor object, the fix was successful.

Extra Tips

  • Check that your transformers version is compatible with the other libraries you use. If you install torchvision, pick a build that matches your installed PyTorch version.
  • If you are using a virtual environment, ensure that torchvision is installed in the same environment that runs your script.
  • You can also upgrade transformers to a release that includes the fix for issue #45042. Be cautious when upgrading, as a new release may introduce other compatibility issues.
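A quick way to confirm which environment and versions your script actually sees is sketched below, using only the standard library. The helpers `installed_version` and `is_affected` are hypothetical names, and the check assumes 5.4.0 is the only affected release because it is the only one the issue names.

```python
import importlib.util
import sys
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_affected(transformers_version, has_torchvision):
    """True for the combination reported in the issue: 5.4.0 without torchvision."""
    return transformers_version == "5.4.0" and not has_torchvision


tf_version = installed_version("transformers")
has_tv = importlib.util.find_spec("torchvision") is not None

print("interpreter:", sys.executable)  # shows which environment is active
print("transformers:", tf_version)
print("torchvision installed:", has_tv)
print("matches reported regression:", is_affected(tf_version, has_tv))
```

Run it with the same interpreter that runs your application; a mismatch between the printed interpreter path and your expected virtual environment usually explains a "missing" package.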


