transformers - ✅(Solved) Fix [BUG] OmDet-Turbo processor produces 640px inputs but the model expects 224px [2 pull requests, 1 participant]

GitHub stats
huggingface/transformers#44610 (fetched 2026-04-08 00:27:26)
Comments: 0 · Participants: 1 · Timeline events: 5 · Reactions: 0
Timeline (top): closed ×1, cross-referenced ×1, labeled ×1, mentioned ×1

Error Message

AssertionError: Input height (640) doesn't match model (224)

Fix Action

Fixed

PR fix notes

PR #44611: fix(models): Forward timm model kwargs to timm.create_model for OmDet-Turbo

Description (problem / solution / changelog)

What does this PR do?

The following issue was identified and fixed in this PR:

The earlier PR (🚨 Delete duplicate code in backbone utils) restructured config loading to use BackboneMixin.consolidate_backbone_kwargs_to_config. For the DETR family, the current state works correctly because timm_default_kwargs only contains keys that map to TimmBackboneConfig.__init__. Other models may be affected too, but OmDet-Turbo is one that passes kwargs meant for timm.create_model itself; these are not TimmBackboneConfig parameters, so they were dropped. The previous PR's diff shows that, before the refactor, the implementation forwarded these parameters via **kwargs to timm.create_model, which worked. After the refactor they are stored as attributes on PreTrainedConfig and never forwarded, so parameters such as img_size are ignored.
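To make the regression concrete, here is a minimal, self-contained sketch of the two code paths described above. Every name in it (fake_create_model, TimmLikeBackboneConfig, build_before_fix, build_after_fix) is hypothetical and stands in for timm.create_model and the transformers backbone config; it models the behavior the PR describes, not the actual library code. The default img_size of 224 mirrors the value in the error message.

```python
# Hypothetical, simplified stand-ins; NOT the real transformers or timm classes.
from dataclasses import dataclass, field
from typing import Any


def fake_create_model(name: str, **kwargs: Any) -> dict[str, Any]:
    """Stand-in for timm.create_model: returns the settings the model would be built with."""
    defaults = {"img_size": 224}  # assumed backbone default, matching the error message
    return {"name": name, **defaults, **kwargs}


@dataclass
class TimmLikeBackboneConfig:
    backbone: str
    # Kwargs meant only for the model factory (mirrors the PR's timm_model_kwargs idea).
    timm_model_kwargs: dict[str, Any] = field(default_factory=dict)


def build_before_fix(config: TimmLikeBackboneConfig, **stray_kwargs: Any) -> dict[str, Any]:
    # Post-refactor behavior: unknown kwargs become config attributes and are never forwarded.
    for key, value in stray_kwargs.items():
        setattr(config, key, value)
    return fake_create_model(config.backbone)


def build_after_fix(config: TimmLikeBackboneConfig) -> dict[str, Any]:
    # Fixed behavior: the dedicated dict is forwarded to the factory.
    return fake_create_model(config.backbone, **config.timm_model_kwargs)


cfg_before = TimmLikeBackboneConfig("swin_tiny_patch4_window7_224")
before = build_before_fix(cfg_before, img_size=640)
print(cfg_before.img_size, before["img_size"])  # 640 224: stored on the config, ignored by the factory

after = build_after_fix(
    TimmLikeBackboneConfig("swin_tiny_patch4_window7_224", timm_model_kwargs={"img_size": 640})
)
print(after["img_size"])  # 640
```

The sketch keeps timm-only settings in a separate dict rather than on the config object itself, so the forwarding step never has to guess which attributes belong to the factory.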

Fixes #44610.

cc: @zucchini-nlp

CI Failures

Before the fix (feel free to cross-check; these errors are reproducible):

Screenshot: https://github.com/user-attachments/assets/6ae9830c-2968-4f16-958f-a125dbac8d57

After the fix (feel free to cross-check):

Screenshot: https://github.com/user-attachments/assets/f5c855ae-e5cd-4212-ace9-e1fdfd088f45

Note (medium risk): This change touches timm backbone construction and argument forwarding. Although it is scoped to an optional timm_model_kwargs dict, it can affect any model relying on TimmBackbone if those kwargs are set or collide with existing **kwargs.

Overview: Fixes OmDet-Turbo's timm backbone initialization by storing timm-only parameters (e.g. img_size, always_partition) under backbone_config.timm_model_kwargs instead of as top-level config fields.

Updates TimmBackbone to forward config.timm_model_kwargs into timm.create_model, and adds a backward-compatibility shim that migrates older hub configs which had img_size/always_partition as direct attributes.

Written by Cursor Bugbot for commit 414ee30d2c42077e07d0f0b1d582b65b25606f09.

Changed files

  • src/transformers/models/omdet_turbo/configuration_omdet_turbo.py (modified, +13/-0)
  • src/transformers/models/omdet_turbo/modeling_omdet_turbo.py (modified, +2/-3)
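The backward-compatibility shim mentioned in the Cursor summary can be pictured as a small migration step over a config dict. The sketch below is an assumption about its shape, not the code in configuration_omdet_turbo.py: the helper name migrate_legacy_timm_kwargs is hypothetical, and it simply moves img_size and always_partition from the top level into timm_model_kwargs so they can be forwarded to timm.create_model.

```python
# Hypothetical sketch; the helper name and dict layout are illustrative, not transformers code.
from typing import Any

LEGACY_TIMM_KEYS = ("img_size", "always_partition")


def migrate_legacy_timm_kwargs(backbone_config: dict[str, Any]) -> dict[str, Any]:
    """Move legacy top-level timm-only keys into a nested timm_model_kwargs dict."""
    migrated = dict(backbone_config)  # shallow copy: leave the caller's dict untouched
    timm_kwargs = dict(migrated.get("timm_model_kwargs") or {})
    for key in LEGACY_TIMM_KEYS:
        if key in migrated:
            legacy_value = migrated.pop(key)
            # An explicit timm_model_kwargs entry wins over a stale top-level value.
            timm_kwargs.setdefault(key, legacy_value)
    migrated["timm_model_kwargs"] = timm_kwargs
    return migrated


old_hub_config = {"backbone": "swin_tiny_patch4_window7_224", "img_size": 640, "always_partition": True}
print(migrate_legacy_timm_kwargs(old_hub_config))
# {'backbone': 'swin_tiny_patch4_window7_224', 'timm_model_kwargs': {'img_size': 640, 'always_partition': True}}
```

Letting an explicit timm_model_kwargs entry take precedence is one possible design choice; it prevents a leftover top-level field from overriding a newer nested setting.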


System Info

  • transformers version: 5.0.0.dev0
  • Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • huggingface_hub version: 1.3.2
  • safetensors version: 0.7.0
  • accelerate version: 1.12.0
  • Accelerate config: not installed
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
  • GPU type: NVIDIA L4
  • NVIDIA driver version: 550.90.07
  • CUDA version: 12.4

Who can help?

@zucchini-nlp (🚨 Delete duplicate code in backbone utils)

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoProcessor, OmDetTurboForObjectDetection
from PIL import Image
import requests
import torch

# Load the published checkpoint and its matching processor.
model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
# Sample image from COCO val2017.
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw).convert("RGB")
# The processor resizes the image to 640x640.
encoding = processor(images=image, text=["cat", "remote"], task="Detect cat, remote.", return_tensors="pt")
try:
    with torch.no_grad():
        outputs = model(**encoding)
    # Expected: torch.Size([1, 900, 4])
    print(outputs.decoder_coord_logits.shape)
except Exception as e:
    # On affected versions this prints the AssertionError about input height 640 vs 224.
    print(e)

Current Repro Output:

Screenshot: https://github.com/user-attachments/assets/6ae9830c-2968-4f16-958f-a125dbac8d57

OmDet-Turbo fails with an AssertionError during inference. The processor produces 640×640 images, but the model expects an input height of 224. Running the official loading and inference code raises AssertionError: Input height (640) doesn't match model (224) (see the screenshot) instead of returning the expected output tensor. The same failure also appears in the official OmDet-Turbo CI run.
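The assertion comes from a fixed-resolution size check in the vision backbone: a backbone built for 224-pixel inputs rejects 640-pixel images. The function below is a simplified, hypothetical stand-in for such a check (its message format is modeled on the error in the report, not copied from timm), showing why forwarding img_size=640 makes the same inputs pass.

```python
# Hypothetical, simplified stand-in for a strict fixed-resolution input check.
def check_input_size(height: int, width: int, img_size: tuple[int, int] = (224, 224)) -> None:
    # Reject inputs whose spatial size differs from the size the backbone was built for.
    assert height == img_size[0], f"Input height ({height}) doesn't match model ({img_size[0]})."
    assert width == img_size[1], f"Input width ({width}) doesn't match model ({img_size[1]})."


# Backbone built with the default 224 size (the buggy path): 640x640 inputs are rejected.
try:
    check_input_size(640, 640)
except AssertionError as err:
    print(err)  # Input height (640) doesn't match model (224).

# Backbone built with img_size=640 forwarded (the fixed path): the same inputs pass.
check_input_size(640, 640, img_size=(640, 640))
```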

Expected behavior

outputs.decoder_coord_logits.shape should return torch.Size([1, 900, 4]); the model should accept 640×640 images as configured.

extent analysis

Fix Plan

1. Update transformers version

Install a transformers build that includes PR #44611. Pinning transformers==5.0.0 is not guaranteed to pick up the fix; installing from the main branch of the repository is one way to get it.

pip install --upgrade git+https://github.com/huggingface/transformers

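2. Load the model with its published configuration

The model configuration already targets 640×640 inputs. The mismatch comes from the timm backbone being created with its default 224 input size, because img_size was not forwarded to timm.create_model. Once the fix is installed, no extra keyword is needed; an unrelated keyword such as input_size=640 would not reach the timm backbone anyway.

model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")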

3. Load the processor unchanged

The processor already resizes images to 640×640, which is what the model expects once the fix is in place, so no input_size override is needed.

processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")

4. Encode the image

Encode the inputs exactly as in the reproduction; a max_size argument is not needed.

encoding = processor(images=image, text=["cat", "remote"], task="Detect cat, remote.", return_tensors="pt")

5. Verify the fix

Run the code again and verify that outputs.decoder_coord_logits.shape is torch.Size([1, 900, 4]).

try:
    with torch.no_grad():
        outputs = model(**encoding)
    print(outputs.decoder_coord_logits.shape)
except Exception as e:
    print(e)

Verification

  • Run the code again and verify that outputs.decoder_coord_logits.shape is torch.Size([1, 900, 4]).
  • Check the official OmDet-Turbo CI run to ensure that the issue is resolved.
