transformers - ✅(Solved) Fix [BUG] OmDet-Turbo processor produces 640px inputs but the model expects 224px [2 pull requests, 1 participants]

Q: Expected behavior

→ `outputs.decoder_coord_logits.shape` should return `torch.Size([1, 900, 4])`; the model should accept 640×640 images as configured.

transformers • 2026-03-11 19:58:13

GitHub stats

huggingface/transformers#44610 • Fetched 2026-04-08 00:27:26

Author

harshaljanjani

Participants

harshaljanjani

Timeline (top)

closed ×1, cross-referenced ×1, labeled ×1, mentioned ×1

Error Message

from transformers import AutoProcessor, OmDetTurboForObjectDetection
from PIL import Image
import requests
import torch

model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw).convert("RGB")
encoding = processor(images=image, text=["cat", "remote"], task="Detect cat, remote.", return_tensors="pt")
try:
    with torch.no_grad():
        outputs = model(**encoding)
    print(outputs.decoder_coord_logits.shape)
except Exception as e:
    print(e)

Fix Action

Fixed

Fixed by PR: fix(models): Forward timm model kwargs to timm.create_model for OmDet-Turbo (https://github.com/huggingface/transformers/pull/44611)

PR fix notes

PR #44611: fix(models): Forward timm model kwargs to timm.create_model for OmDet-Turbo

Repository: huggingface/transformers
Author: harshaljanjani
State: closed | merged: True
Link: https://github.com/huggingface/transformers/pull/44611

Description (problem / solution / changelog)

What does this PR do?

The following issue was identified and fixed in this PR:

→ An earlier PR (🚨 Delete duplicate code in backbone utils) restructured config loading to use BackboneMixin.consolidate_backbone_kwargs_to_config. For the DETR family, the current state works correctly because timm_default_kwargs only contains keys that map to TimmBackboneConfig.__init__. There may be other affected models, but OmDet-Turbo passes kwargs meant for timm.create_model itself; these are not TimmBackboneConfig parameters, so they were dropped.

→ Per the previous PR's diff, before the refactor the implementation forwarded these parameters via **kwargs to timm.create_model, which worked. After the refactor they were stored as attributes on PreTrainedConfig and never forwarded, so parameters like img_size were ignored.
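The regression described above can be sketched in a few lines. This is only an illustration under stated assumptions: `fake_create_model`, `KNOWN_CONFIG_KEYS`, and both `build_*` functions are hypothetical stand-ins, not the real timm or transformers code.

```python
# Hypothetical stand-ins; none of these names are real timm or transformers APIs.

def fake_create_model(name, img_size=224, **kwargs):
    """Stand-in for a timm-style factory whose img_size defaults to 224."""
    return {"name": name, "img_size": img_size, **kwargs}


# Assumed set of fields the config class knows about (illustrative only).
KNOWN_CONFIG_KEYS = {"backbone", "out_indices"}


def build_before_refactor(name, **kwargs):
    # Behaviour described for the old code: every kwarg reaches the factory.
    return fake_create_model(name, **kwargs)


def build_after_refactor(name, **kwargs):
    # Behaviour described for the regressed code: keys the config does not
    # know are kept as plain attributes and never forwarded to the factory.
    stored_attributes = {k: v for k, v in kwargs.items() if k not in KNOWN_CONFIG_KEYS}
    model = fake_create_model(name)  # img_size silently falls back to 224
    return model, stored_attributes


before = build_before_refactor("swin_tiny", img_size=640, always_partition=True)
after, stored = build_after_refactor("swin_tiny", img_size=640, always_partition=True)
print(before["img_size"])  # 640
print(after["img_size"])   # 224
print(stored)              # the dropped kwargs
```

The second call shows the symptom from the issue: the requested 640 is stored but never used, so the backbone is built for 224.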

Fixes #44610.

cc: @zucchini-nlp

CI Failures

Before the fix (feel free to cross-check; these errors are reproducible): [screenshot not preserved]

After the fix (feel free to cross-check): [screenshot not preserved]

[!NOTE] Medium Risk: Changes timm backbone construction and argument forwarding. While scoped to an optional timm_model_kwargs dict, it can affect any model relying on TimmBackbone if those kwargs are set or collide with existing **kwargs.

Overview: Fixes OmDet-Turbo’s timm backbone initialization by storing timm-only parameters (e.g. img_size, always_partition) under backbone_config.timm_model_kwargs instead of top-level config fields.

Updates TimmBackbone to forward config.timm_model_kwargs into timm.create_model, and adds a backward-compatibility shim to migrate older hub configs that had img_size/always_partition as direct attributes.

Written by Cursor Bugbot for commit 414ee30d2c42077e07d0f0b1d582b65b25606f09.
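A minimal sketch of the two ideas in the overview, assuming a simplified config class: timm-only parameters live in a `timm_model_kwargs` dict that is forwarded to the factory, and a shim moves legacy top-level attributes into that dict. `SketchBackboneConfig`, `build_backbone`, and `fake_create_model` are hypothetical names for illustration; the actual change is in the PR diff.

```python
# Hypothetical sketch; class and function names are illustrative, not the PR's code.

LEGACY_TIMM_KEYS = ("img_size", "always_partition")


def fake_create_model(name, img_size=224, **kwargs):
    """Stand-in for a timm-style factory whose img_size defaults to 224."""
    return {"name": name, "img_size": img_size, **kwargs}


class SketchBackboneConfig:
    def __init__(self, backbone, timm_model_kwargs=None, **legacy):
        self.backbone = backbone
        self.timm_model_kwargs = dict(timm_model_kwargs or {})
        # Backward-compatibility shim: older configs stored these keys as
        # direct attributes, so move them into timm_model_kwargs.
        # Values given explicitly in timm_model_kwargs take precedence.
        for key in LEGACY_TIMM_KEYS:
            if key in legacy:
                self.timm_model_kwargs.setdefault(key, legacy.pop(key))
        self.extra = legacy  # anything else stays an ordinary attribute


def build_backbone(config):
    # Forward the timm-only kwargs to the factory instead of dropping them.
    return fake_create_model(config.backbone, **config.timm_model_kwargs)


old_style = SketchBackboneConfig("swin_tiny", img_size=640, always_partition=True)
new_style = SketchBackboneConfig("swin_tiny", timm_model_kwargs={"img_size": 640})
print(build_backbone(old_style)["img_size"])  # 640
print(build_backbone(new_style)["img_size"])  # 640
```

Both the legacy layout and the new layout end up passing `img_size=640` to the factory, which is the behaviour the overview describes.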

Changed files

src/transformers/models/omdet_turbo/configuration_omdet_turbo.py (modified, +13/-0)
src/transformers/models/omdet_turbo/modeling_omdet_turbo.py (modified, +2/-3)

System Info

transformers version: 5.0.0.dev0
Platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Python version: 3.12.3
huggingface_hub version: 1.3.2
safetensors version: 0.7.0
accelerate version: 1.12.0
Accelerate config: not installed
DeepSpeed version: not installed
PyTorch version (accelerator?): 2.9.1+cu128 (CUDA)
GPU type: NVIDIA L4
NVIDIA driver version: 550.90.07
CUDA version: 12.4

Who can help?

@zucchini-nlp (🚨 Delete duplicate code in backbone utils)

Reproduction

from transformers import AutoProcessor, OmDetTurboForObjectDetection
from PIL import Image
import requests
import torch

model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw).convert("RGB")
encoding = processor(images=image, text=["cat", "remote"], task="Detect cat, remote.", return_tensors="pt")
try:
    with torch.no_grad():
        outputs = model(**encoding)
    print(outputs.decoder_coord_logits.shape)
except Exception as e:
    print(e)

Current Repro Output:

OmDet-Turbo fails with an AssertionError during inference. The processor produces 640×640 images but the model expects an input height of 224, so running the official loading and inference code raises AssertionError: Input height (640) doesn't match model (224) (as shown in the screenshot) instead of returning the expected output tensor. The same failure occurs in the official OmDet-Turbo CI run.
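For intuition, the error message is consistent with a fixed-size input check like the one sketched here. The exact check inside the backbone is not shown in the issue, so `check_input_size` is a hypothetical helper and its form is an assumption.

```python
# Hypothetical helper; assumed form of a fixed-size input check.

def check_input_size(height, width, model_img_size):
    """Raise AssertionError when the input does not match the size the model was built for."""
    expected_h, expected_w = model_img_size
    assert height == expected_h, f"Input height ({height}) doesn't match model ({expected_h})"
    assert width == expected_w, f"Input width ({width}) doesn't match model ({expected_w})"


message = None
try:
    # The processor emits 640x640, but the backbone was built with the 224 default.
    check_input_size(640, 640, (224, 224))
except AssertionError as err:
    message = str(err)
    print(message)
```

With the backbone built for 640×640, the same call passes without raising.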

Expected behavior

→ outputs.decoder_coord_logits.shape should return torch.Size([1, 900, 4]); the model should accept 640×640 images as configured.

extent analysis

Fix Plan

1. Update `transformers` version

Upgrade transformers to a build that includes the fix from PR #44611. The issue was reported on 5.0.0.dev0, and this page does not state which release first ships the fix, so check the release notes for your version.

pip install --upgrade transformers

2. Load the model without overrides

No manual configuration change should be needed. According to the PR notes, the timm-only parameters (e.g. img_size, always_partition) are now forwarded to timm.create_model, and older hub configs are migrated by a backward-compatibility shim.

model = OmDetTurboForObjectDetection.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")

3. Load the processor without overrides

The processor already produces 640×640 inputs, so it needs no extra arguments.

processor = AutoProcessor.from_pretrained("omlab/omdet-turbo-swin-tiny-hf")

4. Encode the image

Encode the image exactly as in the reproduction script.

encoding = processor(images=image, text=["cat", "remote"], task="Detect cat, remote.", return_tensors="pt")

5. Verify the fix

Run the code again and verify that the outputs.decoder_coord_logits.shape returns torch.Size([1, 900, 4]).

try:
    with torch.no_grad():
        outputs = model(**encoding)
    print(outputs.decoder_coord_logits.shape)
except Exception as e:
    print(e)

Verification

Check the official OmDet-Turbo CI run to ensure that the issue is resolved.
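The shape check can also be made explicit rather than read off a printout. The sketch below assumes only that the shape can be converted to a tuple (a `torch.Size` can); `shape_matches` is a hypothetical helper name.

```python
# Hypothetical helper for an explicit pass/fail check on the output shape.

EXPECTED_SHAPE = (1, 900, 4)  # expected shape stated in the issue


def shape_matches(shape, expected=EXPECTED_SHAPE):
    """Return True when the given shape equals the expected one."""
    return tuple(shape) == tuple(expected)


# Plain tuples stand in for outputs.decoder_coord_logits.shape here.
print(shape_matches((1, 900, 4)))
print(shape_matches((1, 300, 4)))
```

In the reproduction script this would be called as `shape_matches(outputs.decoder_coord_logits.shape)` after the forward pass.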
