transformers: return_tensors is silently ignored when text_kwargs is explicitly passed




System Info

  • transformers version: 5.10.0.dev0
  • Platform: Linux-5.15.0-117-generic-x86_64-with-glibc2.35
  • Python version: 3.12.11
  • Huggingface_hub version: 1.16.1
  • Safetensors version: 0.5.3
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.12.0+cu130 (CUDA)
  • Using distributed or parallel set-up in script?: <fill in>
  • Using GPU in script?: <fill in>

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import Qwen2VLProcessor

processor = Qwen2VLProcessor.from_pretrained("Qwen/Qwen2-VL-2B-Instruct")

# Works — no text_kwargs dict
out1 = processor(text="hello", return_tensors="pt")
print(type(out1["input_ids"]))  # Tensor ✓

# Silently breaks — flat return_tensors + text_kwargs dict
out2 = processor(text="hello", return_tensors="pt",
                 text_kwargs={"padding": "max_length", "max_length": 32})
print(type(out2["input_ids"]))  # list ✗ — should be Tensor!

# Workaround — put return_tensors inside text_kwargs
out3 = processor(text="hello",
                 text_kwargs={"padding": "max_length", "max_length": 32, "return_tensors": "pt"})
print(type(out3["input_ids"]))  # Tensor ✓

Output:

<class 'torch.Tensor'>
<class 'list'>
<class 'torch.Tensor'>

Expected behavior

Expected Output:

<class 'torch.Tensor'>
<class 'torch.Tensor'>
<class 'torch.Tensor'>

Some analysis from AI:

Root Cause

_merge_kwargs uses if/elif for routing:

if modality in kwargs:
    kwarg_value = kwargs[modality].pop(modality_key, "__empty__")  # not found → "__empty__"
    # NO fallthrough to elif branch
elif modality_key in kwargs:
    kwarg_value = kwargs.get(modality_key, "__empty__")

When text_kwargs={...} is explicitly passed, the if branch is taken for every text key. If return_tensors is not in the user's text_kwargs dict, pop() returns "__empty__", and the code never falls through to check flat kwargs, so the text modality ends up without return_tensors. That matches the list output in the reproduction above. Meanwhile, images_kwargs/audio_kwargs/videos_kwargs have no explicit dicts, so they correctly pick up return_tensors from flat kwargs.

This worked in v4.x because CommonKwargs propagated return_tensors to all modalities at the end of _merge_kwargs. PR #40931 (5339f72b9b) removed CommonKwargs, exposing this routing deficiency.
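
The routing behaviour can be reproduced without loading a model. The sketch below is a simplified stand-in for the if/elif logic quoted above, not the real _merge_kwargs: the function name route_if_elif, the key list, and the trailing else branch are assumptions made only to keep the example self-contained.

```python
# Simplified model of the if/elif routing from the issue; NOT the library code.
EMPTY = "__empty__"


def route_if_elif(modality, modality_keys, kwargs):
    """Collect the values one modality would receive (hypothetical helper)."""
    out = {}
    for key in modality_keys:
        if modality in kwargs:
            # A nested dict was passed: only that dict is consulted.
            value = kwargs[modality].pop(key, EMPTY)
        elif key in kwargs:
            # No nested dict: the flat kwargs are consulted.
            value = kwargs.get(key, EMPTY)
        else:
            value = EMPTY
        if value != EMPTY:
            out[key] = value
    return out


keys = ["padding", "max_length", "return_tensors"]

# Flat return_tensors only: the value reaches the text modality.
flat_only = route_if_elif("text_kwargs", keys, {"return_tensors": "pt"})

# Flat return_tensors plus an explicit text_kwargs dict: the value is dropped.
mixed = route_if_elif(
    "text_kwargs",
    keys,
    {
        "return_tensors": "pt",
        "text_kwargs": {"padding": "max_length", "max_length": 32},
    },
)

print(flat_only)  # {'return_tensors': 'pt'}
print(mixed)      # {'padding': 'max_length', 'max_length': 32}
```

In the second call the if branch handles every key, so the flat return_tensors is never read.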

Suggested Fix

Add a fallback after the if/elif block:

if modality in kwargs:
    kwarg_value = kwargs[modality].pop(modality_key, "__empty__")
    if kwarg_value != "__empty__" and modality_key in non_modality_kwargs:
        raise ValueError(...)
else:
    kwarg_value = "__empty__"

# Fallback: if not found in modality dict, check flat kwargs
if (isinstance(kwarg_value, str) and kwarg_value == "__empty__") and modality_key in kwargs:
    kwarg_value = kwargs.get(modality_key, "__empty__")
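
To see the effect of the proposed fallback in isolation, here is a minimal sketch under the same simplifying assumptions: route_with_fallback is a hypothetical name, and the ValueError check from the suggested fix is left out because non_modality_kwargs is not modelled here.

```python
# Simplified model of the suggested fallback; NOT the library code.
EMPTY = "__empty__"


def is_empty(value):
    """True only for the sentinel string, mirroring the isinstance check above."""
    return isinstance(value, str) and value == EMPTY


def route_with_fallback(modality, modality_keys, kwargs):
    """Collect one modality's values, falling back to flat kwargs (hypothetical)."""
    out = {}
    for key in modality_keys:
        if modality in kwargs:
            value = kwargs[modality].pop(key, EMPTY)
        else:
            value = EMPTY
        # Fallback: not found in the nested dict, so check the flat kwargs.
        if is_empty(value) and key in kwargs:
            value = kwargs.get(key, EMPTY)
        if not is_empty(value):
            out[key] = value
    return out


keys = ["padding", "max_length", "return_tensors"]
fixed = route_with_fallback(
    "text_kwargs",
    keys,
    {
        "return_tensors": "pt",
        "text_kwargs": {"padding": "max_length", "max_length": 32},
    },
)
print(fixed)  # {'padding': 'max_length', 'max_length': 32, 'return_tensors': 'pt'}
```

With the fallback in place, the flat return_tensors reaches the text modality even though a text_kwargs dict was passed.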

Workaround

Include return_tensors inside text_kwargs:

processor(text="hello", text_kwargs={"padding": "max_length", "return_tensors": "pt"})
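
If the workaround is needed in several call sites, a small helper can apply it consistently. This is only a sketch: with_return_tensors is a hypothetical name, not part of transformers, and it only builds the dict that is then passed as text_kwargs.

```python
def with_return_tensors(text_kwargs=None, return_tensors="pt"):
    """Return a copy of text_kwargs that carries return_tensors (hypothetical helper).

    An explicit return_tensors already present in text_kwargs is left untouched.
    """
    merged = dict(text_kwargs or {})
    merged.setdefault("return_tensors", return_tensors)
    return merged


text_kwargs = with_return_tensors({"padding": "max_length", "max_length": 32})
print(text_kwargs)
# {'padding': 'max_length', 'max_length': 32, 'return_tensors': 'pt'}

# Intended use, assuming a processor loaded as in the reproduction:
# out = processor(text="hello", text_kwargs=text_kwargs)
```

The helper copies the input dict, so the caller's original text_kwargs is not mutated.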

Related

  • #38341, #38393 — Same symptom but attributed to .pop() mutating the dict; the deeper _merge_kwargs routing issue was not identified
