transformers - ✅(Solved) Fix `Qwen2_5_VLProcessor.apply_chat_template` crashes on batched input when `padding=False` [4 pull requests, 1 comment, 2 participants]

GitHub stats
huggingface/transformers#44514 (fetched 2026-04-08 00:27:56)
Comments: 1
Participants: 2
Timeline events: 15
Reactions: 0
Timeline (top): cross-referenced ×6, referenced ×3, subscribed ×2, closed ×1

Error Message

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "transformers/processing_utils.py", line 1829, in apply_chat_template
    out = self(text=prompt, **kwargs)
  File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 148, in __call__
    array_ids = np.array(text_inputs["input_ids"])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Root Cause

Qwen2_5_VLProcessor.apply_chat_template raises a ValueError when called with a batch of conversations (different prompt lengths) and padding=False (the default). The crash originates from processing_qwen2_5_vl.py:148, which calls np.array(text_inputs["input_ids"]). This fails when the tokenized sequences have different lengths, because NumPy cannot create a homogeneous array from ragged lists.
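The failure can be reproduced without loading a model. The following is a minimal sketch with made-up token ids; `batch_error` is a hypothetical helper written for this illustration, not part of transformers. Recent NumPy releases raise the ValueError shown above, while older releases may instead warn and build an object array, so the sketch reports the outcome rather than assuming it.

```python
import numpy as np

# Two tokenized prompts of different lengths (made-up token ids).
ragged_ids = [[101, 7, 8, 9], [101, 7, 8]]


def batch_error(rows):
    """Return NumPy's error message if the rows cannot form one array, else None."""
    try:
        np.array(rows)
    except ValueError as exc:
        return str(exc)
    return None


# Ragged rows: an error message on recent NumPy releases.
print(batch_error(ragged_ids))
# Equal-length rows convert without any error.
print(batch_error([[1, 2], [3, 4]]))

# Converting one row at a time avoids the problem entirely.
per_row = [np.array(row) for row in ragged_ids]
print([a.shape for a in per_row])  # [(4,), (3,)]
```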

Fix Action

Fixed

PR fix notes

PR #44516: fix(qwen2_5_vl): handle ragged batched input in apply_chat_template

Description (problem / solution / changelog)

Summary

Fix Qwen2_5_VLProcessor.apply_chat_template crashing with ValueError when called with batched inputs of different sequence lengths (ragged lists) and padding=False (the default).

Fixes #44514

Root Cause

The mm_token_type_ids construction calls np.array(text_inputs["input_ids"]) which fails when the tokenized sequences have different lengths, because NumPy cannot create a homogeneous array from ragged lists:

ValueError: setting an array element with a sequence. The requested array
has an inhomogeneous shape after 1 dimensions.

Since return_mm_token_type_ids defaults to True, this code path always runs.

Fix

Process each sequence individually instead of trying to create a single 2D array:

mm_token_type_ids = []
for seq_ids in text_inputs["input_ids"]:
    seq_array = np.array(seq_ids)
    seq_mm_ids = np.zeros_like(seq_array)
    seq_mm_ids[seq_array == self.image_token_id] = 1
    seq_mm_ids[seq_array == self.video_token_id] = 2
    mm_token_type_ids.append(seq_mm_ids.tolist())
text_inputs["mm_token_type_ids"] = mm_token_type_ids

This produces the same output format (list of lists) and works for both equal-length and ragged batches.
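The claim that the per-sequence loop matches the earlier batched code on equal-length input can be checked in isolation. This is a sketch under stated assumptions: `IMAGE_TOKEN_ID` and `VIDEO_TOKEN_ID` are placeholder values standing in for `self.image_token_id` and `self.video_token_id`, and both functions are standalone rewrites of the snippets quoted in the PRs, not the library code itself.

```python
import numpy as np

# Placeholder ids for illustration only; the real values come from the processor.
IMAGE_TOKEN_ID = 9001
VIDEO_TOKEN_ID = 9002


def mm_ids_batched(input_ids):
    """Earlier approach: one 2D array for the whole batch (needs equal lengths)."""
    array_ids = np.array(input_ids)
    mm = np.zeros_like(array_ids)
    mm[array_ids == IMAGE_TOKEN_ID] = 1
    mm[array_ids == VIDEO_TOKEN_ID] = 2
    return mm.tolist()


def mm_ids_per_sequence(input_ids):
    """Per-sequence approach: one 1D array per row, so ragged input is fine."""
    result = []
    for seq_ids in input_ids:
        seq_array = np.array(seq_ids)
        seq_mm = np.zeros_like(seq_array)
        seq_mm[seq_array == IMAGE_TOKEN_ID] = 1
        seq_mm[seq_array == VIDEO_TOKEN_ID] = 2
        result.append(seq_mm.tolist())
    return result


rect = [[1, 9001, 9001, 2], [3, 9002, 4, 5]]
ragged = [[1, 9001, 9001, 2, 6], [3, 9002, 4]]

# Same result on an equal-length batch.
assert mm_ids_batched(rect) == mm_ids_per_sequence(rect)
# The per-sequence version also handles rows of different lengths.
print(mm_ids_per_sequence(ragged))  # [[0, 1, 1, 0, 0], [0, 2, 0]]
```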

Changes

  • modular_qwen2_5_vl.py — Updated source of truth with per-sequence processing
  • processing_qwen2_5_vl.py — Updated generated file with same fix (couldn't run modular converter locally due to missing libcst)
  • test_processing_qwen2_5_vl.py — Added test_apply_chat_template_ragged_batched_mm_token_type_ids covering single input and ragged batched input, verifying mm_token_type_ids length matches input_ids per sequence

Changed files

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +8/-5)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +8/-5)
  • tests/models/qwen2_5_vl/test_processing_qwen2_5_vl.py (modified, +49/-0)

PR #44518: fix: Qwen2_5_VLProcessor crashes on batched input when padding=False …

Description (problem / solution / changelog)

What does this PR do?

Fixes #44514

Qwen2_5_VLProcessor.__call__ crashed with a ValueError when processing a batch of conversations with different lengths and padding=False (the default).

Root cause: The mm_token_type_ids block was calling np.array(text_inputs["input_ids"]) on the full batch at once. With padding=False, sequences have different lengths producing a ragged list-of-lists that NumPy cannot convert to a 2D array.

Fix: Process each sequence individually instead of the whole batch at once:

# Before
array_ids = np.array(text_inputs["input_ids"])          # crashes on ragged input
mm_token_type_ids = np.zeros_like(text_inputs["input_ids"])
mm_token_type_ids[array_ids == self.image_token_id] = 1
mm_token_type_ids[array_ids == self.video_token_id] = 2
text_inputs["mm_token_type_ids"] = mm_token_type_ids.tolist()

# After
mm_token_type_ids = [np.zeros(len(row), dtype=np.int64) for row in text_inputs["input_ids"]]
for i, row_ids in enumerate(text_inputs["input_ids"]):
    row = np.array(row_ids)
    mm_token_type_ids[i][row == self.image_token_id] = 1
    mm_token_type_ids[i][row == self.video_token_id] = 2
text_inputs["mm_token_type_ids"] = [row.tolist() for row in mm_token_type_ids]

The fix was applied to modular_qwen2_5_vl.py and processing_qwen2_5_vl.py was regenerated via:

python utils/modular_model_converter.py --files qwen2_5_vl

Before submitting

  • Was this discussed/approved via a Github issue — #44514
  • Did you write any new necessary tests? — 36 passed, 23 skipped

Who can review?

@zucchini-nlp (mentioned in the original issue)

Changed files

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +6/-5)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +6/-5)

PR #44531: Fix Qwen2_5_VLProcessor.apply_chat_template crash on unpadded batched input

Description (problem / solution / changelog)

What does this PR do?

Fixes Qwen2_5_VLProcessor.apply_chat_template crashing with a ValueError when called with a batch of conversations with different prompt lengths and padding=False (the default).

Root cause

In the mm_token_type_ids block, np.array(text_inputs["input_ids"]) is called on the full batch at once. With padding=False, sequences have different lengths, producing a ragged list-of-lists that NumPy cannot convert to a 2D array.

Fix

Process each sample individually so that both padded and unpadded inputs are handled correctly. The fix is applied to both modular_qwen2_5_vl.py (the source of truth) and the auto-generated processing_qwen2_5_vl.py.

Reproduction

from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
img = Image.new("RGB", (64, 64), color="red")

messages = [
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "Describe this"}]}],
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "What is this?"}]}],
]

# Before fix: crashes with ValueError
# After fix: works correctly
result = processor.apply_chat_template(messages, tokenize=True, return_dict=True)

Fixes #44514

Changed files

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +8/-5)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +8/-5)

PR #44563: Allow mm_token_type be non-padded lists

Description (problem / solution / changelog)

What does this PR do?

Split out mm_token_type_id creation into a separate utility and call it from the VLM processors. Also make sure that mm_token_type_id can be created even when padding=False and the inputs have different lengths. As long as the return_type is not an array, it should work.
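The page does not show the utility added to processing_utils.py, so the following is only a sketch of what a shared helper of this kind could look like. The name `build_mm_token_type_ids`, its signature, and the token ids are all hypothetical. It works on plain Python lists, which is why ragged input is not a problem as long as the caller does not ask for an array.

```python
def build_mm_token_type_ids(input_ids, token_type_by_id):
    """Map each token id to a modality type id; unknown tokens get 0 (text).

    Hypothetical helper for illustration; not the actual transformers utility.
    """
    return [[token_type_by_id.get(tok, 0) for tok in seq] for seq in input_ids]


# Placeholder ids: 9001 stands in for an image token, 9002 for a video token.
token_types = {9001: 1, 9002: 2}
ragged_batch = [[5, 9001, 6], [9002, 7]]

print(build_mm_token_type_ids(ragged_batch, token_types))  # [[0, 1, 0], [2, 0]]
```

Keeping the mapping in one place means each processor only has to supply its own special token ids.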

Fixes https://github.com/huggingface/transformers/issues/44545 and fixes https://github.com/huggingface/transformers/issues/44514

Changed files

  • src/transformers/models/aria/modular_aria.py (modified, +1/-5)
  • src/transformers/models/aria/processing_aria.py (modified, +1/-8)
  • src/transformers/models/aya_vision/processing_aya_vision.py (modified, +1/-7)
  • src/transformers/models/chameleon/processing_chameleon.py (modified, +3/-9)
  • src/transformers/models/cohere2_vision/processing_cohere2_vision.py (modified, +1/-7)
  • src/transformers/models/colmodernvbert/processing_colmodernvbert.py (modified, +22/-15)
  • src/transformers/models/emu3/processing_emu3.py (modified, +1/-7)
  • src/transformers/models/florence2/modular_florence2.py (modified, +1/-6)
  • src/transformers/models/florence2/processing_florence2.py (modified, +1/-7)
  • src/transformers/models/fuyu/processing_fuyu.py (modified, +3/-6)
  • src/transformers/models/gemma3/processing_gemma3.py (modified, +1/-8)
  • src/transformers/models/glm46v/processing_glm46v.py (modified, +22/-13)
  • src/transformers/models/glm4v/modular_glm4v.py (modified, +17/-8)
  • src/transformers/models/glm4v/processing_glm4v.py (modified, +22/-13)
  • src/transformers/models/glm_image/modular_glm_image.py (modified, +1/-5)
  • src/transformers/models/glm_image/processing_glm_image.py (modified, +1/-6)
  • src/transformers/models/idefics3/processing_idefics3.py (modified, +22/-15)
  • src/transformers/models/internvl/processing_internvl.py (modified, +3/-7)
  • src/transformers/models/lighton_ocr/modular_lighton_ocr.py (modified, +3/-7)
  • src/transformers/models/lighton_ocr/processing_lighton_ocr.py (modified, +3/-7)
  • src/transformers/models/llava/processing_llava.py (modified, +29/-13)
  • src/transformers/models/llava_next/processing_llava_next.py (modified, +1/-7)
  • src/transformers/models/llava_onevision/processing_llava_onevision.py (modified, +1/-5)
  • src/transformers/models/paddleocr_vl/modular_paddleocr_vl.py (modified, +1/-5)
  • src/transformers/models/paddleocr_vl/processing_paddleocr_vl.py (modified, +1/-8)
  • src/transformers/models/paligemma/processing_paligemma.py (modified, +1/-5)
  • src/transformers/models/perception_lm/processing_perception_lm.py (modified, +1/-7)
  • src/transformers/models/pixtral/processing_pixtral.py (modified, +3/-6)
  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +1/-7)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +1/-9)
  • src/transformers/models/qwen2_vl/processing_qwen2_vl.py (modified, +1/-7)
  • src/transformers/models/qwen3_vl/modular_qwen3_vl.py (modified, +1/-6)
  • src/transformers/models/qwen3_vl/processing_qwen3_vl.py (modified, +1/-6)
  • src/transformers/models/video_llama_3/modular_video_llama_3.py (modified, +1/-6)
  • src/transformers/models/video_llama_3/processing_video_llama_3.py (modified, +1/-9)
  • src/transformers/processing_utils.py (modified, +18/-0)


System Info

transformers 5.3.0

Who can help?

@zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Qwen2_5_VLProcessor.apply_chat_template raises a ValueError when called with a batch of conversations (different prompt lengths) and padding=False (the default). The crash originates from processing_qwen2_5_vl.py:148, which calls np.array(text_inputs["input_ids"]). This fails when the tokenized sequences have different lengths, because NumPy cannot create a homogeneous array from ragged lists.

Single-message inputs and batched inputs with padding=True both work fine.

from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

img = Image.new("RGB", (64, 64), color="red")

messages = [
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "Describe this"}]}],
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "What is this?"}]}],
]

# Crashes
result = processor.apply_chat_template(messages, tokenize=True, return_dict=True)

Traceback

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "transformers/processing_utils.py", line 1829, in apply_chat_template
    out = self(text=prompt, **kwargs)
  File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 148, in __call__
    array_ids = np.array(text_inputs["input_ids"])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape
after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Expected behavior

Batched input with different prompt lengths should work with padding=False.

extent analysis

Fix Plan

1. Update transformers to the latest version

Ensure you're running the latest version of the transformers library, which may have fixed this issue.

pip install --upgrade transformers

2. Use padding=True or padding="max_length" for batched inputs

The issue reports that batched inputs with padding=True work. padding=True pads every sequence to the longest one in the batch, while padding="max_length" pads to the length given by the max_length argument.

result = processor.apply_chat_template(messages, tokenize=True, return_dict=True, padding=True)
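To see why padding avoids the crash, here is a minimal pure-Python sketch of padding to the longest sequence. `pad_to_longest` and the pad id 0 are assumptions made for this illustration; the real tokenizer chooses its own pad token and padding side.

```python
def pad_to_longest(input_ids, pad_token_id):
    """Right-pad every row to the longest row and build a matching attention mask."""
    longest = max((len(row) for row in input_ids), default=0)
    padded = [row + [pad_token_id] * (longest - len(row)) for row in input_ids]
    mask = [[1] * len(row) + [0] * (longest - len(row)) for row in input_ids]
    return padded, mask


padded, mask = pad_to_longest([[11, 12, 13], [21]], pad_token_id=0)
print(padded)  # [[11, 12, 13], [21, 0, 0]]
print(mask)    # [[1, 1, 1], [1, 0, 0]]
```

After padding, every row has the same length, so the batch forms a rectangular 2D array.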

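3. Pad before building a torch tensor

torch.tensor() has the same limitation as np.array(): it raises a ValueError when the nested lists have different lengths. Convert each sequence separately and pad the result, for example with torch.nn.utils.rnn.pad_sequence.

import torch
from torch.nn.utils.rnn import pad_sequence

# One 1D tensor per sequence, then pad to the longest sequence in the batch.
rows = [torch.tensor(ids) for ids in text_inputs["input_ids"]]
array_ids = pad_sequence(rows, batch_first=True, padding_value=processor.tokenizer.pad_token_id)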

4. Process each sequence individually

The transformers library does not provide a RaggedTensor class. To keep ragged sequences unpadded, convert each sequence on its own, as the linked PRs do inside the processor.

import numpy as np

rows = [np.array(ids) for ids in text_inputs["input_ids"]]  # one 1D array per sequence

Verification

  1. Run the code with the fix and verify that it works without crashing.
  2. Check the output to ensure it's correct.
  3. Test the code with different inputs to ensure it's robust.
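The verification steps above can be made concrete with a small check, similar in spirit to the test described in PR #44516 (mm_token_type_ids length matches input_ids per sequence). `check_mm_token_type_ids` is a hypothetical helper, and `fake_result` is a stand-in dictionary with placeholder token ids; in practice the object returned by apply_chat_template would be passed instead, assuming it supports key lookup and was produced without return_tensors.

```python
def check_mm_token_type_ids(batch):
    """Raise AssertionError if mm_token_type_ids does not line up with input_ids."""
    input_ids = batch["input_ids"]
    mm_ids = batch["mm_token_type_ids"]
    assert len(input_ids) == len(mm_ids), "batch sizes differ"
    for i, (seq, mm) in enumerate(zip(input_ids, mm_ids)):
        assert len(seq) == len(mm), f"sequence {i}: {len(seq)} ids vs {len(mm)} type ids"
        # 0 = text, 1 = image token, 2 = video token (values used in the PR snippets).
        assert set(mm) <= {0, 1, 2}, f"sequence {i}: unexpected type id"
    return True


# Stand-in for a processor result with two sequences of different lengths.
fake_result = {
    "input_ids": [[5, 9001, 6], [9002, 7]],
    "mm_token_type_ids": [[0, 1, 0], [2, 0]],
}
print(check_mm_token_type_ids(fake_result))  # True
```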

Extra Tips

  • Always check the latest version of the transformers library to see if the issue has been fixed.
  • Use padding=True to pad sequences to the longest one in the batch when working with batched inputs.
