transformers - ✅(Solved) Fix `Qwen2_5_VLProcessor.apply_chat_template` crashes on batched input when `padding=False` [4 pull requests, 1 comments, 2 participants]

transformers · 2026-03-07 17:03:20

GitHub stats

huggingface/transformers#44514•Fetched 2026-04-08 00:27:56

Author

qgallouedec

Participants

KartikPawade

qgallouedec

Timeline (top)

cross-referenced ×6, referenced ×3, subscribed ×2, closed ×1

Error Message

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "transformers/processing_utils.py", line 1829, in apply_chat_template
    out = self(text=prompt, **kwargs)
  File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 148, in __call__
    array_ids = np.array(text_inputs["input_ids"])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Root Cause

Qwen2_5_VLProcessor.apply_chat_template raises a ValueError when called with a batch of conversations of different prompt lengths and padding=False (the default). The crash originates at processing_qwen2_5_vl.py:148, which calls np.array(text_inputs["input_ids"]). This fails when the tokenized sequences have different lengths, because NumPy cannot create a homogeneous array from ragged lists.
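The condition that triggers the crash can be sketched without any library. The helper name is_ragged and the token IDs below are assumptions made for illustration; neither comes from transformers.

```python
def is_ragged(batch):
    """Return True when the rows of a list of lists differ in length."""
    return len({len(row) for row in batch}) > 1


# Hypothetical token IDs standing in for two tokenized prompts.
equal_length = [[1, 2, 3], [4, 5, 6]]
unpadded = [[1, 2, 3], [4, 5, 6, 7]]

print(is_ragged(equal_length))  # False
print(is_ragged(unpadded))      # True
```

A batch for which this returns True is the kind of input that cannot be turned into a single 2D array.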

Fix Action

Fixed

Fixed by PR: fix(qwen2_5_vl): handle ragged batched input in apply_chat_template (https://github.com/huggingface/transformers/pull/44516)
Fixed by PR: fix: Qwen2_5_VLProcessor crashes on batched input when padding=False … (https://github.com/huggingface/transformers/pull/44518)
Fixed by PR: Fix Qwen2_5_VLProcessor.apply_chat_template crash on unpadded batched input (https://github.com/huggingface/transformers/pull/44531)
Fixed by PR: Allow mm_token_type be non-padded lists (https://github.com/huggingface/transformers/pull/44563)

PR fix notes

PR #44516: fix(qwen2_5_vl): handle ragged batched input in apply_chat_template

Repository: huggingface/transformers
Author: JasonCZMeng
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/44516

Description (problem / solution / changelog)

Summary

Fix Qwen2_5_VLProcessor.apply_chat_template crashing with ValueError when called with batched inputs of different sequence lengths (ragged lists) and padding=False (the default).

Fixes #44514

Root Cause

The mm_token_type_ids construction calls np.array(text_inputs["input_ids"]) which fails when the tokenized sequences have different lengths, because NumPy cannot create a homogeneous array from ragged lists:

ValueError: setting an array element with a sequence. The requested array
has an inhomogeneous shape after 1 dimensions.

Since return_mm_token_type_ids defaults to True, this code path always runs.

Fix

Process each sequence individually instead of trying to create a single 2D array:

mm_token_type_ids = []
for seq_ids in text_inputs["input_ids"]:
    seq_array = np.array(seq_ids)
    seq_mm_ids = np.zeros_like(seq_array)
    seq_mm_ids[seq_array == self.image_token_id] = 1
    seq_mm_ids[seq_array == self.video_token_id] = 2
    mm_token_type_ids.append(seq_mm_ids.tolist())
text_inputs["mm_token_type_ids"] = mm_token_type_ids

This produces the same output format (list of lists) and works for both equal-length and ragged batches.

Changes

modular_qwen2_5_vl.py — Updated source of truth with per-sequence processing
processing_qwen2_5_vl.py — Updated generated file with same fix (couldn't run modular converter locally due to missing libcst)
test_processing_qwen2_5_vl.py — Added test_apply_chat_template_ragged_batched_mm_token_type_ids covering single input and ragged batched input, verifying mm_token_type_ids length matches input_ids per sequence

Changed files

src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +8/-5)
src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +8/-5)
tests/models/qwen2_5_vl/test_processing_qwen2_5_vl.py (modified, +49/-0)

PR #44518: fix: Qwen2_5_VLProcessor crashes on batched input when padding=False …

Repository: huggingface/transformers
Author: KartikPawade
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/44518

Description (problem / solution / changelog)

What does this PR do?

Fixes #44514

Qwen2_5_VLProcessor.__call__ crashed with a ValueError when processing a batch of conversations with different lengths and padding=False (the default).

Root cause: The mm_token_type_ids block was calling np.array(text_inputs["input_ids"]) on the full batch at once. With padding=False, sequences have different lengths producing a ragged list-of-lists that NumPy cannot convert to a 2D array.

Fix: Process each sequence individually instead of the whole batch at once:

# Before
array_ids = np.array(text_inputs["input_ids"])          # crashes on ragged input
mm_token_type_ids = np.zeros_like(text_inputs["input_ids"])
mm_token_type_ids[array_ids == self.image_token_id] = 1
mm_token_type_ids[array_ids == self.video_token_id] = 2
text_inputs["mm_token_type_ids"] = mm_token_type_ids.tolist()

# After
mm_token_type_ids = [np.zeros(len(row), dtype=np.int64) for row in text_inputs["input_ids"]]
for i, row_ids in enumerate(text_inputs["input_ids"]):
    row = np.array(row_ids)
    mm_token_type_ids[i][row == self.image_token_id] = 1
    mm_token_type_ids[i][row == self.video_token_id] = 2
text_inputs["mm_token_type_ids"] = [row.tolist() for row in mm_token_type_ids]

The fix was applied to modular_qwen2_5_vl.py and processing_qwen2_5_vl.py was regenerated via:

python utils/modular_model_converter.py --files qwen2_5_vl

Before submitting

Was this discussed/approved via a GitHub issue? — #44514
Did you write any new necessary tests? — 36 passed, 23 skipped

Who can review?

@zucchini-nlp (mentioned in the original issue)

Changed files

src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +6/-5)
src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +6/-5)

PR #44531: Fix Qwen2_5_VLProcessor.apply_chat_template crash on unpadded batched input

Repository: huggingface/transformers
Author: s-zx
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/44531

Description (problem / solution / changelog)

What does this PR do?

Fixes Qwen2_5_VLProcessor.apply_chat_template crashing with a ValueError when called with a batch of conversations with different prompt lengths and padding=False (the default).

Root cause

In the mm_token_type_ids block, np.array(text_inputs["input_ids"]) is called on the full batch at once. With padding=False, sequences have different lengths, producing a ragged list-of-lists that NumPy cannot convert to a 2D array.

Fix

Process each sample individually so that both padded and unpadded inputs are handled correctly. The fix is applied to both modular_qwen2_5_vl.py (the source of truth) and the auto-generated processing_qwen2_5_vl.py.

Reproduction

from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
img = Image.new("RGB", (64, 64), color="red")

messages = [
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "Describe this"}]}],
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "What is this?"}]}],
]

# Before fix: crashes with ValueError
# After fix: works correctly
result = processor.apply_chat_template(messages, tokenize=True, return_dict=True)

Fixes #44514

Changed files

src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +8/-5)
src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +8/-5)

PR #44563: Allow `mm_token_type` be non-padded lists

Repository: huggingface/transformers
Author: zucchini-nlp
State: open | merged: False
Link: https://github.com/huggingface/transformers/pull/44563

Description (problem / solution / changelog)

What does this PR do?

Split out mm_token_type_id creation into a separate utility and call it from the VLM processors. Also make sure that mm_token_type_id can be created even when padding=False and the inputs have different lengths. As long as the return_type is not an array, it should work.
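The idea of one shared helper that labels tokens per sequence and returns plain lists might look roughly like the sketch below. This is not the utility added in the PR: the function name build_mm_token_type_ids, its signature, and the token IDs are all hypothetical. Only the labels (1 for image tokens, 2 for video tokens, 0 otherwise) follow the snippets quoted earlier.

```python
def build_mm_token_type_ids(input_ids, type_by_token_id):
    """Label each token per sequence: 0 for text, else the mapped type."""
    return [
        [type_by_token_id.get(token_id, 0) for token_id in seq]
        for seq in input_ids
    ]


# Hypothetical IDs; real values would come from the processor's
# image_token_id and video_token_id attributes.
IMAGE_TOKEN_ID, VIDEO_TOKEN_ID = 9001, 9002
types = {IMAGE_TOKEN_ID: 1, VIDEO_TOKEN_ID: 2}

ragged_ids = [[5, 9001, 9001, 7], [5, 9002, 7, 8, 6]]
mm_token_type_ids = build_mm_token_type_ids(ragged_ids, types)
print(mm_token_type_ids)  # [[0, 1, 1, 0], [0, 2, 0, 0, 0]]
```

Because each sequence is handled on its own and the result stays a list of lists, the rows never need to share a length.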

Fixes https://github.com/huggingface/transformers/issues/44545 and fixes https://github.com/huggingface/transformers/issues/44514

Changed files

src/transformers/models/aria/modular_aria.py (modified, +1/-5)
src/transformers/models/aria/processing_aria.py (modified, +1/-8)
src/transformers/models/aya_vision/processing_aya_vision.py (modified, +1/-7)
src/transformers/models/chameleon/processing_chameleon.py (modified, +3/-9)
src/transformers/models/cohere2_vision/processing_cohere2_vision.py (modified, +1/-7)
src/transformers/models/colmodernvbert/processing_colmodernvbert.py (modified, +22/-15)
src/transformers/models/emu3/processing_emu3.py (modified, +1/-7)
src/transformers/models/florence2/modular_florence2.py (modified, +1/-6)
src/transformers/models/florence2/processing_florence2.py (modified, +1/-7)
src/transformers/models/fuyu/processing_fuyu.py (modified, +3/-6)
src/transformers/models/gemma3/processing_gemma3.py (modified, +1/-8)
src/transformers/models/glm46v/processing_glm46v.py (modified, +22/-13)
src/transformers/models/glm4v/modular_glm4v.py (modified, +17/-8)
src/transformers/models/glm4v/processing_glm4v.py (modified, +22/-13)
src/transformers/models/glm_image/modular_glm_image.py (modified, +1/-5)
src/transformers/models/glm_image/processing_glm_image.py (modified, +1/-6)
src/transformers/models/idefics3/processing_idefics3.py (modified, +22/-15)
src/transformers/models/internvl/processing_internvl.py (modified, +3/-7)
src/transformers/models/lighton_ocr/modular_lighton_ocr.py (modified, +3/-7)
src/transformers/models/lighton_ocr/processing_lighton_ocr.py (modified, +3/-7)
src/transformers/models/llava/processing_llava.py (modified, +29/-13)
src/transformers/models/llava_next/processing_llava_next.py (modified, +1/-7)
src/transformers/models/llava_onevision/processing_llava_onevision.py (modified, +1/-5)
src/transformers/models/paddleocr_vl/modular_paddleocr_vl.py (modified, +1/-5)
src/transformers/models/paddleocr_vl/processing_paddleocr_vl.py (modified, +1/-8)
src/transformers/models/paligemma/processing_paligemma.py (modified, +1/-5)
src/transformers/models/perception_lm/processing_perception_lm.py (modified, +1/-7)
src/transformers/models/pixtral/processing_pixtral.py (modified, +3/-6)
src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +1/-7)
src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +1/-9)
src/transformers/models/qwen2_vl/processing_qwen2_vl.py (modified, +1/-7)
src/transformers/models/qwen3_vl/modular_qwen3_vl.py (modified, +1/-6)
src/transformers/models/qwen3_vl/processing_qwen3_vl.py (modified, +1/-6)
src/transformers/models/video_llama_3/modular_video_llama_3.py (modified, +1/-6)
src/transformers/models/video_llama_3/processing_video_llama_3.py (modified, +1/-9)
src/transformers/processing_utils.py (modified, +18/-0)

System Info

transformers 5.3.0

Who can help?

@zucchini-nlp

Reproduction

Single-message inputs and batched inputs with padding=True both work fine.

from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")

img = Image.new("RGB", (64, 64), color="red")

messages = [
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "Describe this"}]}],
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "What is this?"}]}],
]

# Crashes
result = processor.apply_chat_template(messages, tokenize=True, return_dict=True)

Traceback

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "transformers/processing_utils.py", line 1829, in apply_chat_template
    out = self(text=prompt, **kwargs)
  File "transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py", line 148, in __call__
    array_ids = np.array(text_inputs["input_ids"])
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape
after 1 dimensions. The detected shape was (2,) + inhomogeneous part.

Expected behavior

Batched input should work with padding=False.

extent analysis

Fix Plan

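1. Update `transformers` once a release includes the fix

The issue was reported against transformers 5.3.0. When this page was fetched, PRs #44516, #44518 and #44531 were closed without being merged and PR #44563 was still open, so upgrading helps only if your installed release already contains a fix.

pip install --upgrade transformers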

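2. Use `padding=True` or `padding="max_length"` for batched inputs

If you need to process batched inputs with different lengths before a fix is available, pass padding=True so that every sequence is padded to the longest one in the batch; the issue report confirms that batched inputs with padding=True work. padding="max_length" pads to a fixed length instead, which you would normally set with max_length.

result = processor.apply_chat_template(messages, tokenize=True, return_dict=True, padding=True)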
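Why padding sidesteps the crash can be illustrated with a small stand-alone sketch. The helper pad_to_longest, the pad ID, and the token IDs are assumptions for illustration. The tokenizer's real padding side and attention-mask handling are not modeled here.

```python
def pad_to_longest(batch, pad_id):
    """Right-pad every row with pad_id up to the longest row's length."""
    width = max((len(row) for row in batch), default=0)
    return [row + [pad_id] * (width - len(row)) for row in batch]


# Hypothetical token IDs and pad ID.
unpadded = [[11, 12, 13], [21, 22, 23, 24, 25]]
padded = pad_to_longest(unpadded, pad_id=0)
print(padded)  # [[11, 12, 13, 0, 0], [21, 22, 23, 24, 25]]
```

Once every row has the same length, the batch is rectangular and can be converted to a 2D array.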

3. Convert ragged batches one sequence at a time

Switching from numpy to torch does not help by itself: torch.tensor() also requires every row to have the same length and raises an error on a ragged list of lists. If you need tensors without padding, convert each sequence separately.

import torch

array_ids = [torch.tensor(ids) for ids in text_inputs["input_ids"]]

4. Keep unpadded output as plain Python lists

The transformers library does not export a RaggedTensor class, so importing one from transformers fails. The PRs listed above instead keep unpadded output as a list of lists and build mm_token_type_ids one sequence at a time.

Verification

Run the code with the fix and verify that it works without crashing.
Check the output to ensure it's correct.
Test the code with different inputs to ensure it's robust.
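One concrete check, mirroring the test described in PR #44516, is that every mm_token_type_ids row has the same length as its input_ids row. The helper name check_mm_token_type_ids and the sample values below are hypothetical; the dict stands in for the dict-like result of apply_chat_template(..., tokenize=True, return_dict=True).

```python
def check_mm_token_type_ids(encoded):
    """Raise AssertionError unless both fields line up row by row."""
    input_ids = encoded["input_ids"]
    mm_ids = encoded["mm_token_type_ids"]
    assert len(input_ids) == len(mm_ids), "batch sizes differ"
    for i, (ids, mm) in enumerate(zip(input_ids, mm_ids)):
        assert len(ids) == len(mm), f"row {i}: {len(ids)} ids vs {len(mm)} types"


# Made-up values shaped like an unpadded two-conversation batch.
encoded = {
    "input_ids": [[5, 9001, 7], [5, 9001, 7, 8]],
    "mm_token_type_ids": [[0, 1, 0], [0, 1, 0, 0]],
}
check_mm_token_type_ids(encoded)  # passes silently when rows line up
```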

Extra Tips

Always check the latest version of the transformers library to see if the issue has been fixed.
Use padding=True or padding="max_length" to pad sequences to the maximum length when working with batched inputs.
Do not expect torch to accept ragged input where numpy does not: both need rows of equal length to build a single array or tensor.
