transformers - ✅(Solved) Fix Qwen2_5_VLProcessor.apply_chat_template crashes on batched input when padding=False [5 pull requests, 2 comments, 2 participants]

GitHub stats
huggingface/transformers#44545 · Fetched 2026-04-08 00:27:46
Comments: 2 · Participants: 2 · Timeline: 8 · Reactions: 1
Timeline (top): cross-referenced ×5, commented ×2, closed ×1

Error Message

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

batch_messages = [
    [{"role": "user", "content": [{"type": "image", "image": "img1.jpg"}, {"type": "text", "text": "Describe."}]}],
    [{"role": "user", "content": [{"type": "image", "image": "img2.jpg"}, {"type": "text", "text": "What is this? Give a detailed answer."}]}],
]

processor.apply_chat_template(batch_messages, padding=False, tokenize=True, return_dict=True)
# raises ValueError: setting an array element with a sequence

Root Cause

Root cause: mm_token_type_ids was built by calling np.array(text_inputs["input_ids"]) on a ragged list (variable-length sequences when padding=False). NumPy ≥ 1.24 rejects inhomogeneous shapes for this operation.
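
The failure can be reproduced without loading the processor. The sketch below is a standalone illustration, not code from the library; can_stack is a hypothetical helper name, and the token ids are arbitrary numbers. It checks whether NumPy can turn a nested list into one rectangular array.

```python
import numpy as np

def can_stack(rows):
    """Return True if NumPy can turn `rows` into one rectangular array."""
    try:
        arr = np.array(rows)
    except ValueError:
        # NumPy >= 1.24 raises here for ragged nested lists
        return False
    # Older NumPy falls back to a 1-D object array instead of raising
    return bool(arr.dtype != object)

padded = [[1, 2, 3], [4, 5, 0]]  # equal-length rows, as with padding enabled
ragged = [[1, 2, 3], [4, 5]]     # variable-length rows, as with padding=False

print(can_stack(padded))  # True
print(can_stack(ragged))  # False
```

The second call is the situation the processor ran into: two tokenized conversations of different lengths passed to np.array in one go.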

Fix Action

Fix

A fix is implemented in PR #44535.

PR fix notes

PR #44535: Fix crash in Qwen2_5_VLProcessor when using batched input with padding=False

Description (problem / solution / changelog)

Problem

Qwen2_5_VLProcessor.apply_chat_template raises ValueError: setting an array element with a sequence when called with a batch of ≥2 conversations that include images under the default padding=False setting.

Root cause: mm_token_type_ids was built by calling np.array(text_inputs["input_ids"]) on a ragged list (variable-length sequences when padding=False). NumPy ≥ 1.24 rejects inhomogeneous shapes for this operation.

Fix

Iterate per-sequence instead of constructing a 2D array from a ragged list. Each ids_arr = np.array(ids) call receives a 1-D list, so the shape is always homogeneous.

Changed in both:

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (auto-generated copy, manually synced since make is unavailable on Windows)

Test

Added test_batched_apply_chat_template_no_padding in tests/models/qwen2_5_vl/test_processing_qwen2_5_vl.py to guard against regression.

Closes #44545

Changed files

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +8/-5)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +8/-5)
  • tests/models/qwen2_5_vl/test_processing_qwen2_5_vl.py (modified, +37/-0)

PR #44518: fix: Qwen2_5_VLProcessor crashes on batched input when padding=False …

Description (problem / solution / changelog)

What does this PR do?

Fixes #44514

Qwen2_5_VLProcessor.__call__ crashed with a ValueError when processing a batch of conversations with different lengths and padding=False (the default).

Root cause: The mm_token_type_ids block was calling np.array(text_inputs["input_ids"]) on the full batch at once. With padding=False, sequences have different lengths producing a ragged list-of-lists that NumPy cannot convert to a 2D array.

Fix: Process each sequence individually instead of the whole batch at once:

# Before
array_ids = np.array(text_inputs["input_ids"])          # crashes on ragged input
mm_token_type_ids = np.zeros_like(text_inputs["input_ids"])
mm_token_type_ids[array_ids == self.image_token_id] = 1
mm_token_type_ids[array_ids == self.video_token_id] = 2
text_inputs["mm_token_type_ids"] = mm_token_type_ids.tolist()

# After
mm_token_type_ids = [np.zeros(len(row), dtype=np.int64) for row in text_inputs["input_ids"]]
for i, row_ids in enumerate(text_inputs["input_ids"]):
    row = np.array(row_ids)
    mm_token_type_ids[i][row == self.image_token_id] = 1
    mm_token_type_ids[i][row == self.video_token_id] = 2
text_inputs["mm_token_type_ids"] = [row.tolist() for row in mm_token_type_ids]

The fix was applied to modular_qwen2_5_vl.py and processing_qwen2_5_vl.py was regenerated via:

python utils/modular_model_converter.py --files qwen2_5_vl

Before submitting

  • Was this discussed/approved via a Github issue — #44514
  • Did you write any new necessary tests? — 36 passed, 23 skipped

Who can review?

@zucchini-nlp (mentioned in the original issue)

Changed files

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +6/-5)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +6/-5)

PR #44531: Fix Qwen2_5_VLProcessor.apply_chat_template crash on unpadded batched input

Description (problem / solution / changelog)

What does this PR do?

Fixes Qwen2_5_VLProcessor.apply_chat_template crashing with a ValueError when called with a batch of conversations with different prompt lengths and padding=False (the default).

Root cause

In the mm_token_type_ids block, np.array(text_inputs["input_ids"]) is called on the full batch at once. With padding=False, sequences have different lengths, producing a ragged list-of-lists that NumPy cannot convert to a 2D array.

Fix

Process each sample individually so that both padded and unpadded inputs are handled correctly. The fix is applied to both modular_qwen2_5_vl.py (the source of truth) and the auto-generated processing_qwen2_5_vl.py.

Reproduction

from transformers import AutoProcessor
from PIL import Image

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-3B-Instruct")
img = Image.new("RGB", (64, 64), color="red")

messages = [
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "Describe this"}]}],
    [{"role": "user", "content": [{"type": "image", "image": img}, {"type": "text", "text": "What is this?"}]}],
]

# Before fix: crashes with ValueError
# After fix: works correctly
result = processor.apply_chat_template(messages, tokenize=True, return_dict=True)

Fixes #44514

Changed files

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +8/-5)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +8/-5)

PR #44516: fix(qwen2_5_vl): handle ragged batched input in apply_chat_template

Description (problem / solution / changelog)

Summary

Fix Qwen2_5_VLProcessor.apply_chat_template crashing with ValueError when called with batched inputs of different sequence lengths (ragged lists) and padding=False (the default).

Fixes #44514

Root Cause

The mm_token_type_ids construction calls np.array(text_inputs["input_ids"]) which fails when the tokenized sequences have different lengths, because NumPy cannot create a homogeneous array from ragged lists:

ValueError: setting an array element with a sequence. The requested array
has an inhomogeneous shape after 1 dimensions.

Since return_mm_token_type_ids defaults to True, this code path always runs.

Fix

Process each sequence individually instead of trying to create a single 2D array:

mm_token_type_ids = []
for seq_ids in text_inputs["input_ids"]:
    seq_array = np.array(seq_ids)
    seq_mm_ids = np.zeros_like(seq_array)
    seq_mm_ids[seq_array == self.image_token_id] = 1
    seq_mm_ids[seq_array == self.video_token_id] = 2
    mm_token_type_ids.append(seq_mm_ids.tolist())
text_inputs["mm_token_type_ids"] = mm_token_type_ids

This produces the same output format (list of lists) and works for both equal-length and ragged batches.
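
The equivalence claim can be checked in isolation. The following is a minimal sketch under stated assumptions: both functions are standalone stand-ins for the processor code, and IMAGE_TOKEN_ID and VIDEO_TOKEN_ID are placeholder values chosen for illustration, not the model's real token ids.

```python
import numpy as np

IMAGE_TOKEN_ID = 9  # placeholder value, assumed for illustration
VIDEO_TOKEN_ID = 8  # placeholder value, assumed for illustration

def batched_mm_token_type_ids(input_ids):
    """Original approach: one 2-D array, so it needs equal-length rows."""
    array_ids = np.array(input_ids)
    mm = np.zeros_like(array_ids)
    mm[array_ids == IMAGE_TOKEN_ID] = 1
    mm[array_ids == VIDEO_TOKEN_ID] = 2
    return mm.tolist()

def per_sequence_mm_token_type_ids(input_ids):
    """Per-sequence approach: each row becomes its own 1-D array."""
    result = []
    for seq_ids in input_ids:
        seq_array = np.array(seq_ids)
        seq_mm = np.zeros_like(seq_array)
        seq_mm[seq_array == IMAGE_TOKEN_ID] = 1
        seq_mm[seq_array == VIDEO_TOKEN_ID] = 2
        result.append(seq_mm.tolist())
    return result

equal_length = [[1, 9, 9, 2], [3, 8, 4, 5]]
ragged = [[1, 9, 9, 2], [3, 8]]

# Same result on a rectangular batch
assert batched_mm_token_type_ids(equal_length) == per_sequence_mm_token_type_ids(equal_length)

# Only the per-sequence version handles the ragged batch
print(per_sequence_mm_token_type_ids(ragged))  # [[0, 1, 1, 0], [0, 2]]
```

Each output row has the same length as its input row, which is the property the added test verifies.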

Changes

  • modular_qwen2_5_vl.py — Updated source of truth with per-sequence processing
  • processing_qwen2_5_vl.py — Updated generated file with same fix (couldn't run modular converter locally due to missing libcst)
  • test_processing_qwen2_5_vl.py — Added test_apply_chat_template_ragged_batched_mm_token_type_ids covering single input and ragged batched input, verifying mm_token_type_ids length matches input_ids per sequence

Changed files

  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +8/-5)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +8/-5)
  • tests/models/qwen2_5_vl/test_processing_qwen2_5_vl.py (modified, +49/-0)

PR #44563: Allow mm_token_type be non-padded lists

Description (problem / solution / changelog)

What does this PR do?

Splits mm_token_type_id creation out into a separate utility that the VLM processors call. It also makes sure that mm_token_type_id can be created even when padding=False and the inputs have different lengths. As long as the return_type is not an array, it should work.
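
The helper added to processing_utils.py is not shown on this page. As a rough idea of what a shared utility of this kind could look like, here is a hypothetical sketch: the function name, its signature, and the token ids are all assumptions made for illustration. It works on plain Python lists, so row lengths do not have to match.

```python
from typing import Dict, List

def create_mm_token_type_ids(
    input_ids: List[List[int]],
    token_type_by_id: Dict[int, int],
) -> List[List[int]]:
    """Map each token id to its modality type (0 for plain text), row by row."""
    return [[token_type_by_id.get(tok, 0) for tok in row] for row in input_ids]

# Placeholder ids: 9 marks an image token (type 1), 8 a video token (type 2).
type_map = {9: 1, 8: 2}
print(create_mm_token_type_ids([[1, 9, 9, 2], [3, 8]], type_map))
# [[0, 1, 1, 0], [0, 2]]
```

Because nothing is converted to an array, a ragged batch only becomes a problem later, if the caller asks for array or tensor output without padding.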

Fixes https://github.com/huggingface/transformers/issues/44545 and fixes https://github.com/huggingface/transformers/issues/44514

Changed files

  • src/transformers/models/aria/modular_aria.py (modified, +1/-6)
  • src/transformers/models/aria/processing_aria.py (modified, +1/-7)
  • src/transformers/models/aya_vision/processing_aya_vision.py (modified, +1/-7)
  • src/transformers/models/chameleon/processing_chameleon.py (modified, +3/-9)
  • src/transformers/models/cohere2_vision/processing_cohere2_vision.py (modified, +1/-7)
  • src/transformers/models/colmodernvbert/processing_colmodernvbert.py (modified, +22/-15)
  • src/transformers/models/emu3/processing_emu3.py (modified, +1/-7)
  • src/transformers/models/florence2/modular_florence2.py (modified, +1/-6)
  • src/transformers/models/florence2/processing_florence2.py (modified, +1/-7)
  • src/transformers/models/fuyu/processing_fuyu.py (modified, +3/-6)
  • src/transformers/models/gemma3/processing_gemma3.py (modified, +1/-8)
  • src/transformers/models/glm46v/processing_glm46v.py (modified, +22/-13)
  • src/transformers/models/glm4v/modular_glm4v.py (modified, +17/-8)
  • src/transformers/models/glm4v/processing_glm4v.py (modified, +22/-13)
  • src/transformers/models/glm_image/image_processing_pil_glm_image.py (modified, +0/-1)
  • src/transformers/models/glm_image/modular_glm_image.py (modified, +1/-5)
  • src/transformers/models/glm_image/processing_glm_image.py (modified, +1/-6)
  • src/transformers/models/idefics3/processing_idefics3.py (modified, +22/-15)
  • src/transformers/models/internvl/processing_internvl.py (modified, +3/-7)
  • src/transformers/models/lighton_ocr/modular_lighton_ocr.py (modified, +3/-7)
  • src/transformers/models/lighton_ocr/processing_lighton_ocr.py (modified, +3/-7)
  • src/transformers/models/llava/processing_llava.py (modified, +1/-7)
  • src/transformers/models/llava_next/processing_llava_next.py (modified, +1/-7)
  • src/transformers/models/llava_onevision/processing_llava_onevision.py (modified, +1/-5)
  • src/transformers/models/paddleocr_vl/modular_paddleocr_vl.py (modified, +1/-5)
  • src/transformers/models/paddleocr_vl/processing_paddleocr_vl.py (modified, +1/-8)
  • src/transformers/models/paligemma/processing_paligemma.py (modified, +1/-5)
  • src/transformers/models/perception_lm/processing_perception_lm.py (modified, +1/-7)
  • src/transformers/models/pixtral/processing_pixtral.py (modified, +3/-6)
  • src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py (modified, +1/-7)
  • src/transformers/models/qwen2_5_vl/processing_qwen2_5_vl.py (modified, +1/-9)
  • src/transformers/models/qwen2_vl/processing_qwen2_vl.py (modified, +1/-7)
  • src/transformers/models/qwen3_vl/modular_qwen3_vl.py (modified, +1/-6)
  • src/transformers/models/qwen3_vl/processing_qwen3_vl.py (modified, +1/-6)
  • src/transformers/models/video_llama_3/modular_video_llama_3.py (modified, +1/-6)
  • src/transformers/models/video_llama_3/processing_video_llama_3.py (modified, +1/-9)
  • src/transformers/processing_utils.py (modified, +18/-0)


Bug Description

Qwen2_5_VLProcessor.apply_chat_template raises ValueError: setting an array element with a sequence when processing a batch of ≥2 conversations that include images, under the default padding=False setting.

Root cause: mm_token_type_ids was built by calling np.array(text_inputs["input_ids"]) on a ragged list (variable-length sequences when padding=False). NumPy ≥ 1.24 rejects inhomogeneous shapes for this operation.

This is distinct from #44521, which concerns assistant_masks being all zeros for multimodal inputs.

Reproduction

from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

batch_messages = [
    [{"role": "user", "content": [{"type": "image", "image": "img1.jpg"}, {"type": "text", "text": "Describe."}]}],
    [{"role": "user", "content": [{"type": "image", "image": "img2.jpg"}, {"type": "text", "text": "What is this? Give a detailed answer."}]}],
]

processor.apply_chat_template(batch_messages, padding=False, tokenize=True, return_dict=True)
# raises ValueError: setting an array element with a sequence

Expected Behavior

The processor should handle batched inputs without crashing when padding=False.

Fix

A fix is implemented in PR #44535.

extent analysis

Fix: Build mm_token_type_ids Per Sequence Instead of From a Ragged List

Fix Plan

  1. Update mm_token_type_ids construction:

    • Replace the single np.array(text_inputs["input_ids"]) call with a loop that converts one sequence at a time. Wrapping the batch in a list comprehension, as in np.array([input_id for input_id in text_inputs["input_ids"]]), is not enough: it hands NumPy the same ragged list and raises the same ValueError.

import numpy as np
...
mm_token_type_ids = []
for ids in text_inputs["input_ids"]:
    ids_arr = np.array(ids)  # 1-D, so the shape is always homogeneous
    row = np.zeros_like(ids_arr)
    row[ids_arr == self.image_token_id] = 1
    row[ids_arr == self.video_token_id] = 2
    mm_token_type_ids.append(row.tolist())
text_inputs["mm_token_type_ids"] = mm_token_type_ids

  2. Update padding=False handling:

    • No additional changes are required as the fix above addresses the issue.

Verification

  • Run the reproduction test case with the updated code:

processor.apply_chat_template(batch_messages, padding=False, tokenize=True, return_dict=True)

  • Verify that the test case passes without raising a ValueError.

Extra Tips

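  • Upgrading NumPy does not help: versions ≥ 1.24 are the ones that raise this ValueError, while older releases only emitted a deprecation warning and returned an object array.
  • torch.tensor() and tf.constant() reject ragged nested lists as well, so swapping them in for np.array() does not avoid the problem. Pad the sequences first (for example with padding=True) or keep them as plain Python lists.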
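
When a rectangular array or tensor is needed, the ragged rows have to be padded before conversion. The sketch below shows the idea with NumPy only; pad_to_rectangle is a hypothetical helper, and pad_id=0 is an assumption, since real tokenizers define their own padding token id. In practice, passing padding=True to the processor should make this manual step unnecessary.

```python
import numpy as np

def pad_to_rectangle(rows, pad_id=0):
    """Right-pad every row with `pad_id` so all rows share the longest length."""
    width = max((len(row) for row in rows), default=0)
    padded_rows = [list(row) + [pad_id] * (width - len(row)) for row in rows]
    return np.array(padded_rows, dtype=np.int64)

padded = pad_to_rectangle([[1, 2, 3], [4, 5]])
print(padded.shape)     # (2, 3)
print(padded.tolist())  # [[1, 2, 3], [4, 5, 0]]
```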
