transformers - ✅ (Solved) Fix Qwen3.5 model: when the data is not padded, an error is reported indicating that the shapes do not match [1 pull request, 8 comments, 7 participants]

GitHub stats

huggingface/transformers#44384 (fetched 2026-04-08 00:28:50)

  • Comments: 8
  • Participants: 7
  • Timeline events: 28
  • Reactions: 0
  • Timeline (top): commented ×8, mentioned ×7, subscribed ×7, cross-referenced ×3

Fix Action

Fixed

PR fix notes

PR #44818: fix: resolve mask shape mismatch IndexError in multimodal VL models

Description (problem / solution / changelog)

Description

Fixes #44805

When training multimodal models (Qwen3-VL, GLM-4.6V, Qwen3-VL-MoE) with LoRA adapters, the attention_mask and mm_token_type_ids tensors can have different shapes. This causes an IndexError when the get_rope_index method attempts to use the attention mask for boolean indexing on the token type IDs.

Root Cause

The issue occurs because different code paths may process input_ids and mm_token_type_ids separately, leading to shape mismatches. The original code assumed these tensors would always have matching shapes.

Changes

Added shape validation in the get_rope_index method of the following models:

  • Qwen3VLModel (modeling_qwen3_vl.py)
  • Glm46VModel (modeling_glm46v.py)
  • Qwen3VLMoeModel (modeling_qwen3_vl_moe.py)

The fix:

  1. Checks if attention_mask[batch_idx] and input_token_type have different shapes
  2. Truncates all tensors to the minimum length before boolean indexing
  3. Prevents the IndexError while preserving the intended functionality
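
The three steps above can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name align_to_min_length and the toy tensors are hypothetical, and keeping the leading tokens when truncating is an assumption based on the description.

```python
import torch


def align_to_min_length(attention_mask_row, input_token_type, input_ids_row):
    """Truncate three 1-D tensors to their shortest common length.

    Hypothetical helper; the real check lives inside get_rope_index.
    """
    min_len = min(
        attention_mask_row.shape[0],
        input_token_type.shape[0],
        input_ids_row.shape[0],
    )
    return (
        attention_mask_row[:min_len],
        input_token_type[:min_len],
        input_ids_row[:min_len],
    )


# Toy per-sample tensors whose lengths disagree (12 vs. 8).
attention_mask_row = torch.ones(12, dtype=torch.long)
attention_mask_row[-2:] = 0
input_token_type = torch.tensor([0, 0, 1, 1, 1, 0, 0, 0])
input_ids_row = torch.arange(8)

mask, token_types, ids = align_to_min_length(
    attention_mask_row, input_token_type, input_ids_row
)

# Boolean indexing is only valid once the mask and the indexed tensor agree in shape.
valid_token_types = token_types[mask.bool()]
print(mask.shape, valid_token_types.shape)
```

Truncation silences the error but also discards any mask entries beyond the shorter tensor, so it is worth confirming upstream why the lengths diverged in the first place.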

Testing

Verified the fix handles the specific error case from the issue where:

  • attention_mask shape: [2041]
  • mm_token_type_ids shape: [1010]

The fix truncates to [1010] and proceeds without error, allowing training to continue.
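
The reported failure can be reproduced outside the model with two tensors of the same lengths. The sketch below uses placeholder values (all ones and all zeros), since only the shapes are given in the issue; it shows that PyTorch raises an IndexError for a mismatched boolean mask and that slicing both tensors to the shorter length avoids it.

```python
import torch

# Placeholder contents; only the lengths come from the report.
attention_mask = torch.ones(2041, dtype=torch.bool)
mm_token_type_ids = torch.zeros(1010, dtype=torch.long)

# Boolean indexing with a mask of a different length fails.
try:
    mm_token_type_ids[attention_mask]
    error_message = None
except IndexError as exc:
    error_message = str(exc)

print(error_message)

# Slicing both tensors to the shorter length makes the indexing valid.
min_len = min(attention_mask.shape[0], mm_token_type_ids.shape[0])
selected = mm_token_type_ids[:min_len][attention_mask[:min_len]]
print(selected.shape)  # torch.Size([1010])
```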

Changed files

  • memory/subagent-result-huggingface-transformers-44805.md (added, +32/-0)
  • src/transformers/models/glm46v/modeling_glm46v.py (modified, +10/-2)
  • src/transformers/models/qwen3_vl/modeling_qwen3_vl.py (modified, +12/-3)
  • src/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py (modified, +10/-2)
Raw issue text

Commit id: fc9137225880a9d03f130634c20f9dbe36a7b8bf

Qwen3_5: whether the position_ids input that the text model passes when it invokes decoder_layer should be text_position_ids.

extent analysis

Fix: Clarify position_ids Input for Decoder Layer

Fix Plan

Step 1: Review Code for position_ids Input

Check the code where the text model invokes the decoder layer and verify that the position_ids input is correctly set to text_position_ids.

Step 2: Update Code to Use text_position_ids

Update the code to explicitly use text_position_ids when invoking the decoder layer. Here's an example code snippet in Python:

# Before
decoder_layer(text_input, position_ids=input_ids)

# After
decoder_layer(text_input, position_ids=text_position_ids)

Step 3: Verify Correct Input

Verify that the position_ids input is correctly set to text_position_ids by adding a debug print statement or using a debugger.
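
One way to do this check without editing the model source is a forward pre-hook that records what each layer receives. The sketch below assumes a PyTorch version that supports with_kwargs=True on register_forward_pre_hook (2.0 or later); ToyDecoderLayer and record_position_ids are hypothetical stand-ins, not part of transformers.

```python
import torch


class ToyDecoderLayer(torch.nn.Module):
    """Hypothetical stand-in for a decoder layer that accepts position_ids."""

    def __init__(self, hidden_size=8):
        super().__init__()
        self.proj = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, position_ids=None):
        return self.proj(hidden_states)


captured = []


def record_position_ids(module, args, kwargs):
    # Record shape and dtype of the position_ids keyword argument, if present.
    position_ids = kwargs.get("position_ids")
    if position_ids is not None:
        captured.append((tuple(position_ids.shape), position_ids.dtype))


layer = ToyDecoderLayer()
handle = layer.register_forward_pre_hook(record_position_ids, with_kwargs=True)

hidden_states = torch.randn(1, 10, 8)
text_position_ids = torch.arange(10).unsqueeze(0)  # shape (1, 10)
layer(hidden_states, position_ids=text_position_ids)

handle.remove()  # detach the hook once the check is done
print(captured)
```

On a real model, the same hook could be attached to each decoder layer to compare the recorded shapes against the expected text_position_ids.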

Step 4: Test the Fix

Test the model with the updated code to ensure that it produces the expected output.

Example Use Case

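import torch


# Toy stand-in for a decoder layer; not the real Qwen3.5 layer.
class ToyDecoderLayer(torch.nn.Module):
    def __init__(self, hidden_size=512, max_positions=128):
        super().__init__()
        self.position_embedding = torch.nn.Embedding(max_positions, hidden_size)
        self.proj = torch.nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states, position_ids):
        # Add a learned position signal, then project.
        return self.proj(hidden_states + self.position_embedding(position_ids))


# Define the text model around the toy decoder layer
class TextModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.decoder_layer = ToyDecoderLayer()

    def forward(self, text_input, position_ids):
        return self.decoder_layer(text_input, position_ids=position_ids)


# Create an instance of the text model
model = TextModel()

# Define the text input, token IDs, and position IDs
text_input = torch.randn(1, 10, 512)
input_ids = torch.randint(0, 1000, (1, 10))        # token IDs, not positions
text_position_ids = torch.arange(10).unsqueeze(0)  # integer positions, shape (1, 10)

# Invoke the decoder layer with the correct position IDs
output = model(text_input, position_ids=text_position_ids)
print(output.shape)  # torch.Size([1, 10, 512])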
