transformers - ✅(Solved) Fix Qwen3.5 model: when data is not padded, a shape-mismatch error is reported [1 pull request, 8 comments, 7 participants]

transformers · 2026-03-02 09:37:31


GitHub stats

huggingface/transformers#44384 · Fetched 2026-04-08 00:28:50


Timeline (top): commented ×8, mentioned ×7, subscribed ×7, cross-referenced ×3

Fix Action

Fixed

Linked fix PR (closed, not merged): fix: resolve mask shape mismatch IndexError in multimodal VL models (https://github.com/huggingface/transformers/pull/44818)

PR fix notes

PR #44818: fix: resolve mask shape mismatch IndexError in multimodal VL models

Repository: huggingface/transformers
Author: BillionClaw
State: closed | merged: False
Link: https://github.com/huggingface/transformers/pull/44818

Description (problem / solution / changelog)

Description

Fixes #44805

When training multimodal models (Qwen3-VL, GLM-4.6V, Qwen3-VL-MoE) with LoRA adapters, the attention_mask and mm_token_type_ids tensors can have different shapes. This causes an IndexError when the get_rope_index method attempts to use the attention mask for boolean indexing on the token type IDs.
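The failure mode can be reproduced in isolation with plain PyTorch boolean-mask indexing. The sketch below reuses the shapes reported in the PR's testing notes ([2041] for the mask, [1010] for the token type IDs); `mask_index_raises` is a hypothetical helper for illustration, not part of transformers.

```python
import torch


def mask_index_raises(mask_len: int, values_len: int) -> bool:
    """Return True if boolean-indexing a 1-D tensor with a mask of length mask_len raises IndexError.

    Hypothetical helper: it mimics boolean indexing of token type IDs
    with one row of an attention mask, as described above.
    """
    attention_mask_row = torch.ones(mask_len, dtype=torch.bool)
    input_token_type = torch.zeros(values_len, dtype=torch.long)
    try:
        input_token_type[attention_mask_row]
    except IndexError as err:
        # PyTorch requires the mask shape to match the indexed tensor's shape
        print(f"IndexError: {err}")
        return True
    return False


print(mask_index_raises(2041, 1010))  # mismatched lengths, as in the issue
print(mask_index_raises(1010, 1010))  # matching lengths index cleanly
```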

Root Cause

The issue occurs because different code paths may process input_ids and mm_token_type_ids separately, leading to shape mismatches. The original code assumed these tensors would always have matching shapes.

Changes

Added shape validation in the get_rope_index method of the following models:

- Qwen3VLModel (modeling_qwen3_vl.py)
- Glm46VModel (modeling_glm46v.py)
- Qwen3VLMoeModel (modeling_qwen3_vl_moe.py)

The fix:

- Checks if attention_mask[batch_idx] and input_token_type have different shapes
- Truncates all tensors to the minimum length before boolean indexing
- Prevents the IndexError while preserving the intended functionality
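A minimal sketch of the guard as the PR describes it: compare shapes, truncate both tensors to the shorter length, then boolean-index. The function name `select_by_mask` is hypothetical; the actual PR edits `get_rope_index` in each model file and may differ in detail.

```python
import torch


def select_by_mask(attention_mask_row: torch.Tensor, input_token_type: torch.Tensor) -> torch.Tensor:
    """Boolean-index token type IDs with an attention-mask row, truncating on shape mismatch.

    Hypothetical helper sketching the described fix; not the transformers source.
    """
    if attention_mask_row.shape != input_token_type.shape:
        # Truncate both tensors to the shorter length so boolean indexing cannot fail
        min_len = min(attention_mask_row.shape[0], input_token_type.shape[0])
        attention_mask_row = attention_mask_row[:min_len]
        input_token_type = input_token_type[:min_len]
    return input_token_type[attention_mask_row.bool()]


# Shapes from the PR's testing notes: mask [2041], token types [1010]
mask = torch.ones(2041, dtype=torch.long)
token_types = torch.zeros(1010, dtype=torch.long)
selected = select_by_mask(mask, token_types)
print(selected.shape)  # torch.Size([1010])
```

Truncation avoids the crash, but it silently ignores every position beyond the shorter tensor (here, 2041 - 1010 = 1031 mask entries), so it hides the upstream mismatch rather than explaining it.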

Testing

Verified the fix handles the specific error case from the issue where:

- attention_mask shape: [2041]
- mm_token_type_ids shape: [1010]

The fix truncates to [1010] and proceeds without error, allowing training to continue.

Changed files

- memory/subagent-result-huggingface-transformers-44805.md (added, +32/-0)
- src/transformers/models/glm46v/modeling_glm46v.py (modified, +10/-2)
- src/transformers/models/qwen3_vl/modeling_qwen3_vl.py (modified, +12/-3)
- src/transformers/models/qwen3_vl_moe/modeling_qwen3_vl_moe.py (modified, +10/-2)


Commit ID: fc9137225880a9d03f130634c20f9dbe36a7b8bf. Qwen3_5: when the text model invokes `decoder_layer`, should the `position_ids` input be `text_position_ids`?

Extended analysis

Fix: Clarify `position_ids` Input for Decoder Layer

Fix Plan

Step 1: Review Code for `position_ids` Input

Check the code where the text model invokes the decoder layer and verify that the position_ids input is correctly set to text_position_ids.

Step 2: Update Code to Use `text_position_ids`

Update the code to pass text_position_ids explicitly when invoking the decoder layer. A schematic Python example (argument names are illustrative):

```python
# Before
decoder_layer(text_input, position_ids=input_ids)

# After
decoder_layer(text_input, position_ids=text_position_ids)
```

Step 3: Verify Correct Input

Verify that the position_ids input is correctly set to text_position_ids by adding a debug print statement or using a debugger.
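One way to check this without editing model code is a forward pre-hook that records the keyword arguments a layer receives. This sketch assumes PyTorch 2.0 or later (for `with_kwargs=True`) and uses a hypothetical `ToyDecoderLayer`; on a real model you would register the hook on the decoder layer you want to inspect, and the hook only sees `position_ids` if the caller passes it as a keyword argument.

```python
import torch
from torch import nn


class ToyDecoderLayer(nn.Module):
    """Hypothetical stand-in for a decoder layer that accepts position_ids as a keyword."""

    def __init__(self, hidden_size: int = 8):
        super().__init__()
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, position_ids: torch.Tensor = None) -> torch.Tensor:
        return self.proj(hidden_states)


captured = {}


def record_position_ids(module, args, kwargs):
    # Runs before forward; with_kwargs=True also passes the keyword arguments
    captured["position_ids"] = kwargs.get("position_ids")
    return None  # returning None leaves the inputs unchanged


layer = ToyDecoderLayer()
handle = layer.register_forward_pre_hook(record_position_ids, with_kwargs=True)

hidden_states = torch.randn(1, 10, 8)
text_position_ids = torch.arange(10).unsqueeze(0)  # shape (1, 10)
layer(hidden_states, position_ids=text_position_ids)
handle.remove()  # detach the hook once inspection is done

print(captured["position_ids"].shape)  # torch.Size([1, 10])
```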

Step 4: Test the Fix

Test the model with the updated code to ensure that it produces the expected output.

Example Use Case

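```python
import torch
from torch import nn


class ToyDecoderLayer(nn.Module):
    """Illustrative stand-in for a decoder layer (not the Qwen3.5 implementation)."""

    def __init__(self, hidden_size: int = 512, max_positions: int = 2048):
        super().__init__()
        self.position_embedding = nn.Embedding(max_positions, hidden_size)
        self.proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, hidden_states: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
        # position_ids must be integer indices of shape (batch, seq_len)
        return self.proj(hidden_states + self.position_embedding(position_ids))


# Define the text model, which forwards its position IDs to the decoder layer
class TextModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.decoder_layer = ToyDecoderLayer()

    def forward(self, text_input: torch.Tensor, position_ids: torch.Tensor) -> torch.Tensor:
        return self.decoder_layer(text_input, position_ids=position_ids)


# Create an instance of the text model
model = TextModel()

# Text hidden states and integer position IDs (0..seq_len-1) for one sequence
text_input = torch.randn(1, 10, 512)
text_position_ids = torch.arange(10).unsqueeze(0)  # shape (1, 10), dtype int64

# Invoke the decoder layer with the text position IDs
output = model(text_input, position_ids=text_position_ids)
print(output.shape)  # torch.Size([1, 10, 512])
```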
