transformers - 💡(How to fix) Fix IndexError: The shape of the mask [...] at index 0 does not match the shape of the indexed tensor [...] at index 0 [2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
huggingface/transformers#44805Fetched 2026-04-08 00:52:51
View on GitHub
Comments
2
Participants
2
Timeline
10
Reactions
0
Author
Timeline (top)
mentioned ×3subscribed ×3commented ×2closed ×1

Error Message

0%| | 0/117 [00:00<?, ?it/s]Traceback (most recent call last): File "/opt/ml/models/train_hf_bug.py", line 184, in <module> trainer.train() File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1424, in train return inner_training_loop( ^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1506, in _inner_training_loop self._run_epoch( File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1734, in _run_epoch tr_loss_step = self.training_step(model, inputs, num_items_in_batch) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1320, in training_step return super().training_step(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1906, in training_step loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1216, in compute_loss (loss, outputs) = super().compute_loss( ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1978, in compute_loss outputs = model(**inputs) ^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 823, in forward return model_forward(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 811, in call return convert_to_fp32(self.model_forward(*args, **kwargs)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast return func(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/peft/peft_model.py", line 1923, in forward return self.base_model( ^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/peft/tuners/tuners_utils.py", line 311, in forward return self.model.forward(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper output = func(self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1522, in forward outputs = self.model( ^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl return self._call_impl(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl return forward_call(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper output = func(self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1348, in forward position_ids = self.compute_3d_position_ids( ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1242, in compute_3d_position_ids position_ids, rope_deltas = self.get_rope_index( ^^^^^^^^^^^^^^^^^^^^ File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1105, in get_rope_index input_token_type = input_token_type[attention_mask[batch_idx].bool()] ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ IndexError: The shape of the mask [2041] at index 0 does not match the shape of the indexed tensor [1010] at index 0 0%| | 0/117 [00:01<?, ?it/s]

Fix Action

Fix / Workaround

  0%|                                                                                                                           | 0/117 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/opt/ml/models/train_hf_bug.py", line 184, in <module>
    trainer.train()
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1320, in training_step
    return super().training_step(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1906, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1216, in compute_loss
    (loss, outputs) = super().compute_loss(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1978, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/peft_model.py", line 1923, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/tuners/tuners_utils.py", line 311, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1522, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1348, in forward
    position_ids = self.compute_3d_position_ids(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1242, in compute_3d_position_ids
    position_ids, rope_deltas = self.get_rope_index(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1105, in get_rope_index
    input_token_type = input_token_type[attention_mask[batch_idx].bool()]
                       ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: The shape of the mask [2041] at index 0 does not match the shape of the indexed tensor [1010] at index 0
  0%|          | 0/117 [00:01<?, ?it/s]

If I downgrade to transformers 5.2.0, it works just fine. I only see this error on 5.3.0 and I also see it with a model of a different architecture (GLM-4.6V-Flash).

I attempted the workaround here but I still get the error with gradient checkpointing disabled and use cache enabled, but the stack trace is slightly different:

Code Example

- `transformers` version: 5.3.0
- Platform: Linux-5.15.0-164-generic-x86_64-with-glibc2.35
- Python version: 3.12.13
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: 1.13.0
- Accelerate config:    not found
- DeepSpeed version: 0.16.4
- PyTorch version (accelerator?): 2.10.0+cu129 (CUDA)
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: yes
- GPU type: NVIDIA GeForce RTX 3090

---

warmup_steps = 0
max_seq_length = 2560

processor = AutoProcessor.from_pretrained(model_name)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_name,
    dtype=(
        torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    ),
)

peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
            "o_proj",
        ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# ────────────────────────────────────────────────
#  DATASET (customize preprocessing!)
# ────────────────────────────────────────────────
train_dataset = load_dataset("train")
eval_dataset = load_dataset("test")


# ────────────────────────────────────────────────
#  TRAINING ARGUMENTS
# ────────────────────────────────────────────────
training_args = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=num_epochs,
    per_device_train_batch_size=per_device_batch_size,
    per_device_eval_batch_size=per_device_batch_size,
    learning_rate=learning_rate,
    warmup_steps=warmup_steps,
    eval_strategy="steps",
    save_strategy="steps",
    save_steps=int(len(train_dataset) / 2),
    eval_steps=int(len(train_dataset) / 2),
    logging_steps=10,
    metric_for_best_model="eval_loss",
    load_best_model_at_end=True,
    max_grad_norm=1,
    ddp_find_unused_parameters=False,
    dataset_kwargs={"skip_prepare_dataset": True},
    remove_unused_columns=False,
    optim="adamw_torch",
    completion_only_loss=True,
    max_length=max_seq_length
)

# ────────────────────────────────────────────────
#  TRAIN
# ────────────────────────────────────────────────


trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    processing_class=processor,
)

trainer.train()

---

0%|                                                                                                                           | 0/117 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/opt/ml/models/train_hf_bug.py", line 184, in <module>
    trainer.train()
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1320, in training_step
    return super().training_step(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1906, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1216, in compute_loss
    (loss, outputs) = super().compute_loss(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1978, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/peft_model.py", line 1923, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/tuners/tuners_utils.py", line 311, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1522, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1348, in forward
    position_ids = self.compute_3d_position_ids(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1242, in compute_3d_position_ids
    position_ids, rope_deltas = self.get_rope_index(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1105, in get_rope_index
    input_token_type = input_token_type[attention_mask[batch_idx].bool()]
                       ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: The shape of the mask [2041] at index 0 does not match the shape of the indexed tensor [1010] at index 0
  0%|          | 0/117 [00:01<?, ?it/s]

---

0%|                                                                                                                           | 0/117 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/opt/ml/models/train_hf_bug.py", line 185, in <module>
    trainer.train()
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1320, in training_step
    return super().training_step(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1906, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1216, in compute_loss
    (loss, outputs) = super().compute_loss(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1978, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/peft_model.py", line 1923, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/tuners/tuners_utils.py", line 311, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1522, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1348, in forward
    position_ids = self.compute_3d_position_ids(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1242, in compute_3d_position_ids
    position_ids, rope_deltas = self.get_rope_index(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1105, in get_rope_index
    input_token_type = input_token_type[attention_mask[batch_idx].bool()]
                       ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: The shape of the mask [2041] at index 0 does not match the shape of the indexed tensor [1010] at index 0
RAW_BUFFERClick to expand / collapse

System Info

- `transformers` version: 5.3.0
- Platform: Linux-5.15.0-164-generic-x86_64-with-glibc2.35
- Python version: 3.12.13
- Huggingface_hub version: 1.7.1
- Safetensors version: 0.7.0
- Accelerate version: 1.13.0
- Accelerate config:    not found
- DeepSpeed version: 0.16.4
- PyTorch version (accelerator?): 2.10.0+cu129 (CUDA)
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: yes
- GPU type: NVIDIA GeForce RTX 3090

Who can help?

@SunMarc @zucchini-nlp

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

I'm attempting to train a Lora adapter on a local copy of Qwen3-VL-4B-Instruct with a config like this:

warmup_steps = 0
max_seq_length = 2560

processor = AutoProcessor.from_pretrained(model_name)
model = Qwen3VLForConditionalGeneration.from_pretrained(
    model_name,
    dtype=(
        torch.bfloat16 if torch.cuda.is_bf16_supported() else torch.float16
    ),
)

peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=[
            "q_proj",
            "k_proj",
            "v_proj",
            "o_proj",
        ],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# ────────────────────────────────────────────────
#  DATASET (customize preprocessing!)
# ────────────────────────────────────────────────
train_dataset = load_dataset("train")
eval_dataset = load_dataset("test")


# ────────────────────────────────────────────────
#  TRAINING ARGUMENTS
# ────────────────────────────────────────────────
training_args = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=num_epochs,
    per_device_train_batch_size=per_device_batch_size,
    per_device_eval_batch_size=per_device_batch_size,
    learning_rate=learning_rate,
    warmup_steps=warmup_steps,
    eval_strategy="steps",
    save_strategy="steps",
    save_steps=int(len(train_dataset) / 2),
    eval_steps=int(len(train_dataset) / 2),
    logging_steps=10,
    metric_for_best_model="eval_loss",
    load_best_model_at_end=True,
    max_grad_norm=1,
    ddp_find_unused_parameters=False,
    dataset_kwargs={"skip_prepare_dataset": True},
    remove_unused_columns=False,
    optim="adamw_torch",
    completion_only_loss=True,
    max_length=max_seq_length
)

# ────────────────────────────────────────────────
#  TRAIN
# ────────────────────────────────────────────────


trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    peft_config=peft_config,
    processing_class=processor,
)

trainer.train()

As soon as training starts I get this error:

  0%|                                                                                                                           | 0/117 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/opt/ml/models/train_hf_bug.py", line 184, in <module>
    trainer.train()
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1320, in training_step
    return super().training_step(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1906, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1216, in compute_loss
    (loss, outputs) = super().compute_loss(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1978, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/peft_model.py", line 1923, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/tuners/tuners_utils.py", line 311, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1522, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1348, in forward
    position_ids = self.compute_3d_position_ids(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1242, in compute_3d_position_ids
    position_ids, rope_deltas = self.get_rope_index(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1105, in get_rope_index
    input_token_type = input_token_type[attention_mask[batch_idx].bool()]
                       ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: The shape of the mask [2041] at index 0 does not match the shape of the indexed tensor [1010] at index 0
  0%|          | 0/117 [00:01<?, ?it/s]

If I downgrade to transformers 5.2.0, it works just fine. I only see this error on 5.3.0 and I also see it with a model of a different architecture (GLM-4.6V-Flash).

My dataset is custom, but laid out in prompt-completion format with an "image" column with a single PIL image. The dataset was working (as was the rest of the code) on earlier transformers version.

I attempted the workaround here but I still get the error with gradient checkpointing disabled and use cache enabled, but the stack trace is slightly different:

  0%|                                                                                                                           | 0/117 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/opt/ml/models/train_hf_bug.py", line 185, in <module>
    trainer.train()
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1424, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1506, in _inner_training_loop
    self._run_epoch(
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1734, in _run_epoch
    tr_loss_step = self.training_step(model, inputs, num_items_in_batch)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1320, in training_step
    return super().training_step(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1906, in training_step
    loss = self.compute_loss(model, inputs, num_items_in_batch=num_items_in_batch)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/trainer/sft_trainer.py", line 1216, in compute_loss
    (loss, outputs) = super().compute_loss(
                      ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 1978, in compute_loss
    outputs = model(**inputs)
              ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 823, in forward
    return model_forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/accelerate/utils/operations.py", line 811, in __call__
    return convert_to_fp32(self.model_forward(*args, **kwargs))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/peft_model.py", line 1923, in forward
    return self.base_model(
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/peft/tuners/tuners_utils.py", line 311, in forward
    return self.model.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1522, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1776, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1787, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1348, in forward
    position_ids = self.compute_3d_position_ids(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1242, in compute_3d_position_ids
    position_ids, rope_deltas = self.get_rope_index(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1105, in get_rope_index
    input_token_type = input_token_type[attention_mask[batch_idx].bool()]
                       ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
IndexError: The shape of the mask [2041] at index 0 does not match the shape of the indexed tensor [1010] at index 0

I'm using the latest trl available (0.29.0).

Expected behavior

Training should run without error.

extent analysis

Fix Plan

The issue seems to be related to the transformers version 5.3.0 and the way it handles attention masks. To fix this, you can try the following:

  • Downgrade transformers to version 5.2.0, which is known to work.
  • Alternatively, you can try to modify the compute_3d_position_ids method in transformers/models/qwen3_vl/modeling_qwen3_vl.py to handle the attention mask shape mismatch.

Here is an example of how you can modify the compute_3d_position_ids method:

def compute_3d_position_ids(self, attention_mask, input_token_type):
    # Add this line to handle the attention mask shape mismatch
    attention_mask = attention_mask[:, :input_token_type.shape[1]]
    position_ids, rope_deltas = self.get_rope_index(attention_mask, input_token_type)
    return position_ids

You can also try to modify the get_rope_index method to handle the attention mask shape mismatch:

def get_rope_index(self, attention_mask, input_token_type):
    # Add this line to handle the attention mask shape mismatch
    attention_mask = attention_mask[:, :input_token_type.shape[1]]
    input_token_type = input_token_type[attention_mask.bool()]
    # Rest of the method remains the same

Verification

To verify that the fix worked, you can run the training script again and check if the error is resolved. You can also add some debug prints to the modified methods to ensure that the attention mask shape mismatch is handled correctly.

Extra Tips

  • Make sure to test the modified code thoroughly to ensure that it does not introduce any new issues.
  • If you are using a custom dataset, make sure that the attention mask shape mismatch is not caused by the dataset itself.
  • You can also try to update trl to the latest version to see if the issue is resolved.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

Training should run without error.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING