transformers - 💡(How to fix) Fix Qwen3-vl-embedding Video Error "StopIteration" in transformers 5.3.0 [3 comments, 2 participants]

transformers 2026-03-10 08:30:00

GitHub stats

huggingface/transformers#44560 • Fetched 2026-04-08 00:27:39

Author: QYQTexas

Participants: QYQTexas, zucchini-nlp

Timeline (top): commented ×3, closed ×1, labeled ×1, mentioned ×1

Error Message

Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 625/625 [00:00<00:00, 1343.66it/s]
qwen-vl-utils using decord to read video.
Traceback (most recent call last):
  File "/root/graduation_project/test_video_embedding_local.py", line 23, in <module>
    embeddings = model.process(inputs)
                 ^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/Qwen3_VL_Embedding/src/models/qwen3_vl_embedding.py", line 387, in process
    outputs = self.forward(processed_inputs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/Qwen3_VL_Embedding/src/models/qwen3_vl_embedding.py", line 193, in forward
    outputs = self.model(**inputs)
              ^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/Qwen3_VL_Embedding/src/models/qwen3_vl_embedding.py", line 99, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/transformers/utils/generic.py", line 843, in wrapper
    output = func(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1358, in forward
    position_ids = self.compute_3d_position_ids(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1252, in compute_3d_position_ids
    position_ids, rope_deltas = self.get_rope_index(
                                ^^^^^^^^^^^^^^^^^^^^
  File "/root/graduation_project/.venv/lib/python3.12/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1135, in get_rope_index
    grid_thw = next(grid_iters[modality_type])
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

Code Example

import torch
from Qwen3_VL_Embedding.src.models.qwen3_vl_embedding import Qwen3VLEmbedder

model = Qwen3VLEmbedder(
    model_name_or_path="./models/Qwen3-VL-Embedding-2B",
    # flash_attention_2 for better acceleration and memory saving
    # torch_dtype=torch.bfloat16, 
    # attn_implementation="flash_attention_2"
)

inputs = [
    # Official example (commented out)
    # {
    #     "text": "A woman playing with her dog on a beach at sunset.",
    #     "instruction": "Retrieve images or text relevant to the user's query.",
    # }, {
    #     "text": "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust."
    # }, {
    #     "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    # }, {
    #     "text": "A woman shares a joyful moment with her golden retriever on a sun-drenched beach at sunset, as the dog offers its paw in a heartwarming display of companionship and trust.",
    #     "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg"
    # }

    # What I tried: two local videos
    {"video": "./data/video/1602562103_1491434826.mp4"},  # 30 min
    {"video": "./data/video/115814632003756_35105146583.mp4"},  # 3 min
]

embeddings = model.process(inputs)
print(embeddings)
print(embeddings @ embeddings.T)

System Info

transformers version: 5.3.0
Platform: Windows11, WSL2, uv, vscode
Python 3.12.13 (main, Mar 3 2026, 14:59:34) [Clang 21.1.4 ] on linux

Who can help?

@zucchini-nlp

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Follow the README of https://github.com/QwenLM/Qwen3-VL-Embedding/tree/main, but pass video inputs to the embedder (the same script as under Code Example).

Expected behavior

The same script runs without error on v4.57.6, v5.1.0, and v5.2.0. It fails only on v5.3.0.

extent analysis

Problem Summary

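With transformers 5.3.0, Qwen3VLEmbedder.process raises StopIteration when given video inputs. The exception comes from get_rope_index in transformers/models/qwen3_vl/modeling_qwen3_vl.py, at the line grid_thw = next(grid_iters[modality_type]).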

Root Cause Analysis

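The failing call is in the transformers library itself, not in the Qwen3-VL-Embedding wrapper: get_rope_index calls next() on a per-modality iterator of grid entries, and StopIteration means that iterator was already exhausted. In other words, the code expected more grid entries for that modality than it was given. Because the same script works on 5.2.0, a change in 5.3.0 is the likely cause. The issue report does not identify which change.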
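The failure mode can be reproduced in isolation. The sketch below is a simplified stand-in, not the transformers implementation: assign_grids and the example grid values are hypothetical, and only the next(grid_iters[modality_type]) pattern is taken from the traceback.

```python
from typing import Dict, Iterator, List, Tuple

Grid = Tuple[int, int, int]  # (t, h, w), following the grid_thw name


def assign_grids(modality_spans: List[str], video_grids: List[Grid]) -> List[Grid]:
    """Pair each vision span with the next grid entry for its modality."""
    grid_iters: Dict[str, Iterator[Grid]] = {"video": iter(video_grids)}
    assigned = []
    for modality_type in modality_spans:
        # Same pattern as the failing line in the traceback:
        # next() raises StopIteration once the iterator is exhausted.
        assigned.append(next(grid_iters[modality_type]))
    return assigned


# Two spans, two grid entries: every span gets a grid.
print(assign_grids(["video", "video"], [(4, 2, 2), (8, 2, 2)]))

# Three spans, two grid entries: the third next() call has nothing left.
try:
    assign_grids(["video", "video", "video"], [(4, 2, 2), (8, 2, 2)])
except StopIteration:
    print("StopIteration: more video spans than grid entries")
```

Whether the real mismatch comes from the spans or from the grid entries cannot be told from the traceback alone.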

Fix Plan

Step 1: Downgrade transformers to version 5.2.0

Downgrade transformers to 5.2.0, the most recent version the reporter confirmed working. With uv, as in the reporter's setup, the equivalent command is uv pip install transformers==5.2.0.

pip install transformers==5.2.0
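A script can also check the installed version before loading the model. This is a minimal sketch: the version sets contain only the versions named in the issue report, the helper names are hypothetical, and any other version is reported as unknown rather than guessed.

```python
from importlib import metadata
from typing import Optional

# Versions taken from the issue report; anything else is unknown.
KNOWN_GOOD = {"4.57.6", "5.1.0", "5.2.0"}
KNOWN_BAD = {"5.3.0"}


def classify_transformers_version(version: str) -> str:
    """Return 'good', 'bad', or 'unknown' for a transformers version string."""
    if version in KNOWN_BAD:
        return "bad"
    if version in KNOWN_GOOD:
        return "good"
    return "unknown"


def installed_transformers_version() -> Optional[str]:
    """Return the installed transformers version, or None if it is not installed."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


version = installed_transformers_version()
if version is None:
    print("transformers is not installed")
else:
    print(f"transformers {version}: {classify_transformers_version(version)}")
```

Reading the version through importlib.metadata avoids importing transformers just to check it.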

Step 2: Update code to handle StopIteration exception

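Catching StopIteration inside get_rope_index is not a fix. That function lives in the installed transformers package, and swallowing the exception there leaves grid_thw unassigned, so the next line that uses it fails. If you have to stay on 5.3.0 for now, catch the exception at the call site instead and re-raise it with a message that points at the version problem.

try:
    embeddings = model.process(inputs)
except StopIteration as exc:
    # Raised from get_rope_index in transformers 5.3.0 for video inputs
    raise RuntimeError(
        "Video embedding failed in get_rope_index; try transformers==5.2.0"
    ) from exc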
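When a batch mixes several inputs, it helps to know which one triggers the exception. The helper below is a hypothetical sketch: it assumes only that process accepts a list of input dicts, as model.process does in the script above, and fake_process is a stand-in so the example runs without a model.

```python
from typing import Any, Callable, Dict, List, Tuple


def find_failing_inputs(
    process: Callable[[List[Dict[str, Any]]], Any],
    inputs: List[Dict[str, Any]],
) -> List[Tuple[int, str]]:
    """Run process on each input alone; return (index, exception name) per failure."""
    failures = []
    for index, item in enumerate(inputs):
        try:
            process([item])
        except Exception as exc:  # StopIteration is a subclass of Exception
            failures.append((index, type(exc).__name__))
    return failures


def fake_process(batch: List[Dict[str, Any]]) -> int:
    # Stand-in for model.process: fails on any item that points at "bad.mp4".
    if any(item.get("video") == "bad.mp4" for item in batch):
        raise StopIteration
    return len(batch)


print(find_failing_inputs(fake_process, [{"video": "ok.mp4"}, {"video": "bad.mp4"}]))
# prints [(1, 'StopIteration')]
```

With the real model the call would be find_failing_inputs(model.process, inputs). Each input is decoded and embedded separately, so expect this to be slow for long videos.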

Step 3: Verify the fix

Rerun the script after the downgrade and confirm that it completes without raising StopIteration.

Verification

A successful run prints the embeddings and then a 2×2 similarity matrix, one row per video. To confirm which version is active, print transformers.__version__ before loading the model. If the error persists, set a breakpoint on the next(grid_iters[modality_type]) line in get_rope_index to see which modality ran out of grid entries.
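The printed similarity matrix can also be checked programmatically. This sketch is an assumption-laden sanity check, not part of the original script: check_similarity_matrix is a hypothetical helper that works on nested lists, and the tolerance is an arbitrary choice.

```python
import math
from typing import List


def check_similarity_matrix(sim: List[List[float]], n_inputs: int, tol: float = 1e-3) -> None:
    """Raise AssertionError unless sim is a finite, symmetric n_inputs x n_inputs matrix."""
    assert len(sim) == n_inputs, "wrong number of rows"
    for row in sim:
        assert len(row) == n_inputs, "wrong number of columns"
        assert all(math.isfinite(value) for value in row), "non-finite value"
    for i in range(n_inputs):
        for j in range(i + 1, n_inputs):
            assert abs(sim[i][j] - sim[j][i]) <= tol, "matrix is not symmetric"


# Example values are made up; a real run would pass the model's output instead.
check_similarity_matrix([[1.0, 0.42], [0.42, 1.0]], n_inputs=2)
print("similarity matrix looks well-formed")
```

Assuming embeddings is a torch tensor, as the @ and .T in the script suggest, the call would be check_similarity_matrix((embeddings @ embeddings.T).tolist(), n_inputs=len(inputs)).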

Extra Tips

Always check the release notes and changelogs for the library you are using to see if there are any known issues or changes that may affect your code.
When downgrading a library, make sure to also downgrade any dependencies that may have changed.
Consider using a virtual environment to isolate your project's dependencies and avoid conflicts with other projects.
