pytorch - 💡(How to fix) Fix [vllm] [2.12 regression][multimodal] Qwen2-VL multi-image generation diverges from HF reference in test_case43 [1 comments, 2 participants]

pytorch · 2026-04-22 20:34:58

pytorch/pytorch#181168•Fetched 2026-04-23 07:22:16

Under torch 2.12.0 + triton 3.7.0, vLLM's test_multi_image_models[qwen2_vl-test_case43] fails with an output divergence between the HF reference and vLLM:

AssertionError: Test1:

Matched tokens (divergence point is after "a large"): [785, 1156, 2168, 4933, 264, 2936, 1841, 448, 264, 2518, 4004, 323, 4158, 11931, 11, 9099, 304, 4065, 315, 264, 2518, 4752, 448, 264, 3460]

hf: 'The first image shows a stop sign ... placed in front of a red building with a large window. ... The second image features a beautiful view of a cherry blossom tree ...

vllm: 'The first image shows a stop sign ... placed in front of a red building with a large archway. ... The second image features a beautiful cherry blossom tree ...

Passes on torch 2.11. Blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).
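To make the "divergence point" in the assertion message concrete, here is a minimal sketch that finds the first position where two token-ID sequences disagree. The shared prefix is copied from the message above; the continuation IDs (`1001`, `2002`, `13`) are hypothetical placeholders, since the issue does not list the token IDs after the matched prefix.

```python
from itertools import takewhile


def common_prefix_len(a, b):
    """Return the number of leading positions where both sequences agree."""
    return sum(1 for _ in takewhile(lambda pair: pair[0] == pair[1], zip(a, b)))


# Shared prefix copied from the assertion message.
MATCHED = [785, 1156, 2168, 4933, 264, 2936, 1841, 448, 264, 2518, 4004, 323,
           4158, 11931, 11, 9099, 304, 4065, 315, 264, 2518, 4752, 448, 264, 3460]

# Hypothetical continuations: these IDs are NOT from the issue, they only
# stand in for the tokens behind "window" (hf) and "archway" (vllm).
hf_ids = MATCHED + [1001, 13]
vllm_ids = MATCHED + [2002, 13]

idx = common_prefix_len(hf_ids, vllm_ids)
if idx < min(len(hf_ids), len(vllm_ids)):
    print(f"first divergence at position {idx}: hf={hf_ids[idx]} vllm={vllm_ids[idx]}")
else:
    print("one sequence is a prefix of the other")
```

Everything after the first differing position can be ignored when comparing greedy outputs, because each later token is conditioned on a different history.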

Environment

torch: 2.12.0+cu130 (test channel)
triton: 3.7.0
CUDA: 13.0 / Driver: 570.133.20
Python: 3.12.13
GPU: NVIDIA L4 (23 GiB)
Model: Qwen/Qwen2-VL-2B-Instruct
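When comparing a passing and a failing environment, it may help to record the installed package versions in the same way on both sides. The sketch below reads distribution metadata without importing the packages. The distribution names are assumptions: some PyTorch builds, for example, ship triton under a different distribution name, so adjust the list to match your install.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("torch", "triton", "vllm", "transformers")):
    """Map each (assumed) distribution name to its installed version or None."""
    return {name: installed_version(name) for name in packages}


print("python", platform.python_version())
for name, version in report().items():
    print(name, version or "not installed")
```

Diffing this output between the torch 2.11 and torch 2.12 environments shows which packages changed besides torch itself.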

Reproduction

Failing test ID:

tests/models/multimodal/generation/test_common.py::test_multi_image_models[qwen2_vl-test_case43]

Test config: multi-image (stop_sign + cherry_blossom assets), size factors (0.25, 0.2, 0.15), max_tokens=128, num_logprobs=5, eager distributed.
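One way to run only the failing case is to pass the full test ID to pytest. The sketch below builds that command. It assumes a vLLM checkout as the working directory and a GPU environment able to load the model; the `-x` and `-q` flags are a convenience choice, not something the issue specifies.

```python
import shlex
import sys

# Test ID copied from the issue.
TEST_ID = (
    "tests/models/multimodal/generation/test_common.py"
    "::test_multi_image_models[qwen2_vl-test_case43]"
)


def pytest_command(test_id=TEST_ID, extra=("-x", "-q")):
    """Build an argv list that runs a single pytest test with this interpreter."""
    return [sys.executable, "-m", "pytest", test_id, *extra]


# Print a shell-safe form; the square brackets in the ID need quoting in most shells.
print(shlex.join(pytest_command()))

# To actually run it from the repository root (assumption: vLLM checkout):
#   import subprocess
#   subprocess.run(pytest_command(), check=False)
```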

Reproducibility on torch 2.12 branch (3/3 days)

Passes on same-day main builds (torch 2.11):

2026-04-21 nightly: https://buildkite.com/vllm/ci/builds/62254
2026-04-21 daily: https://buildkite.com/vllm/ci/builds/62383
2026-04-22 nightly: https://buildkite.com/vllm/ci/builds/62456

Diagnosis request

Output tokens diverge after a deterministic shared prefix across three runs — consistent with a kernel-level numerical change (matmul / softmax / attention / rotary), plausibly from the triton 3.7 bundled in torch 2.12. Is this an intentional numerical behavior change or a regression?
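A small illustration of why a kernel-level numerical change can flip a single token under greedy decoding: if the top two logits are nearly tied, a perturbation smaller than typical low-precision rounding error is enough to change the argmax. The logit values and the perturbation below are hypothetical and are not measurements from this issue.

```python
def argmax(values):
    """Index of the largest value (first one wins on exact ties)."""
    return max(range(len(values)), key=values.__getitem__)


def top2_margin(values):
    """Gap between the largest and second-largest value."""
    first, second = sorted(values, reverse=True)[:2]
    return first - second


# Hypothetical logits for three candidate tokens at the divergence step.
logits_ref = [4.1250, 4.1245, 1.0]

# Hypothetical per-token numerical drift introduced by a different kernel.
drift = [0.0, 0.001, 0.0]
logits_new = [x + d for x, d in zip(logits_ref, drift)]

print("reference picks token", argmax(logits_ref))
print("perturbed picks token", argmax(logits_new))
print("top-2 margin in reference:", top2_margin(logits_ref))
```

If the real logits at the divergence step show a comparably small top-2 margin, the mismatch is more likely benign numerical noise than a functional bug. A large margin would point toward a genuine regression.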

extent analysis

TL;DR

As a workaround, stay on (or downgrade to) torch 2.11, where the test passes. This avoids the output divergence between the HF reference and vLLM but does not address its underlying cause.

Guidance

Verify the issue is specific to torch 2.12 by running the test on different torch versions to confirm the problem is not present in torch 2.11.
Investigate the triton 3.7 update in torch 2.12 as a potential cause of the numerical behavior change, focusing on matmul, softmax, attention, or rotary operations.
Review the vLLM PR (vllm-project/vllm#40077) and the umbrella issue (pytorch/pytorch#180899) for any related discussions or fixes.
Consider testing with a different CUDA version or driver to rule out any GPU-specific issues.

Notes

The divergence appears with the upgrade to torch 2.12, and the triton 3.7 bundled with it is the suspected but unconfirmed source. Without further investigation, it is unclear whether this is an intentional numerical behavior change or a regression.

Recommendation

Apply the workaround by downgrading to torch 2.11, as the issue is not present in this version, allowing for continued development and testing while the root cause is further investigated.
