pytorch: [vllm] [2.12 regression][multimodal] Qwen2-VL multi-image generation diverges from HF reference in test_case43

GitHub stats
pytorch/pytorch#181168 (fetched 2026-04-23 07:22:16)
Comments: 1 · Participants: 2 · Timeline: 122 · Reactions: 0
Timeline (top): mentioned ×55, subscribed ×55, labeled ×7, unlabeled ×2

Under torch 2.12.0 + triton 3.7.0, vLLM's test_multi_image_models[qwen2_vl-test_case43] fails with an output divergence between the HF reference and vLLM:

AssertionError: Test1:

Matched tokens (divergence point is after "a large"): [785, 1156, 2168, 4933, 264, 2936, 1841, 448, 264, 2518, 4004, 323, 4158, 11931, 11, 9099, 304, 4065, 315, 264, 2518, 4752, 448, 264, 3460]

hf: 'The first image shows a stop sign ... placed in front of a red building with a large window. ... The second image features a beautiful view of a cherry blossom tree ...

vllm: 'The first image shows a stop sign ... placed in front of a red building with a large archway. ... The second image features a beautiful cherry blossom tree ...

Passes on torch 2.11. Blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).

Environment

  • torch: 2.12.0+cu130 (test channel)
  • triton: 3.7.0
  • CUDA: 13.0 / Driver: 570.133.20
  • Python: 3.12.13
  • GPU: NVIDIA L4 (23 GiB)
  • Model: Qwen/Qwen2-VL-2B-Instruct
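
When comparing a passing and a failing machine, it helps to record the installed versions the same way on both. The following is a minimal sketch using only the standard library; the distribution names in the default tuple are assumptions and may differ on a given install (for example, a nightly triton wheel published under another name).

```python
from importlib import metadata
from typing import Dict, Optional, Tuple


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report_versions(
    packages: Tuple[str, ...] = ("torch", "triton", "vllm", "transformers"),
) -> Dict[str, Optional[str]]:
    """Map each distribution name (assumed names) to its version or None."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running this in both the torch 2.11 and torch 2.12 environments gives two listings that can be diffed directly.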

Reproduction

Failing test ID:

tests/models/multimodal/generation/test_common.py::test_multi_image_models[qwen2_vl-test_case43]

Test config: multi-image (stop_sign + cherry_blossom assets), size factors (0.25, 0.2, 0.15), max_tokens=128, num_logprobs=5, eager distributed.
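
The test ID above contains square brackets, which some shells try to expand. One way to sidestep quoting problems is to launch pytest from Python and pass the ID as a single argument. This is a sketch under the assumption that it is run from the root of a vLLM checkout with test dependencies installed; the helper names and the `-x -q` flags are illustrative choices, not part of the original report.

```python
import subprocess
import sys
from typing import List

# Test ID copied verbatim from the report.
TEST_ID = (
    "tests/models/multimodal/generation/test_common.py"
    "::test_multi_image_models[qwen2_vl-test_case43]"
)


def build_pytest_command(test_id: str = TEST_ID) -> List[str]:
    """Build an argv list; the node ID stays one element, so no shell quoting is needed."""
    return [sys.executable, "-m", "pytest", test_id, "-x", "-q"]


def run_failing_test() -> int:
    """Run the single test and return pytest's exit code (0 means it passed)."""
    return subprocess.run(build_pytest_command(), check=False).returncode
```

Calling `run_failing_test()` once per environment (torch 2.11 and torch 2.12) gives a simple pass/fail comparison.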

Reproducibility: fails on the torch 2.12 branch on 3 of 3 days; passes on same-day main builds (torch 2.11).

Diagnosis request

Output tokens diverge after a deterministic shared prefix across three runs — consistent with a kernel-level numerical change (matmul / softmax / attention / rotary), plausibly from the triton 3.7 bundled in torch 2.12. Is this an intentional numerical behavior change or a regression?
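
To make the "shared prefix, then divergence" observation concrete, here is a small sketch that locates the first differing position between two token sequences and applies a simplified top-k tolerance rule: the mismatch is tolerated only if each side's token appears among the other side's top-k candidates. This is an illustration of that kind of check, not vLLM's actual test helper. The matched prefix is taken from the assertion message; the two "next token" ids are hypothetical placeholders, because the report does not list them.

```python
from typing import Collection, Sequence


def first_divergence(a: Sequence[int], b: Sequence[int]) -> int:
    """Return the first index where a and b differ, or the shorter length if none."""
    n = min(len(a), len(b))
    for i in range(n):
        if a[i] != b[i]:
            return i
    return n


def divergence_is_tolerable(
    tok_a: int, topk_a: Collection[int], tok_b: int, topk_b: Collection[int]
) -> bool:
    """Simplified rule: each side's chosen token must be in the other side's top-k set."""
    return tok_a in topk_b and tok_b in topk_a


# Matched prefix from the assertion message (ends after "a large").
MATCHED = [785, 1156, 2168, 4933, 264, 2936, 1841, 448, 264, 2518, 4004, 323,
           4158, 11931, 11, 9099, 304, 4065, 315, 264, 2518, 4752, 448, 264, 3460]

# Hypothetical placeholder ids for the diverging tokens ("window" vs "archway").
HF_NEXT, VLLM_NEXT = 1001, 1002

hf_ids = MATCHED + [HF_NEXT]
vllm_ids = MATCHED + [VLLM_NEXT]
idx = first_divergence(hf_ids, vllm_ids)
print(f"sequences agree on the first {idx} tokens")
```

With real data, the top-k sets would come from the per-step logprobs (the test requests `num_logprobs=5`), and a `False` result from `divergence_is_tolerable` would correspond to a failure of this kind.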

Links

  • vLLM PR: vllm-project/vllm#40077
  • Umbrella: pytorch/pytorch#180899

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

Workaround: stay on torch 2.11, where the test passes. This avoids the divergence between the HF reference and vLLM but does not explain it.

Guidance

  • Confirm the failure is specific to torch 2.12 by running the same test against torch 2.11 and torch 2.12 builds with all other dependencies held fixed.
  • Investigate the triton 3.7 update in torch 2.12 as a potential cause of the numerical behavior change, focusing on matmul, softmax, attention, or rotary operations.
  • Review the vLLM PR (vllm-project/vllm#40077) and the umbrella issue (pytorch/pytorch#180899) for any related discussions or fixes.
  • Consider testing with a different CUDA version or driver to rule out toolkit- or driver-specific effects.

Notes

The failure correlates with the torch 2.12 upgrade, and the triton 3.7.0 that ships with it is the suspected source. Whether this is an intentional numerical behavior change or a regression has not been determined.
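
As background on why a kernel-level numerical change can flip a generated token: floating-point addition is not associative, so a kernel that accumulates the same terms in a different order can produce a slightly different logit, and greedy decoding then picks a different token when two candidates are nearly tied. The sketch below is a deliberately exaggerated, pure-Python illustration with made-up values; it does not reproduce the actual kernels or claim this is the cause here.

```python
from typing import Sequence


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


# The same three terms accumulated in two different orders.
# 1e16 + 1.0 rounds back to 1e16 in float64, so the 1.0 is lost in the first order.
logit_order_a = (1e16 + 1.0) + (-1e16)
logit_order_b = (1e16 + (-1e16)) + 1.0

# A competing candidate token with a fixed (made-up) logit.
competitor = 0.5

choice_a = argmax([logit_order_a, competitor])
choice_b = argmax([logit_order_b, competitor])
print(logit_order_a, logit_order_b, choice_a, choice_b)
```

In real half-precision kernels the differences are far smaller than in this example, which is why a divergence typically appears only at a position where the top candidates are close, after a long identical prefix.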

Recommendation

Stay on torch 2.11, where the test passes, so that development and testing can continue while the root cause is investigated.
