pytorch - 💡(How to fix) Fix [vllm] [2.12 regression] Qwen2-VL vision-tower-only LoRA generation diverges from golden output [1 participants]

GitHub stats: pytorch/pytorch#181409, fetched 2026-04-25 06:02:36. Comments: 0, participants: 1, timeline events: 224, reactions: 0. Most frequent timeline events: mentioned ×108, subscribed ×108, labeled ×7, cross-referenced ×1.

Summary

Under torch 2.12.0 + triton 3.7.0, vLLM's test_qwen2vl_multiple_lora_types starts failing on its third LoRA configuration (vision-tower-only LoRA, no connector, lora_id=5/6). The first two configurations in the same test (language-only LoRA, tower+connector LoRA) pass; only the vision-tower-only adapter diverges:

AssertionError: Generated text "A view of the Tokyo" doesn't match expected pattern "A closeup shot of the Tokyo Skytree with pink flowers in the foreground."

Passes on torch 2.11 and on the same torch-2.12 branch through 2026-04-22 (builds 62138/62232/62495/62583); newly failing on 2026-04-24 (build 62848). Blocking the torch 2.12 upgrade for vLLM (vllm-project/vllm#40077).


Environment

  • torch: 2.12.0+cu130 (test channel)
  • triton: 3.7.0
  • CUDA: 13.0 / Driver: 570.133.20
  • Python: 3.12.13
  • Base model: Qwen2-VL (QWEN2VL_MODEL_PATH)
  • LoRA adapter: prashanth058/qwen2vl-flickr-lora-tower (vision tower only, no connector)

Reproduction

Failing test:

tests/lora/test_qwenvl.py::test_qwen2vl_multiple_lora_types

The test runs three LoRA configurations on the same LLM instance; the failure hits on the third:

# Test 3: Vision tower only LoRA adapter (no connector)
tester.config.lora_path = qwen2vl_vision_tower_lora_files
for lora_id in [5, 6]:
    tester.run_test(
        TEST_IMAGES,
        expected_outputs=EXPECTED_OUTPUTS_VISION_NO_CONNECTOR,
        lora_id=lora_id,
        lora_name="vision_tower_only",
    )

Assertion:

for generated, expected in zip(generated_texts, expected_outputs):
    assert expected.startswith(generated), (
        f"Generated text {generated} doesn't match expected pattern {expected}"
    )

Observed:

  • Generated: "A view of the Tokyo"
  • Expected: "A closeup shot of the Tokyo Skytree with pink flowers in the foreground."

(The assertion uses expected.startswith(generated), so the vLLM output must be a prefix of the golden output. Here the two texts already differ at the second word: "view" vs. "closeup".)
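A small helper can show exactly where a generation leaves its golden string. It applies the same `expected.startswith(generated)` rule as the test and then reports the first mismatching word. This is a hypothetical debugging aid: `first_divergence` is not part of vLLM's test suite. It splits on whitespace, so it works at word level rather than tokenizer level.

```python
from typing import Optional, Tuple


def first_divergence(
    generated: str, expected: str
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return None if `generated` passes the test's prefix check, otherwise
    (word_index, generated_word, expected_word) for the first mismatch.

    Word-level only: the tokenizer may split words differently.
    """
    # Same acceptance rule as the assertion in test_qwenvl.py.
    if expected.startswith(generated):
        return None
    gen_words = generated.split()
    exp_words = expected.split()
    for i, (g, e) in enumerate(zip(gen_words, exp_words)):
        if g != e:
            return i, g, e
    # All overlapping words match, yet the prefix check failed
    # (e.g. generated runs past the golden text): report the next position.
    i = min(len(gen_words), len(exp_words))
    g = gen_words[i] if i < len(gen_words) else None
    e = exp_words[i] if i < len(exp_words) else None
    return i, g, e


if __name__ == "__main__":
    gen = "A view of the Tokyo"
    exp = "A closeup shot of the Tokyo Skytree with pink flowers in the foreground."
    print(first_divergence(gen, exp))  # (1, 'view', 'closeup')
```

A truncated but correct generation such as "A closeup sh" returns None, which matches how the test treats it.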

Reproducibility on torch 2.12 branch

Build   Date        LoRA 2   Log
62138   2026-04-20  passed
62232   2026-04-21  passed
62495   2026-04-22  passed
62583   2026-04-22  passed
62848   2026-04-24  failed   https://buildkite.com/vllm/ci/builds/62848#019dbf56-e7ea-4cd5-bab6-dcbb4fb4da0e
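The table narrows the regression to a window between two CI builds. The sketch below derives that window from (build, date, status) records. It assumes that sorting by build number gives chronological order and that the status flips from passing to failing only once. The record layout is illustrative only and is not a Buildkite API.

```python
from typing import Iterable, List, Optional, Tuple

Record = Tuple[int, str, str]  # (build number, date, "LoRA 2" status)

# Rows copied from the reproducibility table.
BUILDS: List[Record] = [
    (62138, "2026-04-20", "passed"),
    (62232, "2026-04-21", "passed"),
    (62495, "2026-04-22", "passed"),
    (62583, "2026-04-22", "passed"),
    (62848, "2026-04-24", "failed"),
]


def regression_window(
    builds: Iterable[Record],
) -> Optional[Tuple[Optional[int], int]]:
    """Return (last passing build, first failing build), or None if none failed.

    The first element is None when the earliest known build already fails.
    """
    last_pass: Optional[int] = None
    for build, _date, status in sorted(builds):  # sorts by build number
        if status == "failed":
            return last_pass, build
        last_pass = build
    return None


if __name__ == "__main__":
    print(regression_window(BUILDS))  # (62583, 62848)
```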

Passes on the same-day main build (torch 2.11).

Relationship to other umbrella issues

Related Qwen-model regressions are also tracked as:

  • pytorch/pytorch#181168 — base Qwen2-VL multi-image output divergence (different test, different assertion; small mid-sentence word swap)
  • pytorch/pytorch#181249 — Qwen2.5-Math-PRM-7B reward-logits divergence

This LoRA case is distinct: it only triggers on the vision-tower LoRA adapter code path, not the base inference path, and the divergence is much larger (different sentence start, not a one-word swap).

Diagnosis request

Only the vision-tower-only LoRA fails; language-only and tower+connector LoRA paths pass in the same test on the same run. That narrows the suspect region to the vision-tower LoRA forward path under torch 2.12 / triton 3.7 (possibly a matmul or attention kernel interacting with LoRA's low-rank update). The regression is new between 2026-04-22 and 2026-04-24 — either a torch/triton test-wheel rebuild or a vLLM rebase could be the trigger; bisecting against vLLM commits on release_212_tests should be quick since the base PR is unchanged.
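One way to test the "kernel interacting with LoRA's low-rank update" hypothesis is to recompute a LoRA-adapted linear layer with a plain reference implementation. You then compare that result against values captured from the suspect path. The sketch makes several assumptions:

- It uses the common LoRA form y = W x + scale · B (A x), with `scale` taken to be alpha / rank. Whether vLLM's vision-tower LoRA uses exactly this form is not confirmed here.
- It uses pure Python lists so it runs without torch.
- In practice, the candidate values would be captured from the vision tower under torch 2.12, for example with a forward hook.

`lora_linear_ref` and `mismatches` are hypothetical names.

```python
from typing import List, Sequence

Vector = List[float]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    """Dense matrix-vector product: each row of `m` dotted with `v`."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_linear_ref(x, weight, lora_a, lora_b, scale) -> Vector:
    """Reference LoRA forward: weight @ x + scale * (lora_b @ (lora_a @ x)).

    Shapes: x (in,), weight (out, in), lora_a (rank, in), lora_b (out, rank).
    `scale` is assumed to be alpha / rank, as in the original LoRA paper.
    """
    base = matvec(weight, x)                   # frozen base projection
    delta = matvec(lora_b, matvec(lora_a, x))  # low-rank update B(Ax)
    return [b + scale * d for b, d in zip(base, delta)]


def mismatches(candidate, reference, atol=1e-3, rtol=1e-2) -> List[int]:
    """Indices where |candidate - reference| > atol + rtol * |reference|."""
    if len(candidate) != len(reference):
        raise ValueError("candidate and reference lengths differ")
    return [
        i
        for i, (c, r) in enumerate(zip(candidate, reference))
        if abs(c - r) > atol + rtol * abs(r)
    ]


if __name__ == "__main__":
    x = [1.0, 2.0]
    weight = [[1.0, 0.0], [0.0, 1.0]]
    lora_a = [[1.0, 1.0]]  # rank 1
    lora_b = [[0.5], [-0.5]]
    ref = lora_linear_ref(x, weight, lora_a, lora_b, scale=2.0)
    print(ref)                            # [4.0, -1.0]
    # A captured output from the suspect path would be checked like this:
    print(mismatches([4.0, -1.2], ref))   # [1]
```

Running the same check with `scale=0.0` compares only the base projection. This helps separate a base-matmul problem from one specific to the LoRA delta, which is the distinction the passing and failing configurations point to.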

Links

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

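The most likely path to a fix is to find, and if necessary revert, the change that landed between vLLM CI builds 62583 (last pass) and 62848 (first failure). Focus on the vision-tower LoRA adapter code path. The change may be a torch/triton test-wheel rebuild or a vLLM rebase.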

Guidance

  1. Bisect commits: Perform a binary search on the commits between the last passing build (62583) and the first failing build (62848) to identify the specific commit causing the regression.
  2. Investigate matmul or attention kernel changes: Examine any recent changes to matrix multiplication or attention mechanisms in the torch or triton libraries, as these may interact with LoRA's low-rank update.
  3. Verify LoRA adapter code: Review the vision-tower LoRA adapter code for any potential issues or inconsistencies that could cause the divergence.
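  4. Cross-check on torch 2.11: the test already passes on the same-day main build with torch 2.11. If the failing branch's vLLM commit is compatible with torch 2.11, re-running it there would help separate a torch/triton-side cause from a vLLM-side one.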
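Step 1 amounts to a binary search over the ordered list of candidate changes between builds 62583 and 62848. These could be vLLM commits or torch/triton test-wheel versions. The minimal sketch below assumes:

- the list is ordered oldest to newest;
- the first entry is known good and the last is known bad;
- `is_bad` is a hypothetical callback that rebuilds at a given change and runs test_qwen2vl_multiple_lora_types.

For vLLM commits alone, `git bisect` does the same job.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(changes: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the earliest entry of `changes` for which is_bad() is True.

    Assumes changes[0] is known good, changes[-1] is known bad, and the
    failure is monotonic: once a change is bad, every later one is too.
    """
    if len(changes) < 2:
        raise ValueError("need at least one good and one bad entry")
    lo, hi = 0, len(changes) - 1  # invariant: changes[lo] good, changes[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):  # e.g. rebuild at this change and run the test
            hi = mid
        else:
            lo = mid
    return changes[hi]


if __name__ == "__main__":
    # Toy stand-in: ten hypothetical changes, the regression enters at index 6.
    probes = []

    def fake_is_bad(change: int) -> bool:
        probes.append(change)
        return change >= 6

    print(first_bad(list(range(10)), fake_is_bad), "after", len(probes), "probes")
    # 6 after 3 probes
```

Each probe costs one rebuild and test run, so a window of N changes needs roughly log2(N) runs.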

Example

No fix snippet is given here, because the issue calls for investigation and debugging rather than a specific code change.

Notes

The root cause is still unknown. The available evidence points to the vision-tower-only LoRA code path under torch 2.12 / triton 3.7.

Recommendation

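As a workaround, stay on torch 2.11 (where the test passes) until the regression is resolved. Note that the test also passed on the torch 2.12 branch until 2026-04-22. The trigger may therefore be a torch/triton test-wheel rebuild or a vLLM rebase, rather than torch 2.12 as a whole.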
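The workaround can also be applied as a guard rather than a global pin: a CI script could check the installed torch version before running the affected check. The hypothetical sketch below parses version strings such as `2.12.0+cu130` (the format shown under Environment). It treats only 2.12.x as affected, which is an assumption drawn from this report rather than a confirmed range. With torch installed, the string would come from `torch.__version__`.

```python
import re
from typing import Tuple

# Assumption from this report: only the 2.12 release line is affected.
AFFECTED = (2, 12)


def parse_torch_version(version: str) -> Tuple[int, int, int]:
    """Parse strings like '2.12.0+cu130' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def is_affected(version: str) -> bool:
    """True if the version is on the release line this regression targets."""
    major, minor, _patch = parse_torch_version(version)
    return (major, minor) == AFFECTED


if __name__ == "__main__":
    for v in ("2.12.0+cu130", "2.11.0"):
        print(v, is_affected(v))
```

A guard like this could, for example, mark only the vision-tower-only LoRA case as an expected failure while the regression is open.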
