pytorch - 💡(How to fix) Fix [vllm] [2.12 regression][Dynamo] test_dynamic_shapes_compilation: assert 'no' == 'yes' across backed/unbacked modes

Summary

Under torch 2.12.0, vLLM's test_dynamic_shapes_compilation fails with assert 'no' == 'yes' across many parametrizations (gpt2, Qwen2-7B, Qwen3-4B) and symbolic-shape modes (backed, unbacked, backed_size_oblivious). Of the 22 failing tests, 12 end with this assertion; the other 10 end in an EngineCore init failure. Some of those 10 were confounded by GPU memory pressure (insufficient free memory at startup) in the same CI job, so a clean rerun is planned. This issue is filed now to track the 12 real assertion failures.

Environment

  • torch: 2.12.0+cu130
  • triton: 3.7.0
  • CUDA: 13.0
  • Python: 3.12
  • GPU: 1× H100 (shared; under memory pressure from other processes at run time, rerun pending)

Failed test parametrizations (assert 'no' == 'yes')

12 of the 22 failing tests:

  • test_dynamic_shapes_compilation[False-True-0-backed-gpt2]
  • test_dynamic_shapes_compilation[False-True-0-backed_size_oblivious-gpt2]
  • test_dynamic_shapes_compilation[False-True-1-backed-gpt2]
  • test_dynamic_shapes_compilation[False-True-1-backed-Qwen/Qwen3-4B-Instruct-2507]
  • test_dynamic_shapes_compilation[False-True-1-backed_size_oblivious-gpt2]
  • test_dynamic_shapes_compilation[False-True-1-backed_size_oblivious-Qwen/Qwen3-4B-Instruct-2507]
  • test_dynamic_shapes_compilation[False-False-0-unbacked-gpt2]
  • test_dynamic_shapes_compilation[False-False-0-backed_size_oblivious-gpt2]
  • test_dynamic_shapes_compilation[False-False-1-backed-gpt2]
  • test_dynamic_shapes_compilation[False-False-1-unbacked-gpt2]
  • test_dynamic_shapes_compilation[False-False-1-backed_size_oblivious-gpt2]
  • test_dynamic_shapes_compilation[False-False-1-backed_size_oblivious-Qwen/Qwen3-4B-Instruct-2507]
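To see how these 12 assertion failures spread across symbolic-shape modes and models, the IDs can be tallied. This is a small pure-Python sketch. It assumes only that each bracketed ID has three leading fields (whose meaning the report does not state) followed by `<mode>-<model>`; `split_id` and `tally` are hypothetical helper names.

```python
from collections import Counter

# The 12 assertion-failure IDs, copied from the list above.
FAILED_IDS = [
    "test_dynamic_shapes_compilation[False-True-0-backed-gpt2]",
    "test_dynamic_shapes_compilation[False-True-0-backed_size_oblivious-gpt2]",
    "test_dynamic_shapes_compilation[False-True-1-backed-gpt2]",
    "test_dynamic_shapes_compilation[False-True-1-backed-Qwen/Qwen3-4B-Instruct-2507]",
    "test_dynamic_shapes_compilation[False-True-1-backed_size_oblivious-gpt2]",
    "test_dynamic_shapes_compilation[False-True-1-backed_size_oblivious-Qwen/Qwen3-4B-Instruct-2507]",
    "test_dynamic_shapes_compilation[False-False-0-unbacked-gpt2]",
    "test_dynamic_shapes_compilation[False-False-0-backed_size_oblivious-gpt2]",
    "test_dynamic_shapes_compilation[False-False-1-backed-gpt2]",
    "test_dynamic_shapes_compilation[False-False-1-unbacked-gpt2]",
    "test_dynamic_shapes_compilation[False-False-1-backed_size_oblivious-gpt2]",
    "test_dynamic_shapes_compilation[False-False-1-backed_size_oblivious-Qwen/Qwen3-4B-Instruct-2507]",
]


def split_id(test_id: str) -> tuple[str, str]:
    """Return (mode, model) from a parametrized pytest ID."""
    params = test_id[test_id.index("[") + 1 : test_id.rindex("]")]
    # Split at most 4 times: mode names use underscores, so only the
    # model name (last field) may still contain hyphens.
    *_leading, mode, model = params.split("-", 4)
    return mode, model


def tally(ids) -> Counter:
    """Count failures per (mode, model) pair."""
    return Counter(split_id(i) for i in ids)


if __name__ == "__main__":
    for (mode, model), n in sorted(tally(FAILED_IDS).items()):
        print(f"{mode:<22} {model:<32} {n}")
```

On this list, backed_size_oblivious accounts for 6 of the 12 failures (4 gpt2, 2 Qwen3-4B), backed for 4, and unbacked for 2 (gpt2 only); no Qwen2-7B parametrization appears among the assertion failures.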

The assertion appears to compare a compilation-status string (is_compiled or similar): vLLM expects 'yes', but under torch 2.12 the test sees 'no'. This suggests that some graph that previously compiled with Dynamo/Inductor now falls back to eager.

Question / diagnosis

Under torch 2.12, is there a new guard, cond, or data-dependent branch that now makes the test_dynamic_shapes_compilation graphs fall back to eager (hence 'no')? This is of particular interest in backed_size_oblivious mode, which affects all three models.

Caveat

A portion of this job's failures were caused by pre-existing GPU pressure on the shared runner (repeated ValueError: Free memory on device cuda:0 (1.3/22 GiB) on startup is less than desired GPU memory utilization). Those engine-core-init failures will be filtered out on rerun; the 12 assertion-style failures listed above progressed past engine-init and are almost certainly real.
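The caveat above hinges on vLLM's startup memory check. A small pure-Python sketch of that check can help decide whether a runner is clean enough for the rerun. It assumes the check requires free memory >= gpu_memory_utilization × total memory, which is a reading of the error text rather than something confirmed from vLLM's code; `startup_memory_ok` and `max_passing_utilization` are hypothetical names.

```python
GIB = 1024 ** 3


def startup_memory_ok(free_bytes: float, total_bytes: float, utilization: float) -> bool:
    """Assumed startup rule: free memory must cover utilization * total memory.

    This mirrors a reading of the vLLM error message, not vLLM's actual source.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return free_bytes >= utilization * total_bytes


def max_passing_utilization(free_bytes: float, total_bytes: float) -> float:
    """Largest utilization that would pass the assumed rule on this device."""
    return free_bytes / total_bytes


if __name__ == "__main__":
    free, total = 1.3 * GIB, 22 * GIB  # numbers taken from the error message above
    print(f"max passing utilization ~ {max_passing_utilization(free, total):.3f}")
    print("utilization 0.9 ok?", startup_memory_ok(free, total, 0.9))
```

On a live runner, `torch.cuda.mem_get_info()` returns `(free, total)` in bytes for the current device, so the same check could gate the rerun before vLLM starts. With 1.3 of 22 GiB free, any utilization above roughly 0.059 fails under this rule, which is consistent with treating those engine-init failures as environmental rather than as part of the Dynamo regression.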

Links

cc @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo @azahed98

extent analysis

TL;DR

Identify which change in PyTorch 2.12 makes these graphs fall back to eager execution (most visibly in backed_size_oblivious mode), then fix it in PyTorch or adapt vLLM to it.

Guidance

  1. Review PyTorch 2.12 release notes: Look for new guards, conditions, or data-dependent branches that might make graphs fall back to eager execution.
  2. Investigate backed_size_oblivious mode: Focus on why this mode is affected across all three models (gpt2, Qwen2-7B, Qwen3-4B) and how it interacts with the changes in PyTorch 2.12.
  3. Compare graph compilation: Run the same test under PyTorch 2.11 and 2.12 and compare graph breaks, guards, and recompilations to find what now triggers the fallback to eager.
  4. Rerun on an idle GPU: Confirm the assertion still fails once the memory-pressure failures are gone, to rule out environmental factors.
  5. Check vLLM's expectations: Verify that the 'yes'/'no' compilation status vLLM asserts on still matches PyTorch 2.12's intended behavior, in case the change is deliberate.
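For step 3, two torch versions cannot be loaded in one process, so one option is to dump graph-break reasons to JSON in each environment and diff them afterwards. This is a minimal sketch; the helper names and file names are hypothetical, and the reason strings are assumed to come from `torch._dynamo.explain(fn)(*args).break_reasons` (the `.reason` text of each entry).

```python
import json
from collections import Counter


def save_reasons(reasons, path):
    """Write graph-break reason strings collected under one torch version to JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(reasons), f, indent=2)


def load_reasons(path):
    """Read back the list written by save_reasons."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def new_break_reasons(old_reasons, new_reasons):
    """Reasons that occur more often under the new version.

    Counter subtraction keeps only positive differences, so reasons that
    disappeared or became rarer are dropped from the result.
    """
    return Counter(new_reasons) - Counter(old_reasons)


# Hypothetical usage, one step per environment:
#   (torch 2.11 env) save_reasons([r.reason for r in explanation.break_reasons], "breaks_2_11.json")
#   (torch 2.12 env) save_reasons([r.reason for r in explanation.break_reasons], "breaks_2_12.json")
#   then: print(new_break_reasons(load_reasons("breaks_2_11.json"), load_reasons("breaks_2_12.json")))
```

Expect some noise: the wording of break reasons can change between releases even when behavior does not, so the diff is a starting point for inspection rather than a verdict.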

Example

Without more detail on the implementation, no specific fix can be shown; the test_dynamic_shapes_compilation test cases and the backed_size_oblivious mode implementation are the best starting points.

Notes

The investigation should also consider interactions between CUDA 13.0 and PyTorch 2.12, along with any other environmental factors that might affect the behavior.

Recommendation

Identify and address the specific PyTorch 2.12 change that makes these graphs fall back to eager execution, since that change appears to be the root cause; start with backed_size_oblivious mode.
