pytorch: capture_dynamic_output_shape_ops=True crashes 8/50 HuggingFace models with GuardOnDataDependentSymNode

Source: pytorch/pytorch#180596 (fetched 2026-04-17 08:26:06)

Summary

Setting torch._dynamo.config.capture_dynamic_output_shape_ops = True causes GuardOnDataDependentSymNode crashes on 17% of testable models (8 of 48; 8 of the 50 sampled), while providing a measurable benefit (fewer graph breaks) on only 4% (2 of 48). All crashes occur in vision-language models with data-dependent control flow.

Environment

  • PyTorch: 2.12.0.dev20260408+cu128 (nightly)
  • transformers: 5.5.0
  • GPU: NVIDIA PG509-210
  • CUDA: 12.8

Methodology

We tested a stratified random sample of 50 models drawn from 226 HuggingFace models that have graph breaks under torch.compile. The sample was stratified across 4 architecture types to ensure representative coverage:

| Stratum | Sample Size | From Population |
|---|---|---|
| VL/Multimodal | 15 | 50 |
| Decoder/LLM | 15 | 33 |
| Encoder-Decoder | 10 | 58 |
| Vision/Other | 10 | 85 |
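
A stratified draw like the one tabulated above can be sketched as follows. The sample and population sizes come from the table; the model names are placeholders, and the `stratified_sample` helper and its fixed seed are assumptions for illustration, not the corpus's actual sampling script.

```python
import random

# (sample size, population size) per stratum, taken from the table above.
STRATA = {
    "VL/Multimodal": (15, 50),
    "Decoder/LLM": (15, 33),
    "Encoder-Decoder": (10, 58),
    "Vision/Other": (10, 85),
}


def stratified_sample(populations, sizes, seed=0):
    """Draw sizes[name] items without replacement from each stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    return {
        name: rng.sample(models, sizes[name])
        for name, models in populations.items()
    }


# Placeholder model names; the real corpus entries are not reproduced here.
populations = {
    name: [f"{name}-model-{i}" for i in range(pop)]
    for name, (_, pop) in STRATA.items()
}
sizes = {name: n for name, (n, _) in STRATA.items()}

sample = stratified_sample(populations, sizes)
total_sampled = sum(len(v) for v in sample.values())
total_population = sum(len(v) for v in populations.values())
print(total_sampled, total_population)
```

Sampling each stratum separately keeps the architecture mix fixed regardless of the seed, which is the point of stratifying.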

Each model was compiled 3 times using a lightweight counting backend:

  1. Baseline: no flags (default config)
  2. +capture_scalar_outputs: (tested separately, see pytorch/pytorch#180595)
  3. +capture_dynamic_output_shape_ops: torch._dynamo.config.capture_dynamic_output_shape_ops = True
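
The report does not include its counting backend. A minimal sketch of one, assuming the backend only needs to count how many FX graphs Dynamo hands it, might look like this; `GraphCounter` and the toy function `f` are hypothetical names used only for illustration.

```python
import torch
import torch._dynamo


class GraphCounter:
    """Backend that counts captured graphs and runs them unmodified."""

    def __init__(self):
        self.graphs = 0

    def __call__(self, gm, example_inputs):
        self.graphs += 1   # one call per captured subgraph
        return gm.forward  # execute the graph as captured, without optimizing


def f(x):
    y = x.sin()
    torch._dynamo.graph_break()  # force a break so two subgraphs are captured
    return y.cos()


counter = GraphCounter()
compiled = torch.compile(f, backend=counter)
compiled(torch.randn(4))
print(counter.graphs)
```

Because the backend returns the graph's own `forward`, the run costs little beyond tracing, which suits a sweep over many models.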

We measured subgraph count per run and classified each flag's effect as: reduces_graphs, no_effect, increases_graphs, crashes, fixes_crash, both_crash, or create_error.
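
The seven labels can be derived from two subgraph counts per model. The sketch below is one plausible reading of the labels, not the report's actual script: it assumes a crashed run is recorded as `None` and that model-setup failures are flagged separately. `classify_effect` and its parameters are hypothetical names.

```python
from typing import Optional


def classify_effect(
    baseline: Optional[int],
    with_flag: Optional[int],
    setup_failed: bool = False,
) -> str:
    """Map baseline and flagged subgraph counts to an effect label.

    None stands for a run that crashed (an assumption of this sketch).
    """
    if setup_failed:
        return "create_error"
    if baseline is None and with_flag is None:
        return "both_crash"
    if with_flag is None:
        return "crashes"
    if baseline is None:
        return "fixes_crash"
    if with_flag < baseline:
        return "reduces_graphs"
    if with_flag > baseline:
        return "increases_graphs"
    return "no_effect"


# Counts taken from the results tables in this report.
print(classify_effect(7, 1))      # Phi4MultimodalAudioModel
print(classify_effect(26, None))  # Qwen3VLForConditionalGeneration
```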

Why only graph_break models? Models that already pass fullgraph=True have no graph breaks, and fullgraph=True implicitly captures dynamic output shape ops. These flags only matter for models that graph-break.

Results

| Effect | Count | % of Testable* |
|---|---|---|
| No effect | 38 | 79% |
| Crashes | 8 | 17% |
| Reduces graphs | 2 | 4% |
| Both crash (baseline + flag) | 1 | n/a |
| Create error (model setup) | 1 | n/a |

*Testable = excluding create_error and both_crash (48 models)
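
The percentages follow from the counts once the two untestable models are excluded. A short check, using only the numbers in the table (the variable names are illustrative):

```python
# Counts per effect, copied from the results table.
counts = {
    "no_effect": 38,
    "crashes": 8,
    "reduces_graphs": 2,
    "both_crash": 1,
    "create_error": 1,
}

sampled = sum(counts.values())
# Testable models exclude setup failures and baseline crashes.
testable = sampled - counts["both_crash"] - counts["create_error"]

pct_of_testable = {
    effect: round(100 * counts[effect] / testable)
    for effect in ("no_effect", "crashes", "reduces_graphs")
}
crash_pct_of_sampled = round(100 * counts["crashes"] / sampled)

print(sampled, testable, pct_of_testable, crash_pct_of_sampled)
```

The 17% crash rate is therefore relative to the 48 testable models; relative to all 50 sampled models it is 16%.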

Models where capture_dynamic_output_shape_ops helps (2 models)

| Model | Baseline Graphs | With Flag | Reduction |
|---|---|---|---|
| Phi4MultimodalAudioModel | 7 | 1 | -6 |
| AriaForConditionalGeneration | 29 | 22 | -7 |

When it works, the benefit is substantial — reducing graph breaks by 6-7 per model.

Models where capture_dynamic_output_shape_ops crashes (8 models)

All crashes produce the same error: GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 0)

| Model | Baseline Graphs | Error |
|---|---|---|
| Ernie4_5_VLMoeForConditionalGeneration | 18 | GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 0) |
| Ernie4_5_VL_MoeForConditionalGeneration | 18 | same |
| Glm46VForConditionalGeneration | 24 | same |
| Glm4vForConditionalGeneration | 24 | same |
| GlmOcrVisionModel | 9 | same |
| Qwen3VLForConditionalGeneration | 26 | same |
| VideoLlama3VisionModel | 8 | same |
| Qwen3OmniMoeThinkerForConditionalGeneration | 25 | same |

Pattern: All 8 models are VL/multimodal models. The consistent Eq(u0, 0) guard failure suggests these models have conditional branches on dynamic shape dimensions that the tracer cannot resolve.

Overlap with capture_scalar_outputs crashes

6 of the 8 models that crash with capture_dynamic_output_shape_ops (CDO) also crash with capture_scalar_outputs (CSO), although with a different error: PendingUnbackedSymbolNotFound (see pytorch/pytorch#180595). 2 models crash only with CDO:

  • Qwen3VLForConditionalGeneration (CDO-only crash)
  • Qwen3OmniMoeThinkerForConditionalGeneration (CDO-only crash; CSO increases graphs by 1)
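
The overlap can be recomputed from the model names listed in this report. The set names below are illustrative; the membership comes from the crash table and the two CDO-only entries above.

```python
# The 8 models that crash with capture_dynamic_output_shape_ops (CDO).
cdo_crashes = {
    "Ernie4_5_VLMoeForConditionalGeneration",
    "Ernie4_5_VL_MoeForConditionalGeneration",
    "Glm46VForConditionalGeneration",
    "Glm4vForConditionalGeneration",
    "GlmOcrVisionModel",
    "Qwen3VLForConditionalGeneration",
    "VideoLlama3VisionModel",
    "Qwen3OmniMoeThinkerForConditionalGeneration",
}

# The 2 models reported as crashing only with CDO.
cdo_only = {
    "Qwen3VLForConditionalGeneration",
    "Qwen3OmniMoeThinkerForConditionalGeneration",
}

# Everything else in the CDO crash set also crashes with capture_scalar_outputs.
crash_with_both = cdo_crashes - cdo_only
print(sorted(crash_with_both))
```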

Reproduction

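import torch
import torch._dynamo

# Enable the flag under test before compiling.
torch._dynamo.config.capture_dynamic_output_shape_ops = True

# Example: Qwen3VLForConditionalGeneration
# Note: the checkpoint ID below names Qwen2.5-VL while the heading names Qwen3VL;
# the Auto class and checkpoint ID are kept exactly as given in the report.
from transformers import AutoModelForConditionalGeneration
model = AutoModelForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-3B", trust_remote_code=True)
model = model.eval().cuda()

# `inputs` is not defined in the report; it stands for a dict of model inputs
# already on the GPU (for example, the output of the model's processor).
compiled = torch.compile(model, backend="eager")
with torch.no_grad():
    compiled(**inputs)  # Crashes with GuardOnDataDependentSymNode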

Analysis

The crash-to-benefit ratio is 4:1 (8 crashes vs 2 benefits), worse than capture_scalar_outputs. However, when CDO does work, the benefit is much larger — reducing graph breaks by 6-7 per model vs 1-2 for CSO.

Key observations:

  • All 8 crashes share the exact same error pattern (Eq(u0, 0) guard), suggesting a single fix could resolve all cases
  • All crashes are in VL/multimodal models — decoder-only, encoder-decoder, and vision models are unaffected
  • The 2 models where CDO helps (Phi4MultimodalAudio, Aria) are also multimodal, but with simpler dynamic shape patterns
  • CDO and CSO crash independently with different error types — these are separate code paths

Recommendation: If the Eq(u0, 0) guard issue is resolved, this flag could eliminate significant graph breaks in multimodal models. The benefit magnitude (6-7 fewer breaks when it works) justifies fixing the crashes.

Data

Results from the OSS Model Graph Break Corpus — a systematic catalog of torch.compile graph breaks across 716 HuggingFace models.

cc @chauhang @ezyang @bobrenjc93 @aditvenk @laithsakka @williamwen42 @jansel

extent analysis

TL;DR

Setting torch._dynamo.config.capture_dynamic_output_shape_ops = True may cause crashes in vision-language models, but resolving the Eq(u0, 0) guard issue could significantly reduce graph breaks.

Guidance

  • Identify the specific models that crash with capture_dynamic_output_shape_ops and verify if they are vision-language models with data-dependent control flow.
  • Investigate the Eq(u0, 0) guard failure and potential fixes, as all 8 crashes share the same error pattern.
  • Consider testing capture_dynamic_output_shape_ops with a smaller set of models to reproduce the issue and validate potential fixes.
  • Review the OSS Model Graph Break Corpus for similar issues and potential solutions.

Example

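import torch
import torch._dynamo

# Set up a test model (Auto class and checkpoint ID kept as given in the report)
from transformers import AutoModelForConditionalGeneration
model = AutoModelForConditionalGeneration.from_pretrained("Qwen/Qwen2.5-VL-3B", trust_remote_code=True)
model = model.eval().cuda()

# Test with capture_dynamic_output_shape_ops
torch._dynamo.config.capture_dynamic_output_shape_ops = True
compiled = torch.compile(model, backend="eager")

# `inputs` is not defined in the report; it stands for a dict of model inputs
# already on the GPU (for example, the output of the model's processor).
with torch.no_grad():
    try:
        compiled(**inputs)  # May crash with GuardOnDataDependentSymNode
    except Exception as e:
        # Print the exception type as well, so different failures can be told apart.
        print(f"Error: {type(e).__name__}: {e}")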

Notes

The provided information suggests that resolving the Eq(u0, 0) guard issue could significantly reduce graph breaks in multimodal models. However, the crash-to-benefit ratio is currently 4:1, indicating that more work is needed to make capture_dynamic_output_shape_ops reliable.

Recommendation

Until the Eq(u0, 0) guard issue is resolved, leave capture_dynamic_output_shape_ops disabled for vision-language models: in this sample, the flag crashed 8 models and helped only 2.
