pytorch - 💡(How to fix) Fix capture_dynamic_output_shape_ops=True crashes 8/50 HuggingFace models with GuardOnDataDependentSymNode [1 participants]

penguinwu · 2026-04-16T19:18:53Z

Error Message

GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 0)

All 8 crashes with capture_dynamic_output_shape_ops (CDO) produce this same error, which suggests a single fix could resolve all cases.
6 of the 8 CDO-crashing models also crash with capture_scalar_outputs (CSO), but with a different error (PendingUnbackedSymbolNotFound, see pytorch/pytorch#180595); the other 2 crash only with CDO.
CDO and CSO crash independently with different error types, so these are separate code paths.

Root Cause

The crash-to-benefit ratio is 4:1 (8 crashes vs 2 benefits), worse than capture_scalar_outputs. However, when CDO does work, the benefit is much larger — reducing graph breaks by 6-7 per model vs 1-2 for CSO.

Key observations:

All 8 crashes share the exact same error pattern (Eq(u0, 0) guard), suggesting a single fix could resolve all cases
All crashes are in VL/multimodal models — decoder-only, encoder-decoder, and vision models are unaffected
The 2 models where CDO helps (Phi4MultimodalAudio, Aria) are also multimodal, but with simpler dynamic shape patterns
CDO and CSO crash independently with different error types — these are separate code paths

Recommendation: If the Eq(u0, 0) guard issue is resolved, this flag could eliminate significant graph breaks in multimodal models. The benefit magnitude (6-7 fewer breaks when it works) justifies fixing the crashes.

Summary

Setting torch._dynamo.config.capture_dynamic_output_shape_ops = True causes GuardOnDataDependentSymNode crashes on 17% of testable models (8 of 48; 50 models were sampled), while providing measurable benefit (fewer graph breaks) on only 4% (2 of 48). All crashes occur in vision-language models with data-dependent control flow.

Environment

PyTorch: 2.12.0.dev20260408+cu128 (nightly)
transformers: 5.5.0
GPU: NVIDIA PG509-210
CUDA: 12.8

Methodology

We tested a stratified random sample of 50 models drawn from 226 HuggingFace models that have graph breaks under torch.compile. The sample was stratified across 4 architecture types to ensure representative coverage:

Stratum	Sample Size	From Population
VL/Multimodal	15	50
Decoder/LLM	15	33
Encoder-Decoder	10	58
Vision/Other	10	85

Each model was compiled 3 times using a lightweight counting backend:

1. Baseline: no flags (default config)
2. +capture_scalar_outputs: tested separately (see pytorch/pytorch#180595)
3. +capture_dynamic_output_shape_ops: torch._dynamo.config.capture_dynamic_output_shape_ops = True

We measured subgraph count per run and classified each flag's effect as: reduces_graphs, no_effect, increases_graphs, crashes, fixes_crash, both_crash, or create_error.
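The classification rules are not spelled out in the report. One plausible reading, sketched below, derives each label from whether the model could be created and from the graph count of each run, with `None` standing for a run that crashed. The function name and argument layout are assumptions made for illustration.

```python
from typing import Optional


def classify_effect(created: bool, baseline: Optional[int], with_flag: Optional[int]) -> str:
    """Map one model's runs to an effect label.

    created   -- False if model setup failed before any compilation
    baseline  -- graph count without the flag, or None if that run crashed
    with_flag -- graph count with the flag, or None if that run crashed
    """
    if not created:
        return "create_error"
    if baseline is None and with_flag is None:
        return "both_crash"
    if baseline is None:
        return "fixes_crash"
    if with_flag is None:
        return "crashes"
    if with_flag < baseline:
        return "reduces_graphs"
    if with_flag > baseline:
        return "increases_graphs"
    return "no_effect"


# Graph counts taken from the result tables in this report.
print(classify_effect(True, 7, 1))      # Phi4MultimodalAudioModel
print(classify_effect(True, 26, None))  # Qwen3VLForConditionalGeneration
```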

Why only graph_break models? Models that already pass fullgraph=True have no graph breaks, and fullgraph=True implicitly captures dynamic output shape ops. These flags only matter for models that graph-break.

Results

Effect	Count	% of Testable*
No effect	38	79%
Crashes	8	17%
Reduces graphs	2	4%
Both crash (baseline + flag)	1	—
Create error (model setup)	1	—

*Testable = excluding create_error and both_crash (48 models)
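The percentages in the table use the 48 testable models as the denominator, not the 50 sampled. A short check of that arithmetic, using only the counts from the table:

```python
counts = {
    "no_effect": 38,
    "crashes": 8,
    "reduces_graphs": 2,
    "both_crash": 1,
    "create_error": 1,
}

sampled = sum(counts.values())
# Models that never produced a usable baseline are excluded from the denominator.
testable = sampled - counts["both_crash"] - counts["create_error"]

pct_of_testable = {
    effect: round(100 * counts[effect] / testable)
    for effect in ("no_effect", "crashes", "reduces_graphs")
}
crash_pct_of_sampled = round(100 * counts["crashes"] / sampled)

print(sampled, testable, pct_of_testable, crash_pct_of_sampled)
```

Against all 50 sampled models the crash share is 16%, which is why the 17% figure should be read as a share of testable models.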

Models where `capture_dynamic_output_shape_ops` helps (2 models)

Model	Baseline Graphs	With Flag	Reduction
Phi4MultimodalAudioModel	7	1	-6
AriaForConditionalGeneration	29	22	-7

When it works, the benefit is substantial — reducing graph breaks by 6-7 per model.

Models where `capture_dynamic_output_shape_ops` crashes (8 models)

All crashes produce the same error: GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 0)

Model	Baseline Graphs	Error
Ernie4_5_VLMoeForConditionalGeneration	18	`GuardOnDataDependentSymNode: Could not guard on data-dependent expression Eq(u0, 0)`
Ernie4_5_VL_MoeForConditionalGeneration	18	same
Glm46VForConditionalGeneration	24	same
Glm4vForConditionalGeneration	24	same
GlmOcrVisionModel	9	same
Qwen3VLForConditionalGeneration	26	same
VideoLlama3VisionModel	8	same
Qwen3OmniMoeThinkerForConditionalGeneration	25	same

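Pattern: All 8 models are VL/multimodal models. In Eq(u0, 0), u0 is an unbacked symbol, that is, a size whose value depends on tensor data and is unknown at trace time. The consistent Eq(u0, 0) guard failure therefore suggests that these models branch on whether a data-dependent size is zero, which the tracer cannot resolve.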
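To make the expression concrete, the sketch below shows a hypothetical toy function, not taken from any of the eight models, whose branch asks the tracer to decide whether a data-dependent length is zero. It only reports what happens: depending on the PyTorch version and on where the check lives, Dynamo may raise or may fall back to a graph break for a Python-level branch like this one, and the crashes reported here may originate elsewhere, for example inside op implementations.

```python
import torch
import torch._dynamo


def branch_on_count(x):
    idx = torch.nonzero(x)  # length depends on the data: traced as an unbacked symbol
    if idx.shape[0] == 0:   # deciding this requires knowing whether that symbol equals 0
        return x.new_zeros(())
    return idx.float().sum()


def try_compile(fn, *args):
    """Return "ok" or the exception class name; the outcome is version-dependent."""
    torch._dynamo.reset()
    try:
        with torch._dynamo.config.patch(capture_dynamic_output_shape_ops=True):
            torch.compile(fn, backend="eager")(*args)
        return "ok"
    except Exception as exc:  # broad on purpose: the exception type varies by version
        return type(exc).__name__


outcome = try_compile(branch_on_count, torch.tensor([0.0, 3.0, 0.0, 5.0]))
print(outcome)
```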

Overlap with `capture_scalar_outputs` crashes

6 of the 8 CDO-crashing models also crash with capture_scalar_outputs (different error: PendingUnbackedSymbolNotFound, see pytorch/pytorch#180595). 2 models crash only with CDO:

Qwen3VLForConditionalGeneration (CDO-only crash)
Qwen3OmniMoeThinkerForConditionalGeneration (CDO-only crash; CSO increases graphs by 1)

Reproduction

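import torch
import torch._dynamo
from transformers import AutoModelForImageTextToText, AutoProcessor

torch._dynamo.config.capture_dynamic_output_shape_ops = True

# Example: Qwen3VLForConditionalGeneration
# Checkpoint id as given in the report. It names a Qwen2.5-VL model, so substitute
# a Qwen3-VL checkpoint to exercise Qwen3VLForConditionalGeneration itself.
model_id = "Qwen/Qwen2.5-VL-3B"
model = AutoModelForImageTextToText.from_pretrained(model_id, trust_remote_code=True)
model = model.eval().cuda()

# `inputs` is not defined in the report; a text-only batch from the processor is
# assumed here, and the crash may require image inputs that reach the vision path.
processor = AutoProcessor.from_pretrained(model_id)
inputs = processor(text=["Hello"], return_tensors="pt").to("cuda")

compiled = torch.compile(model, backend="eager")
with torch.no_grad():
    compiled(**inputs)  # Crashes with GuardOnDataDependentSymNode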

Data

Results from the OSS Model Graph Break Corpus — a systematic catalog of torch.compile graph breaks across 716 HuggingFace models.

cc @chauhang @ezyang @bobrenjc93 @aditvenk @laithsakka @williamwen42 @jansel

extent analysis

TL;DR

Setting torch._dynamo.config.capture_dynamic_output_shape_ops = True may cause crashes in vision-language models, but resolving the Eq(u0, 0) guard issue could significantly reduce graph breaks.

Guidance

Identify the specific models that crash with capture_dynamic_output_shape_ops and verify if they are vision-language models with data-dependent control flow.
Investigate the Eq(u0, 0) guard failure and potential fixes, as all 8 crashes share the same error pattern.
Consider testing capture_dynamic_output_shape_ops with a smaller set of models to reproduce the issue and validate potential fixes.
Review the OSS Model Graph Break Corpus for similar issues and potential solutions.

Example

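import torch
import torch._dynamo
from transformers import AutoModelForImageTextToText, AutoProcessor

# Set up a test model. The checkpoint id is as given in the report; it names a
# Qwen2.5-VL model, so substitute a Qwen3-VL checkpoint to test that class.
model_id = "Qwen/Qwen2.5-VL-3B"
model = AutoModelForImageTextToText.from_pretrained(model_id, trust_remote_code=True)
model = model.eval().cuda()

# `inputs` is not defined in the report; a text-only batch is assumed here.
processor = AutoProcessor.from_pretrained(model_id)
inputs = processor(text=["Hello"], return_tensors="pt").to("cuda")

# Test with capture_dynamic_output_shape_ops
torch._dynamo.config.capture_dynamic_output_shape_ops = True
compiled = torch.compile(model, backend="eager")
with torch.no_grad():
    try:
        compiled(**inputs)  # May crash with GuardOnDataDependentSymNode
    except Exception as e:
        print(f"Error: {type(e).__name__}: {e}")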

Notes

The provided information suggests that resolving the Eq(u0, 0) guard issue could significantly reduce graph breaks in multimodal models. However, the crash-to-benefit ratio is currently 4:1, indicating that more work is needed to make capture_dynamic_output_shape_ops reliable.

Recommendation

Leave capture_dynamic_output_shape_ops disabled for vision-language models until the Eq(u0, 0) guard issue is resolved: in this sample the flag crashed 8 models and helped only 2.
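One way to apply this recommendation is a small denylist keyed on the model class name, sketched below. The set contains exactly the eight classes listed as crashing in this report; the helper name `should_enable_cdo` is hypothetical, and classes outside the tested sample are not guaranteed to be safe.

```python
# Classes reported as crashing with capture_dynamic_output_shape_ops = True.
KNOWN_CDO_CRASHERS = frozenset({
    "Ernie4_5_VLMoeForConditionalGeneration",
    "Ernie4_5_VL_MoeForConditionalGeneration",
    "Glm46VForConditionalGeneration",
    "Glm4vForConditionalGeneration",
    "GlmOcrVisionModel",
    "Qwen3VLForConditionalGeneration",
    "VideoLlama3VisionModel",
    "Qwen3OmniMoeThinkerForConditionalGeneration",
})


def should_enable_cdo(model_class_name: str) -> bool:
    """Return False for classes known to crash with the flag in this report."""
    return model_class_name not in KNOWN_CDO_CRASHERS


print(should_enable_cdo("AriaForConditionalGeneration"))
print(should_enable_cdo("Qwen3VLForConditionalGeneration"))
```

In a compile script, the result could be assigned to torch._dynamo.config.capture_dynamic_output_shape_ops before calling torch.compile, passing type(model).__name__ as the argument.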
