pytorch - ✅(Solved) Fix TorchDynamo Produces Incorrect Output Shape When Model Uses Runtime Shape-Based Dictionary Lookups [3 pull requests, 2 comments, 3 participants]

GitHub stats
pytorch/pytorch#176862 · Fetched 2026-04-08 00:24:05
Comments: 2 · Participants: 3 · Timeline: 94 · Reactions: 0
Timeline (top): mentioned ×38, subscribed ×38, labeled ×6, referenced ×5

Fix Action

Fix / Workaround


PR fix notes

PR #176982: [dynamo] Fix incorrect output with torch.Size dict keys containing TensorVariable

Description (problem / solution / changelog)

Summary

Fixes #176862

Problem

When a model uses torch.tensor(x.shape[1:]) to build torch.Size keys for dictionary lookups, torch.compile produces an incorrect output and the compiled model returns the wrong shape.

Eager: [4, 32] to [4, 64]
Compiled: [4, 32] to [4, 32] (layer skipped)

Root Cause

During Dynamo tracing, torch.tensor(x.shape[1:]) creates a TensorVariable, and indexing it with [0] produces another TensorVariable. When torch.Size([TensorVariable, ConstantVariable(64)]) is used as a dict key in __contains__, the _HashableTracker comparison fails to match it against the concrete torch.Size([32, 64]) key. The if branch is therefore never taken and the nn.Linear is skipped, which makes the output wrong.

Fix

In ConstDictVariable.call_method for __contains__ (torch/_dynamo/variables/dicts.py), when a SizeVariable lookup fails and it contains TensorVariable elements, resolve each element to its concrete value with the FX node's example_value metadata, rebuild as a constant SizeVariable, and retry the lookup.
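
The retry-on-miss idea described above can be illustrated with a small pure-Python analogy. This is only a sketch, not Dynamo code: TracedScalar and contains_with_retry are hypothetical names, tuples stand in for torch.Size, and the example_value field stands in for the FX node metadata the PR mentions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedScalar:
    """Hypothetical stand-in for a traced scalar tensor (not a Dynamo class)."""
    example_value: int


def contains_with_retry(table, key):
    """Membership test that retries with concrete values when the raw key misses."""
    if key in table:
        return True
    if not any(isinstance(e, TracedScalar) for e in key):
        return False  # nothing to resolve, so the miss is genuine
    # Resolve each traced element to its concrete value, rebuild the key, retry.
    concrete = tuple(
        e.example_value if isinstance(e, TracedScalar) else e for e in key
    )
    return concrete in table


# Tuples play the role of torch.Size keys in this analogy.
layer_configs = {(32, 64): "linear_32_64", (64, 128): "linear_64_128"}
traced_key = (TracedScalar(32), 64)

print(traced_key in layer_configs)                     # raw lookup misses
print(contains_with_retry(layer_configs, traced_key))  # retry finds (32, 64)
```

The raw lookup misses because a TracedScalar never compares equal to a plain int, which mirrors the mismatch described in the root cause.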

Test

Reproduction script confirms fix:

# Before fix:
Eager output shape:    torch.Size([4, 64])
Compiled output shape: torch.Size([4, 32])  # Wrong

# After fix:
Eager output shape:    torch.Size([4, 64])
Compiled output shape: torch.Size([4, 64])  # Correct

Tested edge cases:

  • Multiple input feature sizes (32, 64, 128) all match
  • Non-matching key: the layer is correctly skipped in both eager and compiled
  • torch.allclose passes between eager and compiled

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo @ezyang @jansel

Changed files

  • test/dynamo/test_dicts.py (modified, +54/-0)
  • torch/_dynamo/variables/dicts.py (modified, +63/-9)

PR #177312: [dynamo] Fix torch.Size dict key lookup with TensorVariable elements

Description (problem / solution / changelog)

Summary

Fix #176862

Root cause problem

torch.tensor(x.shape[1:]) produces TensorVariable elements inside a traced SizeVariable. ConstDictVariable._HashableTracker then hashes and compares that runtime torch.Size against concrete dictionary keys using variable-tracker equality, so shape_key in self.layer_configs misses and the compiled branch is silently skipped.

Proposed fix

Teach _HashableTracker to recover a constant torch.Size when a SizeVariable is composed of Python constants or scalar TensorVariables backed by FX example_value metadata. Use that recovered key for hashing and equality, keep ConstDictVariable.__contains__ on the shared is_hashable() path, and add a regression test that covers both matching and non-matching dynamic shape lookups.
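
A rough pure-Python analogy of normalizing the key inside the hashing wrapper, rather than retrying after a miss, might look like the following. The classes TracedScalar and HashableKey are hypothetical illustrations and do not reproduce _HashableTracker; tuples again stand in for torch.Size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TracedScalar:
    """Hypothetical stand-in for a scalar traced value with a known example."""
    example_value: int


class HashableKey:
    """Hypothetical wrapper that hashes and compares by recovered constants."""

    def __init__(self, elements):
        self.elements = tuple(elements)

    def _canonical(self):
        # Recover a concrete tuple: traced scalars contribute their example value.
        return tuple(
            e.example_value if isinstance(e, TracedScalar) else e
            for e in self.elements
        )

    def __hash__(self):
        return hash(self._canonical())

    def __eq__(self, other):
        if not isinstance(other, HashableKey):
            return NotImplemented
        return self._canonical() == other._canonical()


# Dictionary keys are wrapped once; lookups wrap the (possibly traced) key too.
table = {HashableKey((32, 64)): "linear_32_64"}

matching = HashableKey((TracedScalar(32), 64))
non_matching = HashableKey((TracedScalar(16), 64))

print(matching in table)      # hits: canonical form is (32, 64)
print(non_matching in table)  # misses: canonical form is (16, 64)
```

Because hashing and equality both go through the canonical form, every dictionary operation sees the same normalized key, which is the design argument the PR makes for fixing the abstraction instead of a single call site.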

Why this is the right long term fix

The bug lives in Dynamo's dictionary-key abstraction, not in a single model pattern. Normalizing traced torch.Size keys inside _HashableTracker keeps dict lookup semantics aligned with eager behavior anywhere these keys flow through ConstDictVariable, while the regression test guards the behavior going forward.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo

Changed files

  • test/dynamo/test_dicts.py (modified, +108/-0)
  • torch/_dynamo/variables/dicts.py (modified, +135/-11)

PR #177313: [dynamo] Fix torch.Size dict lookups with tensor-backed keys

Description (problem / solution / changelog)

Summary

Fix #176862

Root Cause

ConstDictVariable._HashableTracker hashed and compared torch.Size keys using the raw TensorVariable elements inside them. When a lookup key was built from scalar tensor metadata, for example torch.tensor(x.shape[1:])[0], it no longer matched the concrete torch.Size keys stored in the dictionary, so Dynamo incorrectly skipped the branch.

The follow-up CI issue on this PR was that the new canonicalization path also tried to call .item() on any scalar tensor example_value. For non-constant fake tensors that is not safe: it can raise or over-specialize a data-backed lookup.

Proposed Fix

Canonicalize torch.Size keys inside _HashableTracker only when tensor-backed elements are proven concrete via scalar example_value.constant metadata. If the tensor element is not actually constant, return _MISSING and preserve the existing fallback behavior. Add a regression test that exercises a data-backed scalar tensor key across multiple inputs.

Why this is the right long term fix

The bug is in Dynamo's dict-key canonicalization layer, not in model-specific retry logic. Fixing hashing and equality at _HashableTracker keeps dictionary membership behavior consistent for torch.Size keys, stays narrowly scoped to the supported constant-backed case, and avoids unsafe .item() calls on non-constant fake tensors.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @chauhang @amjames @Lucaskabela @jataylo

Drafted via @codex, published after manual review by @bobrenjc93

Changed files

  • benchmarks/dynamo/ci_expected_accuracy/cpu_inductor_torchbench_inference.csv (modified, +3/-3)
  • test/dynamo/test_dicts.py (modified, +40/-0)
  • torch/_dynamo/variables/dicts.py (modified, +60/-4)


🐛 Describe the bug

Description: When compiling a model that uses torch.Size objects as dictionary keys for dynamic layer selection based on input shapes, torch.compile() produces incorrect output shapes. The compiled model returns a tensor with the wrong shape ([4, 32]) compared to the eager execution output shape ([4, 64]). This is a silent correctness issue.

Code:

import torch
import torch.nn as nn


class DynamicShapeModel(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain dicts keyed by torch.Size select a layer from the input shape.
        self.layer_configs = {
            torch.Size([32, 64]): nn.Linear(32, 64),
            torch.Size([64, 128]): nn.Linear(64, 128),
            torch.Size([128, 256]): nn.Linear(128, 256)
        }
        self.activation_functions = {
            torch.Size([32]): nn.ReLU(),
            torch.Size([64]): nn.Tanh(),
            torch.Size([128]): nn.Sigmoid()
        }

    def forward(self, x):
        # The key's first element comes from a tensor, not a Python int;
        # this is the pattern that triggers the mismatch under torch.compile.
        current_shape = torch.tensor(x.shape[1:])
        shape_key = torch.Size([current_shape[0], 64])
        if shape_key in self.layer_configs:
            x = self.layer_configs[shape_key](x)
        # This key is built from x.shape directly.
        activation_shape = torch.Size([x.shape[1]])
        if activation_shape in self.activation_functions:
            x = self.activation_functions[activation_shape](x)
        return x


def get_default_model():
    return DynamicShapeModel()


def get_sample_inputs():
    batch_size = 4
    features = 32
    x = torch.randn(batch_size, features)
    return (x,)


def main():
    model = get_default_model()
    model.train()
    inputs = get_sample_inputs()
    print(f"input shape: {inputs[0].shape}")
    # Eager reference run.
    original_output = model(*inputs)
    print(f"output shape: {original_output.shape}")
    # Compiled run; its output shape should match the eager one.
    compiled_model = torch.compile(model)
    with torch.no_grad():
        compiled_output = compiled_model(*inputs)
    print(f"compile output shape: {compiled_output.shape}")


if __name__ == '__main__':
    main()

output:

input shape: torch.Size([4, 32])
output shape: torch.Size([4, 64])
compile output shape: torch.Size([4, 32])

Versions

Environment Information PyTorch Build Details:

PyTorch version: 2.10.0.dev20251124+cpu

Is debug build: False

CUDA used to build PyTorch: Could not collect

ROCM used to build PyTorch: N/A

OS and Compilers:

OS: Ubuntu 24.04.1 LTS (x86_64)

GCC version: (Ubuntu 10.5.0-4ubuntu2) 10.5.0

Clang version: 18.1.3 (1)

CMake version: version 3.28.3

Libc version: glibc-2.39

Python Environment:

Python version: 3.12.3 (main, Nov 6 2025, 13:44:16) [GCC 13.3.0] (64-bit runtime)

Python platform: Linux-6.14.0-36-generic-x86_64-with-glibc2.39

Is CUDA available: False

CUDA runtime version: Could not collect

CUDA_MODULE_LOADING set to: N/A

GPU Information:

GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Laptop GPU

Nvidia driver version: 580.95.05

cuDNN version: Could not collect

Is XPU available: False

HIP runtime version: N/A

MIOpen runtime version: N/A

Is XNNPACK available: True

Caching allocator config: N/A

CPU Information:

Architecture: x86_64

CPU op-mode(s): 32-bit, 64-bit

Address sizes: 39 bits physical, 48 bits virtual

Byte Order: Little Endian

CPU(s): 32

On-line CPU(s) list: 0-31

Vendor ID: GenuineIntel

Model name: Intel(R) Core(TM) i9-14900HX

CPU family: 6

Model: 183

Thread(s) per core: 2

Core(s) per socket: 24

Socket(s): 1

Stepping: 1

CPU(s) scaling MHz: 33%

CPU max MHz: 5800.0000

CPU min MHz: 800.0000

BogoMIPS: 4838.40

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdseed adx smap clflushopt clwb intel_pt sha_ni xsaveopt xsavec xgetbv1 xsaves split_lock_detect user_shstk avx_vnni dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req hfi vnmi umip pku ospke waitpkg gfni vaes vpclmulqdq rdpid movdiri movdir64b fsrm md_clear serialize arch_lbr ibt flush_l1d arch_capabilities

Virtualization: VT-x

L1d cache: 896 KiB (24 instances)

L1i cache: 1.3 MiB (24 instances)

L2 cache: 32 MiB (12 instances)

L3 cache: 36 MiB (1 instance)

NUMA node(s): 1

NUMA node0 CPU(s): 0-31

Vulnerability Gather data sampling: Not affected

Vulnerability Ghostwrite: Not affected

Vulnerability Indirect target selection: Not affected

Vulnerability Itlb multihit: Not affected

Vulnerability L1tf: Not affected

Vulnerability Mds: Not affected

Vulnerability Meltdown: Not affected

Vulnerability Mmio stale data: Not affected

Vulnerability Reg file data sampling: Mitigation; Clear Register File

Vulnerability Retbleed: Not affected

Vulnerability Spec rstack overflow: Not affected

Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl

Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization

Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI BHI_DIS_S

Vulnerability Srbds: Not affected

Vulnerability Tsa: Not affected

Vulnerability Tsx async abort: Not affected

Vulnerability Vmscape: Mitigation; IBPB before exit to userspace

Versions of Relevant Libraries:

[pip3] numpy==2.3.3

[pip3] nvidia-cublas-cu12==12.1.3.1

[pip3] nvidia-cuda-cupti-cu12==12.1.105

[pip3] nvidia-cuda-nvrtc-cu12==12.1.105

[pip3] nvidia-cuda-runtime-cu12==12.1.105

[pip3] nvidia-cudnn-cu12==9.1.0.70

[pip3] nvidia-cufft-cu12==11.0.2.54

[pip3] nvidia-curand-cu12==10.3.2.106

[pip3] nvidia-cusolver-cu12==11.4.5.107

[pip3] nvidia-cusparse-cu12==12.1.0.106

[pip3] nvidia-nccl-cu12==2.21.5

[pip3] nvidia-nvjitlink-cu12==12.9.86

[pip3] nvidia-nvtx-cu12==12.1.105

[pip3] optree==0.18.0

[pip3] pytorch-triton-rocm==3.5.0

[pip3] torch==2.10.0.dev20251124+cpu

[pip3] torchao==0.15.0.dev20251124+cpu

[pip3] torchdata==0.12.0.dev20250909+cpu

[pip3] torchtext==0.17.0.dev20240912+cpu

[pip3] triton==3.1.0

[conda] Could not collect

cc @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @kadeng @amjames @Lucaskabela @jataylo

extent analysis

Fix Plan

Problem Summary

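Under torch.compile(), the shape_key in self.layer_configs lookup misses, so the nn.Linear layer is skipped and the compiled output has shape [4, 32] instead of [4, 64].

Root Cause Analysis

According to the linked PRs, torch.tensor(x.shape[1:])[0] is traced as a TensorVariable. A torch.Size key that contains it does not hash or compare equal to the concrete torch.Size([32, 64]) key stored in layer_configs, so Dynamo treats the membership test as false.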

Fix Plan

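Step 1: Update shape_key calculation

As a user-side workaround until a Dynamo fix lands, build shape_key from x.shape directly. x.shape[1] is a plain Python int, so the key contains no tensor-backed element and should match the concrete keys in layer_configs. Note that x.shape[1:] has a single element for the 2-D sample input, so only index 0 of it is valid.

def forward(self, x):
    # Use the Python int from x.shape instead of an element of torch.tensor(x.shape[1:]).
    shape_key = torch.Size([x.shape[1], 64])
    if shape_key in self.layer_configs:
        x = self.layer_configs[shape_key](x)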

Step 2: Keep activation_shape calculation

The activation_shape lookup already builds its key from x.shape[1] and needs no change. With the shape_key workaround applied, the complete forward method reads:

def forward(self, x):
    # Key built from a Python int rather than a tensor element.
    shape_key = torch.Size([x.shape[1], 64])
    if shape_key in self.layer_configs:
        x = self.layer_configs[shape_key](x)
    activation_shape = torch.Size([x.shape[1]])
    if activation_shape in self.activation_functions:
        x = self.activation_functions[activation_shape](x)
    return x

Step 3: Re-run main()

main() itself needs no change. Re-run it against the updated DynamicShapeModel class and check that the eager and compiled output shapes agree.

def main():
    model = get_default_model()
    model.train()
    inputs = get_sample_inputs()
    print(f"input shape: {inputs[0].shape}")
    original_output = model(*inputs)
    print(f"output shape: {original_output.shape}")
    compiled_model = torch.compile(model)
    with torch.no_grad():
        compiled_output = compiled_model(*inputs)
    print(f"compile output shape: {compiled_output.shape}")
