pytorch - ✅(Solved) Fix `GuardOnDataDependentSymNode` when `repeat_interleave` is called with an Unbacked 0-D `SymInt` [1 pull request, 1 participant]

pytorch · 2026-03-09 21:48:10


GitHub stats

pytorch/pytorch#176937 · Fetched 2026-04-08 00:23:49

Author: rwkeane
Participants: rwkeane
Timeline (top): mentioned ×6, subscribed ×6, labeled ×3, referenced ×3

Error Message

GuardOnDataDependentSymNode: Could not guard on data-dependent expression u0 >= 0 (unhinted: u0 >= 0). (Size-like symbols: none)

Raised by backend='inductor' (caused by _functorch/_aot_autograd/graph_capture_wrappers.py:1375 in functional_call) while executing %repeat_interleave, i.e. `torch.zeros(x.shape[0]).repeat_interleave(target_size)` in fn_0d. The 1-D variant of the same repro succeeded with out.shape=torch.Size([12]).


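Fix / Workaround

Workaround: pass the repeat count as a one-element 1-D tensor (`n.unsqueeze(0)`) instead of the scalar returned by `n.item()`. This routes the call through the Tensor-repeats overload of `repeat_interleave`, which already handles the data-dependent size symbolically. In the reporter's run the 1-D variant compiled and printed:

CONFIRMED (1-D workaround): succeeded, out.shape=torch.Size([12])

The underlying fix, making the scalar overload's checks symbolic, is proposed in PR #177305 below.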
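The workaround changes which overload handles the call, not the result. A minimal eager-mode sketch (no compilation involved; `repeat_scalar` and `repeat_tensor` are hypothetical helper names) showing that the scalar form and the one-element-tensor form produce identical output:

```python
import torch


def repeat_scalar(x: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    # Failing form from the issue: n.item() is a Python int in eager mode
    # (an unbacked SymInt under torch.compile) and selects the int/SymInt overload.
    return x.repeat_interleave(n.item())


def repeat_tensor(x: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    # Workaround form: a one-element 1-D repeats tensor is broadcast along the
    # input, so every element is still repeated n times.
    return x.repeat_interleave(n.unsqueeze(0))


x, n = torch.arange(4.0), torch.tensor(3)
assert torch.equal(repeat_scalar(x, n), repeat_tensor(x, n))
print(repeat_tensor(x, n).shape)  # torch.Size([12])
```

Because a one-element repeats tensor is broadcast, the 1-D form is a drop-in replacement when the count is a single value.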


CPU:
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  12
On-line CPU(s) list:                     0-11
Vendor ID:                               AuthenticAMD
Model name:                              AMD Ryzen 5 7600X 6-Core Processor
CPU family:                              25
Model:                                   97
Thread(s) per core:                      2
Core(s) per socket:                      6
Socket(s):                               1
Stepping:                                2
Frequency boost:                         enabled
CPU(s) scaling MHz:                      87%
CPU max MHz:                             5457.1050
CPU min MHz:                             427.3640
BogoMIPS:                                9381.76
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
Virtualization:                          AMD-V
L1d cache:                               192 KiB (6 instances)
L1i cache:                               192 KiB (6 instances)
L2 cache:                                6 MiB (6 instances)
L3 cache:                                32 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-11
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Mitigation; Safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Mitigation; Clear CPU buffers
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

PR fix notes

PR #177305: Fix unbacked scalar repeat_interleave guards

Repository: pytorch/pytorch
Author: bobrenjc93
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/177305

Description (problem / solution / changelog)

Fix #176937

Root cause

The scalar repeat_interleave SymInt overload validates repeats with TORCH_CHECK(repeats >= 0) and validates output_size by forcing (repeats * size).guard_int(). That eager-only path tries to concretize an unbacked 0-D SymInt produced by .item(), so Dynamo/Inductor hits an early guard even though the tensor overload already handles the same constraints symbolically.
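To make the failure mode concrete, here is a toy, pure-Python model of the difference the PR describes. Everything in it (`UnbackedInt`, `SymBool`, `eager_check`, `Tracer`) is hypothetical and far simpler than PyTorch's real symbolic-shape machinery: an eager-style check has to turn the condition into a Python bool during tracing, which is impossible for an unbacked symbol such as u0, whereas a symbolic check is recorded and evaluated only once the real value exists.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


class DataDependentGuardError(RuntimeError):
    """Hypothetical stand-in for GuardOnDataDependentSymNode."""


@dataclass
class SymBool:
    """A condition over unbacked symbols; its truth value is unknown while tracing."""
    expr: str
    evaluate: Callable[[Dict[str, int]], bool]

    def __bool__(self) -> bool:
        # Forcing a Python bool during tracing is a "guard"; an unbacked
        # symbol has no hint, so there is nothing to decide it with.
        raise DataDependentGuardError(
            f"Could not guard on data-dependent expression {self.expr}"
        )


@dataclass
class UnbackedInt:
    """Toy symbol for a value known only at runtime, like u0 from n.item()."""
    name: str

    def __ge__(self, other: int) -> SymBool:
        return SymBool(f"{self.name} >= {other}", lambda env: env[self.name] >= other)


def eager_check(cond, msg: str) -> None:
    """Analogue of an eager TORCH_CHECK: decides the condition immediately."""
    if not cond:
        raise RuntimeError(msg)


@dataclass
class Tracer:
    """Collects symbolic checks (analogue of TORCH_SYM_CHECK) for later."""
    deferred: List[Tuple[SymBool, str]] = field(default_factory=list)

    def sym_check(self, cond: SymBool, msg: str) -> None:
        self.deferred.append((cond, msg))  # no bool() here, so no guard

    def run(self, env: Dict[str, int]) -> None:
        """Evaluate the deferred checks once real values are bound."""
        for cond, msg in self.deferred:
            if not cond.evaluate(env):
                raise RuntimeError(msg)


u0 = UnbackedInt("u0")
try:
    eager_check(u0 >= 0, "Repeats must be non-negative")
except DataDependentGuardError as e:
    print("eager-style check:", e)

tracer = Tracer()
tracer.sym_check(u0 >= 0, "Repeats must be non-negative")
tracer.run({"u0": 3})  # passes: the check runs against the real value
print("symbolic check: deferred and passed for u0 = 3")
```

In this model the eager path fails before any real value exists, mirroring the u0 >= 0 guard in the issue, while the deferred path still rejects a negative count at runtime.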

Proposed fix

Replace those scalar-overload validations with symbolic checks via TORCH_SYM_CHECK, including a symbolic equality check for output_size. Add an Inductor regression that compiles x.repeat_interleave(repeats.item()) with captured scalar outputs and dynamic output shapes.

Why this is the right long term fix

This fixes the bug in the operator implementation instead of adding a compiler-only workaround, keeps the scalar and tensor overloads consistent, and preserves the same runtime validation semantics for invalid inputs while allowing legitimate unbacked symbolic values to flow through compilation.
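The runtime validation the PR aims to preserve is visible in eager mode: both the scalar overload and the Tensor overload reject a negative repeat count with a RuntimeError. The exact message differs by overload and version, so this sketch (with the hypothetical helper `rejects_negative`) checks only the exception type.

```python
import torch


def rejects_negative(repeats) -> bool:
    """Return True if repeat_interleave raises RuntimeError for these repeats."""
    try:
        torch.zeros(3).repeat_interleave(repeats)
    except RuntimeError:
        return True
    return False


# Scalar overload (int repeats) and Tensor overload (1-D repeats tensor).
print(rejects_negative(-1), rejects_negative(torch.tensor([-1])))
```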

Co-authored with Codex, reviewed and published by @bobrenjc93

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

Changed files

aten/src/ATen/native/Repeat.cpp (modified, +8/-4)
test/inductor/test_torchinductor_dynamic_shapes.py (modified, +12/-0)
test/inductor/test_unbacked_symints.py (modified, +17/-0)


🐛 Describe the bug

Inductor raises GuardOnDataDependentSymNode when repeat_interleave is called with an unbacked SymInt obtained from a 0-D tensor via .item(), even though the same value passed as a 1-D tensor compiles successfully:

GuardOnDataDependentSymNode: Could not guard on data-dependent expression u0 >= 0 (unhinted: u0 >= 0).

import torch

# Let Dynamo trace .item() and ops with data-dependent output shapes
# into the graph instead of graph-breaking on them.
torch._dynamo.config.capture_scalar_outputs = True
torch._dynamo.config.capture_dynamic_output_shape_ops = True


def fn_0d(x: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    """Fails: repeat count is an unbacked SymInt from a 0-D tensor's .item()."""
    target_size = n.item()
    return torch.zeros(x.shape[0]).repeat_interleave(target_size)


def fn_1d(x: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    """Works: repeat count is a one-element 1-D tensor, so the Tensor-repeats overload is used."""
    target_size_1d = n.unsqueeze(0)
    return torch.zeros(x.shape[0]).repeat_interleave(target_size_1d)


def main() -> None:
    """Run the repro."""
    compiled_0d = torch.compile(fn_0d, dynamic=True, fullgraph=True)
    compiled_1d = torch.compile(fn_1d, dynamic=True, fullgraph=True)

    x = torch.randn(4)
    n = torch.tensor(3)

    try:
        out_0d = compiled_0d(x, n)
        print(f"0-D path compiled: out.shape={out_0d.shape}")
    except torch._dynamo.exc.BackendCompilerFailed as e:
        # The guard error surfaces as the cause of BackendCompilerFailed.
        cause = e.__cause__ or e
        if "GuardOnDataDependentSymNode" in str(cause) or "guard" in str(cause).lower():
            print("REPRODUCED (0-D path): GuardOnDataDependentSymNode raised.")
            print(f"  {e}")
        else:
            raise

    # The 1-D workaround should succeed.
    out_1d = compiled_1d(x, n)
    print(f"CONFIRMED (1-D workaround): succeeded, out.shape={out_1d.shape}")


if __name__ == "__main__":
    main()

Error logs

$ TORCHDYNAMO_VERBOSE=1 python tools/bugs/12_symint_repeat_interleave_guard_panic.py
REPRODUCED (0-D path): GuardOnDataDependentSymNode raised.
  backend='inductor' raised:
GuardOnDataDependentSymNode: Could not guard on data-dependent expression u0 >= 0 (unhinted: u0 >= 0).  (Size-like symbols: none)

consider using data-dependent friendly APIs such as guard_or_false, guard_or_true and statically_known_true.
Caused by: (_functorch/_aot_autograd/graph_capture_wrappers.py:1375 in functional_call)
For more information, run with TORCH_LOGS="dynamic"
For extended logs when we create symbols, also add TORCHDYNAMO_EXTENDED_DEBUG_CREATE_SYMBOL="u0"
If you suspect the guard was triggered from C++, add TORCHDYNAMO_EXTENDED_DEBUG_CPP=1
For more debugging help, see https://docs.google.com/document/d/1HSuTTVvYH1pTew89Rtpeu84Ht3nQEFTYhAX3Ypa_xJs/edit?usp=sharing

For C++ stack trace, run with TORCHDYNAMO_EXTENDED_DEBUG_CPP=1

While executing %repeat_interleave : [num_users=1] = call_method[target=repeat_interleave](args = (%zeros, %item), kwargs = {})
Original traceback:
  File "/home/ryan/src/equicloud/tools/bugs/12_symint_repeat_interleave_guard_panic.py", line 61, in fn_0d
    return torch.zeros(x.shape[0]).repeat_interleave(target_size)

Use tlparse to see full graph. (https://github.com/pytorch/tlparse?tab=readme-ov-file#tlparse-parse-structured-pt2-logs)

CONFIRMED (1-D workaround): succeeded, out.shape=torch.Size([12])

Versions

(env_210) ryan@ryan-dev-box:~/src/env$ curl -OL https://raw.githubusercontent.com/pytorch/pytorch/main/torch/utils/collect_env.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 31107  100 31107    0     0  80083      0 --:--:-- --:--:-- --:--:-- 79966
(env_210) ryan@ryan-dev-box:~/src/env$ python3 collect_env.py
Collecting environment information...
PyTorch version: 2.10.0+cu129
Is debug build: False
CUDA used to build PyTorch: 12.9
ROCM used to build PyTorch: N/A

OS: Ubuntu 25.10 (x86_64)
GCC version: (Ubuntu 15.2.0-4ubuntu4) 15.2.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.42

Python version: 3.12.12 | packaged by Anaconda, Inc. | (main, Oct 21 2025, 20:16:04) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.17.0-12-generic-x86_64-with-glibc2.42
Is CUDA available: True
CUDA runtime version: 12.9.86
CUDA_MODULE_LOADING set to: 
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 5060 Ti
Nvidia driver version: 580.126.09
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           48 bits physical, 48 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  12
On-line CPU(s) list:                     0-11
Vendor ID:                               AuthenticAMD
Model name:                              AMD Ryzen 5 7600X 6-Core Processor
CPU family:                              25
Model:                                   97
Thread(s) per core:                      2
Core(s) per socket:                      6
Socket(s):                               1
Stepping:                                2
Frequency boost:                         enabled
CPU(s) scaling MHz:                      87%
CPU max MHz:                             5457.1050
CPU min MHz:                             427.3640
BogoMIPS:                                9381.76
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpuid_fault cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze
Virtualization:                          AMD-V
L1d cache:                               192 KiB (6 instances)
L1i cache:                               192 KiB (6 instances)
L2 cache:                                6 MiB (6 instances)
L3 cache:                                32 MiB (1 instance)
NUMA node(s):                            1
NUMA node0 CPU(s):                       0-11
Vulnerability Gather data sampling:      Not affected
Vulnerability Ghostwrite:                Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Not affected
Vulnerability Old microcode:             Not affected
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Mitigation; Safe RET
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds:                     Not affected
Vulnerability Tsa:                       Mitigation; Clear CPU buffers
Vulnerability Tsx async abort:           Not affected
Vulnerability Vmscape:                   Mitigation; IBPB before exit to userspace

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.9.1.4
[pip3] nvidia-cuda-cupti-cu12==12.9.79
[pip3] nvidia-cuda-nvrtc-cu12==12.9.86
[pip3] nvidia-cuda-runtime-cu12==12.9.79
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cufft-cu12==11.4.1.4
[pip3] nvidia-curand-cu12==10.3.10.19
[pip3] nvidia-cusolver-cu12==11.7.5.82
[pip3] nvidia-cusparse-cu12==12.5.10.65
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.9.86
[pip3] nvidia-nvtx-cu12==12.9.79
[pip3] torch==2.10.0+cu129
[pip3] torch-geometric==2.7.0
[pip3] torchvision==0.25.0+cu129
[pip3] triton==3.6.0
[conda] numpy                       2.2.6            pypi_0           pypi
[conda] nvidia-cublas-cu12          12.9.1.4         pypi_0           pypi
[conda] nvidia-cuda-cupti-cu12      12.9.79          pypi_0           pypi
[conda] nvidia-cuda-nvrtc-cu12      12.9.86          pypi_0           pypi
[conda] nvidia-cuda-runtime-cu12    12.9.79          pypi_0           pypi
[conda] nvidia-cudnn-cu12           9.10.2.21        pypi_0           pypi
[conda] nvidia-cufft-cu12           11.4.1.4         pypi_0           pypi
[conda] nvidia-curand-cu12          10.3.10.19       pypi_0           pypi
[conda] nvidia-cusolver-cu12        11.7.5.82        pypi_0           pypi
[conda] nvidia-cusparse-cu12        12.5.10.65       pypi_0           pypi
[conda] nvidia-cusparselt-cu12      0.7.1            pypi_0           pypi
[conda] nvidia-nccl-cu12            2.27.5           pypi_0           pypi
[conda] nvidia-nvjitlink-cu12       12.9.86          pypi_0           pypi
[conda] nvidia-nvtx-cu12            12.9.79          pypi_0           pypi
[conda] torch                       2.10.0+cu129     pypi_0           pypi
[conda] torch-geometric             2.7.0            pypi_0           pypi
[conda] torchvision                 0.25.0+cu129     pypi_0           pypi
[conda] triton                      3.6.0            pypi_0           pypi

cc @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka

extent analysis

Problem Summary

torch.compile with Inductor fails with GuardOnDataDependentSymNode when repeat_interleave receives an unbacked SymInt (from .item() on a 0-D tensor) as the repeat count. The same count passed as a 1-D tensor compiles.

Root Cause Analysis

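The scalar (int/SymInt) overload of repeat_interleave validates its input with an eager-style TORCH_CHECK(repeats >= 0). Evaluating that condition forces a guard on the unbacked symbol u0, whose value is unknown at trace time, so tracing fails in AOTAutograd. The Tensor-repeats overload expresses the same constraints symbolically, which is why the 1-D variant works (see PR #177305).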

Fix Plan

The proper fix belongs in ATen (PR #177305). Until a build that includes it is available, work around the issue by passing the repeat count to fn_0d as a 1-D tensor, as fn_1d already does:

Step 1: Modify the `fn_0d` function

import torch


def fn_0d(x: torch.Tensor, n: torch.Tensor) -> torch.Tensor:
    """Workaround: pass the repeat count as a one-element 1-D tensor instead of n.item()."""
    target_size = n.unsqueeze(0)  # 1-D repeats tensor, broadcast over every element
    return torch.zeros(x.shape[0]).repeat_interleave(target_size)

Step 2: Re-run the `main` function

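def main() -> None:
    """Re-run the repro with the patched fn_0d."""
    # Assumes the repro's torch._dynamo.config flags are still set.
    compiled_0d = torch.compile(fn_0d, dynamic=True, fullgraph=True)
    out = compiled_0d(torch.randn(4), torch.tensor(3))
    print(f"0-D path: succeeded, out.shape={out.shape}")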

Verification

Run main again and confirm that GuardOnDataDependentSymNode is no longer raised and that the output shape matches the 1-D run (torch.Size([12]) for x of length 4 and n = 3).

Extra Tips

Pass the repeat count to repeat_interleave as a 1-D tensor until the scalar-overload fix is in your PyTorch build.
If you encounter other issues, rerun with TORCHDYNAMO_VERBOSE=1 set, and with TORCH_LOGS="dynamic" for symbolic-shape details, as the error message suggests.
