pytorch - 💡(How to fix) Fix torch.compile crashes on nested_tensor_from_jagged + layer_norm with InternalTorchDynamoError

Error Message

InternalTorchDynamoError: IndexError: list index out of range

Code Example

import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(11)

values = torch.randn(20, 12, device=device, requires_grad=True)
offsets = torch.tensor([0, 6, 11, 14, 20], device=device, dtype=torch.int64)

def fn(v, o):
    nt = torch.nested.nested_tensor_from_jagged(v, o)
    return F.layer_norm(nt, [12]).values().sum()

fn(values, offsets).backward()

torch._dynamo.reset()

try:
    torch.compile(fn, backend="inductor")(
        values.detach().clone().requires_grad_(True),
        offsets,
    )
    print("OK")
except Exception as e:
    msg = str(e).splitlines()[0]
    matched = "InternalTorchDynamoError" in type(e).__name__ or "list index out of range" in repr(e)
    print(f"raised: {type(e).__name__}: {msg[:120]}")
    print("BUG" if matched else "?")

🐛 Describe the bug

torch.compile with the Inductor backend crashes when compiling a function that creates a jagged nested tensor and applies F.layer_norm to it.

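Eager execution, including backward, runs successfully. Under torch.compile, however, compilation fails while the SHAPE_ENV guard is being created: an IndexError (list index out of range) raised in issue_guard surfaces as an InternalTorchDynamoError.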

Error logs

W0512 07:21:37.558000 site-packages/torch/fx/experimental/symbolic_shapes.py:6117] [0/0] Failing guard allocated at return F.layer_norm(nt, [12]).values().sum()

E0512 07:21:37.559000 site-packages/torch/_guards.py:368] [0/0] Error while creating guard:
E0512 07:21:37.559000 site-packages/torch/_guards.py:368] [0/0] Source: shape_env
E0512 07:21:37.559000 site-packages/torch/_guards.py:368] [0/0] Create Function: SHAPE_ENV

Traceback (most recent call last):
  File ".../torch/fx/experimental/symbolic_shapes.py", line 6094, in issue_guard
    source = symbol_to_source[symbol][0]
IndexError: list index out of range

raised: InternalTorchDynamoError: IndexError: list index out of range
BUG
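
The traceback shows an IndexError rather than a KeyError, so the lookup `symbol_to_source[symbol]` succeeded and returned an empty list, and the `[0]` index is what failed. The toy model below reproduces only that failure mode in plain Python. It is not PyTorch's code: the `defaultdict`, the symbol names `s0` and `s1`, and the source string are assumptions made for illustration.

```python
from collections import defaultdict

# Toy stand-in for the mapping named in the traceback. The real structure in
# torch/fx/experimental/symbolic_shapes.py may differ; this only models
# "symbol -> list of sources".
symbol_to_source = defaultdict(list)
symbol_to_source["s0"].append("L['v'].size()[0]")  # hypothetical symbol with one source
_ = symbol_to_source["s1"]  # hypothetical symbol with no recorded source -> []


def first_source(symbol):
    """Mirror the failing expression: take the first source of a symbol."""
    return symbol_to_source[symbol][0]


print(first_source("s0"))
try:
    first_source("s1")
except IndexError as e:
    print(f"IndexError: {e}")
```

In this model, a symbol that exists in the mapping but has no associated source triggers the same message as in the logs.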

Versions

PyTorch version: 2.11.0+cu130
Is debug build: False
CUDA used to build PyTorch: 13.0
ROCM used to build PyTorch: N/A

OS: Ubuntu 24.04.2 LTS (x86_64)
GCC version: (Ubuntu 13.3.0-6ubuntu2~24.04.1) 13.3.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.39

Python version: 3.10.20 (main, Mar 11 2026, 17:46:40) [GCC 14.3.0] (64-bit runtime)
Python platform: Linux-5.15.167.4-microsoft-standard-WSL2-x86_64-with-glibc2.39
Is CUDA available: False
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080 Laptop GPU
Nvidia driver version: 545.92
cuDNN version: Could not collect
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] onnx==1.21.0
[pip3] onnx2torch==1.5.15
[pip3] onnxruntime==1.23.2
[pip3] torch==2.11.0
[pip3] torchvision==0.26.0
[pip3] triton==3.6.0

cc @cpuhrsch @jbschlosser @bhosmer @drisspg @soulitzer @davidberard98 @YuqingJ @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka
