pytorch - 💡(How to fix) Fix `torch.compile` + `torch.while_loop`: stride mismatch crash during backward pass

🐛 Describe the bug

torch.compile(fullgraph=True) with the Inductor backend crashes when running backward through a torch.while_loop that modifies carried tensor state. The error is a stride mismatch between the carried input (stride (0, 0)) and the body output (stride (64, 1)).

  • Eager mode: works fine
  • aot_eager backend: works fine
  • Inductor backend (forward only, no grad): works fine
  • Inductor backend (with backward): CRASHES

Error message

InductorError: UncapturedHigherOrderOpError: Expected carried_inputs and body_output
to have same metadata but found:
pair[1] differ in 'stride: (0, 0) vs (64, 1)', where lhs is
FakeTensor(..., device='cuda:0', size=(4, 64)) and rhs is
FakeTensor(..., device='cuda:0', size=(4, 64))

While executing %while_loop : [num_users=1] = call_function[target=torch.ops.higher_order.while_loop](...)
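The two strides named in the error can be reproduced in isolation. The sketch below shows only what each layout means; it is not a confirmed diagnosis. One plausible source of the (0, 0) side, stated here as an assumption, is that the gradient flowing back from result.sum() is a scalar broadcast to shape (4, 64), while the loop body produces a dense row-major tensor. The names broadcast and dense are illustrative and do not appear in the issue.

```python
import torch

# A 0-dim value broadcast to (4, 64) without copying: the same element
# is reused in both dimensions, so both strides are 0.
broadcast = torch.ones(()).expand(4, 64)

# Materializing the same values yields a row-major (contiguous) layout:
# stepping one row skips 64 elements, stepping one column skips 1.
dense = broadcast.contiguous()

print(broadcast.shape == dense.shape)      # same size, as in the error
print(broadcast.stride(), dense.stride())  # but different strides
```

Both tensors have size (4, 64), which matches the message: the metadata check fails on stride alone, not on shape, dtype, or device.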

To reproduce

import torch

torch.manual_seed(42)
x = torch.randn(4, 64, device='cuda', requires_grad=True)

def cond_fn(i, x):
    return i < 5

def body_fn(i, x):
    return i + 1, x * 0.9 + 0.1

def fn(x):
    _, result = torch.while_loop(
        cond_fn, body_fn,
        (torch.tensor(0, device='cuda'), x)
    )
    return result.sum()

# Eager — OK
loss = fn(x)
loss.backward()
print(f"Eager OK: grad_max={x.grad.abs().max().item():.6f}")
x.grad = None

# Compiled — CRASH
torch._dynamo.reset()
fn_c = torch.compile(fn, fullgraph=True)
loss_c = fn_c(x)
loss_c.backward()  # <-- crashes here

Note: Forward-only (without requires_grad) compiles and runs correctly. The crash is specific to the backward pass through the Inductor-compiled while_loop.
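Because the trip count in this repro is a fixed 5, one possible interim workaround is to unroll the loop in Python so that no while_loop higher-order op is involved. This is a sketch under that assumption, not a fix for the underlying bug, and it does not help when the trip count is data-dependent. fn_unrolled and N_STEPS are hypothetical names, and the tensor lives on the CPU here so the check can run without a GPU. The closed-form gradient 0.9 ** 5 also serves as a reference value for any compiled variant.

```python
import torch

N_STEPS = 5  # fixed trip count, taken from cond_fn (i < 5) in the repro

def fn_unrolled(x):
    # Same update as body_fn, applied a fixed number of times.
    for _ in range(N_STEPS):
        x = x * 0.9 + 0.1
    return x.sum()

torch.manual_seed(42)
x = torch.randn(4, 64, requires_grad=True)  # CPU tensor for portability
loss = fn_unrolled(x)
loss.backward()

# Each step scales the gradient by 0.9, so every element of x.grad
# should equal 0.9 ** N_STEPS regardless of the input values.
expected_grad = torch.full_like(x, 0.9 ** N_STEPS)
print(torch.allclose(x.grad, expected_grad))
```

Separately, the report above states that the aot_eager backend handles the original while_loop version, so torch.compile(fn, backend="aot_eager") may be an alternative when the loop cannot be unrolled.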

Versions

PyTorch: 2.13.0.dev20260501+cu126
Python: 3.11
CUDA: 12.6
GPU: Tesla T4
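To compare against the versions listed above, the same four fields can be collected with a few lines of Python. This is a minimal sketch; env_summary is a hypothetical helper name, and the fallback strings for machines without CUDA are assumptions made for illustration.

```python
import platform
import torch

def env_summary():
    """Collect the fields listed in the report's Versions section."""
    info = {
        "PyTorch": torch.__version__,
        "Python": platform.python_version(),
        # torch.version.cuda is None on CPU-only builds.
        "CUDA": torch.version.cuda or "not available",
    }
    if torch.cuda.is_available():
        info["GPU"] = torch.cuda.get_device_name(0)
    else:
        info["GPU"] = "none detected"
    return info

summary = env_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```

For a fuller report, PyTorch also ships python -m torch.utils.collect_env.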

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @ydwu4 @bdhirsh @bobrenjc93 @aorenste
