pytorch - 💡(How to fix) Fix `torch.compile` + `torch.while_loop`: stride mismatch crash during backward pass

pytorch · 2026-05-09 00:56:04


🐛 Describe the bug

With the Inductor backend, torch.compile(fullgraph=True) fails during the backward pass through a torch.while_loop whose body updates a carried tensor. The error reports a stride mismatch for that tensor: the carried input has stride (0, 0) and the body output has stride (64, 1), although both have size (4, 64).

Eager mode: works fine
aot_eager backend: works fine
Inductor backend (forward only, no grad): works fine
Inductor backend (with backward): CRASHES

Error message

InductorError: UncapturedHigherOrderOpError: Expected carried_inputs and body_output
to have same metadata but found:
pair[1] differ in 'stride: (0, 0) vs (64, 1)', where lhs is
FakeTensor(..., device='cuda:0', size=(4, 64)) and rhs is
FakeTensor(..., device='cuda:0', size=(4, 64))

While executing %while_loop : [num_users=1] = call_function[target=torch.ops.higher_order.while_loop](...)
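
To make the two stride tuples in the message concrete, here is a minimal pure-Python sketch of how strides map an index to a storage offset. The helper names (`contiguous_strides`, `expanded_strides`, `storage_offset`) are hypothetical and exist only for illustration; they are not PyTorch APIs. The sketch assumes row-major layout, which is what stride (64, 1) on a (4, 64) tensor corresponds to, and treats stride (0, 0) as a single element broadcast to the full shape.

```python
def contiguous_strides(shape):
    """Row-major strides: the last dimension moves by 1 element."""
    strides = []
    running = 1
    for dim in reversed(shape):
        strides.append(running)
        running *= dim
    return tuple(reversed(strides))


def expanded_strides(shape):
    """Strides of one element broadcast to `shape`: every step stays in place."""
    return tuple(0 for _ in shape)


def storage_offset(index, strides):
    """Element offset into the underlying storage for a given index."""
    return sum(i * s for i, s in zip(index, strides))


shape = (4, 64)
dense = contiguous_strides(shape)      # layout of the body output in the error
broadcast = expanded_strides(shape)    # layout of the carried input in the error

print(dense, storage_offset((3, 63), dense))
print(broadcast, storage_offset((3, 63), broadcast))
```

Both layouts describe a (4, 64) tensor, which is why the sizes in the message match while the metadata check still fails: the comparison includes strides, not only shape.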

To reproduce

import torch

torch.manual_seed(42)
x = torch.randn(4, 64, device='cuda', requires_grad=True)

def cond_fn(i, x):
    return i < 5

def body_fn(i, x):
    return i + 1, x * 0.9 + 0.1

def fn(x):
    _, result = torch.while_loop(
        cond_fn, body_fn,
        (torch.tensor(0, device='cuda'), x)
    )
    return result.sum()

# Eager — OK
loss = fn(x)
loss.backward()
print(f"Eager OK: grad_max={x.grad.abs().max().item():.6f}")
x.grad = None

# Compiled — CRASH
torch._dynamo.reset()
fn_c = torch.compile(fn, fullgraph=True)
loss_c = fn_c(x)
loss_c.backward()  # <-- crashes here

Note: Forward-only (without requires_grad) compiles and runs correctly. The crash is specific to the backward pass through the Inductor-compiled while_loop.
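
One possible source of the (0, 0) stride, offered as a hypothesis rather than a confirmed root cause, is the gradient that `sum()` sends back to its input: in eager autograd that gradient is a scalar expanded to the input shape. The sketch below only inspects that gradient on CPU with a tensor hook; it does not use `torch.while_loop` or `torch.compile`, and the function name `upstream_grad_strides` is made up for this example.

```python
import torch


def upstream_grad_strides(shape=(4, 64)):
    """Record the strides of the gradient arriving at the tensor fed to sum()."""
    seen = []
    x = torch.randn(*shape, requires_grad=True)
    y = x * 0.9 + 0.1  # same arithmetic as one body_fn step in the repro
    # The hook sees the gradient w.r.t. y exactly as sum()'s backward produced it.
    y.register_hook(lambda g: seen.append(tuple(g.stride())))
    y.sum().backward()
    return seen


strides = upstream_grad_strides()
print(strides)
```

If the printed strides are all zero, the gradient entering the loop's backward is a broadcast view rather than a dense tensor, which would be consistent with the lhs of the reported mismatch. Whether Inductor's backward while_loop actually receives this tensor as its carried input is an assumption that the issue does not confirm.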

Versions

PyTorch: 2.13.0.dev20260501+cu126
Python: 3.11
CUDA: 12.6
GPU: Tesla T4

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @ydwu4 @bdhirsh @bobrenjc93 @aorenste
