pytorch: Inductor standalone_compile misses fallback output unbacked symbol binding

Summary

torch._inductor.standalone_compile can generate Python wrapper code that uses an unbacked symbol from a fallback op output without first binding it from the runtime output tensor.

The minimal case is aten.repeat_interleave.Tensor: the fallback output shape is data-dependent, so the generated wrapper must bind the output extent from the runtime tensor, for example u4 = buf0.size(0). Instead, the wrapper can emit an assert_size_stride check and a downstream allocation that use the symbol before it is defined.

Repro

import torch
from torch._inductor import standalone_compile
from torch._subclasses.fake_tensor import FakeTensor
from torch.fx.experimental.proxy_tensor import make_fx


def fn(counts, x):
    idx = torch.repeat_interleave(counts)
    return x[idx].sin()


counts = torch.tensor([1, 2, 1, 0], device="cuda", dtype=torch.int64)
x = torch.randn(4, 8, device="cuda")
torch._dynamo.mark_dynamic(counts, 0, min=1, max=8)

gm = make_fx(fn, tracing_mode="symbolic")(counts, x)
fake_mode = next(
    node.meta["val"].fake_mode
    for node in gm.graph.nodes
    if isinstance(node.meta.get("val"), FakeTensor)
)

with (
    torch._guards.tracing(torch._guards.TracingContext(fake_mode)),
    fake_mode.shape_env.ignore_fresh_unbacked_symbols(),
):
    compiled = standalone_compile(
        gm,
        [counts, x],
        dynamic_shapes="from_tracing_context",
        options={},
    )

torch.testing.assert_close(compiled(counts, x), fn(counts, x))

Observed Failure

The generated wrapper contains code equivalent to:

buf0 = torch.ops.aten.repeat_interleave.Tensor(arg0_1)
buf1 = buf0
assert_size_stride(buf1, (u4,), (1,), "torch.ops.aten.repeat_interleave.Tensor")
buf2 = empty_strided_cuda((u4, s9), (s9, 1), torch.float32)

u4 is never defined, so execution fails with:

NameError: name 'u4' is not defined

Root Cause

When fresh unbacked symbols are ignored (the repro compiles inside ignore_fresh_unbacked_symbols()), FallbackKernel.process_kernel can see a fallback output whose fake shape contains unbacked symbols, but compute_unbacked_bindings(...) returns None because those symbols are no longer in the pending-fresh set.

The wrapper still needs runtime bindings for every unbacked symbol present in the fallback output layout; otherwise, later wrapper code references undefined variables.
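
The interaction described above can be illustrated with a small pure-Python model. This is a simplified sketch, not the real ShapeEnv: ToyShapeEnv, ignore_fresh, create_unbacked and compute_bindings are hypothetical names, and the model assumes that symbols created while the "ignore" flag is set are simply never recorded as pending. It only shows how a shape can contain an unbacked symbol while the binding computation returns None.

```python
from contextlib import contextmanager


class ToyShapeEnv:
    """Hypothetical, heavily simplified stand-in for a shape environment."""

    def __init__(self):
        self.pending_fresh = []  # symbols still waiting for a binding
        self._ignore = False
        self._counter = 0

    @contextmanager
    def ignore_fresh(self):
        # While active, newly created symbols are not recorded as pending.
        prev, self._ignore = self._ignore, True
        try:
            yield
        finally:
            self._ignore = prev

    def create_unbacked(self):
        name = f"u{self._counter}"
        self._counter += 1
        if not self._ignore:
            self.pending_fresh.append(name)
        return name

    def compute_bindings(self, shape):
        # Assumed behavior: with nothing pending, report no bindings at all.
        if not self.pending_fresh:
            return None
        pending, self.pending_fresh = self.pending_fresh, []
        return {sym: dim for dim, sym in enumerate(shape) if sym in pending}


env = ToyShapeEnv()

# Normal path: the fresh symbol is pending, so a binding is produced.
normal_shape = (env.create_unbacked(),)
normal_bindings = env.compute_bindings(normal_shape)

# Ignored path: the shape still contains a symbol, but no binding comes back.
with env.ignore_fresh():
    ignored_shape = (env.create_unbacked(),)
    ignored_bindings = env.compute_bindings(ignored_shape)

print(normal_bindings, ignored_shape, ignored_bindings)
```

In the second case the symbol is present in the output shape but absent from the bindings, which is the gap that leaves the generated wrapper without a definition for it.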

Expected Behavior

The generated wrapper should bind the fallback output extent before using it:

buf0 = torch.ops.aten.repeat_interleave.Tensor(arg0_1)
u4 = buf0.size(0)
buf1 = buf0
assert_size_stride(buf1, (u4,), (1,), "torch.ops.aten.repeat_interleave.Tensor")
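
The difference between the observed and expected wrappers can be checked without a GPU by executing wrapper-like source in a fresh namespace. The snippet below is a toy stand-in rather than real Inductor output: FakeBuf, fallback_op and run_wrapper are hypothetical names, and only size() is modeled.

```python
class FakeBuf:
    """Hypothetical stand-in for a runtime tensor; only size() is modeled."""

    def __init__(self, *shape):
        self.shape = tuple(shape)

    def size(self, dim):
        return self.shape[dim]


def run_wrapper(source):
    """Execute wrapper-like source; return its namespace or the NameError text."""
    namespace = {"fallback_op": lambda: FakeBuf(4)}
    try:
        exec(source, namespace)
    except NameError as exc:
        return str(exc)
    return namespace


# Mirrors the observed wrapper: u4 is used but never bound.
BROKEN = """
buf0 = fallback_op()
out_shape = (u4, 8)
"""

# Mirrors the expected wrapper: u4 is bound from the runtime output first.
FIXED = """
buf0 = fallback_op()
u4 = buf0.size(0)
out_shape = (u4, 8)
"""

broken_result = run_wrapper(BROKEN)
fixed_result = run_wrapper(FIXED)
print(broken_result)
print(fixed_result["out_shape"])
```

The only difference between the two sources is the single binding line, which is the line the generated wrapper is missing.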

Proposed Fix

After compute_unbacked_bindings(...) runs, if it produced no bindings but the fallback output still contains free unbacked symbols, derive the bindings directly from the output structure using _free_unbacked_symbols_with_path(...). This keeps the fix local to fallback output binding and leaves graph chunking and symbolic-shape policy unchanged.
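
The fallback logic proposed above might look roughly like the following pure-Python sketch. It does not use or reproduce the real _free_unbacked_symbols_with_path signature: unbacked_symbols_with_path, resolve_bindings and emit_binding_lines are hypothetical helpers, and symbols are modeled as plain strings such as "u4" rather than sympy symbols.

```python
import re

# Assumption for this sketch: unbacked symbols are strings named u<digits>.
UNBACKED = re.compile(r"^u\d+$")


def unbacked_symbols_with_path(sizes, strides):
    """Map each unbacked symbol in the layout to the first place it appears."""
    paths = {}
    for kind, values in (("size", sizes), ("stride", strides)):
        for dim, value in enumerate(values):
            if isinstance(value, str) and UNBACKED.match(value):
                paths.setdefault(value, (kind, dim))
    return paths


def resolve_bindings(computed, sizes, strides):
    """Prefer the computed bindings; otherwise derive them from the layout."""
    if computed:
        return computed
    return unbacked_symbols_with_path(sizes, strides) or None


def emit_binding_lines(buf_name, bindings):
    """Render wrapper lines that bind each symbol from the runtime buffer."""
    return [
        f"{sym} = {buf_name}.{kind}({dim})"
        for sym, (kind, dim) in bindings.items()
    ]


# No computed bindings, but the output layout still mentions u4.
bindings = resolve_bindings(None, sizes=("u4",), strides=(1,))
lines = emit_binding_lines("buf0", bindings)
print(lines)
```

Because the derivation only runs when no bindings were computed, the normal path is untouched, which matches the stated goal of keeping the change local to fallback output binding.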

cc @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo
