pytorch: [inductor] `associative_scan` crashes with `dynamic=True` for ndim >= 2

pytorch/pytorch#181584 · Fetched 2026-04-28 06:24:34

torch.compile(dynamic=True) crashes when compiling torch._higher_order_ops.associative_scan on inputs with 2 or more dimensions. Static compilation works fine. 1D inputs with dynamic=True also work.

The error is LoweringException: RuntimeError: Unable to generate code for associative_scan op, because there are lifted arguments.

The source code already has a TODO for this (torch/_higher_order_ops/associative_scan.py line 46 in my local nightly: "Support lifted arguments in inductor for associative_scan"). I have not found a developer comment or issue confirming that this is an officially tracked limitation, so this is best described as a user-visible crash in an unsupported/TODO lowering path.

I also re-confirmed this on a fuzzer-discovered 3D case on 2026-04-27. In that case, eager and aot_eager both run successfully, while Inductor with dynamic=True fails with the same lowering error. The failure reproduced 4/4 times on the same nightly build.

Error Message

LoweringException: RuntimeError: Unable to generate code for associative_scan op, because there are lifted arguments

Root Cause

When associative_scan operates on ndim >= 2, it permutes the tensor to move the scan dim to the last position. Under dynamic=True, the non-scan dimensions become symbolic sizes, which appear as "lifted arguments" in the Inductor lowering for the scan subgraph. The Inductor code generator for associative_scan does not yet support lifted arguments in the combine function's subgraph.
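The "move the scan dim to the last position" step can be pictured with a pure-Python toy on nested lists. This is only an illustration of the data movement, assuming inclusive scan semantics (as with a cumulative sum); it is not PyTorch's implementation, and `inclusive_scan` and `scan_2d` are hypothetical names introduced here.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def inclusive_scan(combine_fn: Callable[[T, T], T], row: Sequence[T]) -> List[T]:
    """Inclusive scan of a 1D sequence: out[i] = row[0] (+) ... (+) row[i]."""
    out: List[T] = []
    for value in row:
        out.append(value if not out else combine_fn(out[-1], value))
    return out


def scan_2d(
    combine_fn: Callable[[T, T], T], rows: Sequence[Sequence[T]], dim: int
) -> List[List[T]]:
    """Scan a 2D nested list along `dim` by making the scan dim the last one."""
    if dim not in (0, 1):
        raise ValueError("dim must be 0 or 1 for a 2D input")
    if dim == 1:
        # The scan dim is already last: scan each row directly.
        return [inclusive_scan(combine_fn, row) for row in rows]
    # dim == 0: transpose so the scan dim is last, scan, then transpose back.
    moved = [list(col) for col in zip(*rows)]
    scanned = [inclusive_scan(combine_fn, col) for col in moved]
    return [list(row) for row in zip(*scanned)]


result = scan_2d(lambda a, b: a + b, [[1, 2], [3, 4], [5, 6]], dim=0)
print(result)
```

In the toy, the length of every non-scan dimension is a concrete Python integer. Per the explanation above, under dynamic=True those lengths are symbolic instead, which is what the scan lowering currently cannot accept.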

Code Example

import torch

def combine_fn(a, b):
    return a + b

x = torch.randn(4, 8, device='cuda')

# Static: works
torch._dynamo.reset()
r = torch.compile(
    lambda x: torch._higher_order_ops.associative_scan(combine_fn, x, dim=0)
)(x)
print(f"static: OK, shape={r.shape}")

# Dynamic: crashes
torch._dynamo.reset()
r = torch.compile(
    lambda x: torch._higher_order_ops.associative_scan(combine_fn, x, dim=0),
    dynamic=True
)(x)
# LoweringException: RuntimeError: Unable to generate code for
# associative_scan op, because there are lifted arguments

---

import torch

x = torch.randn(2, 64, 65, dtype=torch.float64, device="cuda").clamp(-2, 2)

def program(x):
    def combine_fn(a, b):
        return a * b
    return torch._higher_order_ops.associative_scan(combine_fn, x, dim=1)

# eager: OK
program(x)

# aot_eager: OK
torch.compile(program, backend="aot_eager", dynamic=True)(x)

# inductor: crashes
torch.compile(program, backend="inductor", dynamic=True)(x)

---

InductorError: LoweringException: RuntimeError: Unable to generate code for associative_scan op, because there are lifted arguments
  target: associative_scan

Trigger conditions

| Condition | Required |
|---|---|
| dynamic=True | Yes |
| ndim | >= 2 (1D works) |
| scan dim | Any (tested dim=0 on 2D, dim=1 on 3D) |
| dtype | Any (tested fp32, fp64) |
| device | CUDA |
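The trigger conditions can be written down as a small predicate, for example to decide up front whether to request dynamic shapes. This is a sketch that encodes only what the report tested (CUDA, fp32 and fp64); `hits_reported_crash` is a hypothetical helper, not a PyTorch API, and it makes no claim about other devices, other builds, or the default `dynamic=None` mode.

```python
def hits_reported_crash(ndim: int, dynamic: object, backend: str = "inductor") -> bool:
    """Return True for configurations the issue reports as crashing.

    Only an explicit dynamic=True on the Inductor backend with ndim >= 2
    is covered by the report; everything else returns False here.
    """
    return backend == "inductor" and dynamic is True and ndim >= 2


# Hypothetical usage with the `program` and `x` from the reproducers:
#   use_dynamic = not hits_reported_crash(x.ndim, True)
#   compiled = torch.compile(program, dynamic=use_dynamic)
print(hits_reported_crash(2, True))
print(hits_reported_crash(1, True))
```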

Fuzzer-discovered variant

This variant (the second snippet under Code Example above: a multiplicative combine_fn scanned over dim=1 of a 2x64x65 float64 CUDA tensor) was found by the graph consistency fuzzer in the v211 novelty-guided run. Eager and aot_eager both succeed; Inductor with dynamic=True fails with the InductorError shown after that snippet.

Local artifacts:

  • minimal_repro.py: minimized trigger and control cases.
  • fuzzer_case_v211_d15_scan_mul.py: original fuzzer-discovered program.
  • fuzzer_case_v211_d15_scan_mul.json: original fuzzer metadata.
  • nightly_repro_2026_04_27*.json: repeated nightly reproduction evidence, 4/4 failures for Inductor with eager and aot_eager both OK.

Versions

Environment

  • PyTorch 2.13.0.dev20260425+cu126
  • GPU: Tesla T4 (sm_75)
  • Triton 3.7.0

cc @ptrblck @msaroufim @eqy @jerryzh168 @tinglvv @nWEIdia @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @ydwu4 @bdhirsh @aorenste

extent analysis

TL;DR

The issue can be worked around by avoiding the use of dynamic=True when compiling torch._higher_order_ops.associative_scan on inputs with 2 or more dimensions.

Guidance

  • Root cause: the Inductor code generator for associative_scan does not yet support lifted arguments in the combine function's subgraph, and dynamic=True introduces them for inputs with ndim >= 2.
  • To verify, run the minimal reproducer with dynamic=True and observe the LoweringException.
  • As a temporary workaround, use static compilation (omit dynamic=True) or the aot_eager backend; neither exhibits the crash in the reported tests.
  • If dynamic shapes are required with Inductor, the only reported working case is a 1D input.

Example

import torch

def combine_fn(a, b):
    return a + b

def program(x):
    return torch._higher_order_ops.associative_scan(combine_fn, x, dim=0)

x = torch.randn(4, 8, device="cuda")
# Static compilation works
r = torch.compile(program)(x)

# aot_eager backend works
r = torch.compile(program, backend="aot_eager", dynamic=True)(x)

Notes

The issue is specific to the Inductor backend with dynamic=True and may not be applicable to other backends or compilation modes.

Recommendation

Apply workaround: Avoid using dynamic=True when compiling torch._higher_order_ops.associative_scan on inputs with 2 or more dimensions, or use aot_eager backend instead.
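One way to apply this recommendation without hard-coding a single configuration is to try the preferred compile settings first and fall back when they raise. The helper below is a minimal sketch in plain Python; `run_first_working` is a hypothetical name, and the PyTorch usage is shown only as comments because it assumes the `program` and `x` from the reproducers and a CUDA device.

```python
from typing import Any, Callable, Iterable, List, Tuple


def run_first_working(
    candidates: Iterable[Tuple[str, Callable[[], Callable[..., Any]]]],
    *args: Any,
) -> Tuple[str, Any]:
    """Try each (label, build) pair in order; return the first (label, result).

    `build` is called lazily, and the built callable is invoked inside the
    try block, so errors raised on the first call (when compilation is
    triggered) are caught as well.
    """
    failures: List[str] = []
    for label, build in candidates:
        try:
            return label, build()(*args)
        except Exception as exc:  # the issue reports a LoweringException here
            failures.append(f"{label}: {type(exc).__name__}: {exc}")
    raise RuntimeError("all candidates failed:\n" + "\n".join(failures))


# Hypothetical usage:
#   import torch
#   label, out = run_first_working(
#       [
#           ("inductor, dynamic", lambda: torch.compile(program, dynamic=True)),
#           ("inductor, static", lambda: torch.compile(program, dynamic=False)),
#           ("aot_eager, dynamic",
#            lambda: torch.compile(program, backend="aot_eager", dynamic=True)),
#       ],
#       x,
#   )
```

The reproducers call torch._dynamo.reset() before each compile, so doing the same between attempts may be a reasonable precaution. Catching a broad Exception is a deliberate simplification for the sketch; narrowing it to the specific error type would avoid masking unrelated failures.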
