pytorch - ✅(Solved) Fix `torch.compile` crashes on `cumsum(x + broadcast_bias)` when scan dim >= 129 [1 pull requests, 1 comments, 2 participants]

GitHub stats
pytorch/pytorch#180221 (fetched 2026-04-15 06:19:17)
Comments: 1 · Participants: 2 · Timeline events: 79 · Reactions: 1
Timeline (top): mentioned ×36, subscribed ×36, labeled ×4, commented ×1

Error Message

torch._inductor.exc.InductorError: TypeError: list indices must be integers or slices, not NoneType

Root Cause

The crash is in Inductor's SplitScan codegen path. When the scan dimension is <= 128, Inductor uses a single-block scan (works fine). When >= 129, it switches to SplitScan (multi-block), which crashes because TritonSplitScanKernel.initialize_range_tree sets tensor_dim=None for pointwise range trees, but the codegen later indexes a list with this None value when loading broadcast variables.
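The failure mode can be reproduced in plain Python, without Inductor. The sketch below is only an illustration: `block_shape` and `lookup_block_size` are hypothetical stand-ins, not Inductor's real data structures, and the assumption is simply that a list is indexed with a dimension value that may be None.

```python
# Hypothetical stand-ins; these names are NOT part of Inductor's API.
block_shape = ["XBLOCK", "RBLOCK"]  # one entry per tensor dimension


def lookup_block_size(shape, tensor_dim):
    # Mirrors the failing pattern: index a list with a dimension
    # that is None for range trees without a tensor dimension.
    return shape[tensor_dim]


# Normal case: the range tree maps to a real tensor dimension.
print(lookup_block_size(block_shape, 0))

# Failing case: tensor_dim is None, so list indexing raises TypeError.
try:
    lookup_block_size(block_shape, None)
except TypeError as exc:
    message = str(exc)
    print(message)
```

The message printed by the failing case is the same TypeError text that Inductor wraps in InductorError.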

PR fix notes

PR #180369: [inductor] Fix torch.compile crash on cumsum with broadcast input when scan dim >= 129

Description (problem / solution / changelog)

Fixes https://github.com/pytorch/pytorch/issues/180221

SplitScan kernels set no_x_dim=True, so the "x" range tree gets tensor_dim=None. When a broadcast variable (e.g. bias) references this tree, get_block_shape does shape[None] and crashes with TypeError.

Fix: treat tensor_dim=None as scalar shape () in get_block_shape, matching how other scalar symbols (e.g. SymT.SIZE) are handled.
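The idea behind the fix can be pictured with a toy helper. This is a minimal sketch under stated assumptions, not the code from the PR: `block_shape_for` and its parameters are hypothetical, and the real `get_block_shape` in torch/_inductor/codegen/triton.py may be structured differently.

```python
from typing import Optional, Tuple


def block_shape_for(
    num_dims: int, tensor_dim: Optional[int], block_name: str
) -> Tuple[str, ...]:
    """Hypothetical helper illustrating the None guard; not Inductor code."""
    if tensor_dim is None:
        # No tensor dimension: treat the symbol as a scalar, shape ().
        return ()
    # Otherwise build a broadcastable shape with the block size
    # placed at the tree's dimension and "1" everywhere else.
    shape = ["1"] * num_dims
    shape[tensor_dim] = block_name
    return tuple(shape)


print(block_shape_for(2, 1, "RBLOCK"))     # tree with a tensor dimension
print(block_shape_for(2, None, "XBLOCK"))  # tree with tensor_dim=None
```

Guarding on None before indexing is what removes the TypeError; returning an empty shape is the scalar treatment the PR description mentions.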

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

Changed files

  • test/inductor/test_torchinductor.py (modified, +9/-0)
  • torch/_inductor/codegen/triton.py (modified, +9/-2)

Code Example

import torch

x = torch.randn(1, 129, 64, device='cuda')
b = torch.randn(64, device='cuda')

# Works
print(torch.cumsum(x + b, dim=1).shape)

# Crashes
print(torch.compile(lambda x, b: torch.cumsum(x + b, dim=1))(x, b).shape)

---

torch._inductor.exc.InductorError: TypeError: list indices must be integers or slices, not NoneType

🐛 Describe the bug

torch.compile(backend='inductor') crashes with TypeError: list indices must be integers or slices, not NoneType when compiling torch.cumsum over a tensor that includes a broadcast addition with a bias vector, and the scan dimension size is >= 129.

The crash is in Inductor's SplitScan codegen path. When the scan dimension is <= 128, Inductor uses a single-block scan (works fine). When >= 129, it switches to SplitScan (multi-block), which crashes because TritonSplitScanKernel.initialize_range_tree sets tensor_dim=None for pointwise range trees, but the codegen later indexes a list with this None value when loading broadcast variables.

The same crash affects torch.cumprod, torch.logcumsumexp, and torch._higher_order_ops.associative_scan.

Reproducer

import torch

x = torch.randn(1, 129, 64, device='cuda')
b = torch.randn(64, device='cuda')

# Works
print(torch.cumsum(x + b, dim=1).shape)

# Crashes
print(torch.compile(lambda x, b: torch.cumsum(x + b, dim=1))(x, b).shape)

Error:

torch._inductor.exc.InductorError: TypeError: list indices must be integers or slices, not NoneType

Trigger conditions

The crash requires both:

  1. A scan op (cumsum/cumprod/logcumsumexp) with scan dim size >= 129
  2. A broadcast variable in the fused input (e.g., adding a bias vector)

Without broadcast (standalone cumsum(x)): compiles fine at any size. With broadcast but scan dim <= 128: compiles fine (uses single-block scan, not SplitScan).

Scan dim    cumsum(x)    cumsum(x + bias)
128         OK           OK
129         OK           CRASH
256         OK           CRASH
512         OK           CRASH
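The two trigger conditions can be written as a small predicate that reproduces the reported results. This is a sketch that only encodes the observations in this report: `SPLIT_SCAN_THRESHOLD` and `expected_outcome` are hypothetical names, and the 129 cutoff is taken from the issue rather than from Inductor's source, so it may differ on other configurations.

```python
# Smallest crashing scan size reported in the issue (an observation,
# not a constant read from Inductor).
SPLIT_SCAN_THRESHOLD = 129


def expected_outcome(scan_dim_size: int, has_broadcast_input: bool) -> str:
    """Return "CRASH" only when both reported trigger conditions hold."""
    uses_split_scan = scan_dim_size >= SPLIT_SCAN_THRESHOLD
    return "CRASH" if (uses_split_scan and has_broadcast_input) else "OK"


# Print one row per reported scan size: size, cumsum(x), cumsum(x + bias).
for size in (128, 129, 256, 512):
    print(size, expected_outcome(size, False), expected_outcome(size, True))
```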

Versions

  • PyTorch 2.11.0+cu126: CRASH
  • PyTorch 2.12.0.dev20260410+cu126 (nightly): CRASH
  • GPU: NVIDIA Tesla T4 (sm_75), but not GPU-specific (codegen crash before kernel launch)

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

TL;DR

  • The issue can be worked around by avoiding the use of torch.compile with backend='inductor' for scan operations with a dimension size of 129 or more when a broadcast variable is involved.

Guidance

  • Identify the specific scan operations (torch.cumsum, torch.cumprod, torch.logcumsumexp) and check if the scan dimension size is 129 or more.
  • Verify if a broadcast variable is used in the operation, as this is a required condition for the crash to occur.
  • Consider using a different backend or avoiding compilation for these specific operations until a fix is available.
  • Test with smaller scan dimension sizes (less than 129) to confirm if the issue is specific to the SplitScan codegen path.

Example

import torch

x = torch.randn(1, 129, 64, device='cuda')
b = torch.randn(64, device='cuda')

# Workaround 1: run the scan in eager mode (no torch.compile).
# Per the reproducer, eager cumsum with a broadcast bias works at size 129.
print(torch.cumsum(x + b, dim=1).shape)

# Workaround 2: keep broadcast inputs out of the compiled scan.
# Per the issue, a standalone cumsum(x) compiles fine at any size.
print(torch.compile(lambda x: torch.cumsum(x, dim=1))(x).shape)

Notes

  • The issue is not GPU-specific. It reproduces on PyTorch 2.11.0+cu126 and on the 2.12.0.dev20260410+cu126 nightly.
  • The crash happens before kernel launch, indicating a codegen issue.

Recommendation

  • Apply workaround: Avoid using torch.compile with backend='inductor' for affected scan operations until you are running a build that includes the fix from PR #180369. Running those operations in eager mode avoids the crash.
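One way to apply the workaround without editing every call site is a wrapper that falls back to the eager function when the compiled one fails. This is a minimal sketch: `with_eager_fallback` is a hypothetical helper, not a PyTorch API, and `broken_compiled` stands in for the result of `torch.compile(eager_fn)` so the example runs without a GPU.

```python
import functools


def with_eager_fallback(compiled_fn, eager_fn):
    """Hypothetical helper: call compiled_fn, fall back to eager_fn on failure."""

    @functools.wraps(eager_fn)
    def wrapper(*args, **kwargs):
        if not wrapper.fell_back:
            try:
                return compiled_fn(*args, **kwargs)
            except Exception:
                # Remember the failure so later calls skip the compiled path.
                wrapper.fell_back = True
        return eager_fn(*args, **kwargs)

    wrapper.fell_back = False
    return wrapper


def eager_add(a, b):
    return a + b


def broken_compiled(a, b):
    # Stand-in for a compiled function that fails during codegen.
    raise TypeError("list indices must be integers or slices, not NoneType")


safe_add = with_eager_fallback(broken_compiled, eager_add)
result = safe_add(2, 3)
print(result, safe_add.fell_back)
```

Catching every Exception is a deliberate simplification here. It would also hide genuine runtime errors raised by the compiled function, so a narrower exception type may be preferable in real code.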
