pytorch - InductorError on in-place multidimensional dynamic slicing with Tensor-derived slice bounds

🐛 Describe the bug

torch.compile with Inductor fails on an in-place multidimensional slice assignment when the slice bound is derived from a Tensor value.

Eager execution succeeds, but the compiled version fails during Inductor lowering with:

torch._inductor.exc.InductorError: LoweringException: NotImplementedError: View
  target: aten.slice.Tensor
  ...
  args[3]: u0

The problematic line is:

mask[i, :span, :span] = 1.0

where span is obtained from a CUDA int64 Tensor:

span = tensor_span[i, 0]
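
For context, a small eager-mode sketch of what this indexing produces. It runs on CPU purely for illustration (the report uses CUDA). In eager mode `span` is a 0-dim int64 tensor, and slicing converts it to a Python integer implicitly through `__index__`. Under torch.compile that value is only known at run time, which is presumably why the bound shows up as the symbol `u0` in the error above; that reading of `u0` is an assumption, not something the traceback states.

```python
import operator

import torch

# CPU tensor for illustration only; the original repro uses a CUDA tensor.
tensor_span = torch.tensor([[8]], dtype=torch.int64)

span = tensor_span[0, 0]       # 0-dim int64 tensor, not a Python int
bound = operator.index(span)   # the conversion that slicing performs implicitly

mask = torch.zeros(100, 100)
mask[:span, :span] = 1.0       # eager: the tensor is read and used as a plain bound

print(span.dim(), bound, mask.sum().item())
```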

I understand that data-dependent slicing may have limited support, but this case reaches Inductor lowering and fails with an internal InductorError rather than being rejected earlier as an unsupported graph pattern. This pattern is also common in mask construction, e.g., constructing per-sample 2D masks from sequence lengths or valid spans.
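
Until this lowering is supported, one possible workaround is to build the same mask without any data-dependent slice, by comparing an index range against the spans. The sketch below is not a fix for the Inductor bug itself. The function name `build_span_mask` and the `size` parameter are hypothetical, and it assumes spans are non-negative (a negative `span` in `mask[i, :span, :span]` counts from the end, which this version does not reproduce). Because it uses only broadcasting and comparisons, it should avoid the `aten.slice.Tensor` call with a tensor-derived bound, but it has not been verified against the build listed under Versions.

```python
import torch


def build_span_mask(tensor_span: torch.Tensor, size: int = 100) -> torch.Tensor:
    """Hypothetical helper: per-sample square mask with ones in [:span, :span].

    Assumes tensor_span has shape (batch, k) with the span in column 0
    and that every span is >= 0.
    """
    spans = tensor_span[:, 0]                            # (batch,)
    idx = torch.arange(size, device=tensor_span.device)  # (size,)
    valid = idx.unsqueeze(0) < spans.unsqueeze(1)        # (batch, size), True where idx < span
    # Outer AND of the row and column validity gives the square block of ones.
    mask = valid.unsqueeze(2) & valid.unsqueeze(1)       # (batch, size, size)
    return mask.to(torch.float32)


def loop_reference(tensor_span: torch.Tensor, size: int = 100) -> torch.Tensor:
    """Eager reference matching the loop in the repro's forward()."""
    out = torch.zeros((tensor_span.shape[0], size, size), dtype=torch.float32)
    for i in range(tensor_span.shape[0]):
        span = tensor_span[i, 0]
        out[i, :span, :span] = 1.0
    return out


example = torch.tensor([[8], [3]], dtype=torch.int64)
vectorized = build_span_mask(example)
reference = loop_reference(example)
print(torch.equal(vectorized, reference), vectorized.sum().item())
```

In a module, `forward` could return `build_span_mask(tensor_span)` in place of the loop. Spans larger than `size` behave like the slice version, since both fill the whole 100 x 100 block.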

Minimal repro

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import os
import platform
import traceback

import torch
import torch.nn as nn


class DynamicSliceMask(nn.Module):
    def forward(self, tensor_span):
        batch_size = tensor_span.shape[0]
        mask = torch.zeros(
            (batch_size, 100, 100),
            dtype=torch.float32,
            device=tensor_span.device,
        )

        for i in range(batch_size):
            span = tensor_span[i, 0]
            mask[i, :span, :span] = 1.0

        return mask


def print_env():
    print("Python:", platform.python_version())
    print("Platform:", platform.platform())
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    print("CUDA device count:", torch.cuda.device_count())
    print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES", ""))
    if torch.cuda.is_available():
        print("Current CUDA device:", torch.cuda.current_device())
        print("CUDA device name:", torch.cuda.get_device_name(0))


def main():
    print_env()

    if not torch.cuda.is_available():
        raise RuntimeError("This repro expects a CUDA device.")

    device = "cuda"
    torch.manual_seed(0)
    torch.cuda.manual_seed_all(0)

    model = DynamicSliceMask().to(device).eval()

    tensor_span = torch.tensor([[8]], dtype=torch.int64, device=device)

    print("\nInput:")
    print(tensor_span)

    with torch.no_grad():
        eager_out = model(tensor_span)

    print("\nEager succeeded.")
    print("Eager output shape:", tuple(eager_out.shape))
    print("Eager output sum:", eager_out.sum().item())

    compiled_model = torch.compile(
        model,
        backend="inductor",
        fullgraph=True,
        dynamic=True,
    )

    print("\nRunning compiled model...")
    try:
        with torch.no_grad():
            compiled_out = compiled_model(tensor_span)

        print("Compiled succeeded.")
        print("Compiled output shape:", tuple(compiled_out.shape))
        print("Compiled output sum:", compiled_out.sum().item())
        torch.testing.assert_close(eager_out, compiled_out)
        print("Eager and compiled outputs match.")

    except Exception:
        print("\nCompiled execution failed with exception:")
        traceback.print_exc()
        raise


if __name__ == "__main__":
    main()

Actual behavior

Eager execution succeeds:

Input:
tensor([[8]], device='cuda:0')

Eager succeeded.
Eager output shape: (1, 100, 100)
Eager output sum: 64.0

Compiled execution fails:

Running compiled model...

Compiled execution failed with exception:
torch._inductor.exc.InductorError: LoweringException: NotImplementedError: View
  target: aten.slice.Tensor
  args[0]: TensorBox(
    View(
      StorageBox(
        ComputedBuffer(name='buf2', layout=FlexibleLayout('cuda:0', torch.float32, size=[1, 100, 100], stride=[10000, 100, 1]), data=Pointwise(
          'cuda',
          torch.float32,
          def inner_fn(index):
              _, i1, i2 = index
              tmp0 = ops.constant(0, torch.float32)
              return tmp0
          ,
          ranges=[1, 100, 100],
          origin_node=full_default,
          origins=OrderedSet([full_default]),
          stack_traces = {,
            File ".../for_test_1.py", line 15, in forward,
              mask = torch.zeros(,
          ,
          }
        )
      ),
      size=[100, 100],
      reindex=lambda i0, i1: [0, i0, i1],
      origins=OrderedSet([select_3, full_default]),
      stack_traces = {,
        File ".../for_test_1.py", line 23, in forward,
          mask[i, :span, :span] = 1.0,
      ,
      }
    )
  )
  args[1]: 0
  args[2]: 0
  args[3]: u0

Found from:
   File ".../for_test_1.py", line 23, in forward
    mask[i, :span, :span] = 1.0

Versions

PyTorch version:  2.13.0a0+git059c270
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True

cc @chauhang @penguinwu @ezyang @bobrenjc93 @aditvenk @laithsakka @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo
