pytorch: `torch.compile` fails on `index_fill` after `permute`: functionalization assertion (`copy_`) while eager works [2 comments, 2 participants]


🐛 Describe the bug

Summary

A minimal model using permute followed by index_fill runs correctly in eager mode, but fails under torch.compile with an assertion from AOTAutograd functionalization:

AssertionError: n=copy_, n.args[0]=empty, ...

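The failure occurs during compilation, before codegen. Judging by the assertion message, the graph is rejected as non-functional because lowering introduced a copy_ whose destination is a freshly allocated empty tensor rather than one of the graph's placeholders.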

Minimal Reproduction

import torch
import torch.nn as nn

class M(nn.Module):
    def forward(self, x, idx):
        x = x.permute(0, 2, 1)
        x = x.index_fill(1, idx, 0.0)
        return x

m = M()
x = torch.randn(5, 1, 3)
idx = torch.tensor([0, 2], dtype=torch.long)

# eager works
y = m(x, idx)

# compile fails
cm = torch.compile(m)
cy = cm(x, idx)
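To make the eager versus compiled comparison report an outcome instead of crashing, the reproduction can be wrapped in a small harness. This is a minimal sketch, not part of the original report: the function name `compare_eager_and_compiled` and its `compile_fn` parameter are hypothetical, and `compile_fn` defaults to `torch.compile` only so that another callable can be swapped in where no compiler toolchain is available.

```python
import torch

def compare_eager_and_compiled(model, *inputs, compile_fn=torch.compile):
    """Hypothetical helper: run `model` eagerly and through `compile_fn`.

    Returns (status, detail) where status is one of
    "match", "mismatch", or "compile_error".
    """
    expected = model(*inputs)
    try:
        # torch.compile is lazy, so backend errors surface on the first call.
        compiled = compile_fn(model)
        actual = compiled(*inputs)
    except Exception as exc:
        first_line = f"{type(exc).__name__}: {exc}".splitlines()[0]
        return "compile_error", first_line
    if torch.allclose(expected, actual):
        return "match", ""
    return "mismatch", "eager and compiled outputs differ"
```

Calling `compare_eager_and_compiled(m, x, idx)` with the model and inputs above would be expected to return a `"compile_error"` status on an affected build and `"match"` once the problem is fixed or worked around.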

Actual Behavior

The compiled path fails with:

AssertionError: n=copy_, n.args[0]=empty, placeholders={arg1_1, arg0_1}

Relevant FX graph snippet:

%permute = aten.permute(...)
%index_put = aten.index_put(...)
%empty = aten.empty(...)
%copy_ = aten.copy_(...)

Full error:

torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
AssertionError: n=copy_, n.args[0]=empty, placeholders={arg1_1, arg0_1}, graph=graph():
    %arg0_1 : [num_users=1] = placeholder[target=arg0_1]
    %arg1_1 : [num_users=1] = placeholder[target=arg1_1]
    %permute : [num_users=1] = call_function[target=torch.ops.aten.permute.default](args = (%arg0_1, [0, 2, 1]), kwargs = {})
    %scalar_tensor : [num_users=1] = call_function[target=torch.ops.aten.scalar_tensor.default](args = (0.0,), kwargs = {dtype: torch.float32, layout: torch.strided, device: cpu, pin_memory: False})
    %expand : [num_users=1] = call_function[target=torch.ops.aten.expand.default](args = (%scalar_tensor, [5, 2, 1]), kwargs = {})
    %index_put : [num_users=1] = call_function[target=torch.ops.aten.index_put.default](args = (%permute, [None, %arg1_1], %expand), kwargs = {})
    %empty : [num_users=1] = call_function[target=torch.ops.aten.empty.memory_format](args = ([5, 3, 1],), kwargs = {dtype: torch.float32, layout: torch.strided, device: cpu, pin_memory: False})
    %copy_ : [num_users=1] = call_function[target=torch.ops.aten.copy_.default](args = (%empty, %index_put), kwargs = {})
    return (copy_,)
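
In the graph above, index_fill appears as an index_put with indices [None, %arg1_1] and a value expanded to [5, 2, 1], which amounts to assigning along dimension 1. The sketch below reproduces that reading in eager mode so the expected result can be checked independently of the compiler. It is an illustration only, not the decomposition code PyTorch actually uses, and the helper name `index_fill_dim1_reference` is hypothetical.

```python
import torch

def index_fill_dim1_reference(x, idx, value):
    """Hypothetical helper: out-of-place fill along dim 1 using plain indexing."""
    out = x.clone()
    # Writes the same positions as index_put with indices [None, idx].
    out[:, idx] = value
    return out

x = torch.randn(5, 1, 3).permute(0, 2, 1)  # shape [5, 3, 1], as in the graph
idx = torch.tensor([0, 2], dtype=torch.long)

reference = index_fill_dim1_reference(x, idx, 0.0)
# Compare against the operator used in the reproduction.
print(torch.equal(reference, x.index_fill(1, idx, 0.0)))
```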

Versions

PyTorch version:  2.11.0
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True

cc @chauhang @penguinwu

extent analysis

TL;DR

The issue may be worked around by restructuring the model so that the compiled graph no longer contains the copy_ into a newly allocated tensor that triggers the assertion.

Guidance

The error is triggered by the copy_ operation in the lowered graph: its destination is the output of empty rather than a graph input, so the graph is not considered functional.
To mitigate this, try expressing the fill without index_fill so that the copy_ is not introduced.
Verify any change by confirming that the compiled model runs without errors and matches the eager output.
If the issue persists, try updating PyTorch to a newer release than the one reported under Versions.
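
One way to act on the guidance above is to express the fill with a boolean mask instead of index_fill. The sketch below is an assumption rather than a confirmed fix: it has not been checked against the failing torch.compile build and only demonstrates that the alternative formulation gives the same result in eager mode. The class name `MaskedFillModel` is hypothetical, and `idx` is assumed to contain non-negative indices.

```python
import torch
import torch.nn as nn

class MaskedFillModel(nn.Module):
    """Hypothetical variant of M that avoids index_fill."""

    def forward(self, x, idx):
        x = x.permute(0, 2, 1)
        # Build a mask over dim 1 that is True at every position listed in idx.
        positions = torch.arange(x.shape[1], device=x.device)
        mask = (positions.unsqueeze(1) == idx.unsqueeze(0)).any(dim=1)
        # Broadcast the mask across dims 0 and 2 and zero the selected rows.
        return x.masked_fill(mask.view(1, -1, 1), 0.0)

x = torch.randn(5, 1, 3)
idx = torch.tensor([0, 2], dtype=torch.long)

expected = x.permute(0, 2, 1).index_fill(1, idx, 0.0)
actual = MaskedFillModel()(x, idx)
print(torch.equal(expected, actual))
```

Whether this form compiles cleanly on an affected build still needs to be confirmed by running it through torch.compile and comparing against the eager output.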
