pytorch - ✅(Solved) Fix [inductor] torch.compile produces incorrect results when applying `index_put` and `sort` on the return of `pad` [1 pull requests, 2 comments, 3 participants]

pytorch · 2026-03-17 09:22:45

GitHub stats

pytorch/pytorch#177631 • Fetched 2026-04-08 00:47:07

Timeline (top)

mentioned ×37, subscribed ×37, labeled ×12, commented ×2

Fix Action

Fixed

Fixed by PR: [inductor] Preserve mutation ordering deps in recompute_size_and_body (https://github.com/pytorch/pytorch/pull/177816)

PR fix notes

PR #177816: [inductor] Preserve mutation ordering deps in recompute_size_and_body

Repository: pytorch/pytorch
Author: yf225
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/177816

Description (problem / solution / changelog)

Summary

Fixes https://github.com/pytorch/pytorch/issues/177631

SchedulerNode.recompute_size_and_body() calls _compute_attrs() which recomputes read_writes from scratch via extract_read_writes(). This only discovers natural data-flow dependencies and silently drops manually-added fake dependencies (WeakDep, StarDep) that were added by compute_dependencies() for mutation ordering.

When the CPU backend's fuse() method calls recompute_size_and_body() to reconcile loop ranges before fusion, the WeakDep that enforces "sort must run before the mutation of its input" is lost. The fused mutation kernel then gets scheduled before the sort, causing the sort to read already-mutated data and produce incorrect results.

Root cause chain

1. remove_noop_ops eliminates the no-op constant_pad_nd(a, [0,0]), so sort(b) becomes sort(arg0_1). Both the sort and the input mutation now target the same buffer.
2. compute_dependencies() correctly adds WeakDep('buf0') to the mutation node, ensuring it runs after the sort's FallbackKernel.
3. During fusion, the CPU backend calls recompute_size_and_body() on the mutation node to reconcile loop ranges. This recomputes read_writes from scratch, dropping the WeakDep.
4. The fused mutation kernel now has no ordering constraint relative to the sort, and gets scheduled first.
5. Sort reads the already-mutated input, producing [[1,1],[0,0]] instead of [[0,0],[0,0]].
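
The effect of the lost ordering constraint can be illustrated with a small pure-Python model. This is only a sketch: nested lists stand in for tensors, and the helper names (sort_rows, mutate_first_row, run) are hypothetical and not part of Inductor. It assumes, as in the chain above, that the sort and the mutation operate on the same input buffer.

```python
def sort_rows(t):
    """Stand-in for torch.sort along the last dim: returns a new sorted copy."""
    return [sorted(row) for row in t]


def mutate_first_row(t):
    """Stand-in for `a[0] = 1`: in-place write to every element of row 0."""
    t[0] = [1.0] * len(t[0])


def run(schedule):
    """Execute the two kernels on one shared input buffer in the given order."""
    arg0_1 = [[0.0, 0.0], [0.0, 0.0]]  # graph input read by sort, written by mutate
    out = None
    for step in schedule:
        if step == "sort":
            out = sort_rows(arg0_1)
        elif step == "mutate":
            mutate_first_row(arg0_1)
    return out


correct = run(["sort", "mutate"])  # ordering enforced by the WeakDep
buggy = run(["mutate", "sort"])    # ordering once the WeakDep is dropped

print(correct)
print(buggy)
```

With the sort first, the model returns all zeros; with the mutation first, the first row becomes ones, which matches the incorrect output in the report.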

Fix

Save and restore fake dependencies (WeakDep, StarDep) across _compute_attrs(), following the same pattern already used in refresh_dependencies().
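
The save-and-restore idea can be sketched with a toy node class. Everything below is an assumption made for illustration: ToyMemoryDep, ToyWeakDep, and ToyNode are hypothetical stand-ins, not the classes in torch/_inductor/scheduler.py, and the actual patch is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyMemoryDep:
    """A natural data-flow dependency: the kernel body really loads this buffer."""
    name: str


@dataclass(frozen=True)
class ToyWeakDep:
    """A fake dependency that exists only to enforce ordering."""
    name: str


@dataclass
class ToyNode:
    body_reads: tuple                      # buffer names the kernel body loads
    reads: set = field(default_factory=set)

    def extract_reads(self):
        # Sees only what the body loads; knows nothing about ordering deps.
        return {ToyMemoryDep(n) for n in self.body_reads}

    def recompute_lossy(self):
        # Rebuilding from scratch silently discards any fake deps.
        self.reads = self.extract_reads()

    def recompute_preserving(self):
        # Save the fake deps, rebuild, then put them back.
        fake = {d for d in self.reads if isinstance(d, ToyWeakDep)}
        self.reads = self.extract_reads() | fake


node = ToyNode(body_reads=("arg0_1",))
node.recompute_lossy()
node.reads.add(ToyWeakDep("buf0"))         # ordering dep added after the fact

node.recompute_preserving()
kept = ToyWeakDep("buf0") in node.reads

node.recompute_lossy()
dropped = ToyWeakDep("buf0") not in node.reads

print(kept, dropped)
```

The lossy path mirrors the behavior described in the summary, and the preserving path mirrors the described fix, in a deliberately simplified form.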

Test plan

Added test_sort_after_noop_pad_and_mutation to test/inductor/test_torchinductor.py
Verified passing on both CPU and CUDA
All existing sort tests pass (no regressions)
All existing input mutation tests pass (no regressions)

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @ezyang

Changed files

test/inductor/test_torchinductor.py (modified, +20/-0)
torch/_inductor/scheduler.py (modified, +9/-0)

🐛 Describe the bug

When compiling a model with no-op padding, broadcast index_put, and sort, the compiled model outputs incorrect results.

Here is the code to reproduce:

import torch


def fn(a):
    b = torch.nn.functional.pad(a, (0, 0))
    a[0] = 1  # a[0, 0] = 1 will not cause inconsistency
    sorted_tensor, _ = torch.sort(b) # return b here will not cause inconsistency
    return sorted_tensor

x1 = torch.zeros((2, 2), dtype=torch.float32)
x2 = torch.zeros((2, 2), dtype=torch.float32)

cfunc = torch.compile(fn, backend='inductor')
out1 = fn(x1)
out2 = cfunc(x2)

print(f'{out1=}')
print(f'{out2=}')
torch.testing.assert_close(out1, out2, equal_nan=True)

'''
out1=tensor([[0., 0.],
        [0., 0.]])
out2=tensor([[1., 1.],
        [0., 0.]])
'''

After some investigation, I found the following:

Changing a[0] = 1 to a[0, 0] = 1 does not cause the inconsistency.
Returning b directly, without calling torch.sort(b), does not cause the inconsistency either.
The aot_eager and eager backends do not have this issue.
This appears to be a regression: the issue does not occur on torch 2.8.

Error logs

No response

Versions

Versions of relevant libraries:
[pip3] numpy==2.4.2
[pip3] torch==2.12.0.dev20260316+cpu
[pip3] torchvision==0.26.0.dev20260223+cpu
[pip3] triton==3.6.0
[conda] numpy 2.4.2 pypi_0 pypi
[conda] torch 2.12.0.dev20260316+cpu pypi_0 pypi
[conda] torchvision 0.26.0.dev20260223+cpu pypi_0 pypi
[conda] triton 3.6.0 pypi_0 pypi

cc @ezyang @gchanan @kadeng @msaroufim @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @muchulee8 @amjames @aakhundov @coconutruben @jataylo

extent analysis

Fix Plan

There are two ways to avoid the incorrect results when compiling a function that combines no-op padding, broadcast index_put, and sort: use a PyTorch build that does not have the regression, or modify the code so that it no longer triggers it.

Update PyTorch Version

Use a PyTorch build in which the issue does not occur. The report was filed against torch 2.12.0.dev20260316 and states that torch 2.8 is unaffected, so either pin torch 2.8 or move to a later build that includes the fix from PR #177816.
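
To check which build is installed, the version string can be split into its release, nightly date, and local tag. The helper below is a minimal sketch that only handles strings shaped like the ones in the Versions section (for example 2.12.0.dev20260316+cpu); the function names are hypothetical, and other formats such as release candidates are rejected rather than guessed at.

```python
import re

_VERSION_RE = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"   # e.g. 2.12.0
    r"(?:\.dev(?P<dev>\d{8}))?"      # optional nightly date, e.g. .dev20260316
    r"(?:\+(?P<local>.+))?$"         # optional local tag, e.g. +cpu
)


def parse_torch_version(version):
    """Return (release tuple, nightly date or None, local tag or None)."""
    match = _VERSION_RE.match(version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    release = tuple(int(part) for part in match.group("release").split("."))
    return release, match.group("dev"), match.group("local")


def is_nightly_on_or_after(version, cutoff_date):
    """True if `version` is a nightly whose YYYYMMDD date is >= cutoff_date."""
    _, dev_date, _ = parse_torch_version(version)
    return dev_date is not None and dev_date >= cutoff_date


reported = parse_torch_version("2.12.0.dev20260316+cpu")
print(reported)
```

In practice the string would come from torch.__version__. The date at which the fix reached nightly builds is not stated on this page, so any cutoff passed to is_nightly_on_or_after has to be determined separately.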

Modify the Code

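If changing the PyTorch version is not feasible, the code can be modified to avoid the bug. According to the reporter's findings, changing a[0] = 1 to a[0, 0] = 1, or returning b without calling torch.sort(b), prevents the inconsistency. Both changes alter what the function computes, so they are only suitable when the modified behavior is acceptable.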

Here is an example of how to modify the code:

import torch

def fn(a):
    b = torch.nn.functional.pad(a, (0, 0))
    # Change a[0] = 1 to a[0, 0] = 1 to avoid the bug
    a[0, 0] = 1
    sorted_tensor, _ = torch.sort(b)
    return sorted_tensor

x1 = torch.zeros((2, 2), dtype=torch.float32)
x2 = torch.zeros((2, 2), dtype=torch.float32)

cfunc = torch.compile(fn, backend='inductor')
out1 = fn(x1)
out2 = cfunc(x2)

print(f'{out1=}')
print(f'{out2=}')
torch.testing.assert_close(out1, out2, equal_nan=True)

Alternatively, return b directly instead of sorting it (the result is then unsorted):

import torch

def fn(a):
    b = torch.nn.functional.pad(a, (0, 0))
    a[0] = 1
    # Return b directly to avoid the bug
    # sorted_tensor, _ = torch.sort(b)
    return b

x1 = torch.zeros((2, 2), dtype=torch.float32)
x2 = torch.zeros((2, 2), dtype=torch.float32)

cfunc = torch.compile(fn, backend='inductor')
out1 = fn(x1)
out2 = cfunc(x2)

print(f'{out1=}')
print(f'{out2=}')
torch.testing.assert_close(out1, out2, equal_nan=True)

Verification

To verify that the fix worked, run the modified code and check that out1 and out2 are equal. torch.testing.assert_close raises an error if they differ, so a run that completes without an exception confirms that the eager and compiled outputs match.
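
Because the function mutates its argument in place, each variant must receive a fresh input, which is why the reproduction creates x1 and x2 separately. The harness below sketches that pattern in pure Python. The names compare_variants, eager_like, and reordered are hypothetical, and nested lists stand in for tensors; with PyTorch, the candidates would be torch.compile'd callables and the comparison would use torch.testing.assert_close.

```python
def compare_variants(reference, candidates, make_input, equal=lambda a, b: a == b):
    """Run each candidate on its own fresh input and collect mismatches."""
    expected = reference(make_input())
    mismatches = {}
    for name, fn in candidates.items():
        got = fn(make_input())  # fresh input: earlier runs mutated theirs
        if not equal(expected, got):
            mismatches[name] = got
    return expected, mismatches


def eager_like(a):
    """Models the eager result: b is unaffected by the later write to a."""
    b = [row[:] for row in a]
    a[0] = [1.0] * len(a[0])
    return [sorted(row) for row in b]


def reordered(a):
    """Models the miscompiled order: the write lands before the sort reads."""
    a[0] = [1.0] * len(a[0])
    return [sorted(row) for row in a]


def make_zeros():
    return [[0.0, 0.0], [0.0, 0.0]]


expected, mismatches = compare_variants(
    eager_like,
    {"eager_like": eager_like, "reordered": reordered},
    make_zeros,
)
print(expected)
print(mismatches)
```

An empty mismatches dictionary means every variant agreed with the reference; here only the deliberately reordered variant is reported.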

Extra Tips

Always test your code with different inputs and edge cases to ensure that the fix works correctly.
If you are using a version of PyTorch where this bug is fixed, you can update your code to use
