pytorch - ✅(Solved) Fix `torch.compile` does not preserve `F.pad` output layout on channels-last input [1 pull requests, 1 comments, 2 participants]

pytorch · 2026-04-06 04:44:51

GitHub stats

pytorch/pytorch#179442 • Fetched 2026-04-08 02:51:41

Author: rookieLiu2018
Participants: D-Vspec, rookieLiu2018
Timeline (top): mentioned ×12, subscribed ×12, labeled ×6, commented ×1

PR fix notes

PR #179837: Fix reflection/replication pad stride mismatch under torch.compile

Repository: pytorch/pytorch
Author: liqiangxl
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/179837

Description (problem / solution / changelog)

Fixes https://github.com/pytorch/pytorch/issues/179442.

The _reflection_or_replication_pad decomposition uses _unsafe_index, which can produce non-standard strides from channels_last inputs. The existing memory format correction called suggest_memory_format(result), but since _unsafe_index output strides don't reliably reflect the desired format, this gave wrong results.

Fix: use the original input's memory format to decide the output format. On CUDA, the eager C++ kernel always returns contiguous regardless of input format, so force contiguous_format there. On CPU, preserve the input's memory format (e.g. channels_last) to match eager.
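
The rule described in the fix can be sketched in plain Python. This is only an illustration of the decision stated above, not the code from torch/_decomp/decompositions.py: the function name choose_pad_output_format is hypothetical, and plain strings stand in for torch.device.type and torch.memory_format values.

```python
def choose_pad_output_format(device_type: str, input_format: str) -> str:
    """Pick the output memory format for a decomposed reflect/replicate pad.

    Hypothetical helper: mirrors the rule in the PR description only.
    """
    if device_type == "cuda":
        # Per the PR description, the eager CUDA kernel always returns a
        # contiguous tensor, whatever the input format was.
        return "contiguous_format"
    if device_type == "cpu":
        # Per the PR description, eager CPU preserves the input's format.
        return input_format
    # The PR description does not say what other devices should do.
    raise NotImplementedError(f"no rule stated for device {device_type!r}")


print(choose_pad_output_format("cpu", "channels_last"))
print(choose_pad_output_format("cuda", "channels_last"))
```

The key point is that the decision depends on the input tensor and its device, not on the strides of the intermediate result.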

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

Changed files

test/inductor/test_torchinductor.py (modified, +11/-0)
torch/_decomp/decompositions.py (modified, +14/-2)

Code Example

import torch
import torch.nn.functional as F

def fn(x):
    return F.pad(x, (1, 2, 2, 1), mode="reflect")

x = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)

eager = fn(x.clone())
compiled = torch.compile(fn, backend="aot_eager_decomp_partition")(x.clone())

print("eager stride   =", eager.stride())
print("compiled stride=", compiled.stride())
print("eager channels_last   =", eager.is_contiguous(memory_format=torch.channels_last))
print("compiled channels_last=", compiled.is_contiguous(memory_format=torch.channels_last))

Output:

eager stride   = (168, 1, 24, 3)
compiled stride= (168, 56, 8, 1)
eager channels_last   = True
compiled channels_last= False

Version:

PyTorch version: 2.10.0+cpu

🐛 Describe the bug

torch.compile changes the output layout of F.pad on a dense channels_last input. Eager and compiled produce the same values, but the compiled result has different stride() and different is_contiguous(memory_format=torch.channels_last) behavior. This also reproduces with backend="aot_eager_decomp_partition". Repro:

import torch
import torch.nn.functional as F

def fn(x):
    return F.pad(x, (1, 2, 2, 1), mode="reflect")

x = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)

eager = fn(x.clone())
compiled = torch.compile(fn, backend="aot_eager_decomp_partition")(x.clone())

print("eager stride   =", eager.stride())
print("compiled stride=", compiled.stride())
print("eager channels_last   =", eager.is_contiguous(memory_format=torch.channels_last))
print("compiled channels_last=", compiled.is_contiguous(memory_format=torch.channels_last))

output:

eager stride   = (168, 1, 24, 3)
compiled stride= (168, 56, 8, 1)
eager channels_last   = True
compiled channels_last= False
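
The two stride tuples in the output can be reproduced by hand from the padded shape, which shows that the compiled result is simply the plain contiguous (NCHW) layout of the same tensor. The sketch below needs no PyTorch; the helper names are made up for illustration and it assumes a dense 4-D NCHW tensor.

```python
def padded_shape(shape, pad):
    """Apply F.pad-style (left, right, top, bottom) padding to an NCHW shape."""
    n, c, h, w = shape
    left, right, top, bottom = pad
    return (n, c, h + top + bottom, w + left + right)


def contiguous_strides(shape):
    """Strides of a dense tensor stored in N, C, H, W order."""
    n, c, h, w = shape
    return (c * h * w, h * w, w, 1)


def channels_last_strides(shape):
    """Strides of a dense tensor stored in N, H, W, C order."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


out_shape = padded_shape((2, 3, 4, 5), (1, 2, 2, 1))
print(out_shape)                         # (2, 3, 7, 8)
print(contiguous_strides(out_shape))     # (168, 56, 8, 1), the compiled result
print(channels_last_strides(out_shape))  # (168, 1, 24, 3), the eager result
```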

Versions

PyTorch version: 2.10.0+cpu

cc @jamesr66a @chauhang @penguinwu @bdhirsh @bobrenjc93 @aorenste

extent analysis

TL;DR

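With a channels_last input, eager F.pad(mode="reflect") returns a channels_last output (stride (168, 1, 24, 3)), while the torch.compile result is contiguous (stride (168, 56, 8, 1)); the values are identical. Until a fix lands, convert the compiled output back to channels_last where the layout matters.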

Guidance

Check the layout of the output, not the input: compare stride() and is_contiguous(memory_format=torch.channels_last) on the eager and compiled results. The input x is channels_last in both runs.
The PR description locates the cause in the _reflection_or_replication_pad decomposition in torch/_decomp/decompositions.py, so the mismatch is expected wherever that decomposition is applied, not only under one backend.
To restore the eager layout, convert the compiled result with compiled.to(memory_format=torch.channels_last) or compiled.contiguous(memory_format=torch.channels_last).
To narrow the cause in your own setup, try other backends: the issue reproduces with "aot_eager_decomp_partition"; a backend that does not apply decompositions, such as "eager", may not show it.

Example

import torch
import torch.nn.functional as F

def fn(x):
    return F.pad(x, (1, 2, 2, 1), mode="reflect")

x = torch.randn(2, 3, 4, 5).to(memory_format=torch.channels_last)

eager = fn(x.clone())
compiled = torch.compile(fn, backend="aot_eager_decomp_partition")(x.clone())

# Workaround: convert the compiled output to channels_last
# (this copies the data when the layout differs)
compiled = compiled.to(memory_format=torch.channels_last)

print("eager stride   =", eager.stride())
print("compiled stride=", compiled.stride())
print("eager channels_last   =", eager.is_contiguous(memory_format=torch.channels_last))
print("compiled channels_last=", compiled.is_contiguous(memory_format=torch.channels_last))

Notes

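According to PR #179837, the mismatch comes from the _reflection_or_replication_pad decomposition: it builds the result with _unsafe_index, whose output strides do not reliably reflect the desired format, so the existing correction based on suggest_memory_format(result) picked the wrong layout. The workaround above restores the layout of the returned tensor but does not change the decomposition itself; the values were already identical in eager and compiled mode. The PR is listed as open and unmerged on this page.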

Recommendation

Apply workaround: if downstream code depends on a channels_last layout, convert the compiled output with .to(memory_format=torch.channels_last), as shown in the example. This costs an extra copy when the layouts differ. The conversion can be dropped once a fix such as PR #179837 is available in the PyTorch version you use.
