pytorch - ✅(Solved) Fix MPS conv2d gives incorrect results for channel-slice views of channels_last tensors [1 pull requests, 1 participants]

GitHub stats
pytorch/pytorch#180984 (fetched 2026-04-22 07:43:07)
Comments: 0 · Participants: 1 · Timeline: 64 · Reactions: 1
Timeline (top): mentioned ×24, subscribed ×24, labeled ×6, referenced ×6

Fix Action

Fixed

PR fix notes

PR #180992: [MPS] Fix sliced channels_last tensors handling

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

  • -> #180992

The fix calls Tensor::suggest_memory_format(/*channels_last_strides_exact_match=*/true), which checks not only that a slice's strides look like channels last, but also that the slice is contiguous in channels-last layout.

This fixes incorrect handling of sliced channels_last tensors in conv2d, batch_norm and [adaptive_]avg_pool2d.

Fixes https://github.com/pytorch/pytorch/issues/180984
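
The difference between "looks like channels last" and "contiguously channels last" can be sketched with plain stride arithmetic. The snippet below is a minimal illustration, not PyTorch's implementation: the helper names dense_channels_last_strides and is_dense_channels_last are hypothetical, and the shapes are taken from the issue's repro.

```python
# Hypothetical helpers for illustration only; they are not part of PyTorch.

def dense_channels_last_strides(shape):
    """Strides (in elements) of a densely packed NHWC buffer for an NCHW shape."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def is_dense_channels_last(shape, strides):
    """True only if the strides describe a densely packed NHWC buffer.

    Dimensions of size 1 are skipped because their stride is never used.
    """
    expected = 1
    for dim in (1, 3, 2, 0):  # C, W, H, N: innermost to outermost in NHWC
        if shape[dim] != 1:
            if strides[dim] != expected:
                return False
            expected *= shape[dim]
    return True


# Parent tensor from the repro: shape (1, 2, 1, 2) in channels_last layout.
parent_shape = (1, 2, 1, 2)
parent_strides = dense_channels_last_strides(parent_shape)  # (4, 1, 4, 2)

# A channel slice such as shared[:, :1] keeps the parent's strides
# while the channel dimension shrinks from 2 to 1.
slice_shape = (1, 1, 1, 2)
slice_strides = parent_strides

print(is_dense_channels_last(parent_shape, parent_strides))  # True
print(is_dense_channels_last(slice_shape, slice_strides))    # False
```

The slice still carries channels-last-style strides, but consecutive W elements sit 2 apart in memory instead of 1, so it is not a dense channels-last buffer.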

Changed files

  • aten/src/ATen/native/ConvUtils.h (modified, +7/-1)
  • aten/src/ATen/native/mps/operations/Convolution.mm (modified, +4/-1)
  • aten/src/ATen/native/mps/operations/Normalization.mm (modified, +8/-3)
  • aten/src/ATen/native/mps/operations/Pooling.mm (modified, +4/-1)
  • test/test_mps.py (modified, +19/-0)


🐛 Describe the bug

conv2d on MPS produces incorrect results when the input is a channel-slice view of a tensor stored in channels_last memory format.

The same operation on CPU is correct, and forcing .contiguous() on the sliced input also makes MPS correct again.

Minimal repro:

import torch
import torch.nn.functional as F

shared = torch.randn(1, 2, 1, 2).contiguous(memory_format=torch.channels_last)
task_slice = shared[:, :1]
weight = torch.randn(1, 1, 1, 1)

cpu_view = F.conv2d(task_slice, weight)
cpu_contig = F.conv2d(task_slice.contiguous(), weight)
print("cpu view vs contig", (cpu_view - cpu_contig).abs().max().item())

if torch.backends.mps.is_available():
    shared_mps = shared.to("mps")
    weight_mps = weight.to("mps")
    task_slice_mps = shared_mps[:, :1]

    mps_view = F.conv2d(task_slice_mps, weight_mps)
    mps_contig = F.conv2d(task_slice_mps.contiguous(), weight_mps)
    torch.mps.synchronize()

    print("mps view vs contig", (mps_view.cpu() - mps_contig.cpu()).abs().max().item())
    print("mps contig vs cpu", (mps_contig.cpu() - cpu_contig).abs().max().item())

Observed output on my system:

cpu view vs contig 0.0
mps view vs contig 0.30804574489593506
mps contig vs cpu 0.0

These two calls should produce the same result:

F.conv2d(task_slice, weight)
F.conv2d(task_slice.contiguous(), weight)
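
One way to picture how such a mismatch can arise is to read the slice from a flat buffer twice: once with its real strides and once as if it were densely packed. This is a simplified pure-Python model under stated assumptions, not the actual MPS code path: the buffer values, the scalar weight, and the read_nchw helper are all made up for illustration.

```python
def read_nchw(buffer, shape, strides, offset=0):
    """Gather the elements of a 4-D view from a flat buffer, in NCHW order."""
    n_, c_, h_, w_ = shape
    sn, sc, sh, sw = strides
    return [
        buffer[offset + n * sn + c * sc + h * sh + w * sw]
        for n in range(n_)
        for c in range(c_)
        for h in range(h_)
        for w in range(w_)
    ]


# Flat storage of a (1, 2, 1, 2) channels_last tensor: channels interleave.
# Memory order: [c0w0, c1w0, c0w1, c1w1] (illustrative values)
buffer = [10.0, 20.0, 11.0, 21.0]

slice_shape = (1, 1, 1, 2)    # shape of shared[:, :1]
true_strides = (4, 1, 4, 2)   # inherited from the parent tensor
assumed_dense = (2, 1, 2, 1)  # strides a densely packed slice would have

weight = 0.5  # a 1x1 convolution with one channel is a scalar multiply

correct = [weight * x for x in read_nchw(buffer, slice_shape, true_strides)]
wrong = [weight * x for x in read_nchw(buffer, slice_shape, assumed_dense)]

print(correct)  # [5.0, 5.5]
print(wrong)    # [5.0, 10.0]
```

In this model, the dense assumption picks up an element of channel 1 in place of the second element of channel 0, which is the kind of silent value mismatch the repro reports.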

Versions

Collecting environment information...
PyTorch version: 2.11.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 26.4.1 (arm64)
GCC version: Could not collect
Clang version: 21.0.0 (clang-2100.0.123.102)
CMake version: version 4.3.1
Libc version: N/A

Python version: 3.12.13 | packaged by Anaconda, Inc. | (main, Mar 19 2026, 20:12:32) [Clang 20.1.8 ] (64-bit runtime)
Python platform: macOS-26.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==2.4.2
[pip3] onnx==1.21.0
[pip3] torch==2.11.0
[pip3] torchaudio==2.10.0
[pip3] torchmetrics==1.8.2
[pip3] torchvision==0.26.0
[conda] numpy 2.4.2 pypi_0 pypi
[conda] torch 2.11.0 pypi_0 pypi
[conda] torchaudio 2.10.0 pypi_0 pypi
[conda] torchmetrics 1.8.2 pypi_0 pypi
[conda] torchvision 0.26.0 pypi_0 pypi

cc @jamesr66a @kulinseth @malfet @DenisVieriu97 @jhavukainen @aditvenk

extent analysis

TL;DR

On PyTorch builds that do not yet include PR #180992, calling .contiguous() on the sliced input before passing it to F.conv2d on MPS avoids the incorrect results. In the repro, the contiguous MPS result matches CPU exactly (difference 0.0).

Guidance

  • The bug is triggered by the memory layout of the input: a channel slice of a channels_last tensor keeps its parent's strides, so it looks like channels last without being contiguous in that layout. Calling .contiguous() on the slice makes MPS return the correct result.
  • To verify on your machine, compare F.conv2d with and without .contiguous() on the sliced input, as in the minimal repro. A nonzero "mps view vs contig" difference means your build is affected.
  • According to the PR notes, batch_norm and [adaptive_]avg_pool2d on MPS had the same problem, so apply the same check to those operations when they receive sliced channels_last inputs.
  • The minimal repro already contains the mitigation: the mps_contig line calls .contiguous() on the sliced input before F.conv2d.

Example

mps_contig = F.conv2d(task_slice_mps.contiguous(), weight_mps)

Notes

This workaround costs an extra allocation and copy each time .contiguous() is called on a non-contiguous input. The permanent fix is PR #180992, which, per its description, makes the MPS conv2d, batch_norm and [adaptive_]avg_pool2d paths treat an input as channels last only when its strides match that layout exactly.

Recommendation

Until you are on a PyTorch build that includes PR #180992, call .contiguous() on sliced channels_last inputs before F.conv2d on MPS. In the provided repro this produces results identical to CPU.
