pytorch - ✅(Solved) Fix MPS conv2d gives incorrect results for channel-slice views of channels_last tensors [1 pull requests, 1 participants]

pytorch · 2026-04-21 14:44:15

GitHub stats

pytorch/pytorch#180984 • Fetched 2026-04-22 07:43:07

Author

Tripton

Participants

Tripton

Assignees

malfet

Timeline (top)

mentioned ×24, subscribed ×24, labeled ×6, referenced ×6

Fix Action

Fixed

Fixed by PR: [MPS] Fix sliced channels_last tensors handling (https://github.com/pytorch/pytorch/pull/180992)
Closed with commit: fc92844d4ac13a62049d3722ed4280f710d96f7b

PR fix notes

PR #180992: [MPS] Fix sliced channels_last tensors handling

Repository: pytorch/pytorch
Author: malfet
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/180992

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

-> #180992

The fix calls Tensor::suggest_memory_format(/*channels_last_strides_exact_match=*/true), which checks not only that a slice looks like channels last but also that it is contiguously channels last.

Fixed incorrect channels last slicing handling in conv2d, batch_norm and [adaptive_]avg_pool2d

Fixes https://github.com/pytorch/pytorch/issues/180984
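The difference between strides that "look like" channels last and strides that are contiguously channels last can be worked out by hand for the tensor in this issue. The following is a minimal sketch in plain Python using stride arithmetic only. The helper names channels_last_strides and is_dense_channels_last are hypothetical, and this is not PyTorch's implementation of suggest_memory_format; it only illustrates the idea of an exact stride match.

```python
def channels_last_strides(shape):
    """Strides (in elements) of a densely packed channels_last NCHW tensor."""
    n, c, h, w = shape
    return (h * w * c, 1, w * c, c)


def is_dense_channels_last(shape, strides):
    """Exact-match check (simplified): size-1 dims are skipped because
    their stride never affects which element is addressed."""
    expected = channels_last_strides(shape)
    return all(size == 1 or s == e for size, s, e in zip(shape, strides, expected))


# Parent tensor from the issue: shape (1, 2, 1, 2) stored as channels_last.
parent_shape = (1, 2, 1, 2)
parent_strides = channels_last_strides(parent_shape)  # (4, 1, 4, 2)

# shared[:, :1] keeps the parent's strides but shrinks C from 2 to 1.
slice_shape = (1, 1, 1, 2)
slice_strides = parent_strides

print(is_dense_channels_last(parent_shape, parent_strides))  # True
print(is_dense_channels_last(slice_shape, slice_strides))    # False
print(channels_last_strides(slice_shape))                    # (2, 1, 2, 1)
```

The slice steps 2 elements along W, whereas a dense channels_last tensor of the same shape would step 1, so a kernel that assumes dense packing would read the wrong elements.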

Changed files

aten/src/ATen/native/ConvUtils.h (modified, +7/-1)
aten/src/ATen/native/mps/operations/Convolution.mm (modified, +4/-1)
aten/src/ATen/native/mps/operations/Normalization.mm (modified, +8/-3)
aten/src/ATen/native/mps/operations/Pooling.mm (modified, +4/-1)
test/test_mps.py (modified, +19/-0)

🐛 Describe the bug

conv2d on MPS produces incorrect results when the input is a channel-slice view of a tensor stored in channels_last memory format.

The same operation on CPU is correct, and forcing .contiguous() on the sliced input also makes MPS correct again.

Minimal repro:

import torch
import torch.nn.functional as F

shared = torch.randn(1, 2, 1, 2).contiguous(memory_format=torch.channels_last)
task_slice = shared[:, :1]
weight = torch.randn(1, 1, 1, 1)

cpu_view = F.conv2d(task_slice, weight)
cpu_contig = F.conv2d(task_slice.contiguous(), weight)
print("cpu view vs contig", (cpu_view - cpu_contig).abs().max().item())

if torch.backends.mps.is_available():
    shared_mps = shared.to("mps")
    weight_mps = weight.to("mps")
    task_slice_mps = shared_mps[:, :1]

    mps_view = F.conv2d(task_slice_mps, weight_mps)
    mps_contig = F.conv2d(task_slice_mps.contiguous(), weight_mps)
    torch.mps.synchronize()

    print("mps view vs contig", (mps_view.cpu() - mps_contig.cpu()).abs().max().item())
    print("mps contig vs cpu", (mps_contig.cpu() - cpu_contig).abs().max().item())

Observed output on my system:

cpu view vs contig 0.0
mps view vs contig 0.30804574489593506
mps contig vs cpu 0.0

These two calls should produce the same result:

F.conv2d(task_slice, weight)
F.conv2d(task_slice.contiguous(), weight)

Versions

Collecting environment information...
PyTorch version: 2.11.0
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 26.4.1 (arm64)
GCC version: Could not collect
Clang version: 21.0.0 (clang-2100.0.123.102)
CMake version: version 4.3.1
Libc version: N/A

Python version: 3.12.13 | packaged by Anaconda, Inc. | (main, Mar 19 2026, 20:12:32) [Clang 20.1.8 ] (64-bit runtime)
Python platform: macOS-26.4.1-arm64-arm-64bit
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU: Apple M1 Pro

Versions of relevant libraries:
[pip3] numpy==2.4.2
[pip3] onnx==1.21.0
[pip3] torch==2.11.0
[pip3] torchaudio==2.10.0
[pip3] torchmetrics==1.8.2
[pip3] torchvision==0.26.0
[conda] numpy 2.4.2 pypi_0 pypi
[conda] torch 2.11.0 pypi_0 pypi
[conda] torchaudio 2.10.0 pypi_0 pypi
[conda] torchmetrics 1.8.2 pypi_0 pypi
[conda] torchvision 0.26.0 pypi_0 pypi

cc @jamesr66a @kulinseth @malfet @DenisVieriu97 @jhavukainen @aditvenk

extent analysis

TL;DR

On builds without the fix from PR #180992, call .contiguous() on the channel-sliced input before passing it to F.conv2d on MPS. In the reported repro this restores results that match CPU.

Guidance

The bug is tied to memory layout: per PR #180992, the MPS code accepted a slice whose strides merely look like channels_last without checking that it is contiguously channels_last. A channel slice such as shared[:, :1] is exactly that case.
To verify, compare F.conv2d on the sliced view with F.conv2d on its .contiguous() copy, as in the minimal repro. On the affected build the two differed by about 0.308 in the reported run, while on CPU the difference was 0.0.
The PR reports the same mishandling in batch_norm and [adaptive_]avg_pool2d, so on affected builds make sliced channels_last inputs contiguous before those operations as well as before convolution.
No other change is needed: the mps_contig call in the repro already applies the mitigation, and its output matches CPU.
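The view-versus-contiguous comparison can be extended to the other operations named in the PR. This is a sketch under assumptions: the helper max_view_vs_contig_diff, the tensor shapes, and the choice of arguments for each operation are illustrative and do not come from the issue or from the test added in test/test_mps.py. It falls back to CPU when MPS is not available.

```python
import torch
import torch.nn.functional as F


def max_view_vs_contig_diff(op, x):
    """Largest absolute difference between op(view) and op(dense copy)."""
    return (op(x) - op(x.contiguous())).abs().max().item()


device = "mps" if torch.backends.mps.is_available() else "cpu"

# Same construction as the repro, with more channels and a larger spatial size.
shared = torch.randn(1, 4, 4, 4).contiguous(memory_format=torch.channels_last).to(device)
task_slice = shared[:, :2]  # channel-slice view, not densely channels_last
weight = torch.randn(3, 2, 1, 1, device=device)

ops = {
    "conv2d": lambda t: F.conv2d(t, weight),
    "batch_norm": lambda t: F.batch_norm(t, None, None, training=True),
    "avg_pool2d": lambda t: F.avg_pool2d(t, kernel_size=2),
    "adaptive_avg_pool2d": lambda t: F.adaptive_avg_pool2d(t, output_size=1),
}

diffs = {name: max_view_vs_contig_diff(op, task_slice) for name, op in ops.items()}
for name, diff in diffs.items():
    print(f"{device} {name}: view vs contig {diff}")
```

On CPU, or on a build that includes the fix, each difference is expected to be zero or within floating-point rounding. A clearly nonzero value on MPS would suggest the build is affected for that operation.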

Example

mps_contig = F.conv2d(task_slice_mps.contiguous(), weight_mps)

Notes

The workaround costs an extra allocation and copy whenever .contiguous() is called on a non-contiguous view. The root cause is in PyTorch's MPS backend rather than in user code: PR #180992 addresses it by requiring an exact channels_last stride match, so the workaround should be unnecessary on builds that include that change.
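One way to limit the copy cost is to copy only when the input is not already densely packed. The helper below, dense_or_copy, is a hypothetical name introduced for illustration and is not part of PyTorch. It relies only on Tensor.is_contiguous, Tensor.contiguous and Tensor.data_ptr, and runs on CPU so the behaviour can be checked without an MPS device.

```python
import torch
import torch.nn.functional as F


def dense_or_copy(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: return x as-is when densely packed, else a dense copy."""
    if x.is_contiguous():
        return x
    if x.dim() == 4 and x.is_contiguous(memory_format=torch.channels_last):
        return x
    return x.contiguous()  # allocates and copies only on this path


shared = torch.randn(1, 2, 1, 2).contiguous(memory_format=torch.channels_last)
task_slice = shared[:, :1]
weight = torch.randn(1, 1, 1, 1)

kept = dense_or_copy(shared)        # already dense channels_last: no copy
copied = dense_or_copy(task_slice)  # sliced view: copied into new storage

print(kept.data_ptr() == shared.data_ptr())        # True
print(copied.data_ptr() == task_slice.data_ptr())  # False
out = F.conv2d(copied, weight)
```

Dense channels_last tensors pass through untouched, so only sliced views pay for the copy.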

Recommendation

On builds that predate the fix (commit fc92844d4ac13a62049d3722ed4280f710d96f7b), apply the workaround by calling .contiguous() on the sliced input before F.conv2d on MPS. It gives correct results in the reported scenario.
