pytorch - ✅(Solved) Fix MPS: `mm`/`addmm` SEGFAULTS on M4 if 2nd matrix is padded with `LORADOWN GEMV` [1 pull requests, 4 comments, 2 participants]

pytorch · 2026-03-21 16:12:58

GitHub stats

pytorch/pytorch#178056•Fetched 2026-04-08 01:12:26

Timeline (top)

mentioned ×18, subscribed ×18, labeled ×10, referenced ×5

Fix Action

Fixed

Fixed by PR: Detect mm padding overflow and incorrect alignment conditions and dispatch to metal_mm (https://github.com/pytorch/pytorch/pull/178203)

PR fix notes

PR #178203: Detect mm padding overflow and incorrect alignment conditions and dispatch to metal_mm

Repository: pytorch/pytorch
Author: jhavukainen
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/178203

Description (problem / solution / changelog)

Fixes issue #178056

Rather inconveniently, MPS matmul may end up calling a kernel implementation that has issues. This change detects the conditions that would lead to that kernel, which causes either silent correctness issues or an assertion, and dispatches to the metal mm kernel instead.

Added a regression test against the original issue and correctness testing w.r.t. CPU, since that seems to have been an issue also.
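
The PR's actual test lives in test/test_mps.py and is not reproduced on this page. As a rough illustration only, a correctness comparison against CPU for the padded layout might look like the sketch below. The function name `padded_addmm_matches_cpu` and the tolerances are assumptions, not taken from the PR.

```python
import torch


def padded_addmm_matches_cpu(device, dtype=torch.half, rows=1025, padded=1041):
    """Compare addmm on `device` with a float32 CPU reference (hypothetical helper)."""
    torch.manual_seed(0)
    # Generate on the CPU in float32, then round to the target dtype
    y_padded = torch.randn(2, padded).to(dtype)
    bias = torch.randn(2).to(dtype)
    x = torch.randn(2, rows).to(dtype)

    # Reference: the same rounded inputs, upcast to float32 on the CPU
    expected = torch.addmm(bias.float(), x.float(), y_padded[:, :rows].t().float())

    # Move the full buffer first, then slice and transpose on the device, so
    # mat2 keeps the padded layout from the issue: strides (1, padded)
    y_t = y_padded.to(device)[:, :rows].t()
    actual = torch.addmm(bias.to(device), x.to(device), y_t)

    # Tolerances are assumptions sized for half-precision output rounding
    return torch.allclose(actual.float().cpu(), expected, rtol=1e-2, atol=1e-1)


if torch.backends.mps.is_available():
    # On an affected build this call aborts with the assertion from the issue
    print(padded_addmm_matches_cpu("mps"))
```

Slicing after the transfer is deliberate: moving an already sliced tensor could produce a dense copy and hide the layout that triggers the problem.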

Changed files

aten/src/ATen/native/mps/operations/LinearAlgebra.mm (modified, +14/-0)
test/test_mps.py (modified, +27/-0)

🐛 Describe the bug

import torch

device, dtype = "mps", torch.half

# Crashes: addmm where mat2 is [1025, 2] with column stride >= 1041
y_padded = torch.randn(2, 1041, device=device, dtype=dtype)
y_t = y_padded[:, :1025].t()
bias = torch.randn(2, device=device, dtype=dtype)
x = torch.randn(2, 1025, device=device, dtype=dtype)
torch.addmm(bias, x, y_t)

Crashes with

/AppleInternal/Library/BuildRoots/4~CH4ougB1IHmPTvF3hYCPXV_GPX9Jt1mOhQ_sqQw/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayMatrixMultiplication.mm:644: failed assertion `LORADOWN GEMV Kernel - vectorRowPadElements will overflow its fc bit allocation.'
zsh: abort      python

Discovered while running python test_torchinductor.py -v -k test_weight_norm_bwd_mps
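
To see why the reproducer describes mat2 as padded, the strides can be worked out by hand. The sketch below is pure Python and does not need an MPS device. It assumes the default row-major layout for the `randn` result and that slicing and `.t()` only change shape and stride metadata. `row_major_strides` is a hypothetical helper written for this illustration.

```python
def row_major_strides(shape):
    """Element strides of a densely packed row-major array of the given shape."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


# y_padded = torch.randn(2, 1041): dense, row-major
base_shape = (2, 1041)
base_strides = row_major_strides(base_shape)

# y_padded[:, :1025] keeps the strides and only shrinks the shape
sliced_shape = (2, 1025)
sliced_strides = base_strides

# .t() swaps both shape and strides without copying
view_shape = sliced_shape[::-1]
view_strides = sliced_strides[::-1]

# Unused elements between the end of one column and the start of the next
padding = view_strides[1] - view_shape[0]

print(view_shape, view_strides, padding)  # (1025, 2) (1, 1041) 16
```

So mat2 is a column-major [1025, 2] view in which each 1025-element column is followed by 16 elements that belong to the original buffer but not to the view.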

Versions

2.10/2.11/nightly

cc @ezyang @gchanan @kadeng @msaroufim @jianyuh @nikitaved @mruberry @walterddr @xwang233 @Lezcano @kulinseth @DenisVieriu97 @jhavukainen @aditvenk

extent analysis

Fix Plan

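The upstream fix, PR #178203, requires no change to user code: it detects the padding-overflow and alignment conditions inside the MPS matmul path and dispatches those cases to the metal mm kernel instead of the kernel that asserts. On builds without that change, a possible user-side workaround is to strip the stride padding from mat2 before calling torch.addmm. This workaround is an assumption drawn from the reproducer's note that the crash needs a column stride >= 1041; it is not confirmed in the issue.

Step-by-Step Solution

Inspect mat2 with y_t.shape and y_t.stride(); in the reproducer it is [1025, 2] with strides (1, 1041), so each column is followed by 16 elements of padding.
Copy the sliced source into a dense buffer before transposing, which brings the column stride down to 1025.
Pass the dense matrix to torch.addmm.

Example Code

import torch

device, dtype = "mps", torch.half

# Same inputs as the reproducer
y_padded = torch.randn(2, 1041, device=device, dtype=dtype)
bias = torch.randn(2, device=device, dtype=dtype)
x = torch.randn(2, 1025, device=device, dtype=dtype)

# Assumed workaround (not confirmed in the issue): copy the slice into a dense
# buffer before transposing, so mat2 has strides (1, 1025) instead of (1, 1041)
y_t = y_padded[:, :1025].contiguous().t()

torch.addmm(bias, x, y_t)

Alternatively, compute the product on the CPU and move the result back to the MPS device:

out = torch.addmm(bias.cpu(), x.cpu(), y_padded[:, :1025].t().cpu()).to(device)

Verification

Re-run the test that exposed the crash, python test_torchinductor.py -v -k test_weight_norm_bwd_mps, and confirm that it no longer aborts. PR #178203 also adds a regression test for the original issue and a correctness comparison against CPU in test/test_mps.py.

Extra Tips

.contiguous() allocates and copies, so apply it only to matrices that really are padded views rather than to every mat2.
A stride check such as y_t.stride(1) > y_t.shape[0] identifies padded column-major views; the issue does not state the exact threshold at which the MPS kernel asserts beyond the >= 1041 case.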
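
The check suggested above could be wrapped in a small helper that runs before `torch.addmm`. This is a minimal sketch under stated assumptions: `unpad_mat2` is a hypothetical name, and the stride test flags any padding at all, which is more conservative than whatever condition the PR actually detects.

```python
import torch


def unpad_mat2(mat2):
    """Return mat2 unchanged, or a dense column-major copy if its columns are padded.

    Hypothetical helper; the stride test is an assumption, not the PR's condition.
    """
    if mat2.dim() != 2:
        return mat2
    rows, _ = mat2.shape
    # Column-major view (row stride 1) whose column stride exceeds the row count
    if mat2.stride(0) == 1 and mat2.stride(1) > rows:
        # mat2.t() is row-major with padded rows: copy densely, transpose back
        return mat2.t().contiguous().t()
    return mat2


# Layout check on the CPU; the same call applies to an MPS tensor
y_t = torch.randn(2, 1041)[:, :1025].t()
dense = unpad_mat2(y_t)
print(y_t.stride(), "->", dense.stride())  # (1, 1041) -> (1, 1025)
```

The helper keeps the column-major orientation, so only the padding is removed and already dense inputs are returned without a copy.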
