pytorch - ✅(Solved) Fix MPS: `mm`/`addmm` SEGFAULTS on M4 if 2nd matrix is padded with `LORADOWN GEMV` [1 pull request, 4 comments, 2 participants]

GitHub stats
pytorch/pytorch#178056, fetched 2026-04-08 01:12:26
Comments: 4, Participants: 2, Timeline events: 63, Reactions: 0
Timeline (top): mentioned ×18, subscribed ×18, labeled ×10, referenced ×5

Fix Action

Fixed

PR fix notes

PR #178203: Detect mm padding overflow and incorrect alignment conditions and dispatch to metal_mm

Description (problem / solution / changelog)

Fixes issue #178056

Rather inconveniently MPS matmul may end up calling a kernel implementation that has issues. This change detects the conditions that would end up in that kernel causing either silent correctness issues or an assertion, and dispatches to the metal mm kernel instead.

Added a regression test against the original issue and correctness testing w.r.t CPU since that seems to have been an issue also.

Changed files

  • aten/src/ATen/native/mps/operations/LinearAlgebra.mm (modified, +14/-0)
  • test/test_mps.py (modified, +27/-0)
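
The PR notes mention a regression test for the original issue plus a correctness check against the CPU. The actual test added to test/test_mps.py is not shown on this page, so the following is only a minimal sketch of what such a check could look like. The function name check_addmm_padded and the tolerances are assumptions chosen for illustration.

```python
import torch


def check_addmm_padded(device: str, dtype: torch.dtype = torch.half) -> bool:
    """Run addmm with a stride-padded mat2 on `device` and compare with the CPU.

    Hypothetical helper, not the test from the PR.
    """
    torch.manual_seed(0)
    # mat2 is a [1025, 2] view whose second-dimension stride is 1041 (padded).
    y_padded = torch.randn(2, 1041, device=device, dtype=dtype)
    y_t = y_padded[:, :1025].t()
    bias = torch.randn(2, device=device, dtype=dtype)
    x = torch.randn(2, 1025, device=device, dtype=dtype)

    # On an affected build this call aborts the process instead of returning.
    out = torch.addmm(bias, x, y_t)

    # float32 CPU reference computed from the same input values.
    ref = torch.addmm(bias.float().cpu(), x.float().cpu(), y_t.float().cpu())
    # Loose tolerances (an assumption) to allow for half-precision rounding.
    return torch.allclose(out.float().cpu(), ref, rtol=1e-2, atol=0.5)


if __name__ == "__main__":
    if torch.backends.mps.is_available():
        print("matches CPU:", check_addmm_padded("mps"))
    else:
        print("MPS not available; nothing to check")
```

On a build without the fix the process is expected to abort inside torch.addmm, so a harness that must survive the failure would have to run the check in a subprocess.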


🐛 Describe the bug

import torch

device, dtype = "mps", torch.half

# Crashes: addmm where mat2 is [1025, 2] with column stride >= 1041
y_padded = torch.randn(2, 1041, device=device, dtype=dtype)
y_t = y_padded[:, :1025].t()
bias = torch.randn(2, device=device, dtype=dtype)
x = torch.randn(2, 1025, device=device, dtype=dtype)
torch.addmm(bias, x, y_t)

Crashes with

/AppleInternal/Library/BuildRoots/4~CH4ougB1IHmPTvF3hYCPXV_GPX9Jt1mOhQ_sqQw/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSNDArray/Kernels/MPSNDArrayMatrixMultiplication.mm:644: failed assertion `LORADOWN GEMV Kernel - vectorRowPadElements will overflow its fc bit allocation.'
zsh: abort      python

Discovered while running python test_torchinductor.py -v -k test_weight_norm_bwd_mps
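
To make the "padded" second matrix in the reproducer concrete, the sketch below rebuilds the same view on the CPU and prints its shape and strides. Only the layout matters here, so no MPS device is needed, and torch.zeros stands in for torch.randn.

```python
import torch

# Same slicing as in the reproducer, built on the CPU because only the
# shape and strides are inspected.
y_padded = torch.zeros(2, 1041, dtype=torch.half)
y_t = y_padded[:, :1025].t()

print(tuple(y_t.shape))          # (1025, 2)
print(y_t.stride())              # (1, 1041)
print(y_t.is_contiguous())       # False
# The untransposed view is not contiguous either: its rows are 1041
# elements apart but only 1025 elements long.
print(y_t.t().is_contiguous())   # False
```

The stride of 1041 along the second dimension, against a logical length of 1025, is the padding the issue title refers to.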

Versions

2.10/2.11/nightly

cc @ezyang @gchanan @kadeng @msaroufim @jianyuh @nikitaved @mruberry @walterddr @xwang233 @Lezcano @kulinseth @DenisVieriu97 @jhavukainen @aditvenk

extent analysis

Fix Plan

The upstream fix, PR #178203, needs no change in user code: it detects the padding-overflow and misalignment conditions inside the MPS matmul path and dispatches those cases to the Metal mm kernel. The steps below are a workaround for builds that do not yet include that PR. They rest on an assumption taken from the issue title, namely that the crash is triggered by the stride padding of the second matrix, and they have not been verified on affected hardware.

Step-by-Step Solution

  • Check whether mat2 is a view into a larger buffer, i.e. whether its strides carry padding (here y_t has shape [1025, 2] and strides (1, 1041)).
  • Copy mat2 into a densely packed tensor with .contiguous() so that the padding disappears.
  • Pass the packed tensor to torch.addmm.

Example Code

import torch

device, dtype = "mps", torch.half

# Same inputs as the reproducer: y_t is a [1025, 2] view with strides (1, 1041)
y_padded = torch.randn(2, 1041, device=device, dtype=dtype)
y_t = y_padded[:, :1025].t()
bias = torch.randn(2, device=device, dtype=dtype)
x = torch.randn(2, 1025, device=device, dtype=dtype)

# Workaround (assumption, see above): drop the stride padding before the matmul
y_packed = y_t.contiguous()  # strides become (2, 1)
torch.addmm(bias, x, y_packed)

Alternatively, if packing mat2 does not help, compute the product on the CPU and move the result back:

out = torch.addmm(bias.cpu(), x.cpu(), y_t.cpu()).to(device)

Verification

Run the reproducer from the bug report and the test that exposed it, python test_torchinductor.py -v -k test_weight_norm_bwd_mps; neither should abort with the LORADOWN GEMV assertion. Because the PR notes also mention correctness problems, compare the MPS result against a CPU reference as well.

Extra Tips

  • .contiguous() allocates a copy, so apply it only to tensors whose strides are actually padded.
  • Consider adding a check on the strides of mat2 so that inputs that are already densely packed skip the copy.
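
A guard of the kind suggested in the tips could look like the sketch below. The names is_densely_packed and addmm_packed are hypothetical helpers introduced for illustration, not part of PyTorch, and the sketch assumes that removing stride padding from mat2 avoids the faulty kernel, which this page does not confirm.

```python
import torch


def is_densely_packed(mat: torch.Tensor) -> bool:
    """True if a 2-D tensor is packed in row-major or column-major order."""
    return mat.is_contiguous() or mat.t().is_contiguous()


def addmm_packed(bias: torch.Tensor, mat1: torch.Tensor, mat2: torch.Tensor) -> torch.Tensor:
    """Hypothetical wrapper: copy mat2 only when its strides carry padding."""
    if not is_densely_packed(mat2):
        mat2 = mat2.contiguous()
    return torch.addmm(bias, mat1, mat2)


if __name__ == "__main__":
    # Demonstrated on the CPU in float32; on an affected machine the same
    # call would be made with device="mps" tensors.
    padded = torch.randn(2, 1041)[:, :1025].t()   # strides (1, 1041)
    packed = torch.randn(2, 1025).t()             # strides (1, 1025)
    print(is_densely_packed(padded), is_densely_packed(packed))
```

A plain transpose of a contiguous matrix counts as densely packed here, so only views cut out of a larger buffer pay for the copy.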
