pytorch - ✅(Solved) Fix [CPU] Cholesky decomposition broken for N>64 [1 pull requests, 2 comments, 2 participants]

GitHub stats
pytorch/pytorch#178769 (fetched 2026-04-08 01:52:16)
Comments: 2
Participants: 2
Timeline: 83
Reactions: 0
Timeline (top): mentioned ×32, subscribed ×32, labeled ×10, referenced ×4

PR fix notes

PR #179154: Fix cholesky(upper=True) on macOS for matrices larger than block size

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

  • -> #179154

Apple's Accelerate LAPACK writes into the unreferenced triangle of the matrix for sizes exceeding its internal block size (e.g. n > 64). This violates the LAPACK spec, which states that "not referenced" elements are "never read, written to, or otherwise accessed." Work around this by applying triu_/tril_ cleanup after cholesky_stub on macOS, matching what is already done for non-CPU devices.

Also relax MPS cholesky test tolerance (atol 2e-5 -> 3e-5) to account for expected numerical differences between CPU and MPS implementations.

Fixes https://github.com/pytorch/pytorch/issues/178769
Fixes https://github.com/pytorch/pytorch/issues/157364

Changed files

  • aten/src/ATen/native/BatchLinearAlgebra.cpp (modified, +15/-1)
  • test/test_linalg.py (modified, +16/-0)
  • test/test_mps.py (modified, +1/-1)
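
The cleanup described in the PR notes can be illustrated in Python without touching Accelerate at all. The sketch below is not the PR's C++ change: it simulates the stray writes by adding made-up values below the diagonal of a valid upper factor, then shows that torch.triu removes them. The matrix size, the garbage value 7.0, and all variable names are assumptions chosen for illustration.

```python
import torch

torch.manual_seed(0)
n = 8  # small on purpose: the source reports the problem only above the block size

# Build a symmetric positive-definite matrix the same way the issue's repro does.
A = torch.randn(n, n, dtype=torch.float64)
pd_matrix = A @ A.mT + torch.eye(n, dtype=torch.float64)

U = torch.linalg.cholesky(pd_matrix, upper=True)

# Simulate the stray writes: made-up values strictly below the diagonal.
garbage = torch.tril(torch.full((n, n), 7.0, dtype=torch.float64), diagonal=-1)
polluted = U + garbage

# The cleanup: zero everything below the diagonal.
cleaned = torch.triu(polluted)

polluted_is_upper = torch.equal(polluted, torch.triu(polluted))
polluted_reconstructs = torch.allclose(polluted.mT @ polluted, pd_matrix)
cleaned_matches = torch.equal(cleaned, U)
cleaned_reconstructs = torch.allclose(cleaned.mT @ cleaned, pd_matrix)

print("polluted is upper triangular:", polluted_is_upper)
print("polluted reconstructs input:", polluted_reconstructs)
print("cleaned equals original factor:", cleaned_matches)
print("cleaned reconstructs input:", cleaned_reconstructs)
```

The point of the simulation is that the referenced (upper) triangle is left intact, so discarding the other triangle is enough to recover a valid factor. Whether that holds on a given affected build is something to confirm on that machine.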


🐛 Describe the bug

Cholesky decomposition is wrong for N > 64 on CPU

import torch

batch_dims = (1,)
matrix_size = 65
device = "cpu"

def check_cholesky(device):
    A = torch.randn(
        *(batch_dims + (matrix_size, matrix_size)), dtype=torch.float32, device=device
    )
    pd_matrix = A @ A.mT + torch.eye(matrix_size, dtype=torch.float32, device=device)
    pd_matrix = pd_matrix.squeeze(0)
    U = torch.linalg.cholesky(pd_matrix, upper=True)
    reconstructed = U.mT @ U
    print("Is upper triangular:", torch.allclose(U, torch.triu(U)))
    print("Reconstruction matches original:", torch.allclose(pd_matrix, reconstructed, atol=1e-4))

check_cholesky("cpu")

Gives:

Is upper triangular: False
Reconstruction matches original: False
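
To see where the behaviour changes on a given machine, the same triangularity check can be run over a few sizes on either side of 64. This is a sketch only: the helper name upper_factor_is_triangular and the list of sizes are assumptions, and the output depends on the platform and PyTorch build, so no expected output is shown.

```python
import torch

def upper_factor_is_triangular(n: int, dtype=torch.float32) -> bool:
    """Factor a random SPD matrix and report whether the result is exactly upper triangular."""
    A = torch.randn(n, n, dtype=dtype)
    pd_matrix = A @ A.mT + torch.eye(n, dtype=dtype)
    U = torch.linalg.cholesky(pd_matrix, upper=True)
    # Exact comparison: any nonzero entry below the diagonal counts as a failure.
    return torch.equal(U, torch.triu(U))

torch.manual_seed(0)
sizes = (16, 63, 64, 65, 128)  # hypothetical sample around the reported threshold
results = {n: upper_factor_is_triangular(n) for n in sizes}

for n, ok in results.items():
    print(f"n={n}: upper triangular = {ok}")
```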

Versions

Both on 2.11 and nightly

Collecting environment information...
PyTorch version: 2.12.0.dev20260330
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: macOS 26.3 (arm64)
GCC version: Could not collect
Clang version: 17.0.0 (clang-1700.4.4.1)
CMake version: version 4.1.2
Libc version: N/A

Python version: 3.13.2 (main, Mar 17 2025, 21:26:38) [Clang 20.1.0 ] (64-bit runtime)
Python platform: macOS-26.3-arm64-arm-64bit-Mach-O
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Apple M4

Versions of relevant libraries:
[pip3] Could not collect
[conda] Could not collect

cc @ezyang @gchanan @kadeng @msaroufim @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @malfet @aditvenk @jianyuh @nikitaved @mruberry @walterddr @xwang233 @Lezcano @snadampal @milpuz01 @nikhil-arm @fadara01 @nWEIdia

extent analysis

Fix Plan

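The repro already builds a symmetric positive-definite input (A @ A.mT plus the identity), so the wrong result does not come from the input. Per PR #179154, the cause is Apple's Accelerate LAPACK writing into the unreferenced triangle once the matrix exceeds its internal block size (e.g. n > 64), which leaves nonzero entries below the diagonal of the upper=True result.

Here are the steps to fix the issue:

  • Upgrade to a PyTorch build that includes PR #179154, which applies triu_/tril_ cleanup after cholesky_stub on macOS.
  • On builds without the fix, zero the unreferenced triangle yourself by passing the result of torch.linalg.cholesky(..., upper=True) through torch.triu. This mirrors the cleanup the PR performs and should be confirmed on the affected machine.
  • Leave the input construction unchanged; extra symmetrization or regularization does not address the stray writes.

Code Changes

import torch

batch_dims = (1,)
matrix_size = 65

def check_cholesky(device):
    A = torch.randn(
        *(batch_dims + (matrix_size, matrix_size)), dtype=torch.float32, device=device
    )
    # A @ A.mT is symmetric positive semi-definite; adding the identity makes it positive-definite
    pd_matrix = A @ A.mT + torch.eye(matrix_size, dtype=torch.float32, device=device)
    pd_matrix = pd_matrix.squeeze(0)
    # Workaround for builds without PR #179154: zero the unreferenced lower triangle
    U = torch.triu(torch.linalg.cholesky(pd_matrix, upper=True))
    reconstructed = U.mT @ U
    print("Is upper triangular:", torch.allclose(U, torch.triu(U)))
    print("Reconstruction matches original:", torch.allclose(pd_matrix, reconstructed, atol=1e-4))

check_cholesky("cpu")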

Verification

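To verify, run check_cholesky("cpu") on the affected macOS machine. Without the fix, the issue's original repro prints False for both checks at matrix_size = 65. On a fixed build, or with the factor passed through torch.triu, both lines should print True: U is upper triangular and U.mT @ U matches the original matrix within atol=1e-4.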
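
A slightly broader check than the two printed booleans, as a sketch: it covers both upper=True and upper=False as well as a batched input, since the PR notes mention both triu_ and tril_ cleanup. The function name check_factor, its parameters, and the batch shape are assumptions for illustration; the tolerance atol=1e-4 is taken from the issue's repro.

```python
import torch

def check_factor(n, upper, batch=(), dtype=torch.float32, atol=1e-4):
    """Return (is_triangular, reconstructs) for a random SPD input of size n."""
    A = torch.randn(*batch, n, n, dtype=dtype)
    pd_matrix = A @ A.mT + torch.eye(n, dtype=dtype)
    F = torch.linalg.cholesky(pd_matrix, upper=upper)
    if upper:
        # Upper factor: nothing may sit below the diagonal, and F^T F rebuilds the input.
        is_triangular = torch.equal(F, torch.triu(F))
        reconstructed = F.mT @ F
    else:
        # Lower factor: nothing may sit above the diagonal, and F F^T rebuilds the input.
        is_triangular = torch.equal(F, torch.tril(F))
        reconstructed = F @ F.mT
    reconstructs = torch.allclose(pd_matrix, reconstructed, atol=atol)
    return is_triangular, reconstructs

torch.manual_seed(0)
for upper in (True, False):
    for batch in ((), (3,)):
        print(f"upper={upper}, batch={batch}:", check_factor(65, upper, batch))
```

On a build with the fix, every line would be expected to report (True, True); any False on an unfixed macOS build points at the case that still needs the cleanup.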

Extra Tips

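  • The repro's input is symmetric positive-definite by construction, so adding more regularization does not address this bug.
  • The problem is reported on macOS CPU (Apple's Accelerate LAPACK) for sizes above its internal block size, e.g. n > 64; per the PR description, smaller matrices are not expected to show it.
  • When comparing CPU and MPS results, expect small numerical differences; the PR relaxed the MPS Cholesky test tolerance from atol 2e-5 to 3e-5 for this reason.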
