pytorch - ✅(Solved) Fix Performance improvement: updated backend selection for linalg.eigh on CUDA [2 pull requests, 1 comment, 2 participants]

pytorch · 2026-04-01 08:17:34

pytorch/pytorch#178979 • Fetched 2026-04-08 02:22:04

Author

johannesz-codes

Participants

benediktjohannes

johannesz-codes

Timeline (top)

mentioned ×36, subscribed ×36, labeled ×7, commented ×1

PR fix notes

PR #175403: Update eigh CUDA heuristics

Repository: pytorch/pytorch
Author: johannesz-codes
State: closed | merged: False
Link: https://github.com/pytorch/pytorch/pull/175403

Description (problem / solution / changelog)

Motivation

As described by @nikitaved in #174674, torch.linalg.eigh is around 100x slower than CuPy for batched inputs. @alexshtf reported the same problem in #174601. The backend selection heuristics developed in #53040 therefore seem to be suboptimal with recent updates to cuSOLVER.

Solution

Update heuristics to select the fastest available backend for the input matrix (batched and single matrix).

The code I used to switch the backend for eigh can be seen in #174674. Fortunately the results are very clear:

linalg_eigh_cusolver_syevj_batched seems to be the fastest for nearly all matrices. I took a closer look at the cases where it is outperformed by linalg_eigh_cusolver_syevd, and the difference there is at most 0.05 ms.

A more detailed view for the parameters used in #174674

Therefore I propose the solution of just dispatching to linalg_eigh_cusolver_syevj_batched unconditionally.

With this change, the code from #174674 is over 100x faster than on the current nightly (outperforming CuPy by ~8x; exact numbers are in the issue).
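For readers who want to compare builds themselves, the following is a minimal timing sketch. It is not the benchmark from #174674: the helper names `make_symmetric_batch` and `time_eigh` and the matrix shapes are assumptions chosen for illustration, and the script falls back to the CPU when no CUDA device is present.

```python
import time

import torch


def make_symmetric_batch(batch, n, device, dtype=torch.float32, seed=0):
    """Return a (batch, n, n) tensor of random symmetric matrices (hypothetical helper)."""
    gen = torch.Generator(device="cpu").manual_seed(seed)
    a = torch.randn(batch, n, n, generator=gen, dtype=dtype)
    # Symmetrize so the input satisfies the precondition of eigh.
    return ((a + a.mT) / 2).to(device)


def time_eigh(a, repeats=10, warmup=3):
    """Average wall-clock milliseconds per torch.linalg.eigh call on `a`."""
    for _ in range(warmup):
        torch.linalg.eigh(a)
    if a.is_cuda:
        # CUDA kernels launch asynchronously; wait before starting the clock.
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(repeats):
        torch.linalg.eigh(a)
    if a.is_cuda:
        # Wait for all queued work to finish before stopping the clock.
        torch.cuda.synchronize()
    return (time.perf_counter() - start) * 1e3 / repeats


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Illustrative shapes only, not the ones measured in #174674.
    for batch, n in [(1, 512), (1024, 16), (4096, 8)]:
        a = make_symmetric_batch(batch, n, device)
        print(f"{device} batch={batch} n={n}: {time_eigh(a):.3f} ms")
```

Running the same script under two PyTorch builds gives a rough before/after comparison. Absolute numbers depend on the GPU, the cuSOLVER version, and the dtype.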

After this change, syevj is no longer selected by any code path. Therefore I removed it from CUDASolver.cpp/h.

Tested using test/test_linalg.py. Observing failure on TestLinalgCUDA.test_tensorinv_cuda_float32. Failure is also present on current nightly (2.12.0.dev20260219+cu128), so I guess it is unrelated.

Fixes https://github.com/pytorch/pytorch/issues/175585

CC: @nikitaved @lezcano

cc @jianyuh @nikitaved @mruberry @walterddr @xwang233 @Lezcano

Changed files

aten/src/ATen/native/cuda/linalg/BatchLinearAlgebraLib.cpp (modified, +3/-74)
aten/src/ATen/native/cuda/linalg/CUDASolver.cpp (modified, +0/-164)
aten/src/ATen/native/cuda/linalg/CUDASolver.h (modified, +0/-45)

New Feature for Release

This issue tracks the updated selection of cuSOLVER APIs for solving symmetric/Hermitian eigenvalue problems in the PyTorch 2.12 release notes.

Point(s) of contact

johannesz-codes, also available on pytorch-slack

Release Mode (pytorch/pytorch features only)

In-tree

Out-Of-Tree Repo

No response

Description and value to the user

The backend selection for solving symmetric/Hermitian eigenvalue problems on CUDA devices has been updated. For batched inputs this leads to substantial performance gains (up to 100x) over the existing backend selection. This addresses the performance gap relative to CuPy.

Link to design doc, GitHub issues, past submissions, etc

The performance regression in comparison to CuPy was brought up in:

Changes have landed in 175403 (already merged)

What feedback adopters have provided

No response

Plan for documentations / tutorials

Tutorial exists

Additional context for tutorials

No change to user facing behaviour, covered by existing materials

Marketing/Blog Coverage

Yes

Are you requesting other marketing assistance with this feature?

No response

Release Version

PyTorch 2.12

OS / Platform / Compute Coverage

GPU only, CUDA only

Testing Support (CI, test cases, etc..)

Covered by existing tests in https://github.com/pytorch/pytorch/blob/main/test/test_linalg.py. Extended performance testing has been conducted; see 175403 and 174674.
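Because the change only swaps the backend, results should still satisfy the eigendecomposition identity. The snippet below is a small sanity check in that spirit, not a test taken from test_linalg.py; the function name `eigh_residual` is a hypothetical helper introduced here.

```python
import torch


def eigh_residual(a):
    """Largest absolute entry of A @ V - V * diag(w) over a batch of symmetric matrices."""
    w, v = torch.linalg.eigh(a)
    # Column j of v is the eigenvector belonging to eigenvalue w[..., j],
    # so broadcasting w over the rows of v scales each column by its eigenvalue.
    return (a @ v - v * w.unsqueeze(-2)).abs().max().item()


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(64, 8, 8, dtype=torch.float64, device=device)
    a = (x + x.mT) / 2  # symmetrize the random input
    print(f"max residual on {device}: {eigh_residual(a):.2e}")
```

A residual near machine precision for the chosen dtype suggests the selected backend returns a valid decomposition for that input.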

cc @jerryzh168 @ptrblck @msaroufim @eqy @tinglvv @nWEIdia @jianyuh @nikitaved @mruberry @walterddr @xwang233 @Lezcano

extent analysis

TL;DR

To benefit from the updated cuSOLVER backend selection for symmetric/Hermitian eigenvalue problems, use PyTorch 2.12 or later.

Guidance

Verify that your PyTorch version is 2.12 or newer to take advantage of the performance improvements for symmetric/Hermitian eigenvalue problems on CUDA devices.
Review the changes and testing in pull request 175403 for details on the update and its impact.
If torch.linalg.eigh is slow on CUDA, especially for batched inputs as reported in issues 174674 and 174601, check whether upgrading to PyTorch 2.12 resolves it.
Use the existing tests in test_linalg.py as a reference for correctness; performance measurements are documented in 175403 and 174674.
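The version check in the first step can be scripted. The sketch below uses a hypothetical helper, `meets_minimum`, that compares only the leading major.minor part of a version string. Note the assumption this implies: a development build such as 2.12.0.dev20260219+cu128 counts as 2.12 here, even though an early nightly may predate the change.

```python
import re


def meets_minimum(version, minimum=(2, 12)):
    """Return True if the leading 'major.minor' of `version` is at least `minimum`."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    # Imported here so the helper above stays usable without PyTorch installed.
    import torch

    print(torch.__version__, meets_minimum(torch.__version__))
```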

Notes

The improvements are specifically for batched inputs on CUDA devices, offering substantial performance gains over previous backend selections.

Recommendation

Upgrade to PyTorch 2.12 or later to get the updated cuSOLVER backend selection and the resulting performance improvement for symmetric/Hermitian eigenvalue problems. No code changes are needed, since user-facing behaviour is unchanged.
