pytorch - 💡(How to fix) Fix [ROCm] MIOpen Gemm solver receives workspace_size=0 via legacy Find1 API [3 comments, 4 participants]

GitHub stats
pytorch/pytorch#178839 · Fetched 2026-04-08 01:52:03
Comments: 3 · Participants: 4 · Timeline: 34 · Reactions: 0
Timeline (top): mentioned ×12, subscribed ×12, labeled ×5, commented ×3

Root Cause

Full three-bug analysis with code references: https://github.com/Peterc3-dev/miopen-gfx1150

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang @ROCm

Fix Action

Workaround

Setting the environment variable MIOPEN_DEBUG_CONV_GEMM=0 disables MIOpen's Gemm convolution solvers, so algorithm selection falls back to the other solvers and bypasses the issue.

RAW_BUFFER

Bug

When using the ROCm backend, PyTorch's ATen convolution layer calls MIOpen's legacy miopenFindConvolutionForwardAlgorithm (Find1 API). On this path ATen does not query the required workspace size (GetWorkSpaceSize) before invoking Find, so it passes workspace_size=0 to MIOpen.

MIOpen's Gemm solvers require workspace memory but receive 0, causing them to be silently skipped during algorithm selection.

The correct implementation already exists in problem.cpp:506, which uses the Find2 API and does allocate workspace before calling Find. The legacy Find1 path in ATen does not.
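
The skipping behaviour described above can be illustrated with a toy model. This is only a sketch: `SolverModel`, `applicable_solvers`, and `example_solvers` are hypothetical names invented for the illustration, and the sizes are made up. It is not MIOpen's internal code, just the general pattern of filtering candidates by the workspace the caller offers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a convolution solver; not MIOpen's real type.
struct SolverModel {
    std::string name;
    std::size_t required_workspace;  // bytes the solver needs in order to run
};

// Return the names of the solvers that fit in the workspace the caller offered.
// Solvers that need more are dropped without any error being raised, which is
// the "silently skipped" behaviour the issue describes.
std::vector<std::string> applicable_solvers(const std::vector<SolverModel>& solvers,
                                            std::size_t offered_workspace) {
    std::vector<std::string> names;
    for (const auto& s : solvers) {
        if (s.required_workspace <= offered_workspace) {
            names.push_back(s.name);
        }
    }
    return names;
}

// Illustrative solver list; names and sizes are invented for the example.
std::vector<SolverModel> example_solvers() {
    return {{"GemmLike", std::size_t{1} << 20}, {"DirectLike", 0}};
}
```

With `offered_workspace` equal to 0, only the solver that needs no workspace survives the filter. If that remaining set is empty for a given problem, nothing is left to run, which matches the failure mode reported for gfx1150.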

Impact

This affects all ROCm users, not just gfx1150. It is most visible on gfx1150 (Radeon 890M), where Gemm solvers are the only viable backward pass path: with them skipped, backward convolutions fail entirely.

Workaround

Setting the environment variable MIOPEN_DEBUG_CONV_GEMM=0 disables MIOpen's Gemm convolution solvers, so algorithm selection falls back to the other solvers and bypasses the issue.

Environment

  • Hardware: AMD Ryzen AI 9 HX 370, Radeon 890M (gfx1150)
  • ROCm: 7.2.0
  • OS: CachyOS (Linux)
  • PyTorch: built from source (main branch)

Analysis

Full three-bug analysis with code references: https://github.com/Peterc3-dev/miopen-gfx1150


extent analysis

Fix Plan

To fix the issue, the ATen convolution layer must size and allocate workspace before Find runs, either by migrating to the Find2 API (as problem.cpp does) or by querying the workspace size ahead of the existing Find1 call. The steps:

  • Query the required workspace size before calling Find (or switch the miopenFindConvolutionForwardAlgorithm call to the Find2 API).
  • Allocate workspace memory of that size.
  • Pass the allocated workspace and its size to MIOpen.

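Example code (a sketch of the Find1 path; handle, the descriptors, and the device buffers x, w, y are assumed to exist already):

#include <hip/hip_runtime.h>
#include <miopen/miopen.h>

// Query how much workspace the forward convolution solvers need.
size_t workspace_size = 0;
miopenStatus_t status = miopenConvolutionForwardGetWorkSpaceSize(
    handle, wDesc, xDesc, convDesc, yDesc, &workspace_size);
if (status != miopenStatusSuccess) { /* report the error and bail out */ }

// Allocate the workspace in device memory; host malloc() is not usable here.
// (Inside ATen this would come from the caching allocator instead.)
void* workspace = nullptr;
if (workspace_size > 0 && hipMalloc(&workspace, workspace_size) != hipSuccess) {
    /* report the error and bail out */
}

// Call Find with the allocated workspace and its size.
miopenConvAlgoPerf_t perf;
int returned_algo_count = 0;
status = miopenFindConvolutionForwardAlgorithm(
    handle, xDesc, x, wDesc, w, convDesc, yDesc, y,
    /*requestAlgoCount=*/1, &returned_algo_count, &perf,
    workspace, workspace_size, /*exhaustiveSearch=*/false);

// Use perf.fwd_algo for the convolution, then release the workspace.
hipFree(workspace);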

Verification

To verify that the fix worked, test the backward convolution on the affected hardware (Radeon 890M) with the updated code. The convolution should now succeed without silently skipping Gemm solvers.

Extra Tips

  • Make sure to free the allocated workspace memory after use to avoid memory leaks.
  • Check the miopenStatus_t returned by both the workspace-size query (miopenConvolutionForwardGetWorkSpaceSize) and miopenFindConvolutionForwardAlgorithm, and handle any value other than miopenStatusSuccess.
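
One way to make the "free after use" tip hard to forget is to tie the workspace to a scope with `std::unique_ptr` and a custom deleter. The sketch below is runnable without a GPU because it uses hypothetical stand-ins (`fake_device_alloc`, `FakeDeviceFree`) backed by `std::malloc` and `std::free`. In real code those would be replaced by the device allocator in use, for example hipMalloc and hipFree.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>

// Stand-ins for device allocation so the sketch runs without a GPU.
// Assumption: real code would call hipMalloc / hipFree (or a caching
// allocator) in these two places.
inline void* fake_device_alloc(std::size_t bytes) { return std::malloc(bytes); }

struct FakeDeviceFree {
    void operator()(void* p) const { std::free(p); }
};

// Owning handle: the workspace is released when the handle leaves scope.
using WorkspacePtr = std::unique_ptr<void, FakeDeviceFree>;

// Allocate a workspace of the requested size.
// Returns an empty handle when the size is 0 or the allocation fails.
WorkspacePtr make_workspace(std::size_t bytes) {
    if (bytes == 0) {
        return WorkspacePtr{};
    }
    return WorkspacePtr{fake_device_alloc(bytes)};
}
```

A caller would pass `ws.get()` and the queried size to Find, and the memory is released on every exit path, including early returns taken after an error status.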
