pytorch - ✅(Solved) Fix DISABLED test_sdpa_prev_15_gpu (__main__.SDPAPatternRewriterGpuTests) [2 pull requests, 1 comment, 2 participants]

pytorch • 2026-04-01 06:34:03

GitHub stats

pytorch/pytorch#178974 • Fetched 2026-04-08 02:22:08

Author

guangyey

Participants

guangyey

pytorch-bot[bot]

Timeline (top)

mentioned ×24, subscribed ×24, labeled ×8, referenced ×3

Root Cause

This test was disabled because it is failing on the main branch (recent examples).

Fix Action

Fixed

Fixed by PR: [xpu][fix] Fix meta kernel for _scaled_dot_product_fused_attention_overrideable to preserve query layout (https://github.com/pytorch/pytorch/pull/178986)

PR fix notes

PR #178986: [xpu][fix] Fix meta kernel for _scaled_dot_product_fused_attention_overrideable to preserve query layout

Repository: pytorch/pytorch
Author: guangyey
State: open | merged: False
Link: https://github.com/pytorch/pytorch/pull/178986

Description (problem / solution / changelog)

Stack from ghstack (oldest at bottom):

-> #178986
#178959

Motivation

The XPU kernel _scaled_dot_product_fused_attention_overrideable allocates its output using alloc_with_matching_layout, which assigns the output the same stride ordering as the query tensor. When query is non-contiguous (e.g., after permute(0, 2, 1, 3) in SDPA fusion patterns), this produces a non-contiguous output with strides matching the permuted layout.

https://github.com/pytorch/pytorch/pull/178494 changed this behavior. The meta kernel, however, allocates its outputs using torch.empty, which returns a contiguous tensor with default strides. The resulting stride mismatch between the meta kernel and the real kernel can cause Inductor to raise an AssertionError during stride validation at runtime on XPU CI.
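The stride mismatch described above can be illustrated without PyTorch. The following is a minimal pure-Python sketch, not the actual XPU kernel or meta kernel code: the helper names (`contiguous_strides`, `permute_view`, `strides_matching_layout`) and the example shape are assumptions chosen for illustration. It models a query that was permuted with (0, 2, 1, 3), then compares the default contiguous strides (what a plain torch.empty-style allocation would give) with strides that follow the query's stride ordering.

```python
from typing import Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major (default) strides, in elements, for a dense tensor of `shape`."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= max(size, 1)
    return tuple(reversed(strides))


def permute_view(shape, strides, dims):
    """Reorder a (shape, strides) pair the way a permuted view would."""
    return tuple(shape[d] for d in dims), tuple(strides[d] for d in dims)


def strides_matching_layout(shape, ref_strides):
    """Hypothetical helper: dense strides for `shape` that keep the same
    stride ordering as `ref_strides` (innermost dimension stays innermost)."""
    # Visit dimensions from smallest reference stride to largest;
    # ties are broken in favour of the later dimension being innermost.
    order = sorted(range(len(shape)), key=lambda d: (ref_strides[d], -d))
    strides = [0] * len(shape)
    running = 1
    for d in order:
        strides[d] = running
        running *= max(shape[d], 1)
    return tuple(strides)


# Illustrative shape only: (batch, seq_len, heads, head_dim).
base_shape = (2, 8, 4, 16)
base_strides = contiguous_strides(base_shape)

# Query after permute(0, 2, 1, 3): a non-contiguous view.
q_shape, q_strides = permute_view(base_shape, base_strides, (0, 2, 1, 3))

# What a default contiguous allocation of the output shape would use.
meta_strides = contiguous_strides(q_shape)

# What an allocation that follows the query's stride ordering would use.
real_strides = strides_matching_layout(q_shape, q_strides)

print("query strides:     ", q_strides)
print("contiguous strides:", meta_strides)
print("matching strides:  ", real_strides)
print("mismatch:", meta_strides != real_strides)
```

In this model the contiguous strides differ from the query-matching strides, which is the kind of disagreement a stride check would flag when compile-time metadata and the runtime output do not agree.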

Additional Context

fix https://github.com/pytorch/pytorch/issues/178984
fix https://github.com/pytorch/pytorch/issues/178974

Changed files

torch/_meta_registrations.py (modified, +1/-1)

RAW_BUFFER

Platforms: xpu

This test was disabled because it is failing on main branch (recent examples).

cc @mruberry @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @aakhundov @coconutruben @jataylo @gujinghui @fengyuan14 @drisspg @liangel-02 @howardzhang-cv

extent analysis

TL;DR

The test failure in test_fused_attention.py on the main branch is attributed to a stride mismatch between the XPU kernel and its meta kernel. PR #178986 targets that mismatch, after which the test can be re-enabled and re-run to confirm.

Guidance

Investigate the recent test failures on torch-ci.com to identify the root cause.
Re-enable the disabled test and re-run it to see if the issue persists.
Collaborate with the listed team members (@mruberry, @chauhang, etc.) to discuss potential fixes or workarounds.

Notes

The issue body itself gives no technical details about the test failure; the description of the linked PR #178986 supplies them.

Recommendation

Apply workaround: Re-enable and re-run the test after investigating recent failures, as this may help identify and potentially resolve the issue.

#GPU compatibility #latency issue #model loading #dependency error #configuration error
