pytorch - ✅(Solved) Fix [Silent Bug] torch.linalg.qr allows the same tensor for both Q and R in the 'out' argument when the input is square [1 pull requests, 1 comments, 2 participants]

GitHub stats
pytorch/pytorch#180377 · Fetched 2026-04-16 06:34:55
Comments: 1 · Participants: 2 · Timeline events: 39 · Reactions: 0
Timeline (top): mentioned ×14, subscribed ×14, labeled ×8, commented ×1

Error Message

import torch

def qr_out(device):
    X = torch.randn(2, 2, device=device)
    try:
        B = torch.zeros_like(X)
        # The same tensor is passed for both Q and R
        torch.linalg.qr(X, out=(B, B))
        print("Ran without issue, B = R but Q is lost")
    except RuntimeError as e:
        print(f"This should result in an error: {e}")

qr_out("cpu")  # the report covers device="cuda" as well

Root Cause

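The aliasing goes unnoticed only for square inputs, where Q and R have the same shape, so one tensor fits both outputs. If X is not square, the call throws an error because of a size mismatch.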
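
The shape dependence can be checked directly. The following is a minimal sketch, assuming the default mode='reduced', in which Q is m x k and R is k x n with k = min(m, n); the helper name qr_shapes is hypothetical and only serves the illustration.

```python
import torch

def qr_shapes(m, n):
    """Return the shapes of Q and R for an m x n input (default mode='reduced')."""
    Q, R = torch.linalg.qr(torch.randn(m, n))
    return tuple(Q.shape), tuple(R.shape)

# Square input: Q and R share one shape, so a single tensor fits both out slots.
print(qr_shapes(2, 2))
# Tall input: Q and R have different shapes, so a single tensor cannot match both.
print(qr_shapes(3, 2))
```

Only the square case leaves room for the silent aliasing described in this issue.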

Fix Action

PR fix notes

PR #180420: linalg.qr: reject overlapping or aliased Q and R in out=

Description (problem / solution / changelog)

Previously, passing the same tensor for both out arguments of torch.linalg.qr could succeed on square matrices and produce wrong results.

This PR adds a check at the start of TORCH_META_FUNC(linalg_qr) that checks at::get_overlap_status on the out tensors if they exist. That runs before set_output_strided, so it applies to the real user buffers. The error message is specific to linalg.qr and out=(Q, R).

Currently, no error is raised when mode='r'; in that case the out tensor receives the R factor of the QR factorization. I am open to throwing an error for mode='r' too if there is a good reason.
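
For context on why mode='r' is treated differently, here is a minimal sketch of what that mode returns: Q comes back as an empty tensor and only R carries data, so there is a single meaningful output. The helper name r_only is hypothetical.

```python
import torch

def r_only(X):
    """Run QR with mode='r' and return both outputs; Q is an empty placeholder."""
    Q, R = torch.linalg.qr(X, mode="r")
    return Q, R

X = torch.randn(3, 2)
Q, R = r_only(X)
print(Q.numel())       # number of elements stored in Q
print(tuple(R.shape))  # shape of the R factor
```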

Fixes #180377

Changed files

  • aten/src/ATen/native/BatchLinearAlgebra.cpp (modified, +12/-0)
  • test/test_linalg.py (modified, +13/-0)

🐛 Describe the bug

torch.linalg.qr(x, out=(Q, R)) allows for the same tensor to be used for both the Q and R tensors in the out argument if x is square.

example:

import torch

def qr_out(device):
    X = torch.randn(2, 2, device=device)
    try:
        B = torch.zeros_like(X)
        torch.linalg.qr(X, out=(B, B))
        print("Ran without issue, B = R but Q is lost")
    except RuntimeError as e:
        print(f"This should result in an error: {e}")

qr_out("cpu")  # the report covers device="cuda" as well

produces

Ran without issue, B = R but Q is lost

for both device='cpu' and device='cuda'.

If X is not square, this will throw an error because of a size mismatch.

I believe it should always throw an error, or at the very least give a warning, whenever both tensors in the out tuple are aliases of each other.

One nuance is the case mode='r', where Q is usually returned as an empty tensor and only R is computed. This can be handled as a separate case that just fills the output with R (as it does now), or torch could throw an error here too.

If anyone has any thoughts on this, please let me know and I can add this check.

(related issue #154356)

Versions

PyTorch version: 2.12.0a0+gitd7d0482
Is debug build: True
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A

OS: Fedora Linux 41 (Container Image) (x86_64)
GCC version: (GCC) 14.3.1 20251022 (Red Hat 14.3.1-4)
Clang version: Could not collect
CMake version: version 4.3.1
Libc version: glibc-2.40

Python version: 3.12.12 (main, Oct 10 2025, 00:00:00) [GCC 14.3.1 20250808 (Red Hat 14.3.1-3)] (64-bit runtime)
Python platform: Linux-5.14.0-615.el9.x86_64-x86_64-with-glibc2.40
Is CUDA available: True
CUDA runtime version: 12.8.93
CUDA_MODULE_LOADING set to: 
GPU models and configuration: GPU 0: NVIDIA H200
Nvidia driver version: 580.82.07
cuDNN version: Probably one of the following:
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn.so.9.6.0
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_adv.so.9.6.0
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_cnn.so.9.6.0
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_engines_precompiled.so.9.6.0
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_engines_runtime_compiled.so.9.6.0
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_graph.so.9.6.0
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_heuristic.so.9.6.0
/usr/local/cuda-12.8/targets/x86_64-linux/lib/libcudnn_ops.so.9.6.0
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture:                            x86_64
CPU op-mode(s):                          32-bit, 64-bit
Address sizes:                           46 bits physical, 57 bits virtual
Byte Order:                              Little Endian
CPU(s):                                  160
On-line CPU(s) list:                     0-159
Vendor ID:                               GenuineIntel
Model name:                              Intel Xeon Processor (SapphireRapids)
CPU family:                              6
Model:                                   143
Thread(s) per core:                      2
Core(s) per socket:                      40
Socket(s):                               2
Stepping:                                4
BogoMIPS:                                4200.00
Flags:                                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid tsc_known_freq pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves avx_vnni avx512_bf16 wbnoinvd arat vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b fsrm md_clear serialize tsxldtrk amx_bf16 avx512_fp16 amx_tile amx_int8 arch_capabilities
Virtualization:                          VT-x
Hypervisor vendor:                       KVM
Virtualization type:                     full
L1d cache:                               5 MiB (160 instances)
L1i cache:                               5 MiB (160 instances)
L2 cache:                                320 MiB (80 instances)
L3 cache:                                32 MiB (2 instances)
NUMA node(s):                            2
NUMA node0 CPU(s):                       0-79
NUMA node1 CPU(s):                       80-159
Vulnerability Gather data sampling:      Not affected
Vulnerability Indirect target selection: Mitigation; Aligned branch/return thunks
Vulnerability Itlb multihit:             Not affected
Vulnerability L1tf:                      Not affected
Vulnerability Mds:                       Not affected
Vulnerability Meltdown:                  Not affected
Vulnerability Mmio stale data:           Unknown: No mitigations
Vulnerability Reg file data sampling:    Not affected
Vulnerability Retbleed:                  Not affected
Vulnerability Spec rstack overflow:      Not affected
Vulnerability Spec store bypass:         Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1:                Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:                Mitigation; Enhanced / Automatic IBRS; IBPB conditional; PBRSB-eIBRS SW sequence; BHI SW loop, KVM SW loop
Vulnerability Srbds:                     Not affected
Vulnerability Tsx async abort:           Not affected

Versions of relevant libraries:
[pip3] intel-cmplr-lib-ur==2025.3.3
[pip3] intel-openmp==2025.3.3
[pip3] mkl-include==2025.3.1
[pip3] mkl-static==2025.3.1
[pip3] numpy==2.4.4
[pip3] nvidia-cusparselt-cu12==0.8.1
[pip3] onemkl-license==2025.3.1
[pip3] optree==0.19.0
[pip3] tbb==2022.3.1
[pip3] tbb-devel==2022.3.1
[pip3] tcmlib==1.4.1
[pip3] torch==2.12.0a0+gitd7d0482
[pip3] triton==3.7.0+git282c8251
[pip3] umf==1.0.3
[conda] Could not collect

cc @malfet @jianyuh @nikitaved @mruberry @walterddr @xwang233 @Lezcano

extent analysis

TL;DR

The issue can be addressed by adding a check to ensure that the Q and R tensors in the out argument of torch.linalg.qr are not aliases of each other.

Guidance

  • Verify that the issue is reproducible by running the provided example code with different input sizes and devices.
  • Check the PyTorch documentation to see if there are any existing checks or warnings for this specific case.
  • Consider adding a check in the torch.linalg.qr function to raise an error or warning when the Q and R tensors are aliases of each other.
  • If the mode='r' case is to be handled separately, add a conditional check to handle this case accordingly.

Example

import torch

def qr_out(device):
    X = torch.randn(2, 2, device=device)
    B = torch.zeros_like(X)
    Q_out, R_out = B, B  # the same tensor is passed for both outputs
    try:
        # Guard: reject outputs that start at the same address
        if Q_out.data_ptr() == R_out.data_ptr():
            raise RuntimeError("Q and R cannot be aliases of each other")
        torch.linalg.qr(X, out=(Q_out, R_out))
        print("Ran without issue, B = R but Q is lost")
    except RuntimeError as e:
        print(f"This should result in an error: {e}")

qr_out("cpu")

Notes

The provided example code only checks if the Q and R tensors are aliases of each other, but does not handle the case where they are not aliases but still have overlapping memory. Additional checks may be necessary to handle this case.
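
One way to cover the overlapping case is to compare the byte ranges the two tensors span rather than only their starting addresses. The sketch below is an illustration under stated assumptions, not the check used inside PyTorch: byte_range and may_overlap are hypothetical helpers, and the test is conservative, so interleaved views that never touch the same element are still reported as overlapping.

```python
import torch

def byte_range(t):
    """Return the half-open [start, end) byte range spanned by tensor t."""
    if t.numel() == 0:
        return (0, 0)
    # Offset (in elements) of the last element, derived from sizes and strides.
    last = sum((size - 1) * stride for size, stride in zip(t.shape, t.stride()))
    start = t.data_ptr()
    return (start, start + (last + 1) * t.element_size())

def may_overlap(a, b):
    """Conservative check: True if the byte ranges of a and b intersect."""
    a_start, a_end = byte_range(a)
    b_start, b_end = byte_range(b)
    return a_start < b_end and b_start < a_end

buf = torch.zeros(6)
Q = buf[:4].view(2, 2)  # covers elements 0..3 of buf
R = buf[2:].view(2, 2)  # covers elements 2..5: different start, shared memory
print(Q.data_ptr() == R.data_ptr())  # the data_ptr comparison misses this case
print(may_overlap(Q, R))             # the range comparison catches it
```

A guard such as `if may_overlap(Q, R): raise RuntimeError(...)` placed before `torch.linalg.qr(X, out=(Q, R))` would reject both exact aliases and partially overlapping views.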

Recommendation

Apply a workaround by adding a check to ensure that the Q and R tensors are not aliases of each other, as shown in the example code. This will prevent the issue from occurring, but may not be a permanent fix and should be revisited when the PyTorch library is updated.
