pytorch - 💡(How to fix) Fix bad-free in torch.sparse.spdiags due to missing bounds check on diagonal offsets [1 comments, 2 participants]

pytorch/pytorch#178089 (fetched 2026-04-08 01:16:47)

A malformed-input sequence in torch.sparse.spdiags can lead to an ASan-detected bad-free during tensor destruction instead of a clean Python exception, indicating a potential memory-safety issue reachable from Python.

🐛 Describe the bug

Reproducer

import torch

try:
    dtype = torch.float16
    shape = (0, 0)
    diags = torch.tensor([-8], dtype=torch.int64)
    data_mat = torch.zeros((1, 0), dtype=dtype)

    print("=== graceful error case ===")
    print("dtype =", dtype)
    print("shape =", shape)
    print("diags =", diags.tolist())
    print("data_mat.shape =", tuple(data_mat.shape))

    out = torch.sparse.spdiags(data_mat, diags, shape)
    print("OK:", out)

except Exception as e:
    print("[CAUGHT PYTHON EXCEPTION]")
    print("type =", type(e).__name__)
    print("msg  =", e)
    

dtype = torch.float16
shape = (54, 63)
diags = torch.tensor([-65, 10], dtype=torch.int64)
data_mat = torch.zeros((2, 54), dtype=dtype)

torch.sparse.spdiags(data_mat, diags, shape)

The first malformed input raises a Python exception:

=== graceful error case ===
dtype = torch.float16
shape = (0, 0)
diags = [-8]
data_mat.shape = (1, 0)
[CAUGHT PYTHON EXCEPTION]
type = RuntimeError
msg  = Trying to create tensor with negative dimension -8: [2, -8]

The subsequent malformed input then triggers an ASan crash:

==2444454==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x60e000305a00 in thread T0
    #0 0x7ffff75a1aa7 in __interceptor_free.part.0
    #1 0x7ffff254e071 in std::unique_ptr<void, void (*)(void*)>::~unique_ptr()
    #2 0x7ffff254e071 in c10::detail::UniqueVoidPtr::~UniqueVoidPtr()
    #3 0x7ffff254e071 in c10::DataPtr::~DataPtr()
    #4 0x7ffff254e071 in c10::StorageImpl::~StorageImpl()
    #5 0x7ffff254e168 in c10::StorageImpl::~StorageImpl()
    #6 0x7fffd0228acf in c10::intrusive_ptr<c10::StorageImpl, ...>::reset_()
    #7 0x7fffd0228acf in c10::intrusive_ptr<c10::StorageImpl, ...>::~intrusive_ptr()
    #8 0x7fffd0228acf in c10::Storage::~Storage()
    #9 0x7fffd0228acf in c10::TensorImpl::~TensorImpl()
    #10 0x7fffd0228f84 in c10::TensorImpl::~TensorImpl()
    #11 0x7fffd971b1a0 in c10::intrusive_ptr<c10::TensorImpl, ...>::reset_()
    #12 0x7fffd971b1a0 in c10::intrusive_ptr<c10::TensorImpl, ...>::~intrusive_ptr()
    #13 0x7fffd971b1a0 in at::TensorBase::~TensorBase()
    #14 0x7fffd971b1a0 in at::SparseTensorImpl::~SparseTensorImpl()
    #15 0x7fffd971b3c8 in at::SparseTensorImpl::~SparseTensorImpl()
    #16 0x7ffff26eec6b in c10::MaybeOwned<at::Tensor>::operator=(...)
    #17 0x7ffff26eec6b in THPVariable_subclass_clear(THPVariable*)
    #18 0x7ffff26ed9c2 in THPVariable_subclass_dealloc(_object*)
    ...
SUMMARY: AddressSanitizer: bad-free in __interceptor_free.part.0

ASan also reports that the freed address is not a valid allocation base:

0x60e000305a00 is located 72 bytes after 152-byte region [0x60e000305920,0x60e0003059b8)
allocated by thread T0 here:
    #0 0x7ffff75a2c7f in __interceptor_malloc
    #1 0x7fffcffc5068  (/lib/x86_64-linux-gnu/libomp.so.5+0x20068)

Malformed diags values should be rejected deterministically at the API boundary with a normal Python exception. Under no circumstances should invalid user input lead to partially initialized native objects, invalid frees, or memory-corruption symptoms during teardown.

Root cause analysis

The root cause appears to be insufficient input validation in torch.sparse.spdiags, mainly in two respects.

  1. Missing bounds checks for diagonal offsets.
    The spdiags function does not appear to validate that the values in diags fall within the valid range. For a matrix of shape (m, n), a valid diagonal offset d satisfies -(m - 1) <= d <= (n - 1). For shape=(54, 63) that range is -53 <= d <= 62, so the offset -65 in diags=[-65, 10] lies below the lower bound and should be rejected at the function entry point.

  2. Resource leakage or inconsistent state on the exception path.
    As shown by the ASan stack trace, the destruction chain is:

    SparseTensorImpl::~SparseTensorImpl()
      -> TensorBase::~TensorBase()
        -> TensorImpl::~TensorImpl()
          -> StorageImpl::~StorageImpl()
            -> DataPtr::~DataPtr()
              -> UniqueVoidPtr::~UniqueVoidPtr()
                -> free()  // bad-free

This suggests that when the first call throws an exception, a SparseTensorImpl object or its internal Storage may already have been partially constructed. In other words, the object structure itself may have been allocated, but the internal DataPtr ended up pointing to an invalid address.

ASan reports that this address is located 72 bytes past a previously allocated 152-byte region, indicating an out-of-bounds offset. When Python exception handling later causes the object to be reclaimed by GC, the destructor attempts to free() a pointer that was never correctly returned by malloc().

The core issue is that the C++ implementation of spdiags appears to compute index/value tensor sizes and pass invalid dimensions into lower-level tensor construction logic before fully validating them. Once an exception is thrown, the implementation does not appear to properly roll back the state of partially constructed objects.
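The offset range in item 1 follows from the length of a diagonal. As a sanity check, here is a minimal sketch in plain Python that applies the textbook formula to the two calls in the reproducer. The helper names `diag_length` and `valid_offset_range` are hypothetical, and the formula is not claimed to be what the PyTorch kernel actually computes.

```python
def diag_length(m: int, n: int, d: int) -> int:
    """Number of entries on diagonal d of an m x n matrix.

    Textbook formula only. A non-positive result means the diagonal
    lies entirely outside the matrix. Hypothetical helper, not a PyTorch API.
    """
    return min(m + min(d, 0), n - max(d, 0))


def valid_offset_range(m: int, n: int) -> tuple[int, int]:
    """Inclusive range of offsets that touch at least one matrix entry."""
    return -(m - 1), n - 1


if __name__ == "__main__":
    # Second call in the reproducer: shape (54, 63), offsets [-65, 10].
    print(valid_offset_range(54, 63))  # (-53, 62)
    print(diag_length(54, 63, -65))    # -11
    print(diag_length(54, 63, 10))     # 53
    # First call in the reproducer: shape (0, 0), offset -8.
    print(diag_length(0, 0, -8))       # -8
```

The -8 for the first call equals the negative dimension in the reported error message ([2, -8]), which is consistent with, but does not prove, a diagonal length being used without clamping. In the second call the two lengths sum to 42, a positive number, which might be why no size check fires there; this is a hypothesis that has not been verified against the PyTorch source.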

Potential exploitability and impact

The observed result is an ASan-detected crash, but the bug involves an invalid free in native code rather than a safe exception. This means the impact may go beyond denial of service in some environments. Although I have not shown a stronger exploitation path, malformed Python inputs can clearly trigger unsafe memory handling in torch.sparse.spdiags.

Suggested remediation

spdiags should validate diagonal offsets before performing any internal size computation or tensor allocation. Any dimension derived from shape and diags should be checked before being passed to lower-level constructors. The implementation should also ensure strong exception safety so that partially initialized sparse tensor state cannot reach an invalid destruction path.
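The "validate first, allocate afterwards" ordering described above can be sketched in plain Python. This is an illustrative reference helper only (the name `build_diag_indices` is hypothetical and it is not how PyTorch builds its index tensor); it shows every check running before any output is constructed, so a rejected input leaves nothing half-built.

```python
from typing import List, Sequence, Tuple


def build_diag_indices(
    offsets: Sequence[int], shape: Tuple[int, int]
) -> List[Tuple[int, int]]:
    """Return (row, col) pairs for the requested diagonals.

    Hypothetical helper: all validation happens before any output is
    built, so a bad input cannot leave partial state behind.
    """
    if len(shape) != 2:
        raise ValueError(f"shape must be 2-D, got {tuple(shape)}")
    m, n = shape
    if m < 0 or n < 0:
        raise ValueError(f"shape must be non-negative, got {tuple(shape)}")
    lo, hi = -(m - 1), n - 1
    for d in offsets:
        if not lo <= d <= hi:
            raise ValueError(
                f"offset {d} is outside [{lo}, {hi}] for shape {(m, n)}"
            )

    # Only reached once every derived length is known to be positive.
    indices: List[Tuple[int, int]] = []
    for d in offsets:
        row0, col0 = max(-d, 0), max(d, 0)
        length = min(m - row0, n - col0)
        indices.extend((row0 + k, col0 + k) for k in range(length))
    return indices
```

With this ordering, `build_diag_indices([-65, 10], (54, 63))` raises a ValueError before the valid offset 10 contributes any indices, which is the behavior the remediation asks for at the API boundary.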

Versions

PyTorch version: 2.10.0+cu128
Is debug build: False
CUDA used to build PyTorch: 12.8
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Mar 3 2026, 11:56:32) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-5.4.0-200-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: AuthenticAMD
Model name: AMD Ryzen Threadripper PRO 5995WX 64-Cores
CPU family: 25
Model: 8
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 1
Stepping: 2
Frequency boost: enabled
CPU max MHz: 2700.0000
CPU min MHz: 1800.0000
BogoMIPS: 5389.77
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 invpcid_single hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor smca
Virtualization: AMD-V
L1d cache: 2 MiB (64 instances)
L1i cache: 2 MiB (64 instances)
L2 cache: 32 MiB (64 instances)
L3 cache: 256 MiB (8 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-127
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.8.4.1
[pip3] nvidia-cuda-cupti-cu12==12.8.90
[pip3] nvidia-cuda-nvrtc-cu12==12.8.93
[pip3] nvidia-cuda-runtime-cu12==12.8.90
[pip3] nvidia-cudnn-cu12==9.10.2.21
[pip3] nvidia-cufft-cu12==11.3.3.83
[pip3] nvidia-curand-cu12==10.3.9.90
[pip3] nvidia-cusolver-cu12==11.7.3.90
[pip3] nvidia-cusparse-cu12==12.5.8.93
[pip3] nvidia-cusparselt-cu12==0.7.1
[pip3] nvidia-nccl-cu12==2.27.5
[pip3] nvidia-nvjitlink-cu12==12.8.93
[pip3] nvidia-nvtx-cu12==12.8.90
[pip3] torch==2.10.0
[pip3] torchvision==0.26.0a0+48956e0
[pip3] triton==3.6.0
[conda] Could not collect

cc @nikitaved @pearu @cpuhrsch @amjames @bhosmer @jcaip @malfet

extent analysis

Fix Plan

To address the issue, torch.sparse.spdiags needs input validation that rejects diagonal offsets outside the valid range before any size computation or allocation takes place. The memory-safety part of the fix belongs in the C++ implementation; a Python-level wrapper can only serve as a workaround that keeps bad offsets from reaching the native code.

Here are the steps to fix the issue:

  • Add bounds checks for diagonal offsets in diags:
    • For an output matrix of shape (m, n), check that every offset d satisfies -(m - 1) <= d <= (n - 1).
  • Do not rely on exception handling for cleanup:
    • A try-except block around torch.sparse.spdiags cannot undo an invalid free in native code, so invalid inputs must be rejected before the call.
    • Python-level exception handling cannot roll back partially constructed native objects either; that requires exception-safe construction in the C++ code.

Here's an example of what a Python-side workaround could look like:

import torch

def sparse_spdiags(data_mat, diags, shape):
    """
    Validates the diagonal offsets, then creates a sparse matrix
    with torch.sparse.spdiags.

    Args:
        data_mat (Tensor): The data to be placed on the diagonals.
        diags (Tensor): The diagonal offsets.
        shape (tuple): The shape (rows, cols) of the output matrix.

    Returns:
        Tensor: The sparse matrix.

    Raises:
        TypeError: If data_mat or diags is not a torch.Tensor.
        ValueError: If shape is not 2-D or an offset is out of range.
    """
    # Check that the inputs are tensors
    if not isinstance(data_mat, torch.Tensor) or not isinstance(diags, torch.Tensor):
        raise TypeError("data_mat and diags must be of type torch.Tensor")

    if len(shape) != 2:
        raise ValueError("shape must have exactly two dimensions")
    n_rows, n_cols = shape

    # Valid offsets satisfy -(n_rows - 1) <= d <= (n_cols - 1)
    out_of_range = (diags < -(n_rows - 1)) | (diags > n_cols - 1)
    if torch.any(out_of_range):
        raise ValueError(
            f"Diagonal offsets {diags.tolist()} must lie in "
            f"[{-(n_rows - 1)}, {n_cols - 1}] for shape {tuple(shape)}"
        )

    # Offsets are in range; let any other error from spdiags propagate
    return torch.sparse.spdiags(data_mat, diags, shape)

Verification

To verify that the fix worked, you can test the updated sparse_spdiags function with different inputs, including valid and invalid diagonal offsets. You can also use tools like AddressSanitizer to check for any memory safety issues.

Here's an example of how you can test the function:

# Test with valid diagonal offsets
data_mat = torch.zeros((1, 5))
diags = torch.tensor([0])
shape = (5, 5)
out = sparse_spdiags(data_mat, diags, shape)
print(out)

# Test with invalid diagonal offsets
data_mat = torch.zeros((1, 5))
diags = torch.tensor([10])
shape = (5, 5)
try:
    out = sparse_spdiags(data_mat, diags, shape)
except ValueError as e:
    print("Rejected as expected:", e)
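Because the original failure kills the interpreter rather than raising, it can help to run the reproducer in a child process so the test harness itself survives. The following is a minimal sketch using only the standard library; `run_isolated` is a hypothetical helper name, and the embedded snippet is the second call from the reproducer.

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a child interpreter and capture its output.

    On POSIX a negative return code means the child was killed by a signal
    (for example SIGSEGV or SIGABRT). An uncaught Python exception exits
    with return code 1.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


if __name__ == "__main__":
    # Second call from the reproducer; requires torch in the child process.
    repro = (
        "import torch\n"
        "diags = torch.tensor([-65, 10], dtype=torch.int64)\n"
        "data_mat = torch.zeros((2, 54), dtype=torch.float16)\n"
        "torch.sparse.spdiags(data_mat, diags, (54, 63))\n"
    )
    result = run_isolated(repro)
    print("return code:", result.returncode)
    print(result.stderr[-2000:])
```

An ASan-instrumented build may exit with an ordinary non-zero code instead of a signal, so searching the captured stderr for the string "AddressSanitizer" is a more reliable signal than the return code alone.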
