pytorch - 💡(How to fix) Fix SIGSEGV in eager `torch.ops.quantized.linear` with CUDA input

🐛 Describe the bug

I found that torch.ops.quantized.linear can crash the Python process with a segmentation fault when the input tensor is on CUDA.

The same minimal model runs successfully on CPU, but crashes during eager execution on CUDA. torch.compile / Inductor is not involved in this repro.

If this CUDA usage is unsupported for quantized linear, I would expect PyTorch to raise a clear RuntimeError instead of crashing the process.
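The failure takes down the interpreter, so a regression check for the expected behavior is easiest to run in a child process. The sketch below uses a hypothetical helper, `run_isolated`, built only on the standard library. It distinguishes three outcomes: a clean exit, a Python-level error such as the hoped-for `RuntimeError`, and death by a signal such as `SIGSEGV`. It assumes a POSIX system, where a negative return code means the child was killed by that signal.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=120):
    """Run `code` in a fresh Python interpreter and classify the outcome.

    Returns "ok" on exit code 0, "error" on a nonzero exit (for example an
    uncaught RuntimeError), or the signal name (for example "SIGSEGV") if
    the child was killed by a signal. POSIX only.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode == 0:
        return "ok"
    if proc.returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return signal.Signals(-proc.returncode).name
    return "error"
```

With the current behavior, running the minimal reproduction through this helper on a CUDA machine would be expected to return `"SIGSEGV"`. A fix that raises a clear `RuntimeError` would turn that into `"error"`.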

Minimal reproduction

import sys
import platform
import faulthandler

import torch
import torch.nn as nn


faulthandler.enable(all_threads=True)


class MyModel(nn.Module):
    N = 64
    K = 32

    x_scale = 1.2
    x_zero_point = 0

    w_scale = 0.2
    w_zero_point = 0

    y_scale = 4.2
    y_zero_point = 0

    def __init__(self):
        super().__init__()

        torch.backends.quantized.engine = "fbgemm"

        self.weight = nn.Parameter(
            2 * torch.randn((self.N, self.K), dtype=torch.float32)
        )
        self.bias = nn.Parameter(
            3 * torch.randn(self.N, dtype=torch.float32) + 1
        )

        qw = torch.quantize_per_tensor(
            self.weight.detach(),
            self.w_scale,
            self.w_zero_point,
            torch.qint8,
        )

        self.w_packed = torch.ops.quantized.linear_prepack(qw, self.bias)

    def forward(self, x):
        qx = torch.quantize_per_tensor(
            x,
            self.x_scale,
            self.x_zero_point,
            torch.quint8,
        )

        return torch.ops.quantized.linear(
            qx,
            self.w_packed,
            self.y_scale,
            self.y_zero_point,
        )


def print_env():
    print("Python:", sys.version.replace("\n", " "))
    print("Platform:", platform.platform())
    print("PyTorch:", torch.__version__)
    print("CUDA used to build PyTorch:", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())
    print("Quantized engine:", torch.backends.quantized.engine)
    print("Supported quantized engines:", torch.backends.quantized.supported_engines)
    print()


def run_once(device):
    print(f"Running on device: {device}")

    model = MyModel().eval().to(device)
    x = torch.randn((1, MyModel.K), dtype=torch.float32, device=device)

    print("Input:", x.shape, x.dtype, x.device)

    with torch.no_grad():
        y = model(x)

    print("Output:", y.shape, y.dtype, y.device, "is_quantized =", y.is_quantized)
    print()


def main():
    torch.manual_seed(42)

    print_env()

    run_once("cpu")

    if not torch.cuda.is_available():
        print("CUDA is not available; skipping CUDA repro.")
        return

    run_once("cuda")


if __name__ == "__main__":
    main()
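One detail may be worth checking here. This is an observation, not a confirmed root cause. `nn.Module.to()` moves only registered parameters and buffers. `self.w_packed` is the packed-params object returned by `linear_prepack` and is stored as a plain attribute, so `.to("cuda")` presumably leaves it as CPU-packed weights. The CUDA run therefore pairs a CUDA quantized input with those CPU-packed weights. The sketch below checks which attributes `.to()` would track. `PackedOnly` and `tracked_by_to` are hypothetical names that mirror the weight handling in `MyModel`.

```python
import torch
import torch.nn as nn


class PackedOnly(nn.Module):
    """Hypothetical module mirroring MyModel's weight handling."""

    def __init__(self, n=64, k=32):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(n, k))
        qw = torch.quantize_per_tensor(self.weight.detach(), 0.2, 0, torch.qint8)
        # Plain attribute: not registered as a parameter or a buffer.
        self.w_packed = torch.ops.quantized.linear_prepack(qw, None)


def tracked_by_to(module, attr):
    """Return True if `attr` is a parameter or buffer that Module.to() moves."""
    names = {name for name, _ in module.named_parameters()}
    names |= {name for name, _ in module.named_buffers()}
    return attr in names
```

With this check, `tracked_by_to(PackedOnly(), "weight")` is true and `tracked_by_to(PackedOnly(), "w_packed")` is false.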

Observed Behavior

The CPU run succeeds:

Running on device: cpu
Input: torch.Size([1, 32]) torch.float32 cpu
Output: torch.Size([1, 64]) torch.quint8 cpu is_quantized = True

The CUDA run crashes the Python process:

Running on device: cuda
Input: torch.Size([1, 32]) torch.float32 cuda:0
Fatal Python error: Segmentation fault

Thread 0x... (most recent call first):
  File ".../torch/_ops.py", line 1275 in __call__
  File ".../for_test_1.py", line 55 in forward
  File ".../torch/nn/modules/module.py", line 1789 in _call_impl
  File ".../torch/nn/modules/module.py", line 1778 in _wrapped_call_impl
  File ".../for_test_1.py", line 83 in run_once
  File ".../for_test_1.py", line 107 in main
  File ".../for_test_1.py", line 116 in <module>

Segmentation fault (core dumped)
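Until this path raises a proper error, one possible stopgap is to keep the quantized linear on CPU and move only the result back to the caller's device. This assumes that FBGEMM-packed weights are meant to be used with CPU inputs only. `cpu_quantized_linear` is a hypothetical helper, not a PyTorch API. It also returns a dequantized float tensor rather than the `quint8` output seen in the CPU run.

```python
import torch

# Mirror the repro's engine choice when FBGEMM is available (x86 builds).
if "fbgemm" in torch.backends.quantized.supported_engines:
    torch.backends.quantized.engine = "fbgemm"


def cpu_quantized_linear(x, w_packed, x_scale, x_zero_point, y_scale, y_zero_point):
    """Run quantized::linear on CPU for an input that may live on any device.

    Assumes `w_packed` came from torch.ops.quantized.linear_prepack on CPU.
    Returns a float32 tensor on the original device of `x`.
    """
    qx = torch.quantize_per_tensor(
        x.detach().cpu(), x_scale, x_zero_point, torch.quint8
    )
    qy = torch.ops.quantized.linear(qx, w_packed, y_scale, y_zero_point)
    # Dequantize on CPU, then copy the float result back to the caller's device.
    return qy.dequantize().to(x.device)
```

The helper costs a host/device copy in each direction. In exchange, it avoids the crashing CUDA call entirely.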

Versions

PyTorch version:  2.13.0a0+git90cf0e1
Is debug build: True
CUDA used to build PyTorch: 12.6
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 12.3.0-1ubuntu1~22.04) 12.3.0
Clang version: Could not collect
CMake version: version 3.22.1
Libc version: glibc-2.35

Python version: 3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.8.0-59-generic-x86_64-with-glibc2.35
Is CUDA available: True

cc @jianyuh @raghuramank100 @jamesr66a @vkuzo @jgong5 @Xia-Weiwen @leslie-fang-intel
