pytorch - ✅(Solved) Fix Floating point exception (core dumped) with AMD processor [1 pull request, 1 participant]

GitHub stats
pytorch/pytorch#177227 (fetched 2026-04-08 00:21:47)
Comments: 0
Participants: 1
Timeline: 84
Reactions: 1
Timeline (top): mentioned ×36, subscribed ×36, labeled ×10, cross-referenced ×1

Error Message

Recently, I have been running both nn.Module and ExportedProgram models. Everything works fine on my Intel CPU. However, on an AMD CPU, inference fails with the error: Floating point exception (core dumped).

Fix Action

Fix / Workaround

Set torch.backends.mkldnn.enabled = False before running inference. With the MKLDNN (oneDNN) backend disabled, inference on the AMD CPU works as usual. The affected machine is an AMD EPYC 7H12 64-Core Processor running under a VMware hypervisor; the full CPU details are listed under Versions.

PR fix notes

PR #18098: Qualcomm AI Engine Direct - AMD backend error

Description (problem / solution / changelog)

Summary

We noticed that inference on an AMD CPU fails with Floating point exception (core dumped). This can be easily reproduced with the following lines of code:

import torch.nn as nn
import torch
w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1,1536,1,32)
w2_conv(x)

A temporary workaround is to disable the MKLDNN backend: torch.backends.mkldnn.enabled = False

Test plan

NA

cc @cccclai @cbilgin

Changed files

  • .ci/scripts/setup-qnn-deps.sh (modified, +2/-1)
  • .ci/scripts/test_wheel_package_qnn.sh (modified, +3/-0)
  • backends/qualcomm/__init__.py (modified, +12/-0)
  • backends/qualcomm/requirements.txt (added, +5/-0)
  • backends/qualcomm/scripts/build.sh (modified, +1/-1)

Code Example

import torch.nn as nn
import torch
w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1,1536,1,32)
w2_conv(x)

🐛 Describe the bug

Recently, I have been running both nn.Module and ExportedProgram models. Everything works fine on my Intel CPU. However, on an AMD CPU, inference fails with the error: Floating point exception (core dumped). The AMD processor model is AMD EPYC 7H12 64-Core Processor. Since I am using ExecuTorch, I am on a nightly torch version: 2.11.0.dev20260215+cpu. The issue can be easily reproduced with a simple nn.Module model:

import torch.nn as nn
import torch
w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1,1536,1,32)
w2_conv(x)

To work around the issue, we need to manually turn off the following flag, after which inference on AMD works as usual: torch.backends.mkldnn.enabled = False

Versions

PyTorch version: 2.11.0.dev20260215+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A

OS: Ubuntu 22.04.3 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0
Clang version: Could not collect
CMake version: version 3.31.10
Libc version: glibc-2.35

Python version: 3.10.19 (main, Oct 21 2025, 16:43:05) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.15.0-164-generic-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
Is XPU available: False
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Caching allocator config: N/A

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7H12 64-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 16
Stepping: 0
BogoMIPS: 5199.99
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xsaves clzero arat npt svm_lock nrip_save vmcb_clean flushbyasid decodeassists overflow_recov succor
Virtualization: AMD-V
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 512 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 8 MiB (16 instances)
L3 cache: 4 GiB (16 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Not affected
Vulnerability Indirect target selection: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Mitigation; untrained return thunk; SMT disabled
Vulnerability Spec rstack overflow: Mitigation; SMT disabled
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines; IBPB conditional; STIBP disabled; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsa: Not affected
Vulnerability Tsx async abort: Not affected
Vulnerability Vmscape: Not affected

Versions of relevant libraries:
[pip3] executorch==1.2.0a0+a614149
[pip3] flake8==6.1.0
[pip3] flake8-breakpoint==1.1.0
[pip3] flake8-bugbear==24.4.26
[pip3] flake8-comprehensions==3.14.0
[pip3] flake8-plugin-utils==1.3.3
[pip3] flake8-pyi==23.5.0
[pip3] mypy==1.14.1
[pip3] mypy_extensions==1.1.0
[pip3] numpy==2.2.6
[pip3] nvidia-cublas-cu12==12.1.3.1
[pip3] nvidia-cuda-cupti-cu12==12.1.105
[pip3] nvidia-cuda-nvrtc-cu12==12.1.105
[pip3] nvidia-cuda-runtime-cu12==12.1.105
[pip3] nvidia-cudnn-cu12==9.1.0.70
[pip3] nvidia-cufft-cu12==11.0.2.54
[pip3] nvidia-curand-cu12==10.3.2.106
[pip3] nvidia-cusolver-cu12==11.4.5.107
[pip3] nvidia-cusparse-cu12==12.1.0.106
[pip3] nvidia-nccl-cu12==2.20.5
[pip3] nvidia-nvjitlink-cu12==12.9.41
[pip3] nvidia-nvtx-cu12==12.1.105
[pip3] pytorch_sphinx_theme==0.0.24
[pip3] pytorch_tokenizers==1.1.0
[pip3] torch==2.11.0.dev20260215+cpu
[pip3] torchao==0.17.0+git42bcdc491
[pip3] torchaudio==2.11.0.dev20260215+cpu
[pip3] torchdata==0.11.0
[pip3] torchmetrics==1.8.2
[pip3] torchsr==1.0.4
[pip3] torchtune==0.0.0
[pip3] torchvision==0.26.0.dev20260215+cpu
[pip3] triton==3.0.0
[conda] executorch 1.2.0a0+a614149 pypi_0 pypi
[conda] numpy 2.2.6 pypi_0 pypi
[conda] nvidia-cublas-cu12 12.1.3.1 pypi_0 pypi
[conda] nvidia-cuda-cupti-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-nvrtc-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cuda-runtime-cu12 12.1.105 pypi_0 pypi
[conda] nvidia-cudnn-cu12 9.1.0.70 pypi_0 pypi
[conda] nvidia-cufft-cu12 11.0.2.54 pypi_0 pypi
[conda] nvidia-curand-cu12 10.3.2.106 pypi_0 pypi
[conda] nvidia-cusolver-cu12 11.4.5.107 pypi_0 pypi
[conda] nvidia-cusparse-cu12 12.1.0.106 pypi_0 pypi
[conda] nvidia-nccl-cu12 2.20.5 pypi_0 pypi
[conda] nvidia-nvjitlink-cu12 12.9.41 pypi_0 pypi
[conda] nvidia-nvtx-cu12 12.1.105 pypi_0 pypi
[conda] pytorch-sphinx-theme 0.0.24 pypi_0 pypi
[conda] pytorch-tokenizers 1.1.0 pypi_0 pypi
[conda] torch 2.11.0.dev20260215+cpu pypi_0 pypi
[conda] torchao 0.17.0+git42bcdc491 pypi_0 pypi
[conda] torchaudio 2.11.0.dev20260215+cpu pypi_0 pypi
[conda] torchdata 0.11.0 pypi_0 pypi
[conda] torchfix 0.6.0 pypi_0 pypi
[conda] torchmetrics 1.8.2 pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchtune 0.0.0 pypi_0 pypi
[conda] torchvision 0.26.0.dev20260215+cpu pypi_0 pypi
[conda] triton 3.0.0 pypi_0 pypi

cc @ezyang @gchanan @kadeng @msaroufim @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @gujinghui @PenghuiCheng @jianyuh @min-jean-cho @yanbing-j @Guobing-Chen @Xia-Weiwen @snadampal

extent analysis

Fix Plan

Problem Summary

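Running a simple nn.Module model (a single 1x1 nn.Conv2d) on an AMD EPYC 7H12 CPU with PyTorch 2.11.0.dev20260215+cpu, installed alongside ExecuTorch, terminates the process with Floating point exception (core dumped). The same code runs fine on an Intel CPU.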

Root Cause Analysis

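The exact root cause is not identified in the issue. Because disabling the mkldnn backend avoids the crash, the fault most likely lies in the MKLDNN (oneDNN) CPU convolution path of this nightly PyTorch build when it runs on the reported AMD CPU.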

Fix Plan

To resolve the issue, we need to manually turn off the mkldnn backend by setting torch.backends.mkldnn.enabled = False.

Code Changes

import torch
torch.backends.mkldnn.enabled = False

Add this line of code before creating the nn.Module model.
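
If the workaround should apply only on affected machines, one option is to guard it with a CPU vendor check. The sketch below is an assumption about how such a guard could look, not the code from PR #18098: the helper names cpu_vendor, is_amd_cpu and apply_workaround are hypothetical, and the check only reads /proc/cpuinfo on Linux, looking for the AuthenticAMD vendor string shown in the report.

```python
import platform
from pathlib import Path
from typing import Optional


def cpu_vendor(cpuinfo_text: str) -> Optional[str]:
    """Return the first vendor_id value found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            return value.strip()
    return None


def is_amd_cpu() -> bool:
    """Best-effort check; this sketch only handles Linux."""
    if platform.system() != "Linux":
        return False
    try:
        text = Path("/proc/cpuinfo").read_text()
    except OSError:
        return False
    return cpu_vendor(text) == "AuthenticAMD"


def apply_workaround() -> bool:
    """Disable the mkldnn backend on AMD CPUs; return True if it was disabled."""
    if not is_amd_cpu():
        return False
    import torch  # imported lazily so the vendor check works without torch

    torch.backends.mkldnn.enabled = False
    return True
```

Call apply_workaround() once at startup, before the first forward pass. Disabling mkldnn may slow CPU convolutions, which is why this sketch leaves it enabled on other CPUs.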

Example Code

import torch
import torch.nn as nn
# Disable the mkldnn backend before running the convolution.
torch.backends.mkldnn.enabled = False
w2_conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1,1536,1,32)
w2_conv(x)

This should resolve the floating point exception issue on AMD CPU.

Verification

To verify that the fix worked, run the code with the mkldnn backend disabled and check if the floating point exception is resolved.
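
Because the failure kills the interpreter, it is easier to check from a parent process. The following is a minimal sketch, assuming a POSIX system where a child killed by a signal reports a negative return code; the helper names run_isolated and describe are hypothetical. It runs the reproducer twice in child interpreters, with and without the workaround, and prints how each one ended.

```python
import signal
import subprocess
import sys

# Reproducer from the issue; {preamble} is where the workaround line is injected.
REPRO = """
import torch
import torch.nn as nn
{preamble}
conv = nn.Conv2d(1536, 32, 1, bias=False)
x = torch.randn(1, 1536, 1, 32)
conv(x)
"""


def run_isolated(source: str) -> int:
    """Run Python source in a child interpreter and return its exit code."""
    return subprocess.run([sys.executable, "-c", source]).returncode


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a short human-readable status."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


def main() -> None:
    cases = [
        ("mkldnn enabled", ""),
        ("mkldnn disabled", "torch.backends.mkldnn.enabled = False"),
    ]
    for label, preamble in cases:
        code = run_isolated(REPRO.format(preamble=preamble))
        print(f"{label}: {describe(code)}")


if __name__ == "__main__":
    main()
```

On an affected machine the first case would be expected to end with SIGFPE and the second to finish normally; on an unaffected machine both should finish normally.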

Extra Tips

  • If possible, test with the latest stable PyTorch release; the crash was reported against a nightly build.
  • If you are using a custom PyTorch build, try building it with the latest mkldnn (oneDNN) version.
  • torch.backends.cudnn and torch.backends.cuda only affect GPU execution, so they are an option only when a CUDA GPU is available; the build reported here is CPU-only.
