To get into the TOP 50, 100 or 500 lists of HPC (High Performance Computing) systems, you submit results obtained with the HPL (High Performance Linpack) benchmark. The Linpack benchmark implements an algorithm for solving systems of linear algebraic equations (SLAEs) using LU decomposition. The package is publicly available and easy to install and run, and it is a good way to demonstrate CPU performance.
Anyone familiar with GPU architecture might assume that this package is even better suited to testing GPU-based compute devices. However, the only version available for download online is the 2011 CUDA version for the Fermi architecture.
In this guide I give an example of building and running HPL for GPU.
How do you control access to software?
How do you install CUDA?
How do you install OpenMPI?
How do you install OpenBLAS?
How do you install HPL for GPU?
Installing the MODULES package
To manage environment variables, install the MODULES package and prepare a test modulefile.
$ yum install environment-modules
$ mcedit /etc/modulefiles/test/v1.0
#%Module1.0
proc ModulesHelp { } {
global version
puts stderr "Modulefile for test v1.0"
}
set version v1.0
module-whatis "Modulefile for test v1.0"
setenv MAINDIR /nfs/software/test/v1.0
prepend-path PATH $env(MAINDIR)/bin
prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
prepend-path LIBRARY_PATH $env(MAINDIR)/lib64
prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib64
Checking modulefiles
It is easy to make a mistake when writing a modulefile, so I check every path specified in it. To avoid checking each path by hand, I wrote a script. If it prints 0 next to a path, the path exists.
$ cat check-modulefiles
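#!/bin/bash
# Usage: ./check-modulefiles <modulefile>
ModulePath=$1
# The install root is the third whitespace-separated field of the "setenv MAINDIR" line
MainDir=$(awk '$1 == "setenv" && $2 == "MAINDIR" {print $3}' "$ModulePath")
# The path to check is the third field of every prepend-path line
ListOfPaths=$(awk '$1 == "prepend-path" {print $3}' "$ModulePath")
# Expand $env(MAINDIR) to the actual directory
ListOfPaths=$(echo "$ListOfPaths" | sed "s@\$env(MAINDIR)@$MainDir@g")
for u in $ListOfPaths; do
    ls -la "$u" 1> /dev/null 2> /dev/null
    printf "%60s %4d\n" "$u" $?
done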
$ chmod +x check-modulefiles
$ ./check-modulefiles /etc/modulefiles/test/v1.0
/nfs/software/test/v1.0/bin 0
/nfs/software/test/v1.0/include 0
/nfs/software/test/v1.0/include 0
/nfs/software/test/v1.0/lib64 0
/nfs/software/test/v1.0/lib64 0
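To see the check in action without touching a real installation, here is a minimal sketch that builds a throwaway modulefile and directory tree under a temporary directory and runs the same kind of check on it. The function name check_modulefile and the temporary layout are illustrative assumptions; fields are extracted with awk so that the spacing inside the modulefile does not matter.

```shell
#!/bin/bash
# Sketch: verify every prepend-path entry of a modulefile (hypothetical helper).
check_modulefile() {
    local modfile="$1" maindir path
    # Install root: third field of the "setenv MAINDIR" line
    maindir=$(awk '$1 == "setenv" && $2 == "MAINDIR" {print $3}' "$modfile")
    # Third field of each prepend-path line, with $env(MAINDIR) expanded
    awk '$1 == "prepend-path" {print $3}' "$modfile" |
        sed "s@\$env(MAINDIR)@$maindir@g" |
        while read -r path; do
            ls -d "$path" > /dev/null 2>&1
            # $? is the exit status of ls: 0 means the path exists
            printf "%-40s %d\n" "$path" $?
        done
}

# Throwaway tree: bin exists, lib64 is deliberately missing
tmp=$(mktemp -d)
mkdir -p "$tmp/sw/bin"
cat > "$tmp/modulefile" <<EOF
setenv       MAINDIR          $tmp/sw
prepend-path PATH             \$env(MAINDIR)/bin
prepend-path LD_LIBRARY_PATH  \$env(MAINDIR)/lib64
EOF

result=$(check_modulefile "$tmp/modulefile")
echo "$result"
rm -rf "$tmp"
```

The bin line should end in 0 and the lib64 line in a non-zero status, which is how a typo in a modulefile path would show up.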
Module management commands
$ module avail
$ module add cuda/v10.1
$ nvcc --version
Cuda compilation tools, release 10.1, V10.1.168
$ module switch cuda/v10.1 cuda/v9.2
$ nvcc --version
Cuda compilation tools, release 9.2, V9.2.88
$ module list
$ module rm cuda/v9.2
The commands above, by line number:
1. List the modules available for loading.
2. Load a module.
3-4. Check the version.
5. Switch to another version of the module.
6-7. Check the version again.
8. List the loaded modules.
9. Unload the module.
Installing CUDA
Download the CUDA 9.2 runfile installer for CentOS 7.
$ chmod +x cuda_9.2.run
$ ./cuda_9.2.run
Do you accept the previously read EULA? accept
Install the CUDA 9.2 Toolkit? yes
Enter Toolkit Location: /nfs/software/cuda/v9.2
Do you want to install a symbolic link at /usr/local/cuda? no
Install the CUDA 9.2 Samples? no
$ cat /etc/modulefiles/cuda/v9.2
#%Module1.0
proc ModulesHelp { } {
global version
puts stderr "Modulefile for cuda v9.2"
}
set version v9.2
module-whatis "Modulefile for cuda v9.2"
setenv MAINDIR /nfs/software/cuda/v9.2
prepend-path PATH $env(MAINDIR)/bin
prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
prepend-path LIBRARY_PATH $env(MAINDIR)/lib64/stubs
prepend-path LIBRARY_PATH $env(MAINDIR)/lib64
prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib64/stubs
prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib64
$ module add cuda/v9.2
$ nvcc --version
Cuda compilation tools, release 9.2, V9.2.148
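The order of the prepend-path lines matters: each one puts its directory at the front of the variable, so the directory added last is searched first. The shell sketch below mimics that behaviour with a hypothetical prepend_path function (it is not part of Environment Modules, and DEMO_LD_PATH is a scratch variable) to show that lib64 ends up ahead of lib64/stubs.

```shell
#!/bin/bash
# Hypothetical stand-in for the modulefile command "prepend-path VAR DIR".
prepend_path() {
    var=$1
    dir=$2
    # Read the current value of the variable whose name is stored in $var
    eval "current=\${$var:-}"
    if [ -z "$current" ]; then
        new=$dir
    else
        new=$dir:$current
    fi
    # Write the new value back to that variable
    eval "$var=\$new"
}

DEMO_LD_PATH=""
MAINDIR=/nfs/software/cuda/v9.2
# Same order as in the modulefile above
prepend_path DEMO_LD_PATH "$MAINDIR/lib64/stubs"
prepend_path DEMO_LD_PATH "$MAINDIR/lib64"
echo "$DEMO_LD_PATH"
```

With this order the real libraries in lib64 take precedence, and the stubs directory is only consulted for names not found there.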
Installing OpenBLAS
$ wget https://github.com/xianyi/OpenBLAS/archive/v0.3.6.tar.gz
$ tar -xzvf v0.3.6.tar.gz
$ cd OpenBLAS-0.3.6
$ mkdir -p /nfs/software/openblas/v0.3.6
$ make -j4
$ make PREFIX=/nfs/software/openblas/v0.3.6/ install
$ ls -la /nfs/software/openblas/v0.3.6/lib/
$ cat /etc/modulefiles/openblas/v0.3.6
#%Module1.0
proc ModulesHelp { } {
global version
puts stderr "Modulefile for openblas v0.3.6"
}
set version v0.3.6
module-whatis "Modulefile for openblas v0.3.6"
setenv MAINDIR /nfs/software/openblas/v0.3.6
prepend-path PATH $env(MAINDIR)/bin
prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
prepend-path LIBRARY_PATH $env(MAINDIR)/lib
prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib
$ ls -la /nfs/software/openblas/v0.3.6/lib
Installing OpenMPI
$ wget https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.6.tar.gz
$ tar -xzvf openmpi-2.1.6.tar.gz
$ cd openmpi-2.1.6
$ mkdir -p /nfs/software/openmpi/v2.1.6
$ module add cuda/v9.2
$ ./configure --prefix=/nfs/software/openmpi/v2.1.6/ --with-cuda --enable-static
$ make
$ make install
$ cat /etc/modulefiles/openmpi/v2.1.6
#%Module1.0
proc ModulesHelp { } {
global version
puts stderr "Modulefile for openmpi v2.1.6"
}
set version v2.1.6
module-whatis "Modulefile for openmpi v2.1.6"
setenv MAINDIR /nfs/software/openmpi/v2.1.6
prepend-path PATH $env(MAINDIR)/bin
prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
prepend-path LIBRARY_PATH $env(MAINDIR)/lib
prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib
$ module add openmpi/v2.1.6
$ mpirun --version
mpirun (Open MPI) 2.1.6
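Since OpenMPI was configured with --with-cuda, it is worth confirming that the build actually picked it up. Open MPI reports this through ompi_info. The sketch below wraps the lookup in a small hypothetical helper (cuda_support is an illustrative name, not an Open MPI command) that extracts the value of the mpi_built_with_cuda_support parameter from the parsable output.

```shell
#!/bin/bash
# Hypothetical helper: read "ompi_info --parsable --all" output on stdin and
# print the value of mpi_built_with_cuda_support, or "unknown" if it is absent.
cuda_support() {
    awk -F: '/mpi_built_with_cuda_support:value/ {print $NF; found = 1}
             END {if (!found) print "unknown"}'
}

# Only query the real installation when ompi_info is on PATH
if command -v ompi_info > /dev/null 2>&1; then
    ompi_info --parsable --all | cuda_support
fi
```

With the openmpi/v2.1.6 module loaded, a CUDA-aware build should print true.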
Installing HPL for GPU
Set up the environment variables by loading the modules, then download HPL 2.0.
$ module add openmpi/v2.1.6
$ module add cuda/v9.2
$ module add openblas/v0.3.6
$ wget https://developer.download.nvidia.com/assets/cuda/secure/AcceleratedLinpack/hpl-2.0_FERMI_v15.tgz
$ tar -xvf hpl-2.0_FERMI_v15.tgz
$ mv hpl-2.0_FERMI_v15 hpl-2.0
$ cd hpl-2.0
Before building, you need to edit several files. The first is Make.CUDA in the hpl-2.0 directory. Copy the following into Make.CUDA:
$ cat Make.CUDA
SHELL = /bin/sh
CD = cd
CP = cp
LN_S = ln -fs
MKDIR = mkdir -p
RM = /bin/rm -f
TOUCH = touch
ARCH = CUDA
TOPdir = /home/user/hpl-2.0
INCdir = $(TOPdir)/include
BINdir = $(TOPdir)/bin/$(ARCH)
LIBdir = $(TOPdir)/lib/$(ARCH)
HPLlib = $(LIBdir)/libhpl.a
MPdir = /nfs/software/openmpi/v2.1.6
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib -lmpi
LAdir = /nfs/software/openblas/v0.3.6
LAinc = -I$(LAdir)/include
LAlib = -L$(TOPdir)/src/cuda -ldgemm -L/nfs/software/cuda/v9.2/lib64 -lcuda -lcudart -lcublas -L$(LAdir)/lib -lopenblas
F2CDEFS = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib)
HPL_OPTS = -DCUDA
HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
CC = mpicc
CCFLAGS = -fopenmp -lpthread -fomit-frame-pointer -O3 -funroll-loops $(HPL_DEFS)
CCNOOPT = $(HPL_DEFS) -O0 -w
LINKER = $(CC)
LINKFLAGS = $(CCFLAGS)
ARCHIVER = ar
ARFLAGS = r
RANLIB = echo
MAKE = make TOPdir=$(TOPdir)
Set these variables for your system:
TOPdir: path to the hpl-2.0 directory
MPdir: path to OpenMPI
LAdir: path to OpenBLAS
LAlib: contains the path to CUDA lib64
Replace the following lines in the file hpl-2.0/src/cuda/cuda_dgemm.c:
$ mcedit src/cuda/cuda_dgemm.c
…
// handle2 = dlopen ("libmkl_intel_lp64.so", RTLD_LAZY);
handle2 = dlopen ("libopenblas.so", RTLD_LAZY);
…
// dgemm_mkl = (void(*)())dlsym(handle, "dgemm");
dgemm_mkl = (void(*)())dlsym(handle, "dgemm_");
…
// handle = dlopen ("libmkl_intel_lp64.so", RTLD_LAZY);
handle = dlopen ("libopenblas.so", RTLD_LAZY);
…
// mkl_dtrsm = (void(*)())dlsym(handle2, "dtrsm");
mkl_dtrsm = (void(*)())dlsym(handle2, "dtrsm_");
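The same substitutions can be scripted instead of made by hand. This is only a sketch: it assumes the MKL lines appear in cuda_dgemm.c exactly as quoted above (the rest of the file is elided here), and the function name patch_dgemm is illustrative. Unlike the manual edit, it rewrites the lines in place rather than leaving the old ones commented out, and it keeps a .orig copy so the change can be reviewed with diff.

```shell
#!/bin/bash
# Hypothetical helper: point the dlopen/dlsym calls at OpenBLAS instead of MKL.
patch_dgemm() {
    file="$1"
    cp "$file" "$file.orig"
    # 1) library name, 2) dgemm symbol, 3) dtrsm symbol (OpenBLAS exports the
    #    Fortran-style names with a trailing underscore)
    sed -e 's/dlopen *("libmkl_intel_lp64\.so"/dlopen ("libopenblas.so"/' \
        -e 's/dlsym(\(handle2*\), *"dgemm")/dlsym(\1, "dgemm_")/' \
        -e 's/dlsym(\(handle2*\), *"dtrsm")/dlsym(\1, "dtrsm_")/' \
        "$file.orig" > "$file"
}

# Demo on a scratch file containing the three kinds of lines
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
handle2 = dlopen ("libmkl_intel_lp64.so", RTLD_LAZY);
dgemm_mkl = (void(*)())dlsym(handle, "dgemm");
mkl_dtrsm = (void(*)())dlsym(handle2, "dtrsm");
EOF
patch_dgemm "$tmp"
patched=$(cat "$tmp")
echo "$patched"
rm -f "$tmp" "$tmp.orig"
```

On the real tree the call would be patch_dgemm src/cuda/cuda_dgemm.c, followed by a diff against the .orig file before building.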
Build HPL and run it on four GPUs:
$ make arch=CUDA
$ cd bin/CUDA
$ export LD_LIBRARY_PATH=/home/user/hpl-2.0/src/cuda/:$LD_LIBRARY_PATH
$ mpirun -np 4 ./xhpl
================================================================================
HPLinpack 2.0 -- High-Performance Linpack benchmark -- September 10, 2008
Written by A. Petitet and R. Clint Whaley, Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
================================================================================
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 25000
NB : 768
PMAP : Row-major process mapping
P : 2
Q : 2
PFACT : Left
NBMIN : 2
NDIV : 2
RFACT : Left
BCAST : 1ring
DEPTH : 1
SWAP : Spread-roll (long)
L1 : no-transposed form
U : no-transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
================================================================================
T/V N NB P Q Time Gflops
--------------------------------------------------------------------------------
WR10L2L2 25000 768 2 2 16.72 6.232e+02
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0019019 ...... PASSED
================================================================================
Finished 1 tests with the following results:
1 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
End of Tests.
================================================================================
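The Gflops column can be cross-checked by hand. HPL counts 2/3 N^3 + 3/2 N^2 floating-point operations for a problem of order N and divides by the wall time. The sketch below does that arithmetic with awk; hpl_gflops is a hypothetical helper name. For N = 25000 and 16.72 s the result lands within a fraction of a percent of the reported 6.232e+02, a gap that is consistent with the time being rounded to two decimals in the report.

```shell
#!/bin/bash
# Hypothetical helper: Gflops for an HPL run of order N that took T seconds.
hpl_gflops() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        # HPL operation count: 2/3 N^3 + 3/2 N^2
        flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n
        printf "%.1f\n", flops / t / 1e9
    }'
}

# N and Time taken from the result line above
hpl_gflops 25000 16.72
```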
To edit the test parameters, use the file hpl-2.0/bin/CUDA/HPL.dat.
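One thing worth checking after editing HPL.dat is that the process grid matches the mpirun command: HPL needs at least P x Q MPI processes, and normally the two are equal (2 x 2 = 4 in the run above). A minimal sketch, assuming the standard HPL.dat layout in which the grid rows end with the labels Ps and Qs, and looking only at the first grid; grid_size is a hypothetical helper and the scratch file stands in for the real HPL.dat.

```shell
#!/bin/bash
# Hypothetical helper: print P * Q for the first process grid in an HPL.dat file.
grid_size() {
    awk '$NF == "Ps" {p = $1} $NF == "Qs" {q = $1} END {print p * q}' "$1"
}

# Scratch file with just the grid-related rows of HPL.dat
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1            # of process grids (P x Q)
2            Ps
2            Qs
EOF

np=4    # value passed to mpirun -np
size=$(grid_size "$tmp")
if [ "$size" -eq "$np" ]; then
    echo "grid matches: $size processes"
else
    echo "grid mismatch: P x Q = $size, mpirun -np $np"
fi
rm -f "$tmp"
```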