Installing the GPU version of HPL with OpenBLAS

Rankings of HPC (High Performance Computing) systems such as the TOP50, TOP100, and TOP500 lists are based on results obtained with the HPL (High Performance Linpack) benchmark.

The Linpack (LINear algebra PACKage) benchmark solves a dense system of linear algebraic equations using LU decomposition. The package is publicly available and easy to install and run, which makes it a good way to demonstrate CPU performance.
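
As a reminder of what the benchmark computes (standard linear algebra, not something specific to this guide): LU decomposition with row pivoting factors the coefficient matrix once, and the solution then follows from two triangular solves. Here P is a permutation matrix, L is lower triangular, and U is upper triangular.

```latex
PA = LU, \qquad Ly = Pb, \qquad Ux = y
```

For a matrix of order N the factorization dominates the cost, at roughly (2/3) N^3 floating-point operations.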

Anyone familiar with graphics accelerator architecture might expect this package to be even better suited to testing GPU-based compute devices. However, the version available for download online is the 2011 CUDA release for the Fermi architecture.

In this guide, I walk through an example of building and running HPL for the GPU.

How do you manage access to software?
How do you install CUDA?
How do you install OpenMPI?
How do you install OpenBLAS?
How do you install HPL for GPU?

Installing the MODULES package

To manage environment variables, install the MODULES package and prepare a test module file.

$ yum install environment-modules
$ mcedit /etc/modulefiles/test/v1.0
  #%Module1.0
  proc ModulesHelp { } {
    global version
      puts stderr "Modulefile for test v1.0"
      }
      set version v1.0
      module-whatis "Modulefile for test v1.0"
      # Our environment
      setenv MAINDIR /nfs/software/test/v1.0
      prepend-path PATH $env(MAINDIR)/bin
      prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path LIBRARY_PATH $env(MAINDIR)/lib64
      prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib64
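
What a prepend-path line does to an environment variable can be approximated in plain shell. The sketch below is only an illustration: prepend_path and DEMO_PATH are hypothetical names, and the real Modules command also handles unloading, which this does not.

```shell
#!/bin/sh
# prepend_path VAR DIR: put DIR at the front of the colon-separated list in VAR
# (hypothetical helper, not part of the Modules package)
prepend_path() {
  var=$1
  dir=$2
  eval "current=\${$var:-}"
  if [ -z "$current" ]; then
    # Empty or unset variable: avoid leaving a trailing colon
    eval "$var=\$dir"
  else
    eval "$var=\$dir:\$current"
  fi
  export "$var"
}

MAINDIR=/nfs/software/test/v1.0
DEMO_PATH=/usr/bin            # stand-in for PATH so the demo leaves PATH alone
prepend_path DEMO_PATH "$MAINDIR/bin"
echo "$DEMO_PATH"             # /nfs/software/test/v1.0/bin:/usr/bin
```

Because the new directory goes first, binaries and libraries from the module take precedence over system ones with the same name.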

Checking the module files

It is easy to make a mistake when preparing a module file, so I check every path it specifies. To avoid checking each path by hand, I wrote a script. A status of 0 in the last column means the path exists.

$ cat check-modulefiles
  #!/bin/sh
  # Print every path set in a modulefile with the exit status of ls (0 = path exists)
  ModulePath=$1
  # Take the third whitespace-separated field so indentation does not matter
  MainDir=$(awk '$1 == "setenv" && $2 == "MAINDIR" {print $3}' "$ModulePath")
  ListOfPaths=$(awk '$1 ~ /-path$/ {print $3}' "$ModulePath")
  # Substitute the MAINDIR value for $env(MAINDIR)
  ListOfPaths=$(echo $ListOfPaths | sed "s@\$env(MAINDIR)@$MainDir@g")
  for u in $ListOfPaths; do
    ls -la "$u" > /dev/null 2>&1
    printf "%60s %4d\n" "$u" $?
  done
$ chmod +x check-modulefiles
$ ./check-modulefiles /etc/modulefiles/test/v1.0
  /nfs/software/test/v1.0/bin            0
  /nfs/software/test/v1.0/include        0
  /nfs/software/test/v1.0/include        0
  /nfs/software/test/v1.0/lib64          0
  /nfs/software/test/v1.0/lib64          0

Module management commands

$ module avail
$ module add cuda/v10.1
$ nvcc --version
  Cuda compilation tools, release 10.1, V10.1.168
$ module switch cuda/v10.1 cuda/v9.2
$ nvcc --version
  Cuda compilation tools, release 9.2, V9.2.88
$ module list
$ module rm cuda/v9.2

1. List the modules available for loading.
2. Load a module.
3-4. Check the version.
5. Switch to another module.
6-7. Check the version again.
8. List the loaded modules.
9. Unload the module.
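
When switching between CUDA modules in a script, it can help to extract just the release number from the nvcc banner and compare it with the module you meant to load. A minimal sketch, assuming the banner keeps the "release X.Y" format shown above; cuda_release is a hypothetical helper name.

```shell
#!/bin/sh
# cuda_release: read "nvcc --version" output on stdin and print the X.Y
# that follows the word "release" (hypothetical helper)
cuda_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Banner line copied from the session above, so the sketch runs without nvcc
sample='Cuda compilation tools, release 10.1, V10.1.168'
echo "$sample" | cuda_release     # 10.1
```

On a machine with a CUDA module loaded, the same function would be fed with `nvcc --version | cuda_release`.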

Installing CUDA

Download the CUDA 9.2 installer for CentOS 7.

$ chmod +x cuda_9.2.run
$ ./cuda_9.2.run
  Do you accept the previously read EULA? accept
  Install the CUDA 9.2 Toolkit? yes
  Enter Toolkit Location: /nfs/software/cuda/v9.2
  Do you want to install a symbolic link at /usr/local/cuda? no
  Install the CUDA 9.2 Samples? no
$ cat /etc/modulefiles/cuda/v9.2
  #%Module1.0
  proc ModulesHelp { } {
    global version
      puts stderr "Modulefile for cuda v9.2"
      }
      set version v9.2
      module-whatis "Modulefile for cuda v9.2"
      # Our environment
      setenv MAINDIR /nfs/software/cuda/v9.2
      prepend-path PATH $env(MAINDIR)/bin
      prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path LIBRARY_PATH $env(MAINDIR)/lib64/stubs
      prepend-path LIBRARY_PATH $env(MAINDIR)/lib64
      prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib64/stubs
      prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib64
$ module add cuda/v9.2
$ nvcc --version
  Cuda compilation tools, release 9.2, V9.2.148

Installing OpenBLAS

$ wget https://github.com/xianyi/OpenBLAS/archive/v0.3.6.tar.gz
$ tar -xzvf v0.3.6.tar.gz
$ cd OpenBLAS-0.3.6
$ mkdir -p /nfs/software/openblas/v0.3.6
$ make -j4
$ make PREFIX=/nfs/software/openblas/v0.3.6/ install
$ ls -la /nfs/software/openblas/v0.3.6/lib/
$ cat /etc/modulefiles/openblas/v0.3.6
  #%Module1.0
  proc ModulesHelp { } {
    global version
      puts stderr "Modulefile for openblas v0.3.6"
      }
      set version v0.3.6
      module-whatis "Modulefile for openblas v0.3.6"
      # Our environment
      setenv MAINDIR /nfs/software/openblas/v0.3.6
      prepend-path PATH $env(MAINDIR)/bin
      prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path LIBRARY_PATH $env(MAINDIR)/lib
      prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib
$ ls -la /nfs/software/openblas/v0.3.6/lib

Installing OpenMPI

$ wget https://download.open-mpi.org/release/open-mpi/v2.1/openmpi-2.1.6.tar.gz
$ tar -xzvf openmpi-2.1.6.tar.gz
$ cd openmpi-2.1.6
$ mkdir -p /nfs/software/openmpi/v2.1.6
$ module add cuda/v9.2
$ ./configure --prefix=/nfs/software/openmpi/v2.1.6/ --with-cuda --enable-static
$ make
$ make install
$ cat /etc/modulefiles/openmpi/v2.1.6
  #%Module1.0
  proc ModulesHelp { } {
    global version
      puts stderr "Modulefile for openmpi v2.1.6"
      }
      set version v2.1.6
      module-whatis "Modulefile for openmpi v2.1.6"
      # Our environment
      setenv MAINDIR /nfs/software/openmpi/v2.1.6
      prepend-path PATH $env(MAINDIR)/bin
      prepend-path C_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path CPLUS_INCLUDE_PATH $env(MAINDIR)/include
      prepend-path LIBRARY_PATH $env(MAINDIR)/lib
      prepend-path LD_LIBRARY_PATH $env(MAINDIR)/lib
$ module add openmpi/v2.1.6
$ mpirun --version
  mpirun (Open MPI) 2.1.6
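
Since OpenMPI was configured with --with-cuda, it is worth confirming that CUDA support actually made it into the build. Open MPI reports this through ompi_info as the mpi_built_with_cuda_support parameter. The sketch below only parses that line; cuda_support_value is a hypothetical helper, and the sample line stands in for real ompi_info output so the snippet runs anywhere.

```shell
#!/bin/sh
# cuda_support_value: read "ompi_info --parsable --all" output on stdin and
# print the value of mpi_built_with_cuda_support; prints nothing if absent
cuda_support_value() {
  awk -F: '/mpi_built_with_cuda_support:value/ { print $NF }'
}

# Sample line in ompi_info's colon-separated parsable format
sample='mca:mpi:base:param:mpi_built_with_cuda_support:value:true'
echo "$sample" | cuda_support_value     # true
```

With the openmpi module loaded, the real check would be `ompi_info --parsable --all | cuda_support_value`.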

Installing HPL for GPU

Set up the environment variables by loading the modules, then download HPL 2.0.

$ module add openmpi/v2.1.6
$ module add cuda/v9.2
$ module add openblas/v0.3.6
$ wget https://developer.download.nvidia.com/assets/cuda/secure/AcceleratedLinpack/hpl-2.0_FERMI_v15.tgz
$ tar -xvf hpl-2.0_FERMI_v15.tgz
$ mv hpl-2.0_FERMI_v15 hpl-2.0
$ cd hpl-2.0

Before building, you need to edit several files. The first is Make.CUDA in the hpl-2.0 directory. Copy the following into Make.CUDA:

$ cat Make.CUDA
  SHELL        = /bin/sh
  CD           = cd
  CP           = cp
  LN_S         = ln -fs
  MKDIR        = mkdir -p
  RM           = /bin/rm -f
  TOUCH        = touch
  ARCH         = CUDA
  
  TOPdir       = /home/user/hpl-2.0
  INCdir       = $(TOPdir)/include
  BINdir       = $(TOPdir)/bin/$(ARCH)
  LIBdir       = $(TOPdir)/lib/$(ARCH)
  HPLlib       = $(LIBdir)/libhpl.a
  
  MPdir        = /nfs/software/openmpi/v2.1.6
  MPinc        = -I$(MPdir)/include
  MPlib        = -L$(MPdir)/lib -lmpi
  
  LAdir        = /nfs/software/openblas/v0.3.6
  LAinc        = -I$(LAdir)/include
  LAlib        = -L$(TOPdir)/src/cuda -ldgemm -L/nfs/software/cuda/v9.2/lib64 -lcuda -lcudart -lcublas -L$(LAdir)/lib -lopenblas
  F2CDEFS      = -DAdd__ -DF77_INTEGER=int -DStringSunStyle
  HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc)
  HPL_LIBS     = $(HPLlib) $(LAlib) $(MPlib)
  HPL_OPTS     =  -DCUDA
  HPL_DEFS     = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES)
  CC           = mpicc
  CCFLAGS      = -fopenmp -lpthread -fomit-frame-pointer -O3 -funroll-loops $(HPL_DEFS)
  CCNOOPT      = $(HPL_DEFS) -O0 -w
  LINKER       = $(CC)
  LINKFLAGS    = $(CCFLAGS)
  ARCHIVER     = ar
  ARFLAGS      = r
  RANLIB       = echo
  MAKE         = make TOPdir=$(TOPdir)

Adjust the following variables to match your system:
TOPdir: path to the hpl-2.0 directory
MPdir: path to OpenMPI
LAdir: path to OpenBLAS
LAlib: path to the CUDA lib64 directory
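
Rather than editing these values by hand, a small helper can rewrite a single assignment in the file. This is only a sketch: set_make_var is a hypothetical name, /opt/hpl-2.0 is a placeholder path, and the demo works on a throwaway two-line file rather than the real Make.CUDA.

```shell
#!/bin/sh
# set_make_var FILE NAME VALUE: rewrite the "NAME = ..." line of a make
# include file, leaving every other line untouched (hypothetical helper)
set_make_var() {
  file=$1
  name=$2
  value=$3
  tmp=$(mktemp) || return 1
  awk -v n="$name" -v v="$value" '
    $1 == n && $2 == "=" { printf "%-12s = %s\n", n, v; next }
    { print }
  ' "$file" > "$tmp" && mv "$tmp" "$file"
}

# Demo on a temporary file that mimics two lines of Make.CUDA
demo=$(mktemp) || exit 1
printf 'ARCH         = CUDA\nTOPdir       = /home/user/hpl-2.0\n' > "$demo"
set_make_var "$demo" TOPdir /opt/hpl-2.0
cat "$demo"
```

Matching on the first field means that INCdir, BINdir, and the other lines that merely mention $(TOPdir) are not modified.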

Replace the following lines in hpl-2.0/src/cuda/cuda_dgemm.c:

$ mcedit src/cuda/cuda_dgemm.c
  …
  // handle2 = dlopen ("libmkl_intel_lp64.so", RTLD_LAZY);
  handle2 = dlopen ("libopenblas.so", RTLD_LAZY);
  …
  // dgemm_mkl = (void(*)())dlsym(handle, "dgemm");
  dgemm_mkl = (void(*)())dlsym(handle, "dgemm_");
  …
  // handle = dlopen ("libmkl_intel_lp64.so", RTLD_LAZY);
  handle = dlopen ("libopenblas.so", RTLD_LAZY);
  …
  // mkl_dtrsm = (void(*)())dlsym(handle2, "dtrsm");
  mkl_dtrsm = (void(*)())dlsym(handle2, "dtrsm_");
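
The trailing underscore matters because dlsym looks up the exact symbol name, and OpenBLAS exports its Fortran-style BLAS routines as dgemm_ and dtrsm_. One way to confirm the names before building is to search the library's symbol table. A minimal sketch: has_symbol is a hypothetical helper, and the two sample lines (including their addresses) are made up so the snippet runs without the library present.

```shell
#!/bin/sh
# has_symbol NAME: succeed if NAME is the symbol name (last field) on some
# line of "nm" output read from stdin (hypothetical helper)
has_symbol() {
  awk -v s="$1" '$NF == s { found = 1 } END { exit !found }'
}

# Two lines in nm's "address type name" format; the addresses are invented
sample='00000000000a1b20 T dgemm_
00000000000c3d40 T dtrsm_'

for sym in dgemm_ dtrsm_ dgemm; do
  if echo "$sample" | has_symbol "$sym"; then
    echo "$sym: found"
  else
    echo "$sym: missing"
  fi
done
```

Against the real library the input would come from `nm -D --defined-only /nfs/software/openblas/v0.3.6/lib/libopenblas.so`.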

Build and run HPL on 4 GPUs:

$ make arch=CUDA
$ cd bin/CUDA
$ export LD_LIBRARY_PATH=/home/user/hpl-2.0/src/cuda/:$LD_LIBRARY_PATH
$ mpirun -np 4 ./xhpl
  ================================================================================
  HPLinpack 2.0  --  High-Performance Linpack benchmark  --   September 10, 2008
  Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
  Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
  Modified by Julien Langou, University of Colorado Denver
  ================================================================================

  An explanation of the input/output parameters follows:
  T/V    : Wall time / encoded variant.
  N      : The order of the coefficient matrix A.
  NB     : The partitioning blocking factor.
  P      : The number of process rows.
  Q      : The number of process columns.
  Time   : Time in seconds to solve the linear system.
  Gflops : Rate of execution for solving the linear system.

  The following parameter values will be used:

  N      :   25000
  NB     :     768
  PMAP   : Row-major process mapping
  P      :       2
  Q      :       2
  PFACT  :    Left
  NBMIN  :       2
  NDIV   :       2
  RFACT  :    Left
  BCAST  :   1ring
  DEPTH  :       1
  SWAP   : Spread-roll (long)
  L1     : no-transposed form
  U      : no-transposed form
  EQUIL  : yes
  ALIGN  : 8 double precision words

  --------------------------------------------------------------------------------

  - The matrix A is randomly generated for each test.
  - The following scaled residual check will be computed:
        ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
  - The relative machine precision (eps) is taken to be               1.110223e-16
  - Computational tests pass if scaled residuals are less than                16.0

  ================================================================================
  T/V                N    NB     P     Q               Time                 Gflops
  --------------------------------------------------------------------------------
  WR10L2L2       25000   768     2     2              16.72              6.232e+02
  --------------------------------------------------------------------------------
  ||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=        0.0019019 ...... PASSED
  ================================================================================

  Finished      1 tests with the following results:
                1 tests completed and passed residual checks,
                0 tests completed and failed residual checks,
                0 tests skipped because of illegal input values.
  --------------------------------------------------------------------------------

  End of Tests.
  ================================================================================
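
The Gflops column can be cross-checked from N and Time. HPL counts 2/3*N^3 + 3/2*N^2 floating-point operations for a solve of order N. The sketch below applies that formula to the run above; hpl_gflops is a hypothetical helper name.

```shell
#!/bin/sh
# hpl_gflops N SECONDS: Gflops for an order-N solve that took SECONDS,
# using the operation count 2/3*N^3 + 3/2*N^2 (hypothetical helper)
hpl_gflops() {
  LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN {
    flops = (2.0 / 3.0) * n * n * n + 1.5 * n * n
    printf "%.1f\n", flops / t / 1e9
  }'
}

# N and Time taken from the result line above
hpl_gflops 25000 16.72     # 623.1
```

The small difference from the reported 6.232e+02 comes from the time being printed with only two decimal places.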

To change the test parameters, edit the file hpl-2.0/bin/CUDA/HPL.dat.
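
When choosing N in HPL.dat, a common rule of thumb (an assumption here, not something this guide measured) is to size the matrix so that its N*N double-precision values fill about 80% of the available memory, and to round N down to a multiple of NB. The sketch below does that arithmetic; hpl_suggest_n is a hypothetical helper and the 16 GiB figure is a placeholder.

```shell
#!/bin/sh
# hpl_suggest_n MEM_GIB NB [FRACTION]: largest multiple of NB such that an
# N x N matrix of 8-byte doubles fits in FRACTION (default 0.8) of MEM_GIB
# (hypothetical helper; the 0.8 default is a rule of thumb)
hpl_suggest_n() {
  LC_ALL=C awk -v mem="$1" -v nb="$2" -v frac="${3:-0.8}" 'BEGIN {
    doubles = mem * 1024 * 1024 * 1024 * frac / 8
    printf "%d\n", int(sqrt(doubles) / nb) * nb
  }'
}

# Placeholder: 16 GiB of memory, NB = 768 as in the run above
hpl_suggest_n 16 768     # 40704
```

The result is only a starting point; the best N and NB for a given machine are normally found by trying several values.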
