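# Environment Setup

# Installing OpenBLAS

  1. Download OpenBLAS-0.3.10.tar.gz from the official site.
  2. Extract the archive and run make in the extracted directory. When the build finishes, make prints a summary like this: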

OS               ... Linux
Architecture     ... x86_64
BINARY           ... 64bit
C compiler       ... GCC  (cmd & version : cc (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4)
Fortran compiler ... GFORTRAN  (cmd & version : GNU Fortran (Ubuntu 4.8.4-2ubuntu1~14.04.4) 4.8.4)
Library Name     ... libopenblas_haswellp-r0.3.10.a (Multi-threading; Max num-threads is 32)

To install the library, you can run "make PREFIX=/path/to/your/installation install".
  3. Run make PREFIX=/home/riolu/HPL/openblas install

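# Installing Open MPI

  1. Download openmpi-4.0.5.tar.gz from the official site.
  2. Extract the archive and run ./configure --prefix=/home/riolu/HPL/openmpi in the extracted directory. The configure summary ends with: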
Resource Managers
Cray Alps: no
Grid Engine: no
LSF: no
Moab: no
Slurm: yes
ssh/rsh: yes
Torque: no

OMPIO File Systems
Generic Unix FS: yes
Lustre: no
PVFS2/OrangeFS: no
  3. Run make, then make install.
  4. Edit ~/.bashrc and append the following lines:
export PATH=/home/riolu/HPL/openmpi/bin:$PATH
export INCLUDE=/home/riolu/HPL/openmpi/include:$INCLUDE
export LD_LIBRARY_PATH=/home/riolu/HPL/openmpi/lib:$LD_LIBRARY_PATH

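After saving, run source ~/.bashrc.
(Saving the file with LibreOffice changes its encoding, so it is better to edit it on Windows and copy it over, or to edit it in Jupyter.)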
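
To confirm that the new entries took effect after sourcing ~/.bashrc, a small check along these lines can help. This is a minimal sketch: the helper name `check_openmpi_env` is hypothetical, and the prefix is the one used in these notes.

```python
import os
import shutil


def check_openmpi_env(prefix, env=None):
    """Report whether prefix/bin and prefix/lib appear in PATH and LD_LIBRARY_PATH.

    `check_openmpi_env` is a hypothetical helper written for these notes.
    """
    env = os.environ if env is None else env
    base = prefix.rstrip("/")

    def contains(var, subdir):
        wanted = base + "/" + subdir
        # Split on ':' (POSIX path separator) and ignore empty entries.
        entries = [e.rstrip("/") for e in env.get(var, "").split(":") if e]
        return wanted in entries

    return {
        "PATH": contains("PATH", "bin"),
        "LD_LIBRARY_PATH": contains("LD_LIBRARY_PATH", "lib"),
    }


if __name__ == "__main__":
    print(check_openmpi_env("/home/riolu/HPL/openmpi"))
    # Shows which mpicc the shell would actually pick up (None if not found).
    print("mpicc resolves to:", shutil.which("mpicc"))
```

If either value is False, or mpicc resolves to a different installation, the exports were probably not picked up by the current shell.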

# Installing and Compiling HPL

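  1. Download hpl.tar.gz from the official site.
  2. Go into the setup directory of the extracted tree, find Make.Linux_PII_CBLAS, copy it to the parent directory, and name the copy Make.Linux.
  3. Edit Make.Linux as shown below. TOPdir must point to the directory created by extracting HPL, and MPdir to the MPI installation directory (Open MPI here, not MPICH). Keep such notes out of the assignment lines, because make does not treat /* ... */ as a comment.
ARCH         = Linux
TOPdir       = $(HOME)/HPL/hpl-2.3
MPdir        = $(HOME)/HPL/openmpi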
MPinc        = -I$(MPdir)/include
MPlib        = -L$(MPdir)/lib
LAdir        = $(HOME)/HPL/openblas
LAinc        = -I$(LAdir)/include
LAlib        = $(LAdir)/lib/libopenblas_haswellp-r0.3.10.a
CC           = $(MPdir)/bin/mpicc
CCFLAGS      = $(HPL_DEFS) -fomit-frame-pointer -fopenmp -O3 -funroll-loops
LINKER       = $(MPdir)/bin/mpif77

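  4. Run make arch=Linux

  5. The bin directory of the HPL tree now contains a Linux directory holding HPL.dat and xhpl. The build is complete.
  6. Run mpirun -np 4 ./xhpl in that directory. The run produces correct results.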
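
The process count passed to mpirun has to cover the P × Q process grid configured in HPL.dat. The sketch below lists the possible grids for a given process count, most square first. It follows the commonly cited guidance of a nearly square grid with P ≤ Q, which is an assumption here, not something measured in these notes. The function name `process_grids` is hypothetical.

```python
import math


def process_grids(num_procs):
    """Return all (P, Q) with P * Q == num_procs and P <= Q, most square first."""
    grids = [
        (p, num_procs // p)
        for p in range(1, math.isqrt(num_procs) + 1)
        if num_procs % p == 0
    ]
    # A smaller Q - P difference means a more nearly square grid.
    return sorted(grids, key=lambda pq: pq[1] - pq[0])


for n in (4, 8, 16):
    print(n, process_grids(n))
```

For 8 processes the first candidate is (2, 4), which matches the P and Q values that appear in the test output later in these notes.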


# Testing and Tuning

  1. Check the available memory:
$ free -h
                total       used       free     shared    buffers     cached
Mem:          251G       138G       113G        14M       4.9G       104G
-/+ buffers/cache:        29G       222G
Swap:         7.6G       1.2G       6.4G
  2. Run a test:
~/HPL/hpl-2.3/bin/Linux$ mpirun -np 8 ./xhpl
HPLinpack 2.3  --  High-Performance Linpack benchmark  --   December 2, 2018
Written by A. Petitet and R. Clint Whaley,  Innovative Computing Laboratory, UTK
Modified by Piotr Luszczek, Innovative Computing Laboratory, UTK
Modified by Julien Langou, University of Colorado Denver
An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N      :   70000
NB     :     256      192
PMAP   : Row-major process mapping
P      :       2
Q      :       4
PFACT  :    Left
NBMIN  :       2
NDIV   :       2
RFACT  :    Left
BCAST  :   1ring
DEPTH  :       0
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words
+ The matrix A is randomly generated for each test.
+ The following scaled residual check will be computed:
        ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
+ The relative machine precision (eps) is taken to be               1.110223e-16
+ Computational tests pass if scaled residuals are less than                16.0
T/V                N    NB     P     Q               Time                 Gflops
WR00L2L2       70000   256     2     4             894.95             2.5551e+02
HPL_pdgesv() start time Fri Dec  4 22:56:34 2020
HPL_pdgesv() end time   Fri Dec  4 23:11:29 2020
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.51800829e-03 ...... PASSED
T/V                N    NB     P     Q               Time                 Gflops
WR00L2L2       70000   192     2     4             927.80             2.4647e+02
HPL_pdgesv() start time Fri Dec  4 23:12:17 2020
HPL_pdgesv() end time   Fri Dec  4 23:27:45 2020
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   1.48198443e-03 ...... PASSED
Finished      2 tests with the following results:
                2 tests completed and passed residual checks,
                0 tests completed and failed residual checks,
                0 tests skipped because of illegal input values.
End of Tests.
  3. Consolidated results:

     T/V                N    NB     P     Q               Time                 Gflops
     WR00L2L2       70000   192     2     4             927.80             2.4647e+02
     WR00L2L2       70000   256     2     4             910.19             2.5124e+02
     WR00L2L2       70000   256     2     4             894.95             2.5551e+02
     WR00L2L2       70000   256     2     4             902.53             2.5337e+02
     WR00L2L2       70000   336     2     4             877.33             2.6065e+02
     WR00L2L2       70000   336     2     4             894.92             2.5553e+02
     WR00L2L2       70000   384     2     4             862.82             2.6503e+02
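
Because some NB values were run more than once, averaging the Gflops per NB makes the comparison easier to read. The following is a small sketch using the (NB, Gflops) pairs copied from the table above. The function name `mean_gflops_by_nb` is hypothetical.

```python
from collections import defaultdict

# (NB, Gflops) pairs copied from the consolidated results table.
runs = [
    (192, 246.47),
    (256, 251.24),
    (256, 255.51),
    (256, 253.37),
    (336, 260.65),
    (336, 255.53),
    (384, 265.03),
]


def mean_gflops_by_nb(runs):
    """Group runs by NB and return the mean Gflops for each NB."""
    grouped = defaultdict(list)
    for nb, gflops in runs:
        grouped[nb].append(gflops)
    return {nb: sum(v) / len(v) for nb, v in sorted(grouped.items())}


means = mean_gflops_by_nb(runs)
for nb, mean in means.items():
    print(f"NB={nb}: {mean:.2f} Gflops over {sum(1 for r in runs if r[0] == nb)} run(s)")
```

NB=192 and NB=384 each have only one run, so their averages carry less weight than those for NB=256 and NB=336.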

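NB=384 appears to be the best choice among the values tested, with a measured peak of 265.03 Gflops, i.e. about 2.65 × 10^11 floating-point operations per second.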
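
The problem size N is the other main tuning knob. HPL stores an N × N matrix of 8-byte doubles, so its footprint can be estimated from N and compared with the free -h output above. The sketch below makes two assumptions that do not come from these notes: the "G" figures are treated as GiB, and the matrix is sized to about 80% of the chosen memory figure, a widely quoted rule of thumb. Both function names are hypothetical.

```python
import math


def hpl_matrix_gib(n):
    """Memory needed for an N x N double-precision matrix, in GiB."""
    return n * n * 8 / 2**30


def suggest_n(mem_gib, nb, fraction=0.8):
    """Largest multiple of nb whose matrix fits in fraction * mem_gib.

    The default fraction of 0.8 is an assumed rule of thumb, not a measured value.
    """
    max_elems = int(fraction * mem_gib * 2**30 / 8)
    n = math.isqrt(max_elems)
    return n - n % nb


print(f"N=70000 needs about {hpl_matrix_gib(70000):.1f} GiB")
print("Suggested N for 113 GiB free and NB=384:", suggest_n(113, 384))
```

With N=70000 the matrix takes well under the 113G reported as free, so a larger N may be worth testing, at the cost of longer runs.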

# Process Binding

~/HPL/openmpi/bin/mpirun -np 16 --bind-to core --map-by core --report-bindings ./xhpl

mpirun -np 16 --bind-to core --map-by core --report-bindings ./xhpl
[amax:18074] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[amax:18074] MCW rank 1 bound to socket 0[core 1[hwt 0-1]]: [../BB/../../../../../..][../../../../../../../..]
[amax:18074] MCW rank 2 bound to socket 0[core 2[hwt 0-1]]: [../../BB/../../../../..][../../../../../../../..]
[amax:18074] MCW rank 3 bound to socket 0[core 3[hwt 0-1]]: [../../../BB/../../../..][../../../../../../../..]
[amax:18074] MCW rank 4 bound to socket 0[core 4[hwt 0-1]]: [../../../../BB/../../..][../../../../../../../..]
[amax:18074] MCW rank 5 bound to socket 0[core 5[hwt 0-1]]: [../../../../../BB/../..][../../../../../../../..]
[amax:18074] MCW rank 6 bound to socket 0[core 6[hwt 0-1]]: [../../../../../../BB/..][../../../../../../../..]
[amax:18074] MCW rank 7 bound to socket 0[core 7[hwt 0-1]]: [../../../../../../../BB][../../../../../../../..]
[amax:18074] MCW rank 8 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
[amax:18074] MCW rank 9 bound to socket 1[core 9[hwt 0-1]]: [../../../../../../../..][../BB/../../../../../..]
[amax:18074] MCW rank 10 bound to socket 1[core 10[hwt 0-1]]: [../../../../../../../..][../../BB/../../../../..]
[amax:18074] MCW rank 11 bound to socket 1[core 11[hwt 0-1]]: [../../../../../../../..][../../../BB/../../../..]
[amax:18074] MCW rank 12 bound to socket 1[core 12[hwt 0-1]]: [../../../../../../../..][../../../../BB/../../..]
[amax:18074] MCW rank 13 bound to socket 1[core 13[hwt 0-1]]: [../../../../../../../..][../../../../../BB/../..]
[amax:18074] MCW rank 14 bound to socket 1[core 14[hwt 0-1]]: [../../../../../../../..][../../../../../../BB/..]
[amax:18074] MCW rank 15 bound to socket 1[core 15[hwt 0-1]]: [../../../../../../../..][../../../../../../../BB]
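
To check a report like the one above without reading it by eye, the lines can be parsed into a rank-to-core map. This is a sketch that assumes the report has been captured as text and that every line has the single-core form shown here. The regular expression and function names are written for these notes and are not part of Open MPI.

```python
import re

# Matches lines such as:
# [amax:18074] MCW rank 8 bound to socket 1[core 8[hwt 0-1]]: [...]
BINDING_RE = re.compile(r"MCW rank (\d+) bound to socket (\d+)\[core (\d+)\[")


def parse_bindings(text):
    """Map each MPI rank to the (socket, core) pair it is bound to."""
    bindings = {}
    for line in text.splitlines():
        match = BINDING_RE.search(line)
        if match:
            rank, socket, core = map(int, match.groups())
            bindings[rank] = (socket, core)
    return bindings


def cores_are_unique(bindings):
    """Return True when no two ranks share the same (socket, core)."""
    cores = list(bindings.values())
    return len(cores) == len(set(cores))


sample = """\
[amax:18074] MCW rank 0 bound to socket 0[core 0[hwt 0-1]]: [BB/../../../../../../..][../../../../../../../..]
[amax:18074] MCW rank 8 bound to socket 1[core 8[hwt 0-1]]: [../../../../../../../..][BB/../../../../../../..]
"""
bindings = parse_bindings(sample)
print(bindings)
print("unique cores:", cores_are_unique(bindings))
```

In the full report above, the 16 ranks land on cores 0 to 7 of socket 0 and cores 8 to 15 of socket 1, one rank per core.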
