- HPCG Benchmark: OPENBLAS+OPENMPI -

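# Environment Setup

The previous posts in this series already set up a C++ compiler and the OpenMPI parallel environment.

#TODO On Intel processors, it is better to use Intel's own compiler, MPI, and HPCG binaries……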

# Installing and Compiling HPCG

  1. Download hpcg-master from the official download site.
  2. Go into the setup folder, edit Make.Linux_MPI, and save the result as Make.Linux:
MPdir        = $(HOME)/HPL/openmpi
MPinc        = -I$(MPdir)/include
MPlib        = -L$(MPdir)/lib

CXX          = $(MPdir)/bin/mpicxx
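
Before running configure, it may be worth confirming that the paths named in Make.Linux actually exist. The following is a minimal sketch, assuming the directory layout implied by MPdir above (bin/mpicxx, include, lib). check_mpdir is a hypothetical helper name, not part of HPCG or OpenMPI.

```shell
# Hypothetical helper: verify that an MPI install directory has the pieces
# that Make.Linux refers to (MPdir/bin/mpicxx, MPdir/include, MPdir/lib).
# Prints one line per missing item and returns non-zero if anything is missing.
check_mpdir() {
    mpdir="$1"
    status=0
    if [ ! -x "$mpdir/bin/mpicxx" ]; then
        echo "missing compiler wrapper: $mpdir/bin/mpicxx"
        status=1
    fi
    if [ ! -d "$mpdir/include" ]; then
        echo "missing include dir: $mpdir/include"
        status=1
    fi
    if [ ! -d "$mpdir/lib" ]; then
        echo "missing lib dir: $mpdir/lib"
        status=1
    fi
    return "$status"
}

# Usage (path taken from the MPdir setting above):
#   check_mpdir "$HOME/HPL/openmpi" && echo "MPdir looks complete"
```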
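  1. Set up the build directory: in the install location, run mkdir hpcg , cd hpcg , then ~/HPL/hpcg-master/configure Linux
  2. Build and run the test:
    make
    cd bin
    mpirun -np 16 ./xhpcg
    hpcg.dat is simple: the third line is the problem size and the fourth line is the run time in seconds.
    HPCG runs quickly (a few minutes for the whole machine). Tune the n value across runs to find a good result.
    n must not be too small, otherwise the whole test runs in cache; the test needs to keep memory usage > 25%.
    The official rules require a run time of at least 1800 s for a result to count as official, but results obtained with a smaller t differ very little.
  3. After the run, the bin folder contains an HPCG-Benchmark file. It records the run in detail: the results, the memory used by the chosen problem size, and the time spent in each major function.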
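
The hpcg.dat layout described in step 2 can be generated from a script, which helps when sweeping n and t. This is a minimal sketch, assuming the first two lines of the file are free-form header text that HPCG skips. write_hpcg_dat is a hypothetical helper, and the header strings are placeholders.

```shell
# Hypothetical helper: write an hpcg.dat file.
#   $1 $2 $3  problem size (goes on line 3)
#   $4        run time in seconds (goes on line 4)
#   $5        output path, defaults to ./hpcg.dat
# Lines 1-2 are assumed to be header text that the benchmark ignores.
write_hpcg_dat() {
    nx="$1"; ny="$2"; nz="$3"; seconds="$4"
    out="${5:-hpcg.dat}"
    {
        echo "HPCG benchmark input file"
        echo "generated by write_hpcg_dat"
        echo "$nx $ny $nz"
        echo "$seconds"
    } > "$out"
}

# Usage: a short trial run before committing to the 1800 s official run.
#   write_hpcg_dat 256 256 128 60 bin/hpcg.dat
```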
Ns = 256 256 128
t = 1800

Benchmark Time Summary::Total=1890.2
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=8.03429
Final Summary::HPCG 2.4 rating for historical reasons is=8.61255
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Please upload results from the YAML file contents to=http://hpcg-benchmark.org

-----

Ns = 256 256 128
t = 60

Benchmark Time Summary::Total=144.725
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=8.01359
Final Summary::HPCG 2.4 rating for historical reasons is=8.65271
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Results are valid but execution time (sec) is=144.725
Final Summary::Official results execution time (sec) must be at least=1800
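
To compare runs like the two above without reading the whole report, the rating can be pulled out with awk. This is a minimal sketch that assumes the exact "Final Summary" line format shown in the sample output; other HPCG versions may word the line differently. hpcg_rating is a hypothetical helper name.

```shell
# Hypothetical helper: print the GFLOP/s rating from an HPCG report.
# Reads the files given as arguments, or stdin if none are given.
# Matches only the "result is VALID" line, so an invalid run prints nothing.
hpcg_rating() {
    awk -F= '/HPCG result is VALID with a GFLOP\/s rating of/ { print $NF }' "$@"
}

# Usage (the file name prefix comes from the text above):
#   hpcg_rating bin/HPCG-Benchmark*
```

For the two sample runs this would print 8.03429 and 8.01359, a difference of about 0.3%, which matches the observation that a short t gives nearly the same rating.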

# Gathering System Information

# CPU

  1. Number of logical CPUs and CPU model
    cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
    32 Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz
    According to online sources, the TDP is 85 W and the turbo frequency is 3.0 GHz.
  2. Number of physical CPUs
    grep "physical id" /proc/cpuinfo|sort -u
    physical id : 0
    physical id : 1
  3. Cores per physical CPU
    grep "cpu cores" /proc/cpuinfo|uniq
    cpu cores : 8
  4. Logical CPUs per physical CPU
    grep "siblings" /proc/cpuinfo|uniq
    siblings : 16
    Each physical CPU has twice as many logical CPUs as cores, which indicates that hyper-threading is enabled.
  5. Physical location of each logical CPU
    cat /proc/cpuinfo | grep -E "physical id|processor"
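
The checks above can be combined into one summary. This is a minimal sketch, assuming an x86 /proc/cpuinfo that contains the "physical id", "cpu cores" and "siblings" fields, and that every socket is identical. cpu_topology is a hypothetical helper name.

```shell
# Hypothetical helper: summarise the CPU topology from a cpuinfo-style file.
# $1 is the file to read, defaulting to /proc/cpuinfo.
# Hyper-threading is inferred from siblings > cpu cores, as in step 4.
cpu_topology() {
    cpuinfo="${1:-/proc/cpuinfo}"
    logical=$(grep -c '^processor' "$cpuinfo")
    sockets=$(grep '^physical id' "$cpuinfo" | sort -u | wc -l | tr -d ' ')
    cores=$(grep '^cpu cores' "$cpuinfo" | sort -u | awk -F: '{ print $2 + 0; exit }')
    siblings=$(grep '^siblings' "$cpuinfo" | sort -u | awk -F: '{ print $2 + 0; exit }')
    echo "sockets=$sockets cores_per_socket=$cores threads_per_socket=$siblings logical=$logical"
    if [ "$siblings" -gt "$cores" ]; then
        echo "hyper-threading: on"
    else
        echo "hyper-threading: off"
    fi
}

# Usage:
#   cpu_topology
```

With the values listed above (2 physical ids, 8 cores, 16 siblings, 32 logical CPUs), the second line would read "hyper-threading: on".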

# Linux

  1. Operating system information
    uname -a
    Linux amax 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
  2. Distribution information
    cat /etc/issue
    Ubuntu 14.04.6 LTS
  3. Memory
    cat /proc/meminfo

    free -h
    The cluster has 251.8G of memory in total, presumably 64G×4.
  4. Memory devices
    dmidecode |grep -A16 "Memory Device$"

    dmidecode -t memory
    No permission……
  5. Disk space
    df -hl
Filesystem      Size  Used Avail Use% Mounted on
udev            126G   12K  126G   1% /dev
tmpfs            26G  2.1M   26G   1% /run
/dev/sda6       188G   37G  142G  21% /
none            4.0K     0  4.0K   0% /sys/fs/cgroup
none            5.0M     0  5.0M   0% /run/lock
none            126G  1.3M  126G   1% /run/shm
none            100M  188K  100M   1% /run/user
/dev/sda1       453M   73M  353M  17% /boot
/dev/sda7       274G  258G  1.8G 100% /home
/dev/sdc1       1.8T  167G  1.6T  10% /data1
/dev/sdb1       1.8T   33G  1.7T   2% /data0
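
The table above shows /home at 100%, which can easily break a benchmark run that writes its report there. This is a minimal sketch for flagging nearly full filesystems. It assumes df-style columns with the usage percentage in column 5 and the mount point in column 6, and mount points without spaces. full_filesystems is a hypothetical helper name.

```shell
# Hypothetical helper: list mount points at or above a usage threshold.
# Reads df-style output on stdin and skips the header row.
# $1 is the threshold in percent, defaulting to 90.
full_filesystems() {
    threshold="${1:-90}"
    awk -v t="$threshold" 'NR > 1 {
        use = $5
        sub(/%/, "", use)          # "100%" -> "100"
        if (use + 0 >= t) print $6, $5
    }'
}

# Usage (-P keeps each filesystem on a single line):
#   df -Pl | full_filesystems 90
```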
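  1. Disk devices
    fdisk -l
    No output? Does it need administrator rights?
  2. Network card information
    dmesg | grep -i eth
  3. Device interface information
    lspci
    -v : show more detailed information about the PCI devices
    -vv : even more detail than -v
    -n : show the PCI IDs directly instead of vendor names
    -s 00:01.0 : show the information for address 00:01.0
  4. Node / host names
    cat /etc/hosts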
127.0.0.1       localhost
127.0.1.1       amax

# The following lines are desirable for IPv6 capable hosts
::1     ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

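Testing suggests that the cluster consists of the single amax node with two Intel Xeon CPUs, and that localhost points to amax.
In other words, a single node with two 8-core sockets?

@amax:~/HPL$ mpirun -np 16 ./cpi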
Process 5 of 16 is on amax
Process 7 of 16 is on amax
Process 8 of 16 is on amax
Process 9 of 16 is on amax
Process 10 of 16 is on amax
Process 12 of 16 is on amax
Process 14 of 16 is on amax
Process 0 of 16 is on amax
Process 1 of 16 is on amax
Process 2 of 16 is on amax
Process 3 of 16 is on amax
Process 4 of 16 is on amax
Process 11 of 16 is on amax
Process 13 of 16 is on amax
Process 15 of 16 is on amax
Process 6 of 16 is on amax
pi is approximately 3.1415926544231274, Error is 0.0000000008333343
wall clock time = 0.004565
@amax:~/HPL$ mpirun -np 16 -nolocal ./cpi
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
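
The cpi output above can be tallied per host to confirm that every rank lands on amax. This is a minimal sketch, assuming the "Process N of M is on HOST" line format printed by the cpi example. ranks_per_host is a hypothetical helper name.

```shell
# Hypothetical helper: count MPI ranks per host from cpi-style output on stdin.
# Lines that do not match "Process N of M is on HOST" are ignored.
ranks_per_host() {
    awk '/^Process [0-9]+ of [0-9]+ is on / { count[$NF]++ }
         END { for (h in count) print h, count[h] }' | sort
}

# Usage:
#   mpirun -np 16 ./cpi | ranks_per_host
```

For the 16-process run above this would print a single line, "amax 16", consistent with the single-node guess.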
  1. View current processes
    top
top - 15:32:47 up 148 days,  5:36,  6 users,  load average: 107.26, 100.88, 63.62
Tasks: 933 total,  17 running, 916 sleeping,   0 stopped,   0 zombie
%Cpu(s): 81.9 us, 13.4 sy,  0.0 ni,  4.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:  26404217+total, 17041590+used, 93626272 free,  3731332 buffers
KiB Swap:  7999484 total,  1274312 used,  6725172 free. 88929968 cached Mem

        PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                               P
        5173 riolu     20   0 5448340 2.788g   8492 R 429.5  1.1  24:00.06 xhpl                                                 30 1
        5033 riolu     20   0 5450740 2.788g   8404 R 428.5  1.1  23:58.68 xhpl                                                 13 1

1 : show the usage of each logical CPU.
F, then arrow keys to select P = Last Used Cpu, then Space : show which CPU each process is running on.
q : quit.
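
For a non-interactive view of the same P column, ps can print the processor each process last ran on through its psr output field. This is a minimal sketch that counts processes per CPU. procs_per_cpu is a hypothetical helper name, and xhpl in the usage line is simply the process name from the top output above.

```shell
# Hypothetical helper: count processes per CPU number.
# Reads one CPU number per line on stdin (leading spaces are fine)
# and prints "CPU COUNT" pairs sorted by CPU number.
procs_per_cpu() {
    awk 'NF { n[$1]++ }
         END { for (c in n) print c, n[c] }' | sort -n
}

# Usage (psr = processor the process last executed on):
#   ps -C xhpl -o psr= | procs_per_cpu
```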
Reference: Viewing CPU and memory usage on Linux
