- HPCG Benchmark with OpenBLAS + OpenMPI -
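# Environment setup
The previous posts already set up the C++ compiler and the OpenMPI parallel environment.
#TODO On Intel processors, consider using Intel's own compiler, MPI, and HPCG binaries...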
# Installing and building HPCG
- Download hpcg-master from the official download site.
- Go into the `setup` directory, edit `Make.Linux_MPI`, and save it as `Make.Linux` with these settings:
MPdir = $(HOME)/HPL/openmpi
MPinc = -I$(MPdir)/include
MPlib = -L$(MPdir)/lib
CXX = $(MPdir)/bin/mpicxx
- Configure the build: in the installation directory, run `mkdir hpcg`, then `cd hpcg`, then `~/HPL/hpcg-master/configure Linux`.
- Build and test: run `make`, then `cd bin`, then `mpirun -np 16 ./xhpcg`.
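The Make.Linux edits from the list above can also be scripted. This is a minimal sketch, assuming each of the four variables sits on its own line in Make.Linux_MPI; `make_linux_from_mpi` is a hypothetical helper name, not something shipped with HPCG.

```shell
# make_linux_from_mpi SRC DST MPDIR
# Hypothetical helper: copies SRC to DST, rewriting the four MPI-related
# variables to the values used in this post. MPDIR is inserted verbatim,
# so it must not contain '|' or '&' (both are special to sed here).
make_linux_from_mpi() {
    local src=$1 dst=$2 mpdir=$3
    sed -e "s|^MPdir[[:space:]]*=.*|MPdir = $mpdir|" \
        -e 's|^MPinc[[:space:]]*=.*|MPinc = -I$(MPdir)/include|' \
        -e 's|^MPlib[[:space:]]*=.*|MPlib = -L$(MPdir)/lib|' \
        -e 's|^CXX[[:space:]]*=.*|CXX = $(MPdir)/bin/mpicxx|' \
        "$src" > "$dst"
}

# Example call (single quotes keep $(HOME) literal for make to expand):
# make_linux_from_mpi setup/Make.Linux_MPI setup/Make.Linux '$(HOME)/HPL/openmpi'
```

All other lines of the file, such as the compiler flags, are copied through unchanged.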
The hpcg.dat input file is simple: the third line sets the problem size and the fourth line sets the run time in seconds.
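As an illustration of that layout, the sketch below writes a four-line hpcg.dat. `write_hpcg_dat` is a hypothetical helper. The two header lines are free text; as far as I know the reference reader skips them, but treat that as an assumption.

```shell
# write_hpcg_dat NX NY NZ SECONDS [FILE]
# Hypothetical helper: writes an HPCG input file with the problem size on
# line 3 and the run time in seconds on line 4.
write_hpcg_dat() {
    local nx=$1 ny=$2 nz=$3 seconds=$4 out=${5:-hpcg.dat}
    {
        echo "HPCG benchmark input file"                                         # line 1: header text
        echo "Sandia National Laboratories; University of Tennessee, Knoxville"  # line 2: header text
        echo "$nx $ny $nz"                                                       # line 3: problem size
        echo "$seconds"                                                          # line 4: run time (s)
    } > "$out"
}

# Example: the short run from this post.
# write_hpcg_dat 256 256 128 60
```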
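An HPCG run is quick (a few minutes for the whole machine), so tune n over repeated runs until you get a good result.
Do not set n too small, otherwise the whole problem fits in cache; the run should occupy more than 25% of memory.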
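To apply the 25% rule it helps to know what a quarter of the installed memory is. A small sketch, assuming a Linux /proc/meminfo whose MemTotal line is given in kB; `quarter_of_memtotal_gib` is a made-up name.

```shell
# quarter_of_memtotal_gib [MEMINFO_FILE]
# Hypothetical helper: prints 25% of MemTotal in GiB with one decimal place.
quarter_of_memtotal_gib() {
    local meminfo=${1:-/proc/meminfo}
    # MemTotal is in kB: divide by 1024 twice for GiB, then by 4 for a quarter.
    awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 / 4 }' "$meminfo"
}

# Example:
# quarter_of_memtotal_gib
```

Compare the printed value with the memory usage that HPCG reports in its result file; if the run uses less, increase n.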
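The official rules require a run time of at least 1800 s for an official result, but results obtained with a smaller t differ only slightly.
- When the run finishes, an HPCG-Benchmark file appears in the bin directory. It records the results in detail: the problem size, the memory used, and the time spent in each major function.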
Ns = 256 256 128
t = 1800
Benchmark Time Summary::Total=1890.2
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=8.03429
Final Summary::HPCG 2.4 rating for historical reasons is=8.61255
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Please upload results from the YAML file contents to=http://hpcg-benchmark.org
-----
Ns = 256 256 128
t = 60
Benchmark Time Summary::Total=144.725
Final Summary=
Final Summary::HPCG result is VALID with a GFLOP/s rating of=8.01359
Final Summary::HPCG 2.4 rating for historical reasons is=8.65271
Final Summary::Reference version of ComputeDotProduct used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeSPMV used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeMG used=Performance results are most likely suboptimal
Final Summary::Reference version of ComputeWAXPBY used=Performance results are most likely suboptimal
Final Summary::Results are valid but execution time (sec) is=144.725
Final Summary::Official results execution time (sec) must be at least=1800
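The `key=value` lines in the summaries above are easy to pick apart with standard tools. The following sketch assumes the result file contains lines worded exactly like the ones shown here; `hpcg_rating` and `hpcg_is_official` are hypothetical helper names.

```shell
# hpcg_rating FILE
# Prints the GFLOP/s rating from an HPCG result file.
hpcg_rating() {
    awk -F= '/HPCG result is VALID with a GFLOP\/s rating of/ { print $2 }' "$1"
}

# hpcg_is_official FILE
# Succeeds only if the file does NOT contain the notice that the run was
# shorter than the required official run time.
hpcg_is_official() {
    ! grep -qF 'Official results execution time (sec) must be at least' "$1"
}

# Example (RESULT_FILE stands for the HPCG-Benchmark file in bin):
# hpcg_rating RESULT_FILE
# hpcg_is_official RESULT_FILE && echo "official-length run"
```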
# Getting system information
# CPU
- Number of logical CPUs and CPU model
cat /proc/cpuinfo | grep name | cut -f2 -d: | uniq -c
32 Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz
According to specifications found online, the TDP is 85 W and the turbo frequency is 3.0 GHz.
- Number of physical CPUs
grep "physical id" /proc/cpuinfo|sort -u
physical id : 0
physical id : 1
- Number of cores per physical CPU
grep "cpu cores" /proc/cpuinfo|uniq
cpu cores : 8
- Number of logical CPUs per physical CPU
grep "siblings" /proc/cpuinfo|uniq
siblings : 16
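Each physical CPU exposes twice as many logical CPUs as it has cores, which indicates that Hyper-Threading is enabled.
- Physical location of each logical CPU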
cat /proc/cpuinfo | grep -E "physical id|processor"
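The separate commands in this section can be folded into one pass over /proc/cpuinfo. This is a rough sketch, assuming the x86 field names shown above (`processor`, `physical id`, `cpu cores`, `siblings`) and that all sockets are identical; `cpu_topology` is a hypothetical name.

```shell
# cpu_topology [CPUINFO_FILE]
# Hypothetical helper: summarises sockets, cores per socket, logical CPUs,
# and whether Hyper-Threading appears to be on (siblings > cpu cores).
cpu_topology() {
    local cpuinfo=${1:-/proc/cpuinfo}
    awk -F': *' '
        /^processor/   { logical++ }          # one entry per logical CPU
        /^physical id/ { sockets[$2] = 1 }    # collect distinct socket ids
        /^cpu cores/   { cores = $2 }         # physical cores per socket
        /^siblings/    { siblings = $2 }      # logical CPUs per socket
        END {
            n = 0
            for (s in sockets) n++
            ht = (siblings + 0 > cores + 0) ? "on" : "off"
            printf "sockets=%d cores_per_socket=%d logical=%d ht=%s\n", n, cores, logical, ht
        }' "$cpuinfo"
}

# Example:
# cpu_topology
```

With the values shown above (32 logical CPUs, 2 physical ids, 8 cores, 16 siblings), it should report 2 sockets, 8 cores per socket, 32 logical CPUs, and Hyper-Threading on.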
# Linux
- Operating system information
uname -a
Linux amax 4.2.0-42-generic #49~14.04.1-Ubuntu SMP Wed Jun 29 20:22:11 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
- Operating system distribution information
cat /etc/issue
Ubuntu 14.04.6 LTS
- Memory
cat /proc/meminfo
or
free -h
The cluster has 251.8G of memory in total, presumably 4 × 64G.
- Memory devices
dmidecode |grep -A16 "Memory Device$"
or
dmidecode -t memory
Permission denied...
- Disk space
df -hl
Filesystem Size Used Avail Use% Mounted on
udev 126G 12K 126G 1% /dev
tmpfs 26G 2.1M 26G 1% /run
/dev/sda6 188G 37G 142G 21% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
none 5.0M 0 5.0M 0% /run/lock
none 126G 1.3M 126G 1% /run/shm
none 100M 188K 100M 1% /run/user
/dev/sda1 453M 73M 353M 17% /boot
/dev/sda7 274G 258G 1.8G 100% /home
/dev/sdc1 1.8T 167G 1.6T 10% /data1
/dev/sdb1 1.8T 33G 1.7T 2% /data0
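In the listing above /home is essentially full, which is worth catching before building or running anything there. A small sketch that filters df-style output by usage; `full_filesystems` is a hypothetical name, and it assumes one filesystem per line with no spaces in the mount point.

```shell
# full_filesystems [LIMIT]
# Hypothetical helper: reads df-style output on stdin and prints the mount
# point and Use% of every filesystem at or above LIMIT percent (default 90).
full_filesystems() {
    local limit=${1:-90}
    awk -v limit="$limit" '
        NR > 1 {                       # skip the header line
            use = $5
            sub(/%/, "", use)          # "100%" -> "100"
            if (use + 0 >= limit) print $6, $5
        }'
}

# Example (-P keeps each filesystem on a single line):
# df -Pl | full_filesystems 90
```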
- Disk devices
fdisk -l
No output? Does this need administrator privileges?
- Network interface information
dmesg | grep -i eth
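- Device interface (PCI) information
lspci
`-v`: show more detailed information about PCI devices
`-vv`: show even more detail than `-v`
`-n`: show numeric PCI IDs instead of vendor names
`-s 00:01.0`: show the device at address 00:01.0
- Node / host names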
cat /etc/hosts
127.0.0.1 localhost
127.0.1.1 amax
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
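Testing suggests that the cluster is a single node, amax, fitted with two Intel Xeon CPUs, and that localhost points to amax.
So is it a single dual-socket node with 8 cores per socket?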
@amax:~/HPL$ mpirun -np 16 ./cpi
Process 5 of 16 is on amax
Process 7 of 16 is on amax
Process 8 of 16 is on amax
Process 9 of 16 is on amax
Process 10 of 16 is on amax
Process 12 of 16 is on amax
Process 14 of 16 is on amax
Process 0 of 16 is on amax
Process 1 of 16 is on amax
Process 2 of 16 is on amax
Process 3 of 16 is on amax
Process 4 of 16 is on amax
Process 11 of 16 is on amax
Process 13 of 16 is on amax
Process 15 of 16 is on amax
Process 6 of 16 is on amax
pi is approximately 3.1415926544231274, Error is 0.0000000008333343
wall clock time = 0.004565
@amax:~/HPL$ mpirun -np 16 -nolocal ./cpi
--------------------------------------------------------------------------
All nodes which are allocated for this job are already filled.
--------------------------------------------------------------------------
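Rather than reading the cpi output by eye, the ranks can be counted per host. A minimal sketch, assuming the program prints lines of the form `Process N of M is on HOST` as above; `ranks_per_host` is a hypothetical name.

```shell
# ranks_per_host [FILE...]
# Hypothetical helper: reads cpi-style output (stdin or files) and prints
# "HOST COUNT" for every host that ran at least one MPI process.
ranks_per_host() {
    awk '
        /^Process [0-9]+ of [0-9]+ is on / { count[$NF]++ }   # last field is the host
        END { for (h in count) print h, count[h] }
    ' "$@" | sort
}

# Example:
# mpirun -np 16 ./cpi | ranks_per_host
```

A single output line with a count equal to the `-np` value means every rank ran on the same node, which is what the run above shows for amax.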
- View current processes
top
top - 15:32:47 up 148 days, 5:36, 6 users, load average: 107.26, 100.88, 63.62
Tasks: 933 total, 17 running, 916 sleeping, 0 stopped, 0 zombie
%Cpu(s): 81.9 us, 13.4 sy, 0.0 ni, 4.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 26404217+total, 17041590+used, 93626272 free, 3731332 buffers
KiB Swap: 7999484 total, 1274312 used, 6725172 free. 88929968 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND P
5173 riolu 20 0 5448340 2.788g 8492 R 429.5 1.1 24:00.06 xhpl 30 1
5033 riolu 20 0 5450740 2.788g 8404 R 428.5 1.1 23:58.68 xhpl 13 1
`1`: show the status of each logical CPU.
`F`, then the arrow keys to select P = Last Used Cpu, then Space: show which CPU each process is running on.
`q`: quit.
Reference: "Linux 查看 CPU 和内存使用情况" (viewing CPU and memory usage on Linux)