ECS: Virtual Machine Memory Performance Test

Original article by 许玉冲, 2022-12-12

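To meet customer performance requirements for an IaaS private cloud, virtual machine memory performance must reach a defined level.
Install the STREAM benchmark on the virtual machine and, with no applications running, measure memory read/write bandwidth using 4 threads. Results are reported in MB/s.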

The following four metrics together count as one test item:
1) Copy ≥ 50000 MB/s
2) Scale ≥ 50000 MB/s
3) Add ≥ 50000 MB/s
4) Triad ≥ 50000 MB/s
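
The acceptance rule above can be written as a small check. This is a minimal sketch in Python, used here only for illustration. The function name `failed_kernels` is hypothetical, and the sample figures are the Best Rate values from the run recorded later in this article.

```python
# Acceptance threshold from the requirement list above, in MB/s.
THRESHOLD_MB_S = 50000.0


def failed_kernels(rates_mb_s, threshold=THRESHOLD_MB_S):
    """Return the names of the kernels whose best rate is below the threshold."""
    return [name for name, rate in rates_mb_s.items() if rate < threshold]


if __name__ == "__main__":
    # Best Rate values copied from the sample run later in this article.
    sample = {"Copy": 14719.4, "Scale": 14241.2, "Add": 16492.4, "Triad": 16534.7}
    failed = failed_kernels(sample)
    print("PASS" if not failed else "FAIL: " + ", ".join(failed))
```

The test item passes only when the returned list is empty. For the sample run, all four kernels fall below the threshold. That run used the stream_exe binary, which was compiled without -fopenmp.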

# 1. Downloading the STREAM benchmark

The benchmark source is available at https://www.cs.virginia.edu/stream/FTP/Code/ and, for the C version, you only need a single file, stream.c.

# 2. Compiling

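If you are working from the C source, you can compile it with gcc or g++. To compile, use
gcc -O stream.c -o stream_exe
You can also add the OpenMP option to measure multi-core memory bandwidth. To compile with OpenMP, use
gcc -O -fopenmp stream.c -o stream_omp_exe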

# 3. Running and analyzing the results

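Run the benchmark directly with ./stream_omp_exe. If it does not start multiple threads, set the number of threads manually before running, for example export OMP_NUM_THREADS=20 (use 4 for the 4-thread test required here). The variable only takes effect in a binary built with -fopenmp.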
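
When runs are scripted, the thread count can be passed through the child process environment instead of the interactive shell. This is a minimal sketch under stated assumptions: the helper names `stream_env` and `run_stream` are hypothetical, and the binary path assumes the OpenMP build from the compile step is in the current directory.

```python
import os
import subprocess


def stream_env(threads):
    """Copy the current environment and set the OpenMP thread count."""
    env = dict(os.environ)
    env["OMP_NUM_THREADS"] = str(threads)
    return env


def run_stream(binary="./stream_omp_exe", threads=4):
    """Run the OpenMP build of STREAM and return its text output."""
    result = subprocess.run([binary], env=stream_env(threads),
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    if os.path.exists("./stream_omp_exe"):
        print(run_stream())
    else:
        print("stream_omp_exe not found; compile it first")
```

The one-line shell equivalent is `OMP_NUM_THREADS=4 ./stream_omp_exe`.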
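The four operations, Copy, Scale, Add and Triad, are described below.
Copy is the simplest operation: it reads a value from one memory location and writes it to another, for 2 memory accesses (1 read and 1 write).
Scale is a multiplication: it reads a value from one memory location, multiplies it by the constant scale, and writes the result to another memory location, for 2 memory accesses.
Add is an addition: it reads two values from two memory locations, adds them, and writes the result to a third memory location, for 3 memory accesses (2 reads and 1 write).
Triad combines the three operations above: it reads a value from memory and multiplies it by scale, then reads a second value from another memory location, adds it to the product, and writes the result back to memory. It also makes 3 memory accesses (2 reads and 1 write).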
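
The four operations can be restated as code. This is an illustrative sketch in Python on small lists, not the C loops from stream.c. The array names a, b, c follow the benchmark's convention, while the function names and the scalar value 3.0 are chosen here for illustration.

```python
def copy(a):
    """c[j] = a[j]: one read and one write per element."""
    return [x for x in a]


def scale(c, scalar):
    """b[j] = scalar * c[j]: one read and one write per element."""
    return [scalar * x for x in c]


def add(a, b):
    """c[j] = a[j] + b[j]: two reads and one write per element."""
    return [x + y for x, y in zip(a, b)]


def triad(b, c, scalar):
    """a[j] = b[j] + scalar * c[j]: two reads and one write per element."""
    return [x + scalar * y for x, y in zip(b, c)]


# Memory accesses counted per element for each kernel.
ACCESSES = {"Copy": 2, "Scale": 2, "Add": 3, "Triad": 3}

if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    c = copy(a)           # c = [1.0, 2.0, 3.0]
    b = scale(c, 3.0)     # b = [3.0, 6.0, 9.0]
    c = add(a, b)         # c = [4.0, 8.0, 12.0]
    a = triad(b, c, 3.0)  # a = [15.0, 30.0, 45.0]
    print(a)
```

Each kernel does almost no arithmetic per element, so its run time is dominated by the reads and writes counted in `ACCESSES`.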
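In the sample run below, the measured bandwidth ranks Triad ≈ Add > Copy > Scale (Triad and Add differ by less than 0.3%). A likely explanation is that the more memory accesses an iteration issues, the more memory latency can be overlapped, so the 3-access kernels report higher bandwidth than the 2-access kernels. Likewise, a more complex arithmetic operation lengthens each iteration and lowers the reported bandwidth, which would explain why Scale trails Copy. In this run the same effect is not visible between Add and Triad, whose results are nearly identical.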

[root@ecs-3b4a ~]# wget https://www.cs.virginia.edu/stream/FTP/Code/stream.c
--2022-12-12 12:16:59--  https://www.cs.virginia.edu/stream/FTP/Code/stream.c
Resolving www.cs.virginia.edu (www.cs.virginia.edu)... 128.143.67.11
Connecting to www.cs.virginia.edu (www.cs.virginia.edu)|128.143.67.11|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19967 (19K) [text/plain]
Saving to: ‘stream.c’

100%[==================================================================================================================================================>] 19,967      87.4KB/s   in 0.2s   

2022-12-12 12:17:02 (87.4 KB/s) - ‘stream.c’ saved [19967/19967]

[root@ecs-3b4a ~]# ls
byte-unixbench-5.1.3  stream.c  v5.1.3
[root@ecs-3b4a ~]#  gcc -O stream.c -o stream_exe
[root@ecs-3b4a ~]# ls
byte-unixbench-5.1.3  stream.c  stream_exe  v5.1.3
[root@ecs-3b4a ~]# ll
total 184
drwxrwxr-x 3 root root   4096 Jun  5  2015 byte-unixbench-5.1.3
-rw-r--r-- 1 root root  19967 Nov 30  2021 stream.c
-rwxr-xr-x 1 root root  12960 Dec 12 12:18 stream_exe
-rw-r--r-- 1 root root 145908 Dec 12 12:10 v5.1.3
[root@ecs-3b4a ~]# export OMP_NUM_THREADS=4
[root@ecs-3b4a ~]# ./stream_exe 
-------------------------------------------------------------
STREAM version $Revision: 5.10 $
-------------------------------------------------------------
This system uses 8 bytes per array element.
-------------------------------------------------------------
Array size = 10000000 (elements), Offset = 0 (elements)
Memory per array = 76.3 MiB (= 0.1 GiB).
Total memory required = 228.9 MiB (= 0.2 GiB).
Each kernel will be executed 10 times.
 The *best* time for each kernel (excluding the first iteration)
 will be used to compute the reported bandwidth.
-------------------------------------------------------------
Your clock granularity/precision appears to be 1 microseconds.
Each test below will take on the order of 7233 microseconds.
   (= 7233 clock ticks)
Increase the size of the arrays if this shows that
you are not getting at least 20 clock ticks per test.
-------------------------------------------------------------
WARNING -- The above is only a rough guideline.
For best results, please be sure you know the
precision of your system timer.
-------------------------------------------------------------
Function    Best Rate MB/s  Avg time     Min time     Max time
Copy:           14719.4     0.010967     0.010870     0.011090
Scale:          14241.2     0.011349     0.011235     0.011477
Add:            16492.4     0.014596     0.014552     0.014676
Triad:          16534.7     0.014555     0.014515     0.014691
-------------------------------------------------------------
Solution Validates: avg error less than 1.000000e-13 on all three arrays
-------------------------------------------------------------
[root@ecs-3b4a ~]#
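
The Best Rate column can be reproduced from the other numbers in the output: it is the number of bytes moved in one pass over the arrays divided by the fastest pass time, with 1 MB taken as 10^6 bytes. The sketch below redoes that arithmetic in Python. The function name `best_rate_mb_s` is hypothetical, and the constants are copied from the output above.

```python
BYTES_PER_ELEMENT = 8      # "This system uses 8 bytes per array element."
ARRAY_SIZE = 10_000_000    # "Array size = 10000000 (elements)"


def best_rate_mb_s(accesses, min_time_s):
    """Bytes moved in one pass divided by the fastest pass time, in MB/s."""
    bytes_moved = accesses * BYTES_PER_ELEMENT * ARRAY_SIZE
    return 1.0e-6 * bytes_moved / min_time_s


if __name__ == "__main__":
    # (kernel, accesses per element, "Min time" in seconds from the table above)
    rows = [("Copy", 2, 0.010870), ("Scale", 2, 0.011235),
            ("Add", 3, 0.014552), ("Triad", 3, 0.014515)]
    for name, accesses, min_time in rows:
        print(f"{name:6s} {best_rate_mb_s(accesses, min_time):10.1f} MB/s")
```

The computed values agree with the reported ones to within 1 MB/s; the small differences come from the Min time column being rounded to six decimals. The arithmetic also shows why Add and Triad take longer per pass yet report higher bandwidth: they move 240 MB per pass, against 160 MB for Copy and Scale.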
