暂无图片
暂无图片
4
暂无图片
暂无图片
暂无图片

你的数据库服务器IO调度算法不对,难怪那么慢

数据库干货铺 2020-08-24
991

之前的推文已经分享了数据库优化的方法,链接为https://mp.weixin.qq.com/s/6Atzk9UKPJRxxAs0nsKBXg 。其中操作系统部分介绍了IO调度算法的优化,本文将通过压力测试的方式来对比不同的调度算法下磁盘IO的表现。

1 准备工作

1.1  安装sysbench

本次采用sysbench进行压测,先安装sysbench,步骤如下:

    curl -s https://packagecloud.io/install/repositories/akopytov/sysbench/script.rpm.sh | sudo bash
    yum -y install sysbench
    sysbench --version

    1.2  准备测试文件

    生成后续需要使用的测试文件,block大小为16k(MySQL DBA都懂的,哈哈),创建4个文件,合计20G

      [root@mha1 ~]# sysbench fileio --file-num=4 --file-block-size=16384 --file-total-size=20G prepare
      sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

      4 files, 5242880Kb each, 20480Mb total
      Creating files for the test...
      Extra file open flags: (none)
      Creating file test_file.0
      Creating file test_file.1
      Creating file test_file.2
      Creating file test_file.3
      21474836480 bytes written in 47.94 seconds (427.24 MiB/sec).
      1.3 准备测试表

      因为也要进行数据库读写方面的测试,因此需要先创建相关表及数据

        [root@mha1 ~]# sysbench --db-driver=mysql --time=300 --threads=10 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=20 --table_size=1000000 oltp_insert --db-ps-mode=disable prepare
        sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

        Initializing worker threads...

        Creating table 'sbtest3'...
        Creating table 'sbtest2'...
        Creating table 'sbtest5'...
        Creating table 'sbtest6'...
        Creating table 'sbtest8'...
        Creating table 'sbtest7'...
        Creating table 'sbtest4'...
        Creating table 'sbtest1'...
        Creating table 'sbtest10'...
        Creating table 'sbtest9'...
        Inserting 1000000 records into 'sbtest4'
        Inserting 1000000 records into 'sbtest7'
        Inserting 1000000 records into 'sbtest5'
        Inserting 1000000 records into 'sbtest3'
        Inserting 1000000 records into 'sbtest1'
        Inserting 1000000 records into 'sbtest2'
        Inserting 1000000 records into 'sbtest8'
        Inserting 1000000 records into 'sbtest6'
        Inserting 1000000 records into 'sbtest10'
        Inserting 1000000 records into 'sbtest9'
        Creating a secondary index on 'sbtest7'...
        Creating a secondary index on 'sbtest10'...
        Creating a secondary index on 'sbtest8'...
        Creating a secondary index on 'sbtest5'...
        Creating a secondary index on 'sbtest2'...
        Creating a secondary index on 'sbtest9'...
        Creating a secondary index on 'sbtest1'...
        Creating table 'sbtest17'...
        Creating a secondary index on 'sbtest3'...
        Inserting 1000000 records into 'sbtest17'
        Creating a secondary index on 'sbtest4'...
        Creating a secondary index on 'sbtest6'...
        Creating table 'sbtest20'...
        Inserting 1000000 records into 'sbtest20'
        Creating table 'sbtest18'...
        Inserting 1000000 records into 'sbtest18'
        Creating table 'sbtest15'...
        Inserting 1000000 records into 'sbtest15'
        Creating table 'sbtest19'...
        Inserting 1000000 records into 'sbtest19'
        Creating table 'sbtest14'...
        Inserting 1000000 records into 'sbtest14'
        Creating table 'sbtest11'...
        Inserting 1000000 records into 'sbtest11'
        Creating table 'sbtest13'...
        Creating table 'sbtest12'...
        Inserting 1000000 records into 'sbtest13'
        Inserting 1000000 records into 'sbtest12'
        Creating table 'sbtest16'...
        Inserting 1000000 records into 'sbtest16'
        Creating a secondary index on 'sbtest17'...
        Creating a secondary index on 'sbtest20'...
        Creating a secondary index on 'sbtest18'...
        Creating a secondary index on 'sbtest19'...
        Creating a secondary index on 'sbtest15'...
        Creating a secondary index on 'sbtest11'...
        Creating a secondary index on 'sbtest12'...
        Creating a secondary index on 'sbtest13'...
        Creating a secondary index on 'sbtest14'...
        Creating a secondary index on 'sbtest16'...
        2    查看支持的调度算法

        本次磁盘为SSD硬盘,操作系统版本文Centos7.8 。下面将调度算法修改为三种不同的值来进行随机读与随机写的压力测试

        本系统为Centos7.8,需要查看支持的IO调度算法,然后再进行修改测试。

          [root@mha1 ~]# dmesg | grep -i scheduler
          [ 4.885816] io scheduler noop registered
          [ 4.885820] io scheduler deadline registered (default)
          [ 4.885867] io scheduler cfq registered
          [ 4.885870] io scheduler mq-deadline registered
          [ 4.885872] io scheduler kyber registered

          可见,在本系统中,默认的调度算法为 deadline。

          也可以通过如下命令查看当前的调度算法,其中中括号里代表当前使用的调度算法。

            [root@mha1 ~]# cat sys/block/sda/queue/scheduler
            noop [deadline] cfq

            3   deadline算法

            Deadline在机械盘的情况下对数据库环境(ORACLE RAC,MySQL等)是最好的选择。下面将进行随机写与随机读的压力测试

            3.1  随机写

              [root@mha1 ~]# sysbench fileio \
              > --time=180 \
              > --threads=24 \
              > --file-total-size=20G \
              > --file-test-mode=rndwr \
              > --file-num=4 \
              > --file-extra-flags=direct \
              > --file-fsync-freq=0 \
              > --file-block-size=16384 \
              > run
              sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

              Running the test with following options:
              Number of threads: 24
              Initializing random number generator from current time


              Extra file open flags: directio
              4 files, 5GiB each
              20GiB total file size
              Block size 16KiB
              Number of IO requests: 0
              Read/Write ratio for combined random IO test: 1.50
              Calling fsync() at the end of test, Enabled.
              Using synchronous I/O mode
              Doing random write test
              Initializing worker threads...

              Threads started!


              File operations:
              reads/s: 0.00
              writes/s: 6935.37
              fsyncs/s: 0.53

              Throughput:
              read, MiB/s: 0.00
              written, MiB/s: 108.37

              General statistics:
              total time: 180.0138s
              total number of events: 1248484

              Latency (ms):
              min: 0.10
              avg: 3.46
              max: 107.39
              95th percentile: 14.73
              sum: 4317610.93

              Threads fairness:
              events (avg/stddev): 52020.1667/426.95
              execution time (avg/stddev): 179.9005/0.01

              随机写的iops为6935.37,磁盘写入速度是108.37MiB/s

              3.2 随机读

                [root@mha1 ~]# sysbench fileio \
                > --time=180 \
                > --threads=24 \
                > --file-total-size=20G \
                > --file-test-mode=rndrd \
                > --file-num=4 \
                > --file-extra-flags=direct \
                > --file-fsync-freq=0 \
                > --file-block-size=16384 \
                > run
                sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

                Running the test with following options:
                Number of threads: 24
                Initializing random number generator from current time


                Extra file open flags: directio
                4 files, 5GiB each
                20GiB total file size
                Block size 16KiB
                Number of IO requests: 0
                Read/Write ratio for combined random IO test: 1.50
                Calling fsync() at the end of test, Enabled.
                Using synchronous I/O mode
                Doing random read test
                Initializing worker threads...

                Threads started!


                File operations:
                reads/s: 7956.88
                writes/s: 0.00
                fsyncs/s: 0.00

                Throughput:
                read, MiB/s: 124.33
                written, MiB/s: 0.00

                General statistics:
                total time: 180.0075s
                total number of events: 1432313

                Latency (ms):
                min: 0.10
                avg: 3.01
                max: 322.24
                95th percentile: 5.47
                sum: 4309094.67

                Threads fairness:
                events (avg/stddev): 59679.7083/2688.56
                execution time (avg/stddev): 179.5456/0.18

                随机读的iops为7956.88,磁盘读取速度是124.33MiB/s

                3.3   测试数据库写

                  sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_write_only --db-ps-mode=disable run

                  SQL statistics:
                  queries performed:
                  read: 0
                  write: 589934
                  other: 294968
                  total: 884902
                  transactions: 147483 (491.43 per sec.)
                  queries: 884902 (2948.62 per sec.)
                  ignored errors: 2 (0.01 per sec.)
                  reconnects: 0 (0.00 per sec.)

                  General statistics:
                  total time: 300.1050s
                  total number of events: 147483

                  Latency (ms):
                  min: 2.58
                  avg: 16.27
                  max: 2608.34
                  95th percentile: 35.59
                  sum: 2399415.58

                  Threads fairness:
                  events (avg/stddev): 18435.3750/90.33
                  execution time (avg/stddev): 299.9269/0.04

                  可见,随机写入的TPS为491.43 ,查询次数为2948.62

                  3.4  测试数据库读

                    sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_read_only --db-ps-mode=disable run

                    SQL statistics:
                    queries performed:
                    read: 1651692
                    write: 0
                    other: 235956
                    total: 1887648
                    transactions: 117978 (393.13 per sec.)
                    queries: 1887648 (6290.13 per sec.)
                    ignored errors: 0 (0.00 per sec.)
                    reconnects: 0 (0.00 per sec.)

                    General statistics:
                    total time: 300.0949s
                    total number of events: 117978

                    Latency (ms):
                    min: 3.08
                    avg: 20.34
                    max: 170.48
                    95th percentile: 29.19
                    sum: 2399636.31

                    Threads fairness:
                    events (avg/stddev): 14747.2500/1513.84
                    execution time (avg/stddev): 299.9545/0.04
                    ,随机读时的TPS为393.13 ,查询次数为6290.13

                    4  noop算法

                    4.1  修改为noop算法

                    noop称为电梯调度算法,是基于FIFO队列实现的。所有的请求都是先进先出的,因为SSD的随机读、随机写速度快,因此该算法适合SSD硬盘。

                      [root@mha1 ~]# echo 'noop' >/sys/block/sda/queue/scheduler
                      [root@mha1 ~]# cat sys/block/sda/queue/scheduler
                      [noop] deadline cfq

                      4.2  随机写

                        [root@mha1 ~]# sysbench fileio \
                        > --time=180 \
                        > --threads=24 \
                        > --file-total-size=20G \
                        > --file-test-mode=rndwr \
                        > --file-num=4 \
                        > --file-extra-flags=direct \
                        > --file-fsync-freq=0 \
                        > --file-block-size=16384 \
                        > run
                        sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

                        Running the test with following options:
                        Number of threads: 24
                        Initializing random number generator from current time


                        Extra file open flags: directio
                        4 files, 5GiB each
                        20GiB total file size
                        Block size 16KiB
                        Number of IO requests: 0
                        Read/Write ratio for combined random IO test: 1.50
                        Calling fsync() at the end of test, Enabled.
                        Using synchronous I/O mode
                        Doing random write test
                        Initializing worker threads...

                        Threads started!


                        File operations:
                        reads/s: 0.00
                        writes/s: 7057.60
                        fsyncs/s: 0.53

                        Throughput:
                        read, MiB/s: 0.00
                        written, MiB/s: 110.27

                        General statistics:
                        total time: 180.0136s
                        total number of events: 1270481

                        Latency (ms):
                        min: 0.10
                        avg: 3.40
                        max: 240.39
                        95th percentile: 14.46
                        sum: 4317435.99

                        Threads fairness:
                        events (avg/stddev): 52936.7083/487.57
                        execution time (avg/stddev): 179.8932/0.02
                        随机写的iops为7057.60,磁盘写入速度是110.27MiB/s

                        4.3  随机读

                          [root@mha1 ~]# sysbench fileio \
                          > --time=180 \
                          > --threads=24 \
                          > --file-total-size=20G \
                          > --file-test-mode=rndrd \
                          > --file-num=4 \
                          > --file-extra-flags=direct \
                          > --file-fsync-freq=0 \
                          > --file-block-size=16384 \
                          > run
                          sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

                          Running the test with following options:
                          Number of threads: 24
                          Initializing random number generator from current time


                          Extra file open flags: directio
                          4 files, 5GiB each
                          20GiB total file size
                          Block size 16KiB
                          Number of IO requests: 0
                          Read/Write ratio for combined random IO test: 1.50
                          Calling fsync() at the end of test, Enabled.
                          Using synchronous I/O mode
                          Doing random read test
                          Initializing worker threads...

                          Threads started!


                          File operations:
                          reads/s: 8399.89
                          writes/s: 0.00
                          fsyncs/s: 0.00

                          Throughput:
                          read, MiB/s: 131.25
                          written, MiB/s: 0.00

                          General statistics:
                          total time: 180.0100s
                          total number of events: 1512081

                          Latency (ms):
                          min: 0.10
                          avg: 2.85
                          max: 315.77
                          95th percentile: 5.00
                          sum: 4312384.33

                          Threads fairness:
                          events (avg/stddev): 63003.3750/10086.77
                          execution time (avg/stddev): 179.6827/0.12
                          随机读的iops为8399.89,磁盘读取速度是131.25MiB/s

                          4.4  数据库写入

                            sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_write_only --db-ps-mode=disable run

                            SQL statistics:
                            queries performed:
                            read: 0
                            write: 653457
                            other: 326730
                            total: 980187
                            transactions: 163364 (544.38 per sec.)
                            queries: 980187 (3266.28 per sec.)
                            ignored errors: 2 (0.01 per sec.)
                            reconnects: 0 (0.00 per sec.)

                            General statistics:
                            total time: 300.0903s
                            total number of events: 163364

                            Latency (ms):
                            min: 2.62
                            avg: 14.69
                            max: 220.12
                            95th percentile: 32.53
                            sum: 2399040.57

                            Threads fairness:
                            events (avg/stddev): 20420.5000/112.69
                            execution time (avg/stddev): 299.8801/0.04
                            可见,随机写入的TPS为 544.38 ,查询次数3266.28 

                            4.5  数据库只读

                              sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_read_only --db-ps-mode=disable run
                              SQL statistics:
                              queries performed:
                              read: 1596364
                              write: 0
                              other: 228052
                              total: 1824416
                              transactions: 114026 (379.97 per sec.)
                              queries: 1824416 (6079.59 per sec.)
                              ignored errors: 0 (0.00 per sec.)
                              reconnects: 0 (0.00 per sec.)

                              General statistics:
                              total time: 300.0869s
                              total number of events: 114026

                              Latency (ms):
                              min: 3.08
                              avg: 21.04
                              max: 321.03
                              95th percentile: 31.37
                              sum: 2399600.56

                              Threads fairness:
                              events (avg/stddev): 14253.2500/1475.71
                              execution time (avg/stddev): 299.9501/0.02
                              可见,只读时的TPS为 379.97,查询次数为6079.59

                              5  cfq算法

                               5.1 修改为cfq算法

                              cfq称为绝对公平调度算法,为每个进程及线程单独创建一个队列来管理IO请求,起到每个进程和线程均匀分布IO的效果。此算法适用于通用服务器,centos6中为默认的IO调度算法。

                                [root@mha1 ~]# echo 'cfq' >/sys/block/sda/queue/scheduler
                                [root@mha1 ~]# cat sys/block/sda/queue/scheduler
                                noop deadline [cfq]

                                5.2 随机写

                                  [root@mha1 ~]# sysbench fileio \
                                  > --time=180 \
                                  > --threads=24 \
                                  > --file-total-size=20G \
                                  > --file-test-mode=rndwr \
                                  > --file-num=4 \
                                  > --file-extra-flags=direct \
                                  > --file-fsync-freq=0 \
                                  > --file-block-size=16384 \
                                  > run
                                  sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

                                  Running the test with following options:
                                  Number of threads: 24
                                  Initializing random number generator from current time


                                  Extra file open flags: directio
                                  4 files, 5GiB each
                                  20GiB total file size
                                  Block size 16KiB
                                  Number of IO requests: 0
                                  Read/Write ratio for combined random IO test: 1.50
                                  Calling fsync() at the end of test, Enabled.
                                  Using synchronous I/O mode
                                  Doing random write test
                                  Initializing worker threads...

                                  Threads started!


                                  File operations:
                                  reads/s: 0.00
                                  writes/s: 6614.37
                                  fsyncs/s: 0.53

                                  Throughput:
                                  read, MiB/s: 0.00
                                  written, MiB/s: 103.35

                                  General statistics:
                                  total time: 180.0118s
                                  total number of events: 1190677

                                  Latency (ms):
                                  min: 0.10
                                  avg: 3.63
                                  max: 348.78
                                  95th percentile: 15.27
                                  sum: 4317092.54

                                  Threads fairness:
                                  events (avg/stddev): 49611.5417/517.80
                                  execution time (avg/stddev): 179.8789/0.03

                                  随机写的iops为6614.37,磁盘写入速度是103.35MiB/s

                                  5.3 随机读

                                    [root@mha1 ~]# sysbench fileio \
                                    > --time=180 \
                                    > --threads=24 \
                                    > --file-total-size=20G \
                                    > --file-test-mode=rndrd \
                                    > --file-num=4 \
                                    > --file-extra-flags=direct \
                                    > --file-fsync-freq=0 \
                                    > --file-block-size=16384 \
                                    > run
                                    sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

                                    Running the test with following options:
                                    Number of threads: 24
                                    Initializing random number generator from current time


                                    Extra file open flags: directio
                                    4 files, 5GiB each
                                    20GiB total file size
                                    Block size 16KiB
                                    Number of IO requests: 0
                                    Read/Write ratio for combined random IO test: 1.50
                                    Calling fsync() at the end of test, Enabled.
                                    Using synchronous I/O mode
                                    Doing random read test
                                    Initializing worker threads...

                                    Threads started!


                                    File operations:
                                    reads/s: 7481.39
                                    writes/s: 0.00
                                    fsyncs/s: 0.00

                                    Throughput:
                                    read, MiB/s: 116.90
                                    written, MiB/s: 0.00

                                    General statistics:
                                    total time: 180.0086s
                                    total number of events: 1346731

                                    Latency (ms):
                                    min: 0.10
                                    avg: 3.20
                                    max: 374.49
                                    95th percentile: 5.77
                                    sum: 4312382.07

                                    Threads fairness:
                                    events (avg/stddev): 56113.7917/3058.00
                                    execution time (avg/stddev): 179.6826/0.17

                                    随机读的iops为7481.39,磁盘读取速度是116.90MiB/s

                                     5.4 数据库写

                                      sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_write_only --db-ps-mode=disable run
                                      SQL statistics:
                                      queries performed:
                                      read: 0
                                      write: 598765
                                      other: 299384
                                      total: 898149
                                      transactions: 149691 (498.54 per sec.)
                                      queries: 898149 (2991.25 per sec.)
                                      ignored errors: 2 (0.01 per sec.)
                                      reconnects: 0 (0.00 per sec.)

                                      General statistics:
                                      total time: 300.2552s
                                      total number of events: 149691

                                      Latency (ms):
                                      min: 2.55
                                      avg: 16.02
                                      max: 779.62
                                      95th percentile: 35.59
                                      sum: 2397311.08

                                      Threads fairness:
                                      events (avg/stddev): 18711.3750/132.24
                                      execution time (avg/stddev): 299.6639/0.38
                                      可见,随机写入的TPS为498.54 ,查询次数为2991.25 

                                      5.5  数据库读

                                        sysbench --db-driver=mysql --time=300 --threads=8 --report-interval=1 --mysql-host=127.0.0.1 --mysql-port=3306 --mysql-user=root --mysql-password=123456 --mysql-db=testdb --tables=10 --table_size=1000000 oltp_read_only --db-ps-mode=disable run
                                        SQL statistics:
                                        queries performed:
                                        read: 1448342
                                        write: 0
                                        other: 206906
                                        total: 1655248
                                        transactions: 103453 (344.66 per sec.)
                                        queries: 1655248 (5514.58 per sec.)
                                        ignored errors: 0 (0.00 per sec.)
                                        reconnects: 0 (0.00 per sec.)

                                        General statistics:
                                        total time: 300.1562s
                                        total number of events: 103453

                                        Latency (ms):
                                        min: 3.11
                                        avg: 23.19
                                        max: 222.31
                                        95th percentile: 38.25
                                        sum: 2399486.55

                                        Threads fairness:
                                        events (avg/stddev): 12931.6250/1278.72
                                        execution time (avg/stddev): 299.9358/0.01
                                         可见,只读时的TPS为 344.66,查询次数为5514.58

                                        6 小结

                                        根据测试结果对比一下三种调度算法的读写速度

                                        算法IOPS磁盘写速度IOPS磁盘读速度oltp_write_only oltp_read_only 
                                        deadline6935.37118.37MiB/s7956.88124.33MiB/sTPS为491.43 ,查询次数为2948.62 TPS为393.13 ,查询次数为6290.13
                                        noop7057.60110.27MiB/s8399.89131.25MiB/sTPS为 544.38 ,查询次数为3266.28TPS为 379.97,查询次数为6079.59
                                        cfq6614.37103.35MiB/s7481.39116.90MiB/sTPS为498.54 ,查询次数为2991.25 TPS为 344.66,查询次数为5514.58

                                        因为本次测试环境为SSD硬盘,因此,在此情况下建议选择noop磁盘IO调度算法,此结论也符合我们的预期。

                                        特别注意:磁盘IO的调度算法还需要根据磁盘情况、数据库类型、数据库架构、业务场景(OLTP、OLAP等)等各种场景进行区分,不同的场景调度算法也要调整,不可一概而论。如果不确定的话,建议进行压测来判断,选择符合对应场景下最合适的算法。

                                        想了解更多内容或参与技术交流可以关注微信公众号【数据库干货铺】或进技术交流群沟通。


                                        MAMBA NEVER OUT!


                                        往期精彩回顾


                                        1.  MySQL高可用之MHA集群部署

                                        2.  mysql8.0新增用户及加密规则修改的那些事

                                        3.  比hive快10倍的大数据查询利器-- presto

                                        4.  监控利器出鞘:Prometheus+Grafana监控MySQL、Redis数据库

                                        5.  PostgreSQL主从复制--物理复制

                                        6.  MySQL传统点位复制在线转为GTID模式复制

                                        7.  MySQL敏感数据加密及解密

                                        8.  MySQL数据备份及还原(一)

                                        9.  MySQL数据备份及还原(二)













                                        最后修改时间:2020-08-25 10:05:53
                                        文章转载自数据库干货铺,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                        评论