
Disk I/O alerts caused by Redis AOF and RDB persistence

数据库这点小事 2019-11-29

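Problem:

After Codis went live in production, we used redis-sync to migrate the data from the standalone Redis instance into Codis.

While checking the environment after the cutover, we found that the log of every codis-server node was full of messages saying the disk was too busy to flush, and codis-server intermittently timed out during that period: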

    9209:M 28 Nov 01:36:01.028 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
    9209:M 28 Nov 01:36:04.029 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
    9209:M 28 Nov 01:36:06.030 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
    9209:M 28 Nov 01:36:09.029 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
    9209:M 28 Nov 01:36:12.033 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
    9209:M 28 Nov 01:36:15.046 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
    9209:M 28 Nov 01:36:19.094 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
    9209:M 28 Nov 01:36:22.019 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
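To gauge how often this warning fires on each node, the log can be bucketed by process ID and minute. The sketch below is a hypothetical helper (count_fsync_warnings is not part of Redis or Codis), and it assumes the Redis 3.x log layout shown above: pid:role, date and time, a one-character level, then the message.

```python
import re
import sys
from collections import Counter

# Assumed Redis 3.x log layout, e.g.
# "9209:M 28 Nov 01:36:01.028 * Asynchronous AOF fsync is taking too long ..."
# Groups: pid, role (M/S/C/X), minute bucket, message. Level is one of . - * #
LINE_RE = re.compile(
    r"^(\d+):([A-Z]) (\d{1,2} \w{3} \d{2}:\d{2}):\d{2}\.\d{3} [.*#-] (.*)$"
)
WARNING = "Asynchronous AOF fsync is taking too long"


def count_fsync_warnings(lines):
    """Count AOF fsync-delay warnings per (pid, minute) bucket."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(4).startswith(WARNING):
            pid, _role, minute, _message = m.groups()
            counts[(pid, minute)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_fsync_warnings.py /appl/codis_server/logs/redis_6379.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for (pid, minute), n in sorted(count_fsync_warnings(f).items()):
            print(f"pid={pid} {minute} warnings={n}")
```

A steady count of several warnings per minute on every node, as in the excerpt above, points at sustained disk pressure rather than a one-off spike.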

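Analysis:

Judging from the warnings, AOF persistence was generating a large volume of disk writes, which degraded the performance of the entire Codis cluster.

Redis has two persistence mechanisms:

RDB snapshots:

RDB can lose data. Whenever one of the trigger conditions set in the configuration file (the save rules) is met, Redis takes a full snapshot, storing the in-memory data in a compact binary serialized form. To do this it forks a child process dedicated to the snapshot. The child inherits the parent's memory as it was at fork time, and changes the parent makes afterwards are not visible to the child (the operating system keeps unchanged pages shared and gives modified pages separate copies, i.e. copy-on-write). Writes made while the snapshot is being taken are therefore not in that snapshot, and anything written after the last successful snapshot is lost if the instance crashes.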
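The fork-based snapshot behaviour described above can be demonstrated with a small POSIX-only Python sketch: the child serializes the data it inherited at fork time while the parent keeps accepting a write. It only models the semantics of a background save (JSON stands in for the RDB format, and snapshot_demo is a hypothetical name); it is not how Redis itself is implemented.

```python
import json
import os
import tempfile


def snapshot_demo(path):
    """Fork a child that dumps the dataset to `path`, like a background save.

    Returns (parent_data, snapshot_on_disk). POSIX only (uses os.fork).
    """
    data = {"user:1": "alice"}
    pid = os.fork()
    if pid == 0:
        # Child: sees memory exactly as it was at fork time.
        status = 1
        try:
            with open(path, "w") as f:
                json.dump(data, f)
            status = 0
        finally:
            os._exit(status)  # exit without running the parent's cleanup code
    # Parent: a write that arrives while the snapshot is being taken.
    data["user:2"] = "bob"
    os.waitpid(pid, 0)  # wait for the child to finish writing
    with open(path) as f:
        snapshot = json.load(f)
    return data, snapshot


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        live, snap = snapshot_demo(os.path.join(d, "dump.json"))
        print("in memory:", live)
        print("on disk:  ", snap)  # user:2 is missing from the snapshot
```

The key written after the fork exists only in memory; if the process died before the next snapshot, it would be gone.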

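AOF log:

With AOF, every write command is appended to a log file. An AOF rewrite forks a child process that walks the in-memory dataset and serializes it into a new AOF file as a sequence of Redis commands; the incremental AOF data produced during the rewrite is then appended to the new file, which finally replaces the old one. Writes to the AOF actually go to an in-memory buffer first, and the dirty data is flushed to disk asynchronously; with appendfsync everysec (the setting used here), fsync runs about once per second. AOF therefore adds disk I/O and can reduce Redis performance.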
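The warning in the log comes from this everysec flush path. As far as the Redis 3.x aof.c source goes, when the background fsync from the previous second is still running, Redis postpones writing the AOF buffer for up to about 2 seconds; after that it writes anyway, increments a delayed-fsync counter (reported as aof_delayed_fsync in INFO persistence) and logs exactly that message. The Python model below is a simplified sketch of that decision only, with hypothetical names; it performs no real I/O, and the 2-second constant is taken as an assumption from the Redis source.

```python
from dataclasses import dataclass

# Simplified model of Redis's appendfsync everysec flush decision.
# The 2-second window mirrors Redis 3.x aof.c; treat it as an assumption.
POSTPONE_LIMIT_SECONDS = 2
WARNING = (
    "Asynchronous AOF fsync is taking too long (disk is busy?). "
    "Writing the AOF buffer without waiting for fsync to complete, "
    "this may slow down Redis."
)


@dataclass
class EverysecFlusher:
    postponed_start: int = 0  # 0 means no write is currently being postponed
    delayed_fsync: int = 0    # models INFO persistence: aof_delayed_fsync

    def flush(self, now, fsync_in_progress):
        """Return (write_buffer_now, log_message_or_None) for second `now`."""
        if fsync_in_progress:
            if self.postponed_start == 0:
                self.postponed_start = now  # first time the disk looks busy
                return False, None
            if now - self.postponed_start < POSTPONE_LIMIT_SECONDS:
                return False, None          # keep waiting for the fsync
            self.delayed_fsync += 1         # stop waiting: write anyway
            self.postponed_start = 0
            return True, WARNING
        self.postponed_start = 0
        return True, None


if __name__ == "__main__":
    f = EverysecFlusher()
    # Disk stays busy for three seconds, then recovers.
    for now, busy in [(100, True), (101, True), (102, True), (103, False)]:
        print(now, f.flush(now, busy))
```

On a saturated disk the fsync never catches up, so the model emits the warning every couple of seconds, which matches the roughly 3-second spacing in the log above.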

Checking the codis-server configuration on each node:

      protected-mode no
      daemonize yes
      pidfile "/var/run/redis_6379.pid"
      port 6379
      save 900 1
      save 300 10
      save 60 10000
      tcp-backlog 511
      timeout 300
      tcp-keepalive 0
      loglevel notice
      logfile "/appl/codis_server/logs/redis_6379.log"
      databases 16
      stop-writes-on-bgsave-error yes
      rdbcompression yes
      rdbchecksum yes
      dbfilename "redis_6379.rdb"
      dir "/appl/codis_server/data"
      slave-serve-stale-data yes
      slave-read-only yes
      repl-disable-tcp-nodelay no
      slave-priority 100
      maxclients 30000
      maxmemory 30gb
      maxmemory-policy allkeys-lru
      appendonly yes
      appendfilename "6379_appendonly.aof"
      appendfsync everysec
      no-appendfsync-on-rewrite no
      auto-aof-rewrite-percentage 100
      auto-aof-rewrite-min-size 64mb
      lua-time-limit 5000
      slowlog-log-slower-than 10000
      slowlog-max-len 128
      latency-monitor-threshold 0
      notify-keyspace-events ""

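Sure enough, both RDB and AOF persistence were enabled. On top of that, because of limited resources, two codis-server instances were running on each server, acting as master and replica for each other, which added even more I/O pressure.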
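This check can be scripted so that every codis-server configuration is screened for both persistence modes at once. A minimal sketch with hypothetical helper names; it assumes the simple "directive value" syntax of redis.conf and ignores include files and quoting edge cases.

```python
def parse_redis_conf(text):
    """Parse redis.conf text into {directive: [values]}; repeated keys such as
    `save` keep every occurrence, in order."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        value = parts[1].strip().strip('"') if len(parts) > 1 else ""
        conf.setdefault(parts[0].lower(), []).append(value)
    return conf


def persistence_modes(conf):
    """Report which persistence mechanisms a parsed config enables."""
    rules = []
    for v in conf.get("save", []):
        if v:
            rules.append(v)
        else:
            rules = []  # `save ""` clears every save point defined before it
    aof_on = conf.get("appendonly", ["no"])[-1].lower() == "yes"
    return {"rdb": bool(rules), "aof": aof_on, "rdb_rules": rules}


if __name__ == "__main__":
    sample = """
    save 900 1
    save 300 10
    save 60 10000
    appendonly yes
    appendfsync everysec
    """
    modes = persistence_modes(parse_redis_conf(sample))
    if modes["rdb"] and modes["aof"]:
        print("both RDB and AOF enabled:", modes["rdb_rules"])
```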

Solution:

1. Disable AOF persistence dynamically on each node:

        127.0.0.1:6379> config set appendonly no

2. Edit the configuration file to disable AOF persistence as well (otherwise the runtime setting is lost when the node restarts):

          appendonly no
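When many codis-server configuration files need the same change, step 2 can be scripted. A hedged sketch: set_directive is a hypothetical helper that rewrites the active appendonly line in the file's text (commented-out lines are left alone) and appends the directive if it is missing. It prints the result instead of overwriting the file, so the output can be reviewed before it replaces the original.

```python
import re


def set_directive(conf_text, key, value):
    """Return conf_text with every active `key ...` line replaced by `key value`.

    Commented-out lines are untouched; the directive is appended if absent.
    """
    pattern = re.compile(
        rf"^([ \t]*){re.escape(key)}[ \t]+.*$", re.IGNORECASE | re.MULTILINE
    )
    new_text, n = pattern.subn(lambda m: f"{m.group(1)}{key} {value}", conf_text)
    if n == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"{key} {value}\n"
    return new_text


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python set_directive.py redis_6379.conf > redis_6379.conf.new
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as f:
            sys.stdout.write(set_directive(f.read(), "appendonly", "no"))
```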

After that we kept watching the server logs: the warning no longer appeared, and the problem was resolved.

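This article is reposted from 数据库这点小事. If you believe it infringes your rights, please email contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove the content immediately.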
