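Problem:
After Codis went into production, data was synchronized from the standalone Redis instance to Codis with redis-sync.
While checking the environment after the switchover, we found that the log of every codis-server node was full of warnings that disk flushes were stalling because the disk was busy. During this period codis-server also timed out intermittently.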
209:M 28 Nov 01:36:01.028 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
9209:M 28 Nov 01:36:04.029 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
9209:M 28 Nov 01:36:06.030 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
9209:M 28 Nov 01:36:09.029 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
9209:M 28 Nov 01:36:12.033 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
9209:M 28 Nov 01:36:15.046 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
9209:M 28 Nov 01:36:19.094 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
9209:M 28 Nov 01:36:22.019 * Asynchronous AOF fsync is taking too long (disk is busy?). Writing the AOF buffer without waiting for fsync to complete, this may slow down Redis.
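To check how often this warning fires on a node, a small log scan is enough. The sketch below is a hypothetical helper (slow_fsync_events and intervals_seconds are names made up for illustration): it extracts the PID and timestamp of every slow-fsync notice and computes the gaps between them. Redis log lines carry no year, so a placeholder year is assumed only to build datetime objects.

```python
import re
from datetime import datetime

# One slow-fsync notice in the Redis log layout "<pid>:<role> <day> <mon> <time> * <message>".
# The pattern also matches entries that were run together without newlines.
SLOW_FSYNC_RE = re.compile(
    r"(\d+):([A-Z]) (\d{1,2} [A-Z][a-z]{2} \d{2}:\d{2}:\d{2}\.\d{3}) \* "
    r"Asynchronous AOF fsync is taking too long"
)


def slow_fsync_events(log_text, year=2000):
    """Return (pid, timestamp) for each slow-fsync warning found in log_text.

    `year` is a placeholder assumption: Redis logs omit the year, and it is
    only used so that intervals between warnings can be computed.
    """
    events = []
    for m in SLOW_FSYNC_RE.finditer(log_text):
        ts = datetime.strptime(f"{year} {m.group(3)}", "%Y %d %b %H:%M:%S.%f")
        events.append((int(m.group(1)), ts))
    return events


def intervals_seconds(events):
    """Seconds elapsed between consecutive warnings."""
    return [(b[1] - a[1]).total_seconds() for a, b in zip(events, events[1:])]
```

Feeding it the contents of a node's log file (for example the logfile path set in the node configuration) gives the number of warnings and their spacing, which makes it easy to confirm whether the warnings stop after a configuration change.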
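Analysis:
The warnings show that AOF persistence was generating a large volume of disk writes. With appendfsync everysec, Redis logs this message when a background fsync is still running after about 2 seconds and it therefore writes the AOF buffer without waiting for the fsync to finish. The resulting I/O pressure degraded the performance of the whole Codis cluster.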
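The postpone-then-give-up behaviour behind this warning can be modelled in a few lines. This is a simplified sketch, not Redis source code: it assumes one flush attempt per second (real Redis attempts flushes more often, from its event loop) and keeps only the roughly 2-second grace period. flush_decision and simulate are hypothetical names.

```python
# Simplified model (not Redis source code) of how Redis, with appendfsync everysec,
# decides whether to write the AOF buffer while a background fsync is still running.
POSTPONE_LIMIT_SECONDS = 2  # Redis waits up to about 2 seconds before giving up


def flush_decision(fsync_in_progress, postponed_since, now):
    """Return (action, new_postponed_since) for one flush attempt at time `now`."""
    if not fsync_in_progress:
        return "write", None  # disk is idle: write the buffer normally
    if postponed_since is None:
        return "postpone", now  # first time the fsync is found busy: start waiting
    if now - postponed_since < POSTPONE_LIMIT_SECONDS:
        return "postpone", postponed_since  # still within the grace period
    # Waited too long: write anyway; this is where the "taking too long" notice is logged.
    return "write_without_waiting", None


def simulate(ticks, fsync_busy):
    """Run one flush attempt per second; return the ticks at which the warning fires."""
    postponed_since = None
    warnings = []
    for now in range(ticks):
        action, postponed_since = flush_decision(fsync_busy(now), postponed_since, now)
        if action == "write_without_waiting":
            warnings.append(now)
    return warnings
```

With a disk that never finishes its fsync, simulate(10, lambda t: True) reports a warning every few ticks, which is the same repeating pattern seen in the node logs; with an idle disk it reports none.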
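Redis offers two persistence methods:
RDB snapshots:
RDB persistence can lose data. When one of the trigger conditions set in the configuration file (the save rules) is met, Redis takes a full snapshot, storing the in-memory data in a compact binary serialized form. To take the snapshot, Redis forks a child process dedicated to writing it. The child shares the parent's memory as it was at the moment of the fork, and later changes made by the parent are not visible to the child (the operating system keeps the shared pages and gives the parent separate copies of any page it modifies, i.e. copy-on-write). As a result, writes made while the snapshot is being taken, or after it, are not in that snapshot, and they are lost if Redis crashes before the next snapshot completes.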
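The save rules work as follows: for each "save <seconds> <changes>" line, Redis starts a background RDB save once at least <changes> writes have happened and more than <seconds> seconds have passed since the last successful save. The sketch below checks those conditions; it deliberately ignores the retry delay after a failed BGSAVE and the case where a child process is already running. SavePoint and should_snapshot are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SavePoint:
    seconds: int  # time that must have passed since the last successful save
    changes: int  # minimum number of write operations since that save


# The save points configured on the codis-server nodes in this article.
NODE_SAVE_POINTS = [SavePoint(900, 1), SavePoint(300, 10), SavePoint(60, 10000)]


def should_snapshot(save_points, changes_since_save, seconds_since_save):
    """True if any save point is satisfied, i.e. a background RDB save would start.

    Simplified: no retry delay after a failed BGSAVE, no check for a running child.
    """
    return any(
        changes_since_save >= p.changes and seconds_since_save > p.seconds
        for p in save_points
    )
```

With these rules, a single write is enough to trigger a snapshot after 15 minutes, while a burst of 10000 writes triggers one after only a minute, so busy nodes snapshot frequently.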
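AOF (append-only file) log:
AOF records every write command in a log file. A write to the AOF actually goes into an in-memory buffer first, and the dirty data is flushed to disk asynchronously, usually once per second (appendfsync everysec). Over time the file is compacted by an AOF rewrite: Redis forks a child process that walks the in-memory dataset and serializes it as a sequence of Redis commands into a new AOF file, then appends the incremental AOF entries produced during the rewrite to the new file, and finally replaces the old AOF file with it. Because every write also has to reach the AOF on disk, AOF adds I/O load and can reduce Redis performance, especially when the disk is slow.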
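The rewrite steps can be illustrated with a toy model. This is a hedged sketch, not how Redis stores its AOF: commands are Python tuples, only SET, DEL and INCR are supported, and replay and rewrite_aof are hypothetical names. It shows the two phases described above: the child serializes a point-in-time snapshot as one SET per live key, and the commands buffered during the rewrite are appended afterwards.

```python
# Toy model of AOF rewrite: commands are tuples like ("SET", key, value).
# Only SET, DEL and INCR are supported; real Redis handles every data type.


def replay(commands):
    """Rebuild the in-memory dataset by executing logged commands in order."""
    db = {}
    for cmd, key, *args in commands:
        if cmd == "SET":
            db[key] = args[0]
        elif cmd == "DEL":
            db.pop(key, None)
        elif cmd == "INCR":
            db[key] = str(int(db.get(key, "0")) + 1)
        else:
            raise ValueError(f"unsupported command: {cmd}")
    return db


def rewrite_aof(snapshot, buffered_during_rewrite):
    """Produce a compact AOF from a point-in-time snapshot of the dataset.

    Step 1 (the forked child): emit one SET per live key.
    Step 2 (the parent): append the commands it buffered while the child worked.
    """
    new_aof = [("SET", key, value) for key, value in sorted(snapshot.items())]
    new_aof.extend(buffered_during_rewrite)
    return new_aof
```

Replaying the rewritten log yields the same dataset as replaying the old log plus the buffered commands, but with fewer entries, which is the point of the rewrite. The rewrite itself, however, is one more source of heavy disk writes.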
Checking the configuration of each codis-server node:
protected-mode no
daemonize yes
pidfile "/var/run/redis_6379.pid"
port 6379
save 900 1
save 300 10
save 60 10000
tcp-backlog 511
timeout 300
tcp-keepalive 0
loglevel notice
logfile "/appl/codis_server/logs/redis_6379.log"
databases 16
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename "redis_6379.rdb"
dir "/appl/codis_server/data"
slave-serve-stale-data yes
slave-read-only yes
repl-disable-tcp-nodelay no
slave-priority 100
maxclients 30000
maxmemory 30gb
maxmemory-policy allkeys-lru
appendonly yes
appendfilename "6379_appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
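Sure enough, both RDB and AOF persistence were enabled. On top of that, because of limited resources, two codis-server instances ran on each server as master and replica of each other, which increased the I/O pressure even further.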
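Checking many nodes by eye is error-prone, so a short parser can report which persistence modes a configuration turns on. This is a sketch under stated assumptions: persistence_settings is a hypothetical helper, it handles one "save <seconds> <changes>" pair per line, and it only looks at directives present in the file, so Redis's built-in defaults for absent directives are not modelled.

```python
import shlex


def persistence_settings(conf_text):
    """Report which persistence modes a redis.conf-style text turns on.

    Only directives present in the text are considered; Redis's built-in
    defaults for absent directives are not modeled.
    """
    save_points = []
    appendonly = False
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, *args = shlex.split(line)  # shlex handles quoted values like ""
        name = name.lower()
        if name == "save":
            if args == [""]:
                save_points = []  # `save ""` removes all save points (RDB off)
            else:
                save_points.append((int(args[0]), int(args[1])))
        elif name == "appendonly":
            appendonly = args[0].lower() == "yes"  # the last occurrence wins
    return {"rdb_save_points": save_points, "aof": appendonly}
```

For the node configuration above, this reports three save points and AOF enabled, i.e. both persistence methods active at once.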
Solution:
1. Disable AOF persistence at runtime on each node:
127.0.0.1:6379> config set appendonly no
2. Also disable AOF persistence in the configuration file (otherwise the runtime setting is lost when the node restarts):
appendonly no
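When the configuration files of many nodes need step 2, the edit can be scripted. The sketch below is a hypothetical helper (disable_aof is a made-up name) that replaces an active appendonly directive with "appendonly no", or appends one if the file has none; commented-out lines are left alone. Redis also provides a CONFIG REWRITE command that writes the running configuration back to the file, but whether it suits a given codis-server deployment is worth checking before relying on it.

```python
import re

# Matches an active "appendonly <value>" directive; commented-out lines do not match.
APPENDONLY_RE = re.compile(r"^[ \t]*appendonly[ \t]+\S+[ \t]*$", re.IGNORECASE | re.MULTILINE)


def disable_aof(conf_text):
    """Return conf_text with AOF turned off, so the setting survives a restart."""
    if APPENDONLY_RE.search(conf_text):
        return APPENDONLY_RE.sub("appendonly no", conf_text)
    # No active directive yet: append one, keeping the file newline-terminated.
    sep = "" if not conf_text or conf_text.endswith("\n") else "\n"
    return conf_text + sep + "appendonly no\n"
```

The function is idempotent, so running it twice over the same file is harmless; the result should still be reviewed before restarting a node.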
We then kept watching the server logs: the warning no longer appeared, and the problem was resolved.
This article is reprinted from 数据库这点小事.




