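The current Redis cluster runs Redis 4.0.2. In day-to-day production we found that it is a pseudo-cluster: stopping any single node makes the application unavailable. The current version also has security vulnerabilities. We therefore upgraded the cluster to Redis 5.0.8 and fixed the pseudo-cluster problem.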
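1. Cluster and node planning
IP: 135.10.xx.240, 135.10.xx.241, 135.10.xx.242, 135.10.xx.243, 135.10.xx.244, 135.10.xx.245
Architecture: highly available cluster
2. System environment
3. Redis high-availability architecture diagram
Redis three-master, three-slave architecture diagram: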

1. Collect cluster state information
[shsnc@vjynaguan6 bin]$./redis-cli -c -p 16300 -h 135.10.xx.245 -a "reaQAv3wsx#gzywxk" cluster nodes
9d47a5549ef8d965467d355e399ab9a41f4129fa 135.10.xx.240:16300@26300 master - 0 1682041897000 14 connected 0-5460
1c74ab57790b2d54e4338987fb437f127686431f 135.10.xx.245:16300@26300 myself,slave efe7b4a5f9cdd75a55e878e08cc6aae1a432cb6a 0 1682041899000 11 connected
efe7b4a5f9cdd75a55e878e08cc6aae1a432cb6a 135.10.xx.242:16300@26300 master - 0 1682041900966 15 connected 10923-16383
52fb87fd535c502fb26ebe61f0e6a87b7732292a 135.10.xx.241:16300@26300 master - 0 1682041899000 13 connected 5461-10922
5308f0e0ad14cef0cfd69c51c99c29eab18efdac 135.10.xx.244:16300@26300 slave 52fb87fd535c502fb26ebe61f0e6a87b7732292a 0 1682041899965 13 connected
43461d857109b280bb16a3a6ab9e7cee3a2b264b 135.10.xx.243:16300@26300 slave 9d47a5549ef8d965467d355e399ab9a41f4129fa 0 1682041900000 14 connected
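Reading the `cluster nodes` output by eye gets error-prone during an upgrade, so the check can be scripted. The following is a minimal sketch, not part of the original procedure: the function name `summarize_cluster_nodes` is hypothetical, and it assumes the standard `cluster nodes` line layout shown above (node ID, address, flags, master ID, ...).

```shell
# Summarize `cluster nodes` output read from stdin.
# Prints: masters=<n> slaves=<n> uncovered=<n>
# "uncovered" counts masters that no slave line points at.
summarize_cluster_nodes() {
  awk '
    $1 !~ /^[0-9a-f]+$/ { next }          # skip warnings and blank lines
    $3 ~ /master/ { masters++; ids[$1] = 1 }
    $3 ~ /slave/  { slaves++; covered[$4] = 1 }
    END {
      uncovered = 0
      for (id in ids) if (!(id in covered)) uncovered++
      printf "masters=%d slaves=%d uncovered=%d\n", masters, slaves, uncovered
    }'
}
```

Piping the command above into the function (`... cluster nodes | summarize_cluster_nodes`) should report three masters, three slaves, and no uncovered master for a healthy three-master, three-slave layout.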
2. Record the current cluster data volume
135.10.xx.245:16300> dbsize
(integer) 1012881
135.10.xx.245:16300> info keyspace
# Keyspace
db0:keys=1012875,expires=36888,avg_ttl=242500
1. Upgrade the slave nodes (run on all three slave nodes)
./redis-cli -c -p 16300 -h 135.10.xx.245 -a "reaQAv3wsx#gzywxk" shutdown
ps -ef|grep redis
mv redis redis0421
make install PREFIX=/home/shsnc/snc_product/redis5.0.8
cp -r data etc ../redis
mkdir log
/home/shsnc/snc_product/redis5.0.8/bin/redis-server /home/shsnc/snc_product/redis5.0.8/etc/redis-6379.conf
2. Upgrade the master nodes (run on all three master nodes)
./redis-cli -c -p 16300 -h 135.10.xx.240 -a "reaQAv3wsx#gzywxk" shutdown
ps -ef|grep redis
mv redis redis0421
make install PREFIX=/home/shsnc/snc_product/redis5.0.8
cp -r data etc ../redis
mkdir log
/home/shsnc/snc_product/redis5.0.8/bin/redis-server /home/shsnc/snc_product/redis5.0.8/etc/redis-6379.conf
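After each restart it is worth confirming that the node is really running the new binary before moving on to the next one. The sketch below is an optional addition, not a step from the original write-up: the helper name `redis_version` is hypothetical, and it only parses the `redis_version:` line that `info server` prints.

```shell
# Extract the server version from `info server` output on stdin.
# redis-cli emits CRLF line endings when piped; the trailing .* absorbs the \r.
redis_version() {
  sed -n 's/^redis_version:\([0-9.]*\).*/\1/p'
}
```

For example, `./redis-cli -p <port> -h <host> -a <password> info server | redis_version` should print 5.0.8 on an upgraded node.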
3. Check the cluster state and the application
./redis-cli -c -p 6379 -h 135.10.xx.245 -a "reaQAv3wsx#gzywxk" cluster nodes


135.10.xx.245:16300> dbsize
(integer) 1012881
135.10.xx.245:16300> info keyspace
# Keyspace
db0:keys=1012875,expires=36888,avg_ttl=242500
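The key counts recorded before and after the upgrade can be compared mechanically rather than by eye. This is a minimal sketch under stated assumptions: the helper names `keyspace_keys` and `compare_keys` are hypothetical, and the optional tolerance exists because counts taken at different moments can differ slightly as keys expire or are written.

```shell
# Extract the db0 key count from `info keyspace` output on stdin.
keyspace_keys() {
  sed -n 's/^db0:keys=\([0-9][0-9]*\),.*/\1/p'
}

# Compare key counts recorded before and after the upgrade.
# Usage: compare_keys <before> <after> [tolerance]
# Prints OK or MISMATCH with the absolute difference; returns 1 on mismatch.
compare_keys() {
  before=$1
  after=$2
  tolerance=${3:-0}
  diff=$(( after - before ))
  if [ "$diff" -lt 0 ]; then
    diff=$(( 0 - diff ))
  fi
  if [ "$diff" -le "$tolerance" ]; then
    echo "OK diff=$diff"
  else
    echo "MISMATCH diff=$diff"
    return 1
  fi
}
```

Note that in cluster mode each node reports only its own keys, so the comparison is meaningful only when both counts come from the same node or from the same set of masters.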
4. Cluster high-availability verification
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
3c271bc1136e65df79e2cf9184fd592e08f5231d 135.10.xx.242:6379@16379 slave 8c3970c8622acab861a8e820be0199a4b1ec13d1 0 1683511344521 9 connected
c0c693f0dc349446a1f49a07a8f261dd43f1c46a 135.10.xx.244:6379@16379 slave fail 5138e65a710e53164618b9a1eec5ff64f5834537 0 1683511346528 5 connected
5138e65a710e53164618b9a1eec5ff64f5834537 135.10.xx.240:6379@16379 master - 0 1683511345529 1 connected 0-5460
f5cffd9d340af081d5042e047280baee27916c9e 135.10.xx.241:6379@16379 master - 0 1683511346000 10 connected 5461-10922
8c3970c8622acab861a8e820be0199a4b1ec13d1 135.10.xx.243:6379@16379 master - 0 1683511347534 9 connected 10923-16383
31e3d76e147557d6874f7a631243cffaa827e8bf 135.10.xx.245:6379@16379 myself,slave f5cffd9d340af081d5042e047280baee27916c9e 0 1683511345000

Following the same procedure, verify each master node in turn.
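When nodes are stopped one at a time, it helps to list which ones the cluster currently considers failed. The following is a sketch only: `list_failed_nodes` is a hypothetical helper that filters `cluster nodes` output on stdin. Redis normally prints flags comma-separated (for example `slave,fail`); the filter also tolerates a space-separated flag as it can appear in pasted output.

```shell
# Print the address of every node whose flags include "fail"
# (this also matches the tentative "fail?" state).
list_failed_nodes() {
  awk '$1 ~ /^[0-9a-f]+$/ && ($3 ~ /fail/ || $4 == "fail") { print $2 }'
}
```

An empty result after a stopped node has been restarted suggests that the cluster sees every node as healthy again.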
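1. Intermittent data gaps
A few days after the upgrade, we found gaps in the data reported to the group. Investigation showed that the application was getting errors when connecting to the Redis cluster, and the Redis logs contained a large number of "FAIL message received from" errors together with frequent RDB saves and AOF rewrites: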
304110:C 06 May 2023 01:50:02.374 * DB saved on disk
304110:C 06 May 2023 01:50:02.391 * RDB: 26 MB of memory used by copy-on-write
7061:M 06 May 2023 01:50:02.455 * Background saving terminated with success
7061:M 06 May 2023 01:50:32.395 * Starting automatic rewriting of AOF on 80% growth
7061:M 06 May 2023 01:50:32.406 * Background append only file rewriting started by pid 306016
7061:M 06 May 2023 01:50:37.474 * AOF rewrite child asks to stop sending diffs.
306016:C 06 May 2023 01:50:37.475 * Parent agreed to stop sending diffs. Finalizing AOF...
306016:C 06 May 2023 01:50:37.475 * Concatenating 8.04 MB of AOF diff received from parent.
306016:C 06 May 2023 01:50:37.537 * SYNC append only file rewrite performed
306016:C 06 May 2023 01:50:37.557 * AOF rewrite: 144 MB of memory used by copy-on-write
7061:M 06 May 2023 01:50:37.715 * Background AOF rewrite terminated with success
7061:M 06 May 2023 01:50:37.717 * Residual parent diff successfully flushed to the rewritten AOF (0.40 MB)
7061:M 06 May 2023 01:50:37.717 * Background AOF rewrite finished successfully
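To judge how often persistence work and failure reports occur, the log can be tallied instead of read line by line. This is a rough sketch: `count_persistence_events` is a hypothetical helper, and it matches only the message texts visible in the excerpt above plus the "FAIL message received from" text mentioned earlier.

```shell
# Count completed RDB saves, completed AOF rewrites, and FAIL reports
# in a Redis log read from stdin.
count_persistence_events() {
  awk '
    /Background saving terminated with success/    { rdb++ }
    /Background AOF rewrite finished successfully/ { aof++ }
    /FAIL message received from/                   { fail++ }
    END {
      printf "rdb_saves=%d aof_rewrites=%d fail_messages=%d\n", rdb, aof, fail
    }'
}
```

Running it over the same time window before and after a configuration change gives a simple way to see whether the change reduced the rewrite frequency.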
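The following parameters were tuned:
RDB save thresholds: changed save 900 1, save 300 10, save 60 10000 to save 900 10, save 300 100, save 60 100000.
Output buffer limit used during data synchronization: client-output-buffer-limit slave 512mb 128mb 120.
Minimum AOF size before an automatic rewrite: auto-aof-rewrite-min-size 512mb.
AOF rewrite growth threshold (auto-aof-rewrite-percentage): raised from 80% to 100%.
Because the data set is fairly large, no-appendfsync-on-rewrite was changed from no to yes. With no, Redis keeps calling fsync on the AOF while a background save or rewrite is running, which can block writes when disk I/O is heavy.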
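The tuned values can be gathered into a single redis.conf fragment for review before they are applied to every node. This is a sketch: the function name `print_tuned_conf` is hypothetical, and it assumes that the AOF rewrite ratio described above corresponds to the `auto-aof-rewrite-percentage` directive.

```shell
# Print the tuned persistence and replication settings as redis.conf lines.
print_tuned_conf() {
  cat <<'EOF'
save 900 10
save 300 100
save 60 100000
client-output-buffer-limit slave 512mb 128mb 120
auto-aof-rewrite-min-size 512mb
auto-aof-rewrite-percentage 100
no-appendfsync-on-rewrite yes
EOF
}
```

The output can be compared against each node's configuration file. With no-appendfsync-on-rewrite set to yes, the AOF is not fsynced while a rewrite is in progress, so a crash during that window can lose more recent writes than usual; that trade-off should be weighed against the reduced blocking.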
Summary:

Author: Business Division 2 (上海新炬中北团队)
Source: the "IT那活儿" WeChat official account
