Redis 6 cluster failover

Original post by huayumicheng, 2022-12-20

Manual failover

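Redis Cluster provides the cluster failover command for manual failover. The command is sent to a replica, which then swaps roles with its master. The process is as follows: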

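1. The replica tells its master to stop processing client requests.
2. The master sends the replica any replication data it has not yet received, along with its current replication offset.
3. The replica applies the outstanding data until its replication offset matches the master's.
4. The replica starts an election immediately. Once it wins, it stops replicating, becomes the new master, and broadcasts the change to the cluster.
5. On receiving the message, the old master reconfigures itself as a replica, unblocks the paused clients, and redirects them to the new master.
6. As a replica, the old master requests a partial resynchronization from the new master (before Redis 4.0 this was a full resynchronization).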
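
The role swap described above can be confirmed from the outside by polling INFO replication on the promoted node. The following is a minimal sketch, not part of the original procedure: info_field and wait_for_master are hypothetical helper names, and the sketch assumes redis-cli is on PATH and that the password is supplied through the REDISCLI_AUTH environment variable rather than -a.

```shell
# info_field NAME
# Reads `INFO replication` output on stdin and prints the value of NAME.
# redis-cli emits CRLF line endings, so carriage returns are stripped first.
info_field() {
  tr -d '\r' | awk -F: -v key="$1" '$1 == key { print $2 }'
}

# wait_for_master HOST PORT [TRIES]
# Hypothetical helper: polls once per second until the node reports
# role:master, or gives up after TRIES attempts (default 10).
# Assumes REDISCLI_AUTH is exported if the cluster requires a password.
wait_for_master() {
  host=$1; port=$2; tries=${3:-10}
  while [ "$tries" -gt 0 ]; do
    role=$(redis-cli -h "$host" -p "$port" info replication 2>/dev/null | info_field role)
    [ "$role" = "master" ] && return 0
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}

# Possible usage against the replica promoted later in this post:
#   export REDISCLI_AUTH=123456
#   redis-cli -h 192.168.100.63 -p 6380 cluster failover
#   wait_for_master 192.168.100.63 6380 && echo "promotion confirmed"
```

The same info_field helper can read master_repl_offset on both nodes to check that the offsets agree, which is the condition in step 3.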


Redis Cluster also provides two forced failover options:

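cluster failover force - for when the master is down and automatic failover cannot complete. The replica skips the handshake with its master but still needs votes from a majority of masters.
cluster failover takeover - for when more than half of the masters have failed, so the replica cannot collect votes from a majority of masters and the election cannot complete. The replica promotes itself without a vote (use with caution).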
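
The choice between the three variants can be written down as a small decision rule. This is only a sketch of the reasoning above: choose_failover_mode is a hypothetical helper, and its inputs (whether the replica's master is reachable, the total number of masters, and how many masters are currently up) are assumed to come from the operator's own checks.

```shell
# choose_failover_mode MASTER_REACHABLE MASTERS_TOTAL MASTERS_UP
# Hypothetical helper that prints the command variant matching the rules above.
#   MASTER_REACHABLE  "yes" if the replica's own master is still reachable
#   MASTERS_TOTAL     number of masters in the cluster
#   MASTERS_UP        number of masters that are up and able to vote
choose_failover_mode() {
  master_reachable=$1; masters_total=$2; masters_up=$3
  # An election needs votes from more than half of all masters.
  majority=$((masters_total / 2 + 1))
  if [ "$master_reachable" = "yes" ]; then
    echo "cluster failover"
  elif [ "$masters_up" -ge "$majority" ]; then
    echo "cluster failover force"
  else
    echo "cluster failover takeover"
  fi
}

# Example: 3 masters, the replica's own master is down, the other 2 are up.
choose_failover_mode no 3 2
```

With 3 masters the majority is 2, so the example prints the force variant. If only 1 master were still up, the rule would fall through to takeover.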

Check the current master-replica relationships

[root@rabbitmq1 ~]# redis-cli -c -h 192.168.100.62 -p 6379 -a 123456 cluster  slots 2>/dev/null | xargs  -n8 | awk '{print $3":"$4"->"$6":"$7}' | sort
192.168.100.61:6379->192.168.100.62:6380
192.168.100.62:6379->192.168.100.63:6380
192.168.100.63:6379->192.168.100.61:6380
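
The xargs -n8 pipeline above appears to rely on every slot range producing exactly eight tokens (start, end, then three fields each for one master and one replica), so it would misalign if a master had zero or several replicas. As a hedged alternative, the sketch below derives the same master->replica pairs from cluster nodes output instead. master_replica_pairs is a hypothetical helper name; it assumes the standard cluster nodes line layout (node ID, ip:port@cport, flags, master ID, ...).

```shell
# master_replica_pairs
# Reads `CLUSTER NODES` output on stdin and prints one "master->replica"
# line per replica, sorted. Works for any number of replicas per master.
master_replica_pairs() {
  awk '
    { split($2, a, "@"); addr[$1] = a[1] }   # node id -> ip:port (drop @cport)
    $3 ~ /slave/ { master_of[$1] = $4 }      # replica id -> id of its master
    END {
      for (r in master_of) print addr[master_of[r]] "->" addr[r]
    }
  ' | sort
}

# Possible usage (password assumed to be in REDISCLI_AUTH):
#   redis-cli -h 192.168.100.62 -p 6379 cluster nodes | master_replica_pairs
```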

[root@rabbitmq1 ~]# redis-cli --cluster check 192.168.100.62:6379 -a 123456 2>/dev/null
192.168.100.62:6379 (f0791896...) -> 6669 keys | 5462 slots | 1 slaves.
192.168.100.61:6379 (e3bbc270...) -> 6651 keys | 5461 slots | 1 slaves.
192.168.100.63:6379 (0f0cc57e...) -> 6681 keys | 5461 slots | 1 slaves.
[OK] 20001 keys in 3 masters.
1.22 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.100.62:6379)
M: f07918968287b7de04b9e2244606bc2278287ced 192.168.100.62:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: bb7973c9d6620a3e71834539bc2f7dde117157b3 192.168.100.63:6380
   slots: (0 slots) slave
   replicates f07918968287b7de04b9e2244606bc2278287ced
S: 408855217845659f7e10da974dc32baaa97fb282 192.168.100.62:6380
   slots: (0 slots) slave
   replicates e3bbc270cc0e30f065636249d2820129bc72f686
M: e3bbc270cc0e30f065636249d2820129bc72f686 192.168.100.61:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 0f0cc57eaa739db6183755924314d6c8deb6afda 192.168.100.63:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 8180cd8a8a3c13394bf271b36030d1e4955e9cfd 192.168.100.61:6380
   slots: (0 slots) slave
   replicates 0f0cc57eaa739db6183755924314d6c8deb6afda
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

-- Promote 192.168.100.63:6380 to master

[root@rabbitmq1 ~]# redis-cli -c -h 192.168.100.63 -p 6380 -a 123456 cluster failover 2>/dev/null 
OK

Check the new master-replica relationships

[root@rabbitmq1 ~]# redis-cli --cluster check 192.168.100.62:6379 -a 123456 2>/dev/null
192.168.100.63:6380 (bb7973c9...) -> 6669 keys | 5462 slots | 1 slaves.
192.168.100.61:6379 (e3bbc270...) -> 6651 keys | 5461 slots | 1 slaves.
192.168.100.63:6379 (0f0cc57e...) -> 6681 keys | 5461 slots | 1 slaves.
[OK] 20001 keys in 3 masters.
1.22 keys per slot on average.
>>> Performing Cluster Check (using node 192.168.100.62:6379)
S: f07918968287b7de04b9e2244606bc2278287ced 192.168.100.62:6379
   slots: (0 slots) slave
   replicates bb7973c9d6620a3e71834539bc2f7dde117157b3
M: bb7973c9d6620a3e71834539bc2f7dde117157b3 192.168.100.63:6380
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 408855217845659f7e10da974dc32baaa97fb282 192.168.100.62:6380
   slots: (0 slots) slave
   replicates e3bbc270cc0e30f065636249d2820129bc72f686
M: e3bbc270cc0e30f065636249d2820129bc72f686 192.168.100.61:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 0f0cc57eaa739db6183755924314d6c8deb6afda 192.168.100.63:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 8180cd8a8a3c13394bf271b36030d1e4955e9cfd 192.168.100.61:6380
   slots: (0 slots) slave
   replicates 0f0cc57eaa739db6183755924314d6c8deb6afda
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.


[root@rabbitmq1 ~]# redis-cli -c -h 192.168.100.62 -p 6379 -a 123456 cluster  slots 2>/dev/null | xargs  -n8 | awk '{print $3":"$4"->"$6":"$7}' | sort
192.168.100.61:6379->192.168.100.62:6380
192.168.100.63:6379->192.168.100.61:6380
192.168.100.63:6380->192.168.100.62:6379


-- View the failover log on the former replica 192.168.100.63:6380

[root@rabbitmq3 ~]# tail -fn 200 /data/redis/6380/log/redis_6380.log 

1717:S 20 Dec 2022 21:48:00.392 # Manual failover user request accepted.
1717:S 20 Dec 2022 21:48:00.392 # Received replication offset for paused master manual failover: 1246
1717:S 20 Dec 2022 21:48:00.392 # All master replication stream processed, manual failover can start.
1717:S 20 Dec 2022 21:48:00.392 # Start of election delayed for 0 milliseconds (rank #0, offset 1246).
1717:S 20 Dec 2022 21:48:00.393 # Starting a failover election for epoch 7.
1717:S 20 Dec 2022 21:48:00.394 # Failover election won: I'm the new master.
1717:S 20 Dec 2022 21:48:00.394 # configEpoch set to 7 after successful failover
1717:M 20 Dec 2022 21:48:00.394 # Connection with master lost.
1717:M 20 Dec 2022 21:48:00.394 * Caching the disconnected master state.
1717:M 20 Dec 2022 21:48:00.394 * Discarding previously cached master state.
1717:M 20 Dec 2022 21:48:00.394 # Setting secondary replication ID to c6ae8e1387feadf53693a1e0ea7f11697daa3791, valid up to offset: 1247. New replication ID is 5cb3be604a976e25e75afb642ea39a034c25b932
1717:M 20 Dec 2022 21:48:00.396 * Replica 192.168.100.62:6379 asks for synchronization
1717:M 20 Dec 2022 21:48:00.396 * Partial resynchronization request from 192.168.100.62:6379 accepted. Sending 0 bytes of backlog starting from offset 1247.
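
To see how long the promotion took, the timestamps of two messages in the log above can be compared: "Manual failover user request accepted" and "Failover election won". The sketch below is one way to do that. failover_duration_ms is a hypothetical helper name, and it assumes the log line layout shown above (time of day in the fifth field) and that both messages fall on the same day.

```shell
# failover_duration_ms
# Reads a replica log on stdin and prints, for each manual failover found,
# the milliseconds between the request being accepted and the election win.
failover_duration_ms() {
  awk '
    # Convert HH:MM:SS.mmm into milliseconds since midnight.
    function to_ms(ts,   t) {
      split(ts, t, /[:.]/)
      return ((t[1] * 60 + t[2]) * 60 + t[3]) * 1000 + t[4]
    }
    /Manual failover user request accepted/ { start = to_ms($5) }
    /Failover election won/                 { print to_ms($5) - start }
  '
}

# Possible usage:
#   failover_duration_ms < /data/redis/6380/log/redis_6380.log
```

For the log above the two timestamps are 21:48:00.392 and 21:48:00.394, so the helper would report 2.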


-- View the failover log on the former master 192.168.100.62:6379


tail -fn 200 /data/redis/6379/log/redis_6379.log 


1693:M 20 Dec 2022 21:48:00.398 # Manual failover requested by replica bb7973c9d6620a3e71834539bc2f7dde117157b3.
1693:M 20 Dec 2022 21:48:00.398 # Failover auth granted to bb7973c9d6620a3e71834539bc2f7dde117157b3 for epoch 7
1693:M 20 Dec 2022 21:48:00.399 # Connection with replica 192.168.100.63:6380 lost.
1693:M 20 Dec 2022 21:48:00.400 # Configuration change detected. Reconfiguring myself as a replica of bb7973c9d6620a3e71834539bc2f7dde117157b3
1693:S 20 Dec 2022 21:48:00.400 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1693:S 20 Dec 2022 21:48:00.400 * Connecting to MASTER 192.168.100.63:6380
1693:S 20 Dec 2022 21:48:00.400 * MASTER <-> REPLICA sync started
1693:S 20 Dec 2022 21:48:00.401 * Non blocking connect for SYNC fired the event.
1693:S 20 Dec 2022 21:48:00.401 * Master replied to PING, replication can continue...
1693:S 20 Dec 2022 21:48:00.401 * Trying a partial resynchronization (request c6ae8e1387feadf53693a1e0ea7f11697daa3791:1247).
1693:S 20 Dec 2022 21:48:00.401 * Successful partial resynchronization with master.
1693:S 20 Dec 2022 21:48:00.401 # Master replication ID changed to 5cb3be604a976e25e75afb642ea39a034c25b932
1693:S 20 Dec 2022 21:48:00.401 * MASTER <-> REPLICA sync: Master accepted a Partial Resynchronization.
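
The log above shows the old master rejoining with a partial resynchronization, as expected on Redis 4.0 and later. A quick check of this outcome might look like the sketch below. resync_kind is a hypothetical helper name; it matches the "Successful partial resynchronization" message shown above and, as an assumption about the server's wording, treats a "Full resync from master" message as the full-resynchronization case.

```shell
# resync_kind
# Reads a replica log on stdin and prints "partial", "full", or "unknown"
# according to the last resynchronization message it finds.
resync_kind() {
  awk '
    /Successful partial resynchronization/ { kind = "partial" }
    /Full resync from master/              { kind = "full" }
    END { print (kind == "" ? "unknown" : kind) }
  '
}

# Possible usage:
#   resync_kind < /data/redis/6379/log/redis_6379.log
```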
