AntDB Database Distributed Operations Manual - Cluster Node Management 8

tocata 2024-09-04

Quickly Restoring a Standby Node

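If the master fails and a failover command is executed, the slave is promoted to master. To give the new master a new slave, there are two options:

Option 1: Use the append command to add a brand-new slave node. This requires data synchronization to keep the master and the slave consistent.

Option 2: Add the former master, which was removed from the cluster, back as a slave node.

Because the former master already holds most of the data, the second option needs only a small amount of data synchronization after the node rejoins the cluster, so it takes little time. This is exactly what the rewind feature provides: a quick way to restore a standby node. For details, see the rewind chapter.

Note: To use the rewind feature, wal_log_hints and full_page_writes must both be set to on for the datanodes.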

SET DATANODE ALL(wal_log_hints=on, full_page_writes=on); 
REWIND DATANODE SLAVE dm1; 
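
The precondition above can be checked before a rewind is attempted. The following is a minimal Python sketch, not part of AntDB: the helper names `missing_rewind_settings` and `build_set_statement` are hypothetical, and the settings dictionary is assumed to have been collected by the operator beforehand. It only assembles the same SET DATANODE ALL statement shown above for whichever parameters are not yet on.

```python
# Parameters that the note above requires to be "on" before REWIND is used.
REQUIRED_FOR_REWIND = ("wal_log_hints", "full_page_writes")


def missing_rewind_settings(settings):
    """Return the required parameters that are not set to 'on'.

    `settings` is a plain dict of parameter name -> value gathered by the
    operator (an assumption of this sketch). A parameter that is absent
    from the dict is conservatively treated as not enabled.
    """
    return [
        name
        for name in REQUIRED_FOR_REWIND
        if str(settings.get(name, "off")).strip().lower() != "on"
    ]


def build_set_statement(missing):
    """Build the adbmgr SET statement for the given parameters, or None."""
    if not missing:
        return None
    assignments = ", ".join(f"{name}=on" for name in missing)
    return f"SET DATANODE ALL({assignments});"


if __name__ == "__main__":
    current = {"wal_log_hints": "off", "full_page_writes": "on"}
    print(build_set_statement(missing_rewind_settings(current)))
```

The sketch prints the statement rather than executing it, so the operator can review it and run it in adbmgr by hand.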

Example:

antdb=# ADD DATANODE SLAVE dn1_1 for dn1_3 (host=adb01,port=52531,path='/data/antdb/data/adb50/d1/dn1_1'); 
ADD NODE 
antdb=# REWIND DATANODE SLAVE dn1_1; 
NOTICE:  pg_ctl restart datanode slave "dn1_1" 
NOTICE:  10.21.20.175, pg_ctl  restart -D /data/antdb/data/adb50/d1/dn1_1 -Z datanode -m fast -o -i -w -c -l /data/antdb/data/adb50/d1/dn1_1/logfile 
NOTICE:  wait max 90 seconds to check datanode slave "dn1_1" running normal 
NOTICE:  pg_ctl stop datanode slave "dn1_1" with fast mode 
NOTICE:  10.21.20.175, pg_ctl  stop -D /data/antdb/data/adb50/d1/dn1_1 -Z datanode -m fast -o -i -w -c 
NOTICE:  wait max 90 seconds to check datanode slave "dn1_1" stop complete 
NOTICE:  update gtmcoord master "gcn1" pg_hba.conf for the rewind node dn1_1 
NOTICE:  update gtmcoord slave "gcn2" pg_hba.conf for the rewind node dn1_1 
NOTICE:  update datanode master "dn1_3" pg_hba.conf for the rewind node dn1_1 
NOTICE:  update datanode slave "dn1_2" pg_hba.conf for the rewind node dn1_1 
NOTICE:  on datanode master "dn1_3" execute "checkpoint" 
NOTICE:  10.21.20.175,  /data/antdb/app/adb50/bin/pg_controldata '/data/antdb/data/adb50/d2/dn1_3' | grep 'Minimum recovery ending location:' |awk '{print $5}' 
NOTICE:  receive msg: {"result":"0/0"} 
NOTICE:  10.21.20.175,  /data/antdb/app/adb50/bin/pg_controldata '/data/antdb/data/adb50/d2/dn1_3' |grep 'Min recovery ending loc' |awk '{print $6}' 
NOTICE:  receive msg: {"result":"0"} 
NOTICE:  10.21.20.175, adb_rewind  --target-pgdata /data/antdb/data/adb50/d1/dn1_1 --source-server='host=10.21.20.175 port=52533 user=antdb dbname=postgres' -T dn1_1 -S dn1_3 
NOTICE:  receive msg: servers diverged at WAL location 0/40001B0 on timeline 1 
rewinding from last common checkpoint at 0/4000140 on timeline 1 
Done! 
 
NOTICE:  refresh mastername of datanode slave "dn1_1" in the node table 
NOTICE:  set parameters in postgresql.conf of datanode slave "dn1_1" 
NOTICE:  refresh recovery.conf of datanode slave "dn1_1" 
NOTICE:  pg_ctl  start -Z datanode -D /data/antdb/data/adb50/d1/dn1_1 -o -i -w -c -l /data/antdb/data/adb50/d1/dn1_1/logfile 
NOTICE:  10.21.20.175, pg_ctl  start -Z datanode -D /data/antdb/data/adb50/d1/dn1_1 -o -i -w -c -l /data/antdb/data/adb50/d1/dn1_1/logfile 
NOTICE:  refresh datanode master "dn1_3" synchronous_standby_names='1 (dn1_2,dn1_1)' 
 mgr_failover_manual_rewind_func  
--------------------------------- 
 t 
(1 row) 
 
antdb=# list node dn1_3; 
 name  | host  |      type       | mastername | port  | sync_state |               path               | initialized | incluster | readonly | zone   
-------+-------+-----------------+------------+-------+------------+----------------------------------+-------------+-----------+----------+------- 
 dn1_3 | adb01 | datanode master |            | 52533 |            | /data/antdb/data/adb50/d2/dn1_3 | t           | t         | f        | local 
(1 row) 
 
antdb=# list node dn1_1; 
 name  | host  |      type      | mastername | port  | sync_state |               path               | initialized | incluster | readonly | zone   
-------+-------+----------------+------------+-------+------------+----------------------------------+-------------+-----------+----------+------- 
 dn1_1 | adb01 | datanode slave | dn1_3      | 52531 | potential  | /data/antdb/data/adb50/d1/dn1_1 | t           | t         | f        | local 
(1 row) 
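
The adb_rewind message in the output above reports where the two servers diverged and which common checkpoint the rewind starts from. As a rough illustration, the Python sketch below parses those two lines and computes the WAL distance between the two positions. The function names are hypothetical, the message format is assumed to match the output shown above, and the LSN arithmetic assumes the usual PostgreSQL "high/low" hexadecimal notation. The distance is only the gap between the two WAL positions, not a measure of how much data the rewind copies.

```python
import re

LSN = r"[0-9A-Fa-f]+/[0-9A-Fa-f]+"
DIVERGED = re.compile(
    rf"servers diverged at WAL location ({LSN}) on timeline (\d+)"
)
CHECKPOINT = re.compile(
    rf"rewinding from last common checkpoint at ({LSN}) on timeline (\d+)"
)


def lsn_to_int(lsn):
    """Convert an LSN such as '0/40001B0' to a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def summarize_rewind(message):
    """Extract divergence point and common checkpoint from a rewind message.

    Returns None when either line is missing from the message.
    """
    diverged = DIVERGED.search(message)
    checkpoint = CHECKPOINT.search(message)
    if diverged is None or checkpoint is None:
        return None
    return {
        "diverged_at": diverged.group(1),
        "checkpoint": checkpoint.group(1),
        "timeline": int(diverged.group(2)),
        # WAL bytes between the common checkpoint and the divergence point.
        "bytes_between": lsn_to_int(diverged.group(1))
        - lsn_to_int(checkpoint.group(1)),
    }


if __name__ == "__main__":
    sample = (
        "servers diverged at WAL location 0/40001B0 on timeline 1\n"
        "rewinding from last common checkpoint at 0/4000140 on timeline 1\n"
        "Done!\n"
    )
    print(summarize_rewind(sample))
```

For the message in the example, the two positions are 0x70 (112) bytes apart on timeline 1.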

To repeat: in the latest version, if the self-healing doctor is configured, none of the operations above require manual intervention.
