暂无图片
暂无图片
2
暂无图片
暂无图片
暂无图片

Halo数据库主备故障案例-主库从库脑裂,从库提升为主库后再降为从库

原创 ... 2023-10-07
15339

0.原架构

主库:Halo161

备库:Halo162

Halo161和Halo162之间是物理复制的主备库


1.提升备库为主库

[halo@Halo162 ~] pg_ctl promote


2.数据写入备库

INSERT INTO  t1 select generate_series(1,1000000), md5(random()::text); 


3.查看时间线,比对Halo161和Halo162

[halo@Halo161 ~]$ pg_controldata | grep -i timeLatest checkpoint's TimeLineID:       1Latest checkpoint's PrevTimeLineID:   1Time of latest checkpoint:            Thu 17 Nov 2022 12:14:23 PM CSTMin recovery ending loc's timeline:   0track_commit_timestamp setting:       offDate/time type storage:               64-bit integers
[halo@Halo162 ~]$ pg_controldata | grep -i timeLatest checkpoint's TimeLineID: 1Latest checkpoint's PrevTimeLineID: 1Time of latest checkpoint: Thu 17 Nov 2022 12:09:23 PM CSTMin recovery ending loc's timeline: 2track_commit_timestamp setting: offDate/time type storage: 64-bit integers


4.查看切换点

查看备库:

[halo@Halo162 ~]$ ll /data/halo/pg_wal/*.history */-rw------- 1 halo halo   41 Nov 17 12:15 00000002.history[halo@Halo162 ~]$ cat /data/halo/pg_wal/00000002.history1       0/F03EE68       no recovery target specified


5.备库激活时间开始产生的WAL必须全部在pg_wal目录中

[halo@Halo162 ~]$ ll -l /data/halo/pg_wal/     total 225M-rw------- 1 halo halo  16M Nov 17 12:10 000000010000000000000008-rw------- 1 halo halo  16M Nov 17 12:13 000000010000000000000009-rw------- 1 halo halo  16M Nov 17 12:13 00000001000000000000000A-rw------- 1 halo halo  16M Nov 17 12:13 00000001000000000000000B-rw------- 1 halo halo  16M Nov 17 12:14 00000001000000000000000C-rw------- 1 halo halo  16M Nov 17 12:14 00000001000000000000000D-rw------- 1 halo halo  16M Nov 17 12:14 00000001000000000000000E-rw------- 1 halo halo  16M Nov 17 12:15 00000001000000000000000F.partial-rw------- 1 halo halo  16M Nov 17 12:15 00000002000000000000000F-rw------- 1 halo halo  16M Nov 17 12:15 000000020000000000000010-rw------- 1 halo halo  16M Nov 17 12:15 000000020000000000000011-rw------- 1 halo halo  16M Nov 17 12:15 000000020000000000000012-rw------- 1 halo halo  16M Nov 17 12:15 000000020000000000000013-rw------- 1 halo halo  16M Nov 17 12:15 000000020000000000000014-rw------- 1 halo halo   41 Nov 17 12:15 00000002.historydrwx------ 2 halo halo 4.0K Nov 17 12:15 archive_status[halo@Halo162 ~]$ ll -l /data/halo/archivedir/total 97M-rw------- 1 halo halo 16M Nov 17 12:15 00000001000000000000000F.partial-rw------- 1 halo halo 16M Nov 17 12:15 00000002000000000000000F-rw------- 1 halo halo 16M Nov 17 12:15 000000020000000000000010-rw------- 1 halo halo 16M Nov 17 12:15 000000020000000000000011-rw------- 1 halo halo 16M Nov 17 12:15 000000020000000000000012-rw------- 1 halo halo 16M Nov 17 12:15 000000020000000000000013-rw------- 1 halo halo  41 Nov 17 12:15 00000002.history注意:所有的wal必须存在从库pg_wal目录中。如果已经覆盖了,必须从归档目录拷贝到从库pg_wal目录中


6.在备库创建检查点

checkpoint;


7.停止备库

[halo@Halo162 ~]$ pg_ctl stop


8.在备库尝试修复

[halo@Halo162 ~]$ pg_rewind -n -D /data/halo  --source-server="host=halo161 user=pgrewind password=12345678 dbname=halo0root" -P -R -c --debugpg_rewind: creating backup label and updating control filepg_rewind: syncing target data directorypg_rewind: Done!


9.在备库尝试正常,说明可以修复,实施修复

[halo@Halo162 ~]$ pg_rewind -D /data/halo  --source-server="host=halo161 user=pgrewind password=12345678 dbname=halo0root" -P -R -c --debugpg_rewind: creating backup label and updating control filepg_rewind: syncing target data directorypg_rewind: Done!


10.确保 primary_conninfo 参数正确,可能需要将用户从 pgrewind 改成 replica

增加 primary_slot_name 参数

[halo@Halo162 ~]$ vi /data/halo/postgresql.auto.conf修改user=replica添加primary_slot_name = 'halo162'


11.备库库启动

[halo@Halo162 ~]$ pg_ctl start


12.检查从库日志


13.查看主库复制流和延迟

SELECT * FROM pg_stat_archiver ;SELECT * FROM pg_stat_replication;
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论