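This walkthrough simulates a primary database outage that cannot be fixed right away. The standby is forcibly promoted to primary with a failover. Once the original primary has been repaired, it rejoins the primary/standby cluster, and finally a normal switchover restores the original roles.
1. Initial environment: gsdb01 is the primary and gsdb02 is the standby. Rows are inserted into table t1 in database testdb to track how the data changes before and after the failure.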
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Normal
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+--------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
(1 row)
testdb=> insert into t1 values(2,'failover-before',now());
INSERT 0 1
testdb=>
testdb=> select * from t1;
id | name | dt
----+-----------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
(2 rows)
testdb=> \q
[omm@gsdb01 ~]$
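The transcripts check the cluster by reading gs_om -t status --detail output by eye. When the same check is scripted, the cluster_state value can be pulled out with a small filter. This is a minimal sketch that assumes the "key : value" layout shown above; the function name cluster_state is hypothetical.

```bash
#!/usr/bin/env bash
# Print the cluster_state value (Normal, Degraded, Unavailable, ...) from
# `gs_om -t status --detail` output read on stdin.
# Assumes the "cluster_state : <value>" line format shown in this article.
cluster_state() {
  awk -F':' '/^[ \t]*cluster_state[ \t]*:/ {
    gsub(/[ \t]/, "", $2)   # drop the padding around the value
    print $2
    exit                    # only the first match matters
  }'
}

# Example on a cluster node:
#   gs_om -t status --detail | cluster_state
```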
On the standby, check the initial cluster state and the contents of t1: primary/standby replication is working normally.
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Normal
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+-----------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
(2 rows)
testdb=> \q
[omm@gsdb02 ~]$
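Here t1 is compared on the two nodes by reading the rows. As the table grows, one option is to run the same ordered query on each node and compare a digest of the output. This sketch assumes gsql's -A (unaligned), -t (tuples only) and -c options behave as they do in psql and that md5sum is installed; fingerprint is a hypothetical name. The ORDER BY matters, because without it the row order, and therefore the digest, is not guaranteed to match.

```bash
#!/usr/bin/env bash
# Reduce query output read on stdin to a single MD5 hex digest, so the same
# ordered query run on gsdb01 and gsdb02 can be compared at a glance.
fingerprint() {
  md5sum | awk '{ print $1 }'
}

# Hypothetical usage, run on each node (gsql -A/-t/-c assumed to behave as in psql):
#   gsql -d testdb -U aps2 -p 40000 -W '<password>' -A -t \
#        -c "select id, name, dt from t1 order by id" | fingerprint
# Matching digests mean both nodes returned identical rows.
```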
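2. Simulate an unexpected primary outage that cannot be repaired in the short term. Here the failure is simulated by stopping the primary instance with gs_ctl stop.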
[omm@gsdb01 ~]$ gs_ctl stop -D /u01/openGauss/data/db1
[2020-07-16 09:51:28.691][14433][][gs_ctl]: gs_ctl stopped ,datadir is -D "/u01/openGauss/data/db1"
waiting for server to shut down...... done
server stopped
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Down Manually stopped | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(Disconnected)
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$
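The datanode line now carries multi-word states such as Manually stopped and Need repair(Disconnected). A filter that prints one "host role state" line per node makes this output easier to read or to feed into other scripts. It is a sketch that assumes the field order seen in these transcripts (node number, host, IP, instance, data directory, static P/S flag, role, state) with the two nodes separated by "|"; node_roles is a hypothetical helper, not part of gs_om.

```bash
#!/usr/bin/env bash
# Print "host role state" for every datanode listed in
# `gs_om -t status --detail` output read on stdin.
# Assumed field order per node (as in this article's transcripts):
#   1:node-no 2:host 3:ip 4:instance 5:datadir 6:P/S 7:role 8..:state
node_roles() {
  awk '
    # Datanode lines start with a node number followed by host and IP.
    $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9.]+$/ {
      n = split($0, nodes, /[ \t]+[|][ \t]+/)   # nodes are separated by " | "
      for (i = 1; i <= n; i++) {
        m = split(nodes[i], f)                  # default whitespace splitting
        state = f[8]
        for (j = 9; j <= m; j++)                # states like "Manually stopped" span words
          state = state " " f[j]
        print f[2], f[7], state
      }
    }'
}

# Example:
#   gs_om -t status --detail | node_roles
```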
Checking the status from the standby shows the same picture: the cluster is Unavailable and the standby instance reports Need repair(Disconnected).
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Down Manually stopped | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Need repair(Disconnected)
[omm@gsdb02 ~]$
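Before forcing a failover it is worth confirming that no node still reports the Primary role. The check below is a hypothetical guard that only reads status output, assuming the role is the seventh field of each node entry as in these transcripts. The decision itself should stay with an operator: a network partition can look similar from the standby's side, and promoting the standby while the old primary is still running risks a split brain.

```bash
#!/usr/bin/env bash
# Succeed (return 0) only if no datanode in `gs_om -t status --detail`
# output (stdin) reports the dynamic role "Primary".
# Field 7 of each node entry is assumed to be the role, as in this article.
no_primary() {
  ! awk '
    $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9.]+$/ {
      n = split($0, nodes, /[ \t]+[|][ \t]+/)
      for (i = 1; i <= n; i++) {
        split(nodes[i], f)
        if (f[7] == "Primary") found = 1
      }
    }
    END { exit !found }   # awk exits 0 when a Primary was seen
  '
}

# Hypothetical usage before a manual failover on the standby:
#   if gs_om -t status --detail | no_primary; then
#     echo "No Primary reported; confirm the old primary is really down, then run gs_ctl failover"
#   fi
```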
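Because the primary cannot be recovered for now, force a failover on the standby. After the failover, the standby becomes the new Primary and serves client requests; the cluster state changes to Degraded because the old primary is still down.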
[omm@gsdb02 ~]$ gs_ctl failover -D /u01/openGauss/data/db1
[2020-07-16 09:52:32.055][13523][][gs_ctl]: gs_ctl failover ,datadir is -D "/u01/openGauss/data/db1"
[2020-07-16 09:52:32.055][13523][][gs_ctl]: failover term (1)
[2020-07-16 09:52:32.068][13523][][gs_ctl]: waiting for server to failover....
[2020-07-16 09:52:33.095][13523][][gs_ctl]: done
[2020-07-16 09:52:33.095][13523][][gs_ctl]: failover completed (/u01/openGauss/data/db1)
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Down Manually stopped | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 P Primary Normal
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$
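After gs_ctl failover reports completion, a script that redirects traffic may want to wait until the promoted node really appears as Primary in the cluster status. A minimal polling sketch follows; the STATUS_CMD override, the one-second interval and the default 60-second timeout are assumptions chosen so the loop can also be exercised without a cluster.

```bash
#!/usr/bin/env bash
# Poll the cluster status until HOST reports the Primary role, or give up
# after TIMEOUT seconds (default 60). Returns 0 on success, 1 on timeout.
# STATUS_CMD (a hypothetical override) defaults to the gs_om call used in
# this article; it is word-split on purpose so it can carry options.
wait_for_primary() {
  local host=$1 timeout=${2:-60} waited=0
  local status_cmd="${STATUS_CMD:-gs_om -t status --detail}"
  while [ "$waited" -lt "$timeout" ]; do
    if $status_cmd 2>/dev/null | awk -v h="$host" '
         $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9.]+$/ {
           n = split($0, nodes, /[ \t]+[|][ \t]+/)
           for (i = 1; i <= n; i++) {
             split(nodes[i], f)
             if (f[2] == h && f[7] == "Primary") ok = 1
           }
         }
         END { exit !ok }'; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Example after gs_ctl failover on gsdb02:
#   wait_for_primary gsdb02 120 || echo "gsdb02 did not become Primary in time"
```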
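Connect to the database on gsdb02 (the former standby, now the primary) and modify t1 to simulate new data changes after the failover.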
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+-----------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
(2 rows)
testdb=> insert into t1 values(3,'failover-after',now());
INSERT 0 1
testdb=>
testdb=> select * from t1;
id | name | dt
----+-----------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
(3 rows)
testdb=> \q
[omm@gsdb02 ~]$
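At this point the primary has gone down unexpectedly and is unreachable, and the standby has been promoted with a failover and is serving requests as the new primary.
Once the original primary has been repaired, it can rejoin the HA primary/standby environment as follows:
Start the recovered original primary in standby mode (gs_ctl start -M standby). It joins the cluster automatically; after the dynamic configuration is regenerated with gs_om -t refreshconf, the cluster reports Normal and t1 on gsdb01 contains the row written during the failover, so data synchronization is working.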
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Down Manually stopped | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 P Primary Normal
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_ctl start -D /u01/openGauss/data/db1 -M standby
[2020-07-16 09:54:33.134][15234][][gs_ctl]: gs_ctl started,datadir is -D "/u01/openGauss/data/db1"
[2020-07-16 09:54:33.185][15234][][gs_ctl]: waiting for server to start...
.0 [BACKEND] LOG: Begin to start openGauss Database.
2020-07-16 09:54:33.285 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: Recovery parallelism, cpu count = 4, max = 4, actual = 4
2020-07-16 09:54:33.285 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 DB001 0 [REDO] LOG: ConfigRecoveryParallelism, true_max_recovery_parallelism:4, max_recovery_parallelism:4
2020-07-16 09:54:33.286 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Transparent encryption disabled.
2020-07-16 09:54:33.303 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: InitNuma numaNodeNum: 1 numa_distribute_mode: none inheritThreadPool: 0.
2020-07-16 09:54:33.303 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (1024 Mbytes) or shared memory (4213 Mbytes) is larger.
2020-07-16 09:54:33.383 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set data cache size(805306368)
2020-07-16 09:54:33.413 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [CACHE] LOG: set metadata cache size(268435456)
2020-07-16 09:54:33.662 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: gaussdb: fsync file "/u01/openGauss/data/db1/gaussdb.state.temp" success
2020-07-16 09:54:33.663 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: create gaussdb state file success: db state(STARTING_STATE), server mode(Standby)
2020-07-16 09:54:33.687 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: max_safe_fds = 978, usable_fds = 1000, already_open = 12
2020-07-16 09:54:33.688 5f0fb359.1 [unknown] 140639763324672 [unknown] 0 dn_6001_6002 00000 0 [BACKEND] LOG: Success to start openGauss Database, please press any key to exit...
[2020-07-16 09:54:34.203][15234][][gs_ctl]: done
[2020-07-16 09:54:34.203][15234][][gs_ctl]: server started (/u01/openGauss/data/db1)
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 S Standby Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 P Primary Normal
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+-----------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
(3 rows)
testdb=> \q
[omm@gsdb01 ~]$
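The rejoin sequence above (start as standby, gs_om -t refreshconf, then check the status) can be wrapped so that it stops at the first failure. The wrapper only calls the gs_ctl and gs_om options used in this article; GS_CTL and GS_OM are hypothetical overrides that allow a dry run with stub commands. A plain restart works in this walkthrough because the old primary was stopped cleanly before the failover; if it had written WAL that the new primary never received, starting it as a standby may not be enough and it may need to be rebuilt from the new primary.

```bash
#!/usr/bin/env bash
# Restart a repaired former primary as a standby, regenerate the dynamic
# configuration, and show the resulting status; stop at the first failure.
# GS_CTL and GS_OM are hypothetical overrides for dry runs with stubs.
rejoin_as_standby() {
  local datadir=$1
  local gs_ctl="${GS_CTL:-gs_ctl}" gs_om="${GS_OM:-gs_om}"
  "$gs_ctl" start -D "$datadir" -M standby || return 1
  "$gs_om" -t refreshconf || return 1
  "$gs_om" -t status --detail
}

# Example on gsdb01:
#   rejoin_as_standby /u01/openGauss/data/db1
```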
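Finally, perform a normal switchover to restore the original primary/standby roles.
First, modify t1 on the current primary gsdb02 to simulate new data changes.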
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 S Standby Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 P Primary Normal
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+-----------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
(3 rows)
testdb=> insert into t1 values(4,'switchover-before',now());
INSERT 0 1
testdb=>
testdb=> select * from t1;
id | name | dt
----+-------------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
4 | switchover-before | 2020-07-16 10:15:24
(4 rows)
testdb=> \q
[omm@gsdb02 ~]$
On the standby gsdb01, check the cluster status and confirm that the new row has been replicated:
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 S Standby Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 P Primary Normal
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+-------------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
4 | switchover-before | 2020-07-16 10:15:24
(4 rows)
testdb=> \q
[omm@gsdb01 ~]$
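Before running the switchover it can help to confirm that the cluster is Normal and that the node which will run gs_ctl switchover (gsdb01 here) is a healthy Standby. The following is a hypothetical pre-check over the status output, assuming the "cluster_state : value" line and the datanode field layout shown in these transcripts.

```bash
#!/usr/bin/env bash
# Succeed only if `gs_om -t status --detail` output (stdin) shows
# cluster_state Normal and HOST as a Standby in state Normal, i.e. the node
# that is about to run `gs_ctl switchover` is a healthy standby.
ready_for_switchover() {
  awk -v h="$1" '
    $1 == "cluster_state" { cs = $NF }
    $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9.]+$/ {
      n = split($0, nodes, /[ \t]+[|][ \t]+/)
      for (i = 1; i <= n; i++) {
        split(nodes[i], f)
        if (f[2] == h && f[7] == "Standby" && f[8] == "Normal") ok = 1
      }
    }
    END { exit !(cs == "Normal" && ok) }'
}

# Example on gsdb01 before the switchover:
#   gs_om -t status --detail | ready_for_switchover gsdb01 \
#     && gs_ctl switchover -D /u01/openGauss/data/db1
```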
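Now run the switchover on gsdb01, the current standby, and then regenerate the dynamic configuration with gs_om -t refreshconf. After the switchover gsdb01 is Primary again, new writes succeed, the cluster is Normal, and replication to gsdb02 is in sync.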
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_ctl switchover -D /u01/openGauss/data/db1
[2020-07-16 10:16:22.315][16634][][gs_ctl]: gs_ctl switchover ,datadir is -D "/u01/openGauss/data/db1"
[2020-07-16 10:16:22.315][16634][][gs_ctl]: switchover term (1)
[2020-07-16 10:16:22.334][16634][][gs_ctl]: waiting for server to switchover.................
[2020-07-16 10:16:36.539][16634][][gs_ctl]: done
[2020-07-16 10:16:36.539][16634][][gs_ctl]: switchover completed (/u01/openGauss/data/db1)
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t refreshconf
Generating dynamic configuration file for all nodes.
Successfully generated dynamic configuration file.
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Normal
[omm@gsdb01 ~]$
[omm@gsdb01 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+-------------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
4 | switchover-before | 2020-07-16 10:15:24
(4 rows)
testdb=> insert into t1 values(5,'switchover-after',now());
INSERT 0 1
testdb=>
testdb=> select * from t1;
id | name | dt
----+-------------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
4 | switchover-before | 2020-07-16 10:15:24
5 | switchover-after | 2020-07-16 10:22:09
(5 rows)
testdb=> \q
[omm@gsdb01 ~]$
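On gsdb02, which is the standby again, check the cluster status and confirm that row 5 has been replicated:
[omm@gsdb02 ~]$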
[omm@gsdb02 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Normal
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip instance state | node node_ip instance state
------------------------------------------------------------------------------------------------------------------------------------------------------------
1 gsdb01 192.168.0.195 6001 /u01/openGauss/data/db1 P Primary Normal | 2 gsdb02 192.168.0.96 6002 /u01/openGauss/data/db1 S Standby Normal
[omm@gsdb02 ~]$
[omm@gsdb02 ~]$ gsql -d testdb -U aps2 -p 40000 -W aps2#12345
gsql ((openGauss 1.0.0 build 0bd0ce80) compiled at 2020-06-30 18:19:27 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.
testdb=> select * from t1;
id | name | dt
----+-------------------+---------------------
1 | openGauss195 | 2020-07-10 15:30:37
2 | failover-before | 2020-07-16 09:50:17
3 | failover-after | 2020-07-16 09:53:20
4 | switchover-before | 2020-07-16 10:15:24
5 | switchover-after | 2020-07-16 10:22:09
(5 rows)
testdb=> \q
[omm@gsdb02 ~]$
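To confirm that the original roles have been restored, the status output can be compared against an expected list of host:Role pairs. This is a sketch assuming the datanode line layout shown in these transcripts (host in field 2, role in field 7); check_roles is a hypothetical name. It prints each mismatch and returns non-zero, so it can gate a script.

```bash
#!/usr/bin/env bash
# Compare node roles in `gs_om -t status --detail` output (stdin) with the
# expected "host:Role" pairs given as arguments. Mismatches are printed and
# the function returns non-zero. Field 2 = host, field 7 = role (assumed).
check_roles() {
  awk -v want="$*" '
    BEGIN {
      k = split(want, w, " ")
      for (i = 1; i <= k; i++) { split(w[i], p, ":"); expect[p[1]] = p[2] }
    }
    $1 ~ /^[0-9]+$/ && $3 ~ /^[0-9.]+$/ {
      n = split($0, nodes, /[ \t]+[|][ \t]+/)
      for (i = 1; i <= n; i++) { split(nodes[i], f); seen[f[2]] = f[7] }
    }
    END {
      bad = 0
      for (h in expect) {
        got = (h in seen) ? seen[h] : "missing"   # test membership before reading
        if (got != expect[h]) { printf "%s: expected %s, got %s\n", h, expect[h], got; bad = 1 }
      }
      exit bad
    }'
}

# Example after the final switchover:
#   gs_om -t status --detail | check_roles gsdb01:Primary gsdb02:Standby
```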
Last modified: 2020-07-24 17:05:39
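[Copyright notice] This article is original content by a Motianlun (modb.pro) user. Reprints must credit the source (Motianlun), the article link, and the author; otherwise the author and Motianlun reserve the right to pursue liability. To report suspected plagiarism or infringement on Motianlun, email contact@modb.pro with supporting evidence; confirmed content will be removed promptly.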