Removing the disaster recovery relationship when it is healthy
Run on the primary node of the primary cluster:
gs_sdr -t stop -X /soft/sdr_main.xml -U sdr_test -W sdr3@test
Run on the primary node of the standby cluster:
gs_sdr -t failover
Do not run gs_sdr -t stop on the primary node of the standby cluster; it fails with the following error:
[omm@bpanweidb02 ~]$ gs_sdr -t stop -X /database/panweidb/soft/cluster_config.xml
--------------------------------------------------------------------------------
Streaming disaster recovery stop c48a0dce1d9911efa694000c29a8e394
--------------------------------------------------------------------------------
Start remove streaming disaster relationship.
Got the step for action:[stop].
Successfully check cluster status is: Normal.
[GAUSS-51632] : Failed to check cluster type, standby cluster is not supported for stop.
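Removing the disaster recovery relationship when it is abnormal
If neither gs_sdr -t stop nor gs_sdr -t failover brings this cluster's gs_sdr -t query result back to normal, try the method below.
1. Parameter overview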
agent_backup_open
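- 0: CM does not run in disaster recovery standby mode (no disaster recovery configured, or the primary cluster of a disaster recovery pair);
- 2: CM runs in disaster recovery standby mode (the standby cluster of a disaster recovery pair);
- Takes effect after cm_agent is restarted;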
backup_open
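- 0: CM does not run in disaster recovery standby mode (no disaster recovery configured, or the primary cluster of a disaster recovery pair);
- 1: CM runs in disaster recovery standby mode (the standby cluster of a disaster recovery pair);
- Takes effect after cm_server is restarted;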
stream_cluster_run_mode
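- cluster_primary: the node belongs to a primary instance (no disaster recovery configured, or the primary cluster of a disaster recovery pair);
- cluster_standby: the node belongs to a standby instance (the standby cluster of a disaster recovery pair);
- Parameter type: postmaster, so a change takes effect only after the database is restarted.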
/omm/CMServer/backup_open
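- 0: primary cluster of a disaster recovery pair;
- 2: standby cluster of a disaster recovery pair;
- Replace omm in the key with the database installation user name.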
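The values listed above can be collected into a small lookup helper, which is convenient when comparing a cluster against its intended role. This is only a sketch: the function name expected_dr_params and the role labels primary and standby are made up for illustration, and the values are simply the ones given in this section.

```shell
#!/bin/sh
# expected_dr_params ROLE [DB_USER]
# Prints the parameter values this article associates with a cluster role.
# ROLE "primary" also covers a cluster with no disaster recovery configured.
# DB_USER is the database installation user (defaults to omm).
expected_dr_params() {
    role=$1
    user=${2:-omm}
    case "$role" in
        primary)
            printf '%s\n' \
                'agent_backup_open=0' \
                'backup_open=0' \
                'stream_cluster_run_mode=cluster_primary' \
                "/${user}/CMServer/backup_open=0"
            ;;
        standby)
            printf '%s\n' \
                'agent_backup_open=2' \
                'backup_open=1' \
                'stream_cluster_run_mode=cluster_standby' \
                "/${user}/CMServer/backup_open=2"
            ;;
        *)
            echo "usage: expected_dr_params primary|standby [db_user]" >&2
            return 2
            ;;
    esac
}
```

For example, expected_dr_params standby lists the four values a disaster recovery standby cluster is expected to carry.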
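2. Procedure
2.1 On the standby cluster
- Set the parameters
- Run on a single node only;
- Remember to replace omm with the database installation user name.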
cm_ctl set --param --agent -k 'agent_backup_open=0' && cm_ctl set --param --server -k 'backup_open=0' && cm_ctl ddb --put /omm/CMServer/backup_open 0 && gs_guc set -I all -N all -c "stream_cluster_run_mode='cluster_primary'"
- Example run:
[omm@panwei-b1 ~]$ cm_ctl set --param --agent -k 'agent_backup_open=0' && cm_ctl set --param --server -k 'backup_open=0' && cm_ctl ddb --put /omm/CMServer/backup_open 0 && gs_guc set -I all -N all -c "stream_cluster_run_mode='cluster_primary'"
cm_ctl: set cm_agent.conf success.
cm_ctl: set cm_server.conf success.
cm_ctl: exec ddb --put command success.
cm_ctl:
The pw_guc run with the following arguments: [gs_guc -I all -N all -c stream_cluster_run_mode='cluster_primary' set ].
Begin to perform the total nodes: 3.
Popen count is 3, Popen success count is 3, Popen failure count is 0.
Begin to perform gs_guc for datanodes.
Command count is 3, Command success count is 3, Command failure count is 0.
Total instances: 3. Failed instances: 0.
ALL: Success to perform gs_guc!
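- Check that the settings were applied
- Run on a single node only;
- The first three values must all be 0 and the last must be cluster_primary on every node; this means the parameters are in the standalone-cluster state.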
(cm_ctl list --param --agent | grep 'agent_backup_open') && (cm_ctl list --param --server | grep 'backup_open') && cm_ctl ddb --get /omm/CMServer/backup_open && gs_guc check -I all -N all -c "stream_cluster_run_mode"
- Example run:
[omm@panwei-b1 ~]$ (cm_ctl list --param --agent | grep 'agent_backup_open') && (cm_ctl list --param --server | grep 'backup_open') && cm_ctl ddb --get /omm/CMServer/backup_open && gs_guc check -I all -N all -c "stream_cluster_run_mode"
agent_backup_open = 0
agent_backup_open = 0
agent_backup_open = 0
backup_open = 0
backup_open = 0
backup_open = 0
cm_ctl: exec ddb --get command success.
cm_ctl: 0
The pw_guc run with the following arguments: [gs_guc -I all -N all -c stream_cluster_run_mode check ].
Total GUC values: 3. Failed GUC values: 0.
The value of parameter stream_cluster_run_mode is same on all instances.
stream_cluster_run_mode='cluster_primary'
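Reading this output by eye gets tedious on larger clusters, so the check can be scripted. The following is a minimal sketch that assumes the output keeps the line shapes shown in the example run above (name = value for the CM parameters, cm_ctl: N for the DDB value, and name='value' for the GUC); the function name is_standalone_state is hypothetical.

```shell
#!/bin/sh
# is_standalone_state
# Reads the combined output of the check command on stdin.
# Returns 0 only if all three kinds of value were seen and every one of
# them matches the standalone-cluster state described in this article.
is_standalone_state() {
    awk '
        # cm_ctl list lines, e.g. "agent_backup_open = 0" or "backup_open = 0"
        $1 ~ /^(agent_)?backup_open$/ && $2 == "=" {
            seen_cm = 1
            if ($3 != "0") bad = 1
        }
        # DDB value line, e.g. "cm_ctl: 0" (exactly two fields)
        $1 == "cm_ctl:" && NF == 2 && $2 ~ /^[0-9]+$/ {
            seen_ddb = 1
            if ($2 != "0") bad = 1
        }
        # gs_guc check value line, e.g. stream_cluster_run_mode=cluster_primary (quoted)
        /stream_cluster_run_mode=/ {
            seen_guc = 1
            if ($0 !~ /cluster_primary/) bad = 1
        }
        END { exit !(seen_cm && seen_ddb && seen_guc && !bad) }
    '
}
```

One possible use is to pipe the check command from this step into the function and branch on its exit status, for example: check command | is_standalone_state && echo "standalone state".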
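- Move the streaming_cabin and streaming_lock files out of the way
- Run on a single node only;
- The streaming_lock file may not exist; under normal conditions it is deleted automatically.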
gs_ssh -c "mkdir $PGHOST/sdr_file_backup_1054 ; mv $PGHOST/streaming_cabin $PGHOST/streaming_lock* $PGHOST/sdr_file_backup_1054/"
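- Example run (the only error reported is the missing streaming_lock* file, which is expected when it has already been deleted automatically):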
[omm@panwei-b1 ~]$ gs_ssh -c "mkdir $PGHOST/sdr_file_backup_1054 ; mv $PGHOST/streaming_cabin $PGHOST/streaming_lock* $PGHOST/sdr_file_backup_1054/"
Failed to execute command on all nodes.
Output:
[GAUSS-51400] : Failed to execute the command: sh /database/panweidb/tmp/ClusterCall_4012597.sh. Error:
[FAILURE] panwei-b1:
mv: cannot stat '/database/panweidb/tmp/streaming_lock*': No such file or directory
[FAILURE] panwei-b2:
mv: cannot stat '/database/panweidb/tmp/streaming_lock*': No such file or directory
[FAILURE] panwei-b3:
mv: cannot stat '/database/panweidb/tmp/streaming_lock*': No such file or directory
.
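The failure above comes from handing mv a glob that matches nothing. A variant that moves only what actually exists avoids the spurious error. The sketch below is one way to do that on a single node; the function name backup_sdr_files and its two arguments are hypothetical, and distributing the script to every node (for instance via gs_ssh) is left out.

```shell
#!/bin/sh
# backup_sdr_files SRC_DIR BACKUP_DIR
# Moves streaming_cabin and any streaming_lock* entries from SRC_DIR into
# BACKUP_DIR, silently skipping entries that do not exist.
backup_sdr_files() {
    src=$1
    dest=$2
    # -p: do not fail if the backup directory already exists
    mkdir -p "$dest" || return 1
    for f in "$src"/streaming_cabin "$src"/streaming_lock*; do
        # An unmatched glob stays as literal text, so test for existence first
        if [ -e "$f" ]; then
            mv "$f" "$dest"/ || return 1
        fi
    done
    return 0
}
```

On a node this could be called as backup_sdr_files "$PGHOST" "$PGHOST/sdr_file_backup_1054", which matches the paths used in the command above.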
- Restart the cluster
- stream_cluster_run_mode is a postmaster-type parameter, so a restart is required for the change to take effect.
date && gs_om -t stop && gs_om -t start
- Check the status
gs_om -t status --detail
gs_ssh -c "gs_sdr -t query"
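2.2 On the primary cluster
Of the steps in 2.1, go straight to the second one (checking the parameters). If every parameter already shows the standalone-cluster state, run the third step (moving the streaming_cabin and streaming_lock files) to remove the disaster recovery relationship; no cluster restart is needed.
(Note: if the standby cluster has not removed its disaster recovery relationship at this point, streaming replication between the two clusters continues. It stops on its own once the standby cluster becomes a standalone cluster.)
2.3 If the disaster recovery setup will not be rebuilt
Both clusters then need manual cleanup: delete the replconninfo settings and the replication slots, edit pg_hba.conf, and so on.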
Last modified: 2025-03-10 11:38:59