gs_replace
A database cluster consists of multiple servers. When some servers in the cluster fail, or some instances on a server fail, the gs_replace tool can replace the faulty servers with healthy ones and the faulty instances with healthy ones, so that GaussDB 100 recovers quickly.
Background
Server replacement falls into two scenarios:
- Same IP before and after replacement: change the new server's hostname and IP to match those of the faulty server, then perform the replacement. This applies when the user can guarantee that the replacement server's hostname and IP match those of the replaced server. It has few preconditions and completes quickly.
- Different IP before and after replacement: perform the replacement directly, without changing the new server's hostname or IP. This applies when the replacement server's hostname and IP differ from those of the replaced server; it can also be used to swap a high-performance machine in for a lower-performance one whose instances are still healthy. It has more preconditions.
"Keeping the IP" and "changing the IP" describe the difference from the cluster configuration's point of view: the former leaves the IP and hostname information in the cluster configuration file unchanged, while the latter requires modifying them.
Prerequisites
- Replacing faulty instances in distributed deployment mode
- The faulty server must not host any CN instance.
- The cluster status can be queried, the primary CM Server is normal, and the cluster status is not Unavailable.
- The cluster must not be locked before the replacement.
- Instance replacement must be executed on a server that is not completely damaged.
- Mutual SSH trust between all servers must be working during instance replacement.
- The instance replacement operation repairs all faulty instances on the faulty server.
- No DDL or DML operations may be executed on the cluster during instance replacement.
- The cluster status must be abnormal.
- Replacing a server in distributed deployment mode (same IP)
- The faulty server must not host any CN instance.
- The cluster status can be queried, the primary CM Server is normal, and the cluster status is not Unavailable.
- The cluster must not be locked before the replacement.
- Node replacement must be executed on a server that is not completely damaged.
- Before node replacement, preinstallation based on the current cluster configuration file has succeeded.
- The new server's IP address, hostname, and related information must match those of the damaged server.
- The new server's clock must be consistent with the other servers in the cluster.
- No DDL or DML operations may be executed on the cluster during server replacement.
- The faulty node has been uninstalled.
- The cluster status must be abnormal.
- Replacing a server in distributed deployment mode (different IP)
Replacing a host with a different IP is essentially a warm-standby scheme: a running server is replaced directly with a new server, without modifying the new machine's IP or hostname.
- The faulty server must not host any CN instance.
- The cluster status can be queried, the primary CM Server is normal, and the cluster status is not Unavailable.
- The cluster must not be locked before the replacement.
- Node replacement must be executed on a server that is not completely damaged.
- The new server is ready.
- The new server's clock must be consistent with the other servers in the cluster.
- A cluster configuration file based on the new server's IP has been prepared; compared with the old configuration file, only the IP and hostname are changed.
- Before node replacement (different IP), preinstallation based on the new cluster configuration file has succeeded.
- No DDL or DML statements may be executed during the operation.
- The faulty node has been uninstalled.
- The cluster status must be abnormal.
- Replacing faulty instances in primary/standby deployment mode
- The cluster status can be queried, and the primary CM Server is normal.
- For each group containing DNs on the faulty server, an HA cluster must keep the primary DN normal; a Z-Paxos cluster must, in addition to keeping the primary DN normal, ensure that each group has at least floor(total/2) + 1 healthy DNs, where the total excludes passive DNs.
- Before the replacement, the groups containing DNs on the faulty server must not be locked.
- Instance replacement must be executed on a server that is not completely damaged.
- Mutual SSH trust between all servers must be working during instance replacement.
- The instance replacement operation repairs all faulty instances on the faulty node.
- No DDL or DML operations may be executed on the groups containing DNs on the faulty server during instance replacement.
- The cluster status must be abnormal.
- Replacing a node in primary/standby deployment mode (same IP)
- The cluster status can be queried, and the primary CM Server is normal.
- For each group containing DNs on the faulty server, an HA cluster must keep the primary DN normal; a Z-Paxos cluster must, in addition to keeping the primary DN normal, ensure that each group has at least floor(total/2) + 1 healthy DNs, where the total excludes passive DNs.
- Before the replacement, the groups containing DNs on the faulty server must not be locked.
- Instance replacement must be executed on a host that is not completely damaged.
- The new server's IP address, hostname, and related information must match those of the damaged server.
- The new server's clock must be consistent with the other servers in the cluster.
- Before node replacement, preinstallation based on the current cluster configuration file has succeeded.
- No DDL or DML operations may be executed on the groups containing DNs on the faulty server during node replacement.
- The faulty node has been uninstalled.
- The cluster status must be abnormal.
- Replacing a node in primary/standby deployment mode (different IP)
- The cluster status can be queried, and the primary CM Server is normal.
- For each group containing DNs on the faulty server, an HA cluster must keep the primary DN normal; a Z-Paxos cluster must, in addition to keeping the primary DN normal, ensure that each group has at least floor(total/2) + 1 healthy DNs, where the total excludes passive DNs.
- Node replacement must be executed on a server that is not completely damaged.
- The new server is ready.
- The new server's clock must be consistent with the other servers in the cluster.
- A cluster configuration file based on the new server's IP has been prepared; compared with the old configuration file, only the IP and hostname are changed.
- Before node replacement (different IP), preinstallation based on the new cluster configuration file has succeeded.
- No DDL or DML operations may be executed on the groups containing DNs on the faulty server during node replacement.
- The faulty node has been uninstalled.
- The cluster status must be abnormal.
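The Z-Paxos quorum rule that recurs in the prerequisites above (healthy DNs must be at least floor(total/2) + 1, with the total excluding passive DNs) can be checked with a small shell sketch. The counts below are hypothetical example values, not figures read from a real cluster:

```shell
#!/bin/sh
# Hypothetical check of the Z-Paxos group quorum rule described above.
# 'total' counts the DNs in one group excluding passive DNs; 'healthy'
# counts the DNs currently in a normal state. Both are example values.
total=5
healthy=3
required=$(( total / 2 + 1 ))   # floor(total/2) + 1 via integer division
if [ "$healthy" -ge "$required" ]; then
    echo "group quorum OK: $healthy >= $required"
else
    echo "group quorum NOT met: $healthy < $required"
fi
```

For a 5-DN group the replacement can proceed only while at least 3 DNs remain healthy; note that a 4-DN group has the same threshold of 3.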
Syntax
- Add a new server
gs_replace -t install -h HOSTNAME [-l LOGFILE]
- Configure instances or servers
gs_replace -t config -h HOSTNAME [-l LOGFILE]
- Start the new server
gs_replace -t start -h HOSTNAME [-l LOGFILE]
- Replace a server (different IP)
gs_replace -t warm-standby -X XMLFILE [-l LOGFILE]
- Display help information
gs_replace -? | --help
- Display version information
gs_replace -V | --version
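For a full node replacement, the first three commands above typically run in sequence: install, then config, then start, each against the same host. The sketch below is a dry run that only echoes the commands it would issue; plat3 is a placeholder hostname, and the echo should be dropped to actually invoke gs_replace:

```shell
#!/bin/sh
# Dry-run sketch of the typical replacement sequence.
# 'plat3' is a placeholder hostname; remove 'echo' to run for real.
HOST=plat3
for step in install config start; do
    echo gs_replace -t "$step" -h "$HOST"
done
```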
Parameter description
gs_replace parameters fall into the following categories:
- Common parameters:
- -t
Type of the OM operation.
Value range: warm-standby, install, config, and start.
Note:
After a primary/standby switchover, if the balanced field in the cluster status is false and warm-standby is then executed, the balanced field will show true even though the actual primary/standby roles of the DNs have not changed.
- -l
Specifies the log file and its path.
Default: $GAUSSLOG/om/gs_replace-YYYY-MM-DD_hhmmss.log
- Parameters for installing new instances or hosts:
-h
Specifies the name of the new server, or of the server hosting the new instances. Multiple server names can be specified, separated by commas.
Value range: server names.
- Parameters for configuring new instances or hosts:
-h
Specifies the name of the replacement server, or of the server hosting the replacement instances. Multiple server names can be specified, separated by commas.
Value range: server names.
- Parameters for starting new instances or servers:
-h
Specifies the name of the server to start, or of the server hosting the instances to start. Multiple server names can be specified, separated by commas.
Value range: server names.
- Parameters for replacing a node (different IP):
- -X
Path of the cluster configuration file.
Value range: path of clusterconfig.xml.
- -l
Specifies the log file and its path.
A timestamp is automatically appended to the log file name.
- If -l is not specified and gaussdbLogPath is not configured in the XML file, the default is $GAUSSLOG/username/om/gs_replace-YYYY-MM-DD_hhmmss.log.
- If -l is not specified but gaussdbLogPath is configured in the XML file, the default is the gaussdbLogPath value joined with the username and om/gs_replace-YYYY-MM-DD_hhmmss.log.
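The two default-path rules can be sketched as follows. Here GAUSSDB_LOG_PATH merely stands in for the gaussdbLogPath value parsed from clusterconfig.xml; it is an assumption of this sketch, not an environment variable the tool actually reads, and GAUSSLOG is assumed to be set in the cluster user's environment:

```shell
#!/bin/sh
# Sketch of the default log path resolution for -l (different-IP mode).
# GAUSSDB_LOG_PATH mimics gaussdbLogPath from the XML file (assumption);
# GAUSSLOG is assumed to be set in the cluster user's environment.
USER_NAME=omm
STAMP=$(date '+%Y-%m-%d_%H%M%S')
if [ -n "$GAUSSDB_LOG_PATH" ]; then
    LOGFILE="$GAUSSDB_LOG_PATH/$USER_NAME/om/gs_replace-$STAMP.log"
else
    LOGFILE="$GAUSSLOG/$USER_NAME/om/gs_replace-$STAMP.log"
fi
echo "$LOGFILE"
```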
- Other parameters:
- -?, --help
Displays help information.
- -V, --version
Displays version information.
Examples
- Replace faulty instances.
- Configure the replacement instances.
omm@plat1:/opt/software/gaussdb/script> gs_replace -t config -h plat1,plat2
Check cluster status for replace.
Check the status of ETCD cluster.
Successfully check the status of ETCD cluster.
Successfully check cluster status.
Filter out all valid hosts for replacing.
Distributing configuration to remote host.
Successfully distributed configuration to remote host.
Configuring
Stopping replace instances.
Successfully stopped replace instances.
Waiting for upgrading standby instances.
Successfully upgraded standby instances.
Configuring replacement instances.
Delete broken instances for primary instances' raft.
Successfully delete broken instances for primary instances' raft.
Config replace instances.
Successfully config replace instances.
Add cover instances for primary instances' raft.
Successfully add broken instances for primary instances' raft.
Config standby DN of new instances.
....................143s
Config standby DN of new instances successfully.
Successfully configured replacement instances.
Configuration succeeded.
==============================
Time statistics:
Config replacement nodes: 264s
total: 264s
- Start the servers hosting the replacement instances.
omm@plat1:/opt/software/gaussdb/script> gs_replace -t start -h plat1,plat2
Starting.
==============================
..2s
Start cm agent on new nodes.
Successfully start cm agent for new nodes.
Starting the cluster.
.1s
Successfully started instance process.
Waiting to become Normal.
==============================
Successfully started cluster.
==============================
Time statistics:
Start replacement nodes: 27s
total: 27s
- Replace a server (same IP).
- On the faulty node, run the uninstall command to remove that node.
omm@plat4:/opt/software/gaussdb/script> gs_uninstall --delete-data -L
Check preinstall on every node.
Successfully checked preinstall on every node.
Stop cluster.
Check logfile path.
Clean crontab.
Clean crontab successfully.
Kill process for components.
Kill process for components successfully.
Uninstall components
Uninstall components successfully.
Modifying user's environmental variable.
Successfully modified user's environmental variable.
Clean tmp files and logs.
Successfully clean cluster's tmp and logs.
Successful uninstallation
- Prepare the cluster environment.
plat1:/opt/software/gaussdb/script # ./gs_preinstall -U omm -G dbgrp -X /opt/software/gaussdb/clusterconfig.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)? yes
Please enter password for root.
Password:
Creating SSH trust for the root permission user.
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for the root permission user.
Pass over configuring LVM
Distributing package.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? yes
Please enter password for cluster user.
Password:
Please enter password for cluster user again.
Password:
Creating [omm] user on all nodes.
Successfully created [omm] user on all nodes.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Creating SSH trust for [omm] user.
Please enter password for current user[omm].
Password:
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for [omm] user.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Setting SCTP service.
Successfully set SCTP service.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Preparing SSH service.
Successfully prepared SSH service.
Setting user environmental variables.
Successfully set user environmental variables.
Configuring alarms on the cluster nodes.
Successfully configured alarms on the cluster nodes.
Setting the dynamic link library.
Successfully set the dynamic link library.
Fixing server package owner.
Successfully fixed server package owner.
Create logrotate service.
Successfully create logrotate service.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.
- Install the new server.
omm@plat1:/opt/software/gaussdb/script> gs_replace -t install -h plat3
Distributing configuration to remote host.
Successfully distributed configuration to remote host.
Installing.
Checking installation environment on nodes.
Successfully checking installation environment on nodes.
Installing applications on nodes.
Installation is completed.
==============================
Time statistics:
Install replacement nodes: 9s
total: 9s
- Configure the new server.
omm@plat1:/opt/software/gaussdb/script> gs_replace -t config -h plat3
Check cluster status for replace.
Check the status of ETCD cluster.
Successfully check the status of ETCD cluster.
Successfully check cluster status.
Filter out all valid hosts for replacing.
Distributing configuration to remote host.
Successfully distributed configuration to remote host.
Configuring
Stopping replace instances.
Successfully stopped replace instances.
Waiting for upgrading standby instances.
Successfully upgraded standby instances.
Configuring replacement instances.
Delete broken instances for primary instances' raft.
Successfully delete broken instances for primary instances' raft.
Config replace instances.
Successfully config replace instances.
Add cover instances for primary instances' raft.
Successfully add broken instances for primary instances' raft.
Config standby DN of new instances.
...................112s
Config standby DN of new instances successfully.
Successfully configured replacement instances.
Configuration succeeded.
==============================
Time statistics:
Config replacement nodes: 268s
total: 268s
- Start the new server.
omm@plat1:/opt/software/gaussdb/script> gs_replace -t start -h plat3
Starting.
==============================
..2s
Start cm agent on new nodes.
Successfully start cm agent for new nodes.
Starting the cluster.
......6s
Successfully started instance process.
Waiting to become Normal.
==============================
Successfully started cluster.
==============================
Time statistics:
Start replacement nodes: 35s
total: 35s
- Replace a server (different IP).
- On the faulty node, run the uninstall command to remove that node.
omm@plat4:/opt/software/gaussdb/script> gs_uninstall --delete-data -L
Check preinstall on every node.
Successfully checked preinstall on every node.
Stop cluster.
Check logfile path.
Clean crontab.
Clean crontab successfully.
Kill process for components.
Kill process for components successfully.
Uninstall components
Uninstall components successfully.
Modifying user's environmental variable.
Successfully modified user's environmental variable.
Clean tmp files and logs.
Successfully clean cluster's tmp and logs.
Successful uninstallation
- In the cluster configuration file, replace plat4 with plat5, then run the following command to complete the preinstallation.
plat1:/opt/software/gaussdb/script # ./gs_preinstall -U omm -G dbgrp -X /opt/software/gaussdb/clusterconfig.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)? yes
Please enter password for root.
Password:
Creating SSH trust for the root permission user.
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for the root permission user.
Pass over configuring LVM
Distributing package.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? yes
Please enter password for cluster user.
Password:
Please enter password for cluster user again.
Password:
Creating [omm] user on all nodes.
Successfully created [omm] user on all nodes.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Creating SSH trust for [omm] user.
Please enter password for current user[omm].
Password:
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for [omm] user.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Setting SCTP service.
Successfully set SCTP service.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Preparing SSH service.
Successfully prepared SSH service.
Setting user environmental variables.
Successfully set user environmental variables.
Configuring alarms on the cluster nodes.
Successfully configured alarms on the cluster nodes.
Setting the dynamic link library.
Successfully set the dynamic link library.
Fixing server package owner.
Successfully fixed server package owner.
Create logrotate service.
Successfully create logrotate service.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.
- Run the following command to perform the server replacement.
omm@plat1:/opt/software/gaussdb/script> gs_replace -t warm-standby -X /opt/software/gaussdb/clusterconfig.xml
Checking the cluster configuration differences.
Successfully checked the cluster configuration differences.
Check cluster status for replace.
Check the status of ETCD cluster.
Successfully check the status of ETCD cluster.
Successfully check cluster status.
Checking replace condition.
Successfully checked replace condition.
Check Preinstall Flag.
Creating the backup directory.
Successfully created backup directory.
Changing replace IP.
Changing instance configuration files.
Successfully changed instance configuration files.
Updating cluster database node info.
Successfully update cluster database node info.
Successfully changed replacement IP addresses.
Install GaussDB 100 on plat11.
..............14s
Successfully install GaussDB 100 on warm-standby nodes.
Config GaussDB 100 on plat11.
..................167s
Successfully config GaussDB 100 on warm-standby nodes.
Start GaussDB 100 on plat11.
...........54s
Successfully started warm-standby nodes.
Warm standby is completed.
==============================
Time statistics:
Update cluster configuration file.: 381s
total: 381s
Related commands
gs_preinstall, gs_install, gs_uninstall