暂无图片
暂无图片
9
暂无图片
暂无图片
暂无图片

金仓 KES V9 主备集群一键在线扩容 - 命令行方式

原创 Lucifer三思而后行 2025-02-27
649

大家好,这里是公众号 DBA学习之路,分享一些学习国产数据库路上的知识和经验。

目录

前言

记得在金仓公众号看到过一篇文章:重生之我在甲方做DBA,从单机到集群,易如反掌~,写的比较有意思。

这篇文章介绍的是金仓数据库单机转集群的一些策略以及大致步骤,单机数据库在线扩展成集群,从单打独斗到同舟共“集”。告别单机瓶颈,轻松拥抱高可用!

为了避免绩效评定为 D,我打算学习一下!

介绍

KingbaseES 提供数据库扩缩容工具进行数据库集群的在线扩缩容。对于不支持图形化(GUI)的服务器,KingbaseES 提供基于命令行操作的集群扩缩容方式,本文主要用于在不支持 GUI 的服务器上的 KingbaseES 版本集群扩缩容工作。

金仓数据库为我们提供了 3 种单机扩集群的方式:

  • 数据迁移
  • 离线扩展
  • 在线扩展

我们可以根据业务需求、资源可用性、成本预算和技术支持等因素进行综合考虑。

我这里演示的是金仓主备集群使用命令行静默一键扩容的方式。

前置准备

环境信息:

角色 主机名 IP 版本 CPU 内存 硬盘
kesrwc01 192.168.6.87 银河麒麟 Kylin V10 x86 8G 100G
kesrwc02 192.168.6.88 银河麒麟 Kylin V10 x86 8G 100G
待扩节点 kesv9 192.168.6.98 银河麒麟 Kylin V10 x86 8G 100G

在正式开始集群扩容之前,需要做好以下几个准备:

1、已部署好一主一备集群,可参考文章:金仓 KES RWC(读写分离集群)静默安装部署指南

2、已准备好进行扩容操作的节点,配置好系统即可,不需要安装软件,可参考文章:金仓数据库 KingbaseES V9 单机安装指南

3、待扩容或缩容节点需要从集群主节点上获取如下文件:

  • 集群安装目录 /KingbaseES/V9/ClientTools/guitools/DeployTools/zip/ 中的 db.zip
  • 集群安装目录 /install 下的一键部署脚本 cluster_install.shinstall.conftrust_cluster.sh
  • 如果有 license.dat 授权文件也需要

从主备集群中收集以上文件存放到 /soft 目录中:

[root@kesrwc01 ~]# mkdir /soft [root@kesrwc01 ~]# cp /KingbaseES/V9/ClientTools/guitools/DeployTools/zip/db.zip /soft/ [root@kesrwc01 ~]# cp /install/cluster_install.sh /soft/ [root@kesrwc01 ~]# cp /install/install.conf /soft/ [root@kesrwc01 ~]# cp /install/trust_cluster.sh /soft/ [root@kesrwc01 ~]# ll /soft/ 总用量 319872 -rwxr-xr-x 1 root root 252402 2月 26 16:26 cluster_install.sh -rw-r--r-- 1 root root 327258132 2月 26 16:23 db.zip -rw-r--r-- 1 root root 19430 2月 26 16:26 install.conf -rwxr-xr-x 1 root root 9677 2月 26 16:26 trust_cluster.sh

然后拷贝到需要扩容的单机主机上:

[root@kesrwc01 ~]# scp /soft/* 192.168.6.98:/soft

使用集群安装用户在待扩容节点,将上面从主备集群中获取到的文件拷贝至单机目录 /install 下:

[root@kesv9 soft]# ll 总用量 319872 -rwxr-xr-x 1 root root 252402 2月 26 16:29 cluster_install.sh -rw-r--r-- 1 root root 327258132 2月 26 16:29 db.zip -rw-r--r-- 1 root root 19430 2月 26 16:29 install.conf -rwxr-xr-x 1 root root 9677 2月 26 16:29 trust_cluster.sh [root@kesv9 soft]# cp * /install/ [root@kesv9 soft]# chown -R kingbase:kingbase /install/

确保文件权限属主需为集群安装用户即可。

集群扩容

配置 install.conf(单机)

待扩容节点配置 install.conf 文件,首先需要修改 all_ip 选项:

## 互信需要所有节点 IP,包括扩容的节点 all_ip=(192.168.6.87 192.168.6.88 192.168.6.98)

然后填写 expand 标签下各参数信息**:

## config of create a standby/witness node. ## when the cluster is in quorum or sync mode and expand sync standby node, ## it may automatically adjust synchronous_node and synchronous_standby_count parameters. [expand] expand_type="0" # The node type of standby/witness node, which would be add to cluster. 0:standby 1:witness primary_ip="192.168.6.87" # The ip addr of cluster primary node, which need to expand a standby/witness node. expand_ip="192.168.6.98" # The ip addr of standby/witness node, which would be add to cluster. node_id="3" # The node_id of standby/witness node, which would be add to cluster. It does not the same with any one in cluster node # for example: node_id="3" sync_type="" # the sync_type parameter is used to specify the sync type for expand node. 0:sync 1:potential 2:async # this parameter is only valid when expand_type="0" and the synchronous parameter of the cluster is set to custom mode. ## Specific instructions ,see it under [install] install_dir="/KingbaseES/V9/cluster" # the last layer of directory could not add '/' zip_package="/install/db.zip" net_device=(ens33) # if virtual_ip set,it must be set net_device_ip=(192.168.6.98) # if virtual_ip set,it must be set license_file=() deploy_by_sshd="1" ssh_port="22" scmd_port="8890"

这里需要修改的信息可以参考下图,修改要求可以参考注释即可:

SSH 免密配置(单机)

静默安装需要配置免密,使用 root 用户执行 trust_cluster.sh 脚本即可:

[root@kesv9 ~]# cd /install/ [root@kesv9 install]# sh trust_cluster.sh [INFO] set password-free between root and kingbase Generating public/private rsa key pair. ... ... ... ## 过程中根据提示输入密码即可 root@192.168.6.87's password: id_rsa 100% 2590 1.3MB/s 00:00 id_rsa.pub 100% 564 486.4KB/s 00:00 authorized_keys 100% 564 590.3KB/s 00:00 known_hosts ... ... ... connect to "192.168.6.87" from current node by 'ssh' kingbase:0..... OK connect to "192.168.6.87" from current node by 'ssh' root:0..... OK connect to "192.168.6.88" from "192.168.6.87" by 'ssh' kingbase->kingbase:0 .... OK connect to "192.168.6.88" from "192.168.6.87" by 'ssh' root->root:0 root->kingbase:0 kingbase->root:0.... OK connect to "192.168.6.88" from current node by 'ssh' kingbase:0..... OK connect to "192.168.6.88" from current node by 'ssh' root:0..... OK connect to "192.168.6.98" from "192.168.6.88" by 'ssh' kingbase->kingbase:0 .... OK connect to "192.168.6.98" from "192.168.6.88" by 'ssh' root->root:0 root->kingbase:0 kingbase->root:0.... OK connect to "192.168.6.98" from current node by 'ssh' kingbase:0..... OK connect to "192.168.6.98" from current node by 'ssh' root:0..... OK connect to "192.168.6.87" from "192.168.6.98" by 'ssh' kingbase->kingbase:0 .... OK connect to "192.168.6.87" from "192.168.6.98" by 'ssh' root->root:0 root->kingbase:0 kingbase->root:0.... OK check ssh connection success!

互信成功即可。

扩容前准备

📢 注意:待扩容节点不需要建库,否则在扩容过程中会提示关闭数据库,而且加入集群后使用的也是集群库的数据,所以待扩容节点建库是多此一举:

如果待扩容节点已经建库,那就要手动关闭数据库:

[kingbase@kesv9:/install]$ sys_ctl stop 等待服务器进程关闭 .... 完成 服务器进程已经关闭

如果已经建库并且数据存放目录与集群目录一样,需要先将待扩容节点的数据存放目录备份改名:

[root@kesv9 ~]# mv /data/ /databak [root@kesv9 ~]# mkdir /data [root@kesv9 ~]# chown kingbase:kingbase /data

防止扩容过程中提示冲突:

有一些集群的参数与单机配置有区别,所以需要提前修改操作系统参数,否则扩容过程中会有一些告警和报错:

修改参数:

[root@kesv9 ~]# vi /etc/sysctl.conf ## 修改为如下配置 kernel.sem= 5010 641280 5010 256 ## 然后执行生效 [root@kesv9 ~]# sysctl -p

确保准备都好了之后就可以开始扩容了。

执行扩容(单机)

2、待扩容节点执行扩容 standby 节点操作

[kingbase@kesv9:/install]$ sh cluster_install.sh expand [CONFIG_CHECK] will deploy the cluster of [RUNNING] success connect to the target "192.168.6.98" ..... OK [RUNNING] success connect to "192.168.6.98" from current node by 'ssh' ..... OK [RUNNING] success connect to the target "192.168.6.87" ..... OK [RUNNING] success connect to "192.168.6.87" from current node by 'ssh' ..... OK [RUNNING] Primary node ip is 192.168.6.87 ... [RUNNING] Primary node ip is 192.168.6.87 ... OK [CONFIG_CHECK] set install_with_root=1 [RUNNING] success connect to the target "192.168.6.98" ..... OK [RUNNING] success connect to "192.168.6.98" from current node by 'ssh' ..... OK [RUNNING] success connect to the target "192.168.6.87" ..... OK [RUNNING] success connect to "192.168.6.87" from current node by 'ssh' ..... OK [INSTALL] load config from cluster..... [INFO] db_user=system [INFO] db_port=54321 [INFO] use_scmd=1 [INFO] data_directory=/data [INFO] scmd_port=8890 [INFO] recovery=standby [INFO] use_check_disk=off [INFO] trusted_servers=192.168.6.254 [INFO] reconnect_attempts=10 [INFO] reconnect_interval=6 [INFO] auto_cluster_recovery_level=1 [INFO] synchronous=quorum [INSTALL] load config from cluster.....OK [CONFIG_CHECK] file format is correct ... OK [CONFIG_CHECK] check database connection ... [CONFIG_CHECK] check database connection ... OK [CONFIG_CHECK] expand_ip[192.168.6.98] is not used in the cluster ... [CONFIG_CHECK] expand_ip[192.168.6.98] is not used in the cluster ...ok [CONFIG_CHECK] The localhost is expand_ip:[192.168.6.98] ... [CONFIG_CHECK] The localhost is expand_ip:[192.168.6.98] ...ok [CONFIG_CHECK] check node_id is in cluster ... [CONFIG_CHECK] check node_id is in cluster ...OK [RUNNING] check the db is running or not... [RUNNING] the db is not running on "192.168.6.98:54321" ..... OK [RUNNING] the install dir is not exist on "192.168.6.98" ..... OK [RUNNING] check the sys_securecmdd is running or not... [RUNNING] the sys_securecmdd is not running on "192.168.6.98:8890" ..... OK [INFO] use_ssl=0 2025-02-26 17:27:36 [INFO] start to check system parameters on 192.168.6.98 ... 2025-02-26 17:27:36 [WARNING] [GSSAPIAuthentication] yes (should be: no) on 192.168.6.98 2025-02-26 17:27:36 [INFO] [UseDNS] is null on 192.168.6.98 2025-02-26 17:27:37 [INFO] [UsePAM] yes on 192.168.6.98 2025-02-26 17:27:37 [INFO] [ulimit.open files] 65536 on 192.168.6.98 2025-02-26 17:27:37 [INFO] [ulimit.open proc] 65536 on 192.168.6.98 2025-02-26 17:27:38 [INFO] [ulimit.core size] unlimited on 192.168.6.98 2025-02-26 17:27:38 [INFO] [ulimit.mem lock] 64 (less than 50000000) on 192.168.6.98 2025-02-26 17:27:38 [INFO] the value of [ulimit.mem lock] is wrong, now will change it on 192.168.6.98 ... 2025-02-26 17:27:38 [INFO] change ulimit.mem lock on 192.168.6.98 ... 2025-02-26 17:27:39 [INFO] change ulimit.mem lock on 192.168.6.98 ... Done 2025-02-26 17:27:39 [INFO] [ulimit.mem lock] 50000000 on 192.168.6.98 2025-02-26 17:27:40 [ERROR] [kernel.sem] 250 32000 100 128 (no less than: 5010 641280 5010 256) on 192.168.6.98 2025-02-26 17:27:40 [INFO] the value of [kernel.sem] is wrong, now will change it on 192.168.6.98 ... 2025-02-26 17:27:40 [INFO] change kernel.sem on 192.168.6.98 ... 2025-02-26 17:27:41 [INFO] change kernel.sem on 192.168.6.98 ... Done 2025-02-26 17:27:42 [INFO] [kernel.sem] 5010 641280 5010 256 on 192.168.6.98 2025-02-26 17:27:43 [INFO] [RemoveIPC] no on 192.168.6.98 2025-02-26 17:27:43 [INFO] [DefaultTasksAccounting] is null on 192.168.6.98 2025-02-26 17:27:43 [INFO] write file "/etc/udev/rules.d/kingbase.rules" on 192.168.6.98 2025-02-26 17:27:45 [INFO] [crontab] chmod /usr/bin/crontab ... 2025-02-26 17:27:45 [INFO] [crontab] chmod /usr/bin/crontab ... Done 2025-02-26 17:27:45 [INFO] [crontab access] OK 2025-02-26 17:27:46 [INFO] [cron.allow] add kingbase to cron.allow ... 2025-02-26 17:27:46 [INFO] [cron.allow] add kingbase to cron.allow ... Done 2025-02-26 17:27:47 [INFO] [crontab auth] crontab is accessible by kingbase now on 192.168.6.98 2025-02-26 17:27:47 [INFO] [SELINUX] disabled on 192.168.6.98 2025-02-26 17:27:48 [INFO] [firewall] down on 192.168.6.98 2025-02-26 17:27:48 [INFO] [The memory] OK on 192.168.6.98 2025-02-26 17:27:49 [INFO] [The hard disk] OK on 192.168.6.98 2025-02-26 17:27:49 [INFO] [ping] chmod /bin/ping ... 2025-02-26 17:27:49 [INFO] [ping] chmod /bin/ping ... Done 2025-02-26 17:27:50 [INFO] [ping access] OK 2025-02-26 17:27:50 [INFO] [/bin/cp --version] on 192.168.6.98 OK 2025-02-26 17:27:50 [INFO] [Virtual IP] Not configured on 192.168.6.98 [INSTALL] create the install dir "/KingbaseES/V9/cluster/kingbase" on 192.168.6.98 ... [INSTALL] success to create the install dir "/KingbaseES/V9/cluster/kingbase" on "192.168.6.98" ..... OK [INSTALL] try to copy the zip package "/install/db.zip" to /KingbaseES/V9/cluster/kingbase of "192.168.6.98" ..... [INSTALL] success to scp the zip package "/install/db.zip" /KingbaseES/V9/cluster/kingbase of to "192.168.6.98" ..... OK [INSTALL] decompress the "/KingbaseES/V9/cluster/kingbase" to "/KingbaseES/V9/cluster/kingbase" on 192.168.6.98 [INSTALL] success to decompress the "/KingbaseES/V9/cluster/kingbase/db.zip" to "/KingbaseES/V9/cluster/kingbase" on "192.168.6.98"..... OK [INSTALL] check license_file "default" [INSTALL] success to access license_file on 192.168.6.98: /KingbaseES/V9/cluster/kingbase/bin/license.dat [RUNNING] config sys_securecmdd and start it ... [RUNNING] config the sys_securecmdd port to 8890 ... [RUNNING] success to config the sys_securecmdd port on 192.168.6.98 ... OK successfully initialized the sys_securecmdd, please use "/KingbaseES/V9/cluster/kingbase/bin/sys_HAscmdd.sh start" to start the sys_securecmdd [RUNNING] success to config sys_securecmdd on 192.168.6.98 ... OK Created symlink /etc/systemd/system/multi-user.target.wants/securecmdd.service → /etc/systemd/system/securecmdd.service. [RUNNING] success to start sys_securecmdd on 192.168.6.98 ... OK [INSTALL] success to access file: /KingbaseES/V9/cluster/kingbase/etc/all_nodes_tools.conf [INSTALL] success to scp the /KingbaseES/V9/cluster/kingbase/etc/repmgr.conf from 192.168.6.87 to "192.168.6.98"..... ok [INSTALL] success to scp the ~/.encpwd from 192.168.6.87 to "192.168.6.98"..... ok [INSTALL] success to scp /KingbaseES/V9/cluster/kingbase/etc/all_nodes_tools.conf from "192.168.6.87" to "192.168.6.98" ...ok [INSTALL] success to chmod 600 the ~/.encpwd on 192.168.6.98..... ok [INFO] parameter_name=node_id [INFO] parameter_values='3' [INFO] [parameter_name] para_exist=1 [INFO] sed -i "/[#]*node_id[ ]*=/cnode_id='3'" /KingbaseES/V9/cluster/kingbase/etc/repmgr.conf [INFO] parameter_name=node_name [INFO] parameter_values='node3' [INFO] [parameter_name] para_exist=1 [INFO] sed -i "/[#]*node_name[ ]*=/cnode_name='node3'" /KingbaseES/V9/cluster/kingbase/etc/repmgr.conf [INFO] parameter_name=conninfo [INFO] parameter_values='host [INFO] [parameter_name] para_exist=1 [INFO] sed -i "/[#]*conninfo[ ]*=/cconninfo='host=192.168.6.98 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000'" /KingbaseES/V9/cluster/kingbase/etc/repmgr.conf [INFO] parameter_name=ping_path [INFO] parameter_values='/bin' [INFO] [parameter_name] para_exist=1 [INFO] sed -i "/[#]*ping_path[ ]*=/cping_path='/bin'" /KingbaseES/V9/cluster/kingbase/etc/repmgr.conf [RUNNING] standby clone ... [WARNING] following problems with command line parameters detected: -D/--sysdata will be ignored if a repmgr configuration file is provided [NOTICE] destination directory "/data" provided [INFO] connecting to source node [DETAIL] connection string is: host=192.168.6.87 user=esrep port=54321 dbname=esrep [DETAIL] current installation size is 102 MB [NOTICE] checking for available walsenders on the source node (2 required) [NOTICE] checking replication connections can be made to the source server (2 required) [INFO] checking and correcting permissions on existing directory "/data" [INFO] creating replication slot as user "esrep" [NOTICE] starting backup (using sys_basebackup)... [INFO] executing: /KingbaseES/V9/cluster/kingbase/bin/sys_basebackup -l "repmgr base backup" -D /data -h 192.168.6.87 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_3 [NOTICE] standby clone (using sys_basebackup) complete [NOTICE] you can now start your Kingbase server [HINT] for example: sys_ctl -D /data start [HINT] after starting the server, you need to register this standby with "repmgr standby register" [RUNNING] standby clone ...OK [RUNNING] db start ... waiting for server to start.... done server started [RUNNING] db start ...OK [INFO] connecting to local node "node3" (ID: 3) [INFO] connecting to primary database [WARNING] --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1) [INFO] standby registration complete [NOTICE] standby node "node3" (ID: 3) successfully registered 2025-02-26 17:28:24 begin to start DB on "[localhost]". 2025-02-26 17:28:25 DB on "[localhost]" already started, connect to check it. 2025-02-26 17:28:26 DB on "[localhost]" start success. 2025-02-26 17:28:26 Ready to start local kbha daemon and repmgrd daemon ... 2025-02-26 17:28:26 begin to start repmgrd on "[localhost]". [2025-02-26 17:28:27] [NOTICE] using provided configuration file "/KingbaseES/V9/cluster/kingbase/bin/../etc/repmgr.conf" [2025-02-26 17:28:27] [INFO] creating directory "/KingbaseES/V9/cluster/kingbase/log"... [2025-02-26 17:28:27] [NOTICE] redirecting logging output to "/KingbaseES/V9/cluster/kingbase/log/hamgr.log" 2025-02-26 17:28:29 repmgrd on "[localhost]" start success. [2025-02-26 17:28:32] [NOTICE] redirecting logging output to "/KingbaseES/V9/cluster/kingbase/log/kbha.log" 2025-02-26 17:28:34 Done. ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.6.87 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000 2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.6.88 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000 3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.6.98 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000 [RUNNING] query archive command at 192.168.6.87 ... [RUNNING] current cluster not config sys_rman,return.

这样就代表已经扩容成功。

检查集群

在集群主节点执行检查:

[kingbase@kesrwc01:/home/kingbase]$ repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string ----+-------+---------+-----------+----------+----------+----------+----------+---------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.6.87 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000 2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.6.88 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000 3 | node3 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.6.98 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=2 keepalives_interval=2 keepalives_count=3 tcp_user_timeout=9000 [kingbase@kesrwc01:/home/kingbase]$ repmgr service status ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+-------+---------+-----------+----------+---------+-------+---------+-------------------- 1 | node1 | primary | * running | | running | 4639 | no | n/a 2 | node2 | standby | running | node1 | running | 5049 | no | 0 second(s) ago 3 | node3 | standby | running | node1 | running | 36501 | no | 1 second(s) ago

确认新扩容节点已正确加入集群,集群状态正常。

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论