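Introduction
KingbaseES read/write-splitting clusters can be scaled out and in, either through the graphical tool or from the command line. The graphical route is relatively easy: you simply add or remove nodes in the node view. This article covers how to scale out and in from the command line.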
Environment
Operating system version: CentOS Linux release 7.6.1810
Database version: KingbaseES_V009R001C002B0014
IP plan:
| Server IP | Role |
|---|---|
| 192.168.20.251 | Primary node |
| 192.168.20.252 | Standby node |
| 192.168.20.253 | Witness node |
| 192.168.20.245 | Node to be added/removed |
Implementation steps
Server environment preparation
Check the operating system information
[root@kingbase04 /]# cat /etc/*release*
CentOS Linux release 7.6.1810 (Core)
Derived from Red Hat Enterprise Linux 7.6 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
CentOS Linux release 7.6.1810 (Core)
CentOS Linux release 7.6.1810 (Core)
cpe:/o:centos:centos:7
Check system memory and storage space
[root@ks01 /]# free -m
total used free shared buff/cache available
Mem: 23947 726 19761 102 3459 21295
Swap: 8063 0 8063
[root@dmdem kingbasev9]# df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 2.0G 0 2.0G 0% /dev
tmpfs 2.0G 0 2.0G 0% /dev/shm
tmpfs 2.0G 8.9M 2.0G 1% /run
tmpfs 2.0G 0 2.0G 0% /sys/fs/cgroup
/dev/mapper/centos-root 92G 32G 60G 35% /
/dev/sda1 197M 146M 51M 75% /boot
tmpfs 396M 0 396M 0% /run/user/2001
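The manual inspection above can be turned into a small pre-flight gate. The following is a minimal sketch: the function names are made up for illustration, and any threshold you pass in is your own assumption rather than a documented KingbaseES requirement.

```bash
#!/usr/bin/env bash
# Pre-flight helpers (hypothetical names). Thresholds are chosen by the caller.

# Print the "available" column (MiB) from `free -m` output read on stdin.
available_mem_mb() {
  awk '$1 == "Mem:" { print $7 }'
}

# Print free space (MiB) on the filesystem that holds the given path.
free_disk_mb() {
  df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

# Return 0 when a measured value meets a minimum, 1 otherwise.
at_least() {
  [ "$1" -ge "$2" ]
}
```

For example, `at_least "$(free -m | available_mem_mb)" 4096` succeeds only when at least 4096 MiB are available, and `at_least "$(free_disk_mb /home/kingbase)" 10240` does the same for disk space under the install path.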
Configure kernel parameters
vi /etc/sysctl.conf
fs.aio-max-nr= 1048576
fs.file-max= 6815744
kernel.shmall= 2097152
kernel.shmmax= 4294967295
kernel.shmmni= 4096
kernel.sem= 250 32000 100 128
net.ipv4.ip_local_port_range= 9000 65500
net.core.rmem_default= 262144
net.core.rmem_max= 4194304
net.core.wmem_default= 262144
net.core.wmem_max= 1048576
Apply the settings
/sbin/sysctl -p
/sbin/sysctl -a
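Rather than scanning the full `sysctl -a` output by eye, each configured key can be compared against the running kernel. This is a sketch under the assumption that the file uses plain `key = value` lines; the helper names are illustrative.

```bash
#!/usr/bin/env bash
# Print the value configured for a key in a sysctl.conf-style file, with
# whitespace runs collapsed to single spaces. Prints nothing if absent.
conf_value() {
  local file=$1 key=$2
  awk -F= -v key="$key" '
    /^[ \t]*#/ { next }                        # skip comment lines
    {
      k = $1; gsub(/[ \t]/, "", k)
      if (k == key) {
        v = substr($0, index($0, "=") + 1)
        gsub(/[ \t]+/, " ", v); sub(/^ /, "", v); sub(/ $/, "", v)
        found = v
      }
    }
    END { if (found != "") print found }
  ' "$file"
}

# Collapse whitespace in a live value (sysctl separates fields with tabs).
normalize() {
  tr -s '[:space:]' ' ' | sed 's/^ //; s/ $//'
}

# Succeed when the file and the running kernel agree for one key.
# Requires the sysctl binary on the host.
sysctl_matches() {
  local file=$1 key=$2
  [ "$(conf_value "$file" "$key")" = "$(sysctl -n "$key" | normalize)" ]
}
```

A call such as `sysctl_matches /etc/sysctl.conf kernel.sem || echo "kernel.sem not applied"` reports a key that has been written to the file but is not yet in effect.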
- Resource limit parameters
vi /etc/security/limits.conf
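# * means all users; you may instead set limits only for the root and kingbase users
* soft nofile 65536
# Note: the nofile hard limit must not exceed /proc/sys/fs/nr_open, otherwise you will be unable to log in again after logging out
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
# unlimited means no limit
* soft core unlimited
* hard core unlimited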
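A soft limit can never be higher than its hard limit, so it is worth checking the file for such pairs before logging out. The sketch below is an illustrative helper (the name is made up) that reads a limits.conf-style file and lists any inconsistent pair.

```bash
#!/usr/bin/env bash
# Print "domain item soft hard" for every pair in a limits.conf-style file
# where the soft value exceeds the hard value. No output means consistent.
soft_exceeds_hard() {
  awk '
    /^[ \t]*(#|$)/ { next }                    # skip comments and blank lines
    NF >= 4 && ($2 == "soft" || $2 == "hard") {
      # treat unlimited / infinity / -1 as larger than any number
      v = ($4 == "unlimited" || $4 == "infinity" || $4 == "-1") ? 1e18 : $4 + 0
      id = $1 " " $3
      val[id, $2] = v
      raw[id, $2] = $4
      keys[id] = 1
    }
    END {
      for (k in keys)
        if ((k, "soft") in val && (k, "hard") in val && val[k, "soft"] > val[k, "hard"])
          print k, raw[k, "soft"], raw[k, "hard"]
    }
  ' "$1" | sort
}
```

Running `soft_exceeds_hard /etc/security/limits.conf` should print nothing on a consistent file. Drop-in files under /etc/security/limits.d are not read by this sketch.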
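- RemoveIPC parameter
RemoveIPC is a feature introduced by the systemd-logind service: when a user logs out, all IPC objects belonging to that user are removed. It is controlled by the RemoveIPC parameter in /etc/systemd/logind.conf. Some operating systems enable it by default, which can cause problems such as programs losing their semaphores. Only Red Hat 7 and later, plus some domestic Chinese Linux distributions, need this change; check whether the setting defaults to yes before modifying it. Set RemoveIPC=no, then restart the service: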
systemctl daemon-reload
systemctl restart systemd-logind.service
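To see what the file currently says before and after the change, a small reader like the one below may help. It is a sketch with a made-up function name, and it only inspects the single file given; drop-in directories are not considered.

```bash
#!/usr/bin/env bash
# Print the RemoveIPC value set in a logind.conf-style file, or "unset" when
# the key is absent or commented out (the built-in default then applies).
removeipc_setting() {
  local file=${1:-/etc/systemd/logind.conf}
  awk -F= '
    /^[ \t]*RemoveIPC[ \t]*=/ { v = $2; gsub(/[ \t]/, "", v); found = v }
    END { print (found == "" ? "unset" : found) }
  ' "$file"
}
```

Calling `removeipc_setting` with no argument reads /etc/systemd/logind.conf; the target state for this procedure is the output `no`.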
Create the installation user
[root@kingbase04 ~]# useradd -m kingbase
[root@kingbase04 ~]# id kingbase
uid=1003(kingbase) gid=1004(kingbase) groups=1004(kingbase)
[root@kingbase04 ~]# echo "es123456"|passwd --stdin kingbase
Changing password for user kingbase.
passwd: all authentication tokens updated successfully.
Scale-out operation
Check the current cluster status
[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | witness | * running | node1 | default | 0 | n/a | | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
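Before changing the topology it is useful to confirm that no node is in an abnormal state. The sketch below parses the table printed by `repmgr cluster show`; the function name is illustrative, and it assumes the column layout shown above.

```bash
#!/usr/bin/env bash
# Read `repmgr cluster show` output on stdin and print the name of every node
# whose Status column is not "running" (a leading "*" marker is accepted).
nodes_not_running() {
  awk -F'|' '
    NF >= 4 && $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {   # data rows start with a numeric ID
      name = $2; status = $4
      gsub(/[ \t*]/, "", name); gsub(/[ \t*]/, "", status)
      if (status != "running") print name
    }
  '
}
```

Piping the command shown above into `nodes_not_running` prints nothing for a healthy cluster, so a script can stop when the output is non-empty.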
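Prepare the files required for scale-out
Obtain the following files from a node where the cluster was previously deployed (or from /ClientTools/guitools/DeployTools/ in the database installation package):
- db.zip
- cluster_install.sh
- install.conf
- trust_cluster.sh
- the required license.dat file
Once you have located these files, upload them to the node being added.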
# Node to be added
[kingbase@kingbase04 ~]$ pwd
/home/kingbase
[kingbase@kingbase04 ~]$ ll
total 0
[kingbase@kingbase04 ~]$ mkdir soft
[kingbase@kingbase04 ~]$ cd soft
[kingbase@kingbase04 soft]$ pwd
/home/kingbase/soft
# Transfer the files from cluster node 1 with scp
[kingbase@ogauss1 zip]# pwd
/opt/Kingbase/ES/V9/ClientTools/guitools/DeployTools/zip
[kingbase@ogauss1 zip]# scp * 192.168.20.245:/home/kingbase/soft
[kingbase@ogauss1 ~]$ scp /home/kingbase/cluster/mykes/kcluster/kingbase/bin/license.dat 192.168.20.245:/home/kingbase/soft
license.dat
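Obtain the following files from /home/kingbase/cluster/mykes/kcluster/kingbase/bin on a node where the cluster was previously deployed:
- root_env_init.sh
- root_env_check.sh
- arping
# Transfer the files from cluster node 1 with scp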
[kingbase@ogauss1 bin]$ pwd
/home/kingbase/cluster/mykes/kcluster/kingbase/bin
[root@ogauss1 bin]# scp root_env_init.sh root_env_check.sh arping 192.168.20.245:/home/kingbase/soft
With the files in place, the directory looks like this:
[kingbase@kingbase04 ~]$ cd soft/
[kingbase@kingbase04 soft]$ ll
total 322448
-rwxr-xr-x 1 kingbase kingbase 13544 Feb 26 16:05 arping
-rwxr-xr-x 1 kingbase kingbase 252402 Feb 26 16:02 cluster_install.sh
-rw-r--r-- 1 kingbase kingbase 327258132 Feb 26 16:02 db.zip
-rw-r--r-- 1 kingbase kingbase 19379 Feb 26 16:02 install.conf
-rwxr-xr-x 1 kingbase kingbase 11380 Feb 26 16:05 root_env_check.sh
-rwxr-xr-x 1 kingbase kingbase 10680 Feb 26 16:05 root_env_init.sh
-rw-r--r-- 1 kingbase kingbase 2595145 Feb 26 16:02 securecmdd.zip
-rwxr-xr-x 1 kingbase kingbase 9677 Feb 26 16:02 trust_cluster.sh
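Since the files come from two different source directories, it is easy to miss one. The following sketch lists anything absent from the staging directory; the file list simply mirrors the names given in this section, and the function name is illustrative.

```bash
#!/usr/bin/env bash
# Print each required file that is absent from the staging directory.
# The list mirrors the files named in this section; adjust it as needed.
missing_files() {
  local dir=$1 f
  for f in db.zip cluster_install.sh install.conf trust_cluster.sh \
           license.dat root_env_init.sh root_env_check.sh arping; do
    [ -e "$dir/$f" ] || printf '%s\n' "$f"
  done
}
```

For example, `missing_files /home/kingbase/soft` prints one line per missing file and nothing when the set is complete.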
Configure the install.conf file
[kingbase@kingbase04 soft]$ pwd
/home/kingbase/soft
[kingbase@kingbase04 soft]$ vi install.conf
[expand]
expand_type="0" # The node type of standby/witness node, which would be add to cluster. 0:standby 1:witness
primary_ip="192.168.20.251" # The ip addr of cluster primary node, which need to expand a standby/witness node.
expand_ip="192.168.20.245" # The ip addr of standby/witness node, which would be add to cluster.
node_id="4" # The node_id of standby/witness node, which would be add to cluster. It does not the same with any one in cluster node
# for example: node_id="3"
sync_type="" # the sync_type parameter is used to specify the sync type for expand node. 0:sync 1:potential 2:async
# this parameter is only valid when expand_type="0" and the synchronous parameter of the cluster is set to custom mode.
## Specific instructions ,see it under [install]
install_dir="/home/kingbase/cluster/mykes/kcluster" # the last layer of directory could not add '/'
zip_package="/home/kingbase/soft/db.zip"
net_device=() # if virtual_ip set,it must be set
net_device_ip=() # if virtual_ip set,it must be set
license_file=(license.dat)
deploy_by_sshd="1"
ssh_port="22"
scmd_port="8890"
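Because install.conf repeats key names such as primary_ip and node_id in several sections, scripted edits need to be section-aware. The sketch below is one way to do that; `set_conf_value` is a hypothetical helper, it handles only simple `key="value"` lines (not array values such as `license_file=(...)`), and it assumes the value contains no backslashes.

```bash
#!/usr/bin/env bash
# Set KEY="VALUE" inside [SECTION] of an install.conf-style file, keeping any
# trailing "# comment". Same-named keys in other sections are left untouched.
set_conf_value() {
  local file=$1 section=$2 key=$3 value=$4 tmp
  tmp=$(mktemp) || return 1
  awk -v sec="[$section]" -v key="$key" -v val="$value" '
    /^\[/ { in_sec = ($1 == sec) }             # entering a new [section]
    in_sec && index($0, key "=") == 1 {
      comment = ""
      p = index($0, "#")
      if (p > 0) comment = " " substr($0, p)   # preserve the inline comment
      $0 = key "=\"" val "\"" comment
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

A call such as `set_conf_value install.conf expand expand_ip 192.168.20.245` rewrites only the `[expand]` entry. Writing back with `cat` rather than `mv` keeps the original file's ownership and mode.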
Passwordless SSH configuration
First, run root_env_init.sh kingbase as the root user to initialize the environment.
# Node to be added
[root@kingbase04 soft]# ./root_env_init.sh kingbase
[Wed Feb 26 16:29:35 CST 2025] [INFO] change UsePAM ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change UsePAM ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change ulimit ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change ulimit ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change kernel.sem ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change kernel.sem ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] no need to change "/etc/profile"
[Wed Feb 26 16:29:35 CST 2025] [INFO] stop selinux ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] stop selinux ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change RemoveIPC ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change RemoveIPC ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change DefaultTasksAccounting ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change DefaultTasksAccounting ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping6 ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping6 ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /sbin/ip ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /sbin/ip ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] copy /opt/kes/bin/arping ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] copy /opt/kes/bin/arping ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /opt/kes/bin/arping ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /opt/kes/bin/arping ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /usr/bin/crontab ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /usr/bin/crontab ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] configuration to take effect ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] configuration to take effect ... Done
When it finishes, run root_env_check.sh kingbase as the root user to check that every initialization change has been applied.
# Node to be added
[root@kingbase04 soft]# ./root_env_check.sh kingbase
[Wed Feb 26 16:29:51 CST 2025] [INFO] [su - kingbase -c "echo su_info_check"] su_info_check
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ulimit.open files] 655360
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ulimit.open proc] 655360
[Wed Feb 26 16:29:51 CST 2025] [INFO] [kernel.sem] 5010 641280 5010 256
[Wed Feb 26 16:29:51 CST 2025] [INFO] [RemoveIPC] no
[Wed Feb 26 16:29:51 CST 2025] [INFO] [DefaultTasksAccounting] no
[Wed Feb 26 16:29:51 CST 2025] [INFO] [crond] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [SELINUX] disabled
[Wed Feb 26 16:29:51 CST 2025] [INFO] [firewall] down
[Wed Feb 26 16:29:51 CST 2025] [INFO] [The memory] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping6 command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping6 access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [/bin/cp --version] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ip command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ip access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [arping command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [arping -U command] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [arping access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [crontab command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [crontab access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [kingbase crontab access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [sys_securecmdd dir] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [sys_securecmdd user dir] OK
As the cluster deployment user (for deployments that do not rely on root), run trust_cluster.sh to configure passwordless SSH.
# Node to be added
[kingbase@kingbase04 soft]$ ./trust_cluster.sh
[ERROR] [all_ip] and [production_ip] both are empty, please check your [install.conf] file
# Adjust the following parameters
[kingbase@kingbase04 soft]$ vi install.conf
all_ip=(192.168.20.251 192.168.20.252 192.168.20.253 192.168.20.245)
install_with_root=0
[kingbase@kingbase04 soft]$ ./trust_cluster.sh
[INFO] set password-free only between kingbase
Generating public/private rsa key pair.
Your identification has been saved in /home/kingbase/.ssh/id_rsa.
Your public key has been saved in /home/kingbase/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Q/2F0MkynV/7ix8lvf/DoHpV+RS8e77lbIJ115v3eGQ kingbase@kingbase04
The key's randomart image is:
+---[RSA 2048]----+
| .+ o. |
| .o.*. o.|
| . .o....=|
| . . ..*.|
| S . o.B|
| . o.oE|
| oo+BB|
| o...XO|
| .o o*%|
+----[SHA256]-----+
Warning: Permanently added '192.168.20.251' (ECDSA) to the list of known hosts.
kingbase@192.168.20.251's password:
known_hosts 100% 347 45.1KB/s 00:00
id_rsa 100% 1679 283.6KB/s 00:00
id_rsa.pub 100% 401 303.4KB/s 00:00
authorized_keys 100% 401 202.8KB/s 00:00
Warning: Permanently added '192.168.20.252' (ECDSA) to the list of known hosts.
kingbase@192.168.20.252's password:
known_hosts 100% 523 85.1KB/s 00:00
id_rsa 100% 1679 278.9KB/s 00:00
id_rsa.pub 100% 401 327.1KB/s 00:00
authorized_keys 100% 401 333.0KB/s 00:00
Warning: Permanently added '192.168.20.253' (ECDSA) to the list of known hosts.
Password:
known_hosts 100% 699 1.1MB/s 00:00
id_rsa 100% 1679 2.9MB/s 00:00
id_rsa.pub 100% 401 745.8KB/s 00:00
authorized_keys 100% 401 760.4KB/s 00:00
Warning: Permanently added '192.168.20.245' (ECDSA) to the list of known hosts.
known_hosts 100% 875 1.6MB/s 00:00
id_rsa 100% 1679 3.8MB/s 00:00
id_rsa.pub 100% 401 744.3KB/s 00:00
authorized_keys 100% 401 889.3KB/s 00:00
connect to "192.168.20.251" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.252" from "192.168.20.251" by 'ssh' kingbase->kingbase:0 .... OK
connect to "192.168.20.252" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.253" from "192.168.20.252" by 'ssh' kingbase->kingbase:0 .... OK
connect to "192.168.20.253" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.245" from "192.168.20.253" by 'ssh' kingbase->kingbase:0 .... OK
connect to "192.168.20.245" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.251" from "192.168.20.245" by 'ssh' kingbase->kingbase:0 .... OK
check ssh connection success!
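The trust relationship can also be re-checked later without rerunning trust_cluster.sh. The sketch below relies on the standard OpenSSH option `BatchMode=yes`, which makes ssh fail instead of prompting for a password. The function name and the `SSH_BIN` override variable are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Print every host that cannot be reached non-interactively over SSH.
# SSH_BIN is a hypothetical hook for substituting a stub during dry runs.
hosts_without_trust() {
  local ssh_bin=${SSH_BIN:-ssh} host
  for host in "$@"; do
    "$ssh_bin" -o BatchMode=yes -o ConnectTimeout=5 "$host" true \
      </dev/null >/dev/null 2>&1 || printf '%s\n' "$host"
  done
}
```

Run as the kingbase user, `hosts_without_trust 192.168.20.251 192.168.20.252 192.168.20.253 192.168.20.245` should print nothing once trust is in place on all four nodes.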
Cluster scale-out
[kingbase@kingbase04 soft]$ ./cluster_install.sh expand
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "192.168.20.245" ..... OK
[RUNNING] success connect to "192.168.20.245" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.168.20.251" ..... OK
[RUNNING] success connect to "192.168.20.251" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.168.20.251 ...
[RUNNING] Primary node ip is 192.168.20.251 ... OK
[CONFIG_CHECK] set install_with_root=0
[INSTALL] load config from cluster.....
[INFO] db_user=system
[INFO] db_port=54321
[INFO] use_scmd=1
[INFO] data_directory=/home/kingbase/cluster/mykes/kcluster/kingbase/data
[INFO] scmd_port=8890
[INFO] recovery=standby
[INFO] use_check_disk=off
[INFO] trusted_servers=127.0.0.1
[INFO] reconnect_attempts=10
[INFO] reconnect_interval=6
[INFO] auto_cluster_recovery_level=0
[INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] success to access license_file: /home/kingbase/soft/license.dat
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] check database connection ...
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] expand_ip[192.168.20.245] is not used in the cluster ...
[CONFIG_CHECK] expand_ip[192.168.20.245] is not used in the cluster ...ok
[CONFIG_CHECK] The localhost is expand_ip:[192.168.20.245] ...
[CONFIG_CHECK] The localhost is expand_ip:[192.168.20.245] ...ok
[CONFIG_CHECK] check node_id is in cluster ...
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] check the db is running or not...
[RUNNING] the db is not running on "192.168.20.245:54321" ..... OK
[RUNNING] the install dir is not exist on "192.168.20.245" ..... OK
[RUNNING] check the sys_securecmdd is running or not...
[RUNNING] the sys_securecmdd is not running on "192.168.20.245:8890" ..... OK
[INFO] use_ssl=0
[INSTALL] create the install dir "/home/kingbase/cluster/mykes/kcluster/kingbase" on 192.168.20.245 ...
[INSTALL] success to create the install dir "/home/kingbase/cluster/mykes/kcluster/kingbase" on "192.168.20.245" ..... OK
[INSTALL] try to copy the zip package "/home/kingbase/soft/db.zip" to /home/kingbase/cluster/mykes/kcluster/kingbase of "192.168.20.245" .....
[INSTALL] success to scp the zip package "/home/kingbase/soft/db.zip" /home/kingbase/cluster/mykes/kcluster/kingbase of to "192.168.20.245" ..... OK
[INSTALL] decompress the "/home/kingbase/cluster/mykes/kcluster/kingbase" to "/home/kingbase/cluster/mykes/kcluster/kingbase" on 192.168.20.245
[INSTALL] success to decompress the "/home/kingbase/cluster/mykes/kcluster/kingbase/db.zip" to "/home/kingbase/cluster/mykes/kcluster/kingbase" on "192.168.20.245"..... OK
[INSTALL] check license_file "license.dat"
[INSTALL] Scp license to /home/kingbase/cluster/mykes/kcluster/kingbase/../license.dat on 192.168.20.245
[INSTALL] success to copy /home/kingbase/soft/license.dat to /home/kingbase/cluster/mykes/kcluster/kingbase/../ on 192.168.20.245
[RUNNING] config sys_securecmdd and start it ...
[RUNNING] config the sys_securecmdd port to 8890 ...
[RUNNING] success to config the sys_securecmdd port on 192.168.20.245 ... OK
successfully initialized the sys_securecmdd, please use "/home/kingbase/cluster/mykes/kcluster/kingbase/bin/sys_HAscmdd.sh start" to start the sys_securecmdd
[RUNNING] success to config sys_securecmdd on 192.168.20.245 ... OK
[RUNNING] success to start sys_securecmdd on 192.168.20.245 ... OK
[INSTALL] success to access file: /home/kingbase/cluster/mykes/kcluster/kingbase/etc/all_nodes_tools.conf
[INSTALL] success to scp the /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf from 192.168.20.251 to "192.168.20.245"..... ok
[INSTALL] success to scp the ~/.encpwd from 192.168.20.251 to "192.168.20.245"..... ok
[INSTALL] success to scp /home/kingbase/cluster/mykes/kcluster/kingbase/etc/all_nodes_tools.conf from "192.168.20.251" to "192.168.20.245" ...ok
[INSTALL] success to chmod 600 the ~/.encpwd on 192.168.20.245..... ok
[INFO] parameter_name=node_id
[INFO] parameter_values='4'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*node_id[ ]*=/cnode_id='4'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
[INFO] parameter_name=node_name
[INFO] parameter_values='node4'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*node_name[ ]*=/cnode_name='node4'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
[INFO] parameter_name=conninfo
[INFO] parameter_values='host
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*conninfo[ ]*=/cconninfo='host=192.168.20.245 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
[INFO] parameter_name=ping_path
[INFO] parameter_values='/bin'
[INFO] [parameter_name] para_exist=1
[INFO] sed -i "/[#]*ping_path[ ]*=/cping_path='/bin'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
[RUNNING] standby clone ...
[WARNING] following problems with command line parameters detected:
-D/--sysdata will be ignored if a repmgr configuration file is provided
[NOTICE] destination directory "/home/kingbase/cluster/mykes/kcluster/kingbase/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=192.168.20.251 user=esrep port=54321 dbname=esrep
[DETAIL] current installation size is 88 MB
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] checking and correcting permissions on existing directory "/home/kingbase/cluster/mykes/kcluster/kingbase/data"
[INFO] creating replication slot as user "esrep"
[NOTICE] starting backup (using sys_basebackup)...
[INFO] executing:
/home/kingbase/cluster/mykes/kcluster/kingbase/bin/sys_basebackup -l "repmgr base backup" -D /home/kingbase/cluster/mykes/kcluster/kingbase/data -h 192.168.20.251 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_4
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /home/kingbase/cluster/mykes/kcluster/kingbase/data start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"
[RUNNING] standby clone ...OK
[RUNNING] db start ...
waiting for server to start.... done
server started
[RUNNING] db start ...OK
[INFO] connecting to local node "node4" (ID: 4)
[INFO] connecting to primary database
[WARNING] --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
[INFO] standby registration complete
[NOTICE] standby node "node4" (ID: 4) successfully registered
2025-02-26 19:31:50 begin to start DB on "[localhost]".
2025-02-26 19:31:51 DB on "[localhost]" already started, connect to check it.
2025-02-26 19:31:52 DB on "[localhost]" start success.
2025-02-26 19:31:52 Ready to start local kbha daemon and repmgrd daemon ...
2025-02-26 19:31:52 begin to start repmgrd on "[localhost]".
[2025-02-26 19:31:53] [NOTICE] using provided configuration file "/home/kingbase/cluster/mykes/kcluster/kingbase/bin/../etc/repmgr.conf"
[2025-02-26 19:31:53] [NOTICE] redirecting logging output to "/home/kingbase/cluster/mykes/kcluster/kingbase/log/hamgr.log"
2025-02-26 19:31:55 repmgrd on "[localhost]" start success.
[2025-02-26 19:31:58] [NOTICE] redirecting logging output to "/home/kingbase/cluster/mykes/kcluster/kingbase/log/kbha.log"
2025-02-26 19:31:59 Done.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | witness | * running | node1 | default | 0 | n/a | | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
4 | node4 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.20.245 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] query archive command at 192.168.20.251 ...
[RUNNING] current cluster not config sys_rman,return.
Note: possibly because of limited machine performance, the SSH connection parameters in cluster_install.sh were adjusted as follows:
ConnectTimeout=30
ServerAliveInterval=5
ServerAliveCountMax=5
ssh -q -o Batchmode=yes -o ConnectTimeout=30 -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -p 22 -o ServerAliveInterval=5 -o ServerAliveCountMax=5
Post-scale-out check
[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 19932 | no | n/a
2 | node2 | standby | running | node1 | running | 10930 | no | 1 second(s) ago
3 | node3 | witness | * running | node1 | running | 1905 | no | 1 second(s) ago
4 | node4 | standby | running | node1 | running | 51445 | no | 0 second(s) ago
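The same check can be scripted for a single node ID, which is handy after both scale-out and scale-in. This is a sketch that assumes the `repmgr service status` column layout shown above; the function name is illustrative.

```bash
#!/usr/bin/env bash
# Read `repmgr service status` output on stdin and print the repmgrd column
# for the given node ID, or "absent" if the node is not listed.
repmgrd_state() {
  awk -F'|' -v id="$1" '
    NF >= 6 && $1 ~ /^[ \t]*[0-9]+[ \t]*$/ && $1 + 0 == id + 0 {
      s = $6
      sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s)
      state = s
    }
    END { print (state == "" ? "absent" : state) }
  '
}
```

After scale-out, piping the status command into `repmgrd_state 4` should report `running`; once a node has been removed, the same call reports `absent`.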
Scale-in operation
Check the current cluster status
[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 19932 | no | n/a
2 | node2 | standby | running | node1 | running | 10930 | no | 1 second(s) ago
3 | node3 | witness | * running | node1 | running | 1905 | no | 1 second(s) ago
4 | node4 | standby | running | node1 | running | 51445 | no | 0 second(s) ago
Prepare the files required for scale-in
[kingbase@kingbase04 soft]$ ll
total 322452
-rwxr-xr-x 1 kingbase kingbase 13544 Feb 26 16:05 arping
-rwxr-xr-x 1 kingbase kingbase 252324 Feb 26 19:21 cluster_install.sh <=========
-rw-r--r-- 1 kingbase kingbase 327258132 Feb 26 16:02 db.zip
-rw-r--r-- 1 kingbase kingbase 19598 Feb 26 19:05 install.conf <=========
-rwxr-xr-x 1 kingbase kingbase 3827 Feb 26 17:35 license.dat
-rwxr-xr-x 1 kingbase kingbase 11380 Feb 26 16:05 root_env_check.sh
-rwxr-xr-x 1 kingbase kingbase 10680 Feb 26 16:05 root_env_init.sh
-rw-r--r-- 1 kingbase kingbase 2595145 Feb 26 17:24 securecmdd.zip
-rwxr-xr-x 1 kingbase kingbase 9677 Feb 26 16:02 trust_cluster.sh
Configure the install.conf file
[kingbase@kingbase04 soft]$ vi install.conf
[shrink]
shrink_type="0" # The node type of standby/witness node, which would be delete from cluster. 0:standby 1:witness
primary_ip="192.168.20.251" # The ip addr of cluster primary node, which need to shrink a standby/witness node.
shrink_ip="192.168.20.245" # The ip addr of standby/witness node, which would be delete from cluster.
node_id="4" # The node_id of standby/witness node, which would be delete from cluster. It does not the same with any one in cluster node
# for example: node_id="3"
## Specific instructions ,see it under [install]
install_dir="/home/kingbase/cluster/mykes/kcluster" # the last layer of directory could not add '/'
ssh_port="22" # the port of ssh, default is 22
scmd_port="8890" # the port of sys_securecmd, default is 8890
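Since shrink_type only distinguishes standby and witness nodes, it can be worth confirming the role of the target IP before running the shrink. The sketch below looks the IP up in `repmgr cluster show` output; the function name is illustrative, and it assumes the ten-column layout shown earlier.

```bash
#!/usr/bin/env bash
# Read `repmgr cluster show` output on stdin and print the Role of the node
# whose connection string contains host=<ip>. Prints nothing if no row matches.
role_of_ip() {
  awk -F'|' -v ip="$1" '
    NF >= 10 && index($10 " ", "host=" ip " ") > 0 {
      role = $3; gsub(/[ \t]/, "", role); print role
    }
  '
}
```

A guard such as `[ "$(repmgr cluster show | role_of_ip 192.168.20.245)" = "standby" ]` then matches `shrink_type="0"`; a result of `primary` means the configuration points at the wrong node.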
Cluster scale-in
[kingbase@kingbase04 soft]$ ./cluster_install.sh shrink
[CONFIG_CHECK] will deploy the cluster of
[RUNNING] success connect to the target "192.168.20.245" ..... OK
[RUNNING] success connect to "192.168.20.245" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.168.20.251" ..... OK
[RUNNING] success connect to "192.168.20.251" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.168.20.251 ...
[RUNNING] Primary node ip is 192.168.20.251 ... OK
[CONFIG_CHECK] set install_with_root=0
[INSTALL] load config from cluster.....
[INFO] db_user=system
[INFO] db_port=54321
[INFO] use_scmd=1
[INFO] auto_cluster_recovery_level=0
[INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] check database connection ...
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] shrink_ip[192.168.20.245] is a standby node IP in the cluster ...
[CONFIG_CHECK] shrink_ip[192.168.20.245] is a standby node IP in the cluster ...ok
[CONFIG_CHECK] The localhost is shrink_ip:[192.168.20.245] or primary_ip:[192.168.20.251]...
[CONFIG_CHECK] The localhost is shrink_ip:[192.168.20.245] or primary_ip:[192.168.20.251]...ok
[RUNNING] Primary node ip is 192.168.20.251 ...
[RUNNING] Primary node ip is 192.168.20.251 ... OK
[CONFIG_CHECK] check node_id is in cluster ...
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] The /home/kingbase/cluster/mykes/kcluster/kingbase/bin dir exist on "192.168.20.245" ...
[RUNNING] The /home/kingbase/cluster/mykes/kcluster/kingbase/bin dir exist on "192.168.20.245" ... OK
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | witness | * running | node1 | default | 0 | n/a | | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
4 | node4 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.20.245 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] Del node is standby ...
[INFO] node:192.168.20.245 can be deleted ... OK
[RUNNING] query archive command at 192.168.20.251 ...
[RUNNING] current cluster not config sys_rman,return.
[Wed Feb 26 19:39:19 CST 2025] [INFO] /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr standby unregister --node-id=4 ...
[INFO] connecting to local standby
[INFO] connecting to primary database
[NOTICE] unregistering node 4
[INFO] SET synchronous TO "quorum" on primary host
[INFO] change synchronous_standby_names from "ANY 1( node2,node4)" to "ANY 1( node2)"
[INFO] try to drop slot "repmgr_slot_4" of node 4 on primary node
[WARNING] replication slot "repmgr_slot_4" is still active on node 4
[INFO] standby unregistration complete
[Wed Feb 26 19:39:20 CST 2025] [INFO] /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr standby unregister --node-id=4 ...OK
[Wed Feb 26 19:39:20 CST 2025] [INFO] check db connection ...
[Wed Feb 26 19:39:20 CST 2025] [INFO] check db connection ...ok
2025-02-26 19:39:25 Ready to stop local kbha daemon and repmgrd daemon ...
2025-02-26 19:39:31 begin to stop repmgrd on "[localhost]".
2025-02-26 19:39:32 repmgrd on "[localhost]" stop success.
2025-02-26 19:39:32 Done.
2025-02-26 19:39:32 begin to stop DB on "[localhost]".
waiting for server to shut down.... done
server stopped
2025-02-26 19:39:33 DB on "[localhost]" stop success.
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | witness | * running | node1 | default | 0 | n/a | | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[Wed Feb 26 19:39:33 CST 2025] [INFO] drop replication slot:repmgr_slot_4...
pg_drop_replication_slot
--------------------------
(1 row)
[Wed Feb 26 19:39:34 CST 2025] [INFO] drop replication slot:repmgr_slot_4...OK
[Wed Feb 26 19:39:34 CST 2025] [INFO] modify synchronous parameter configuration...
[Wed Feb 26 19:39:35 CST 2025] [INFO] modify synchronous parameter configuration...ok
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
2 | node2 | standby | running | node1 | default | 100 | 1 | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
3 | node3 | witness | * running | node1 | default | 0 | n/a | | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
Check the current cluster status
[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 19932 | no | n/a
2 | node2 | standby | running | node1 | running | 10930 | no | 1 second(s) ago
3 | node3 | witness | * running | node1 | running | 1905 | no | 1 second(s) ago
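Summary
Whether you scale out and in from the command line or through the graphical tool, the goal is the same. The command-line procedure is not complicated overall; the key step is adjusting the configuration file. The command line also lends itself to automation and to integration with other platforms for an end-to-end workflow. Each approach has its own advantages, so choose whichever suits your situation.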




