
Scaling a KingbaseES Read/Write Splitting Cluster Out and In from the Command Line


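Introduction

A KingbaseES read/write splitting cluster can be expanded and shrunk either through the graphical interface or from the command line. The graphical route is the simpler of the two, since you only add or remove nodes in the node view. This article covers the command-line route.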

Environment

Operating system: CentOS Linux release 7.6.1810

Database version: KingbaseES_V009R001C002B0014

IP plan:

Server IP         Role
192.168.20.251    Primary node
192.168.20.252    Standby node
192.168.20.253    Witness node
192.168.20.245    Node to be added and later removed

Procedure

Server environment preparation

Check the operating system information

[root@kingbase04 /]# cat /etc/*release*
CentOS Linux release 7.6.1810 (Core) 
Derived from Red Hat Enterprise Linux 7.6 (Source)
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

CentOS Linux release 7.6.1810 (Core) 
CentOS Linux release 7.6.1810 (Core) 
cpe:/o:centos:centos:7

Check memory and disk space

[root@ks01 /]# free -m
              total        used        free      shared  buff/cache   available
Mem:          23947         726       19761         102        3459       21295
Swap:          8063           0        8063
[root@dmdem kingbasev9]# df -h
Filesystem               Size  Used Avail Use% Mounted on
devtmpfs                 2.0G     0  2.0G   0% /dev
tmpfs                    2.0G     0  2.0G   0% /dev/shm
tmpfs                    2.0G  8.9M  2.0G   1% /run
tmpfs                    2.0G     0  2.0G   0% /sys/fs/cgroup
/dev/mapper/centos-root   92G   32G   60G  35% /
/dev/sda1                197M  146M   51M  75% /boot
tmpfs                    396M     0  396M   0% /run/user/2001

Configure kernel parameters

vi /etc/sysctl.conf

fs.aio-max-nr= 1048576
fs.file-max= 6815744
kernel.shmall= 2097152
kernel.shmmax= 4294967295
kernel.shmmni= 4096
kernel.sem= 250 32000 100 128
net.ipv4.ip_local_port_range= 9000 65500
net.core.rmem_default= 262144
net.core.rmem_max= 4194304
net.core.wmem_default= 262144
net.core.wmem_max= 1048576

Apply the settings

/sbin/sysctl -p
/sbin/sysctl -a
  • Resource limit parameters
vi /etc/security/limits.conf

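# "*" applies to all users; you can instead set limits only for root and kingbase
* soft nofile 65536
# Note: the hard limit for nofile must not exceed /proc/sys/fs/nr_open, or you will be unable to log in again after logging out
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
# "unlimited" means no limit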
* soft core unlimited
* hard core unlimited
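
A soft limit only takes effect up to its hard limit, so it helps to scan the file for pairs where soft exceeds hard before logging out. The following is a minimal sketch in POSIX shell and awk. The function name check_limits is hypothetical and not part of any KingbaseES tooling, and it only inspects the single file it is given (files under /etc/security/limits.d/ and entries of type "-" are not considered).

```shell
# check_limits FILE: print "domain item soft hard" for every pair whose soft
# value is greater than its hard value. Returns 1 if any such pair is found.
check_limits() {
    awk '
        function unlimited(v) { return v == "unlimited" || v == "infinity" || v == "-1" }
        /^[ \t]*#/ || NF < 4 { next }              # skip comments and incomplete lines
        $2 == "soft" { soft[$1 " " $3] = $4 }
        $2 == "hard" { hard[$1 " " $3] = $4 }
        END {
            bad = 0
            for (k in soft) {
                if (!(k in hard)) continue
                s = soft[k]; h = hard[k]
                if (unlimited(h)) continue         # nothing can exceed an unlimited hard limit
                if (unlimited(s) || s + 0 > h + 0) { print k, s, h; bad = 1 }
            }
            exit bad
        }
    ' "$1"
}
```

For example, check_limits /etc/security/limits.conf prints nothing and returns 0 when every soft value is at or below its hard value.
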
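  • RemoveIPC parameter

RemoveIPC is a feature introduced by the systemd-logind service: when a user logs out, all IPC objects belonging to that user are removed. It is controlled by the RemoveIPC parameter in /etc/systemd/logind.conf. Some operating systems enable it by default, which can cause problems such as running programs losing their semaphores. Only Red Hat 7 and later and some Chinese domestic Linux distributions need this change; check whether the default is yes before editing. Set RemoveIPC=no, then restart the service: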

systemctl daemon-reload
systemctl restart systemd-logind.service
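
Before editing, the current value can be read back from the file. This is a minimal sketch, assuming the setting lives in the main /etc/systemd/logind.conf and not in a drop-in directory; the function name removeipc_setting is hypothetical.

```shell
# removeipc_setting FILE: print the RemoveIPC value from the last uncommented
# assignment in a logind.conf-style file, or "unset" if there is none
# (in which case the default compiled into systemd applies).
removeipc_setting() {
    awk -F= '
        /^[ \t]*RemoveIPC[ \t]*=/ {
            v = $2
            gsub(/[ \t]/, "", v)      # drop surrounding whitespace
            val = v                   # a later assignment overrides an earlier one
        }
        END { print (val == "" ? "unset" : val) }
    ' "$1"
}
```

On the node being prepared, this would be called as removeipc_setting /etc/systemd/logind.conf.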

Create the installation user

[root@kingbase04 ~]# useradd -m kingbase
[root@kingbase04 ~]# id kingbase
uid=1003(kingbase) gid=1004(kingbase) groups=1004(kingbase)
[root@kingbase04 ~]# echo "es123456"|passwd --stdin kingbase
Changing password for user kingbase.
passwd: all authentication tokens updated successfully.

Scale-out operation

Check the current cluster status

[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | witness | * running | node1    | default  | 0        | n/a      |         | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

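Prepare the files needed for expansion

Obtain the following files from a node where the cluster was previously deployed (or from /ClientTools/guitools/DeployTools/ in the database installation package):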

  • db.zip

  • cluster_install.sh

  • install.conf

  • trust_cluster.sh

  • the required license.dat file

Once you have located these files, upload them to the node being added.

# Node to be added
[kingbase@kingbase04 ~]$ pwd
/home/kingbase
[kingbase@kingbase04 ~]$ ll
total 0
[kingbase@kingbase04 ~]$ mkdir soft
[kingbase@kingbase04 ~]$ cd soft
[kingbase@kingbase04 soft]$ pwd
/home/kingbase/soft

# On cluster node 1: transfer the files with scp
[kingbase@ogauss1 zip]$ pwd
/opt/Kingbase/ES/V9/ClientTools/guitools/DeployTools/zip
[kingbase@ogauss1 zip]$ scp * 192.168.20.245:/home/kingbase/soft

[kingbase@ogauss1 ~]$ scp /home/kingbase/cluster/mykes/kcluster/kingbase/bin/license.dat 192.168.20.245:/home/kingbase/soft
license.dat  

Obtain the following files from /home/kingbase/cluster/mykes/kcluster/kingbase/bin on a node where the cluster was previously deployed:

  • root_env_init.sh

  • root_env_check.sh

  • arping

# On cluster node 1: transfer the files with scp
[kingbase@ogauss1 bin]$ pwd
/home/kingbase/cluster/mykes/kcluster/kingbase/bin

[root@ogauss1 bin]# scp root_env_init.sh root_env_check.sh arping  192.168.20.245:/home/kingbase/soft

With the files in place, the directory looks like this:

[kingbase@kingbase04 ~]$ cd soft/
[kingbase@kingbase04 soft]$ ll
total 322448
-rwxr-xr-x 1 kingbase kingbase     13544 Feb 26 16:05 arping
-rwxr-xr-x 1 kingbase kingbase    252402 Feb 26 16:02 cluster_install.sh
-rw-r--r-- 1 kingbase kingbase 327258132 Feb 26 16:02 db.zip
-rw-r--r-- 1 kingbase kingbase     19379 Feb 26 16:02 install.conf
-rwxr-xr-x 1 kingbase kingbase     11380 Feb 26 16:05 root_env_check.sh
-rwxr-xr-x 1 kingbase kingbase     10680 Feb 26 16:05 root_env_init.sh
-rw-r--r-- 1 kingbase kingbase   2595145 Feb 26 16:02 securecmdd.zip
-rwxr-xr-x 1 kingbase kingbase      9677 Feb 26 16:02 trust_cluster.sh

Configure the install.conf file

[kingbase@kingbase04 soft]$ pwd
/home/kingbase/soft
[kingbase@kingbase04 soft]$ vi install.conf 

[expand]
expand_type="0"                   # The node type of standby/witness node, which would be add to cluster. 0:standby  1:witness
primary_ip="192.168.20.251"                    # The ip addr of cluster primary node, which need to expand a standby/witness node.
expand_ip="192.168.20.245"                     # The ip addr of standby/witness node, which would be add to cluster.
node_id="4"                       # The node_id of standby/witness node, which would be add to cluster. It does not the same with any one in  cluster node
                                 # for example: node_id="3"
sync_type=""                     # the sync_type parameter is used to specify the sync type for expand node. 0:sync 1:potential 2:async
                                 # this parameter is only valid when expand_type="0" and the synchronous parameter of the cluster is set to custom mode.

## Specific instructions ,see it under [install]
install_dir="/home/kingbase/cluster/mykes/kcluster"                   # the last layer of directory could not add '/'
zip_package="/home/kingbase/soft/db.zip"
net_device=()                    # if virtual_ip set,it must be set
net_device_ip=()                 # if virtual_ip set,it must be set
license_file=(license.dat)
deploy_by_sshd="1"
ssh_port="22"
scmd_port="8890"
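
The node_id chosen here must differ from every ID already registered in the cluster. One way to check is to read the `repmgr cluster show` table from an existing node, sketched below. The helper names used_node_ids and node_id_is_free are hypothetical, and the parsing assumes the pipe-separated layout of that output.

```shell
# used_node_ids: read `repmgr cluster show` output on stdin and print the
# node IDs already registered, one per line.
used_node_ids() {
    awk -F'|' '
        { id = $1; gsub(/[ \t]/, "", id) }
        id ~ /^[0-9]+$/ { print id }      # header and separator rows are skipped
    '
}

# node_id_is_free ID: succeed only if ID is absent from the table on stdin.
node_id_is_free() {
    _ids=$(used_node_ids)
    for _id in $_ids; do
        [ "$_id" = "$1" ] && return 1
    done
    return 0
}
```

On an existing cluster node this could be used as: /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr cluster show | node_id_is_free 4 && echo "node_id 4 is free"
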

Configure passwordless SSH

First, run root_env_init.sh kingbase as root to initialize the environment.

# Node to be added
[root@kingbase04 soft]# ./root_env_init.sh kingbase
[Wed Feb 26 16:29:35 CST 2025] [INFO] change UsePAM ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change UsePAM ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change ulimit ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change ulimit ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change kernel.sem ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change kernel.sem ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] no need to change "/etc/profile"
[Wed Feb 26 16:29:35 CST 2025] [INFO] stop selinux ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] stop selinux ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change RemoveIPC ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change RemoveIPC ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] change DefaultTasksAccounting ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] change DefaultTasksAccounting ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping6 ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /bin/ping6 ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /sbin/ip ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /sbin/ip ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] copy /opt/kes/bin/arping ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] copy /opt/kes/bin/arping ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /opt/kes/bin/arping ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /opt/kes/bin/arping ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /usr/bin/crontab ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] chmod /usr/bin/crontab ... Done
[Wed Feb 26 16:29:35 CST 2025] [INFO] configuration to take effect ...
[Wed Feb 26 16:29:35 CST 2025] [INFO] configuration to take effect ... Done

When it finishes, run root_env_check.sh kingbase as root to verify that all initialization changes were applied.

# Node to be added
[root@kingbase04 soft]# ./root_env_check.sh kingbase
[Wed Feb 26 16:29:51 CST 2025] [INFO] [su - kingbase -c "echo su_info_check"] su_info_check
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ulimit.open files] 655360
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ulimit.open proc] 655360
[Wed Feb 26 16:29:51 CST 2025] [INFO] [kernel.sem] 5010 641280 5010 256
[Wed Feb 26 16:29:51 CST 2025] [INFO] [RemoveIPC] no
[Wed Feb 26 16:29:51 CST 2025] [INFO] [DefaultTasksAccounting] no
[Wed Feb 26 16:29:51 CST 2025] [INFO] [crond] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [SELINUX] disabled
[Wed Feb 26 16:29:51 CST 2025] [INFO] [firewall] down
[Wed Feb 26 16:29:51 CST 2025] [INFO] [The memory] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping6 command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ping6 access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [/bin/cp --version] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ip command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [ip access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [arping command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [arping -U command] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [arping access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [crontab command path] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [crontab access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [kingbase crontab access] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [sys_securecmdd dir] OK
[Wed Feb 26 16:29:51 CST 2025] [INFO] [sys_securecmdd user dir] OK

As the cluster deployment user (for deployments that do not rely on root), run trust_cluster.sh to set up passwordless SSH.

# Node to be added
[kingbase@kingbase04 soft]$ ./trust_cluster.sh
[ERROR] [all_ip] and [production_ip] both are empty, please check your [install.conf] file

# Adjust the following parameters
[kingbase@kingbase04 soft]$ vi install.conf 
all_ip=(192.168.20.251 192.168.20.252 192.168.20.253 192.168.20.245)
install_with_root=0

[kingbase@kingbase04 soft]$ ./trust_cluster.sh 
[INFO] set password-free only between kingbase
Generating public/private rsa key pair.
Your identification has been saved in /home/kingbase/.ssh/id_rsa.
Your public key has been saved in /home/kingbase/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:Q/2F0MkynV/7ix8lvf/DoHpV+RS8e77lbIJ115v3eGQ kingbase@kingbase04
The key's randomart image is:
+---[RSA 2048]----+
|          .+ o.  |
|         .o.*. o.|
|        . .o....=|
|       .   . ..*.|
|        S   . o.B|
|         .   o.oE|
|            oo+BB|
|           o...XO|
|         .o   o*%|
+----[SHA256]-----+
Warning: Permanently added '192.168.20.251' (ECDSA) to the list of known hosts.
kingbase@192.168.20.251's password: 
known_hosts                                                                                                                                                        100%  347    45.1KB/s   00:00    
id_rsa                                                                                                                                                             100% 1679   283.6KB/s   00:00    
id_rsa.pub                                                                                                                                                         100%  401   303.4KB/s   00:00    
authorized_keys                                                                                                                                                    100%  401   202.8KB/s   00:00    
Warning: Permanently added '192.168.20.252' (ECDSA) to the list of known hosts.
kingbase@192.168.20.252's password: 
known_hosts                                                                                                                                                        100%  523    85.1KB/s   00:00    
id_rsa                                                                                                                                                             100% 1679   278.9KB/s   00:00    
id_rsa.pub                                                                                                                                                         100%  401   327.1KB/s   00:00    
authorized_keys                                                                                                                                                    100%  401   333.0KB/s   00:00    
Warning: Permanently added '192.168.20.253' (ECDSA) to the list of known hosts.
Password: 
known_hosts                                                                                                                                                        100%  699     1.1MB/s   00:00    
id_rsa                                                                                                                                                             100% 1679     2.9MB/s   00:00    
id_rsa.pub                                                                                                                                                         100%  401   745.8KB/s   00:00    
authorized_keys                                                                                                                                                    100%  401   760.4KB/s   00:00    
Warning: Permanently added '192.168.20.245' (ECDSA) to the list of known hosts.
known_hosts                                                                                                                                                        100%  875     1.6MB/s   00:00    
id_rsa                                                                                                                                                             100% 1679     3.8MB/s   00:00    
id_rsa.pub                                                                                                                                                         100%  401   744.3KB/s   00:00    
authorized_keys                                                                                                                                                    100%  401   889.3KB/s   00:00    
connect to "192.168.20.251" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.252" from "192.168.20.251" by 'ssh' kingbase->kingbase:0 .... OK
connect to "192.168.20.252" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.253" from "192.168.20.252" by 'ssh' kingbase->kingbase:0 .... OK
connect to "192.168.20.253" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.245" from "192.168.20.253" by 'ssh' kingbase->kingbase:0 .... OK
connect to "192.168.20.245" from current node by 'ssh' kingbase:0..... OK
connect to "192.168.20.251" from "192.168.20.245" by 'ssh' kingbase->kingbase:0 .... OK
check ssh connection success!

Expand the cluster

[kingbase@kingbase04 soft]$ ./cluster_install.sh expand
[CONFIG_CHECK] will deploy the cluster of 
[RUNNING] success connect to the target "192.168.20.245" ..... OK
[RUNNING] success connect to "192.168.20.245" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.168.20.251" ..... OK
[RUNNING] success connect to "192.168.20.251" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.168.20.251 ...
[RUNNING] Primary node ip is 192.168.20.251 ... OK
[CONFIG_CHECK] set install_with_root=0
[INSTALL] load config from cluster.....
 [INFO] db_user=system
 [INFO] db_port=54321
 [INFO] use_scmd=1
 [INFO] data_directory=/home/kingbase/cluster/mykes/kcluster/kingbase/data
 [INFO] scmd_port=8890
 [INFO] recovery=standby
 [INFO] use_check_disk=off
 [INFO] trusted_servers=127.0.0.1
 [INFO] reconnect_attempts=10
 [INFO] reconnect_interval=6
 [INFO] auto_cluster_recovery_level=0
 [INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] success to access license_file: /home/kingbase/soft/license.dat
[CONFIG_CHECK] file format is correct ... OK
[CONFIG_CHECK] check database connection ... 
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] expand_ip[192.168.20.245] is not used in the cluster ...
[CONFIG_CHECK] expand_ip[192.168.20.245] is not used in the cluster ...ok
[CONFIG_CHECK] The localhost is expand_ip:[192.168.20.245] ...
[CONFIG_CHECK] The localhost is expand_ip:[192.168.20.245] ...ok
[CONFIG_CHECK] check node_id is in cluster ... 
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] check the db is running or not...
[RUNNING] the db is not running on "192.168.20.245:54321" ..... OK
[RUNNING] the install dir is not exist on "192.168.20.245" ..... OK
[RUNNING] check the sys_securecmdd is running or not...
[RUNNING] the sys_securecmdd is not running on "192.168.20.245:8890" ..... OK
 [INFO] use_ssl=0
[INSTALL] create the install dir "/home/kingbase/cluster/mykes/kcluster/kingbase" on 192.168.20.245 ...
[INSTALL] success to create the install dir "/home/kingbase/cluster/mykes/kcluster/kingbase" on "192.168.20.245" ..... OK
[INSTALL] try to copy the zip package "/home/kingbase/soft/db.zip" to /home/kingbase/cluster/mykes/kcluster/kingbase of "192.168.20.245" .....
[INSTALL] success to scp the zip package "/home/kingbase/soft/db.zip" /home/kingbase/cluster/mykes/kcluster/kingbase of to "192.168.20.245" ..... OK
[INSTALL] decompress the "/home/kingbase/cluster/mykes/kcluster/kingbase" to "/home/kingbase/cluster/mykes/kcluster/kingbase" on 192.168.20.245
[INSTALL] success to decompress the "/home/kingbase/cluster/mykes/kcluster/kingbase/db.zip" to "/home/kingbase/cluster/mykes/kcluster/kingbase" on "192.168.20.245"..... OK
[INSTALL] check license_file "license.dat"
[INSTALL] Scp license to /home/kingbase/cluster/mykes/kcluster/kingbase/../license.dat on 192.168.20.245
[INSTALL] success to copy /home/kingbase/soft/license.dat to /home/kingbase/cluster/mykes/kcluster/kingbase/../ on 192.168.20.245
[RUNNING] config sys_securecmdd and start it ...
[RUNNING] config the sys_securecmdd port to 8890 ...
[RUNNING] success to config the sys_securecmdd port on 192.168.20.245 ... OK
successfully initialized the sys_securecmdd, please use "/home/kingbase/cluster/mykes/kcluster/kingbase/bin/sys_HAscmdd.sh start" to start the sys_securecmdd
[RUNNING] success to config sys_securecmdd on 192.168.20.245 ... OK
[RUNNING] success to start sys_securecmdd on 192.168.20.245 ... OK
[INSTALL] success to access file: /home/kingbase/cluster/mykes/kcluster/kingbase/etc/all_nodes_tools.conf
[INSTALL] success to scp the /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf from 192.168.20.251 to "192.168.20.245"..... ok
[INSTALL] success to scp the ~/.encpwd from 192.168.20.251 to "192.168.20.245"..... ok
[INSTALL] success to scp /home/kingbase/cluster/mykes/kcluster/kingbase/etc/all_nodes_tools.conf from "192.168.20.251" to "192.168.20.245" ...ok
[INSTALL] success to chmod 600 the ~/.encpwd on 192.168.20.245..... ok
 [INFO] parameter_name=node_id
 [INFO] parameter_values='4'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*node_id[ ]*=/cnode_id='4'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
 [INFO] parameter_name=node_name
 [INFO] parameter_values='node4'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*node_name[ ]*=/cnode_name='node4'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
 [INFO] parameter_name=conninfo
 [INFO] parameter_values='host
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*conninfo[ ]*=/cconninfo='host=192.168.20.245 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
 [INFO] parameter_name=ping_path
 [INFO] parameter_values='/bin'
 [INFO] [parameter_name] para_exist=1
 [INFO] sed -i "/[#]*ping_path[ ]*=/cping_path='/bin'" /home/kingbase/cluster/mykes/kcluster/kingbase/etc/repmgr.conf
[RUNNING] standby clone ...
[WARNING] following problems with command line parameters detected:
  -D/--sysdata will be ignored if a repmgr configuration file is provided
[NOTICE] destination directory "/home/kingbase/cluster/mykes/kcluster/kingbase/data" provided
[INFO] connecting to source node
[DETAIL] connection string is: host=192.168.20.251 user=esrep port=54321 dbname=esrep
[DETAIL] current installation size is 88 MB
[NOTICE] checking for available walsenders on the source node (2 required)
[NOTICE] checking replication connections can be made to the source server (2 required)
[INFO] checking and correcting permissions on existing directory "/home/kingbase/cluster/mykes/kcluster/kingbase/data"
[INFO] creating replication slot as user "esrep"
[NOTICE] starting backup (using sys_basebackup)...
[INFO] executing:
  /home/kingbase/cluster/mykes/kcluster/kingbase/bin/sys_basebackup -l "repmgr base backup"  -D /home/kingbase/cluster/mykes/kcluster/kingbase/data -h 192.168.20.251 -p 54321 -U esrep -c fast -X stream -S repmgr_slot_4 
[NOTICE] standby clone (using sys_basebackup) complete
[NOTICE] you can now start your Kingbase server
[HINT] for example: sys_ctl -D /home/kingbase/cluster/mykes/kcluster/kingbase/data start
[HINT] after starting the server, you need to register this standby with "repmgr standby register"
[RUNNING] standby clone ...OK
[RUNNING] db start ...
waiting for server to start.... done
server started
[RUNNING] db start ...OK
[INFO] connecting to local node "node4" (ID: 4)
[INFO] connecting to primary database
[WARNING] --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
[INFO] standby registration complete
[NOTICE] standby node "node4" (ID: 4) successfully registered
2025-02-26 19:31:50 begin to start DB on "[localhost]".
2025-02-26 19:31:51 DB on "[localhost]" already started, connect to check it.
2025-02-26 19:31:52 DB on "[localhost]" start success.
2025-02-26 19:31:52 Ready to start local kbha daemon and repmgrd daemon ...
2025-02-26 19:31:52 begin to start repmgrd on "[localhost]".
[2025-02-26 19:31:53] [NOTICE] using provided configuration file "/home/kingbase/cluster/mykes/kcluster/kingbase/bin/../etc/repmgr.conf"
[2025-02-26 19:31:53] [NOTICE] redirecting logging output to "/home/kingbase/cluster/mykes/kcluster/kingbase/log/hamgr.log"

2025-02-26 19:31:55 repmgrd on "[localhost]" start success.
[2025-02-26 19:31:58] [NOTICE] redirecting logging output to "/home/kingbase/cluster/mykes/kcluster/kingbase/log/kbha.log"

2025-02-26 19:31:59 Done.
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | witness | * running | node1    | default  | 0        | n/a      |         | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 4  | node4 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.20.245 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] query archive command at 192.168.20.251 ...
[RUNNING] current cluster not config sys_rman,return.
You have new mail in /var/spool/mail/kingbase

Note: possibly because of limited machine performance, I adjusted the SSH connection parameters in cluster_install.sh as follows:

ConnectTimeout=30
ServerAliveInterval=5
ServerAliveCountMax=5

ssh -q -o Batchmode=yes -o ConnectTimeout=30 -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -p 22 -o ServerAliveInterval=5 -o ServerAliveCountMax=5

Post-expansion check

[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 19932 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 10930 | no      | 1 second(s) ago    
 3  | node3 | witness | * running | node1    | running | 1905  | no      | 1 second(s) ago    
 4  | node4 | standby |   running | node1    | running | 51445 | no      | 0 second(s) ago  
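
The same check can be scripted. This is a sketch that assumes the column layout of the `repmgr service status` output above (Status in column 4, repmgrd in column 6); unhealthy_nodes is a hypothetical helper name.

```shell
# unhealthy_nodes: read `repmgr service status` output on stdin and print the
# name of every node whose Status or repmgrd column is not "running".
# Returns 0 when all nodes are healthy, 1 otherwise.
unhealthy_nodes() {
    awk -F'|' '
        # strip leading blanks and the "*" marker, and trailing blanks
        function trim(s) { gsub(/^[ \t*]+|[ \t]+$/, "", s); return s }
        BEGIN { bad = 0 }
        trim($1) ~ /^[0-9]+$/ {
            if (trim($4) != "running" || trim($6) != "running") {
                print trim($2)
                bad = 1
            }
        }
        END { exit bad }
    '
}
```

For example: /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr service status | unhealthy_nodes || echo "some nodes need attention"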

Scale-in operation

Check the current cluster status

[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 19932 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 10930 | no      | 1 second(s) ago    
 3  | node3 | witness | * running | node1    | running | 1905  | no      | 1 second(s) ago    
 4  | node4 | standby |   running | node1    | running | 51445 | no      | 0 second(s) ago  

Prepare the files needed for shrinking

[kingbase@kingbase04 soft]$ ll
total 322452
-rwxr-xr-x 1 kingbase kingbase     13544 Feb 26 16:05 arping
-rwxr-xr-x 1 kingbase kingbase    252324 Feb 26 19:21 cluster_install.sh  <=========
-rw-r--r-- 1 kingbase kingbase 327258132 Feb 26 16:02 db.zip
-rw-r--r-- 1 kingbase kingbase     19598 Feb 26 19:05 install.conf  <=========
-rwxr-xr-x 1 kingbase kingbase      3827 Feb 26 17:35 license.dat
-rwxr-xr-x 1 kingbase kingbase     11380 Feb 26 16:05 root_env_check.sh
-rwxr-xr-x 1 kingbase kingbase     10680 Feb 26 16:05 root_env_init.sh
-rw-r--r-- 1 kingbase kingbase   2595145 Feb 26 17:24 securecmdd.zip
-rwxr-xr-x 1 kingbase kingbase      9677 Feb 26 16:02 trust_cluster.sh

Configure the install.conf file

[kingbase@kingbase04 soft]$ vi install.conf 

[shrink]
shrink_type="0"                   # The node type of standby/witness node, which would be delete from cluster. 0:standby  1:witness
primary_ip="192.168.20.251"                    # The ip addr of cluster primary node, which need to shrink a standby/witness node.
shrink_ip="192.168.20.245"                     # The ip addr of standby/witness node, which would be delete from cluster.
node_id="4"                       # The node_id of standby/witness node, which would be delete from cluster. It does not the same with any one in  cluster node
                                 # for example: node_id="3"
## Specific instructions ,see it under [install]
install_dir="/home/kingbase/cluster/mykes/kcluster"                   # the last layer of directory could not add '/'
ssh_port="22"                    # the port of ssh, default is 22
scmd_port="8890"                 # the port of sys_securecmd, default is 8890

Shrink the cluster

[kingbase@kingbase04 soft]$ ./cluster_install.sh shrink
[CONFIG_CHECK] will deploy the cluster of 
[RUNNING] success connect to the target "192.168.20.245" ..... OK
[RUNNING] success connect to "192.168.20.245" from current node by 'ssh' ..... OK
[RUNNING] success connect to the target "192.168.20.251" ..... OK
[RUNNING] success connect to "192.168.20.251" from current node by 'ssh' ..... OK
[RUNNING] Primary node ip is 192.168.20.251 ...
[RUNNING] Primary node ip is 192.168.20.251 ... OK
[CONFIG_CHECK] set install_with_root=0
[INSTALL] load config from cluster.....
 [INFO] db_user=system
 [INFO] db_port=54321
 [INFO] use_scmd=1
 [INFO] auto_cluster_recovery_level=0
 [INFO] synchronous=quorum
[INSTALL] load config from cluster.....OK
[CONFIG_CHECK] check database connection ... 
[CONFIG_CHECK] check database connection ... OK
[CONFIG_CHECK] shrink_ip[192.168.20.245] is a standby node IP in the cluster ...
[CONFIG_CHECK] shrink_ip[192.168.20.245] is a standby node IP in the cluster ...ok 
[CONFIG_CHECK] The localhost is shrink_ip:[192.168.20.245] or primary_ip:[192.168.20.251]...
[CONFIG_CHECK] The localhost is shrink_ip:[192.168.20.245] or primary_ip:[192.168.20.251]...ok
[RUNNING] Primary node ip is 192.168.20.251 ...
[RUNNING] Primary node ip is 192.168.20.251 ... OK
[CONFIG_CHECK] check node_id is in cluster ... 
[CONFIG_CHECK] check node_id is in cluster ...OK
[RUNNING] The /home/kingbase/cluster/mykes/kcluster/kingbase/bin dir exist on "192.168.20.245" ... 
[RUNNING] The /home/kingbase/cluster/mykes/kcluster/kingbase/bin dir exist on "192.168.20.245" ... OK
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | witness | * running | node1    | default  | 0        | n/a      |         | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 4  | node4 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.20.245 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[RUNNING] Del node is standby ...
[INFO] node:192.168.20.245 can be deleted ... OK
[RUNNING] query archive command at 192.168.20.251 ...
[RUNNING] current cluster not config sys_rman,return.
[Wed Feb 26 19:39:19 CST 2025] [INFO] /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr standby unregister --node-id=4 ...
[INFO] connecting to local standby
[INFO] connecting to primary database
[NOTICE] unregistering node 4
[INFO] SET synchronous TO "quorum" on primary host 
[INFO] change synchronous_standby_names from "ANY 1( node2,node4)" to "ANY 1( node2)"
[INFO] try to drop slot "repmgr_slot_4" of node 4 on primary node
[WARNING] replication slot "repmgr_slot_4" is still active on node 4
[INFO] standby unregistration complete
[Wed Feb 26 19:39:20 CST 2025] [INFO] /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr standby unregister --node-id=4 ...OK
[Wed Feb 26 19:39:20 CST 2025] [INFO] check db connection ...
[Wed Feb 26 19:39:20 CST 2025] [INFO] check db connection ...ok
2025-02-26 19:39:25 Ready to stop local kbha daemon and repmgrd daemon ...
2025-02-26 19:39:31 begin to stop repmgrd on "[localhost]".
2025-02-26 19:39:32 repmgrd on "[localhost]" stop success.
2025-02-26 19:39:32 Done.
2025-02-26 19:39:32 begin to stop DB on "[localhost]".
waiting for server to shut down.... done
server stopped
2025-02-26 19:39:33 DB on "[localhost]" stop success.
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | witness | * running | node1    | default  | 0        | n/a      |         | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
[Wed Feb 26 19:39:33 CST 2025] [INFO] drop replication slot:repmgr_slot_4...
 pg_drop_replication_slot 
--------------------------
 
(1 row)

[Wed Feb 26 19:39:34 CST 2025] [INFO] drop replication slot:repmgr_slot_4...OK
[Wed Feb 26 19:39:34 CST 2025] [INFO] modify synchronous parameter configuration...
[Wed Feb 26 19:39:35 CST 2025] [INFO] modify synchronous parameter configuration...ok
 ID | Name  | Role    | Status    | Upstream | Location | Priority | Timeline | LSN_Lag | Connection string                                                                                                                                                       
----+-------+---------+-----------+----------+----------+----------+----------+---------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node1 | primary | * running |          | default  | 100      | 1        |         | host=192.168.20.251 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 2  | node2 | standby |   running | node1    | default  | 100      | 1        | 0 bytes | host=192.168.20.252 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000
 3  | node3 | witness | * running | node1    | default  | 0        | n/a      |         | host=192.168.20.253 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 tcp_user_timeout=9000

Check the current cluster status

[kingbase@ogauss1 ~]$ /home/kingbase/cluster/mykes/kcluster/kingbase/bin/repmgr service status
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 19932 | no      | n/a                
 2  | node2 | standby |   running | node1    | running | 10930 | no      | 1 second(s) ago    
 3  | node3 | witness | * running | node1    | running | 1905  | no      | 1 second(s) ago    
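
When the scale-in is driven by a script rather than by hand, the same check can be automated. The following is a minimal sketch, not part of the original procedure: it assumes the table layout shown above for `repmgr service status`, and the helper names `parse_service_status` and `verify_scale_in` are hypothetical. It confirms that the removed node no longer appears and that every remaining node and its repmgrd daemon report `running`.

```python
# Sketch only: assumes the column layout printed by `repmgr service status` above.
# The function names are illustrative and are not part of KingbaseES or repmgr.

SAMPLE = """\
 ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node1 | primary | * running |          | running | 19932 | no      | n/a
 2  | node2 | standby |   running | node1    | running | 10930 | no      | 1 second(s) ago
 3  | node3 | witness | * running | node1    | running | 1905  | no      | 1 second(s) ago
"""


def parse_service_status(text):
    """Turn the status table into a list of dicts keyed by column header."""
    # The separator row uses '+' instead of '|', so this filter drops it.
    lines = [ln for ln in text.splitlines() if "|" in ln]
    if not lines:
        return []
    header = [col.strip() for col in lines[0].split("|")]
    rows = []
    for ln in lines[1:]:
        cells = [cell.strip() for cell in ln.split("|")]
        if len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def verify_scale_in(text, removed_name):
    """Return a list of problems; an empty list means the check passed."""
    rows = parse_service_status(text)
    problems = []
    if not rows:
        problems.append("no node rows found in the status output")
    for row in rows:
        if row["Name"] == removed_name:
            problems.append(f"{removed_name} is still registered in the cluster")
        # Status is "* running" or "running"; both end with "running".
        if not row["Status"].endswith("running"):
            problems.append(f"{row['Name']} status is {row['Status']!r}")
        if row["repmgrd"] != "running":
            problems.append(f"repmgrd on {row['Name']} is {row['repmgrd']!r}")
    return problems


if __name__ == "__main__":
    issues = verify_scale_in(SAMPLE, "node4")
    print("scale-in verified" if not issues else "\n".join(issues))
```

In practice the text would come from running the command on a cluster node (for example through `subprocess.run`) instead of the embedded sample; returning a list of problems rather than raising makes it easy to report every failed condition at once.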

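Summary

Whether you scale the cluster from the command line or from the graphical interface, the result is the same. The command-line procedure is not complicated overall; the key step is adjusting the configuration file. The command line also lends itself to automation and to integration with other platforms as part of an end-to-end workflow. Each approach has its own advantages, so choose whichever fits your situation.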
