kingbaseES R6 集群修改物理IP和VIP案例(V2.0)

原创 jack 2021-11-16

814

案例测试环境：

操作系统： [kingbase@node1 bin]$ cat /etc/centos-release CentOS Linux release 7.2.1511 (Core) 数据库： [kingbase@node1 bin]$ ./ksql -U system test ksql (V8.0) Type "help" for help. test=# select version(); version ----------------------------------------------------------------------------------------------------------- KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit (1 row)

操作步骤总结：

1、查看和确定主备库后，关闭集群（cluster和db）服务。 2、修改系统ip及/etc/hosts文件中ip。 3、修改集群配置文件repmgr.conf中的物理ip和vip信息。 4、重启系统网络服务应用新的物理ip。 5、启动主备库数据库服务。 6、注册主库到集群。 7、关闭备库数据库服务，注册备库到集群并将备库节点重新加入到集群。 8、查看集群服务状态（cluster和db）并启动主备库repmgrd服务。 9、重启集群（sys_monitor.sh）服务验证。

集群IP信息：

序号	主机名	集群节点名	ip地址	VIP	网关地址	角色	备注
1	node1	node248	192.168.7.238	192.168.7.237	192.168.7.1	Standby
2	node2	node249	192.168.7.239	192.168.7.237	192.168.7.1	Primary
修改后的IP信息
1	node1	node248	192.168.7.248	192.168.7.240	192.168.7.1	Standby
2	node2	node249	192.168.7.249	192.168.7.240	192.168.7.1	Primary

一、查看集群主备库状态信息

1、节点状态信息

2、repmgr.conf配置文件信息

[kingbase@node1 etc]$ cat repmgr.conf on_bmj=off node_id=1 node_name='node248' promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf' follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n' conninfo='host=192.168.7.238 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3' log_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log' data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data' sys_bindir='/home/kingbase/cluster/R6HA/KHA/kingbase/bin' ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22' reconnect_attempts=3 reconnect_interval=5 failover='automatic' recovery='manual' monitoring_history='no' trusted_servers='192.168.7.1' virtual_ip='192.168.7.237/24' net_device='enp0s3' ipaddr_path='/sbin' arping_path='/sbin' synchronous='quorum' repmgrd_pid_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgrd.pid' ping_path='/usr/bin'

3、系统 ip 信息

主库：

[kingbase@node2 ~]$ cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.7.238 node1 192.168.7.239 node2 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 08:00:27:48:34:53 brd ff:ff:ff:ff:ff:ff inet 192.168.7.239/24 brd 192.168.7.255 scope global enp0s3 valid_lft forever preferred_lft forever inet 192.168.7.237/24 scope global secondary enp0s3:3 valid_lft forever preferred_lft forever

备库：

[kingbase@node1 bin]$ cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.7.238 node1 192.168.7.239 node2 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff inet 192.168.7.238/24 brd 192.168.7.255 scope global enp0s3

二、修改系统IP并应用新的物理IP

1、关闭集群（cluster和db）

[kingbase@node2 bin]$ ./sys_monitor.sh stop 2021-03-01 12:19:00 Ready to stop all DB ... Service process "node_export" was killed at process 4629 Service process "postgres_ex" was killed at process 4630 Service process "node_export" was killed at process 9260 Service process "postgres_ex" was killed at process 9261 2021-03-01 12:19:07 begin to stop repmgrd on "[192.168.7.238]". 2021-03-01 12:19:08 repmgrd on "[192.168.7.238]" stop success. 2021-03-01 12:19:08 begin to stop repmgrd on "[192.168.7.239]". 2021-03-01 12:19:09 repmgrd on "[192.168.7.239]" stop success. 2021-03-01 12:19:09 begin to stop DB on "[192.168.7.238]". waiting for server to shut down.... done server stopped 2021-03-01 12:19:10 DB on "[192.168.7.238]" stop success. 2021-03-01 12:19:10 begin to stop DB on "[192.168.7.239]". waiting for server to shut down.... done server stopped 2021-03-01 12:19:11 DB on "[192.168.7.239]" stop success. 2021-03-01 12:19:11 Done.

2、修改系统IP

主备库修改： [root@node2 ~]# cat /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.7.248 node1 192.168.7.249 node2 主库： 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 08:00:27:48:34:53 brd ff:ff:ff:ff:ff:ff inet 192.168.7.249/24 brd 192.168.7.255 scope global enp0s3 valid_lft forever preferred_lft forever 备库： 2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000 link/ether 08:00:27:73:47:f6 brd ff:ff:ff:ff:ff:ff inet 192.168.7.248/24 brd 192.168.7.255 scope global enp0s3 valid_lft forever preferred_lft forever

三、修改repmgr.conf配置文件

[kingbase@node1 etc]$ cat repmgr.conf on_bmj=off node_id=1 node_name='node248' promote_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby promote -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf' follow_command='/home/kingbase/cluster/R6HA/KHA/kingbase/bin/repmgr standby follow -f /home/kingbase/cluster/R6HA/KHA/kingbase/etc/repmgr.conf -W --upstream-node-id=%n' conninfo='host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3' log_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log' data_directory='/home/kingbase/cluster/R6HA/KHA/kingbase/data' sys_bindir='/home/kingbase/cluster/R6HA/KHA/kingbase/bin' ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22' reconnect_attempts=3 reconnect_interval=5 failover='automatic' recovery='manual' monitoring_history='no' trusted_servers='192.168.7.1' virtual_ip='192.168.7.240/24' net_device='enp0s3' ipaddr_path='/sbin' arping_path='/sbin' synchronous='quorum' repmgrd_pid_file='/home/kingbase/cluster/R6HA/KHA/kingbase/hamgrd.pid' ping_path='/usr/bin'

四、启动主备库数据库服务

[kingbase@node2 bin]$ ./sys_ctl start -D ../data waiting for server to start....2021-03-01 12:06:25.776 CST [4661] LOG: sepapower extension initialized 2021-03-01 12:06:25.818 CST [4661] LOG: starting KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit 2021-03-01 12:06:25.818 CST [4661] LOG: listening on IPv4 address "0.0.0.0", port 54321 2021-03-01 12:06:25.818 CST [4661] LOG: listening on IPv6 address "::", port 54321 2021-03-01 12:06:25.905 CST [4661] LOG: listening on Unix socket "/tmp/.s.KINGBASE.54321" .2021-03-01 12:06:26.425 CST [4661] LOG: redirecting log output to logging collector process 2021-03-01 12:06:26.425 CST [4661] HINT: Future log output will appear in directory "sys_log". done server started

五、注册主库到集群

1、注册primary到集群

[kingbase@node2 bin]$ ./repmgr primary register -F INFO: connecting to primary database... INFO: "repmgr" extension is already installed NOTICE: PING 192.168.7.240 (192.168.7.240) 56(84) bytes of data. --- 192.168.7.240 ping statistics --- 2 packets transmitted, 0 received, 100% packet loss, time 1000ms WARNING: ping host"192.168.7.240" failed DETAIL: average RTT value is not greater than zero NOTICE: node (ID: 2) acquire the virtual ip 192.168.7.240/24 success NOTICE: primary node record (ID: 2) updated

2、查看集群节点状态

六、注册备库到集群

1、直接注册备库到集群会出现无法连接的故障（需要关闭备库数据库服务）

[kingbase@node1 bin]$ ./repmgr standby register -h 192.168.7.249 -d esrep -U esrep -W -F WARNING: following problems with command line parameters detected: --no-wait will be ignored when executing STANDBY REGISTER INFO: connecting to local node "node248" (ID: 1) WARNING: database connection parameters not required when the standby to be registered is running DETAIL: repmgr uses the "conninfo" parameter in "repmgr.conf" to connect to the standby INFO: connecting to primary database ERROR: connection to database failed DETAIL: could not connect to server: No route to host Is the server running on host "192.168.7.239" and accepting TCP/IP connections on port 54321? DETAIL: attempted to connect using: user=esrep connect_timeout=10 dbname=esrep host=192.168.7.239 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr ERROR: connection to database failed DETAIL: could not connect to server: No route to host Is the server running on host "192.168.7.238" and accepting TCP/IP connections on port 54321? DETAIL: attempted to connect using: user=esrep connect_timeout=10 dbname=esrep host=192.168.7.238 port=54321 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 fallback_application_name=repmgr ERROR: unable to connect to the primary database HINT: a primary node must be configured before registering a standby node

2、关闭备库数据库服务后将备库节点重新加入到集群

1）关闭数据库服务

[kingbase@node1 bin]$ ./sys_ctl stop -D ../data waiting for server to shut down.... done server stopped

2）注册standby到集群

[kingbase@node1 bin]$ ./repmgr standby register -h 192.168.7.249 -U esrep -d esrep -F INFO: connecting to local node "node248" (ID: 1) INFO: connecting to primary database INFO: standby registration complete NOTICE: standby node "node248" (ID: 1) successfully registered

3）将备库节点重新加入到集群

[kingbase@node1 bin]$ ./repmgr node rejoin -h 192.168.7.249 -U esrep -d esrep INFO: timelines are same, this server is not ahead DETAIL: local node lsn is 1/EEFF5AB0, rejoin target lsn is 1/EEFF7E78 NOTICE: setting node 1's upstream to node 2 WARNING: unable to ping "host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3" DETAIL: PQping() returned "PQPING_NO_RESPONSE" NOTICE: begin to start server at 2021-03-01 12:18:21.290629 NOTICE: starting server using "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_ctl -w -t 90 -D '/home/kingbase/cluster/R6HA/KHA/kingbase/data' -l /home/kingbase/cluster/R6HA/KHA/kingbase/bin/logfile start" NOTICE: start server finish at 2021-03-01 12:18:21.906449 NOTICE: NODE REJOIN successful DETAIL: node 1 is now attached to node 2 [kingbase@node1 bin]$ ./repmgr cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+---------+---------+-----------+----------+----------+----------+----------+ 1 | node248 | standby | running | node249 | default | 100 | 4 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 2 | node249 | primary | * running | | default | 100 | 4 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

3、主库查看集群状态和主备流复制状态

1）查看集群节点状态

2）查看主备流复制状态

4、启动主备库repmgrd服务

[kingbase@node2 bin]$ ./repmgrd -d [2021-03-01 12:32:27] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"

七、重启集群服务验证

1、通过sys_monitor.sh启动集群

[kingbase@node2 bin]$ ./sys_monitor.sh restart 2021-03-01 12:20:29 Ready to stop all DB ... There is no service "node_export" running currently. There is no service "postgres_ex" running currently. There is no service "node_export" running currently. There is no service "postgres_ex" running currently. 2021-03-01 12:20:34 begin to stop repmgrd on "[192.168.7.248]". 2021-03-01 12:20:35 repmgrd on "[192.168.7.248]" already stopped. 2021-03-01 12:20:35 begin to stop repmgrd on "[192.168.7.249]". 2021-03-01 12:20:36 repmgrd on "[192.168.7.249]" already stopped. 2021-03-01 12:20:36 begin to stop DB on "[192.168.7.248]". waiting for server to shut down.... done server stopped 2021-03-01 12:20:37 DB on "[192.168.7.248]" stop success. 2021-03-01 12:20:37 begin to stop DB on "[192.168.7.249]". waiting for server to shut down..... done server stopped 2021-03-01 12:20:39 DB on "[192.168.7.249]" stop success. 2021-03-01 12:20:39 Done. 2021-03-01 12:20:39 Ready to start all DB ... 2021-03-01 12:20:39 begin to start DB on "[192.168.7.249]". waiting for server to start.... done server started 2021-03-01 12:20:41 execute to start DB on "[192.168.7.249]" success, connect to check it. 2021-03-01 12:20:42 DB on "[192.168.7.249]" start success. 2021-03-01 12:20:42 Try to ping trusted_servers on host 192.168.7.248 ... 2021-03-01 12:20:45 Try to ping trusted_servers on host 192.168.7.249 ... 2021-03-01 12:20:47 begin to start DB on "[192.168.7.248]". waiting for server to start.... done server started 2021-03-01 12:20:49 execute to start DB on "[192.168.7.248]" success, connect to check it. 2021-03-01 12:20:50 DB on "[192.168.7.248]" start success. ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+---------+---------+-----------+----------+----------+----------+----------+ 1 | node248 | standby | running | node249 | default | 100 | 4 | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 2 | node249 | primary | * running | | default | 100 | 4 | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3 2021-03-01 12:20:50 The primary DB is started. 2021-03-01 12:20:55 Success to load virtual ip [192.168.7.240/24] on primary host [192.168.7.249]. 2021-03-01 12:20:55 Try to ping vip on host 192.168.7.248 ... 2021-03-01 12:20:57 Try to ping vip on host 192.168.7.249 ... 2021-03-01 12:21:00 begin to start repmgrd on "[192.168.7.248]". [2021-03-01 12:21:42] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf" [2021-03-01 12:21:42] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log" 2021-03-01 12:21:01 repmgrd on "[192.168.7.248]" start success. 2021-03-01 12:21:01 begin to start repmgrd on "[192.168.7.249]". [2021-03-01 12:21:02] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf" [2021-03-01 12:21:02] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log" 2021-03-01 12:21:03 repmgrd on "[192.168.7.249]" start success. ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen ----+---------+---------+-----------+----------+---------+------+---------+-------------------- 1 | node248 | standby | running | node249 | running | 5135 | no | 1 second(s) ago 2 | node249 | primary | * running | | running | 6723 | no | n/a 2021-03-01 12:21:10 Done.

2、查看集群节点状态

3、查看主备流复制状态

八、总结

对于集群IP的修改需要停止集群服务（cluster和db），将影响业务的正常运行，所以在集群部署前需要做好IP的规划，避免在后期修改给业务正常运行带来影响。

kingbase

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

kingbaseES R6 集群修改物理IP和VIP案例(V2.0)

评论