Query the data nodes on the CN
lightdb@test=# select nodeid,nodename,nodeport,isactive from pg_dist_node;
nodeid | nodename | nodeport | isactive
--------+-------------+----------+----------
5 | 10.20.30.11 | 5432 | t
6 | 10.20.30.12 | 5432 | t
(2 rows)
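As a rough illustration of how this check could be scripted, the sketch below lists every data node whose isactive flag is not t. It assumes that ltsql accepts the psql-style -A (unaligned) and -t (tuples only) options, which this document does not confirm; inactive_nodes is a hypothetical helper name.

```shell
#!/bin/sh
# inactive_nodes (hypothetical helper): read unaligned rows of the form
# "nodeid|nodename|nodeport|isactive" on stdin and print "nodename:nodeport"
# for every node whose isactive column is not "t".
inactive_nodes() {
    awk -F'|' 'NF >= 4 && $4 != "t" { print $2 ":" $3 }'
}

# Example invocation (assumes ltsql supports psql-style -A and -t):
#   ltsql -p 5432 -At -c "select nodeid,nodename,nodeport,isactive from pg_dist_node" | inactive_nodes
```

With the two active nodes shown above, the function prints nothing.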
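Set up CN high availability with failover support
Operations on the primary machine
In this example, perform the following steps on machine 10:
1. Run lt_ctl stop to stop the CN instance, then edit $LTDATA/lightdb.conf and append ltcluster to shared_preload_libraries, for example: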
shared_preload_libraries='canopy,ltcluster,lt_stat_statements,lt_stat_activity,lt_prewarm,lt_cron,ltaudit,lt_hint_plan'
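2. Run lt_ctl start to restart the CN instance, then use the following commands to add the information required by the high-availability component: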
ltsql -p 5432 -h localhost -dpostgres -c"create extension ltcluster;"
ltsql -p 5432 -h localhost -dpostgres -c"create role ltcluster superuser password 'ltcluster' login;"
ltsql -p 5432 -h localhost -dpostgres -c"create database ltcluster owner ltcluster;"
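3. Add a client authentication entry so that the standby has permission to replicate data from the primary; after the echo, run lt_ctl reload to reload the configuration: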
echo "host replication ltcluster 10.20.30.0/24 trust " >> $LTDATA/lt_hba.conf
lt_ctl reload
4. Run the following sh script to generate the high-availability configuration file ltcluster.conf:
id=10
NODE_NAME=cn10
ip=10.20.30.10
port=5432
ltclusterconf=$LTHOME/etc/ltcluster/ltcluster.conf
echo "
node_id=$id
node_name='$NODE_NAME'
conninfo='host=$ip port=$port user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='$LTDATA'
pg_bindir='$LTHOME/bin'
failover='automatic'
promote_command='$LTHOME/bin/ltcluster standby promote -f $ltclusterconf'
follow_command='$LTHOME/bin/ltcluster standby follow -f $ltclusterconf --upstream-node-id=%n'
restore_command='cp $LTHOME/archive/%f %p'
monitoring_history=true #(Enable monitoring parameters)
monitor_interval_secs=2 #(Define monitoring data interval write time parameter)
connection_check_type='ping'
reconnect_attempts=3 #(Number of attempts to reconnect to primary before failover (default 6))
reconnect_interval=5
standby_disconnect_on_failover =true
log_level=INFO
log_facility=STDERR
log_file='$LTHOME/etc/ltcluster/ltcluster.log'
failover_validation_command='$LTHOME/etc/ltcluster/ltcluster_failover.sh "$LTHOME" "$LTDATA"'
shutdown_check_timeout=1800
use_replication_slots=true
check_lightdb_command='$LTHOME/etc/ltcluster/check_lightdb.sh'
check_lightdb_interval=10
" > $ltclusterconf
5. Register the CN primary node with the following commands and check its status:
$ ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf primary register -F
INFO: connecting to primary database...
NOTICE: attempting to install extension "ltcluster"
NOTICE: "ltcluster" extension successfully installed
NOTICE: primary node record (ID: 10) registered
$ ltclusterd -d -f $LTHOME/etc/ltcluster/ltcluster.conf -p -f $LTHOME/etc/ltcluster/ltclusterd.pid
[2023-02-20 02:03:41] [NOTICE] redirecting logging output to "/data/base/lightdb-x/13.8-22.4/etc/ltcluster/ltcluster2023-02-20_020341.log"
$ ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+------+---------+-----------+----------+----------+----------+----------+------------------------------------------------------------------------------
10 | cn10 | primary | * running | | default | 100 | 1 | host=10.20.30.10 port=5432 user=ltcluster dbname=ltcluster connect_timeout=2
$ ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status
ID | Name | Role | Status | Upstream | ltclusterd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+-------------+-----+---------+--------------------
10 | cn10 | primary | * running | | not running | n/a | n/a | n/a
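The script in step 4 hard-codes the node ID, node name, IP, and port. A minimal sketch of a parameterized variant is shown here, on the assumption that only those four values differ between machines. write_ltcluster_conf is a hypothetical helper name, and only a subset of the keys from step 4 is written to keep the example short; a real file would carry the full set.

```shell
#!/bin/sh
# write_ltcluster_conf ID NAME IP PORT OUTFILE   (hypothetical helper)
# Writes a reduced ltcluster.conf. LTHOME and LTDATA must already be set.
write_ltcluster_conf() {
    conf_id=$1; conf_name=$2; conf_ip=$3; conf_port=$4; conf_out=$5
    # Unquoted heredoc delimiter: shell variables expand, %n stays literal.
    cat > "$conf_out" <<EOF
node_id=$conf_id
node_name='$conf_name'
conninfo='host=$conf_ip port=$conf_port user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='$LTDATA'
pg_bindir='$LTHOME/bin'
failover='automatic'
promote_command='$LTHOME/bin/ltcluster standby promote -f $conf_out'
follow_command='$LTHOME/bin/ltcluster standby follow -f $conf_out --upstream-node-id=%n'
EOF
}

# Example for machine 10:
#   write_ltcluster_conf 10 cn10 10.20.30.10 5432 "$LTHOME/etc/ltcluster/ltcluster.conf"
```

A heredoc is used instead of echo so that quotation marks inside the file body do not interact with the shell's own quoting.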
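Operations on machine 12 (CN standby)
Machine 12 serves as the CN standby; perform the following steps:
1. Modify the sh script from the previous section that generates ltcluster.conf as follows, then run it to generate ltcluster.conf:
# Change the IP, node name, etc. to those of machine 12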
id=12
NODE_NAME=cn12
ip=10.20.30.12
port=5433
ltclusterconf=$LTHOME/etc/ltcluster/ltcluster.conf
echo "
node_id=$id
node_name='$NODE_NAME'
conninfo='host=$ip port=$port user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='$LTDATA'
pg_bindir='$LTHOME/bin'
failover='automatic'
promote_command='$LTHOME/bin/ltcluster standby promote -f $ltclusterconf'
follow_command='$LTHOME/bin/ltcluster standby follow -f $ltclusterconf --upstream-node-id=%n'
restore_command='cp $LTHOME/archive/%f %p'
monitoring_history=true #(Enable monitoring parameters)
monitor_interval_secs=2 #(Define monitoring data interval write time parameter)
connection_check_type='ping'
reconnect_attempts=3 #(Number of attempts to reconnect to primary before failover (default 6))
reconnect_interval=5
standby_disconnect_on_failover =true
log_level=INFO
log_facility=STDERR
log_file='$LTHOME/etc/ltcluster/ltcluster.log'
failover_validation_command='$LTHOME/etc/ltcluster/ltcluster_failover.sh "$LTHOME" "$LTDATA"'
shutdown_check_timeout=1800
use_replication_slots=true
check_lightdb_command='$LTHOME/etc/ltcluster/check_lightdb.sh'
check_lightdb_interval=10 " > $ltclusterconf
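2. Clone the CN primary; the -h parameter is the primary's IP. Depending on the data volume, this can take from a few minutes to several hours.
If a standby instance is already running on this machine, stop it first; if $LTDATA already contains data, add the -F option.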
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf standby clone -h 10.20.30.10 -p 5432 -U ltcluster
3. After the clone completes, start the database, register the standby, and check the status:
lt_ctl start
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf standby register -F
ltclusterd -d -f $LTHOME/etc/ltcluster/ltcluster.conf -p -f $LTHOME/etc/ltcluster/ltclusterd.pid
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf cluster show
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status
An example follows. The cluster monitoring process (ltclusterd) is running, the cluster has one primary and one standby, and the standby's upstream is cn10.
$ ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status
ID | Name | Role | Status | Upstream | ltclusterd | PID | Paused? | Upstream last seen
----+------+---------+-----------+----------+------------+-------+---------+--------------------
10 | cn10 | primary | * running | | running | 27844 | no | n/a
12 | cn12 | standby | running | cn10 | running | 28079 | no | 1 second(s) ago
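The same check can be done without reading the table by eye. The following sketch reports any node whose ltclusterd column is not running; it assumes the column order shown above (ltclusterd in the sixth column), and nodes_without_daemon is a hypothetical helper name.

```shell
#!/bin/sh
# nodes_without_daemon (hypothetical helper): read the output of
# "ltcluster ... service status" on stdin and print the name of every node
# whose ltclusterd column (6th field) is anything other than "running".
nodes_without_daemon() {
    awk -F'|' '
        # Data rows start with a numeric node ID; header and separator rows do not.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            name = $2; daemon = $6
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            gsub(/^[ \t]+|[ \t]+$/, "", daemon)
            if (daemon != "running") print name
        }'
}

# Example invocation:
#   ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status | nodes_without_daemon
```

For the output above the function prints nothing, because ltclusterd is running on both cn10 and cn12.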
Verify that the CN standby supports DML
On CN primary node 10, run the following SQL: ltsql -p 5432
create table the_table(id int, code text, price numeric(8,2));
select create_distributed_table('the_table', 'id');
insert into the_table values (1, '1', 3.439);
insert into the_table values (2, '2', 6.86);
select * from the_table;
On CN standby node 12, run the following SQL: ltsql -p 5432
select * from the_table;
delete from the_table where id = 1; -- fails
ERROR: writing to worker nodes is not currently allowed
DETAIL: the database is read-only
SET canopy.writable_standby_coordinator TO on; -- run on the standby to enable DML there; the DML below then succeeds
lightdb@test=# SET canopy.writable_standby_coordinator TO on;
SET
lightdb@test=# delete from the_table where id = 1;
DELETE 1
select * from the_table;
insert into the_table values (3, '3', 6.86);
To make the setting permanent, add canopy.writable_standby_coordinator = on to lightdb.conf on both CN nodes and run lt_ctl reload.
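Because the same line has to be added on two machines, it can help to make the edit repeatable. The sketch below appends a line only when an identical one is not already present; ensure_conf_line is a hypothetical helper name, and the comparison is an exact whole-line match, so an existing entry with different spacing or a different value is not detected.

```shell
#!/bin/sh
# ensure_conf_line FILE LINE   (hypothetical helper)
# Appends LINE to FILE unless an identical line is already present,
# so the step can be repeated safely.
ensure_conf_line() {
    if ! grep -Fqx -- "$2" "$1" 2>/dev/null; then
        printf '%s\n' "$2" >> "$1"
    fi
}

# Example, to be run on each CN node:
#   ensure_conf_line "$LTDATA/lightdb.conf" "canopy.writable_standby_coordinator = on"
#   lt_ctl reload
```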
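Deploy LVS load balancing
LVS in DR mode is used for load balancing.
First install ipvsadm with yum install ipvsadm, or install it from the rpm package on the installation disc.
Director script: edit the VIP, RIP1, RIP2, ethx (network interface; check with ifconfig), and port variables at the top of the script.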
#!/bin/sh
#
# Startup script handle the initialisation of LVS
# chkconfig: - 28 72
# description: Initialise the Linux Virtual Server for DR
#### BEGIN INIT INFO
# Provides: ipvsadm
# Required-Start: $local_fs $network $named
# Required-Stop: $local_fs $remote_fs $network
# Short-Description: Initialise the Linux Virtual Server
# Description: The Linux Virtual Server is a highly scalable and highly
# available server built on a cluster of real servers, with the load
# balancer running on Linux.
# description: start LVS of DR
LOCK=/var/lock/ipvsadm.lock
VIP=10.20.30.9
RIP1=10.20.30.10 # CN IP
RIP2=10.20.30.12 # CN IP
ethx=eth0
port=5432 # CN port
. /etc/rc.d/init.d/functions
start() {
PID=`ipvsadm -Ln | grep ${VIP} | wc -l`
if [ $PID -gt 0 ];
then
echo "The LVS-DR Server is already running !"
else
#Set the Virtual IP Address
/sbin/ifconfig $ethx:1 $VIP broadcast $VIP netmask 255.255.255.255 up
/sbin/route add -host $VIP dev $ethx:1
#Clear IPVS Table
/sbin/ipvsadm -C
#Set Lvs
#echo $VIP:$port
#echo $RIP1:$port
#echo $RIP2:$port
#echo $RIP3:$port
/sbin/ipvsadm -At $VIP:$port -s rr
/sbin/ipvsadm -at $VIP:$port -r $RIP1:$port -g -w 1
/sbin/ipvsadm -at $VIP:$port -r $RIP2:$port -g -w 1
#/sbin/ipvsadm -at $VIP:$port -r $RIP3:$port -g -w 1
/bin/touch $LOCK
#Run Lvs
echo "starting LVS-DR Server is ok !"
fi
}
stop() {
#clear Lvs and vip
/sbin/ipvsadm -C
/sbin/route del -host $VIP dev $ethx:1
/sbin/ifconfig $ethx:1 down >/dev/null
rm -rf $LOCK
echo "stopping LVS-DR server is ok !"
}
status() {
if [ -e $LOCK ];
then
echo "The LVS-DR Server is already running !"
else
echo "The LVS-DR Server is not running !"
fi
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
status)
status
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
exit 1
esac
exit 0
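The Director script lists each real server by hand (RIP1, RIP2, and a commented-out RIP3). As a sketch of how the same rules could be produced for any number of real servers, the hypothetical helper lvs_rules below prints the ipvsadm commands without executing them, using the same options as the script (-s rr, -g, -w 1).

```shell
#!/bin/sh
# lvs_rules VIP PORT RIP...   (hypothetical helper)
# Prints, without executing them, the ipvsadm commands for a round-robin
# DR-mode virtual service with one equally weighted entry per real server.
lvs_rules() {
    lvs_vip=$1; lvs_port=$2
    shift 2
    echo "/sbin/ipvsadm -A -t $lvs_vip:$lvs_port -s rr"
    for lvs_rip in "$@"; do
        echo "/sbin/ipvsadm -a -t $lvs_vip:$lvs_port -r $lvs_rip:$lvs_port -g -w 1"
    done
}

# Review the output first; it can then be piped to sh as root to apply it:
#   lvs_rules 10.20.30.9 5432 10.20.30.10 10.20.30.12
```

Printing the commands rather than running them keeps the sketch safe to try on a machine that is not a Director.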
RealServer script: edit the VIP and ethx (network interface; check with ifconfig) variables at the top of the script.
#!/bin/sh
## Startup script handle the initialisation of LVS
# chkconfig: - 28 72
# description: Initialise the Linux Virtual Server for DR
#### BEGIN INIT INFO
# Provides: ipvsadm
# Required-Start: $local_fs $network $named
# Required-Stop: $local_fs $remote_fs $network
# Short-Description: Initialise the Linux Virtual Server
# Description: The Linux Virtual Server is a highly scalable and highly
# available server built on a cluster of real servers, with the load
# balancer running on Linux.
# description: start LVS of DR-RIP
LOCK=/var/lock/ipvsadm.lock
VIP=10.20.30.9
ethx=eth0
. /etc/rc.d/init.d/functions
start() {
PID=`ifconfig | grep lo:0 | wc -l`
if [ $PID -ne 0 ];
then
echo "The LVS-DR-RIP Server is already running !"
else
/sbin/ifconfig lo:0 $VIP netmask 255.255.255.255 broadcast $VIP up
/sbin/route add -host $VIP dev lo:0
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/$ethx/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/$ethx/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce
/bin/touch $LOCK
echo "starting LVS-DR-RIP server is ok !"
fi
}
stop() {
/sbin/route del -host $VIP dev lo:0
/sbin/ifconfig lo:0 down >/dev/null
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "0" >/proc/sys/net/ipv4/conf/$ethx/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/$ethx/arp_announce
echo "0" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "0" >/proc/sys/net/ipv4/conf/all/arp_announce
rm -rf $LOCK
echo "stopping LVS-DR-RIP server is ok !"
}
status() {
if [ -e $LOCK ];
then
echo "The LVS-DR-RIP Server is already running !"
else
echo "The LVS-DR-RIP Server is not running !"
fi
}
case "$1" in
start)
start
;;
stop)
stop
;;
restart)
stop
start
;;
status)
status
;;
*)
echo "Usage: $0 {start|stop|restart|status}"
exit 1
esac
exit 0
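The RealServer script sets arp_ignore to 1 and arp_announce to 2 for lo, the physical interface, and all. A small sketch for confirming that those values are in effect follows; arp_settings_ok is a hypothetical helper name, and the base directory is passed as an argument so the function can also be tried against a scratch directory.

```shell
#!/bin/sh
# arp_settings_ok BASE IFACE...   (hypothetical helper)
# Succeeds only if every listed interface has arp_ignore=1 and arp_announce=2
# under BASE (normally /proc/sys/net/ipv4/conf).
arp_settings_ok() {
    arp_base=$1
    shift
    for arp_if in "$@"; do
        [ "$(cat "$arp_base/$arp_if/arp_ignore" 2>/dev/null)" = "1" ] || return 1
        [ "$(cat "$arp_base/$arp_if/arp_announce" 2>/dev/null)" = "2" ] || return 1
    done
    return 0
}

# Example on a RealServer whose interface is eth0:
#   arp_settings_ok /proc/sys/net/ipv4/conf lo eth0 all && echo "ARP settings applied"
```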
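Because machines are limited in this example, 10 is both the CN primary (and therefore a RealServer) and the LVS director; 12 is the CN standby (a RealServer). Upload both the Director and RealServer scripts (saved here as lvs-dr and lvs-rs) to /etc/init.d on 10 and the RealServer script to /etc/init.d on 12, make them executable with chmod +x, and start the services: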
# 10
./lvs-dr start
./lvs-rs start
# 12
./lvs-rs start
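Use ip a to check whether the virtual address has been added to the corresponding network interface.
Open several clients (for example ltsql) connected to the VIP, then run ipvsadm -Ln --stats on the Director to view the load distribution.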
# ipvsadm -Ln --stats
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Conns InPkts OutPkts InBytes OutBytes
-> RemoteAddress:Port
TCP 10.19.70.166:15858 5 16 15 2918 5763
-> 10.20.30.10:15858 3 7 6 1320 2461
-> 10.20.30.12:15858 2 9 9 1598 3302
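To turn these counters into a quick view of how evenly connections are spread, the sketch below computes each real server's share of the total Conns value. conn_share is a hypothetical helper name, and the parsing assumes plain integer counters; if ipvsadm abbreviates large numbers, those rows are skipped.

```shell
#!/bin/sh
# conn_share (hypothetical helper): read "ipvsadm -Ln --stats" output on stdin
# and print each real server with its percentage of the total connection count.
conn_share() {
    awk '
        # Real-server rows look like: -> ADDRESS:PORT Conns InPkts OutPkts ...
        $1 == "->" && $3 ~ /^[0-9]+$/ {
            n++; addr[n] = $2; conns[n] = $3; total += $3
        }
        END {
            for (i = 1; i <= n; i++) {
                pct = (total > 0) ? conns[i] * 100 / total : 0
                printf "%s %.0f%%\n", addr[i], pct
            }
        }'
}

# Example invocation on the Director:
#   ipvsadm -Ln --stats | conn_share
```

For the sample output above, with 3 and 2 connections out of 5, this reports 60% for 10.20.30.10 and 40% for 10.20.30.12.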




