1. Demo Environment Description
1.1 Hostnames and IP Addresses
| Hostname | IP Address | Virtual IP |
| --- | --- | --- |
| server1 | 10.8.0.51 | 10.8.0.54 |
| server2 | 10.8.0.52 | |
| server3 | 10.8.0.53 | |
1.2 Database Version and Configuration Notes
1.3 Pgpool-II Version and Configuration
2. Base Environment Preparation Before Cluster Deployment
[all servers]# ------- run on all servers as the root user
[all servers]$ ------- run on all servers as the postgres user
[server1]#     ------- run on server1 as the root user
[server1]$     ------- run on server1 as the postgres user
postgres=#     ------- run inside psql as the indicated database user
2.1 Configure the hosts File on All Servers
[all servers]# vi /etc/hosts
# Add the following (use your actual host IPs):
10.8.0.51 server1
10.8.0.52 server2
10.8.0.53 server3
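As an optional sanity check (not part of the original procedure), you can confirm that the names resolve and the hosts can reach each other:
[all servers]# ping -c 1 server1
[all servers]# ping -c 1 server2
[all servers]# ping -c 1 server3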
2.2 Install PostgreSQL 12 and Pgpool-II
Step 1: Add the repositories needed to resolve dependencies:
[all servers]# yum install epel-release -y
[all servers]# yum install centos-release-scl -y
Step 2: Install PostgreSQL 12:
[all servers]# yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm -y
[all servers]# yum install postgresql12 postgresql12-libs postgresql12-devel postgresql12-server -y
Step 3: Install Pgpool-II:
[all servers]# yum install http://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-release-4.1-2.noarch.rpm
[all servers]# yum install pgpool-II-pg12-* -y
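If you want to confirm that the packages were installed (an optional check; the exact version strings depend on what the repositories serve at install time):
[all servers]# /usr/pgsql-12/bin/psql --version
[all servers]# pgpool --version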
2.3 Configuration Before the First Startup
Step 1: Create the WAL archive directory (all servers).
Switch to the postgres user:
[all servers]# su - postgres
[all servers]$ mkdir /var/lib/pgsql/archivedir
Step 2: Initialize the database on server1 (do NOT run this on server2 or server3).
Run as root:
/usr/pgsql-12/bin/postgresql-12-setup initdb
Step 3: Edit the main PostgreSQL configuration file on server1 (do NOT run this on server2 or server3):
[server1]$ vi /var/lib/pgsql/12/data/postgresql.conf
# Parameters to modify:
listen_addresses = '*'
port = 5432
archive_mode = on
archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"'
max_wal_senders = 10
max_replication_slots = 10
wal_level = replica
hot_standby = on
wal_log_hints = on
Step 4: Start the PostgreSQL service on server1:
[server1]# systemctl start postgresql-12.service
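Optionally, you can verify that the edited parameters are in effect on the running primary; any of the settings above can be checked the same way with SHOW (two are shown here as examples):
[server1]$ /usr/pgsql-12/bin/psql -c "SHOW wal_level"
[server1]$ /usr/pgsql-12/bin/psql -c "SHOW archive_mode"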
3. Database Cluster Configuration
3.1 Database Cluster Users
| User | Password | Description |
| --- | --- | --- |
| repl | repl | Used for PostgreSQL streaming replication |
| pgpool | pgpool | Used for Pgpool-II health checks and replication delay checks |
| postgres | postgres | Used for online recovery |
3.2 Initialize the System and Database Users
Step 1: Log in to the PostgreSQL console (run on server1 only):
[server1]# su postgres
[server1]$ psql
Step 2: Create the users listed in the table above (the postgres user already exists as a system user and does not need to be created; run on server1 only):
postgres=# SET password_encryption = 'scram-sha-256';
postgres=# CREATE ROLE pgpool WITH LOGIN;
postgres=# CREATE ROLE repl WITH REPLICATION LOGIN;
postgres=# \password pgpool
Enter here (use a strong password in production): pgpool
postgres=# \password repl
Enter here (use a strong password in production): repl
postgres=# \password postgres
Enter here (use a strong password in production): postgres
After setting the passwords, quit psql (\q), log back in, and run the statement below. From then on, the "replication_state" and "replication_sync_state" columns will be populated in the output of show pool_nodes;
postgres=# GRANT pg_monitor TO pgpool;
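To double-check the grant (optional), \du should now list pg_monitor in the "Member of" column for the pgpool role:
postgres=# \du pgpool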
Step 3: Configure the pg_hba.conf file (run on server1 only):
[server1]$ vi /var/lib/pgsql/12/data/pg_hba.conf
host    all             all             0.0.0.0/0            scram-sha-256
host    replication     all             0.0.0.0/0            scram-sha-256
Step 4: Set up passwordless SSH between the cluster servers (run on all servers):
[all servers]# mkdir ~/.ssh            # skip if the directory already exists
[all servers]# cd ~/.ssh
[all servers]# ssh-keygen -t rsa -f id_rsa_pgpool
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
[all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server3
[all servers]# su - postgres
[all servers]$ mkdir ~/.ssh            # skip if the directory already exists
[all servers]$ cd ~/.ssh
[all servers]$ ssh-keygen -t rsa -f id_rsa_pgpool
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
[all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3
Step 5: From both the root user and the postgres user, run the following tests to verify that passwordless login works (run on all servers):
ssh postgres@server1 -i ~/.ssh/id_rsa_pgpool
ssh postgres@server2 -i ~/.ssh/id_rsa_pgpool
ssh postgres@server3 -i ~/.ssh/id_rsa_pgpool
Step 6: Allow the database users repl and postgres to run cluster operations without a password (run on all servers):
[all servers]# su - postgres
[all servers]$ vi /var/lib/pgsql/.pgpass
Add the following (the passwords are the ones set in Step 2 above):
server1:5432:replication:repl:<repl user password>
server2:5432:replication:repl:<repl user password>
server3:5432:replication:repl:<repl user password>
server1:5432:postgres:postgres:<postgres user password>
server2:5432:postgres:postgres:<postgres user password>
server3:5432:postgres:postgres:<postgres user password>
Example:
server1:5432:replication:repl:repl
server2:5432:replication:repl:repl
server3:5432:replication:repl:repl
server1:5432:postgres:postgres:postgres
server2:5432:postgres:postgres:postgres
server3:5432:postgres:postgres:postgres
Step 7: Set the file permissions:
[all servers]$ chmod 600 /var/lib/pgsql/.pgpass
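Once PostgreSQL on server1 has picked up the pg_hba.conf entries from Step 3 (for example after a reload or restart), an optional way to confirm that .pgpass is being used is to connect with -w, which fails instead of prompting when the password file does not supply a matching entry:
[all servers]$ /usr/pgsql-12/bin/psql -h server1 -U postgres -w -c "SELECT 1"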
3.3 Firewall Configuration
Option 1: disable the firewall entirely (run on all servers):
[all servers]# systemctl disable firewalld.service
[all servers]# systemctl stop firewalld.service
Option 2: open the required ports in the firewall (run on all servers):
[all servers]# firewall-cmd --permanent --zone=public --add-service=postgresql
[all servers]# firewall-cmd --permanent --zone=public --add-port=9999/tcp --add-port=9898/tcp --add-port=9000/tcp --add-port=9694/udp
[all servers]# firewall-cmd --reload
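If you used Option 2, you can review the resulting rules for the public zone (optional check):
[all servers]# firewall-cmd --zone=public --list-all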
3.4 Configure the Pgpool-II Service
Step 1: Common configuration
[all servers]# chown -R postgres:postgres /etc/pgpool-II
[all servers]# su - postgres
[all servers]$ cp -p /etc/pgpool-II/pgpool.conf.sample-stream /etc/pgpool-II/pgpool.conf
[all servers]$ vi /etc/pgpool-II/pgpool.conf
Modify the following parameters:
listen_addresses = '*'
sr_check_user = 'pgpool'
sr_check_password = ''
health_check_period = 5
health_check_timeout = 30
health_check_user = 'pgpool'
health_check_password = ''
health_check_max_retries = 3

backend_hostname0 = 'server1'
backend_port0 = 5432
backend_weight0 = 1
backend_data_directory0 = '/var/lib/pgsql/12/data'
backend_flag0 = 'ALLOW_TO_FAILOVER'
backend_application_name0 = 'server1'

backend_hostname1 = 'server2'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/var/lib/pgsql/12/data'
backend_flag1 = 'ALLOW_TO_FAILOVER'
backend_application_name1 = 'server2'

backend_hostname2 = 'server3'
backend_port2 = 5432
backend_weight2 = 1
backend_data_directory2 = '/var/lib/pgsql/12/data'
backend_flag2 = 'ALLOW_TO_FAILOVER'
backend_application_name2 = 'server3'

failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
follow_master_command = '/etc/pgpool-II/follow_master.sh %d %h %p %D %m %H %M %P %r %R'

recovery_user = 'postgres'            # Online recovery user
recovery_password = ''                # Online recovery password
recovery_1st_stage_command = 'recovery_1st_stage'
enable_pool_hba = on
Step 2: Create the scripts called by failover_command and follow_master_command:
[all servers]# su - postgres
[all servers]$ vi /etc/pgpool-II/failover.sh
Add the following:
#!/bin/bash
# This script is run by failover_command.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"
OLD_PRIMARY_NODE_HOST="${11}"
OLD_PRIMARY_NODE_PORT="${12}"

PGHOME=/usr/pgsql-12

logger -i -p local1.info failover.sh: start: failed_node_id=$FAILED_NODE_ID old_primary_node_id=$OLD_PRIMARY_NODE_ID failed_host=$FAILED_NODE_HOST new_master_host=$NEW_MASTER_NODE_HOST

## If there's no master node anymore, skip failover.
if [ $NEW_MASTER_NODE_ID -lt 0 ]; then
    logger -i -p local1.info failover.sh: All nodes are down. Skipping failover.
    exit 0
fi

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info failover.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## If Standby node is down, skip failover.
if [ $FAILED_NODE_ID -ne $OLD_PRIMARY_NODE_ID ]; then
    logger -i -p local1.info failover.sh: Standby node is down. Skipping failover.

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$OLD_PRIMARY_NODE_HOST -i ~/.ssh/id_rsa_pgpool "${PGHOME}/bin/psql -p $OLD_PRIMARY_NODE_PORT -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\""

    if [ $? -ne 0 ]; then
        logger -i -p local1.error failover.sh: drop replication slot "${FAILED_NODE_HOST}" failed
        exit 1
    fi

    exit 0
fi

## Promote Standby node.
logger -i -p local1.info failover.sh: Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}.

ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote

if [ $? -ne 0 ]; then
    logger -i -p local1.error failover.sh: new_master_host=$NEW_MASTER_NODE_HOST promote failed
    exit 1
fi

logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
exit 0
[all servers]$ vi /etc/pgpool-II/follow_master.sh
Add the following:
#!/bin/bash
# This script is run after failover_command to synchronize the Standby with the new Primary.
# First try pg_rewind. If pg_rewind failed, use pg_basebackup.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

# Special values:
#   %d = failed node id
#   %h = failed node hostname
#   %p = failed node port number
#   %D = failed node database cluster path
#   %m = new master node id
#   %H = new master node hostname
#   %M = old master node id
#   %P = old primary node id
#   %r = new master port number
#   %R = new master database cluster path
#   %N = old primary node hostname
#   %S = old primary node port number
#   %% = '%' character

FAILED_NODE_ID="$1"
FAILED_NODE_HOST="$2"
FAILED_NODE_PORT="$3"
FAILED_NODE_PGDATA="$4"
NEW_MASTER_NODE_ID="$5"
NEW_MASTER_NODE_HOST="$6"
OLD_MASTER_NODE_ID="$7"
OLD_PRIMARY_NODE_ID="$8"
NEW_MASTER_NODE_PORT="$9"
NEW_MASTER_NODE_PGDATA="${10}"

PGHOME=/usr/pgsql-12
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl
PCP_USER=pgpool
PGPOOL_PATH=/usr/bin
PCP_PORT=9898

logger -i -p local1.info follow_master.sh: start: Standby node ${FAILED_NODE_ID}

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info follow_master.sh: passwordless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`

if [ $PGVERSION -ge 12 ]; then
    RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf
else
    RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf
fi

## Check the status of Standby
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
    postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status

## If Standby is running, synchronize it with the new Primary.
if [ $? -eq 0 ]; then

    logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID

    # Create replication slot "${FAILED_NODE_HOST}"
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${FAILED_NODE_HOST}');\""

    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "

        set -o errexit

        ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop

        cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${FAILED_NODE_HOST}'
EOT

        if [ ${PGVERSION} -ge 12 ]; then
            touch ${FAILED_NODE_PGDATA}/standby.signal
        else
            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
        fi

        ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\"
    "

    if [ $? -ne 0 ]; then
        logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup.

        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "

            set -o errexit

            # Execute pg_basebackup
            rm -rf ${FAILED_NODE_PGDATA}
            rm -rf ${ARCHIVEDIR}/*
            ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream

            if [ ${PGVERSION} -ge 12 ]; then
                sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
                       -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf
            fi

            cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${FAILED_NODE_HOST}'
EOT

            if [ ${PGVERSION} -ge 12 ]; then
                touch ${FAILED_NODE_PGDATA}/standby.signal
            else
                echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
            fi
        "

        if [ $? -ne 0 ]; then
            # drop replication slot
            ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\""

            logger -i -p local1.error follow_master.sh: end: pg_basebackup failed
            exit 1
        fi
    fi

    # start Standby node on ${FAILED_NODE_HOST}
    ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
        postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l /dev/null -w -D ${FAILED_NODE_PGDATA} start

    # If start Standby successfully, attach this node
    if [ $? -eq 0 ]; then

        # Run pcp_attach_node to attach Standby node to Pgpool-II.
        ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID}

        if [ $? -ne 0 ]; then
            logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed
            exit 1
        fi

    # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
    else
        ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \
            ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')"

        logger -i -p local1.error follow_master.sh: end: follow master command failed
        exit 1
    fi

else
    logger -i -p local1.info follow_master.sh: failed_node_id=${FAILED_NODE_ID} is not running. skipping follow master command
    exit 0
fi

logger -i -p local1.info follow_master.sh: end: follow master command complete
exit 0
[all servers]$ chmod +x /etc/pgpool-II/{failover.sh,follow_master.sh}
3.5 Online Recovery Script Configuration
Step 1: Add the recovery_1st_stage script (run on server1 only):
[server1]# su - postgres
[server1]$ vi /var/lib/pgsql/12/data/recovery_1st_stage
Add the following:
#!/bin/bash
# This script is executed by "recovery_1st_stage" to recover a Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PRIMARY_NODE_PGDATA="$1"
DEST_NODE_HOST="$2"
DEST_NODE_PGDATA="$3"
PRIMARY_NODE_PORT="$4"
DEST_NODE_ID="$5"
DEST_NODE_PORT="$6"

PRIMARY_NODE_HOST=$(hostname)
PGHOME=/usr/pgsql-12
ARCHIVEDIR=/var/lib/pgsql/archivedir
REPLUSER=repl

logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node $DEST_NODE_ID

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null

if [ $? -ne 0 ]; then
    logger -i -p local1.info recovery_1st_stage: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## Get PostgreSQL major version
PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
if [ $PGVERSION -ge 12 ]; then
    RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf
else
    RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf
fi

## Create replication slot "${DEST_NODE_HOST}"
${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_create_physical_replication_slot('${DEST_NODE_HOST}');
EOQ

## Execute pg_basebackup to recover the Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "

    set -o errexit

    rm -rf $DEST_NODE_PGDATA
    rm -rf $ARCHIVEDIR/*

    ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream

    if [ ${PGVERSION} -ge 12 ]; then
        sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
               -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
    fi

    cat > ${RECOVERYCONF} << EOT
primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
recovery_target_timeline = 'latest'
restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
primary_slot_name = '${DEST_NODE_HOST}'
EOT

    if [ ${PGVERSION} -ge 12 ]; then
        touch ${DEST_NODE_PGDATA}/standby.signal
    else
        echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
    fi

    sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf
"

if [ $? -ne 0 ]; then

    ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
SELECT pg_drop_replication_slot('${DEST_NODE_HOST}');
EOQ

    logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
    exit 1
fi

logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete
exit 0
Step 2: Add the pgpool_remote_start script:
[server1]$ vi /var/lib/pgsql/12/data/pgpool_remote_start
Add the following:
#!/bin/bash
# This script is run after recovery_1st_stage to start Standby node.

set -o xtrace
exec > >(logger -i -p local1.info) 2>&1

PGHOME=/usr/pgsql-12
DEST_NODE_HOST="$1"
DEST_NODE_PGDATA="$2"

logger -i -p local1.info pgpool_remote_start: start: remote start Standby node $DEST_NODE_HOST

## Test passwordless SSH
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls /tmp > /dev/null
if [ $? -ne 0 ]; then
    logger -i -p local1.info pgpool_remote_start: passwordless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwordless SSH.
    exit 1
fi

## Start Standby node
ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "$PGHOME/bin/pg_ctl -l /dev/null -w -D $DEST_NODE_PGDATA start"

if [ $? -ne 0 ]; then
    logger -i -p local1.error pgpool_remote_start: $DEST_NODE_HOST PostgreSQL start failed.
    exit 1
fi

logger -i -p local1.info pgpool_remote_start: end: $DEST_NODE_HOST PostgreSQL started successfully.
exit 0

Make both scripts executable:
[server1]$ chmod +x /var/lib/pgsql/12/data/{recovery_1st_stage,pgpool_remote_start}
Step 3: Install the pgpool_recovery extension on the template1 database (run on server1 only):
[server1]# su - postgres
[server1]$ psql template1 -c "CREATE EXTENSION pgpool_recovery"
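To confirm that the extension is now present in template1 (optional):
[server1]$ psql template1 -c "\dx pgpool_recovery"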
3.6 Configure Pgpool-II Client Authentication
[all servers]$ vi /etc/pgpool-II/pool_hba.conf
Add the following:
host    all    pgpool      0.0.0.0/0    scram-sha-256
host    all    postgres    0.0.0.0/0    scram-sha-256
Configure the decryption key used for scram-sha-256 authentication:
[all servers]# su - postgres
[all servers]$ echo 'some string' > ~/.pgpoolkey
[all servers]$ chmod 600 ~/.pgpoolkey
[all servers]$ pg_enc -m -k ~/.pgpoolkey -u pgpool -p
db password: [pgpool user password]      (must match the password set in section 3.2, Step 2)
[all servers]$ pg_enc -m -k ~/.pgpoolkey -u postgres -p
db password: [postgres user password]
Check the contents of pool_passwd:
cat /etc/pgpool-II/pool_passwd
pgpool:AESheq2ZMZjynddMWk5sKP/Rw==
postgres:AESHs/pWL5rtXy2IwuzroHfqg==
3.7 Configure the Pgpool-II Watchdog
[all servers]# vi /etc/pgpool-II/pgpool.conf
Modify the following parameters:
use_watchdog = on

# Virtual IP used by the cluster
delegate_IP = '10.8.0.54'

# Bind the virtual IP to the NIC actually present on your machines. The test
# environment uses ens160; if your machine uses eth0, replace ens160 accordingly.
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev ens160 label ens160:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev ens160'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I ens160'
if_cmd_path = '/sbin'
arping_path = '/usr/sbin'
Specify the watchdog hostname and port.
On server1:
wd_hostname = 'server1'
wd_port = 9000
On server2:
wd_hostname = 'server2'
wd_port = 9000
On server3:
wd_hostname = 'server3'
wd_port = 9000
Configure the connection information for the other pgpool nodes.
On server1:
other_pgpool_hostname0 = 'server2'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
other_pgpool_hostname1 = 'server3'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
On server2:
other_pgpool_hostname0 = 'server1'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
other_pgpool_hostname1 = 'server3'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
On server3:
other_pgpool_hostname0 = 'server1'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
other_pgpool_hostname1 = 'server2'
other_pgpool_port1 = 9999
other_wd_port1 = 9000
Configure the pgpool heartbeat.
On server1:
heartbeat_destination0 = 'server2'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'server3'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''
On server2:
heartbeat_destination0 = 'server1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'server3'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''
On server3:
heartbeat_destination0 = 'server1'
heartbeat_destination_port0 = 9694
heartbeat_device0 = ''
heartbeat_destination1 = 'server2'
heartbeat_destination_port1 = 9694
heartbeat_device1 = ''
3.8 Ignore the pgpool_status File at Pgpool-II Startup
Change OPTS=" -n" to OPTS=" -D -n":
[all servers]# vi /etc/sysconfig/pgpool
...
OPTS=" -D -n"
3.9 Pgpool-II Log Configuration
[all servers]# vi /etc/pgpool-II/pgpool.conf
log_destination = 'syslog'
syslog_facility = 'LOCAL1'    # Syslog local facility. Default to LOCAL0
Create the pgpool log file:
[all servers]# mkdir /var/log/pgpool-II
[all servers]# touch /var/log/pgpool-II/pgpool.log
Edit the rsyslog configuration:
[all servers]# vi /etc/rsyslog.conf
...
*.info;mail.none;authpriv.none;cron.none;LOCAL1.none    /var/log/messages
LOCAL1.*                                                /var/log/pgpool-II/pgpool.log
Configure log rotation:
[all servers]# vi /etc/logrotate.d/syslog
...
/var/log/messages
# Add pgpool.log
/var/log/pgpool-II/pgpool.log
/var/log/secure
Restart the rsyslog service:
[all servers]# systemctl restart rsyslog
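Once Pgpool-II has been started (section 4), its messages should appear in the new log file; an optional way to watch them is:
[all servers]# tail -f /var/log/pgpool-II/pgpool.log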
3.10 PCP Configuration
[all servers]# echo 'pgpool:'`pg_md5 <PCP password>` >> /etc/pgpool-II/pcp.conf
(Use the same password as was set for the pgpool user in section 3.2, Step 2.)
Example:
[all servers]# echo 'pgpool:'`pg_md5 pgpool` >> /etc/pgpool-II/pcp.conf
Configure .pcppass
This allows follow_master_command to run PCP commands without a password. Note that this is done as the root user.
[all servers]# echo 'localhost:9898:pgpool:pgpool' > ~/.pcppass
[all servers]# chmod 600 ~/.pcppass
This completes the Pgpool-II configuration.
4. Starting the Cluster
4.1 Start the Pgpool-II and PostgreSQL Services
Run on all servers:
[all servers]# systemctl start pgpool.service
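With pgpool running and ~/.pcppass in place, you can optionally confirm that PCP commands work without a password prompt; pcp_node_count simply prints the number of configured backend nodes, which should be 3 in this setup:
[server*]# pcp_node_count -h localhost -p 9898 -U pgpool -w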
Run on server1:
[server1]# systemctl start postgresql-12.service
Confirm that the virtual IP is active:
# ping 10.8.0.54
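On the watchdog master you can also check that the virtual IP was actually added to the interface named in if_up_cmd (ens160 in this environment; substitute your NIC). The address 10.8.0.54/24 should appear with the ens160:0 label:
[server*]# ip addr show ens160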
4.2 Start the PostgreSQL Standby Servers
Run on server2:
[server2]# pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 1
Password:
pcp_recovery_node -- Command Successful
The node id passed with -n maps to the servers as follows:
0 server1
1 server2
2 server3
Run on server3:
[server3]# pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 2
Password:
pcp_recovery_node -- Command Successful
Confirm that server2 and server3 are now running as standby nodes
(on any server):
[server*]# psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | up     | 0.333333  | primary | 0          | false             | 0                 |                   |                        | 2020-05-22 21:44:07
 1       | server2  | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 | streaming         | async                  | 2020-05-22 21:29:47
 2       | server3  | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | streaming         | async                  | 2020-05-22 21:44:45
(3 rows)
4.3 Test Pgpool-II High Availability (Watchdog)
Check the watchdog status:
# pcp_watchdog_info -h 10.8.0.54 -p 9898 -U pgpool
Password:
3 YES server1:9999 Linux server1 server1

server1:9999 Linux server1 server1 9999 9000 4 MASTER
server2:9999 Linux server2 server2 9999 9000 7 STANDBY
server3:9999 Linux server3 server3 9999 9000 7 STANDBY
You can see that the current master node of the cluster is server1.
Stop the pgpool service on server1 and check whether server2 or server3 takes over. (Note: at least two pgpool nodes must be running for the virtual IP to come up.)
[server1]# systemctl stop pgpool.service

# pcp_watchdog_info -p 9898 -h 10.8.0.54 -U pgpool
Password:
3 YES server2:9999 Linux server2 server2

server2:9999 Linux server2 server2 9999 9000 4 MASTER      # server2 has become MASTER
server1:9999 Linux server1 server1 9999 9000 10 SHUTDOWN   # server1 is shut down
server3:9999 Linux server3 server3 9999 9000 7 STANDBY     # server3 is still standby
Start the pgpool service on server1 again:
[server1]# systemctl start pgpool.service
[server1]# pcp_watchdog_info -p 9898 -h 10.8.0.54 -U pgpool
Password:
3 YES server2:9999 Linux server2 server2

server2:9999 Linux server2 server2 9999 9000 4 MASTER
server1:9999 Linux server1 server1 9999 9000 7 STANDBY
server3:9999 Linux server3 server3 9999 9000 7 STANDBY
You can see that server1 has come back in the standby state.
4.4 Database Failover Test
Check the backend node information:
[server*]# psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | up     | 0.333333  | primary | 0          | false             | 0                 |                   |                        | 2020-05-22 21:29:47
 1       | server2  | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 | streaming         | async                  | 2020-05-22 21:44:07
 2       | server3  | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | streaming         | async                  | 2020-05-22 21:44:45
(3 rows)
Stop the PostgreSQL service on the primary node and verify that failover occurs:
[server1]$ /usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data -m immediate stop
Check the cluster status:
# psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 |                   |                        | 2020-05-22 21:36:03
 1       | server2  | 5432 | up     | 0.333333  | primary | 0          | true              | 0                 |                   |                        | 2020-05-22 21:36:03
 2       | server3  | 5432 | down   | 0.333333  | standby | 0          | false             | 0                 | streaming         | async                  | 2020-05-22 21:36:15
(3 rows)
You can see that server2 has become the new primary node, and that server3 was marked down as well.
4.5 Recover the Down Nodes Online
[server1]# pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 0
Password:
pcp_recovery_node -- Command Successful

[server3]# pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 2
Password:
pcp_recovery_node -- Command Successful

Then verify that server1 and server3 have started as standby nodes:
[server*]# psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
Password for user pgpool:
 node_id | hostname | port | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | server1  | 5432 | up     | 0.333333  | standby | 0          | false             | 0                 | streaming         | async                  | 2020-06-02 18:26:06
 1       | server2  | 5432 | up     | 0.333333  | primary | 0          | false             | 0                 |                   |                        | 2020-06-02 17:12:46
 2       | server3  | 5432 | up     | 0.333333  | standby | 0          | true              | 0                 | streaming         | async                  | 2020-06-02 18:26:06
(3 rows)




