暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

centos 7.4 pgpool 4.1-II搭建postgresql 12 高可用集群

da哥的运维经验说 2020-06-09
663

1.演示环境说明


1.1 主机名和ip地址


主机名ip地址虚拟IP
server110.8.0.5110.8.0.54
server210.8.0.52
server310.8.0.53

 

 

1.2 数据库版本及配置说明


补充说明
PostgreSQL
12.3
-
port
5432
-
$PGDATA
/var/lib/pgsql/12/data
-
Archive mode
on
/var/lib/pgsql/archivedir
复制槽
Enable
-
服务自动启动
禁止
-

 

1.3 Pgpool-II 版本和配置

 

补充说明
Pgpool-II版本
4.1.2
-
端口
9999
Pgpool-II使用的端口
9898
PCP 进程使用的端口 accepts connections
9000
pgpool-II 服务器上的需要被监控的watchdog的端口
9694
watchdog心跳端口
配置文件
/etc/pgpool-II/pgpool.conf
Pgpool-II 主配置文件
Pgpool-II 启动用户
postgres (Pgpool-II 4.1 or later)
Pgpool-II 4.0 或之前使用的用户为root
运行模式
流复制模式
-
Watchdog
on
存活检测方式: 心跳
服务自动启动
禁止
-

 

 

 

 

 

 

2.集群部署前的基础环境准备

 

[all servers]#         -------表示所有服务器用root用户操作

[all servers]$         -------表示所有服务器用postgres用户操作

[server1]#           -------表示server1服务器用root用户操作

[server1]$           -------表示server1服务器用postgres用户操作

postgres=#            -------表示登录到psql数据库用户操作

 

 

2.1 配置所有服务器的hosts文件




    [all servers]#vi etc/hosts


    #加入以下内容(以实际主机ip为准)
    10.8.0.51 server1
    10.8.0.52 server2
    10.8.0.53 server3


     


    2.2 安装postgresql 12pgpool

     

      一、 添加解决依赖关系的软件源:


        [all servers]#yum install epel-release -y
      [all servers]#yum install centos-release-scl -y

       

       

       

        二、安装postgresql12


        [all servers]#yum install https://download.postgresql.org/pub/repos/yum/reporpms/EL-7-x86_64/pgdg-redhat-repo-latest.noarch.rpm -y
        [all servers]#yum install postgresql12 postgresql12-libs postgresql12-devel postgresql12-server -y

         

          三、安装pgpool-II

          [all servers]#yum install http://www.pgpool.net/yum/rpms/4.1/redhat/rhel-7-x86_64/pgpool-II-release-4.1-2.noarch.rpm
          [all servers]#yum install pgpool-II-pg12-* -y

           

           

          2.3 启动前配置

            一、创建归档日志目录(所有服务器):


            切换到postgres用户操作

            [all servers]# su - postgres
            [all servers]$ mkdir var/lib/pgsql/archivedir


             

             

              二、server1数据库进行初始化(server2server3不要执行下面的操作):


               root 用户执行

              /usr/pgsql-12/bin/postgresql-12-setup initdb

               

               

                三、修改server1的数据库主配置文件参数(server2server3不要执行下面的操作):

               

                [server1]$ vi var/lib/pgsql/12/data/postgresql.conf
                wal_log_hints = on
                #以下为需要修改的参数
                listen_addresses = '*'
                port = 5432
                archive_mode = on
                archive_command = 'cp "%p" "/var/lib/pgsql/archivedir/%f"'
                max_wal_senders = 10
                max_replication_slots = 10
                wal_level = replica
                hot_standby = on

                 

                  四、启动server1postgresql服务:


                  [all servers]#systemctl start postgresql-12.service


                   

                  3.数据库集群配置

                  3.1 数据库集群用户说明

                  用户名

                  密码

                  描述

                  repl

                  repl

                  用于PostgreSQL流复制

                  pgpool

                  pgpool

                  用于pgpool-II心跳检测以及复制延迟检测

                  postgres

                  postgres

                  用于在线恢复

                   

                   

                  3.2 初始化相关系统及数据库用户配置


                    一、登录postgresql控制台(以下操作只需要在server1上进行)

                     [server1]# su postgres
                    [server1]$ psql


                     

                      二、初始化列表中的三个用户(其中postgres用户是系统用户,不需要再创建了。以下      操作只需要在server1上进行

                         

                      postgres=# SET password_encryption = 'scram-sha-256';


                      postgres=# CREATE ROLE pgpool WITH LOGIN;


                      postgres=# CREATE ROLE repl WITH REPLICATION LOGIN;


                      postgres=# \password pgpool
                      这里键入(生产环境请使用复杂口令):pgpool
                      postgres=# \password repl
                      这里键入(生产环境请使用复杂口令):repl
                      postgres=# \password postgres
                      这里键入(生产环境请使用复杂口令):postgres


                       

                        设置完密码后先退出(\q),重新登入后执行这条语句。以后通过执行show pool_nodes;

                        命令就可以查看到"replication_state" "replication_sync_state" 字段了


                        Grant pg_monitor to pgpool;


                         

                          三、配置pg_hba.conf文件(以下操作只需要在server1上进行):


                          postgres=#vi var/lib/pgsql/12/data/pg_hba.conf


                          host all all 0.0.0.0/0 scram-sha-256
                          host replication all 0.0.0.0/0 scram-sha-256


                           

                            四、设置集群服务器之间ssh无密码登录(所有服务器上操作):

                           


                            [all servers]#mkdir ~/.ssh              #若目录已经存在则忽略该命令
                            [all servers]# cd ~/.ssh
                            [all servers]# ssh-keygen -t rsa -f id_rsa_pgpool
                            [all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
                            [all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
                            [all servers]# ssh-copy-id -i id_rsa_pgpool.pub postgres@server3


                            [all servers]# su - postgres
                            [all servers]$mkdir ~/.ssh #若目录已经存在则忽略该命令
                            [all servers]$ cd ~/.ssh
                            [all servers]$ ssh-keygen -t rsa -f id_rsa_pgpool
                            [all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server1
                            [all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server2
                            [all servers]$ ssh-copy-id -i id_rsa_pgpool.pub postgres@server3



                              五、root用户和postgres用户下进行以下测试,验证是否实现无密码登录(所有服务器     上操作)

                             

                              ssh postgres@server1 -i ~/.ssh/id_rsa_pgpool
                              ssh postgres@server2 -i ~/.ssh/id_rsa_pgpool
                              ssh postgres@server3 -i ~/.ssh/id_rsa_pgpool


                                六、设置允许数据库用户replpostgres用户无密码执行集群操作(所有服务器上操作):


                                [all servers]# su - postgres
                                [all servers]$ vi var/lib/pgsql/.pgpass


                                  写入以下内容:(密码对应前面第二点上设置的密码)

                                  server1:5432:replication:repl:<repl user password>
                                  server2:5432:replication:repl:<repl user passowrd>
                                  server3:5432:replication:repl:<repl user passowrd>
                                  server1:5432:postgres:postgres:<postgres user passowrd>
                                  server2:5432:postgres:postgres:<postgres user passowrd>
                                  server3:5432:postgres:postgres:<postgres user passowrd>


                                   

                                    参考示例:

                                    server1:5432:replication:repl:repl
                                    server2:5432:replication:repl:repl
                                    server3:5432:replication:repl:repl
                                    server1:5432:postgres:postgres:postgres
                                    server2:5432:postgres:postgres:postgres
                                    server3:5432:postgres:postgres:postgres


                                     

                                      七、设置文件权限:

                                      [all servers]$ chmod 600  var/lib/pgsql/.pgpass

                                      3.3 防火墙配置

                                        方法一直接关闭防火墙(所有服务器上操作):


                                        [all servers]# systemctl disable firewalld.service
                                        [all servers]# systemctl stop firewalld.service


                                         

                                          方法二防火墙墙上开放相关端口(所有服务器上操作):


                                          [all servers]# firewall-cmd --permanent --zone=public --add-service=postgresql
                                          [all servers]# firewall-cmd --permanent --zone=public --add-port=9999/tcp --add-port=9898/tcp --add-port=9000/tcp --add-port=9694/udp
                                          [all servers]# firewall-cmd --reload


                                           

                                           

                                          3.4 配置pgpool服务

                                            一、公共配置


                                            [all servers]# chown -R postgres:postgres etc/pgpool-II
                                            [all servers]# su - postgres
                                            [all servers]$ cp -p etc/pgpool-II/pgpool.conf.sample-stream etc/pgpool-II/pgpool.conf
                                            [all servers]$ vi etc/pgpool-II/pgpool.conf


                                              修改下面的参数


                                              listen_addresses = '*'


                                              sr_check_user = 'pgpool'
                                              sr_check_password = ''


                                              health_check_period = 5


                                              health_check_timeout = 30


                                              health_check_user = 'pgpool'
                                              health_check_password = ''


                                              health_check_max_retries = 3


                                              backend_hostname0 = 'server1'
                                              backend_port0 = 5432
                                              backend_weight0 = 1
                                              backend_data_directory0 = '/var/lib/pgsql/12/data'
                                              backend_flag0 = 'ALLOW_TO_FAILOVER'
                                              backend_application_name0 = 'server1'


                                              backend_hostname1 = 'server2'
                                              backend_port1 = 5432
                                              backend_weight1 = 1
                                              backend_data_directory1 = '/var/lib/pgsql/12/data'
                                              backend_flag1 = 'ALLOW_TO_FAILOVER'
                                              backend_application_name1 = 'server2'


                                              backend_hostname2 = 'server3'
                                              backend_port2 = 5432
                                              backend_weight2 = 1
                                              backend_data_directory2 = '/var/lib/pgsql/12/data'
                                              backend_flag2 = 'ALLOW_TO_FAILOVER'
                                              backend_application_name2 = 'server3'


                                              failover_command = '/etc/pgpool-II/failover.sh %d %h %p %D %m %H %M %P %r %R %N %S'
                                              follow_master_command='/etc/pgpool-II/follow_master.sh %d %h %p %D %m %H %M %P %r %R'


                                              recovery_user = 'postgres'
                                              # Online recovery user
                                              recovery_password = ''
                                              # Online recovery password
                                              recovery_1st_stage_command = 'recovery_1st_stage'


                                              enable_pool_hba = on


                                               

                                               

                                                二、创建failover_commandfollow_master_command调用的脚本:


                                                [all servers]# su - postgres
                                                [all servers]$ vi etc/pgpool-II/failover.sh


                                                  写入以下内容


                                                  #!/bin/bash
                                                  # This script is run by failover_command.


                                                  set -o xtrace
                                                  exec > >(logger -i -p local1.info) 2>&1


                                                  # Special values:
                                                  # %d = failed node id
                                                  # %h = failed node hostname
                                                  # %p = failed node port number
                                                  # %D = failed node database cluster path
                                                  # %m = new master node id
                                                  # %H = new master node hostname
                                                  # %M = old master node id
                                                  # %P = old primary node id
                                                  # %r = new master port number
                                                  # %R = new master database cluster path
                                                  # %N = old primary node hostname
                                                  # %S = old primary node port number
                                                  # %% = '%' character


                                                  FAILED_NODE_ID="$1"
                                                  FAILED_NODE_HOST="$2"
                                                  FAILED_NODE_PORT="$3"
                                                  FAILED_NODE_PGDATA="$4"
                                                  NEW_MASTER_NODE_ID="$5"
                                                  NEW_MASTER_NODE_HOST="$6"
                                                  OLD_MASTER_NODE_ID="$7"
                                                  OLD_PRIMARY_NODE_ID="$8"
                                                  NEW_MASTER_NODE_PORT="$9"
                                                  NEW_MASTER_NODE_PGDATA="${10}"
                                                  OLD_PRIMARY_NODE_HOST="${11}"
                                                  OLD_PRIMARY_NODE_PORT="${12}"


                                                  PGHOME=/usr/pgsql-12




                                                  logger -i -p local1.info failover.sh: start: failed_node_id=$FAILED_NODE_ID old_primary_node_id=$OLD_PRIMARY_NODE_ID failed_host=$FAILED_NODE_HOST new_master_host=$NEW_MASTER_NODE_HOST


                                                  ## If there's no master node anymore, skip failover.
                                                  if [ $NEW_MASTER_NODE_ID -lt 0 ]; then
                                                  logger -i -p local1.info failover.sh: All nodes are down. Skipping failover.
                                                  exit 0
                                                  fi


                                                  ## Test passwrodless SSH
                                                  ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls tmp > dev/null


                                                  if [ $? -ne 0 ]; then
                                                  logger -i -p local1.info failover.sh: passwrodless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwrodless SSH.
                                                  exit 1
                                                  fi


                                                  ## If Standby node is down, skip failover.
                                                  if [ $FAILED_NODE_ID -ne $OLD_PRIMARY_NODE_ID ]; then
                                                  logger -i -p local1.info failover.sh: Standby node is down. Skipping failover.


                                                  ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$OLD_PRIMARY_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
                                                  ${PGHOME}/bin/psql -p $OLD_PRIMARY_NODE_PORT -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
                                                  "


                                                  if [ $? -ne 0 ]; then
                                                  logger -i -p local1.error failover.sh: drop replication slot "${FAILED_NODE_HOST}" failed
                                                  exit 1
                                                  fi


                                                  exit 0
                                                  fi


                                                  ## Promote Standby node.
                                                  logger -i -p local1.info failover.sh: Primary node is down, promote standby node ${NEW_MASTER_NODE_HOST}.


                                                  ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
                                                  postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -D ${NEW_MASTER_NODE_PGDATA} -w promote


                                                  if [ $? -ne 0 ]; then
                                                  logger -i -p local1.error failover.sh: new_master_host=$NEW_MASTER_NODE_HOST promote failed
                                                  exit 1
                                                  fi


                                                  logger -i -p local1.info failover.sh: end: new_master_node_id=$NEW_MASTER_NODE_ID started as the primary node
                                                  exit 0


                                                   

                                                   


                                                    [all servers]$vi etc/pgpool-II/follow_master.sh


                                                      写入以下内容


                                                      #!/bin/bash
                                                      # This script is run after failover_command to synchronize the Standby with the new Primary.
                                                      # First try pg_rewind. If pg_rewind failed, use pg_basebackup.


                                                      set -o xtrace
                                                      exec > >(logger -i -p local1.info) 2>&1


                                                      # Special values:
                                                      # %d = failed node id
                                                      # %h = failed node hostname
                                                      # %p = failed node port number
                                                      # %D = failed node database cluster path
                                                      # %m = new master node id
                                                      # %H = new master node hostname
                                                      # %M = old master node id
                                                      # %P = old primary node id
                                                      # %r = new master port number
                                                      # %R = new master database cluster path
                                                      # %N = old primary node hostname
                                                      # %S = old primary node port number
                                                      # %% = '%' character


                                                      FAILED_NODE_ID="$1"
                                                      FAILED_NODE_HOST="$2"
                                                      FAILED_NODE_PORT="$3"
                                                      FAILED_NODE_PGDATA="$4"
                                                      NEW_MASTER_NODE_ID="$5"
                                                      NEW_MASTER_NODE_HOST="$6"
                                                      OLD_MASTER_NODE_ID="$7"
                                                      OLD_PRIMARY_NODE_ID="$8"
                                                      NEW_MASTER_NODE_PORT="$9"
                                                      NEW_MASTER_NODE_PGDATA="${10}"


                                                      PGHOME=/usr/pgsql-12
                                                      ARCHIVEDIR=/var/lib/pgsql/archivedir
                                                      REPLUSER=repl
                                                      PCP_USER=pgpool
                                                      PGPOOL_PATH=/usr/bin
                                                      PCP_PORT=9898


                                                      logger -i -p local1.info follow_master.sh: start: Standby node ${FAILED_NODE_ID}


                                                      ## Test passwrodless SSH
                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls tmp > dev/null


                                                      if [ $? -ne 0 ]; then
                                                      logger -i -p local1.info follow_master.sh: passwrodless SSH to postgres@${NEW_MASTER_NODE_HOST} failed. Please setup passwrodless SSH.
                                                      exit 1
                                                      fi


                                                      ## Get PostgreSQL major version
                                                      PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`


                                                      if [ $PGVERSION -ge 12 ]; then
                                                      RECOVERYCONF=${FAILED_NODE_PGDATA}/myrecovery.conf
                                                      else
                                                      RECOVERYCONF=${FAILED_NODE_PGDATA}/recovery.conf
                                                      fi


                                                      ## Check the status of Standby
                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
                                                      postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ${PGHOME}/bin/pg_ctl -w -D ${FAILED_NODE_PGDATA} status




                                                      ## If Standby is running, synchronize it with the new Primary.
                                                      if [ $? -eq 0 ]; then


                                                      logger -i -p local1.info follow_master.sh: pg_rewind for $FAILED_NODE_ID


                                                      # Create replication slot "${FAILED_NODE_HOST}"
                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
                                                      ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_create_physical_replication_slot('${FAILED_NODE_HOST}');\"
                                                      "


                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "


                                                      set -o errexit


                                                      ${PGHOME}/bin/pg_ctl -w -m f -D ${FAILED_NODE_PGDATA} stop


                                                      cat > ${RECOVERYCONF} << EOT
                                                      primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
                                                      recovery_target_timeline = 'latest'
                                                      restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
                                                      primary_slot_name = '${FAILED_NODE_HOST}'
                                                      EOT


                                                      if [ ${PGVERSION} -ge 12 ]; then
                                                      touch ${FAILED_NODE_PGDATA}/standby.signal
                                                      else
                                                      echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
                                                      fi


                                                      ${PGHOME}/bin/pg_rewind -D ${FAILED_NODE_PGDATA} --source-server=\"user=postgres host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT}\"


                                                      "


                                                      if [ $? -ne 0 ]; then
                                                      logger -i -p local1.error follow_master.sh: end: pg_rewind failed. Try pg_basebackup.


                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "


                                                      set -o errexit


                                                      # Execute pg_basebackup
                                                      rm -rf ${FAILED_NODE_PGDATA}
                                                      rm -rf ${ARCHIVEDIR}/*
                                                      ${PGHOME}/bin/pg_basebackup -h ${NEW_MASTER_NODE_HOST} -U $REPLUSER -p ${NEW_MASTER_NODE_PORT} -D ${FAILED_NODE_PGDATA} -X stream


                                                      if [ ${PGVERSION} -ge 12 ]; then
                                                      sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
                                                      -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${FAILED_NODE_PGDATA}/postgresql.conf
                                                      fi


                                                      cat > ${RECOVERYCONF} << EOT
                                                      primary_conninfo = 'host=${NEW_MASTER_NODE_HOST} port=${NEW_MASTER_NODE_PORT} user=${REPLUSER} application_name=${FAILED_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
                                                      recovery_target_timeline = 'latest'
                                                      restore_command = 'scp ${NEW_MASTER_NODE_HOST}:${ARCHIVEDIR}/%f %p'
                                                      primary_slot_name = '${FAILED_NODE_HOST}'
                                                      EOT


                                                      if [ ${PGVERSION} -ge 12 ]; then
                                                      touch ${FAILED_NODE_PGDATA}/standby.signal
                                                      else
                                                      echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
                                                      fi
                                                      "


                                                      if [ $? -ne 0 ]; then
                                                      # drop replication slot
                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool "
                                                      ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c \"SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')\"
                                                      "


                                                      logger -i -p local1.error follow_master.sh: end: pg_basebackup failed
                                                      exit 1
                                                      fi
                                                      fi


                                                      # start Standby node on ${FAILED_NODE_HOST}
                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null \
                                                      postgres@${FAILED_NODE_HOST} -i ~/.ssh/id_rsa_pgpool $PGHOME/bin/pg_ctl -l dev/null -w -D ${FAILED_NODE_PGDATA} start


                                                      # If start Standby successfully, attach this node
                                                      if [ $? -eq 0 ]; then


                                                      # Run pcp_attact_node to attach Standby node to Pgpool-II.
                                                      ${PGPOOL_PATH}/pcp_attach_node -w -h localhost -U $PCP_USER -p ${PCP_PORT} -n ${FAILED_NODE_ID}


                                                      if [ $? -ne 0 ]; then
                                                      logger -i -p local1.error follow_master.sh: end: pcp_attach_node failed
                                                      exit 1
                                                      fi


                                                      # If start Standby failed, drop replication slot "${FAILED_NODE_HOST}"
                                                      else


                                                      ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${NEW_MASTER_NODE_HOST} -i ~/.ssh/id_rsa_pgpool \
                                                      ${PGHOME}/bin/psql -p ${NEW_MASTER_NODE_PORT} -c "SELECT pg_drop_replication_slot('${FAILED_NODE_HOST}')"


                                                      logger -i -p local1.error follow_master.sh: end: follow master command failed
                                                      exit 1
                                                      fi


                                                      else
                                                      logger -i -p local1.info follow_master.sh: failed_nod_id=${FAILED_NODE_ID} is not running. skipping follow master command
                                                      exit 0
                                                      fi


                                                      logger -i -p local1.info follow_master.sh: end: follow master command complete
                                                      exit 0


                                                       


                                                        [all servers]$ chmod +x etc/pgpool-II/{failover.sh,follow_master.sh}


                                                         

                                                        3.5 Online Recovery 脚本配置

                                                          一、添加recovery_1st_stage脚本(以下操作只需要在server1上进行)


                                                          [server1]# su - postgres
                                                          [server1]$ vi var/lib/pgsql/12/data/recovery_1st_stage


                                                            写入以下内容


                                                            #!/bin/bash
                                                            # This script is executed by "recovery_1st_stage" to recovery a Standby node.


                                                            set -o xtrace
                                                            exec > >(logger -i -p local1.info) 2>&1


                                                            PRIMARY_NODE_PGDATA="$1"
                                                            DEST_NODE_HOST="$2"
                                                            DEST_NODE_PGDATA="$3"
                                                            PRIMARY_NODE_PORT="$4"
                                                            DEST_NODE_ID="$5"
                                                            DEST_NODE_PORT="$6"


                                                            PRIMARY_NODE_HOST=$(hostname)
                                                            PGHOME=/usr/pgsql-12
                                                            ARCHIVEDIR=/var/lib/pgsql/archivedir
                                                            REPLUSER=repl


                                                            logger -i -p local1.info recovery_1st_stage: start: pg_basebackup for Standby node $DEST_NODE_ID


                                                            ## Test passwrodless SSH
                                                            ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls tmp > dev/null


                                                            if [ $? -ne 0 ]; then
                                                            logger -i -p local1.info recovery_1st_stage: passwrodless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwrodless SSH.
                                                            exit 1
                                                            fi


                                                            ## Get PostgreSQL major version
                                                            PGVERSION=`${PGHOME}/bin/initdb -V | awk '{print $3}' | sed 's/\..*//' | sed 's/\([0-9]*\)[a-zA-Z].*/\1/'`
                                                            if [ $PGVERSION -ge 12 ]; then
                                                            RECOVERYCONF=${DEST_NODE_PGDATA}/myrecovery.conf
                                                            else
                                                            RECOVERYCONF=${DEST_NODE_PGDATA}/recovery.conf
                                                            fi


                                                            ## Create replication slot "${DEST_NODE_HOST}"
                                                            ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
                                                            SELECT pg_create_physical_replication_slot('${DEST_NODE_HOST}');
                                                            EOQ


                                                            ## Execute pg_basebackup to recovery Standby node
                                                            ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "


                                                            set -o errexit


                                                            rm -rf $DEST_NODE_PGDATA
                                                            rm -rf $ARCHIVEDIR/*


                                                            ${PGHOME}/bin/pg_basebackup -h $PRIMARY_NODE_HOST -U $REPLUSER -p $PRIMARY_NODE_PORT -D $DEST_NODE_PGDATA -X stream


                                                            if [ ${PGVERSION} -ge 12 ]; then
                                                            sed -i -e \"\\\$ainclude_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'\" \
                                                            -e \"/^include_if_exists = '$(echo ${RECOVERYCONF} | sed -e 's/\//\\\//g')'/d\" ${DEST_NODE_PGDATA}/postgresql.conf
                                                            fi


                                                            cat > ${RECOVERYCONF} << EOT
                                                            primary_conninfo = 'host=${PRIMARY_NODE_HOST} port=${PRIMARY_NODE_PORT} user=${REPLUSER} application_name=${DEST_NODE_HOST} passfile=''/var/lib/pgsql/.pgpass'''
                                                            recovery_target_timeline = 'latest'
                                                            restore_command = 'scp ${PRIMARY_NODE_HOST}:${ARCHIVEDIR}/%f %p'
                                                            primary_slot_name = '${DEST_NODE_HOST}'
                                                            EOT


                                                            if [ ${PGVERSION} -ge 12 ]; then
                                                            touch ${DEST_NODE_PGDATA}/standby.signal
                                                            else
                                                            echo \"standby_mode = 'on'\" >> ${RECOVERYCONF}
                                                            fi


                                                            sed -i \"s/#*port = .*/port = ${DEST_NODE_PORT}/\" ${DEST_NODE_PGDATA}/postgresql.conf
                                                            "


                                                            if [ $? -ne 0 ]; then


                                                            ${PGHOME}/bin/psql -p ${PRIMARY_NODE_PORT} << EOQ
                                                            SELECT pg_drop_replication_slot('${DEST_NODE_HOST}');
                                                            EOQ


                                                            logger -i -p local1.error recovery_1st_stage: end: pg_basebackup failed. online recovery failed
                                                            exit 1
                                                            fi


                                                            logger -i -p local1.info recovery_1st_stage: end: recovery_1st_stage complete
                                                            exit 0


                                                             

                                                              二、添加pgpool_remote_start脚本


                                                              [server1]$ vi var/lib/pgsql/12/data/pgpool_remote_start


                                                               

                                                                写入以下内容


                                                                #!/bin/bash
                                                                # This script is run after recovery_1st_stage to start Standby node.


                                                                set -o xtrace
                                                                exec > >(logger -i -p local1.info) 2>&1


                                                                PGHOME=/usr/pgsql-12
                                                                DEST_NODE_HOST="$1"
                                                                DEST_NODE_PGDATA="$2"




                                                                logger -i -p local1.info pgpool_remote_start: start: remote start Standby node $DEST_NODE_HOST


                                                                ## Test passwrodless SSH
                                                                ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@${DEST_NODE_HOST} -i ~/.ssh/id_rsa_pgpool ls tmp > dev/null


                                                                if [ $? -ne 0 ]; then
                                                                logger -i -p local1.info pgpool_remote_start: passwrodless SSH to postgres@${DEST_NODE_HOST} failed. Please setup passwrodless SSH.
                                                                exit 1
                                                                fi


                                                                ## Start Standby node
                                                                ssh -T -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null postgres@$DEST_NODE_HOST -i ~/.ssh/id_rsa_pgpool "
                                                                $PGHOME/bin/pg_ctl -l dev/null -w -D $DEST_NODE_PGDATA start
                                                                "


                                                                if [ $? -ne 0 ]; then
                                                                logger -i -p local1.error pgpool_remote_start: $DEST_NODE_HOST PostgreSQL start failed.
                                                                exit 1
                                                                fi


                                                                logger -i -p local1.info pgpool_remote_start: end: $DEST_NODE_HOST PostgreSQL started successfully.
                                                                exit 0


                                                                [server1]$ chmod +x var/lib/pgsql/12/data/{recovery_1st_stage,pgpool_remote_start}



                                                                  三、安装pgpool recovery on template1扩展(以下操作只需要在server1上进行)

                                                                 


                                                                  [server1]# su - postgres
                                                                  [server1]$ psql template1 -c "CREATE EXTENSION pgpool_recovery"


                                                                   

                                                                  3.5 配置pgpool-II的客户端登录验证


                                                                    [all servers]$vi etc/pgpool-II/pool_hba.conf


                                                                      加入以下内容


                                                                      host    all         pgpool           0.0.0.0/0          scram-sha-256
                                                                      host all postgres 0.0.0.0/0 scram-sha-256


                                                                         

                                                                        配置解密证书用于scram-sha-256验证

                                                                       


                                                                        [all servers]# su - postgres
                                                                        [all servers]$ echo 'some string' > ~/.pgpoolkey
                                                                        [all servers]$ chmod 600 ~/.pgpoolkey
                                                                        [all servers]$ pg_enc -m -k ~/.pgpoolkey -u pgpool -p (与前面3.2第二点的密码保持一致)
                                                                        db password: [pgpool 用户密码]
                                                                        [all servers]$ pg_enc -m -k ~/.pgpoolkey -u postgres -p
                                                                        db password: [postgres 用户密码]


                                                                         

                                                                               查看pool_passwd内容


                                                                          cat etc/pgpool-II/pool_passwd
                                                                            pgpool:AESheq2ZMZjynddMWk5sKP/Rw==
                                                                            postgres:AESHs/pWL5rtXy2IwuzroHfqg==


                                                                               

                                                                            3.6 配置pgpool Watchdog

                                                                             


                                                                              [all servers]#vi etc/pgpool-II/pgpool.conf


                                                                                修改下列参数


                                                                                use_watchdog = on


                                                                                #指定集群使用的虚拟IP
                                                                                delegate_IP = '10.8.0.54'


                                                                                #虚拟ip绑定的网卡根据实际情况配置,测试环境的网卡名为ens160,如你的机器使用网卡名为eth0,就将其替换
                                                                                if_up_cmd = '/usr/bin/sudo sbin/ip addr add $_IP_$/24 dev ens160 label ens160:0'
                                                                                if_down_cmd = '/usr/bin/sudo sbin/ip addr del $_IP_$/24 dev ens160'
                                                                                arping_cmd = '/usr/bin/sudo usr/sbin/arping -U $_IP_$ -w 1 -I ens160'


                                                                                if_cmd_path = '/sbin'
                                                                                arping_path = '/usr/sbin'


                                                                                   

                                                                                  指定主机名和端口号

                                                                                      server1上配置如下


                                                                                        wd_hostname = 'server1'
                                                                                  wd_port = 9000


                                                                                       

                                                                                        server2上配置如下

                                                                                   


                                                                                          wd_hostname = 'server2'
                                                                                    wd_port = 9000


                                                                                         

                                                                                            server3上配置如下

                                                                                     

                                                                                     

                                                                                           wd_hostname = 'server3'
                                                                                      wd_port = 9000


                                                                                           

                                                                                        配置其它pgpool的连接信息

                                                                                       

                                                                                             server1上配置如下

                                                                                       


                                                                                        other_pgpool_hostname0 = 'server2'
                                                                                        other_pgpool_port0 = 9999
                                                                                        other_wd_port0 = 9000
                                                                                        other_pgpool_hostname1 = 'server3'
                                                                                        other_pgpool_port1 = 9999
                                                                                        other_wd_port1 = 9000


                                                                                             

                                                                                                server2上配置如下

                                                                                         


                                                                                          other_pgpool_hostname0 = 'server1'
                                                                                          other_pgpool_port0 = 9999
                                                                                          other_wd_port0 = 9000
                                                                                          other_pgpool_hostname1 = 'server3'
                                                                                          other_pgpool_port1 = 9999
                                                                                          other_wd_port1 = 9000


                                                                                               

                                                                                                  server3上配置如下

                                                                                           


                                                                                            other_pgpool_hostname0 = 'server1'
                                                                                            other_pgpool_port0 = 9999
                                                                                            other_wd_port0 = 9000
                                                                                            other_pgpool_hostname1 = 'server2'
                                                                                            other_pgpool_port1 = 9999
                                                                                            other_wd_port1 = 9000


                                                                                                 

                                                                                              pgpool的心跳信号配置

                                                                                             

                                                                                                  server1上配置如下

                                                                                             

                                                                                              heartbeat_destination0 = 'server2'
                                                                                              heartbeat_destination_port0 = 9694
                                                                                              heartbeat_device0 = ''
                                                                                              heartbeat_destination1 = 'server3'
                                                                                              heartbeat_destination_port1 = 9694
                                                                                              heartbeat_device1 = ''

                                                                                                   

                                                                                                     server2上配置如下

                                                                                               

                                                                                                heartbeat_destination0 = 'server1'
                                                                                                heartbeat_destination_port0 = 9694
                                                                                                heartbeat_device0 = ''
                                                                                                heartbeat_destination1 = 'server3'
                                                                                                heartbeat_destination_port1 = 9694
                                                                                                heartbeat_device1 = ''

                                                                                                     

                                                                                                      server3上配置如下

                                                                                                 

                                                                                                  heartbeat_destination0 = 'server1'
                                                                                                  heartbeat_destination_port0 = 9694
                                                                                                  heartbeat_device0 = ''
                                                                                                  heartbeat_destination1 = 'server2'
                                                                                                  heartbeat_destination_port1 = 9694
                                                                                                  heartbeat_device1 = ''


                                                                                                  3.7设置在pgpool-II启动时忽略pgpool_status文件


                                                                                                    将OPTS=" -n"改为OPTS=" -D -n"

                                                                                                    [all servers]# vi etc/sysconfig/pgpool
                                                                                                    ...
                                                                                                    OPTS=" -D -n"


                                                                                                      

                                                                                                    3.8 Pgpool日志配置


                                                                                                      [all servers]#vi etc/pgpool-II/pgpool.conf


                                                                                                      log_destination = 'syslog'
                                                                                                      syslog_facility = 'LOCAL1'
                                                                                                      # Syslog local facility. Default to LOCAL0


                                                                                                         

                                                                                                        创建pgpool日志文件


                                                                                                        [all servers]# mkdir var/log/pgpool-II
                                                                                                        [all servers]# touch var/log/pgpool-II/pgpool.log


                                                                                                           

                                                                                                          修改rsyslog配置


                                                                                                          [all servers]# vi /etc/rsyslog.conf
                                                                                                          ...
                                                                                                          *.info;mail.none;authpriv.none;cron.none;LOCAL1.none /var/log/messages
                                                                                                          LOCAL1.* /var/log/pgpool-II/pgpool.log


                                                                                                             

                                                                                                            配置日志切割


                                                                                                            [all servers]# vi /etc/logrotate.d/syslog
                                                                                                            ...
                                                                                                            /var/log/messages
                                                                                                            #加入pgpool.log
                                                                                                            /var/log/pgpool-II/pgpool.log
                                                                                                            /var/log/secure


                                                                                                               

                                                                                                              重启rsyslog 服务

                                                                                                              [all servers]# systemctl restart rsyslog

                                                                                                               

                                                                                                              3.9 PCP 配置


                                                                                                                [all servers]# echo 'pgpool:'`pg_md5 PCP passowrd` >> /etc/pgpool-II/pcp.conf

                                                                                                                 (与前面3.2第二点的密码保持一致)

                                                                                                                  以下是参考用例

                                                                                                                  [all servers]#echo 'pgpool:'`pg_md5 pgpool` >> /etc/pgpool-II/pcp.conf

                                                                                                                    

                                                                                                                    配置.pcppass

                                                                                                                    实现follow_master_command无密码PCP命令。注意是在root用户下执行


                                                                                                                    [all servers]# echo 'localhost:9898:pgpool:pgpool' > ~/.pcppass
                                                                                                                    [all servers]# chmod 600 ~/.pcppass


                                                                                                                     

                                                                                                                      至此,pgpool-II的配置完成了

                                                                                                                     

                                                                                                                    4启动集群

                                                                                                                    4.1启动pgpool-IIpostgresql服务

                                                                                                                      所有服务器上执行:

                                                                                                                      [all servers]# systemctl start pgpool.service

                                                                                                                       

                                                                                                                        在server1上执行:


                                                                                                                        [server1]# systemctl start postgresql-12.service


                                                                                                                          确认虚拟ip已经启用

                                                                                                                          #ping 10.8.0.54

                                                                                                                           

                                                                                                                          4.2启动PostgreSQL standby 服务器

                                                                                                                           

                                                                                                                            在server2上执行操作

                                                                                                                            [server2]# pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 1
                                                                                                                            Password:
                                                                                                                            pcp_recovery_node -- Command Successful


                                                                                                                              -n后面的参数对应的关系是:      

                                                                                                                               0 server1

                                                                                                                               1 server2

                                                                                                                               2 server3


                                                                                                                             

                                                                                                                              在server3上执行操作

                                                                                                                             


                                                                                                                              [server3]# pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 2
                                                                                                                              Password:
                                                                                                                              pcp_recovery_node -- Command Successful


                                                                                                                                 

                                                                                                                                确认server2server3已经启动为standby节点

                                                                                                                                (任一服务器)


                                                                                                                                [server*]# psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
                                                                                                                                Password for user pgpool
                                                                                                                                node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
                                                                                                                                ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
                                                                                                                                0 | server1 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2020-05-22 21:44:07
                                                                                                                                1 | server2 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2020-05-22 21:29:47
                                                                                                                                2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2020-05-22 21:44:45
                                                                                                                                (3 rows)


                                                                                                                                   

                                                                                                                                4.3测试pgpool高可用 watchdog

                                                                                                                                  查看 watchdog状态

                                                                                                                                 

                                                                                                                                     # pcp_watchdog_info -h 10.8.0.54 -p 9898 -U pgpool
                                                                                                                                  Password:
                                                                                                                                  3 YES server1:9999 Linux server1 server1


                                                                                                                                  server1:9999 Linux server1 server1 9999 9000 4 MASTER
                                                                                                                                  server2:9999 Linux server2 server2 9999 9000 7 STANDBY
                                                                                                                                  server3:9999 Linux server3 server3 9999 9000 7 STANDBY


                                                                                                                                        可以看到目前集群的主节点为server1  

                                                                                                                                   

                                                                                                                                    关闭server1pgpool服务,观察server2server3是否接替工作(注意,pgpool集群至少    要有两台正常工作,虚拟ip才会启动

                                                                                                                                      

                                                                                                                                    [server1]# systemctl stop pgpool.service
                                                                                                                                    # pcp_watchdog_info -p 9898 -h 10.8.0.54 -U pgpool
                                                                                                                                    Password:
                                                                                                                                    3 YES server2:9999 Linux server2 server2


                                                                                                                                    server2:9999 Linux server2 server2 9999 9000 4 MASTER #server2 成为MASTER
                                                                                                                                        server1:9999 Linux server1 server1 9999 9000 10 SHUTDOWN  #server1 状态为关闭    
                                                                                                                                    server3:9999 Linux server3 server3 9999 9000 7 STANDBY #server3 仍然是standby


                                                                                                                                     

                                                                                                                                      启动server1上的pgpool服务    


                                                                                                                                      [server1]# systemctl start pgpool.service
                                                                                                                                      [server1]# pcp_watchdog_info -p 9898 -h 10.8.0.54 -U pgpool
                                                                                                                                      Password:
                                                                                                                                      3 YES server2:9999 Linux server2 server2


                                                                                                                                      server2:9999 Linux server2 server2 9999 9000 4 MASTER
                                                                                                                                      server1:9999 Linux server1 server1 9999 9000 7 STANDBY
                                                                                                                                      server3:9999 Linux server3 server3 9999 9000 7 STANDBY


                                                                                                                                        可以看到server1变成了standby状态   

                                                                                                                                       

                                                                                                                                      4.4数据库故障转移测试

                                                                                                                                        查看数据库后端信息

                                                                                                                                        

                                                                                                                                         [server*]# psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
                                                                                                                                        Password for user pgpool:
                                                                                                                                        node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
                                                                                                                                        ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
                                                                                                                                        0 | server1 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2020-05-22 21:29:47
                                                                                                                                        1 | server2 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2020-05-22 21:44:07
                                                                                                                                        2 | server3 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2020-05-22 21:44:45
                                                                                                                                        (3 rows)


                                                                                                                                           

                                                                                                                                          停止主节点PostgreSQL server服务并验证故障转移


                                                                                                                                          [server1]$ /usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data -m immediate stop


                                                                                                                                             

                                                                                                                                            查看集群信息


                                                                                                                                                # psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
                                                                                                                                            Password for user pgpool:
                                                                                                                                            node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
                                                                                                                                            ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
                                                                                                                                            0 | server1 | 5432 | down | 0.333333 | standby | 0 | false | 0 | | | 2020-05-22 21:36:03
                                                                                                                                            1 | server2 | 5432 | up | 0.333333 | primary | 0 | true | 0 | | | 2020-05-22 21:36:03
                                                                                                                                            2 | server3 | 5432 | down | 0.333333 | standby | 0 | false | 0 | streaming | async | 2020-05-22 21:36:15
                                                                                                                                            (3 rows)


                                                                                                                                                   可以看到 server 2 成了新主节点。Server3也跟着down掉了 

                                                                                                                                               

                                                                                                                                            4.5在线恢复down掉的节点


                                                                                                                                              [server1] # pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 0
                                                                                                                                              Password:
                                                                                                                                              pcp_recovery_node -- Command Successful


                                                                                                                                              [server3]# pcp_recovery_node -h 10.8.0.54 -p 9898 -U pgpool -n 2
                                                                                                                                              Password:
                                                                                                                                              pcp_recovery_node -- Command Successful


                                                                                                                                              Then verify that server1 is started as a standby.
                                                                                                                                              [server*]# psql -h 10.8.0.54 -p 9999 -U pgpool postgres -c "show pool_nodes"
                                                                                                                                              Password for user pgpool:
                                                                                                                                              node_id | hostname | port | status | lb_weight | role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change
                                                                                                                                              ---------+----------+------+--------+-----------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
                                                                                                                                              0 | server1 | 5432 | up | 0.333333 | standby | 0 | false | 0 | streaming | async | 2020-06-02 18:26:06
                                                                                                                                              1 | server2 | 5432 | up | 0.333333 | primary | 0 | false | 0 | | | 2020-06-02 17:12:46
                                                                                                                                              2 | server3 | 5432 | up | 0.333333 | standby | 0 | true | 0 | streaming | async | 2020-06-02 18:26:06
                                                                                                                                              (3 rows)


                                                                                                                                                 


                                                                                                                                              文章转载自da哥的运维经验说,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                                                                                                                              评论