PostgreSQL热备之流复制环境搭建以及常见问题处理

IT那活儿 2020-08-12

4357

[

简介

]

本文主要是介绍PostgreSQL数据库的热备技术，PostgreSQL数据库提供了类似Oracle的standby数据库的功能。PostgreSQL日志传送的方法有两种，一种是基于文件（base-file）的传送方式，一种是流复制（streamingreplication）的方式。基于文件（base-file）的传送方式，这种方式是PostgreSQL9.0之前就提供的方法。也就是服务器写完一个WAL日志文件后，才把WAL日志文件拷贝到standby数据库上去应用。流复制的方式是PostgreSQL提供的一种服务器间的数据复制方式。这是PostgreSQL9.0才提供的新方法。这个方法就是事务提交后，主服务器则在WAL记录产生时即将它们以流式传送给后备服务器而不必等到WAL文件被填充。就是把生成的日志异步的传送到standby数据库上应用，这比基本文件的日志传送方法有更低的数据延迟。

[

环境准备

]

1. 服务器两台：

服务器	IP
主数据库服务器	192.168.88.191
备数据库服务器	192.168.88.192

操作系统版本：CentOS-7-x86_64

软件：PostgreSQL10.10

安装目录是：/db/pgsql/

数据目录是：/db/pgsql_data/

注意：实施之前两台机器上都安装好了postgresql数据库

2. 确保操作系统防火墙已关闭

systemctl stopfirewalld

systemctl disablefirewalld

[

搭建PostgreSQL 主备环境

]

1)在主节点上的操作

切换用户，并执行启动服务命令：

su – postgres

pg_ctl start -D $PGDATA 启动服务。

创建用于流复制的用户。执行命令：

psql -h 127.0.0.1 -p 5432 -U postgres

进入PostgreSQL数据库，并执行如下语句创建用户：

create user repuser with login replication password '123456';

修改pg_hba.conf文件，添加如下内容，允许两台计算机上的复制用户和超级用户登录：

host replication repuser 192.168.88.191/32 md5

host replication repuser 192.168.88.192/32 md5

host all postgres 192.168.88.191/32 trust

host all postgres 192.168.88.192/32 trust

在主节点的postgresql.conf 中设置这些参数：

max_wal_senders = 10

wal_level = replica

wal_log_hints = on

wal_keep_segments = 10

wal_receiver_status_interval = 5s

hot_standby_feedback = on

参数含义：

max_wal_senders表示来自后备服务器或流式基础备份客户端的并发连接的最大数量；

wal_level表示日志级别，对于流复制，它的值应设置为replica；

wal_log_hints =on表示，在PostgreSQL服务器一个检查点之后页面被第一次修改期间，把该磁盘页面的整个内容都写入WAL，即使对所谓的提示位做非关键修改也会这样做；

wal_keep_segments指定在后备服务器需要为流复制获取日志段文件的情况下，pg_wal（PostgreSQL9.6 以下版本的是pg_xlog）目录下所能保留的过去日志文件段的最小数目;

log_connections表示是否在日志中记录客户端对服务器的连接；

wal_receiver_status_interval指定在后备机上的 WAL接收者进程向主服务器或上游后备机发送有关复制进度的信息的最小周期；

hot_standby_feedback指定一个热后备机是否将会向主服务器或上游后备机发送有关于后备机上当前正被执行的查询的反馈，这里设置为on。

关于详细内容，可以参考postgresql官方文档。

重启主节点：

pg_ctl restart -D $PGDATA

重启之后，为主服务器和后备服务器创建复制槽(该步非必做项）：

select * frompg_create_physical_replication_slot('postgresql_node191');

select * frompg_create_physical_replication_slot('postgresql_node192');

创建后查询复制槽

select * from pg_replication_slots;

复制槽（replicationslot）的作用是：

1.在流复制中，当一个备节点断开连接时，备节点通过hot_standby_feedback提供反馈数据数据会丢失。当备节点重新连接时，它可能因为被主节点发送清理记录而引发查询冲突。复制槽即使在备节点断开时仍然会记录下备节点的xmin（复制槽要需要数据库保留的最旧事务ID）值，从而确保不会有清理冲突。

2.当一个备节点断开连接时，备节点需要的WAL文件信息也丢失了。如果没有复制槽，当备节点重连时，可能已经丢弃了所需要的WAL文件，因此需要完全重建备节点。而复制槽确保这个节点保留所有下游节点需要的WAL文件。

要配置slave使用这个槽，在后备机的recovery.conf中应该配置primary_slot_name，如下：

$ vi $PGDATA/recovery.conf

primary_slot_name = ' postgresql_node191'

standby_mode = 'on'

recovery_target_timeline = 'latest'

primary_conninfo = 'user=replicator password=1qaz2wsxhost=192.168.88.191 port=5432 application_name= postgresql_node192'

trigger_file = '/tmp/postgresql.trigger.5432'

---删除复制槽

slave在使用primary_slot_name 参数时是无法删除replication slots

postgres=# SELECT * FROM pg_drop_replication_slot('postgresql_node191');

postgres=# SELECT * FROM pg_drop_replication_slot('postgresql_node192');

2）在备节点上的操作

确保服务是停止的：

su - postgres

切换用户，并执行：

pg_ctl stop -D $PGDATA

关闭服务。

首先删除备节点中的数据目录$PGDATA 中的文件：

cd $PGDATA

rm –rf *

然后执行：

pg_basebackup -Xs -d "hostaddr=192.168.88.191 port=5432user=repuser password=123456" -D $PGDATA -v -Fp

这里,-Xs表示复制方式是流式的（stream），这种方式不会复制在此次备份开始前，已经归档完成的WAL文件；-d后面是一个连接字符串，其中“hostaddr=192.168.88.191”表示主服务器的ip地址是192.168.88.191，“port=5432”表示数据库的端口是5432，“user=repuser”表示用于流复制的用户是repuser，“password=123456”表示密码是123456；“-D$PGDATA”表示将备份内容输入到本地的$PGDATA 目录；“-v”表示打印详细信息，–Fp表示复制结果输出位普通（plain）文件。

如果遇到以下报错：

pg_basebackup: could not connect to server: could not connect toserver: No route to host

Is the server running on host "192.168.88.191" andaccepting

TCP/IP connections on port 5432?

pg_basebackup: removing contents of data directory "/db/pgsql_data"

用root清除下防火墙:

sudo iptables -F

基础备份完成后，修改备节点的postgresql.conf 文件设置：

hot_standby = on

将/db/pgsql/share/ 中的recovery.conf.sample 拷贝到$PGDATA 下，重命名为recovery.conf：

cp db/pgsql/share/recovery.conf.sample $PGDATA/recovery.conf

并设置如下参数：

recovery_target_timeline = 'latest'

standby_mode = on

primary_conninfo = 'host=192.168.88.191 port=5432 user=repuserpassword=repuser123'

primary_slot_name = 'postgresql_node122'

trigger_file = 'tgfile'

上述参数含义如下：

recovery_target_timeline表示恢复到数据库时间线的上的什么时间点，这里设置为latest，即最新；

standby_mode表示是否将PostgreSQL服务器作为一个后备服务器启动，这里设置为on，即后备模式；

primary_conninfo指定后备服务器用来连接主服务器的连接字符串，其中“host=192.168.88.191”表示主服务器的ip地址是192.168.88.191，“port=5432”表示数据库的端口是5432，“user=repuser”表示用于流复制的用户是repuser，“password=123456”表示密码是123456；

primary_slot_name指定通过流复制连接到主服务器时使用一个现有的复制槽来控制上游节点上的资源移除。这里我们指定之前创建的postgresql_node122。如果没有在主服务器上创建复制槽，则不配置此参数；

trigger_file指定一个触发器文件，该文件的存在会结束后备机中的恢复，使它成为主机。

启动备节点服务：

pg_ctl start -D $PGDATA

pg_ctl stop -D $PGDATA

问题解决：

[postgres@postgresql2 pgsql_data]$ pg_ctl start -D $PGDATA

waiting for server to start....2020-06-15 18:22:49.203 EDT [2883]FATAL: data directory "/db/pgsql_data" has group or worldaccess

2020-06-15 18:22:49.203 EDT [2883] DETAIL: Permissions should beu=rwx (0700).

stopped waiting

pg_ctl: could not start server

Examine the log output.

Chmod 700 db/pgsql_data

[

主备环境检测

]

在主节点上创建一个表,并插入数据：

postgres=# create table demo1 (id int, name varchar(20));

CREATE TABLE

postgres=# insert into demo1 (id, name) values (1,'zhangsan');

INSERT 0 1

在备节点上检测：

postgres=# select * from demo1;

id | name

----+------

1 | zhangsan

主节点数据同步到了备机。

同时，在备节点上写数据会失败：

postgres=# insert into demo1 (id, name) values (2,'wangwu');

ERROR: cannot execute INSERT in a read-only transaction

[

主备环境切换

]

启动备节点，使之成为新的主节点：

pg_ctl promote -D $PGDATA

结果是：

waiting for server to promote........ done

server promoted

查看新主节点的状态：

postgres=# pg_controldata | grep cluster

Database cluster state: in production

插入数据后无报错：

postgres=# insert into demo1 (id, name) values (2,'wangwu');

INSERT 0 1

停止旧的主节点：

pg_ctl stop -m fast -D $PGDATA

结果：

waiting for server to shut down.... done

server stopped

在停止的旧主节点上执行恢复数据的操作：

pg_rewind --target-pgdata $PGDATA--source-server='host=192.168.88.192 port=5432 user=postgresdbname=postgres' -P

结果如下：

connected to server

servers diverged at WAL location 0/2B000230 on timeline 4

rewinding from last common checkpoint at 0/2A000098 on timeline 4

reading source file list

reading target file list

reading WAL in target

need to copy 57 MB (total source directory size is 143 MB)

58749/58749 kB (100%) copied

creating backup label and updating control file

syncing target data directory

Done!

表示从新主节点上成功获取WAL日志。

重新配置新备节点的recovery.conf：

recovery_target_timeline = 'latest'

standby_mode = on

primary_conninfo = 'hostaddr=192.168.88.192 port=5432 user=repuserpassword=repuser123'

primary_slot_name = 'postgresql_node121'

在新备节点上执行下面的命令，重新启动该节点：

pg_ctl start -D $PGDATA

在新备节点上验证：insertinto提示为read-only

postgres=# insert into demo1 (id, name) values (3,'lisi');

ERROR: cannot execute INSERT in a read-only transaction

postgresql

文章转载自IT那活儿，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL热备之流复制环境搭建以及常见问题处理

1)在主节点上的操作

2）在备节点上的操作

评论