暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

基于PG 15搭建主备

原创 心在梦在 2023-04-10
1277

基于PG 15搭建主备

 
环境规划如下:

IP 主机名 作用 Port 类型 备注
192.168.11 pg1 主库 5432 写入 对外提供写服务
192.168.12 pg2 备库 5432 对外提供读服务

 

一、环境准备

1.1 申请2台linux机器

略。。
 

1.2 安装pg数据库

安装过程省略。。。

-- 登录主库 [postgresql@pg1 ~]$ psql psql (15.2) Type "help" for help. postgres=# -- 登录从库 [postgresql@pg2 ~]$ psql psql (15.2) Type "help" for help. postgres=#

主库和备库都安装了pg 15.2版本。

二、主库操作

2.1 修改pg_hba.conf

-- 添加replication,用于复制 [postgresql@pg1 ~]$ cat >> /postgresql/pgdata/pg_hba.conf <<"EOF" # add replication host replication all 0.0.0.0/0 md5 EOF

👉 注意:host all里面不包含replication。

2.2 主库配置归档

-- 建议该路径在从库也创建,方便后期主备切换 [postgresql@pg1 ~]$ mkdir -p /postgresql/archive [postgresql@pg1 ~]$ chown -R postgresql:postgresql /postgresql/archive [postgresql@pg1 ~]$ cat >> /postgresql/pgdata/postgresql.conf <<"EOF" wal_level='replica' archive_mode='on' archive_command='test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f' restore_command='cp /postgresql/archive/%f %p' # 指定wal日志发送进程的最大并发连接数 max_wal_senders=10 # 指定日志复制进程保留多少日志量,单位是MB,wal_keep_size = wal_keep_segments * wal_segment_size。旧版本使用wal_keep_segments参数,在PG13中已经取消。 wal_keep_size=512 # 设置流复制主机发送数据包的超时时间 wal_sender_timeout=60s EOF -- 重启PG服务 [postgresql@pg1 ~]$ pg_ctl restart pg_ctl: PID file "/postgresql/pgdata/postmaster.pid" does not exist Is server running? trying to start server anyway waiting for server to start....2023-04-04 03:44:31.354 GMT [11872] LOG: redirecting log output to logging collector process 2023-04-04 03:44:31.354 GMT [11872] HINT: Future log output will appear in directory "pg_log". done server started -- 检查归档参数 [postgresql@pg1 ~]$ psql psql (15.2) Type "help" for help. postgres=# \x Expanded display is on. postgres=# postgres=# select * from pg_settings where name in ('wal_level','archive_mode','archive_command'); -[ RECORD 1 ]---+------------------------------------------------------------------ name | archive_command setting | test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f unit | category | Write-Ahead Log / Archiving short_desc | Sets the shell command that will be called to archive a WAL file. extra_desc | This is used only if "archive_library" is not set. context | sighup vartype | string source | configuration file min_val | max_val | enumvals | boot_val | reset_val | test ! -f /postgresql/archive/%f && cp %p /postgresql/archive/%f sourcefile | /postgresql/pgdata/postgresql.conf sourceline | 10 pending_restart | f -[ RECORD 2 ]---+------------------------------------------------------------------ name | archive_mode setting | on unit | category | Write-Ahead Log / Archiving short_desc | Allows archiving of WAL files using archive_command. extra_desc | context | postmaster vartype | enum source | configuration file min_val | max_val | enumvals | {always,on,off} boot_val | off reset_val | on sourcefile | /postgresql/pgdata/postgresql.conf sourceline | 9 pending_restart | f -[ RECORD 3 ]---+------------------------------------------------------------------ name | wal_level setting | replica unit | category | Write-Ahead Log / Settings short_desc | Sets the level of information written to the WAL. extra_desc | context | postmaster vartype | enum source | configuration file min_val | max_val | enumvals | {minimal,replica,logical} boot_val | replica reset_val | replica sourcefile | /postgresql/pgdata/postgresql.conf sourceline | 8 pending_restart | f postgres=# -- 手动切换归档 postgres=# select pg_switch_wal(); pg_switch_wal --------------- 0/14D2208 (1 row) [root@pg1 ~]# ll /postgresql/archive/ total 16384 -rw------- 1 postgresql postgresql 16777216 Apr 4 13:06 000000010000000000000001

归档配置完成,可以正常归档wal日志。

2.3 主库创建复制用户

postgres=# create user repl encrypted password 'repl' replication; CREATE ROLE

三、从库操作

3.1 在从库对主库进行备份

[postgresql@pg2 backup]$ pg_basebackup -h 192.168.1.11 -p 5432 -U repl -Fp -Xs -Pv -R -D /home/postgresql/backup Password: pg_basebackup: initiating base backup, waiting for checkpoint to complete WARNING: skipping special file "./.s.PGSQL.5432" pg_basebackup: checkpoint completed pg_basebackup: write-ahead log start point: 0/3000028 on timeline 1 pg_basebackup: starting background WAL receiver pg_basebackup: created temporary replication slot "pg_basebackup_16304" WARNING: skipping special file "./.s.PGSQL.5432" 22312/22312 kB (100%), 1/1 tablespace pg_basebackup: write-ahead log end point: 0/3000100 pg_basebackup: waiting for background process to finish streaming ... pg_basebackup: syncing data to disk ... pg_basebackup: renaming backup_manifest.tmp to backup_manifest pg_basebackup: base backup completed

备份完成,当然,在主库备份,将备份文件拷贝到备库也是可以的,目的是一样的。

注意:在PG12之前,-R备份结束之后会自动生成recovery.conf文件,用来做流复制判断主从同步的信息。但是从PG12开始,这个文件已经不需要了。只需要在参数文件postgresql.conf中配置primary_conninfo参数即可。-R参数会产生文件standby.signal信号文件,且为空文件。

[postgresql@pg2 backup]$ ll standby.signal -rw------- 1 postgresql postgresql 0 Apr 4 13:09 standby.signal

3.2 还原从库

-- 关闭从库 [postgresql@pg2 backup]$ pg_ctl stop waiting for server to shut down.... done server stopped -- 删除从库的数据文件,将备份文件覆盖备库的数据文件 [postgresql@pg2 pgdata]$ rm -rf /postgresql/pgdata/* [postgresql@pg2 pgdata]$ cp -r /home/postgresql/backup/* /postgresql/pgdata/

3.3 修改备库primary_conninfo参数

-- 配置连接主库的ip、端口、用户 [postgresql@pg2 pgdata]$ cat >> /postgresql/pgdata/postgresql.conf <<"EOF" primary_conninfo = 'host=192.168.1.11 port=5432 user=repl password=repl' EOF -- 可选参数 archive_cleanup_command = 'pg_archivecleanup /mnt/server/archiverdir %r' hot_standby_feedback=on

注释:

  • %r 代表最后一个有效的 restart point 的wal file。该wal file 是最早一个必须保留的文件,以便允许 restore 操作可以被 restart,所有早于 %r 的wal日志文件可以被安全的清理掉。

  • restart point 是一个 point ,该 point 用于 standby server 重启 recovery 操作。

  • pg_archivecleanup 命令常被用在standby 配置的 archive_cleanup_command 参数中。

3.4 启动从库

[postgresql@pg2 pgdata]$ pg_ctl start
waiting for server to start....2023-04-04 05:14:35.251 GMT [16365] LOG:  redirecting log output to logging collector process
2023-04-04 05:14:35.251 GMT [16365] HINT:  Future log output will appear in directory "pg_log".
 done
server started

主库进程:

[postgresql@pg1 ~]$ ps -ef|grep post root 280 1 0 11:12 ? 00:00:00 /usr/libexec/postfix/master -w postfix 282 280 0 11:12 ? 00:00:00 qmgr -l -t unix -u root 1003 393 0 11:23 pts/0 00:00:00 su - postgresql postgre+ 1005 1003 0 11:23 pts/0 00:00:00 -bash postgre+ 11872 1 0 11:44 ? 00:00:00 /postgresql/pg15/bin/postgres postgre+ 11873 11872 0 11:44 ? 00:00:00 postgres: logger postgre+ 11874 11872 0 11:44 ? 00:00:00 postgres: checkpointer postgre+ 11875 11872 0 11:44 ? 00:00:00 postgres: background writer postgre+ 11877 11872 0 11:44 ? 00:00:00 postgres: walwriter postgre+ 11878 11872 0 11:44 ? 00:00:00 postgres: autovacuum launcher postgre+ 11879 11872 0 11:44 ? 00:00:00 postgres: archiver last was 000000010000000000000003.00000028.backup postgre+ 11880 11872 0 11:44 ? 00:00:00 postgres: logical replication launcher postfix 15366 280 0 12:52 ? 00:00:00 pickup -l -t unix -u postgre+ 16544 11872 0 13:14 ? 00:00:00 postgres: walsender repl 192.168.1.12(52848) streaming 0/4000148 postgre+ 16583 1005 0 13:15 pts/0 00:00:00 ps -ef postgre+ 16584 1005 0 13:15 pts/0 00:00:00 grep --color=auto post

从库进程:

[postgresql@pg2 ~]$ ps -ef|grep post root 255 1 0 11:12 ? 00:00:00 /usr/libexec/postfix/master -w postfix 257 255 0 11:12 ? 00:00:00 qmgr -l -t unix -u root 925 381 0 11:23 pts/0 00:00:00 su - postgresql postgre+ 926 925 0 11:23 pts/0 00:00:00 -bash postfix 15211 255 0 12:52 ? 00:00:00 pickup -l -t unix -u postgre+ 16365 1 0 13:14 ? 00:00:00 /postgresql/pg15/bin/postgres postgre+ 16366 16365 0 13:14 ? 00:00:00 postgres: logger postgre+ 16367 16365 0 13:14 ? 00:00:00 postgres: checkpointer postgre+ 16368 16365 0 13:14 ? 00:00:00 postgres: background writer postgre+ 16369 16365 0 13:14 ? 00:00:00 postgres: startup recovering 000000010000000000000004 postgre+ 16374 16365 0 13:14 ? 00:00:00 postgres: walreceiver streaming 0/4000110 postgre+ 16400 926 0 13:15 pts/0 00:00:00 ps -ef postgre+ 16401 926 0 13:15 pts/0 00:00:00 grep --color=auto post

pg流复制进程:

  • 主库:wal sender进程将主库最新LSN到备库最新的LSN之间的wal 传递给备库。

  • 备库:wal receiver进程接收wal sender传递过来的WAL数据并写入WAL日志。另外,如果启用了hot_standby_feedback参数,备库会定期(wal_receiver_status_interval)向主库发送xmin信息,用以保证主库不会vacuum掉备库需要的元组信息;如果没有开启hot_standby_feedback的时候,主库在做vacuum的时候,不能感知备库的查询。

  • 备库:startup实例恢复进程,将wal日志在备库上重放。

3.5 查询复制状态

3.5.1 主库查询复制状态

postgres=# \x Expanded display is on. postgres=# select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 16544 usesysid | 16389 usename | repl application_name | walreceiver client_addr | 192.168.1.12 client_hostname | client_port | 52848 backend_start | 2023-04-04 05:14:35.280968+00 backend_xmin | state | streaming sent_lsn | 0/4000148 write_lsn | 0/4000148 flush_lsn | 0/4000148 replay_lsn | 0/4000148 write_lag | flush_lag | replay_lag | sync_priority | 0 sync_state | async -- 默认为异步复制模式 reply_time | 2023-04-04 05:17:18.005466+00

pg_stat_replication是一个视图,主要用于监控PG流复制情况。在这个系统视图中每个记录只代表一个slave。因此,可以看到谁处于连接状态,在做什么任务。

3.5.2 从库查询wal日志接收状态

postgres=# \x Expanded display is on. postgres=# select * from pg_stat_wal_receiver; -[ RECORD 1 ]---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ pid | 16374 status | streaming receive_start_lsn | 0/4000000 receive_start_tli | 1 written_lsn | 0/4000148 flushed_lsn | 0/4000148 received_tli | 1 last_msg_send_time | 2023-04-04 05:17:37.929886+00 last_msg_receipt_time | 2023-04-04 05:17:37.92991+00 latest_end_lsn | 0/4000148 latest_end_time | 2023-04-04 05:15:07.716782+00 slot_name | sender_host | 192.168.1.11 sender_port | 5432 conninfo | user=repl password=******** channel_binding=disable dbname=replication host=192.168.1.11 port=5432 fallback_application_name=walreceiver sslmode=disable sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any
  • 也可以查看pg_is_in_recovery,主库是f代表false ;备库是t,代表true
-- 主库 postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row) -- 备库 postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)

四、测试主从复制

4.1 主库创建测试表

postgres-# create database testdb; CREATE DATABASE postgres=# postgres=# \c testdb You are now connected to database "testdb" as user "postgres". testdb=# create table test1 as select * from pg_class; SELECT 410 testdb=# select count(*) from test1; count ------- 410 (1 row)

4.2 备库查询数据

postgres=# \c testdb You are now connected to database "testdb" as user "postgres". testdb=# testdb=# select count(*) from test1; count ------- 410 (1 row) -- 可以看到,备库是只读状态,不可以做写操作 testdb=# delete from test1; ERROR: cannot execute DELETE in a read-only transaction

另外,测试发现,虽然备库也配置了归档模式,但是实际中备库并没有归档wal日志。

-- 主库正常归档 [postgresql@pg1 ~]$ ll /postgresql/archive total 114692 -rw------- 1 postgresql postgresql 16777216 Apr 4 13:06 000000010000000000000001 -rw------- 1 postgresql postgresql 16777216 Apr 4 13:09 000000010000000000000002 -rw------- 1 postgresql postgresql 16777216 Apr 4 13:09 000000010000000000000003 -rw------- 1 postgresql postgresql 338 Apr 4 13:09 000000010000000000000003.00000028.backup -rw------- 1 postgresql postgresql 16777216 Apr 4 13:26 000000010000000000000004 -rw------- 1 postgresql postgresql 16777216 Apr 4 13:33 000000010000000000000005 -rw------- 1 postgresql postgresql 16777216 Apr 4 13:38 000000010000000000000006 -rw------- 1 postgresql postgresql 16777216 Apr 4 13:38 000000010000000000000007 -- 备库并没有归档 [postgresql@pg2 ~]$ ll /postgresql/pgdata/pg_wal/archive_status/ total 0 -rw------- 1 postgresql postgresql 0 Apr 4 13:26 000000010000000000000004.done -rw------- 1 postgresql postgresql 0 Apr 4 13:33 000000010000000000000005.done -rw------- 1 postgresql postgresql 0 Apr 4 13:38 000000010000000000000006.done -rw------- 1 postgresql postgresql 0 Apr 4 13:38 000000010000000000000007.done [postgresql@pg2 ~]$ [postgresql@pg2 ~]$ ll /postgresql/archive/ total 0

五、主从Switchover切换

在保证主从数据一致的情况下切换。一定先从主库操作,这样能够保证切换过程中应用无法写入新的数据。

1、在当前主库操作

-- 停止PG服务 [postgresql@pg1 ~]$ pg_ctl stop waiting for server to shut down.... done server stopped -- 创建standby.signal文件 [postgresql@pg1 ~]$ touch /postgresql/pgdata/standby.signal -- 修改postgresql.conf文件 [postgresql@pg1 ~]$ cat >> /postgresql/pgdata/postgresql.conf <<"EOF" primary_conninfo = 'host=192.168.1.12 port=5432 user=repl password=repl' EOF -- 启动PG服务 [postgresql@pg1 ~]$ pg_ctl start waiting for server to start....2023-04-04 06:01:21.717 GMT [19035] LOG: redirecting log output to logging collector process 2023-04-04 06:01:21.717 GMT [19035] HINT: Future log output will appear in directory "pg_log". done server started postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- t (1 row)

结论:可以看到,此时2个数据库都是备库角色,都是只读状态,无法做写操作。

2、在当前备库操作

-- 停止PG服务 [postgresql@pg2 ~]$ pg_ctl stop waiting for server to shut down.... done server stopped -- 删除standby.signal文件 [postgresql@pg2 ~]$ rm -rf /postgresql/pgdata/standby.signal -- 修改postgresql.conf,将primary_conninfo这一行删除,或者注释 [postgresql@pg2 ~]$ sed -i 's/primary_conninfo/#primary_conninfo/g' /postgresql/pgdata/postgresql.conf [postgresql@pg2 ~]$ cat /postgresql/pgdata/postgresql.conf|grep primary_conninfo #primary_conninfo = 'host=192.168.1.11 port=5432 user=repl password=repl' -- 启动PG服务 [postgresql@pg2 ~]$ pg_ctl start waiting for server to start....2023-04-04 06:04:41.041 GMT [19040] LOG: redirecting log output to logging collector process 2023-04-04 06:04:41.041 GMT [19040] HINT: Future log output will appear in directory "pg_log". done server started [postgresql@pg2 ~]$ psql psql (15.2) Type "help" for help. postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row)

结论:可以看到,原备库已经切换为主库,读写状态。

3. 查询主从状态

-- 新的主库 postgres=# \x Expanded display is on. postgres=# select * from pg_stat_replication; -[ RECORD 1 ]----+------------------------------ pid | 19050 usesysid | 16389 usename | repl application_name | walreceiver client_addr | 192.168.1.11 client_hostname | client_port | 58432 backend_start | 2023-04-04 06:04:41.691186+00 backend_xmin | state | streaming sent_lsn | 0/B000148 write_lsn | 0/B000148 flush_lsn | 0/B000148 replay_lsn | 0/B000148 write_lag | flush_lag | replay_lag | sync_priority | 0 sync_state | async reply_time | 2023-04-04 06:09:59.950244+00 -- 新的备库 postgres=# \x Expanded display is on. postgres=# postgres=# select * from pg_stat_wal_receiver; -[ RECORD 1 ]---------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ pid | 19270 status | streaming receive_start_lsn | 0/9000000 receive_start_tli | 1 written_lsn | 0/B000148 flushed_lsn | 0/B000148 received_tli | 1 last_msg_send_time | 2023-04-04 06:10:20.025055+00 last_msg_receipt_time | 2023-04-04 06:10:20.025076+00 latest_end_lsn | 0/B000148 latest_end_time | 2023-04-04 06:09:49.947921+00 slot_name | sender_host | 192.168.1.12 sender_port | 5432 conninfo | user=repl password=******** channel_binding=disable dbname=replication host=192.168.1.12 port=5432 fallback_application_name=walreceiver sslmode=disable sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any

4. 测试主从

-- 新的主库创建测试表 postgres=# \c testdb You are now connected to database "testdb" as user "postgres". testdb=# testdb=# create table test2 as select * from pg_roles; SELECT 14 testdb=# testdb=# select count(*) from test2; count ------- 14 (1 row) -- 新的备库查询 postgres=# \c testdb You are now connected to database "testdb" as user "postgres". testdb=# testdb=# select count(*) from test2; count ------- 14 (1 row)

主从切换成功。另外可以看到,原备库切换为主库之后,也正常归档wal日志了。

[postgresql@pg2 ~]$ ll /postgresql/archive/ total 32768 -rw------- 1 postgresql postgresql 16777216 Apr 4 14:08 000000010000000000000009 -rw------- 1 postgresql postgresql 16777216 Apr 4 14:08 00000001000000000000000A

六、主从Failover切换

主数据库是读写的,备数据库是只读的。当主库出现故障时,我们需要将备库提升为主库进行读写操作。

1)切换后,原主库以从库的身份启动:修复过程类似于重建。

2)切换后,原主库以主库的身份启动,那么就是独立的数据库,原有主备架构不再存在。

  • 从P12开始: 直接执行 select pg_promote(true,60)函数即可完成切换。

6.1 Failover切换过程

1)主库操作:关闭主库,模拟主库故障:

[postgresql@pg2 ~]$ pg_ctl stop
waiting for server to shut down.... done
server stopped

2)备库操作:激活备库为主库:

-- 不关闭主库,也可以执行该命令强制切换为主库角色,切换完成后,主从关系断开 postgres=# select pg_promote(true,60); pg_promote ------------ t (1 row) -- 查询是否切换成功 postgres=# select pg_is_in_recovery(); pg_is_in_recovery ------------------- f (1 row)

结论:备库成功failover成主库,可以正常做读写操作,并且自动删除了standby.signal文件。

[postgresql@pg1 ~]$ ll /postgresql/pgdata/standby.signal ls: cannot access /postgresql/pgdata/standby.signal: No such file or directory postgres=# \c testdb You are now connected to database "testdb" as user "postgres". testdb=# delete from test2; DELETE 14 testdb=# testdb=# select * from test2; rolname | rolsuper | rolinherit | rolcreaterole | rolcreatedb | rolcanlogin | rolreplication | rolconnlimit | rolpassword | rolvali duntil | rolbypassrls | rolconfig | oid ---------+----------+------------+---------------+-------------+-------------+----------------+--------------+-------------+-------- -------+--------------+-----------+----- (0 rows)

6.2 原主库修复后降为备库

过程类似于重建备库,需要重新备份恢复。

七、查看主备复制延迟

方式一:通过WAL延迟时间衡量

-- 在流复制主库执行以下SQL: select pid,usename,client_addr,state,write_lag,flush_lag,replay_lag from pg_stat_replication;

方式二:通过WAL日志应用延迟量衡量

-- 通过流复制备库WAL的应用位置和主库本地WAL写入位置之间的WAL日志量能够准确判断主备延时,在流复制主库执行以下SQL: select pid,usename,client_addr,state, pg_wal_lsn_diff(pg_current_wal_lsn(),write_lsn) write_delay, pg_wal_lsn_diff(pg_current_wal_lsn(),flush_lsn) flush_delay, pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) replay_dely from pg_stat_replication;
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
2人已赞赏
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论