1、环境说明
| 操作系统 | IP | PG版本 | hostname | ||
|---|---|---|---|---|---|
| 主库 | CentOS Linux release 7.7.1908 | 192.168.91.133 | 12.9 | pg1 | |
| 从库 | CentOS Linux release 7.7.1908 | 192.168.91.134 | 12.9 | pg2 | |
| 从库 | CentOS Linux release 7.7.1908 | 192.168.91.135 | 12.9 | pg3 |
2、前提
主库安装数据库软件并初始化db
从库安装数据库软件
3、主库操作
3.1 主库放开防火墙(注意添加replication)
cat << EOF >> /postgresql/pg12/data/pg_hba.conf
# TYPE DATABASE USER ADDRESS METHOD
local all all trust
host all all 127.0.0.1/32 trust
host all all 0.0.0.0/0 md5
host replication all 0.0.0.0/0 md5
EOF3.2、主库配置归档
-- 修改参数
cat >> /postgresql/pg12/data/postgresql.conf <<"EOF"
wal_level='replica'
archive_mode='on'
archive_command='test ! -f /postgresql/pg12/archive_log/%f && cp %p /postgresql/pg12/archive_log/%f'
max_wal_senders=10
wal_keep_segments=256
wal_sender_timeout=60s
EOF--重启主库
pg_ctl restart -D $PGDATA-- 查询参数
psql
select * from pg_settings where name in ('wal_level','archive_mode','archive_command');
-- 切换归档
select pg_switch_wal();3.3、主库创建复制用户(创建用户需要加上replication选项。)
CREATE ROLE repl LOGIN ENCRYPTED PASSWORD 'repl' REPLICATION;4、 从库操作
4.1、在从库对主库进行备份
su - postgres
postgres@pg2-> pg_basebackup -h192.168.91.133 -p5432 -Urepl -F p -P -R -D $PGDATA
Password:
33703/33703 kB (100%), 1/1 tablespace4.2、启动从库
postgres@pg2-> pg_ctl start
waiting for server to start....2021-11-24 09:14:21.329 CST [8912] LOG: starting PostgreSQL 12.9 on x86_64-pc-linux-gn u, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-11-24 09:14:21.329 CST [8912] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-11-24 09:14:21.329 CST [8912] LOG: listening on IPv6 address "::", port 5432
2021-11-24 09:14:21.330 CST [8912] LOG: listening on Unix socket "/postgresql/pg12/data/.s.PGSQL.5432"
2021-11-24 09:14:21.435 CST [8913] LOG: database system was interrupted; last known up at 2021-11-24 09:14:09 CST
2021-11-24 09:14:21.439 CST [8913] LOG: entering standby mode
2021-11-24 09:14:21.441 CST [8913] LOG: redo starts at 0/7000028
2021-11-24 09:14:21.442 CST [8913] LOG: consistent recovery state reached at 0/7000100
2021-11-24 09:14:21.462 CST [8912] LOG: database system is ready to accept read only connections
2021-11-24 09:14:21.516 CST [8917] LOG: started streaming WAL from primary at 0/8000000 on timeline 1
done
server started5、查询复制状态
-- 主库查看wal日志发送状态
select * from pg_stat_replication;
postgres=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 2451
usesysid | 16384
usename | repl
application_name | walreceiver
client_addr | 192.168.91.134
client_hostname |
client_port | 59098
backend_start | 2021-11-24 09:14:21.499172+08
backend_xmin |
state | streaming
sent_lsn | 0/8000148
write_lsn | 0/8000148
flush_lsn | 0/8000148
replay_lsn | 0/8000148
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2021-11-24 09:57:28.166605+08
-- 从库查看wal日志接收状态
select * from pg_stat_wal_receiver;
postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 8917
status | streaming
receive_start_lsn | 0/8000000
receive_start_tli | 1
received_lsn | 0/8000148
received_tli | 1
last_msg_send_time | 2021-11-24 09:58:18.267709+08
last_msg_receipt_time | 2021-11-24 09:58:18.265728+08
latest_end_lsn | 0/8000148
latest_end_time | 2021-11-24 09:19:14.039943+08
slot_name |
sender_host | 192.168.91.133
sender_port | 5432
conninfo | user=repl password=******** dbname=replication host=192.168.91.133 port=5432 fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any
-- 也可以通过该命令查看
pg_controldata | grep state
postgres@pg1-> pg_controldata | grep state
Database cluster state: in production
postgres@pg2-> pg_controldata | grep state
Database cluster state: in archive recovery
-- 也可以查看这个,主库是f代表false ;备库是t,代表true
select pg_is_in_recovery();
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)pg_stat_replication是一个视图,主要用于监控一个基于流的设置,在这个系统视图中每个记录只代表一个slave。因此,可以看到谁处于连接状态,在做什么任务。pg_stat_replication也是检查slave是否处于连接状态的一个好方法。 每个字段代码的含义:
• pid: 这代表负责流连接的wal_sender进程的进程ID。例如“postgres: walsender replhr 172.72.6.3(40056) streaming 0/7000148”。
• usesysid: 每个内部用户都有一个独一无二的编号。该系统的工作原理很像UNIX。 usesysid 是 (PostgreSQL) 用户连接到系统的唯一标识符。
• usename: (不是用户名, 注意少了 r),它存储与用户相关的 usesysid 的名字。这是客户端放入到连接字符串中的东西。
• application_name:这是同步复制的通常设置。它可以通过连接字符串传递到master。
• client_addr: 它会告诉您流连接从何而来。它拥有客户端的IP地址。
• client_hostname: 除了客户端的IP,您还可以这样做,通过它的主机名来标识客户端。您可以通过master上的postgresql.conf中的log_hostname启用DNS反向查找。
• client_port: 这是客户端用来和WALsender进行通信使用的TPC端口号。 如果不本地UNIX套接字被使用了将显示-1。
• backend_start: 它告诉我们slave什么时间创建了流连接。
• state: 此列告诉我们数据的连接状态。如果事情按计划进行,它应该包含流信息。
• sent_lsn:这代表发送到连接的最后的事务日志的位置。已经通过网络发送了多少WAL?
• write_lsn: 这是写到standby系统磁盘上最后的事务日志位置。已向操作系统发送了多少WAL?( 尚未 flushing)
• flush_lsn: 这是被刷新到standby系统的最后位置。(这里注意写和刷新之间的区别。写并不意味着刷新 。)已经有多少WAL已 flush 到磁盘?
• replay_lsn: 这是slave上重放的最后的事务日志位置。已重放了多少WAL,因此对查询可见?
• sync_priority: 这个字段是唯一和同步复制相关的。每次同步复制将会选择一个优先权 —sync_priority—会告诉您选择了那个优先权。
• sync_state: 最后您会看到slave在哪个状态。这个状态可以是async, sync, or potential。当有一个带有较高优先权的同步slave时,PostgreSQL会把slave 标记为 potential。
人们经常说 pg_stat_replication 视图是primary 端的,这是不对的。该视图的作用是揭示有关wal sender 进程的信息。换句话说:如果你正在运行级联复制,该视图意味着在 secondary 复制到其他slaves 的时候, secondary 端的 pg_stat_replication 上的也会显示entries ( 条目 ) 6、主从切换
6.1、主从Switchover切换
在保证主从数据一致的情况下:
1、在当前主库操作:
1、停止服务
pg_ctl stop
2、以“postgres”用户创建创建standby.signal文件
touch /postgresql/pg12/data/standby.signal
cat >> /postgresql/pg12/data/postgresql.auto.conf <<"EOF"
primary_conninfo = 'user=repl password=repl host=192.168.91.134 port=5432 sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
EOF
3、启动服务
pg_ctl start2、在当前从库操作:
1、停止服务
pg_ctl stop
2、删除“standby.signal”文件
rm -rf /postgresql/pg12/data/standby.signal
3、修改postgresql.conf
sed -i 's/primary_conninfo/#primary_conninfo/g' /postgresql/pg12/data/postgresql.auto.conf
4、启动服务
pg_ctl start6.2、主从Failover切换
主数据库是读写的,备数据库是只读的。当主库出现故障时,我们需要将备库提升为主库进行读写操作。
1)切换后,原主库以从库的身份启动:修复过程类似于重建
2)切换后,原主库以主库的身份启动:
切换方法跟版本有关系:
P12之前,有2种办法:
① pg_ctl promote:执行之后发现recovery.conf变为recovery.done文件,代表 切换成功
② 触发器方式,:备库配置 recovery.conf 文件的 trigger_file 参数,之后在备库主机上创建触发器文件
从P12开始 : 直接执行 select pg_promote(true,60)函数即可完成切换。
pg12开始新增了一个pg_promote()函数,让我们可以通过SQL命令激活备库。
pg_promote()语法:
pg_promote(wait boolean DEFAULT true, wait_seconds integer DEFAULT 60)
两个参数:
wait: 表示是否等待备库的 promotion 完成或者
wait_seconds 秒之后返回成功,默认值为 true。 wait_seconds: 等待时间,单位秒,默认 60
6.2.1、Failover切换过程
1)主库操作:关闭主库,模拟主库故障:
postgres@pg2-> pg_ctl stop
waiting for server to shut down...2021-11-24 10:38:39.444 CST [9036] LOG: received fast shutdown request
.2021-11-24 10:38:39.445 CST [9036] LOG: aborting any active transactions
2021-11-24 10:38:39.445 CST [9036] LOG: background worker "logical replication launcher" (PID 9044) exited with exit code 1
2021-11-24 10:38:39.445 CST [9038] LOG: shutting down
.2021-11-24 10:38:40.579 CST [9036] LOG: database system is shut down
done
server stopped2)备库操作:激活备库为主库:
-- 不关闭主库,也可以执行该命令强制切换为主库角色,切换完成后,主从关系断开
postgres=# select pg_promote(true,60);
2021-11-24 10:38:58.558 CST [2582] LOG: received promote request
2021-11-24 10:38:58.558 CST [2582] LOG: redo done at 0/A000028
2021-11-24 10:38:58.558 CST [2582] LOG: last completed transaction was at log time 2021-11-24 10:18:13.734815+08
2021-11-24 10:38:58.559 CST [2582] LOG: selected new timeline ID: 2
2021-11-24 10:38:58.862 CST [2582] LOG: archive recovery complete
pg_promote
------------
t
(1 row)
-- 查询是否切换成功
select pg_is_in_recovery();
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)6.2.2、原主库修复后降为备库
1、停止服务
pg_ctl stop
2、以“postgres”用户创建创建standby.signal文件
touch /postgresql/pg12/data/standby.signal
cat >> /postgresql/pg12/data/postgresql.auto.conf <<"EOF"
primary_conninfo = 'user=repl password=repl host=192.168.91.133 port=5432 sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
EOF
3、启动服务
pg_ctl start7、新增从节点(扩容节点)
7.1、新增一台从库192.168.91.135,安装PG12.9
7.2、在从库对主库进行备份
[root@pg3 ~]# su - postgres
Last login: Mon Nov 22 20:38:32 CST 2021 on pts/0
postgres@pg3-> pg_basebackup -h192.168.91.133 -p5432 -Urepl -F p -P -R -D $PGDATA
Password:
66081/66081 kB (100%), 1/1 tablespace7.3、启动从库
postgres@pg3-> pg_ctl start
waiting for server to start....2021-11-24 11:06:43.238 CST [8998] LOG: starting PostgreSQL 12.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-11-24 11:06:43.238 CST [8998] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-11-24 11:06:43.238 CST [8998] LOG: listening on IPv6 address "::", port 5432
2021-11-24 11:06:43.239 CST [8998] LOG: listening on Unix socket "/postgresql/pg12/data/.s.PGSQL.5432"
2021-11-24 11:06:43.380 CST [8999] LOG: database system was interrupted; last known up at 2021-11-24 11:05:22 CST
2021-11-24 11:06:43.388 CST [8999] LOG: entering standby mode
2021-11-24 11:06:43.390 CST [8999] LOG: redo starts at 0/B000028
2021-11-24 11:06:43.390 CST [8999] LOG: consistent recovery state reached at 0/B000100
2021-11-24 11:06:43.390 CST [8998] LOG: database system is ready to accept read only connections
2021-11-24 11:06:43.446 CST [9003] LOG: started streaming WAL from primary at 0/C000000 on timeline 2
done
server started7.4、查询复制状态
--主库查看wal日志发送状态
select * from pg_stat_replication;
--从库查看wal日志接收状态
select * from pg_stat_wal_receiver;
-- 也可以通过该名称查看
pg_controldata | grep state
-- 也可以查看这个,主库是f代表false ;备库是t,代表true
select pg_is_in_recovery();8、修改为同步模式
在3个节点都修改如下参数:
postgres@pg1-> cat >> /postgresql/pg12/data/postgresql.conf <<"EOF"
> synchronous_commit='on'
> synchronous_standby_names='*'
> EOF
postgres@pg2-> cat >> /postgresql/pg12/data/postgresql.conf <<"EOF"
> synchronous_commit='on'
> synchronous_standby_names='*'
> EOF
postgres@pg3-> cat >> /postgresql/pg12/data/postgresql.conf <<"EOF"
> synchronous_commit='on'
> synchronous_standby_names='*'
> EOF在3个节点重新加载配置:
postgres@pg1-> pg_ctl reload
server signaled
2021-11-24 11:35:06.413 CST [2581] LOG: received SIGHUP, reloading configuration files
postgres@pg1-> 2021-11-24 11:35:06.413 CST [2581] LOG: parameter "synchronous_standby_names" changed to "*"
2021-11-24 11:35:06.823 CST [2631] LOG: standby "walreceiver" is now a synchronous standby with priority 1
2021-11-24 11:35:06.823 CST [2631] STATEMENT: START_REPLICATION 0/A000000 TIMELINE 2
postgres@pg2-> pg_ctl reload
server signaled
2021-11-24 11:35:03.595 CST [9102] LOG: received SIGHUP, reloading configuration files
postgres@pg2-> 2021-11-24 11:35:03.596 CST [9102] LOG: parameter "synchronous_standby_names" changed to "*"
postgres@pg3-> pg_ctl reload
server signaled
2021-11-24 11:34:53.355 CST [8998] LOG: received SIGHUP, reloading configuration files
postgres@pg3-> 2021-11-24 11:34:53.355 CST [8998] LOG: parameter "synchronous_standby_names" changed to "*"查询同步模式:
postgres=# select client_addr,state,sync_state from pg_stat_replication;
client_addr | state | sync_state
----------------+-----------+------------
192.168.91.134 | streaming | sync
192.168.91.135 | streaming | potential
(2 rows)若要修改为异步模式,则注销以上参数即可:
sed -i 's|synchronous_commit|#synchronous_commit|g' /postgresql/pg12/data/postgresql.conf
sed -i 's|synchronous_standby_names|#synchronous_standby_names|g' /postgresql/pg12/data/postgresql.conf
pg_ctl reload9、级联流复制
1、关闭pg3上的pg
postgres@pg3-> pg_ctl stop
waiting for server to shut down....2021-11-24 12:55:52.291 CST [8998] LOG: received fast shutdown request
2021-11-24 12:55:52.291 CST [8998] LOG: aborting any active transactions
2021-11-24 12:55:52.291 CST [9003] FATAL: terminating walreceiver process due to administrator command
2021-11-24 12:55:52.292 CST [9000] LOG: shutting down
2021-11-24 12:55:52.294 CST [8998] LOG: database system is shut down
done
server stopped
2、删除实例目录
postgres@pg3-> rm -rf $PGDATA
3、在pg3上对pg2上的从库进行备份
postgres@pg3-> pg_basebackup -h192.168.91.134 -p5432 -Urepl -F p -P -R -D $PGDATA
Password:
WARNING: skipping special file "./.s.PGSQL.5432"
WARNING: skipping special file "./.s.PGSQL.5432"
76875/76875 kB (100%), 1/1 tablespace
4、修改postgresql.auto.conf文件
postgres@pg3-> vi postgresql.auto.conf
postgres@pg3-> more postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'user=repl password=repl host=192.168.91.134 port=5432 sslmode=prefer sslcompression=0 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
5、启动pg3上的从库
postgres@pg3-> pg_ctl start
waiting for server to start....2021-11-24 12:59:42.445 CST [9122] LOG: starting PostgreSQL 12.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-39), 64-bit
2021-11-24 12:59:42.445 CST [9122] LOG: listening on IPv4 address "0.0.0.0", port 5432
2021-11-24 12:59:42.445 CST [9122] LOG: listening on IPv6 address "::", port 5432
2021-11-24 12:59:42.446 CST [9122] LOG: listening on Unix socket "/postgresql/pg12/data/.s.PGSQL.5432"
2021-11-24 12:59:42.554 CST [9123] LOG: database system was interrupted while in recovery at log time 2021-11-24 12:56:22 CST
2021-11-24 12:59:42.554 CST [9123] HINT: If this has occurred more than once some data might be corrupted and you might need to choose an earlier recovery target.
2021-11-24 12:59:42.563 CST [9123] LOG: entering standby mode
2021-11-24 12:59:42.564 CST [9123] LOG: redo starts at 0/C000C20
2021-11-24 12:59:42.565 CST [9123] LOG: consistent recovery state reached at 0/C001548
2021-11-24 12:59:42.565 CST [9123] LOG: invalid record length at 0/C001548: wanted 24, got 0
2021-11-24 12:59:42.565 CST [9122] LOG: database system is ready to accept read only connections
2021-11-24 12:59:42.570 CST [9127] LOG: started streaming WAL from primary at 0/C000000 on timeline 2
done
server started「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




