一分钟掌握 PostgreSQL 时点恢复

原创数据蛙 All China Database Union 2024-11-08

738

前言

在前面的文章介绍过如何进行数据库的全库或指定表的备份的与恢复【快速掌握 PostgreSQL 数据加载与备份】，但在实际生产环境中，由于数据库的规模比较大，全库备份需要的时间太长，指定表备份也不能预测过具体哪些表需要定时备份，使用增量备份与恢复可以帮助企业实现高数据安全和可靠的备份方案。

PITR 与 WAL

PostgreSQL的增量备份和恢复技术 PITR（Point-in-Time Recovery，时点恢复），是一种基于日志的备份恢复机制，能够将数据库恢复到指定的时间点。PITR 通常用于数据灾难恢复，或是为了防止因人为的误操作造成的数据丢失。其中 PITR 包含两个主要的操作：基础备份和 WAL（Write-Ahead Logging）日志备份。

什么是 WAL 日志？有接触过 Oracle 的应该都很清楚，其实类似 Oracle 里的 archive log 归档日志。在 PostgreSQL 中，WAL 日志记录机制类似于“先写日志后执行”，在进行任何数据更改之前，数据库会将操作记录到 WAL 日志文件中。这些日志文件记录了每个事务的增量更改，确保数据能够在断电或故障后恢复。我们在安装完 PostgreSQL 后，归档 WAL 日志默认是关闭的，这可以在当前正在进程中查看，如果是启动归档后台会对应单独有一个进程。

WAL 日志开启好处：

实现增量数据备份：归档 WAL 日志可记录数据库的增量变化，实现增量备份，减少每次全库备份的存储空间，同时提高备份效率。
实现数据恢复：可以利用WAL日志重新执行未完成的事务，从而恢复到最近一次提交的状态。
高可用：增强故障恢复能力、确保数据一致性，实现流复制和同步复制时配合使用归档日志。

归档模式配置

首先检查数据库里的归档日志是否有开启，可通过 show archive_mode 命令查看。

# 检查是否已经开启归档模式，off 表示未开启
postgres=# show archive_mode ;
 archive_mode
--------------
 off
(1 row)

创建归档目录，archivelog 文件夹的用户与用户组应为 postgres，这里具体创建目录为 /usr/postgres/16.1/data/archivelog：

[postgres@pca001 data]$ mkdir archivelog

开启归档日志：
在配置文件 postgresql.conf 中配置以下参数，配置完成后需要重启数据库服务：

archive_mode = on
wal_level = replica
archive_command = 'test ! -f /usr/postgres/16.1/data/archivelog/%f && cp %p /usr/postgres/16.1/data/archivelog/%f'

# 重启服务
[postgres@pca001 ~]$ pg_ctl -l logfile restart
waiting for server to shut down.... done
server stopped
waiting for server to start.... done
server started

# 开启归档日志后，查看正在执行的进程，可以看到新增了 archiver 进程
[postgres@pca001 ~]$ ps -ef|grep postgres
root        1705    1585  0 15:02 pts/0    00:00:00 su - postgres
postgres    1706    1705  0 15:02 pts/0    00:00:00 -bash
postgres    1838    1706  0 15:02 pts/0    00:00:00 psql
root        2855    2741  0 15:05 pts/1    00:00:00 su - postgres
postgres    2856    2855  0 15:05 pts/1    00:00:00 -bash
postgres    7415       1  0 15:17 ?        00:00:00 /usr/postgres/16.1/bin/postgres
postgres    7416    7415  0 15:17 ?        00:00:00 postgres: checkpointer
postgres    7417    7415  0 15:17 ?        00:00:00 postgres: background writer
postgres    7419    7415  0 15:17 ?        00:00:00 postgres: walwriter
postgres    7420    7415  0 15:17 ?        00:00:00 postgres: autovacuum launcher
postgres    7421    7415  0 15:17 ?        00:00:00 postgres: archiver
postgres    7422    7415  0 15:17 ?        00:00:00 postgres: logical replication launcher
postgres    8316    2856  0 15:20 pts/1    00:00:00 ps -ef
postgres    8317    2856  0 15:20 pts/1    00:00:00 grep --color=auto postgres

基础备份

创建测试表并生成数据，再手动强制进行日志切换。

# 创建测试表，并新增测试数据
postgres=# create table test_book(id int,name varchar(20),create_time timestamp);
CREATE TABLE
postgres=# insert into test_book values(1,'mysql',current_timestamp(0));
INSERT 0 1
postgres=# insert into test_book values(2,'postgresql',current_timestamp(0));
INSERT 0 1
postgres=# insert into test_book values(3,'oracle',current_timestamp(0));
INSERT 0 1

postgres=# select * from test_book;
 id |    name    |     create_time
----+------------+---------------------
  1 | mysql      | 2024-11-08 17:27:44
  2 | postgresql | 2024-11-08 17:28:03
  3 | oracle     | 2024-11-08 17:28:11
(3 rows)

# 由于需要生成增量归档日志，这里先手动强制切换日志
postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/380492D8
(1 row)

日志切换后，可以通过 pg_basebackup 工具进行一次基础备份，备份的输出目录为：/soft/base_bak，其中：-Ft 为设置备份文件格式为 tar 格式。备份完成后，模拟数据库出现故障，并数据文件丢失。更多 pg_basebackup 参数可通过 pg_basebackup --help 查看。具体操作步骤如下：

# 执行一次基础备份 pg_basebackup
[postgres@pca001 ~]$ pg_basebackup -Ft -D /soft/base_bak
[postgres@pca001 pg_wal]$ cd /soft/base_bak
[postgres@pca001 base_bak]$ ll
总用量 351628
-rw-------. 1 postgres postgres    143588 11月  8 17:33 backup_manifest
-rw-------. 1 postgres postgres 343136256 11月  8 17:33 base.tar
-rw-------. 1 postgres postgres  16778752 11月  8 17:33 pg_wal.tar

[postgres@pca001 ~]$ pg_ctl stop
waiting for server to shut down.... done
server stopped

[postgres@pca001 ~]$ rm -rf /usr/postgres/16.1/data/*
[postgres@pca001 ~]$ ls /usr/postgres/16.1/data/

数据恢复

进入上面备份的数据目录 /soft/base_bak，将 base.tar 与 pg_wal.tar 解压到对应的目录，解压完成后，修改 postgresql.conf 文件中的 restore_command 对应的参数值，修改完成后再重启服务器，重启完成后可正常进入 psql，并能查询到故障发生前时点的最新数据。

[root@pca001 ~]# tar -xvf /soft/base_bak/base.tar -C /usr/postgres/16.1/data/
[root@pca001 ~]# tar -xvf /soft/base_bak/pg_wal.tar -C /usr/postgres/16.1/data/pg_wal/

[root@pca001 ~]# vim /usr/postgres/16.1/data/postgresql.conf
restore_command = 'cp %p /usr/postgres/16.1/data/archivelog/%f'

[postgres@pca001 ~]$ pg_ctl  start
waiting for server to start....2024-11-08 17:49:22.881 CST [62508] LOG:  starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2), 64-bit
2024-11-08 17:49:22.882 CST [62508] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-11-08 17:49:22.882 CST [62508] LOG:  listening on IPv6 address "::", port 5432
2024-11-08 17:49:22.884 CST [62508] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-11-08 17:49:22.888 CST [62511] LOG:  database system was interrupted; last known up at 2024-11-08 17:33:38 CST
2024-11-08 17:49:22.907 CST [62511] LOG:  redo starts at 0/3A000028
2024-11-08 17:49:22.907 CST [62511] LOG:  consistent recovery state reached at 0/3A000100
2024-11-08 17:49:22.908 CST [62511] LOG:  redo done at 0/3A000100 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2024-11-08 17:49:23.024 CST [62509] LOG:  checkpoint starting: end-of-recovery immediate wait
2024-11-08 17:49:23.031 CST [62509] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.002 s, total=0.008 s; sync files=2, longest=0.001 s, average=0.001 s; distance=16384 kB, estimate=16384 kB; lsn=0/3B000028, redo lsn=0/3B000028
2024-11-08 17:49:23.034 CST [62508] LOG:  database system is ready to accept connections
 done
server started
[postgres@pca001 ~]$ psql
psql (16.1)
Type "help" for help.

postgres=# select * from test_book ;
 id |    name    |     create_time
----+------------+---------------------
  1 | mysql      | 2024-11-08 17:27:44
  2 | postgresql | 2024-11-08 17:28:03
  3 | oracle     | 2024-11-08 17:28:11
(3 rows)

问题与解决

**由于参考的旧版本的是 PostgreSQL 使用 recovery.conf 来恢复，但在 PostgreSQL 12和更高版本上，归档恢复、流复制和PITR使用正常的服务器配置参数进行配置。**这些参数可以在 postgresql.conf 中设置，也可以通过ALTER SYSTEM像其他参数一样设置，如果存在recovery.conf，服务器将不会启动。最后通过在 postgresql.conf 参数文件中配置，问题得以解决。具体问题描述与处理方法如下：

# 重启时出现不支持通过 recovery.conf 方式执行恢复
[postgres@pca001 ~]$ pg_ctl  start
waiting for server to start....2024-11-08 17:46:09.877 CST [61234] LOG:  starting PostgreSQL 16.1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-2), 64-bit
2024-11-08 17:46:09.878 CST [61234] LOG:  listening on IPv4 address "0.0.0.0", port 5432
2024-11-08 17:46:09.878 CST [61234] LOG:  listening on IPv6 address "::", port 5432
2024-11-08 17:46:09.882 CST [61234] LOG:  listening on Unix socket "/tmp/.s.PGSQL.5432"
2024-11-08 17:46:09.885 CST [61237] LOG:  database system was interrupted; last known up at 2024-11-08 17:33:38 CST
2024-11-08 17:46:09.900 CST [61237] FATAL:  using recovery command file "recovery.conf" is not supported
2024-11-08 17:46:09.901 CST [61234] LOG:  startup process (PID 61237) exited with exit code 1
2024-11-08 17:46:09.901 CST [61234] LOG:  aborting startup due to startup process failure
2024-11-08 17:46:09.901 CST [61234] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

# 删除 recovery.conf 文件，将恢复命令参数配置在 postgresql.conf 里
[root@pca001 data]# rm -rf recovery.conf
[root@pca001 data]# vim postgresql.conf

总结

PostgreSQL 的归档日志为数据备份和恢复提供了灵活可靠的手段。配置归档模式，定期进行基础备份和WAL日志备份，可以实现高效的增量备份方案，满足企业对数据安全和灾备的需求。

💡注意：WAL日志文件生成频繁，占用大量存储空间，需要定期管理，防止磁盘空间耗尽，可使用以下方式：

自动清理：控制需要保留的WAL文件数量，超出部分会被自动清理。

周期性备份：使用脚本定期将归档日志备份到其他存储（如NAS、云存储），保障数据的安全性。

清理归档目录：可以配置自动清理脚本，将超出保留时间的归档文件删除，避免归档目录占用过多磁盘空间。

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

【PostgreSQL】（共7篇）

postgresql知识