暂无图片
暂无图片
5
暂无图片
暂无图片
暂无图片

Postgresql 17新特性 - 块级别增量备份的实验

原创 ByteHouse 2025-07-07
469

摘要:

PgSQL可通过pg_basebackup进行全量备份。在构建复制关系时,创建备机时需要通过pg_basebackup全量拉取一个备份,形成一个mirror。但很多场景下,我们往往不需要进行全量备份/恢复,数据量特别大的时候,这个代价太大了。GPDB中有个工具gprecoverseg支持全量备份和增量备份。所谓全量备份,主要通过pg_basebackup从其他节点全量拷贝一份数据过来;而增量备份主要通过pg_rewind工具,只拷贝新增的数据。而PgSQL中单独的pg_rewind,仅从分叉点之前最近的checkpoint位置开始解析WAL,解析出变动的数据页,然后仅将变动的数据页拷贝过来。所以,仅靠pg_rewind实现不了完美的增量备份。

1.创建用例表及插入数据

postgres=# CREATE TABLE bigdata (last_updated timestamptz);
CREATE TABLE
postgres=# INSERT INTO bigdata (last_updated) VALUES (now());
INSERT 0 1
postgres=# UPDATE bigdata SET last_updated = now();
UPDATE 1
postgres=# 

2.执行pg_basebackup

[postgres@bigdata ~]$ mkdir /opt/postgresql-17.4/backups
[postgres@bigdata ~]$ pg_basebackup -D /opt/postgresql-17.4/backups
[postgres@bigdata ~]$ cd /opt/postgresql-17.4/backups/
[postgres@bigdata backups]$ ll
total 196
-rw------- 1 postgres postgres    225 Jul  7 10:04 backup_label
-rw------- 1 postgres postgres 137508 Jul  7 10:04 backup_manifest
drwx------ 5 postgres postgres     33 Jul  7 10:04 base
drwx------ 2 postgres postgres   4096 Jul  7 10:04 global
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_commit_ts
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_dynshmem
-rw------- 1 postgres postgres   5711 Jul  7 10:04 pg_hba.conf
-rw------- 1 postgres postgres   2640 Jul  7 10:04 pg_ident.conf
drwx------ 4 postgres postgres     68 Jul  7 10:04 pg_logical
drwx------ 4 postgres postgres     36 Jul  7 10:04 pg_multixact
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_notify
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_replslot
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_serial
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_snapshots
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_stat
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_stat_tmp
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_subtrans
drwx------ 2 postgres postgres     19 Jul  7 10:04 pg_tblspc
drwx------ 2 postgres postgres      6 Jul  7 10:04 pg_twophase
-rw------- 1 postgres postgres      3 Jul  7 10:04 PG_VERSION
drwx------ 4 postgres postgres     77 Jul  7 10:04 pg_wal
drwx------ 2 postgres postgres     18 Jul  7 10:04 pg_xact
-rw------- 1 postgres postgres     88 Jul  7 10:04 postgresql.auto.conf
-rw------- 1 postgres postgres  30715 Jul  7 10:04 postgresql.conf
drwx------ 2 postgres postgres      6 Jul  7 10:04 tbs_tmp
[postgres@bigdata backups]$ 

相对于老版本的pg_basebackup多了backup_mainfest文件。该备份将PGDATA下的内容拷贝到 /opt/postgresql-17.4/backups 下。修改改目录下的 postgresql.conf 的 port 与原数据库服务器不冲突的端口,可以使用该备份直接启动数据库服务。

[postgres@bigdata backups]$ pg_ctl -D /opt/postgresql-17.4/backups/ start
waiting for server to start....2025-07-07 10:09:23.376 CST [1662] FATAL:  data directory "/opt/postgresql-17.4/backups" has invalid permissions
2025-07-07 10:09:23.376 CST [1662] DETAIL:  Permissions should be u=rwx (0700) or u=rwx,g=rx (0750).
 stopped waiting
pg_ctl: could not start server
Examine the log output.
[postgres@bigdata backups]$ 

根据提示修改目录权限

chmod 0700 /opt/postgresql-17.4/backups
chmod 0700 /opt/postgresql-17.4/backups/tbs_tmp

chown -R postgres:postgres /opt/postgresql-17.4/backups

尝试使用备份启动数据库服务

[postgres@bigdata backups]$ pg_ctl -D /opt/postgresql-17.4/backups/ start
waiting for server to start....2025-07-07 10:10:42.068 CST [1672] LOG:  starting PostgreSQL 17.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 11.5.0 20240719 (Red Hat 11.5.0-5), 64-bit
2025-07-07 10:10:42.069 CST [1672] LOG:  listening on IPv6 address "::1", port 15432
2025-07-07 10:10:42.069 CST [1672] LOG:  listening on IPv4 address "127.0.0.1", port 15432
2025-07-07 10:10:42.077 CST [1672] LOG:  listening on Unix socket "/tmp/.s.PGSQL.15432"
2025-07-07 10:10:42.081 CST [1672] LOG:  could not open directory "pg_tblspc/16388/PG_17_202406281": No such file or directory
2025-07-07 10:10:42.086 CST [1675] LOG:  database system was interrupted; last known up at 2025-07-07 10:04:41 CST
2025-07-07 10:10:42.876 CST [1675] LOG:  starting backup recovery with redo LSN 0/C000028, checkpoint LSN 0/C000080, on timeline ID 1
2025-07-07 10:10:42.876 CST [1675] LOG:  could not open directory "pg_tblspc/16388/PG_17_202406281": No such file or directory
2025-07-07 10:10:42.895 CST [1675] LOG:  could not open directory "pg_tblspc/16388/PG_17_202406281": No such file or directory
2025-07-07 10:10:42.899 CST [1675] LOG:  redo starts at 0/C000028
2025-07-07 10:10:42.905 CST [1675] LOG:  completed backup recovery with redo LSN 0/C000028 and end LSN 0/C000120
2025-07-07 10:10:42.906 CST [1675] LOG:  consistent recovery state reached at 0/C000120
2025-07-07 10:10:42.906 CST [1675] LOG:  redo done at 0/C000120 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2025-07-07 10:10:42.907 CST [1675] LOG:  could not open directory "pg_tblspc/16388/PG_17_202406281": No such file or directory
2025-07-07 10:10:43.087 CST [1673] LOG:  checkpoint starting: end-of-recovery immediate wait
2025-07-07 10:10:43.101 CST [1673] LOG:  checkpoint complete: wrote 3 buffers (0.0%); 0 WAL file(s) added, 0 removed, 1 recycled; write=0.001 s, sync=0.003 s, total=0.017 s; sync files=2, longest=0.003 s, average=0.002 s; distance=16384 kB, estimate=16384 kB; lsn=0/D000028, redo lsn=0/D000028
2025-07-07 10:10:43.105 CST [1672] LOG:  database system is ready to accept connections
 done
server started
[postgres@bigdata backups]$

查看服务是否启动

[postgres@bigdata backups]$ ps -ef|grep 15432
postgres    1680    1445  0 10:10 pts/1    00:00:00 grep --color=auto 15432
[postgres@bigdata backups]$ ss -lntp |grep 15432
LISTEN 0      200        127.0.0.1:15432      0.0.0.0:*    users:(("postgres",pid=1672,fd=7))
LISTEN 0      200            [::1]:15432         [::]:*    users:(("postgres",pid=1672,fd=6))
[postgres@bigdata backups]$ 

当然,也可以备份成.tar文件:

# 将备份保存为单个 tar 文件
# -F:指定输出格式(t=tar)
# -t:将 tar 文件发送到标准输出(而非直接写入目录)
[postgres@bigdata backups]$ pg_basebackup -Ft -D /opt/postgresql-17.4/backups
[postgres@bigdata backups]$ ll
total 39732
-rw------- 1 postgres postgres     1024 Jul  7 10:18 16388.tar
-rw------- 1 postgres postgres   137650 Jul  7 10:18 backup_manifest
-rw------- 1 postgres postgres 23758336 Jul  7 10:18 base.tar
-rw------- 1 postgres postgres 16778752 Jul  7 10:18 pg_wal.tar
[postgres@bigdata backups]$ 


# 将源服务器的 /data/tbs1 表空间恢复到目标服务器的 /new_data/tbs1
pg_basebackup -D /backup -T /data/tbs1=/new_data/tbs1

# 以postgres用户身份执行备份,创建一个基础备份并包含 WAL 日志,适用于完整恢复。
#  -X stream:包含 WAL 日志,确保备份一致性
#  -Pv:显示进度和详细信息
[postgres@bigdata backups]$ pg_basebackup -D /opt/postgresql-17.4/backups -X stream -Pv
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/14000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_1788"
23202/23202 kB (100%), 2/2 tablespaces                                         
pg_basebackup: write-ahead log end point: 0/14000120
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed
[postgres@bigdata backups]$ 

3.backup_mainfest文件

[postgres@bigdata backups]$ cat backup_manifest
{ "PostgreSQL-Backup-Manifest-Version": 2,
"System-Identifier": 7498569821944734286,
"Files": [
{ "Path": "backup_label", "Size": 227, "Last-Modified": "2025-07-07 02:24:25 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "34445b35" },
{ "Path": "global/1262", "Size": 8192, "Last-Modified": "2025-04-29 03:20:46 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "7738cb04" },
{ "Path": "global/2964", "Size": 0, "Last-Modified": "2025-04-29 03:20:46 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
{ "Path": "global/1213", "Size": 8192, "Last-Modified": "2025-04-29 19:10:35 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "ed355585" },
{ "Path": "global/1260", "Size": 8192, "Last-Modified": "2025-04-29 03:20:46 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "c8d4de5c" },
{ "Path": "global/1261", "Size": 8192, "Last-Modified": "2025-04-29 03:20:46 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "4e4eb40d" },
{ "Path": "global/1214", "Size": 0, "Last-Modified": "2025-04-29 03:20:46 GMT", "Checksum-Algorithm": "CRC32C", "Checksum": "00000000" },
],
......
"WAL-Ranges": [
{ "Timeline": 1, "Start-LSN": "0/16000028", "End-LSN": "0/16000120" }
],
"Manifest-Checksum": "d168cc0336ca97c4014400663e3d4dddce489f1073c37c0eb1f21ad598ab3f1e"}
[postgres@bigdata backups]$
[postgres@bigdata backups]$ 

这个文件可以用于校验备份是否完成,也可以用于看下自从上次备份以来改变了哪些东西。

4.做一个全量备份

step 1.创建全量备份

[postgres@bigdata backups]$ rm -rf /opt/postgresql-17.4/backups/*
[postgres@bigdata backups]$ pg_basebackup -Ft -D "/opt/postgresql-17.4/backups/$( date +%Y-%m-%d_%H%M%S-FULL )"
[postgres@bigdata backups]$ ll
total 0
drwx------ 2 postgres postgres 80 Jul  7 10:26 2025-07-07_102633-FULL
[postgres@bigdata backups]$ 

step 2.修改表数据

postgres=# UPDATE bigdata SET last_updated = now();
UPDATE 1
postgres=# select * from bigdata;
         last_updated          
-------------------------------
 2025-07-07 10:27:26.531463+08
(1 row)

postgres=# 

step 3.指定“-i”参数, 做增量备份,:

[postgres@bigdata backups]$ pg_basebackup -i /opt/postgresql-17.4/backups/2025-07-07_102633-FULL/backup_manifest -Ft -D "/opt/postgresql-17.4/backups/$( date +%Y-%m-%d_%H%M%S-INCREMENTAL )"
pg_basebackup: error: could not initiate base backup: ERROR:  incremental backups cannot be taken unless WAL summarization is enabled
pg_basebackup: removing data directory "/opt/postgresql-17.4/backups/2025-07-07_102932-INCREMENTAL"
[postgres@bigdata backups]$ 

根据错误提示,需要开启 wal summarization:

postgres=# ALTER system SET summarize_wal = ON;
ALTER SYSTEM
postgres=# SELECT pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

postgres=# 

再次进行增量备份

[postgres@bigdata backups]$ pg_basebackup -i /opt/postgresql-17.4/backups/2025-07-07_102633-FULL/backup_manifest -Ft -D "/opt/postgresql-17.4/backups/$( date +%Y-%m-%d_%H%M%S-INCREMENTAL )"
[postgres@bigdata backups]$ ll
total 0
drwx------ 2 postgres postgres 80 Jul  7 10:26 2025-07-07_102633-FULL
drwx------ 2 postgres postgres 80 Jul  7 10:30 2025-07-07_103047-INCREMENTAL
[postgres@bigdata backups]$ 

5.wal_summarization

默认为off,开启后会启动一个walsummarizer进程,自动生成 wal summarize 信息;当然还需要wal_level>minimal才能开启。记录到一段WAL的内容中:文件大小的变化、哪些block发生变化、需要被更新或删除、lsn范围。

[postgres@bigdata backups]$ ps -ef|grep post
root        1341    1302  0 09:52 pts/0    00:00:00 su - postgres
postgres    1342    1341  0 09:52 pts/0    00:00:00 -bash
postgres    1369       1  0 09:52 ?        00:00:00 /opt/postgresql-17.4/bin/postgres -D /opt/postgresql-17.4/data
postgres    1370    1369  0 09:52 ?        00:00:00 postgres: checkpointer 
postgres    1371    1369  0 09:52 ?        00:00:00 postgres: background writer 
postgres    1374    1369  0 09:52 ?        00:00:00 postgres: walwriter 
postgres    1375    1369  0 09:52 ?        00:00:00 postgres: autovacuum launcher 
postgres    1376    1369  0 09:52 ?        00:00:00 postgres: logical replication launcher 
root        1444    1405  0 09:58 pts/1    00:00:00 su - postgres
postgres    1445    1444  0 09:58 pts/1    00:00:00 -bash
postgres    1857    1342  0 10:27 pts/0    00:00:00 psql
postgres    1858    1369  0 10:27 ?        00:00:00 postgres: postgres postgres [local] idle
postgres    1870    1369  0 10:30 ?        00:00:00 postgres: walsummarizer 
postgres    1891    1445  0 10:31 pts/1    00:00:00 ps -ef
postgres    1892    1445  0 10:31 pts/1    00:00:00 grep --color=auto post
[postgres@bigdata backups]$ 

每个summary文件包含的信息:
1)某一个TLI上的一个LSN范围
2)每个relation,包括:
a “limit block” which is 0(文件被创建或销毁) if a relation is created or destroyed withina certain range of WAL records
or otherwise the shortest length(文件缩小至某个值) to which the relation was truncated during that range of WAL records
or otherwise InvalidBlockNumber(无效块号).
In addition, it stores a list of blocks which have been modified during that range of WAL records, (被修改过的blocks id). but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. (不记录被truncate并且后面没有被修改过的blocks id)

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=174c480508ac25568561443e6d4a82d5c1103487

Wal summarizer就是哪个LSN范围内的变动?2.1节进行讲述。

6.增量备份

全量备份

[postgres@bigdata 2025-07-07_102633-FULL]$ ll -lah
total 39M
drwx------ 2 postgres postgres   80 Jul  7 10:26 .
drwx------ 4 postgres postgres   73 Jul  7 10:30 ..
-rw------- 1 postgres postgres 1.0K Jul  7 10:26 16388.tar
-rw------- 1 postgres postgres 135K Jul  7 10:26 backup_manifest
-rw------- 1 postgres postgres  23M Jul  7 10:26 base.tar
-rw------- 1 postgres postgres  17M Jul  7 10:26 pg_wal.tar
[postgres@bigdata 2025-07-07_102633-FULL]$ 

增量备份

[postgres@bigdata 2025-07-07_103047-INCREMENTAL]$ ll -lah
total 20M
drwx------ 2 postgres postgres   80 Jul  7 10:30 .
drwx------ 4 postgres postgres   73 Jul  7 10:30 ..
-rw------- 1 postgres postgres 1.0K Jul  7 10:30 16388.tar
-rw------- 1 postgres postgres 141K Jul  7 10:30 backup_manifest
-rw------- 1 postgres postgres 3.7M Jul  7 10:30 base.tar
-rw------- 1 postgres postgres  17M Jul  7 10:30 pg_wal.tar
[postgres@bigdata 2025-07-07_103047-INCREMENTAL]$ 

增量备份的base.tar只有3.7MB,而全量备份有23MB。

增量备份和全量备份中的backup_manifest中文件个数一样:

[postgres@bigdata 2025-07-07_102633-FULL]$ jq .Files[13] /opt/postgresql-17.4/backups/2025-07-07_102633-FULL/backup_manifest 
{
  "Path": "global/4177",
  "Size": 0,
  "Last-Modified": "2025-04-29 03:20:46 GMT",
  "Checksum-Algorithm": "CRC32C",
  "Checksum": "00000000"
}
[postgres@bigdata 2025-07-07_102633-FULL]$ jq .Files[13] /opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL/backup_manifest 
{
  "Path": "global/4177",
  "Size": 0,
  "Last-Modified": "2025-04-29 03:20:46 GMT",
  "Checksum-Algorithm": "CRC32C",
  "Checksum": "00000000"
}
[postgres@bigdata 2025-07-07_102633-FULL]$ 

增量备份有2中类型文件:

[postgres@bigdata 2025-07-07_102633-FULL]$ ll
total 39732
-rw------- 1 postgres postgres     1024 Jul  7 10:26 16388.tar
-rw------- 1 postgres postgres   137652 Jul  7 10:26 backup_manifest
-rw------- 1 postgres postgres 23758336 Jul  7 10:26 base.tar
-rw------- 1 postgres postgres 16778752 Jul  7 10:26 pg_wal.tar
[postgres@bigdata 2025-07-07_102633-FULL]$ 

不以INCREMENTAL开头的文件是普通文件,不是增量的。否则需要拉取一些更早的备份。

7.增量备份的合并

解压备份文件

`全量备份文件`
tar -xvf /opt/postgresql-17.4/backups/2025-07-07_102633-FULL/base.tar 
tar -xvf /opt/postgresql-17.4/backups/2025-07-07_102633-FULL/16388.tar 
tar -xvf /opt/postgresql-17.4/backups/2025-07-07_102633-FULL/pg_wal.tar 

`增量备份文件`
tar -xvf /opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL/base.tar 
tar -xvf /opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL/16388.tar 
tar -xvf /opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL/pg_wal.tar 

pg_combinebackup 将一个全量备份,与一个或多个增量备份合并为一个全新的全量备份:

[postgres@bigdata backups]$ pg_combinebackup -o /opt/postgresql-17.4/backups/combined /opt/postgresql-17.4/backups/2025-07-07_102633-FULL/ /opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL/
pg_combinebackup: warning: manifest file "/opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL//backup_manifest" contains no entry for file "16388.tar"
pg_combinebackup: warning: manifest file "/opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL//backup_manifest" contains no entry for file "base.tar"
pg_combinebackup: warning: manifest file "/opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL//backup_manifest" contains no entry for file "pg_wal.tar"
pg_combinebackup: warning: manifest file "/opt/postgresql-17.4/backups/2025-07-07_103047-INCREMENTAL//backup_manifest" contains no entry for file "00000001000000000000001C"
[postgres@bigdata backups]$ 
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论