PG菜鸟入门学习中,欢迎各位大佬留言技术指导。
pg_basebackup 介绍
pg_basebackup 是一个 PostgreSQL 内置工具,用于创建 PostgreSQL 主服务器的二进制备份,包括主服务器上的所有数据和 WAL 日志。它可以在从服务器上执行,以创建一个完整的 PostgreSQL 数据库副本,也可以在主服务器上执行,以创建一个文件系统级别的备份。
pg_basebackup 常用参数
-D, --pgdata=DIRECTORY 存放备份文件目录
-F, --format=p|t 输出格式(plain(默认),tar)
# 如果选t格式,则将数据目录打成tar包
-rw------- 1 postgres postgres 1.0K Mar 15 17:36 16494.tar
-rw------- 1 postgres postgres 226K Mar 15 17:36 backup_manifest
-rw------- 1 postgres postgres 184M Mar 15 17:36 base.tar
-rw------- 1 postgres postgres 17M Mar 15 17:36 pg_wal.tar
-r, --max-rate=RATE 传输数据目录的最大传输速率(单位为kB/s,或以“k”或“M”为后缀)
#可用于跨机器备份,限制备份速度,以控制备份对网络带宽的影响;或控制备份时导出落盘的速度,以防止出现IO打满的情况
$ pg_basebackup -D /opt/pgsql-152/backup -Fp -Xs -l 'backup_plain' -P -v -r 10M
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/27000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_3768708"
10820/187431 kB (5%), 1/2 tablespaces (/opt/pgsql-152/backup/base/4/2608 )
21323/187431 kB (11%), 1/2 tablespaces (/opt/pgsql-152/backup/base/5/2675 )
42885/187431 kB (22%), 1/2 tablespaces (/opt/pgsql-152/backup/base/5/16509 )
187445/187445 kB (100%), 2/2 tablespaces
...
-P, --progress 显示进度信息
-v, --verbose 输出详细消息
-R, --write-recovery-conf 为复制写配置
# 会在备份目录自动创建名为 standby.signal 的空文件,并自动在文件 postgresql.auto.conf 中加入 primary_conninfo 信息
-rw------- 1 postgres postgres 567 Mar 15 17:55 postgresql.auto.conf
-rw------- 1 postgres postgres 0 Mar 15 17:55 standby.signal
$ grep primary_conninfo postgresql.auto.conf
primary_conninfo = 'user=postgres passfile=''/home/postgres/.pgpass'' channel_binding=disable port=5432 sslmode=disable sslcompression=0 sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any'
额外的,
关于 primary_conninfo
参数,补充说明其包含字段的解释:
- passfile
密码文件,用于存放用户登陆信息,格式为 hostname:port:database:username:password
, 在 Unix 系统,该文件权限需为 0600
,否则会被忽略。
如果文件权限大于 0600
,当连接数据库时,会触发 WARN 警告。且该文件存储的是明文密码,从安全角度来讲,并不建议使用。
$ cat .pgpass
*:5432:*:sbtest:sbtest
$ psql -Usbtest -h192.168.8.11
WARNING: password file "/home/postgres/.pgpass" has group or world access; permissions should be u=rw (0600) or less
Password for user sbtest:
- channel_binding
可以让客户端指定通道绑定作为 SCRAM 的组成部分, 并且,使用一个含密码保护的 TLS 证书的客户端现在可以通过 sslpassword 参数来指定密码。PostgreSQL现在也支持 DER 算法编码的证书。这是 PostgreSQL 13 引入的参数。
- gssencmode
此选项决定是否与服务器协商安全的 GSS TCP/IP 连接,或者以何种优先级进行协商。 有三种模式: disable/prefer (default)/require
- target_session_attrs
从 PostgreSQL 10 开始可以支持在 libpq 中连接多个实例的设置,在 PG 10 中,支持两个选项:
- any (default) 可连任意数据库实例
- read-write 默认情况下会话只接受读写事务 (主机不支持 hot standby 模式, default_transaction_read_only = off)
但是从 PostgreSQL 14 开始,支持 6 个选项(新增4个选项):
- read-only 默认情况下会话不能接受读写事务
- primary 服务器不能处于 hot standby 模式
- standby 服务器必须处于 hot standby 模式
- prefer-standby 首先尝试寻找备机,但如果列出的主机都不是备机,就在 any 模式下再次尝试
使用方式:
host=host1 target_session_attrs=any
postgresql://host1:port2,host2:port2/?target_session_attrs=read-write
更多讨论:
https://www.postgresql.org/message-id/CAF3+xM+8-ztOkaV9gHiJ3wfgENTq97QcjXQt+rbFQ6F7oNzt9A@mail.gmail.com
Recently I put a proposal to support ‘prefer-read’ parameter in
target_session_attrs in libpq. Now I updated the patch with adding content
in the sgml and regression test case.
pg_basebackup 在 PostgreSQL 15 中的更新
LZ4 & zstd
-
[15.0] More options for compression, including support for Zstandard (zstd) compression. This includes support for performing compression on the server side during pg_basebackup.
-
Add support for LZ4 and Zstandard compression of server-side base backups (Jeevan Ladhe, Robert Haas)
-
Allow pg_basebackup to do server-side gzip, LZ4, and Zstandard compression and client-side LZ4 and Zstandard compression of base backup files (Dipesh Pandit, Jeevan Ladhe)
Client-side gzip compression was already supported.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=0ad8032910d5eb8efd32867c45b6a25c85e60f50
-
Allow pg_basebackup to compress on the server side and decompress on the client side before storage (Dipesh Pandit)
This is accomplished by specifying compression on the server side and plain output format.
-
Allow pg_basebackup’s
--compress
option to control the compression location (server or client), compression method, and compression options (Michael Paquier, Robert Haas)
批注:
1, 增加了两种压缩模式,支持 LZ4 和 zstd 压缩模式,编译时需要添加参数 --with-lz4 --with-zstd
以支持该特性。
2, 支持 server 端压缩(gzip/LZ4/zstd),或者 client 端进行压缩(gzip/LZ4/zstd)。
3, 强化 -Z, --compress
参数,支持多种详细描述,比如 -Z server-zstd:2
,在服务器端使用 zstd 算法压缩。
target
- Add new pg_basebackup option
--target
to control the base backup location (Robert Haas)
The new options are server to write the backup locally and blackhole to discard the backup (for testing).
Instructs the server where to place the base backup. The default target is client, which specifies that the backup should be sent to the machine where pg_basebackup is running. If the target is instead set to server:/some/path, the backup will be stored on the machine where the server is running in the /some/path directory. Storing a backup on the server requires superuser privileges or having privileges of the pg_write_server_files role. If the target is set to blackhole, the contents are discarded and not stored anywhere. This should only be used for testing purposes, as you will not end up with an actual backup.
Since WAL streaming is implemented by pg_basebackup rather than by the server, this option cannot be used together with -Xstream. Since that is the default, when this option is specified, you must also specify either -Xfetch or -Xnone.
批注:
1, 在服务器上存储备份需要超级用户权限或具有pg_write_server_files角色的权限。 如果目标设置为黑洞,则内容将被丢弃,不存储在任何地方。 这应该只用于测试目的,因为您最终不会得到实际的备份。
$ pg_basebackup --target=blackhole --wal-method=none
NOTICE: all required WAL segments have been archived
运行此命令后,并不会得到实际的备份集。
2, 如果将目标设定为 server:/some/path
,其效果与 -D backup -Ft
类似。
$ pg_basebackup --target=server:/opt/pgsql-152/backup --wal-method=none
$ pg_basebackup -D /opt/pgsql-152/backup -Ft --wal-method=none
3, 推测:或可支持备份至 S3 路径。
LZ4
LZ4是一种无损数据压缩算法,由Yann Collet于2011年开发。它可以实现非常快速的压缩和解压缩速度,同时提供比LZO、Snappy和Zlib等其他压缩算法更高的压缩比。
LZ4算法的压缩和解压缩过程基于哈希表和字典序列的概念。它将输入数据分割成若干个片段,每个片段分别使用一个字典来进行压缩。在压缩过程中,LZ4算法会将输入数据分成长度相等的子块,并利用这些子块之间的重叠来实现更高的压缩比。在解压缩过程中,LZ4算法使用相同的字典来逆向生成原始数据。
LZ4算法的特点是快速、简单和高效。在多核CPU上可以并行化,同时支持多种压缩级别和压缩块大小的设置。LZ4算法常被用于高速数据传输和实时数据处理领域,例如网络传输、分布式存储、日志处理、大数据分析等场景。在PostgreSQL数据库中,LZ4算法也是一种常用的数据压缩方式,可以用于减少数据库的存储空间和提高数据传输速度。
在 PostgreSQL 15 中,如果采用编译安装,需要添加选项 --with-lz4
。
LZ4 源码仓库:https://github.com/lz4/lz4
zstd
Zstandard,或简称zstd,是一种快速无损压缩算法,针对实时压缩场景在zlib级别和更好的压缩比。 它有一个非常快的熵阶段,由Huff0和FSE库提供。
Zstandard的格式是稳定的,并记录在RFC8878中。 多个独立的实现已经可用。 这个存储库代表了参考实现,作为一个开源的双BSD和GPLv2许可的C库提供,以及一个生成和解码.zst、.gz、.xz和.lz4文件的命令行实用程序。
在 PostgreSQL 15 中,如果采用编译安装,需要添加选项 --with-zstd
。
zstd 源码仓库:https://github.com/facebook/zstd
备份测试
该测试主要是针对 PG 15 中 pg_basebackup 的新参数进行功能性测试验证,所以数据量较小。
-
数据源:
https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads -
创建表:
-- Create table:
CREATE TABLE land_registry_price_paid_uk(
transaction uuid,
price numeric,
transfer_date date,
postcode text,
property_type char(1),
newly_built boolean,
duration char(1),
paon text,
saon text,
street text,
locality text,
city text,
district text,
county text,
ppd_category_type char(1),
record_status char(1));
- 导入测试数据:
-- Copy CSV data, with appropriate munging:
COPY land_registry_price_paid_uk FROM '/tmp/pp-2022.csv' with (format csv, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district));
- 导入结果:
(postgres@[local]) [postgres] 22:48:25# COPY land_registry_price_paid_uk FROM '/tmp/pp-2022.csv' with (format csv, encoding 'win1252', header false, null '', quote '"', force_null (postcode, saon, paon, street, locality, city, district));
COPY 766786
Time: 4067.681 ms (00:04.068)
-Z none
$ time pg_basebackup -D b1 -Xs -Ft -Z none
2023-03-15 23:05:18.929 CST [84988] LOG: checkpoint starting: force wait
2023-03-15 23:05:19.046 CST [84988] LOG: checkpoint complete: wrote 4 buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.106 s, sync=0.005 s, total=0.117 s; sync files=3, longest=0.002 s, average=0.002 s; distance=405 kB, estimate=405 kB
real 0m1.120s
user 0m0.037s
sys 0m0.531s
-Z server-gzip
$ time pg_basebackup -D b2 -Xs -Ft -Z server-gzip
2023-03-15 23:06:08.393 CST [84988] LOG: checkpoint starting: force wait
2023-03-15 23:06:08.399 CST [84988] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.006 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB
real 0m6.221s
user 0m0.023s
sys 0m0.249s
-Z server-lz4
$ time pg_basebackup -D b3 -Xs -Ft -Z server-lz4
2023-03-15 23:08:07.705 CST [84988] LOG: checkpoint starting: force wait
2023-03-15 23:08:07.711 CST [84988] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.006 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB
real 0m0.850s
user 0m0.019s
sys 0m0.178s
-Z server-zstd
$ time pg_basebackup -D b4 -Xs -Ft -Z server-zstd
2023-03-15 23:08:59.603 CST [84988] LOG: checkpoint starting: force wait
2023-03-15 23:08:59.608 CST [84988] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.006 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB
real 0m1.202s
user 0m0.016s
sys 0m0.148s
-Z server-zstd:2
$ time pg_basebackup -D b5 -Xs -Ft -Z server-zstd:2
2023-03-15 23:09:36.944 CST [84988] LOG: checkpoint starting: force wait
2023-03-15 23:09:36.949 CST [84988] LOG: checkpoint complete: wrote 0 buffers (0.0%); 0 WAL file(s) added, 0 removed, 2 recycled; write=0.001 s, sync=0.001 s, total=0.006 s; sync files=0, longest=0.000 s, average=0.000 s; distance=32768 kB, estimate=32768 kB
real 0m0.958s
user 0m0.015s
sys 0m0.139s
汇总测试结果,整理如下表:
compress | cost time | backup size |
---|---|---|
-Z none |
1.120s | 145M |
-Z server-gzip |
6.221s | 51M |
-Z server-lz4 |
0.850s | 65M |
-Z server-zstd |
1.202s | 47M |
-Z server-zstd:2 |
0.958s | 49M |
可以看出不同的压缩参数会影响备份的大小和备份时间。使用不同的压缩算法和压缩级别需要在时间和空间之间做出权衡。
总结
pg_basebackup 主要用于热备份,以及快速创建可用备库,压缩算法的支持大大减轻了磁盘空间的需求,网络传输压力,以及磁盘IO负载。
此外,TOAST 压缩方式 (default_toast_compression) 已支持 LZ4 压缩算法,WAL 日志压缩方法(wal_compression)已支持 LZ4/zstd 压缩算法。
(postgres@[local]) [postgres] 23:00:32# \dconfig *compression
List of configuration parameters
+---------------------------+-------+
| Parameter | Value |
+---------------------------+-------+
| default_toast_compression | lz4 |
| wal_compression | zstd |
+---------------------------+-------+
(2 rows)