PostgreSQL日志归档
WAL日志在数据库中非常重要,数据库崩溃时,可以依靠WAL进行恢复。在高可用中,备节点可以通过流复制得到WAL日志进行回放,同步到与主节点相同的状态。除此之外,还可以利用WAL进行进行备份,并通过PITR恢复到指定状态。那么如何对WAL进行备份呢,PostgreSQL可通过日志归档功能对WAL进行归档备份。这里我们总结一下WAL日志归档方法。关于源码分析,可参考文章PostgreSQL源码分析——日志归档。另外需要注意15版本与之前版本的区别。
归档方法
归档涉及到如下GUC参数。archive_mode = on开启归档。设置archive_command,如果是PG15版本之后的版本,则增加了archive_library功能,可参考文章New WAL Archive Module/Library in PostgreSQL 15。
# - Archiving -
archive_mode = on # enables archiving; off, on, or always (change requires restart)
#archive_library = '' # library to use to archive a logfile segment
# (empty string indicates archive_command should
# be used)
archive_command = 'test ! -f /home/postgres/pgdata/archive/%f && cp %p /home/postgres/pgdata/archive/%f' # command to use to archive a logfile segment
# placeholders: %p = path of file to archive
# %f = file name only
# e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f'
archive_timeout = 1800 # force a logfile segment switch after this
# number of seconds; 0 disables
归档到本地目录示例:archive_command = 'test ! -f /home/postgres/pgdata/archive/%f && cp %p /home/postgres/pgdata/archive/%f',归档一个日志段文件到/home/postgres/pgdata/archive目录下。
也可以归档到远程主机:archive_command = 'sshpass -p asdf scp %p postgres@192.168.109.136:~/pgdata/archive/%f'。通过scp,或者rsync等工具或者其他工具备份到其他主机中。
归档恢复时,设置restore_command。从本地目录恢复的示例:restore_command = 'cp /home/postgres/pgdata/archive/%f %p',从/home/postgres/pgdata/archive目录拷贝日志到pg_wal目录下。
# - Archive Recovery -
# These are only used in recovery mode.
restore_command = 'cp /home/postgres/pgdata/archive/%f %p' # command to use to restore an archived logfile segment
# placeholders: %p = path of file to restore
# %f = file name only
# e.g. 'cp /mnt/server/archivedir/%f %p'
#archive_cleanup_command = '' # command to execute at every restartpoint 清理归档日志文件,可以用pg_archivecleanup
#recovery_end_command = '' # command to execute at completion of recovery
也可以从远程主机的归档日志进行恢复,restore_command = 'sshpass -p asdf scp postgres@192.168.109.136:/home/postgres/pgdata/archive/%f %p'。将远程主机的归档日志拷贝到本地pg_wal目录下。
备机启动进行归档恢复时打印的相关日志如下:
2023-12-25 16:45:38.121 CST [postmaster] LOG: starting PostgreSQL 15.5 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
2023-12-25 16:45:38.122 CST [postmaster] LOG: listening on IPv4 address "0.0.0.0", port 5431
2023-12-25 16:45:38.122 CST [postmaster] LOG: listening on IPv6 address "::", port 5431
2023-12-25 16:45:38.129 CST [postmaster] LOG: listening on Unix socket "/tmp/.s.PGSQL.5431"
2023-12-25 16:45:38.135 CST [checkpointer] DEBUG: checkpointer updated shared memory configuration values
2023-12-25 16:45:38.136 CST [startup] LOG: database system was shut down in recovery at 2023-12-25 16:43:42 CST
scp: /home/postgres/pgdata/archive/00000002.history: No such file or directory
2023-12-25 16:45:38.470 CST [startup] DEBUG: could not restore file "00000002.history" from archive: child process exited with exit code 1
2023-12-25 16:45:38.470 CST [startup] LOG: entering standby mode
2023-12-25 16:45:38.965 CST [startup] LOG: restored log file "00000001000000000000001B" from archive 从归档中获取WAL日志段文件
2023-12-25 16:45:38.995 CST [startup] DEBUG: got WAL segment from archive
2023-12-25 16:45:38.995 CST [startup] DEBUG: checkpoint record is at 0/1B000060
2023-12-25 16:45:38.995 CST [startup] DEBUG: redo record is at 0/1B000060; shutdown true
2023-12-25 16:45:38.995 CST [startup] DEBUG: next transaction ID: 242689; next OID: 24668
2023-12-25 16:45:38.995 CST [startup] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2023-12-25 16:45:38.995 CST [startup] DEBUG: oldest unfrozen transaction ID: 716, in database 1
2023-12-25 16:45:38.995 CST [startup] DEBUG: oldest MultiXactId: 1, in database 1
2023-12-25 16:45:38.995 CST [startup] DEBUG: commit timestamp Xid oldest/newest: 0/0
2023-12-25 16:45:38.995 CST [startup] DEBUG: transaction ID wrap limit is 2147484363, limited by database with OID 1
2023-12-25 16:45:38.996 CST [startup] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2023-12-25 16:45:38.996 CST [startup] DEBUG: starting up replication slots
2023-12-25 16:45:38.996 CST [startup] DEBUG: xmin required by slots: data 0, catalog 0
2023-12-25 16:45:38.996 CST [startup] DEBUG: starting up replication origin progress state
2023-12-25 16:45:38.996 CST [startup] DEBUG: reading stats file "pg_stat/pgstat.stat"
2023-12-25 16:45:38.997 CST [startup] DEBUG: removing permanent stats file "pg_stat/pgstat.stat"
2023-12-25 16:45:38.999 CST [startup] DEBUG: resetting unlogged relations: cleanup 1 init 0
2023-12-25 16:45:38.999 CST [startup] DEBUG: initializing for hot standby
2023-12-25 16:45:38.999 CST [startup] DEBUG: recovery snapshots are now enabled
2023-12-25 16:45:38.999 CST [startup] LOG: redo starts at 0/1B0000D8
2023-12-25 16:45:39.481 CST [startup] LOG: restored log file "00000001000000000000001C" from archive
2023-12-25 16:45:39.520 CST [startup] DEBUG: got WAL segment from archive
2023-12-25 16:45:40.026 CST [startup] LOG: restored log file "00000001000000000000001D" from archive
归档到S3
对象存储成本较低,可以将WAL日志归档到S3中存储。方法与前面的过程相同,需要写一个工具,源码可参考https://github.com/chirpyli/walarch 。实现日志段文件上传到S3以及从S3中下载。
#归档命令
archive_command = 'walarch put --access_key *** --secret_key *** --bucket jfs1218 --endpoint eos-wuxi-3.cmecloud.cn --region wuxi3 --file %p'
#恢复命令
restore_command = 'walarch get --access_key *** --secret_key *** --bucket jfs1218 --endpoint eos-wuxi-3.cmecloud.cn --file %f --path %p --region wuxi3'




