暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL源码分析——基础备份

原创 chirpyli 2023-09-18
694

进行基础备份有2中方式,可使用pg_basebackup工具或其他备份工具进行备份,另一种是使用底层命令进行基础备份。pg_basebackup等工具其实是封装了底层命令,所以,为了更好的理解基础备份的过程,这里我们使用底层命令进行备份。并分析其中的源码实现。

基础备份过程

备份的方式有多种,可以进行SQL Dump,也可以停止数据库实例,对实例物理文件进行复制拷贝,有其各自的优缺点与适用场景。这里的基础备份,其中一个最大的优势就是可以不停机,不停业务进行物理备份,在备份过程中,不需要获取表上的锁,正常业务受备份的影响较小。另外非常强大的一个功能就是PIRT,后面再去分析,这里我们分析一下基础备份的全过程。

基础备份的过程如下:

  1. 连接到数据库
  2. 执行select pg_start_backup('lable')命令。(会强制发生一次checkpoint,并将检查点记录到backup_label文件中)
  3. 执行备份,把数据目录进行复制(包含backup_label)
  4. 执行select pg_stop_backup命令,(删除backup_label文件,并在WAL日志中写入一条XLOG_BACKUP_END的记录,当备节点回放到该记录时,就知道备份结束了,数据达到了一致点,可以对外提供服务了)
  5. 备份过程中产生的WAL日志进行复制

操作执行过程分析

在分析源码之前,我们先执行基础备份操作过程,进行基础备份,帮助我们理解其中的备份过程。

  1. initdb,创建数据库
    查看pg_control
postgres@slpc:~/pgsql$ pg_controldata -D pgdata/ pg_control version number: 1300 Catalog version number: 202107181 Database system identifier: 7279971345653503170 Database cluster state: shut down pg_control last modified: 2023年09月18日 星期一 09时26分56秒 Latest checkpoint location: 0/167E598 Latest checkpoint's REDO location: 0/167E598 Latest checkpoint's REDO WAL file: 000000010000000000000001 Latest checkpoint's TimeLineID: 1 Latest checkpoint's PrevTimeLineID: 1 Latest checkpoint's full_page_writes: on

查看WAL日志:

postgres@slpc:~/pgsql$ pg_waldump -p pgdata/pg_wal/ 000000010000000000000001
// 省略...
rmgr: Transaction len (rec/tot):     66/    66, tx:        732, lsn: 0/0167E550, prev 0/0167E4B0, desc: COMMIT 2023-09-18 09:26:56.640405 CST; inval msgs: snapshot 2396
rmgr: XLOG        len (rec/tot):    114/   114, tx:          0, lsn: 0/0167E598, prev 0/0167E550, desc: CHECKPOINT_SHUTDOWN redo 0/167E598; tli 1; prev tli 1; fpw true; xid 0:733; oid 13011; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown
  1. 启动数据库
  2. 连接数据库,建表,插入数据
  3. 执行pg_start_backup('bak1')函数
postgres@slpc:~/pgsql/pgdata/pg_wal$ psql -p 7432 psql (14.8) Type "help" for help. postgres=# create table t1(a int); CREATE TABLE postgres=# insert into t1 values(1); INSERT 0 1 postgres=# select pg_start_backup('bak1'); pg_start_backup ----------------- 0/2000028 (1 row)

首先是日志文件发生切换,切换后再执行checkpoint操作

postgres@slpc:~/pgsql/pgdata/pg_wal$ ls 000000010000000000000001 archive_status postgres@slpc:~/pgsql/pgdata/pg_wal$ ls -- 强制切换WAL段,回收WAL文件, 从000000010000000000000002开始,后的WAL文件都要拷贝到备份文件中,回收的WAL文件则不需要 000000010000000000000002 000000010000000000000003 archive_status 查看日志,观察运行过程, 执行过程中,会进行checkpoint操作: ```sql 2023-09-18 10:12:21.139 CST [417435] DEBUG: 00000: attempting to remove WAL segments older than log file 000000000000000000000001 2023-09-18 10:12:21.139 CST [417435] LOCATION: RemoveOldXlogFiles, xlog.c:4114 2023-09-18 10:12:21.141 CST [417435] DEBUG: 00000: recycled write-ahead log file "000000010000000000000001" 2023-09-18 10:12:21.141 CST [417435] LOCATION: RemoveXlogFile, xlog.c:4256 2023-09-18 10:12:21.141 CST [417435] DEBUG: 00000: SlruScanDirectory invoking callback on pg_subtrans/0000 2023-09-18 10:12:21.141 CST [417435] LOCATION: SlruScanDirectory, slru.c:1574 2023-09-18 10:12:21.141 CST [417435] LOG: 00000: checkpoint complete: wrote 31 buffers (0.2%); 0 WAL file(s) added, 0 removed, 1 recycled; write=2.846 s, sync=0.005 s, total=2.860 s; sync files=22, longest=0.004 s, average=0.001 s; distance=9734 kB, estimate=9734 kB 2023-09-18 10:12:21.141 CST [417435] LOCATION: LogCheckpointEnd, xlog.c:8925 2023-09-18 10:12:39.283 CST [417436] DEBUG: 00000: snapshot of 0+0 running transaction ids (lsn 0/2000148 oldest xid 735 latest complete 734 next xid 735)

观察wal日志:

postgres@slpc:~/pgsql$ pg_waldump -p pgdata/pg_wal/ 000000010000000000000002 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000028, prev 0/01696D18, desc: RUNNING_XACTS nextXid 735 latestCompletedXid 734 oldestRunningXid 735 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000060, prev 0/02000028, desc: RUNNING_XACTS nextXid 735 latestCompletedXid 734 oldestRunningXid 735 rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/02000098, prev 0/02000060, desc: CHECKPOINT_ONLINE redo 0/2000028; tli 1; prev tli 1; fpw true; xid 0:735; oid 24576; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 735; online rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000110, prev 0/02000098, desc: RUNNING_XACTS nextXid 735 latestCompletedXid 734 oldestRunningXid 735 rmgr: Heap len (rec/tot): 54/ 150, tx: 735, lsn: 0/02000148, prev 0/02000110, desc: INSERT off 2 flags 0x00, blkref #0: rel 1663/13010/16384 blk 0 FPW rmgr: Transaction len (rec/tot): 34/ 34, tx: 735, lsn: 0/020001E0, prev 0/02000148, desc: COMMIT 2023-09-18 10:23:57.688476 CST rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000208, prev 0/020001E0, desc: RUNNING_XACTS nextXid 736 latestCompletedXid 735 oldestRunningXid 736

观察pg_control

postgres@slpc:~/pgsql$ pg_controldata -D pgdata/ pg_control version number: 1300 Catalog version number: 202107181 Database system identifier: 7279971345653503170 Database cluster state: in production pg_control last modified: 2023年09月18日 星期一 10时12分21秒 Latest checkpoint location: 0/2000098 -- 最新检测点位置 Latest checkpoint's REDO location: 0/2000028 Latest checkpoint's REDO WAL file: 000000010000000000000002 -- REDO WAL文件,即checkpoint REDO location开始的文件 Latest checkpoint's TimeLineID: 1 Latest checkpoint's PrevTimeLineID: 1 Latest checkpoint's full_page_writes: on Latest checkpoint's NextXID: 0:735 Latest checkpoint's NextOID: 24576

生成backup_label文件(非常重要,后续从备份文件中进行恢复时,从这里记录的位置开始,而不是读取pg_control文件中的位置):

postgres@slpc:~/pgsql/pgdata$ cat backup_label START WAL LOCATION: 0/2000028 (file 000000010000000000000002) CHECKPOINT LOCATION: 0/2000098 BACKUP METHOD: pg_start_backup BACKUP FROM: primary START TIME: 2023-09-18 10:12:21 CST LABEL: bak1 START TIMELINE: 1
  1. 拷贝数据库实例到备份文件

  2. 执行pg_stop_backup(),结束基础备份

postgres=# select pg_stop_backup(); NOTICE: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup pg_stop_backup ---------------- 0/2000268 (1 row)

观察日志

2023-09-18 10:47:41.095 CST [447083] DEBUG:  00000: removing WAL backup history file "000000010000000000000002.00000028.backup"
2023-09-18 10:47:41.095 CST [447083] LOCATION:  CleanupBackupHistory, xlog.c:4375
2023-09-18 10:47:41.095 CST [447083] NOTICE:  00000: WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
2023-09-18 10:47:41.095 CST [447083] LOCATION:  do_pg_stop_backup, xlog.c:11912
2023-09-18 10:47:41.263 CST [417436] DEBUG:  00000: snapshot of 0+0 running transaction ids (lsn 0/3000060 oldest xid 736 latest complete 735 next xid 736)

查看wal日志:

postgres@slpc:~/pgsql$ pg_waldump -p pgdata/pg_wal/ 000000010000000000000002 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000028, prev 0/01696D18, desc: RUNNING_XACTS nextXid 735 latestCompletedXid 734 oldestRunningXid 735 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000060, prev 0/02000028, desc: RUNNING_XACTS nextXid 735 latestCompletedXid 734 oldestRunningXid 735 rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/02000098, prev 0/02000060, desc: CHECKPOINT_ONLINE redo 0/2000028; tli 1; prev tli 1; fpw true; xid 0:735; oid 24576; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 735; online rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000110, prev 0/02000098, desc: RUNNING_XACTS nextXid 735 latestCompletedXid 734 oldestRunningXid 735 rmgr: Heap len (rec/tot): 54/ 150, tx: 735, lsn: 0/02000148, prev 0/02000110, desc: INSERT off 2 flags 0x00, blkref #0: rel 1663/13010/16384 blk 0 FPW rmgr: Transaction len (rec/tot): 34/ 34, tx: 735, lsn: 0/020001E0, prev 0/02000148, desc: COMMIT 2023-09-18 10:23:57.688476 CST rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/02000208, prev 0/020001E0, desc: RUNNING_XACTS nextXid 736 latestCompletedXid 735 oldestRunningXid 736 rmgr: XLOG len (rec/tot): 34/ 34, tx: 0, lsn: 0/02000240, prev 0/02000208, desc: BACKUP_END 0/2000028 rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/02000268, prev 0/02000240, desc: SWITCH postgres@slpc:~/pgsql$ pg_waldump -p pgdata/pg_wal/ 000000010000000000000003 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/03000028, prev 0/02000268, desc: RUNNING_XACTS nextXid 736 latestCompletedXid 735 oldestRunningXid 736

删除了源数据库实例中的backup_label文件,因为这个是给备库用的,已经被拷贝到了备份文件中,等待恢复使用。

  1. 备份文件进行恢复
    启动备份的数据库实例,读backup_label文件,
    观察日志:
2023-09-18 11:09:59.964 CST [1237713] LOG:  00000: starting PostgreSQL 14.8 on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0, 64-bit
2023-09-18 11:09:59.965 CST [1237713] LOG:  00000: listening on IPv4 address "0.0.0.0", port 7431
2023-09-18 11:09:59.965 CST [1237713] LOG:  00000: listening on IPv6 address "::", port 7431
2023-09-18 11:09:59.970 CST [1237713] LOG:  00000: listening on Unix socket "/tmp/.s.PGSQL.7431"
2023-09-18 11:09:59.976 CST [1237717] LOG:  00000: database system was interrupted; last known up at 2023-09-18 10:12:21 CST
2023-09-18 11:09:59.976 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6585
2023-09-18 11:09:59.976 CST [1237717] DEBUG:  00000: removing all temporary WAL segments
2023-09-18 11:09:59.976 CST [1237717] LOCATION:  RemoveTempXlogFiles, xlog.c:4070
2023-09-18 11:09:59.993 CST [1237717] DEBUG:  00000: backup time 2023-09-18 10:12:21 CST in file "backup_label"
2023-09-18 11:09:59.993 CST [1237717] LOCATION:  read_backup_label, xlog.c:12143
2023-09-18 11:09:59.993 CST [1237717] DEBUG:  00000: backup label bak1 in file "backup_label"
2023-09-18 11:09:59.993 CST [1237717] LOCATION:  read_backup_label, xlog.c:12148
2023-09-18 11:09:59.993 CST [1237717] DEBUG:  00000: backup timeline 1 in file "backup_label"
2023-09-18 11:09:59.993 CST [1237717] LOCATION:  read_backup_label, xlog.c:12165
2023-09-18 11:09:59.993 CST [1237717] DEBUG:  00000: checkpoint record is at 0/2000098
2023-09-18 11:09:59.993 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6729
2023-09-18 11:09:59.993 CST [1237717] DEBUG:  00000: redo record is at 0/2000028; shutdown false
2023-09-18 11:09:59.993 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6936
2023-09-18 11:09:59.993 CST [1237717] DEBUG:  00000: next transaction ID: 735; next OID: 24576
2023-09-18 11:09:59.993 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6940
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: next MultiXactId: 1; next MultiXactOffset: 0
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6944
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: oldest unfrozen transaction ID: 726, in database 1
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6947
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: oldest MultiXactId: 1, in database 1
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6950
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: commit timestamp Xid oldest/newest: 0/0
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  StartupXLOG, xlog.c:6953
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: transaction ID wrap limit is 2147484373, limited by database with OID 1
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  SetTransactionIdLimit, varsup.c:427
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  SetMultiXactIdLimit, multixact.c:2283
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: starting up replication slots
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  StartupReplicationSlots, slot.c:1394
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: xmin required by slots: data 0, catalog 0
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  ProcArraySetReplicationSlotXmin, procarray.c:3984
2023-09-18 11:09:59.994 CST [1237717] DEBUG:  00000: starting up replication origin progress state
2023-09-18 11:09:59.994 CST [1237717] LOCATION:  StartupReplicationOrigin, origin.c:706
2023-09-18 11:09:59.996 CST [1237717] DEBUG:  00000: resetting unlogged relations: cleanup 1 init 0
2023-09-18 11:09:59.996 CST [1237717] LOCATION:  ResetUnloggedRelations, reinit.c:55
2023-09-18 11:10:00.008 CST [1237717] LOG:  00000: redo starts at 0/2000028
2023-09-18 11:10:00.008 CST [1237717] LOCATION:  StartupXLOG, xlog.c:7387
2023-09-18 11:10:00.008 CST [1237717] DEBUG:  00000: end of backup reached
2023-09-18 11:10:00.008 CST [1237717] CONTEXT:  WAL redo at 0/2000240 for XLOG/BACKUP_END: 0/2000028
2023-09-18 11:10:00.008 CST [1237717] LOCATION:  xlog_redo, xlog.c:10595
2023-09-18 11:10:00.010 CST [1237717] LOG:  00000: consistent recovery state reached at 0/2000268    到达一致性点,也就是pg_stop_backup的位置
2023-09-18 11:10:00.010 CST [1237717] LOCATION:  CheckRecoveryConsistency, xlog.c:8331      

观察WAL文件

postgres@slpc:~/pgsql$ pg_waldump -p pgbak/pg_wal/ 000000010000000000000003 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/03000028, prev 0/02000268, desc: RUNNING_XACTS nextXid 736 latestCompletedXid 735 oldestRunningXid 736 rmgr: XLOG len (rec/tot): 114/ 114, tx: 0, lsn: 0/03000060, prev 0/03000028, desc: CHECKPOINT_SHUTDOWN redo 0/3000060; tli 1; prev tli 1; fpw true; xid 0:736; oid 24576; multi 1; offset 0; oldest xid 726 in DB 1; oldest multi 1 in DB 1; oldest/newest commit timestamp xid: 0/0; oldest running xid 0; shutdown rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/030000D8, prev 0/03000060, desc: RUNNING_XACTS nextXid 736 latestCompletedXid 735 oldestRunningXid 736
  1. 检测备份的数据库实例是否启动成功
postgres@slpc:~/pgsql/pgdata$ psql -p 7431 psql (14.8) Type "help" for help. postgres=# \d List of relations Schema | Name | Type | Owner --------+------+-------+---------- public | t1 | table | postgres (1 row) postgres=# select * from t1; a --- 1 2 (2 rows)

下面我们进行源码分析

pg_start_backup

pg_start_backup开始为制作基础备份进行准备工作,恢复过程从重做点开始,因此pg_start_backup必须执行检查点,以便在制作基础备份开始的时刻显式创建一个重做点。这个检查点的位置需要保存在非pg_control文件中,因为备份过程中,业务并没有停,期间可能会执行多次常规检查点。

pg_start_backup ( label text [, fast boolean [, exclusive boolean ]] ) → pg_lsn

准备开始在线备份。唯一需要的参数是用于备份的任意用户定义的标签。(通常,备份转储文件将存储在这个名称下。) 如果可选的第二个参数被指定为true,它将指定尽可能快地执行pg_start_backup。这将强制产生一个即时检查点,这将导致I/O操作突增,从而降低并发执行的查询的速度。第三个可选参数指定是执行排他或非排他备份(默认为排他备份)。在排他模式下使用时,该函数将写一个备份标签文件(backup_label),如果pg_tblspc/目录中有任何链接, 则将一个表空间映射文件(tablespace_map)写入数据库集群的数据目录,然后执行检查点,然后返回备份的开始写-提前日志位置。 (用户可以忽略这个结果值,但在有用的情况下会提供它。) 在非排他模式下使用时,这些文件的内容将由pg_stop_backup函数返回,并且应该由用户复制到备份区域。

源码分析,调用pg_start_backup,调用的中间过程略,直接看函数实现。

pg_start_backup --> do_pg_start_backup

pg_start_backup函数实现如下:

/* * pg_start_backup: set up for taking an on-line backup dump * * Essentially what this does is to create a backup label file in $PGDATA, * where it will be archived as part of the backup dump. The label file * contains the user-supplied label string (typically this would be used * to tell where the backup dump will be stored) and the starting time and * starting WAL location for the dump. */ Datum pg_start_backup(PG_FUNCTION_ARGS) { text *backupid = PG_GETARG_TEXT_PP(0); // 参数1:用来唯一标识这次备份操作的任意字符串 // 默认情况下,pg_start_backup可能需要较长的时间完成。 这是因为它会执行一个检查点,并且该检查点所需要的 I/O 将会分散到一段 显著的时间上,默认情况下是你的检查点间隔(见配置参数 checkpoint_completion_target)的一半。这通常 是你所想要的,因为它可以最小化对查询处理的影响。如果你想要尽可能快地 开始备份,请把第二个参数改成true,这将会发出一个立即的检查点并且使用尽可能多的I/O。 bool fast = PG_GETARG_BOOL(1); bool exclusive = PG_GETARG_BOOL(2); // 开始一次非排他基础备份 char *backupidstr; XLogRecPtr startpoint; SessionBackupState status = get_backup_status(); backupidstr = text_to_cstring(backupid); if (status == SESSION_BACKUP_NON_EXCLUSIVE) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("a backup is already in progress in this session"))); if (exclusive) // 是否排他备份 { startpoint = do_pg_start_backup(backupidstr, fast, NULL, NULL, NULL, NULL); } else { MemoryContext oldcontext; /* Label file and tablespace map file need to be long-lived, since they are read in pg_stop_backup. */ oldcontext = MemoryContextSwitchTo(TopMemoryContext); label_file = makeStringInfo(); tblspc_map_file = makeStringInfo(); MemoryContextSwitchTo(oldcontext); register_persistent_abort_backup_handler(); startpoint = do_pg_start_backup(backupidstr, fast, NULL, label_file, NULL, tblspc_map_file); } PG_RETURN_LSN(startpoint); // 返回LSN }

实际实现在do_pg_start_backup中,主要工作:

  • 强制开启full_page_writes = on, 备份结束再还原
  • 切换到一个新的WAL日志文件,命名规则如下: (方便进行日志归档,拷贝等操作)
/* Generate a WAL segment file name.*/ #define XLogFileName(fname, tli, logSegNo, wal_segsz_bytes) \ snprintf(fname, MAXFNAMELEN, "%08X%08X%08X", tli, \ (uint32) ((logSegNo) / XLogSegmentsPerXLogId(wal_segsz_bytes)), \ (uint32) ((logSegNo) % XLogSegmentsPerXLogId(wal_segsz_bytes)))
  • 进行checkpoint
  • 构造backup_lable文件,存储检查点位置等信息
    返回最小的WAL LSN,以及timeline。这个LSN表示备份恢复需要的起始WAL日志的位置。
/* * do_pg_start_backup * * Utility function called at the start of an online backup. It creates the * necessary starting checkpoint and constructs the backup label file. * Returns the minimum WAL location that must be present to restore from this * backup, and the corresponding timeline ID in *starttli_p. */ XLogRecPtr do_pg_start_backup(const char *backupidstr, bool fast, TimeLineID *starttli_p, StringInfo labelfile, List **tablespaces, StringInfo tblspcmapfile) { bool exclusive = (labelfile == NULL); bool backup_started_in_recovery = false; XLogRecPtr checkpointloc; XLogRecPtr startpoint; TimeLineID starttli; pg_time_t stamp_time; char strfbuf[128]; char xlogfilename[MAXFNAMELEN]; XLogSegNo _logSegNo; struct stat stat_buf; FILE *fp; backup_started_in_recovery = RecoveryInProgress(); // 在恢复阶段,不能进行排他备份 /* Currently only non-exclusive backup can be taken during recovery.*/ if (backup_started_in_recovery && exclusive) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("recovery is in progress"), errhint("WAL control functions cannot be executed during recovery."))); /* During recovery, we don't need to check WAL level. Because, if WAL * level is not sufficient, it's impossible to get here during recovery. */ if (!backup_started_in_recovery && !XLogIsNeeded()) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("WAL level not sufficient for making an online backup"), errhint("wal_level must be set to \"replica\" or \"logical\" at server start."))); // ... /* * Mark backup active in shared memory. We must do full-page WAL writes * during an on-line backup even if not doing so at other times, because * it's quite possible for the backup dump to obtain a "torn" (partially * written) copy of a database page if it reads the page concurrently with * our write to the same page. This can be fixed as long as the first * write to the page in the WAL sequence is a full-page write. Hence, we * turn on forcePageWrites and then force a CHECKPOINT, to ensure there * are no dirty pages in shared memory that might get dumped while the * backup is in progress without having a corresponding WAL record. (Once * the backup is complete, we need not force full-page writes anymore, * since we expect that any pages not modified during the backup interval * must have been correctly captured by the backup.) * * Note that forcePageWrites has no effect during an online backup from * the standby. * * We must hold all the insertion locks to change the value of * forcePageWrites, to ensure adequate interlocking against * XLogInsertRecord(). */ WALInsertLockAcquireExclusive(); if (exclusive) { /* At first, mark that we're now starting an exclusive backup, to * ensure that there are no other sessions currently running pg_start_backup() or pg_stop_backup(). */ if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_NONE) { WALInsertLockRelease(); ereport(ERROR,(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("a backup is already in progress"), errhint("Run pg_stop_backup() and try again."))); } XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STARTING; } else XLogCtl->Insert.nonExclusiveBackups++; XLogCtl->Insert.forcePageWrites = true; /* 强制开启full_page_writes */ WALInsertLockRelease(); /* Ensure we release forcePageWrites if fail below */ PG_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive)); { bool gotUniqueStartpoint = false; DIR *tblspcdir; struct dirent *de; tablespaceinfo *ti; int datadirpathlen; /* * Force an XLOG file switch before the checkpoint, to ensure that the * WAL segment the checkpoint is written to doesn't contain pages with * old timeline IDs. That would otherwise happen if you called * pg_start_backup() right after restoring from a PITR archive: the * first WAL segment containing the startup checkpoint has pages in * the beginning with the old timeline ID. That can cause trouble at * recovery: we won't have a history file covering the old timeline if * pg_wal directory was not included in the base backup and the WAL * archive was cleared too before starting the backup. * * This also ensures that we have emitted a WAL page header that has * XLP_BKP_REMOVABLE off before we emit the checkpoint record. * Therefore, if a WAL archiver (such as pglesslog) is trying to * compress out removable backup blocks, it won't remove any that * occur after this point. * * During recovery, we skip forcing XLOG file switch, which means that * the backup taken during recovery is not available for the special * recovery case described above. */ if (!backup_started_in_recovery) RequestXLogSwitch(false); // 切换到一个新的WAL日志文件,默认是16MB后才切换 do { bool checkpointfpw; // 进行强制checkpoint /* * Force a CHECKPOINT. Aside from being necessary to prevent torn * page problems, this guarantees that two successive backup runs * will have different checkpoint positions and hence different * history file names, even if nothing happened in between. * * During recovery, establish a restartpoint if possible. We use * the last restartpoint as the backup starting checkpoint. This * means that two successive backup runs can have same checkpoint * positions. * * Since the fact that we are executing do_pg_start_backup() * during recovery means that checkpointer is running, we can use * RequestCheckpoint() to establish a restartpoint. * * We use CHECKPOINT_IMMEDIATE only if requested by user (via * passing fast = true). Otherwise this can take awhile. */ RequestCheckpoint(CHECKPOINT_FORCE | CHECKPOINT_WAIT | (fast ? CHECKPOINT_IMMEDIATE : 0)); /* * Now we need to fetch the checkpoint record location, and also * its REDO pointer. The oldest point in WAL that would be needed * to restore starting from the checkpoint is precisely the REDO pointer. */ LWLockAcquire(ControlFileLock, LW_SHARED); checkpointloc = ControlFile->checkPoint; // 获取最新的检查点信息 startpoint = ControlFile->checkPointCopy.redo; starttli = ControlFile->checkPointCopy.ThisTimeLineID; checkpointfpw = ControlFile->checkPointCopy.fullPageWrites; LWLockRelease(ControlFileLock); // ... /* * If two base backups are started at the same time (in WAL sender * processes), we need to make sure that they use different * checkpoints as starting locations, because we use the starting * WAL location as a unique identifier for the base backup in the * end-of-backup WAL record and when we write the backup history * file. Perhaps it would be better generate a separate unique ID * for each backup instead of forcing another checkpoint, but * taking a checkpoint right after another is not that expensive * either because only few buffers have been dirtied yet. */ WALInsertLockAcquireExclusive(); if (XLogCtl->Insert.lastBackupStart < startpoint) { XLogCtl->Insert.lastBackupStart = startpoint; gotUniqueStartpoint = true; } WALInsertLockRelease(); } while (!gotUniqueStartpoint); XLByteToSeg(startpoint, _logSegNo, wal_segment_size); //Compute a segment number from an XLogRecPtr. XLogFileName(xlogfilename, starttli, _logSegNo, wal_segment_size); // 生成WAL日志文件名 /* Construct tablespace_map file. */ if (tblspcmapfile == NULL) tblspcmapfile = makeStringInfo(); datadirpathlen = strlen(DataDir); /* Collect information about all tablespaces */ tblspcdir = AllocateDir("pg_tblspc"); while ((de = ReadDir(tblspcdir, "pg_tblspc")) != NULL) { // ... } FreeDir(tblspcdir); //创建backup_label文件,构造信息 /* Construct backup label file. */ if (labelfile == NULL) labelfile = makeStringInfo(); /* Use the log timezone here, not the session timezone */ stamp_time = (pg_time_t) time(NULL); pg_strftime(strfbuf, sizeof(strfbuf), "%Y-%m-%d %H:%M:%S %Z", pg_localtime(&stamp_time, log_timezone)); appendStringInfo(labelfile, "START WAL LOCATION: %X/%X (file %s)\n", LSN_FORMAT_ARGS(startpoint), xlogfilename); appendStringInfo(labelfile, "CHECKPOINT LOCATION: %X/%X\n", LSN_FORMAT_ARGS(checkpointloc)); appendStringInfo(labelfile, "BACKUP METHOD: %s\n", exclusive ? "pg_start_backup" : "streamed"); appendStringInfo(labelfile, "BACKUP FROM: %s\n", backup_started_in_recovery ? "standby" : "primary"); appendStringInfo(labelfile, "START TIME: %s\n", strfbuf); appendStringInfo(labelfile, "LABEL: %s\n", backupidstr); appendStringInfo(labelfile, "START TIMELINE: %u\n", starttli); // 写backup_lable文件到磁盘 /* Okay, write the file, or return its contents to caller. */ if (exclusive) { /* Check for existing backup label --- implies a backup is already * running. (XXX given that we checked exclusiveBackupState * above, maybe it would be OK to just unlink any such label file?) */ if (stat(BACKUP_LABEL_FILE, &stat_buf) != 0) { if (errno != ENOENT) ereport(ERROR, (errcode_for_file_access(), errmsg("could not stat file \"%s\": %m", BACKUP_LABEL_FILE))); } else ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("a backup is already in progress"), errhint("If you're sure there is no backup in progress, remove file \"%s\" and try again.", BACKUP_LABEL_FILE))); fp = AllocateFile(BACKUP_LABEL_FILE, "w"); if (!fp) ereport(ERROR,(errcode_for_file_access(), errmsg("could not create file \"%s\": %m",BACKUP_LABEL_FILE))); if (fwrite(labelfile->data, labelfile->len, 1, fp) != 1 ||fflush(fp) != 0 ||pg_fsync(fileno(fp)) != 0 ||ferror(fp) ||FreeFile(fp)) ereport(ERROR,(errcode_for_file_access(), errmsg("could not write file \"%s\": %m",BACKUP_LABEL_FILE))); /* Allocated locally for exclusive backups, so free separately */ pfree(labelfile->data); pfree(labelfile); /* Write backup tablespace_map file. */ if (tblspcmapfile->len > 0) { if (stat(TABLESPACE_MAP, &stat_buf) != 0) { if (errno != ENOENT) ereport(ERROR,(errcode_for_file_access(), errmsg("could not stat file \"%s\": %m",TABLESPACE_MAP))); } else ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("a backup is already in progress"), errhint("If you're sure there is no backup in progress, remove file \"%s\" and try again.", TABLESPACE_MAP))); fp = AllocateFile(TABLESPACE_MAP, "w"); if (!fp) ereport(ERROR,(errcode_for_file_access(), errmsg("could not create file \"%s\": %m",TABLESPACE_MAP))); if (fwrite(tblspcmapfile->data, tblspcmapfile->len, 1, fp) != 1 || fflush(fp) != 0 ||pg_fsync(fileno(fp)) != 0 ||ferror(fp) ||FreeFile(fp)) ereport(ERROR, (errcode_for_file_access(), errmsg("could not write file \"%s\": %m", TABLESPACE_MAP))); } /* Allocated locally for exclusive backups, so free separately */ pfree(tblspcmapfile->data); pfree(tblspcmapfile); } } PG_END_ENSURE_ERROR_CLEANUP(pg_start_backup_callback, (Datum) BoolGetDatum(exclusive)); /* * Mark that start phase has correctly finished for an exclusive backup. * Session-level locks are updated as well to reflect that state. * * Note that CHECK_FOR_INTERRUPTS() must not occur while updating backup * counters and session-level lock. Otherwise they can be updated * inconsistently, and which might cause do_pg_abort_backup() to fail. */ if (exclusive) { WALInsertLockAcquireExclusive(); XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_IN_PROGRESS; /* Set session-level lock */ sessionBackupState = SESSION_BACKUP_EXCLUSIVE; WALInsertLockRelease(); } else sessionBackupState = SESSION_BACKUP_NON_EXCLUSIVE; /* We're done. As a convenience, return the starting WAL location.*/ if (starttli_p) *starttli_p = starttli; return startpoint; }

执行如下命令:

postgres=# select pg_start_backup('bak1'); pg_start_backup ----------------- 7/F7000148 (1 row) -- 生成的backup_label文件内容 postgres@slpc:~/pgsql/pgdata$ cat backup_label START WAL LOCATION: 7/F7000148 (file 0000000100000007000000F7) CHECKPOINT LOCATION: 7/F7000180 BACKUP METHOD: pg_start_backup BACKUP FROM: primary START TIME: 2023-09-15 15:05:13 CST LABEL: bak1 START TIMELINE: 1

pg_stop_backup

结束备份操作,主要内容如下:

  • 如果强制开启了full_page_writes,则关闭
  • 写入一条备份结束的XLOG记录
  • 切换WAL段文件
  • 创建一个备份历史记录文件
  • 删除backup_label文件, 这个文件最开始是放在源数据库实例目录下,必须删除,不然源数据库重启时,会读该文件从而影响正常的恢复过程。
pg_stop_backup ( exclusive boolean [, wait_for_archive boolean ] ) → setof record ( lsn pg_lsn, labelfile text, spcmapfile text )

完成排他或非排他联机备份。exclusive参数必须与前面的pg_start_backup调用相匹配。 在排他备份中, pg_stop_backup删除备份标签文件,如果存在,则删除pg_start_backup创建的表空间映射文件。 在非排他备份中,这些文件的所需内容将作为函数结果的一部分返回,并且应该写入备份区域(不在数据目录)中的文件。

还有一个可选的boolean类型的第二个参数。如果为假,则该函数将在备份完成后立即返回,而无需等待WAL被归档。 这种行为只有在独立监控WAL归档的备份软件中才有用。否则,使备份一致所需的WAL可能会丢失,从而使备份无效。 默认情况下或当该参数为真时,pg_stop_backup将在启用归档时等待WAL被归档。 (在备用状态下,这意味着只有当archive_mode = always时,它才会等待。 如果主节点上的写活动很少,那么可以在主节点上运行pg_switch_wal来触发立即段切换。)

当在主节点上执行时,这个函数还会在预写式日志归档区域中创建一个备份历史文件。 历史文件包括给予pg_start_backup的标签,备份的开始和结束写前预写式日志的位置,以及备份的开始和结束时间。 记录完结束位置后,当前的预写式日志插入点自动移到下一个预写式日志文件,以便结束的预写式日志文件可以立即归档,从而完成备份。

该函数的结果是一条记录。lsn列保持备份的结束预写式日志位置(可以再忽略)。 当结束排他备份时,第二和第三列为NULL;在非排他备份之后,它们保持标签和表空间映射文件所需的内容。

还有另外一个函数,无参数。

pg_stop_backup () → pg_lsn

结束执行排他在线备份。这个简化版本等同于pg_stop_backup(true, true),只是它只返回pg_lsn结果。

源码如下:

/* * pg_stop_backup: finish taking an on-line backup dump * * We write an end-of-backup WAL record, and remove the backup label file * created by pg_start_backup, creating a backup history file in pg_wal * instead (whence it will immediately be archived). The backup history file * contains the same info found in the label file, plus the backup-end time * and WAL location. * * Note: this version is only called to stop an exclusive backup. The function * pg_stop_backup_v2 (overloaded as pg_stop_backup in SQL) is called to stop non-exclusive backups. */ Datum pg_stop_backup(PG_FUNCTION_ARGS) { XLogRecPtr stoppoint; SessionBackupState status = get_backup_status(); if (status == SESSION_BACKUP_NON_EXCLUSIVE) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("non-exclusive backup in progress"), errhint("Did you mean to use pg_stop_backup('f')?"))); /* * Exclusive backups were typically started in a different connection, so * don't try to verify that status of backup is set to * SESSION_BACKUP_EXCLUSIVE in this function. Actual verification that an * exclusive backup is in fact running is handled inside * do_pg_stop_backup. */ stoppoint = do_pg_stop_backup(NULL, true, NULL); PG_RETURN_LSN(stoppoint); } /* * do_pg_stop_backup * * Utility function called at the end of an online backup. It cleans up the * backup state and can optionally wait for WAL segments to be archived. * * If labelfile is NULL, this stops an exclusive backup. Otherwise this stops * the non-exclusive backup specified by 'labelfile'. * * Returns the last WAL location that must be present to restore from this * backup, and the corresponding timeline ID in *stoptli_p. */ XLogRecPtr do_pg_stop_backup(char *labelfile, bool waitforarchive, TimeLineID *stoptli_p) { bool exclusive = (labelfile == NULL); bool backup_started_in_recovery = false; XLogRecPtr startpoint; XLogRecPtr stoppoint; TimeLineID stoptli; pg_time_t stamp_time; char strfbuf[128]; char histfilepath[MAXPGPATH]; char startxlogfilename[MAXFNAMELEN]; char stopxlogfilename[MAXFNAMELEN]; char lastxlogfilename[MAXFNAMELEN]; char histfilename[MAXFNAMELEN]; char backupfrom[20]; XLogSegNo _logSegNo; FILE *lfp; FILE *fp; char ch; int seconds_before_warning; int waits = 0; bool reported_waiting = false; char *remaining; char *ptr; uint32 hi,lo; // ... if (exclusive) { /* * At first, mark that we're now stopping an exclusive backup, to * ensure that there are no other sessions currently running * pg_start_backup() or pg_stop_backup(). */ WALInsertLockAcquireExclusive(); if (XLogCtl->Insert.exclusiveBackupState != EXCLUSIVE_BACKUP_IN_PROGRESS) { WALInsertLockRelease(); ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("exclusive backup not in progress"))); } XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_STOPPING; WALInsertLockRelease(); /* * Remove backup_label. In case of failure, the state for an exclusive * backup is switched back to in-progress. */ PG_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive)); { // ... // 删除backup_label文件 /* * Close and remove the backup label file */ if (r != 1 || ferror(lfp) || FreeFile(lfp)) ereport(ERROR, (errcode_for_file_access(), errmsg("could not read file \"%s\": %m", BACKUP_LABEL_FILE))); durable_unlink(BACKUP_LABEL_FILE, ERROR); /* * Remove tablespace_map file if present, it is created only if * there are tablespaces. */ durable_unlink(TABLESPACE_MAP, DEBUG1); } PG_END_ENSURE_ERROR_CLEANUP(pg_stop_backup_callback, (Datum) BoolGetDatum(exclusive)); } /* * OK to update backup counters, forcePageWrites and session-level lock. * * Note that CHECK_FOR_INTERRUPTS() must not occur while updating them. * Otherwise they can be updated inconsistently, and which might cause * do_pg_abort_backup() to fail. */ WALInsertLockAcquireExclusive(); if (exclusive) { XLogCtl->Insert.exclusiveBackupState = EXCLUSIVE_BACKUP_NONE; } else { /* * The user-visible pg_start/stop_backup() functions that operate on * exclusive backups can be called at any time, but for non-exclusive * backups, it is expected that each do_pg_start_backup() call is * matched by exactly one do_pg_stop_backup() call. */ Assert(XLogCtl->Insert.nonExclusiveBackups > 0); XLogCtl->Insert.nonExclusiveBackups--; } if (XLogCtl->Insert.exclusiveBackupState == EXCLUSIVE_BACKUP_NONE && XLogCtl->Insert.nonExclusiveBackups == 0) { XLogCtl->Insert.forcePageWrites = false; // 关闭强制full_page_writes } /* * Clean up session-level lock. * * You might think that WALInsertLockRelease() can be called before * cleaning up session-level lock because session-level lock doesn't need * to be protected with WAL insertion lock. But since * CHECK_FOR_INTERRUPTS() can occur in it, session-level lock must be * cleaned up before it. */ sessionBackupState = SESSION_BACKUP_NONE; WALInsertLockRelease(); /* * Read and parse the START WAL LOCATION line (this code is pretty crude, * but we are not expecting any variability in the file format). */ if (sscanf(labelfile, "START WAL LOCATION: %X/%X (file %24s)%c", &hi, &lo, startxlogfilename, &ch) != 4 || ch != '\n') ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); startpoint = ((uint64) hi) << 32 | lo; remaining = strchr(labelfile, '\n') + 1; /* %n is not portable enough */ /* * Parse the BACKUP FROM line. If we are taking an online backup from the * standby, we confirm that the standby has not been promoted during the * backup. */ ptr = strstr(remaining, "BACKUP FROM:"); if (!ptr || sscanf(ptr, "BACKUP FROM: %19s\n", backupfrom) != 1) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("invalid data in file \"%s\"", BACKUP_LABEL_FILE))); if (strcmp(backupfrom, "standby") == 0 && !backup_started_in_recovery) ereport(ERROR, (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), errmsg("the standby was promoted during online backup"), errhint("This means that the backup being taken is corrupt " "and should not be used. " "Try taking another online backup."))); /* * During recovery, we don't write an end-of-backup record. We assume that * pg_control was backed up last and its minimum recovery point can be * available as the backup end location. Since we don't have an * end-of-backup record, we use the pg_control value to check whether * we've reached the end of backup when starting recovery from this * backup. We have no way of checking if pg_control wasn't backed up last * however. * * We don't force a switch to new WAL file but it is still possible to * wait for all the required files to be archived if waitforarchive is * true. This is okay if we use the backup to start a standby and fetch * the missing WAL using streaming replication. But in the case of an * archive recovery, a user should set waitforarchive to true and wait for * them to be archived to ensure that all the required files are * available. * * We return the current minimum recovery point as the backup end * location. Note that it can be greater than the exact backup end * location if the minimum recovery point is updated after the backup of * pg_control. This is harmless for current uses. * * XXX currently a backup history file is for informational and debug * purposes only. It's not essential for an online backup. Furthermore, * even if it's created, it will not be archived during recovery because * an archiver is not invoked. So it doesn't seem worthwhile to write a * backup history file during recovery. */ if (backup_started_in_recovery) { // ... } else { // 写入一条备份结束XLOG记录 /* Write the backup-end xlog record */ XLogBeginInsert(); XLogRegisterData((char *) (&startpoint), sizeof(startpoint)); stoppoint = XLogInsert(RM_XLOG_ID, XLOG_BACKUP_END); stoptli = ThisTimeLineID; /* * Force a switch to a new xlog segment file, so that the backup is * valid as soon as archiver moves out the current segment file. */ RequestXLogSwitch(false); // 切换日志段文件,以便尽快归档,减少等待归档结束的时间 XLByteToPrevSeg(stoppoint, _logSegNo, wal_segment_size); XLogFileName(stopxlogfilename, stoptli, _logSegNo, wal_segment_size); /* Use the log timezone here, not the session timezone */ stamp_time = (pg_time_t) time(NULL); pg_strftime(strfbuf, sizeof(strfbuf),"%Y-%m-%d %H:%M:%S %Z", pg_localtime(&stamp_time, log_timezone)); /* Write the backup history file */ XLByteToSeg(startpoint, _logSegNo, wal_segment_size); BackupHistoryFilePath(histfilepath, stoptli, _logSegNo, startpoint, wal_segment_size); fp = AllocateFile(histfilepath, "w"); if (!fp) ereport(ERROR, (errcode_for_file_access(), errmsg("could not create file \"%s\": %m", histfilepath))); fprintf(fp, "START WAL LOCATION: %X/%X (file %s)\n", LSN_FORMAT_ARGS(startpoint), startxlogfilename); fprintf(fp, "STOP WAL LOCATION: %X/%X (file %s)\n", LSN_FORMAT_ARGS(stoppoint), stopxlogfilename); /* Transfer remaining lines including label and start timeline to history file.*/ fprintf(fp, "%s", remaining); fprintf(fp, "STOP TIME: %s\n", strfbuf); fprintf(fp, "STOP TIMELINE: %u\n", stoptli); if (fflush(fp) || ferror(fp) || FreeFile(fp)) ereport(ERROR, (errcode_for_file_access(), errmsg("could not write file \"%s\": %m", histfilepath))); /* Clean out any no-longer-needed history files. As a side effect, * this will post a .ready file for the newly created history file, * notifying the archiver that history file may be archived immediately. */ CleanupBackupHistory(); } // 等待归档结束 /* * If archiving is enabled, wait for all the required WAL files to be * archived before returning. If archiving isn't enabled, the required WAL * needs to be transported via streaming replication (hopefully with * wal_keep_size set high enough), or some more exotic mechanism like * polling and copying files from pg_wal with script. We have no knowledge * of those mechanisms, so it's up to the user to ensure that he gets all * the required WAL. * * We wait until both the last WAL file filled during backup and the * history file have been archived, and assume that the alphabetic sorting * property of the WAL files ensures any earlier WAL files are safely * archived as well. * * We wait forever, since archive_command is supposed to work and we * assume the admin wanted his backup to work completely. If you don't * wish to wait, then either waitforarchive should be passed in as false, * or you can set statement_timeout. Also, some notices are issued to * clue in anyone who might be doing this interactively. */ if (waitforarchive && ((!backup_started_in_recovery && XLogArchivingActive()) || (backup_started_in_recovery && XLogArchivingAlways()))) { XLByteToPrevSeg(stoppoint, _logSegNo, wal_segment_size); XLogFileName(lastxlogfilename, stoptli, _logSegNo, wal_segment_size); XLByteToSeg(startpoint, _logSegNo, wal_segment_size); BackupHistoryFileName(histfilename, stoptli, _logSegNo, startpoint, wal_segment_size); seconds_before_warning = 60; waits = 0; while (XLogArchiveIsBusy(lastxlogfilename) || XLogArchiveIsBusy(histfilename)) { CHECK_FOR_INTERRUPTS(); if (!reported_waiting && waits > 5) { ereport(NOTICE, (errmsg("base backup done, waiting for required WAL segments to be archived"))); reported_waiting = true; } pgstat_report_wait_start(WAIT_EVENT_BACKUP_WAIT_WAL_ARCHIVE); pg_usleep(1000000L); pgstat_report_wait_end(); if (++waits >= seconds_before_warning) { seconds_before_warning *= 2; /* This wraps in >10 years... */ ereport(WARNING, (errmsg("still waiting for all required WAL segments to be archived (%d seconds elapsed)", waits), errhint("Check that your archive_command is executing properly. " "You can safely cancel this backup, " "but the database backup will not be usable without all the WAL segments."))); } } ereport(NOTICE, (errmsg("all required WAL segments have been archived"))); } else if (waitforarchive) ereport(NOTICE, (errmsg("WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup"))); /* We're done. As a convenience, return the ending WAL location.*/ if (stoptli_p) *stoptli_p = stoptli; return stoppoint; }

恢复过程

可参考PostgreSQL源码分析——备份恢复


参考文档:
9.27. 系统管理函数
He3DB恢复过程源码分析系列

最后修改时间:2024-02-20 17:34:57
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

文章被以下合辑收录

评论