暂无图片
暂无图片
3
暂无图片
暂无图片
暂无图片

8_PostgreSQL XLOG Record内部结构解析_Non-Backup Block

8_PostgreSQL XLOG Record内部结构解析_Non-Backup Block

内容概述

PG WAL中的XLOG Record记录了数据库的变更日志信息,类似Oracle中Redo Log的VC(change vector),XLOG Record由header portion 和data portion两部分组成,data portion由header part和data part两部分组成,由于full_page_writes特性,data portion的类型可分为 Backup Block、Non-Backup Block和其它。本文对Non-Backup Block结构进行详细解析。

XLOG Record结构说明

XLOG Record组成

XLOG Record由header portion 和data portion两部分组成。

image-20221028154042676

data portion组成

data portion由header part和data part两部分组成

image-20221028170738866

data portion类型

data portion的类型可分为 Backup Block、Non-Backup Block和其它

image-20221028170845374

测试脚本

### step 1. 创建table t1切换日志并查询当前日志 drop table t1; create table t1(a1 varchar); insert into t1 values(33333333); checkpoint; select pg_switch_wal(); SELECT pg_walfile_name(pg_current_wal_lsn()); ### step 2. 插入一条记录,并生成新的日志文件 insert into t1 values(66666666); insert into t1 values(99999999); select pg_switch_wal(); SELECT pg_walfile_name(pg_current_wal_lsn()); postgres=# SELECT pg_walfile_name(pg_current_wal_lsn()); pg_walfile_name -------------------------- 00000001000000000000000F (1 row) postgres=# select * from t1; postgres=# select pg_relation_filepath('t1'); pg_relation_filepath ---------------------- base/5/32783 (1 row) ### step 3. dump日志二进制格式 [postgres@enmo pgwal]$ dd if=00000001000000000000000F bs=8192 skip=0 seek=0 count=1 |hexdump -C 00000000 10 d1 06 00 01 00 00 00 00 00 00 0f 00 00 00 00 |................| 00000010 00 00 00 00 00 00 00 00 c3 16 52 34 5f 3e 59 63 |..........R4_>Yc| 00000020 00 00 00 01 00 20 00 00 32 00 00 00 00 00 00 00 |..... ..2.......| 00000030 c8 ad 01 0e 00 00 00 00 10 08 00 00 67 f8 a5 5d |............g..]| 00000040 ff 18 00 00 00 00 00 00 00 00 00 b3 d3 52 f7 02 |.............R..| 00000050 00 00 f7 02 00 00 f6 02 00 00 00 00 00 00 00 00 |................| 00000060 a6 00 00 00 f7 02 00 00 28 00 00 0f 00 00 00 00 |........(.......| 00000070 00 0a 00 00 c2 0a aa c2 00 10 00 00 70 00 20 00 |............p. .| 00000080 03 7f 06 00 00 05 00 00 00 0f 80 00 00 00 00 00 |................| 00000090 00 ff 03 00 00 00 00 b8 ac 01 0e 00 00 00 00 20 |............... | 000000a0 00 b0 1f 00 20 04 20 00 00 00 00 d8 9f 42 00 b0 |.... . ......B..| 000000b0 9f 42 00 f7 02 00 00 00 00 00 00 00 00 00 00 00 |.B..............| 000000c0 00 00 00 02 00 01 00 02 08 18 00 13 36 36 36 36 |............6666| 000000d0 36 36 36 36 00 00 00 00 00 00 00 f6 02 00 00 00 |6666............| 000000e0 00 00 00 00 00 00 00 00 00 00 00 01 00 01 00 02 |................| 000000f0 08 18 00 13 33 33 33 33 33 33 33 33 00 00 00 00 |....33333333....| 00000100 00 00 00 02 00 00 00 00 22 00 00 00 f7 02 00 00 |........".......| 00000110 60 00 00 0f 00 00 00 00 00 01 00 00 0a 44 6b 1f |`............Dk.| 00000120 ff 08 33 bc 06 df 12 8f 02 00 00 00 00 00 00 00 |..3.............| 00000130 40 00 00 00 f8 02 00 00 08 01 00 0f 00 00 00 00 |@...............| 00000140 00 0a 00 00 ed 2a 59 40 00 20 0f 00 7f 06 00 00 |.....*Y@. ......| 00000150 05 00 00 00 0f 80 00 00 00 00 00 00 ff 03 01 00 |................| 00000160 02 08 18 00 13 39 39 39 39 39 39 39 39 03 00 00 |.....99999999...| 00000170 22 00 00 00 f8 02 00 00 30 01 00 0f 00 00 00 00 |".......0.......| 00000180 00 01 00 00 4d 92 8c cb ff 08 6a f7 06 df 12 8f |....M.....j.....| 00000190 02 00 00 00 00 00 00 00 18 00 00 00 00 00 00 00 |................| 000001a0 70 01 00 0f 00 00 00 00 40 00 00 00 36 e1 24 75 |p.......@...6.$u| 000001b0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................| * 00002000 [postgres@enmo pgwal]$ ### step 4. pg_waldump日志信息 [postgres@enmo pgwal]$ pg_waldump 00000001000000000000000F rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/0F000028, prev 0/0E01ADC8, desc: RUNNING_XACTS nextXid 759 latestCompletedXid 758 oldestRunningXid 759 rmgr: Heap len (rec/tot): 54/ 166, tx: 759, lsn: 0/0F000060, prev 0/0F000028, desc: INSERT off 2 flags 0x00, blkref #0: rel 1663/5/32783 blk 0 FPW rmgr: Transaction len (rec/tot): 34/ 34, tx: 759, lsn: 0/0F000108, prev 0/0F000060, desc: COMMIT 2022-10-28 16:32:47.365171 CST rmgr: Heap len (rec/tot): 64/ 64, tx: 760, lsn: 0/0F000130, prev 0/0F000108, desc: INSERT off 3 flags 0x00, blkref #0: rel 1663/5/32783 blk 0 rmgr: Transaction len (rec/tot): 34/ 34, tx: 760, lsn: 0/0F000170, prev 0/0F000130, desc: COMMIT 2022-10-28 16:32:47.380330 CST rmgr: XLOG len (rec/tot): 24/ 24, tx: 0, lsn: 0/0F000198, prev 0/0F000170, desc: SWITCH [postgres@enmo pgwal]$ ### step 5. 安装pageinspect扩展 cd /soft/postgresql-15beta1/contrib/pageinspect postgres=# create extension pageinspect; CREATE EXTENSION postgres=# postgres=# select * from heap_page_items(get_raw_page('t1', 0)); lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data ----+--------+----------+--------+--------+--------+----------+--------+-------------+------------+--------+--------+-------+---------------------- 1 | 8152 | 1 | 33 | 758 | 0 | 0 | (0,1) | 1 | 2050 | 24 | | | \x133333333333333333 2 | 8112 | 1 | 33 | 759 | 0 | 0 | (0,2) | 1 | 2050 | 24 | | | \x133636363636363636 3 | 8072 | 1 | 33 | 760 | 0 | 0 | (0,3) | 1 | 2050 | 24 | | | \x133939393939393939 (3 rows) postgres=# ### step 6. 查询表空间信息 postgres=# select * from pg_tablespace; oid | spcname | spcowner | spcacl | spcoptions ------+------------+----------+--------+------------ 1663 | pg_default | 10 | | 1664 | pg_global | 10 | | (2 rows) postgres=#

WAL结构解析

第一部分为XLogLongPageHeaderData结构,上篇文章已解析过,后面6个XLogRecord结构,我们只解析2个insert操作,分别对应data portion类型的 Backup Block和Non-Backup Block。

image-20221028165216782

Non-Backup Block 结构解析

insert语句产生的non-backup由以下4个数据结构体和1个数据对象组成,

  1. 结构体header-portion of XLogRecord
  2. 结构体XLogRecordBlockHeader
  3. 结构体XLogRecordDataHeaderShort
  4. an inserted tuple (to be exact, a xl_heap_header structure and an inserted data entire)
  5. 结构体xl_heap_insert

image-20221028170445111

解析header-portion of XLogRecord

结构体定义

/* * The overall layout of an XLOG record is: * Fixed-size header (XLogRecord struct) * XLogRecordBlockHeader struct * XLogRecordBlockHeader struct * ... * XLogRecordDataHeader[Short|Long] struct * block data * block data * ... * main data * * There can be zero or more XLogRecordBlockHeaders, and 0 or more bytes of * rmgr-specific data not associated with a block. XLogRecord structs * always start on MAXALIGN boundaries in the WAL files, but the rest of * the fields are not aligned. * * The XLogRecordBlockHeader, XLogRecordDataHeaderShort and * XLogRecordDataHeaderLong structs all begin with a single 'id' byte. It's * used to distinguish between block references, and the main data structs. */ typedef struct XLogRecord <--大小:24 bytes { uint32 xl_tot_len; /* total len of entire record */ --typedef unsigned int uint32; /* == 32 bits */ TransactionId xl_xid; /* xact id */ --typedef uint32 TransactionId; XLogRecPtr xl_prev; /* ptr to previous record in log */--typedef uint64 XLogRecPtr; uint8 xl_info; /* flag bits, see below */ RmgrId xl_rmid; /* resource manager for this record */--typedef uint8 RmgrId; /* 2 bytes of padding here, initialize to zero */ pg_crc32c xl_crc; /* CRC for this record */--typedef uint32 pg_crc32c; /* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */ } XLogRecord;

XLogRecord结构体:./src/include/access/xlogrecord.h

二进制解析

header-portion of XLogRecord由前24bytes组成

image-20221028173223476

uint32 xl_tot_len: 40 00 00 00 TransactionId xl_xid: f8 02 00 00 XLogRecPtr xl_prev: 08 01 00 0f 00 00 00 00 uint8 xl_info: 00 RmgrId xl_rmid: 0a /* 2 bytes of padding here, initialize to zero */ pg_crc32c xl_crc: ed 2a 59 40

解析XLogRecordBlockHeader of data portion

结构体定义

### XLogRecordBlockHeader /* * Header info for block data appended to an XLOG record. * * 'data_length' is the length of the rmgr-specific payload data associated * with this block. It does not include the possible full page image, nor * XLogRecordBlockHeader struct itself. * * Note that we don't attempt to align the XLogRecordBlockHeader struct! * So, the struct must be copied to aligned local storage before use. */ typedef struct XLogRecordBlockHeader { uint8 id; /* block reference ID */ uint8 fork_flags; /* fork within the relation, and flags */ uint16 data_length; /* number of payload bytes (not including page * image) */ /* If BKPBLOCK_HAS_IMAGE, an XLogRecordBlockImageHeader struct follows */ /* If BKPBLOCK_SAME_REL is not set, a RelFileNode follows */ /* BlockNumber follows */ } XLogRecordBlockHeader; #define SizeOfXLogRecordBlockHeader (offsetof(XLogRecordBlockHeader, data_length) + sizeof(uint16)) ### RelFileNode /* * RelFileNode must provide all that we need to know to physically access * a relation, with the exception of the backend ID, which can be provided * separately. Note, however, that a "physical" relation is comprised of * multiple files on the filesystem, as each fork is stored as a separate * file, and each fork can be divided into multiple segments. See md.c. * * spcNode identifies the tablespace of the relation. It corresponds to * pg_tablespace.oid. * * dbNode identifies the database of the relation. It is zero for * "shared" relations (those common to all databases of a cluster). * Nonzero dbNode values correspond to pg_database.oid. * * relNode identifies the specific relation. relNode corresponds to * pg_class.relfilenode (NOT pg_class.oid, because we need to be able * to assign new physical files to relations in some situations). * Notice that relNode is only unique within a database in a particular * tablespace. * * Note: spcNode must be GLOBALTABLESPACE_OID if and only if dbNode is * zero. We support shared relations only in the "global" tablespace. * * Note: in pg_class we allow reltablespace == 0 to denote that the * relation is stored in its database's "default" tablespace (as * identified by pg_database.dattablespace). However this shorthand * is NOT allowed in RelFileNode structs --- the real tablespace ID * must be supplied when setting spcNode. * * Note: in pg_class, relfilenode can be zero to denote that the relation * is a "mapped" relation, whose current true filenode number is available * from relmapper.c. Again, this case is NOT allowed in RelFileNodes. * * Note: various places use RelFileNode in hashtable keys. Therefore, * there *must not* be any unused padding bytes in this struct. That * should be safe as long as all the fields are of type Oid. */ typedef struct RelFileNode { Oid spcNode; /* tablespace */ --typedef unsigned int Oid; Oid dbNode; /* database */ Oid relNode; /* relation */ } RelFileNode; ### BlockNumber typedef uint32 BlockNumber;

XLogRecordBlockHeader结构体:./src/include/access/xlogrecord.h

二进制解析

XLogRecordBlockHeader由header-portion of XLogRecord后面的20个bytes组成。

image-20221028215104705

uint8 id: 00 uint8 fork_flags: 20 uint16 data_length: 0f 00 RelFileNode: rel 1663/5/32783 Oid spcNode: 7f 06 00 00 Oid dbNode: 05 00 00 00 Oid relNode: 0f 80 00 00 BlockNumber: blk 0 <-- 000000

解析XLogRecordDataHeaderShort

结构体定义

/* * XLogRecordDataHeaderShort/Long are used for the "main data" portion of * the record. If the length of the data is less than 256 bytes, the short * form is used, with a single byte to hold the length. Otherwise the long * form is used. * * (These structs are currently not used in the code, they are here just for * documentation purposes). */ typedef struct XLogRecordDataHeaderShort { uint8 id; /* XLR_BLOCK_ID_DATA_SHORT */ uint8 data_length; /* number of payload bytes */ } XLogRecordDataHeaderShort; #define SizeOfXLogRecordDataHeaderShort (sizeof(uint8) * 2) typedef struct XLogRecordDataHeaderLong { uint8 id; /* XLR_BLOCK_ID_DATA_LONG */ /* followed by uint32 data_length, unaligned */ } XLogRecordDataHeaderLong; #define SizeOfXLogRecordDataHeaderLong (sizeof(uint8) + sizeof(uint32)) /* * Block IDs used to distinguish different kinds of record fragments. Block * references are numbered from 0 to XLR_MAX_BLOCK_ID. A rmgr is free to use * any ID number in that range (although you should stick to small numbers, * because the WAL machinery is optimized for that case). A few ID * numbers are reserved to denote the "main" data portion of the record, * as well as replication-supporting transaction metadata. * * The maximum is currently set at 32, quite arbitrarily. Most records only * need a handful of block references, but there are a few exceptions that * need more. */ #define XLR_MAX_BLOCK_ID 32 #define XLR_BLOCK_ID_DATA_SHORT 255 #define XLR_BLOCK_ID_DATA_LONG 254 #define XLR_BLOCK_ID_ORIGIN 253 #define XLR_BLOCK_ID_TOPLEVEL_XID 252

二进制解析

XLogRecordDataHeaderShort由2 bytes组成

image-20221102193343253

二进制解析

uint8 id: ff uint8 data_length: 03

解析xl_heap_header

结构体定义

/* * We don't store the whole fixed part (HeapTupleHeaderData) of an inserted * or updated tuple in WAL; we can save a few bytes by reconstructing the * fields that are available elsewhere in the WAL record, or perhaps just * plain needn't be reconstructed. These are the fields we must store. */ typedef struct xl_heap_header { uint16 t_infomask2; uint16 t_infomask; uint8 t_hoff; } xl_heap_header; #define SizeOfHeapHeader (offsetof(xl_heap_header, t_hoff) + sizeof(uint8)) /* This is what we need to know about insert */ typedef struct xl_heap_insert { OffsetNumber offnum; /* inserted tuple's offset */ --typedef uint16 OffsetNumber; uint8 flags; /* xl_heap_header & TUPLE DATA in backup block 0 */ } xl_heap_insert; #define SizeOfHeapInsert (offsetof(xl_heap_insert, flags) + sizeof(uint8))

二进制解析

image-20221102194610618

二进制解析

typedef struct xl_heap_header { uint16 t_infomask2; -- 01 00 uint16 t_infomask; -- 02 08 uint8 t_hoff; -- 18 } xl_heap_header; typedef struct xl_heap_insert { OffsetNumber offnum; --03 00 * inserted tuple's offset */ --typedef uint16 OffsetNumber; uint8 flags; --00 /* xl_heap_header & TUPLE DATA in backup block 0 */ } xl_heap_insert;
最后修改时间:2022-11-02 19:48:38
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论