暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 数据文件灾难恢复 - 解析与数据pg_filedump

digoal 2017-03-10
592

作者

digoal

日期

2017-03-10

标签

PostgreSQL , 数据文件 , pg_filedump , 安全 , TDE


背景

俗话说常在河边站哪有不湿鞋,作为一名战斗在一线的DBA或者开发者,可能有遇到过磁盘损坏,磁盘阵列损坏,如果有备份或者备库的话,还好。

如果没有备份,或者没有备库(通常有一些小型或者创业型的企业),那么遇到磁盘损坏或者其他原因(比如掉电文件系统损坏),导致数据库的数据文件并不完整时,如何从有限的资料中找出数据呢?

比如PostgreSQL,如果读到坏块,会报块不可读的错误,这种情况下通过设置zero_damaged_pages=on可以跳过损坏的数据块。

如果连元数据都损坏了,又或者坏了一些磁盘,只有某些表空间被幸免于难,这些情况下你的数据库都已经无法启动时,如何能从有限的数据文件中找回数据呢?

数据文件解析pg_filedump

pg_filedump是PostgreSQL社区托管的一个项目,类似于pg_xlogdump,不需要开启数据库,可以直接从数据文件中将数据dump出来。

pg_filedump实际上可以DUMP 堆表、索引数据文件,控制文件的内容。(从pg_filedump引用的头文件也能看出端倪)

安装很简单

```
git clone git://git.postgresql.org/git/pg_filedump.git

cd pg_filedump

export PATH=/home/digoal/pgsql9.6/bin:$PATH

make ; make install
```

命令帮助如下,通常来说,你只需要指定需要DUMP的文件即可。

如果文件的块头损坏了,那么你可以手工指定一些信息,包括块大小,段大小,解析哪个块,根据什么格式解析(字段类型列表)等。

```
pg_filedump [-abcdfhikxy] [-R startblock [endblock]] [-D attrlist] [-S blocksize] [-s segsize] [-n segnumber] file

Defaults are: relative addressing, range of the entire file, block size
as listed on block 0 in the file

The following options are valid for heap and index files:
-a Display absolute addresses when formatting (Block header
information is always block relative)
-b Display binary block images within a range (Option will turn
off all formatting options)
-d Display formatted block content dump (Option will turn off
all other formatting options)
-D Try to decode tuples using given comma separated list of types.
List of supported types:
* bigint
* bigserial
* bool
* char
* charN -- char(n)
* date
* float
* float4
* float8
* int
* json
* macaddr
* name
* oid
* real
* serial
* smallint
* smallserial
* text
* time
* timestamp
* timetz
* uuid
* varchar
* varcharN -- varchar(n)
* xid
* xml
* ~ -- ignores are attributes left in a tuple
-f Display formatted block content dump along with interpretation
-h Display this information
-i Display interpreted item details
-k Verify block checksums
-R Display specific block ranges within the file (Blocks are
indexed from 0)
[startblock]: block to start at
[endblock]: block to end at
A startblock without an endblock will format the single block
-s Force segment size to [segsize]
-n Force segment number to [segnumber]
-S Force block size to [blocksize]
-x Force interpreted formatting of block items as index items
-y Force interpreted formatting of block items as heap items

The following options are valid for control files:
-c Interpret the file listed as a control file
-f Display formatted content dump along with interpretation
-S Force block size to [blocksize]

In most cases it's recommended to use the -i and -f options to get
the most useful dump output.
```

pg_filedump使用举例

1. 创建测试表

postgres=# create table digoal(id int, info text, crt_time timestamp); CREATE TABLE

2. 插入测试数据

postgres=# insert into digoal select generate_series(1,1000000),md5(random()::text), clock_timestamp(); INSERT 0 1000000

3. 找出表对应的数据文件

```
postgres=# select pg_relation_filepath('digoal');
pg_relation_filepath


base/13269/173369
(1 row)
```

4. 调用checkpoint,把数据刷盘,便于我们接下来的观察

checkpoint;

5. 使用pg_filedump直接读取数据文件,导出数据

```
-> cd $PGDATA/base/13269
-> pg_filedump -i -f ./173369

输出截取


  • PostgreSQL File/Block Formatted Dump Utility - Version 9.6.0
  • File: ./173369
  • Options used: -i -f
  • Dump created on: Sun Mar 12 00:28:56 2017

首先是块的头部内容

Block 0 ********

-----
Block Offset: 0x00000000 Offsets: Lower 452 (0x01c4)
Block: Size 8192 Version 4 Upper 488 (0x01e8)
LSN: logid 61 recoff 0xe69d6490 Special 8192 (0x2000)
Items: 107 Free Space: 36
Checksum: 0x0000 Prune XID: 0x00000000 Flags: 0x0000 ()
Length (including item array): 452

...

然后是DATA部分, ------ Item 1 -- Length: 72 Offset: 8120 (0x1fb8) Flags: NORMAL XMIN: 88072212 XMAX: 88072214 CID|XVAC: 0 Block Id: 9345 linp Index: 86 Attributes: 3 Size: 24 infomask: 0x0102 (HASVARWIDTH|XMIN_COMMITTED)

1fb8: 14e03f05 16e03f05 00000000 00008124 ..?...?........$ 1fc8: 56000300 02011800 01000000 43363134 V...........C614 1fd8: 62653439 31616339 65356636 64633136 be491ac9e5f6dc16 1fe8: 35653065 31323162 36316563 33000000 5e0e121b61ec3... 1ff8: 791cce69 7ced0100 y..i|...

COPY: 1 614be491ac9e5f6dc165e0e121b61ec3 2017-03-12 00:26:23.553657 Item 2 -- Length: 72 Offset: 8048 (0x1f70) Flags: NORMAL XMIN: 88072212 XMAX: 0 CID|XVAC: 0 Block Id: 0 linp Index: 2 Attributes: 3 Size: 24 infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)

1f70: 14e03f05 00000000 00000000 00000000 ..?............. 1f80: 02000300 02091800 02000000 43383335 ............C835 1f90: 39653064 31623462 61323261 64336139 9e0d1b4ba22ad3a9 1fa0: 65386634 38316231 61633336 31000000 e8f481b1ac361... 1fb0: df1cce69 7ced0100 ...i|...

COPY: 2 8359e0d1b4ba22ad3a9e8f481b1ac361 2017-03-12 00:26:23.553759 ...... ```

为了得到记录,需要提供一下字段类型LIST,必须保证与表结构一致

```
-> pg_filedump -D int,text,timestamp ./173369|less


  • PostgreSQL File/Block Formatted Dump Utility - Version 9.6.0
  • File: ./173369
  • Options used: -D int,text,timestamp
  • Dump created on: Sun Mar 12 00:31:25 2017

Block 0 ********

-----
Block Offset: 0x00000000 Offsets: Lower 452 (0x01c4)
Block: Size 8192 Version 4 Upper 488 (0x01e8)
LSN: logid 61 recoff 0xe69d6490 Special 8192 (0x2000)
Items: 107 Free Space: 36
Checksum: 0x0000 Prune XID: 0x00000000 Flags: 0x0000 ()
Length (including item array): 452

------
Item 1 -- Length: 72 Offset: 8120 (0x1fb8) Flags: NORMAL
COPY: 1 614be491ac9e5f6dc165e0e121b61ec3 2017-03-12 00:26:23.553657
Item 2 -- Length: 72 Offset: 8048 (0x1f70) Flags: NORMAL
COPY: 2 8359e0d1b4ba22ad3a9e8f481b1ac361 2017-03-12 00:26:23.553759
Item 3 -- Length: 72 Offset: 7976 (0x1f28) Flags: NORMAL
COPY: 3 0dc8c441e91217897f994ae163510653 2017-03-12 00:26:23.553764
..........
```

COPY得到的就是使用-D提供的类型列表decode拼装的记录。

是不是可以从文件中DUMP数据了呢?莫急,还要看看掩码哦,否则你不知道这条记录是否为你需要的,因为它可能是DEAD TUPLE。

例子

```
-> pg_filedump -D int,text,timestamp -i -f ./173369|less

COPY: 1 614be491ac9e5f6dc165e0e121b61ec3 2017-03-12 00:26:23.553657
Item 2 -- Length: 72 Offset: 8048 (0x1f70) Flags: NORMAL
XMIN: 88072212 XMAX: 0 CID|XVAC: 0
Block Id: 0 linp Index: 2 Attributes: 3 Size: 24
infomask: 0x0902 (HASVARWIDTH|XMIN_COMMITTED|XMAX_INVALID)
```

记录头的infomask解释

每条记录,头部都有infomask, infomask2掩码,掩码表示的意思,可以参考头文件

比如什么是DEAD TUPLE呢?

```
src/include/access/htup_details.h

/
* information stored in t_infomask:
/

define HEAP_HASNULL 0x0001 / has null attribute(s) /

define HEAP_HASVARWIDTH 0x0002 / has variable-width attribute(s) /

define HEAP_HASEXTERNAL 0x0004 / has external stored attribute(s) /

define HEAP_HASOID 0x0008 / has an object-id field /

define HEAP_XMAX_KEYSHR_LOCK 0x0010 / xmax is a key-shared locker /

define HEAP_COMBOCID 0x0020 / t_cid is a combo cid /

define HEAP_XMAX_EXCL_LOCK 0x0040 / xmax is exclusive locker /

define HEAP_XMAX_LOCK_ONLY 0x0080 / xmax, if valid, is only a locker /

/ xmax is a shared locker /

define HEAP_XMAX_SHR_LOCK (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)

define HEAP_LOCK_MASK (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \

                                             HEAP_XMAX_KEYSHR_LOCK)

define HEAP_XMIN_COMMITTED 0x0100 / t_xmin committed /

define HEAP_XMIN_INVALID 0x0200 / t_xmin invalid/aborted /

define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)

define HEAP_XMAX_COMMITTED 0x0400 / t_xmax committed /

define HEAP_XMAX_INVALID 0x0800 / t_xmax invalid/aborted /

define HEAP_XMAX_IS_MULTI 0x1000 / t_xmax is a MultiXactId /

define HEAP_UPDATED 0x2000 / this is UPDATEd version of row /

define HEAP_MOVED_OFF 0x4000 /* moved to another place by pre-9.0

                                                                             * VACUUM FULL; kept for binary  
                                                                             * upgrade support */

define HEAP_MOVED_IN 0x8000 /* moved from another place by pre-9.0

                                                                             * VACUUM FULL; kept for binary  
                                                                             * upgrade support */

define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)

define HEAP_XACT_MASK 0xFFF0 / visibility-related bits /

```

观察deadtuple

postgres=# update digoal set info='new' where id=1; UPDATE 1 postgres=# checkpoint; CHECKPOINT

观察

```
pg_filedump -D int,text,timestamp -i -f ./173369|less

Item 1 -- Length: 72 Offset: 8120 (0x1fb8) Flags: NORMAL
包含了xmax,说明是更新过的记录
XMIN: 88072212 XMAX: 88072214 CID|XVAC: 0
blockid表示记录指向,即新版本记录在哪个新数据块,Linp指新数据块的第几条记录。
Block Id: 9345 linp Index: 86 Attributes: 3 Size: 24
infomask 参考前面的头文件,解读
infomask: 0x0102 (HASVARWIDTH|XMIN_COMMITTED)

1fb8: 14e03f05 16e03f05 00000000 00008124 ..?...?........$
1fc8: 56000300 02011800 01000000 43363134 V...........C614
1fd8: 62653439 31616339 65356636 64633136 be491ac9e5f6dc16
1fe8: 35653065 31323162 36316563 33000000 5e0e121b61ec3...
1ff8: 791cce69 7ced0100 y..i|...

COPY: 1 614be491ac9e5f6dc165e0e121b61ec3 2017-03-12 00:26:23.553657
```

查看新版本(编号为9345数据块,第86条记录)

```
pg_filedump -D int,text,timestamp -i -f -R 9345 ./173369|less

Item 86 -- Length: 40 Offset: 2032 (0x07f0) Flags: NORMAL
XMIN: 88072214 XMAX: 0 CID|XVAC: 0
Block Id: 9345 linp Index: 86 Attributes: 3 Size: 24
infomask: 0x2802 (HASVARWIDTH|XMAX_INVALID|UPDATED)

07f0: 16e03f05 00000000 00000000 00008124 ..?............$
0800: 56000300 02281800 01000000 096e6577 V....(.......new
0810: 791cce69 7ced0100 y..i|...

COPY: 1 new 2017-03-12 00:26:23.553657
```

PostgreSQL数据块的简介

对于数据文件的组织形式,可以参考头文件

src/include/storage

```
/
* BlockNumber:

* each data file (heap or index) is divided into postgres disk blocks
* (which may be thought of as the unit of i/o -- a postgres buffer
* contains exactly one disk block). the blocks are numbered
* sequentially, 0 to 0xFFFFFFFE.

* InvalidBlockNumber is the same thing as P_NEW in buf.h.

* the access methods, the buffer manager and the storage manager are
* more or less the only pieces of code that should be accessing disk
* blocks directly.
*/
typedef uint32 BlockNumber;

define InvalidBlockNumber ((BlockNumber) 0xFFFFFFFF)

define MaxBlockNumber ((BlockNumber) 0xFFFFFFFE)

```

每个块内的组织,与对象类型有关,比如堆表,B-TREE,HASH等索引,TOAST, FSM等。

可以参考数据layout介绍

https://www.postgresql.org/docs/9.6/static/storage.html

也可以参考对应类型的头文件

阅读pg_filedump的源码,也有助于你对PostgreSQL存储构造的理解

不妨仔细阅读以下头文件

```

include "access/gin_private.h"

include "access/gist.h"

include "access/hash.h"

include "access/htup.h"

include "access/htup_details.h"

include "access/itup.h"

include "access/nbtree.h"

include "access/spgist_private.h"

include "catalog/pg_control.h"

include "storage/bufpage.h"

```

防止脱裤

我们已经看到,使用pg_filedump可直接decode数据文件的内容,因此泄露数据文件其实是比较危险的。

那么如何防止脱裤呢?TDE是一个很好的手段,即数据文件透明加密。你可以参考我末尾的文章。

另外还有加密方法,比如对敏感数据,使用加密字段存储。加解密交给程序完成。彻底杜绝因泄露文件导致的数据泄露。

《PostgreSQL 数据库安全指南》

参考

《PostgreSQL 透明加密(TDE,FDE) - 块级加密》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论