PostgreSQL数据库由于MVCC的机制,对于DML的操作,更改的数据暂时标记为无效并未真正的在物理上清理,直到autovacuum运行时时才清理这些无效数据。
pg_dirtyread是PostgreSQL数据库的一个扩展插件,它可以从表中读取未被vacuum的无效数据的功能。支持 9.2.9 及以后的版本。
安装
下载地址
https://github.com/df7cb/pg_dirtyread
wget https://github.com/df7cb/pg_dirtyread/archive/refs/tags/2.4.tar.gz
mv 2.4.tar.gz pg_dirtyread-2.4.tar.gz
编译安装
[postgres@yejf ~]$ tar -zxvf pg_dirtyread-2.4.tar.gz
[postgres@yejf ~]$ cd pg_dirtyread-2.4/
[postgres@yejf pg_dirtyread-2.4]$ make
[postgres@yejf pg_dirtyread-2.4]$ make install
/bin/mkdir -p '/usr/local/pgsql12/lib'
/bin/mkdir -p '/usr/local/pgsql12/share/extension'
/bin/mkdir -p '/usr/local/pgsql12/share/extension'
/bin/install -c -m 755 pg_dirtyread.so '/usr/local/pgsql12/lib/pg_dirtyread.so'
/bin/install -c -m 644 .//pg_dirtyread.control '/usr/local/pgsql12/share/extension/'
/bin/install -c -m 644 .//pg_dirtyread--1.0.sql .//pg_dirtyread--1.0--2.sql .//pg_dirtyread--2.sql '/usr/local/pgsql12/share/extension/'
使用示例
不用重启即可使用
在对应库创建EXTENSION
postgres=# create extension pg_dirtyread ;
CREATE EXTENSION
创建表并禁用autovacuum
drop table if exists foo;
CREATE TABLE foo (bar bigint, baz text);
ALTER TABLE foo SET (
autovacuum_enabled = false, toast.autovacuum_enabled = false
);
插入和删除数据
INSERT INTO foo VALUES (1, 'aaa'), (2, 'bbb'),(3,'ccc');
DELETE FROM foo WHERE bar = 1;
DELETE FROM foo WHERE bar = 2;
删除后的数据
postgres=# select * from foo;
bar | baz
-----+-----
3 | ccc
查看表没有被vacuum,可以进行pg_dirtyread
postgres=# select * from pg_stat_all_tables where relname='foo';
-[ RECORD 1 ]-------+-------
relid | 27465
schemaname | public
relname | foo
seq_scan | 6
seq_tup_read | 33
idx_scan |
idx_tup_fetch |
n_tup_ins | 3
n_tup_upd | 0
n_tup_del | 2
n_tup_hot_upd | 0
n_live_tup | 1
n_dead_tup | 2
n_mod_since_analyze | 5
last_vacuum |
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 0
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 0
使用pg_dirtyread查看表,dead为t表示数据已经删除(删除数据后,需要查询一下,dead的结果才会更新)
SELECT * FROM pg_dirtyread('foo')
AS t(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,bar bigint, baz text);
tableoid | ctid | xmin | xmax | cmin | cmax | dead | bar | baz
----------+-------+------------+------------+------+------+------+-----+-----
27465 | (0,1) | 2149415018 | 2149415019 | 0 | 0 | t | 1 | aaa
27465 | (0,2) | 2149415018 | 2149415020 | 0 | 0 | t | 2 | bbb
27465 | (0,3) | 2149415018 | 0 | 0 | 0 | f | 3 | ccc
如果只想恢复到其中的某一个时间点的数据,需要通过事务ID来转换成时间,可以通过函数 pg_xact_commit_timestamp 进行转换。
使用
pg_xact_commit_timestamp
函数,需要将参数track_commit_timestamp
设置为on
# vim postgresql.conf
track_commit_timestamp = on # collect timestamp of transaction commit
# (change requires restart)
SELECT pg_xact_commit_timestamp(xmin) xmin_time
,pg_xact_commit_timestamp(xmax) xmax_time
,*
FROM pg_dirtyread('foo')
AS t(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,bar bigint, baz text)
where xmax<>0;
-[ RECORD 1 ]----------------------------
xmin_time | 2023-03-15 09:57:33.861739+08
xmax_time | 2023-03-15 09:57:38.059478+08
tableoid | 27465
ctid | (0,1)
xmin | 2149415018
xmax | 2149415019
cmin | 0
cmax | 0
dead | t
bar | 1
baz | aaa
-[ RECORD 2 ]----------------------------
xmin_time | 2023-03-15 09:57:33.861739+08
xmax_time | 2023-03-15 09:57:42.519042+08
tableoid | 27465
ctid | (0,2)
xmin | 2149415018
xmax | 2149415020
cmin | 0
cmax | 0
dead | t
bar | 2
baz | bbb
可以看到dead为t时具体的事务时间
pg_dirtyread还支持查看未被重写表中被删除的字段
postgres=# select * from foo;
bar | baz
-----+-----
3 | ccc
postgres=# ALTER TABLE foo DROP COLUMN baz;
ALTER TABLE
postgres=# SELECT * FROM pg_dirtyread('foo') foo(bar bigint, dropped_2 text);
bar | dropped_2
-----+-----------
1 | aaa
2 | bbb
3 | ccc
postgres=# select * from foo;
bar
-----
3
对表设置或对表进行vacuum回收死元组
ALTER TABLE foo SET (
autovacuum_enabled = on, toast.autovacuum_enabled = on
);
再次用pg_dirtyread查看死元组的数据已经被清理了
postgres=# vacuum foo;
VACUUM
postgres=# select * from pg_stat_all_tables where relname='foo';
-[ RECORD 1 ]-------+------------------------------
relid | 27465
schemaname | public
relname | foo
seq_scan | 6
seq_tup_read | 39
idx_scan |
idx_tup_fetch |
n_tup_ins | 3
n_tup_upd | 0
n_tup_del | 2
n_tup_hot_upd | 0
n_live_tup | 1
n_dead_tup | 0
n_mod_since_analyze | 5
last_vacuum | 2023-03-15 10:06:01.157739+08
last_autovacuum |
last_analyze |
last_autoanalyze |
vacuum_count | 1
autovacuum_count | 0
analyze_count | 0
autoanalyze_count | 0
.
postgres=# SELECT * FROM pg_dirtyread('foo') foo(bar bigint, dropped_2 text);
bar | dropped_2
-----+-----------
3 | ccc
总结
对于DML的操作误删了数据时,使用pg_dirtyread能够快速的恢复数据,具体恢复步骤如下:
对表执行禁用vacuum的操作(重要)
ALTER TABLE foo SET (autovacuum_enabled = false, toast.autovacuum_enabled = false);使用pg_dirtyread插件查询被删除的数据并进行备份或恢复
对于DDL的操作,PostgreSQL是直接移动了底层的文件,使用pg_dirtyread是恢复不了的,需要从备份中恢复数据。
另外,在PostgreSQL中进行数据恢复有以下常用的方式:
逻辑备份,只恢复到备份的时间点
物理备份+PITR,需要开启归档并保留,可以恢复到备份后在有效的wal日志内的任意时间点
WalMiner,通过解析wal日志生成undo sql




