pg_dirtyread是一个脏读插件,对于误操作删除的数据或者列,在未对表执行vacuum之前可以读取出来并进行恢复,但是对truncate操作是没有效果的,使用此插件需要让vacuum相关参数不敏感,或者调整关键表的autovacuum/vacuum存储参数设置,另外使用此参数,也要考虑对性能的影响和维护成本。
PostgreSQL 版本:PG14.2
pg_dirtyread 版本:2.4
链接地址:https://github.com/ChristophBerg/pg_dirtyread
安装部署
软件获取
--下载
wget https://github.com/df7cb/pg_dirtyread/archive/refs/tags/2.4.tar.gz
--将pg_dirtyread放到数据库源码的contrib目录下,并给777权限
# pwd
/opt/software/postgresql-14.2/contrib
# ll |grep -i pg_dirtyread
drwxrwxrwx 8 777 root 4096 Mar 15 14:31 pg_dirtyread
编译安装
--配置pg_config
# vim Makefile
PG_CONFIG = /opt/postgresql-14.2/bin/pg_config
--编译
# make && make install
--安装
postgres=# select * from pg_available_extensions() where name='pg_dirtyread';
name | default_version | comment
--------------+-----------------+------------------------------------------
pg_dirtyread | 2 | Read dead but unvacuumed rows from table
postgres=# create extension pg_dirtyread;
CREATE EXTENSION
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------+---------+------------+------------------------------------------
pg_dirtyread | 2 | public | Read dead but unvacuumed rows from table
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)
场景测试
--创建表
postgres=# create table fb_dirty(id int,col1 varchar(8)) with(autovacuum_enabled = false, toast.autovacuum_enabled = false);
CREATE TABLE
postgres=# \d+ fb_dirty
Table "public.fb_dirty"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
id | integer | | | | plain | | |
col1 | character varying(8) | | | | extended | | |
Access method: heap
Options: autovacuum_enabled=false
--插入数据
postgres=# insert into fb_dirty values(1,'test1'),(2,'test2');
postgres=# select xmin,xmax,ctid,* from fb_dirty;
xmin | xmax | ctid | id | col1
------+------+-------+----+-------
765 | 0 | (0,1) | 1 | test1
765 | 0 | (0,2) | 2 | test2
(2 rows)
--查看脏读数据,如果表在非public模式下,直接用schema.table即可
postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,col1 varchar(8));
tableoid | ctid | xmin | xmax | cmin | cmax | dead | id | col1
----------+-------+------+------+------+------+------+----+-------
16394 | (0,1) | 765 | 0 | 0 | 0 | f | 1 | test1
16394 | (0,2) | 765 | 0 | 0 | 0 | f | 2 | test2
(2 rows)
删除数据
postgres=# delete from fb_dirty where id=2;
DELETE 1
postgres=# select xmin,xmax,ctid,* from fb_dirty;
xmin | xmax | ctid | id | col1
------+------+-------+----+-------
765 | 0 | (0,1) | 1 | test1
(1 row)
postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,col1 varchar(8));
tableoid | ctid | xmin | xmax | cmin | cmax | dead | id | col1
----------+-------+------+------+------+------+------+----+-------
16394 | (0,1) | 765 | 0 | 0 | 0 | f | 1 | test1
16394 | (0,2) | 765 | 766 | 0 | 0 | t | 2 | test2
(2 rows)
当我们通过delete操作将id=2的这条数据删除后,在表里我们已经看不到这条记录,但是在脏读中可以看到这条数据,但是dead那列标示这条数据已经是t,说明这条数据被删除。
更新数据
postgres=# update fb_dirty set col1='t1' where id=1;
UPDATE 1
postgres=# select xmin,xmax,ctid,* from fb_dirty;
xmin | xmax | ctid | id | col1
------+------+-------+----+------
767 | 0 | (0,3) | 1 | t1
(1 row)
postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,col1 varchar(8));
tableoid | ctid | xmin | xmax | cmin | cmax | dead | id | col1
----------+-------+------+------+------+------+------+----+-------
16394 | (0,1) | 765 | 767 | 0 | 0 | t | 1 | test1
16394 | (0,2) | 765 | 766 | 0 | 0 | t | 2 | test2
16394 | (0,3) | 767 | 0 | 0 | 0 | f | 1 | t1
(3 rows)
当我们更新id=1这条数据之后,在脏读中我们可以看到id=1的事务765已经被事务767覆盖,事务765的两个操作都已经dead,事务767中的col1这列的值’test1’替换成了’t1’,同时我们也能看到这个表中的历史操作流水。
删除列
postgres=# alter table fb_dirty drop column col1;
ALTER TABLE
postgres=# select xmin,xmax,ctid,* from fb_dirty;
xmin | xmax | ctid | id
------+------+-------+----
767 | 0 | (0,3) | 1
(1 row)
postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,dropped_2 varchar(8));
tableoid | ctid | xmin | xmax | cmin | cmax | dead | id | dropped_2
----------+-------+------+------+------+------+------+----+-----------
16394 | (0,1) | 765 | 767 | 0 | 0 | t | 1 | test1
16394 | (0,2) | 765 | 766 | 0 | 0 | t | 2 | test2
16394 | (0,3) | 767 | 0 | 0 | 0 | f | 1 | t1
(3 rows)
我们删除列col1后,在脏读中依然能看到之前的数据,但是这里要注意一个变更,就是col1已经没有了,需要用dropped_n来替代,n代表表中的第几列。
truncate数据
postgres=# select xmin,xmax,ctid,* from fb_dirty;
xmin | xmax | ctid | id
------+------+-------+----
767 | 0 | (0,3) | 1
(1 row)
postgres=# truncate fb_dirty;
TRUNCATE TABLE
postgres=# select xmin,xmax,ctid,* from fb_dirty;
xmin | xmax | ctid | id
------+------+------+----
(0 rows)
postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,dropped_2 varchar(8));
tableoid | ctid | xmin | xmax | cmin | cmax | dead | id | dropped_2
----------+------+------+------+------+------+------+----+-----------
(0 rows)
truncate 这个操作很危险,它不但将表中的数据清除,同时也将脏读的历史流水也同步清理。
性能测试
我们直接用pgbench对同一个实例下不通的数据库做压测,其中一个数据库安装插件,看一下两个数据库大小以及pgbench的效率
环境准备
dirtyread=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
dirtyread | postgres | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | | 8569 kB | pg_default |
postgres | postgres | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | | 8569 kB | pg_default | default administrative connection database
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------+---------+------------+------------------------------------------
pg_dirtyread | 2 | public | Read dead but unvacuumed rows from table
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(2 rows)
postgres=# \c dirtyread
You are now connected to database "dirtyread" as user "postgres".
dirtyread=# \dx
List of installed extensions
Name | Version | Schema | Description
---------+---------+------------+------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
(1 row)
dirtyread压测
$ pgbench -i dirtyread -p 1432 -s 10
$ pgbench -n -r -T 60 -P 1 -c 32 -j 4 -p 1432 dirtyread
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 32
number of threads: 4
duration: 60 s
number of transactions actually processed: 272811
latency average = 7.008 ms
latency stddev = 4.515 ms
initial connection time = 17.247 ms
tps = 4546.851288 (without initial connection time)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.001 \set bid random(1, 1 * :scale)
0.001 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.354 BEGIN;
0.460 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.423 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
1.293 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
2.880 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.413 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.183 END;
postgres压测
$ pgbench -i postgres -p 1432 -s 10
$ pgbench -n -r -T 60 -P 1 -c 32 -j 4 -p 1432 postgres
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 32
number of threads: 4
duration: 60 s
number of transactions actually processed: 276637
latency average = 6.915 ms
latency stddev = 4.418 ms
initial connection time = 15.362 ms
tps = 4610.552273 (without initial connection time)
statement latencies in milliseconds:
0.001 \set aid random(1, 100000 * :scale)
0.000 \set bid random(1, 1 * :scale)
0.001 \set tid random(1, 10 * :scale)
0.000 \set delta random(-5000, 5000)
0.344 BEGIN;
0.450 UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
0.414 SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
1.275 UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
2.849 UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
0.404 INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
1.176 END;
结果对比
postgres=# \l+
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges | Size | Tablespace | Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
dirtyread | postgres | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | | 188 MB | pg_default |
postgres | postgres | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | | 189 MB | pg_default | default administrative connection database
template0 | postgres | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | =c/postgres +| 8409 kB | pg_default | unmodifiable empty database
| | | | | postgres=CTc/postgres | | |
template1 | postgres | UTF8 | zh_CN.UTF-8 | zh_CN.UTF-8 | =c/postgres +| 8409 kB | pg_default | default template for new databases
| | | | | postgres=CTc/postgres | | |
(4 rows)
postgres=# select count(*) from pg_dirtyread('pgbench_accounts') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,aid int,bid int,abalance int,filler char(84));
count
---------
1016703
同样的命令,同一个实例,数据库的总数据量相差不大,pgbench工具用32c,在cpu跑满的情况下,相差也不是很大,从这个例子可以看出pg_dirtyread对性能影响较小。
注:此次测试仅作参考
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




