PostgreSQL 扩展之 pg_dirtyread

原创高云龙云和恩墨技术服务团队 2022-03-16

4835

pg_dirtyread是一个脏读插件，对于误操作删除的数据或者列，在未对表执行vacuum之前可以读取出来并进行恢复，但是对truncate操作是没有效果的，使用此插件需要让vacuum相关参数不敏感，或者调整关键表的autovacuum/vacuum存储参数设置，另外使用此参数，也要考虑对性能的影响和维护成本。

PostgreSQL 版本：PG14.2
pg_dirtyread 版本：2.4
链接地址：https://github.com/ChristophBerg/pg_dirtyread

安装部署

软件获取

--下载
wget https://github.com/df7cb/pg_dirtyread/archive/refs/tags/2.4.tar.gz

--将pg_dirtyread放到数据库源码的contrib目录下,并给777权限
# pwd
/opt/software/postgresql-14.2/contrib

# ll |grep -i pg_dirtyread
drwxrwxrwx 8  777 root 4096 Mar 15 14:31 pg_dirtyread

编译安装

--配置pg_config
# vim Makefile
PG_CONFIG = /opt/postgresql-14.2/bin/pg_config

--编译
# make && make install

--安装
postgres=# select * from pg_available_extensions() where name='pg_dirtyread';
     name     | default_version |                 comment
--------------+-----------------+------------------------------------------
 pg_dirtyread | 2               | Read dead but unvacuumed rows from table

postgres=# create extension pg_dirtyread;
CREATE EXTENSION
postgres=# \dx
                          List of installed extensions
     Name     | Version |   Schema   |               Description
--------------+---------+------------+------------------------------------------
 pg_dirtyread | 2       | public     | Read dead but unvacuumed rows from table
 plpgsql      | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

场景测试

--创建表
postgres=# create table fb_dirty(id int,col1 varchar(8)) with(autovacuum_enabled = false, toast.autovacuum_enabled = false);
CREATE TABLE
postgres=# \d+ fb_dirty
                                               Table "public.fb_dirty"
 Column |         Type         | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------+----------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id     | integer              |           |          |         | plain    |             |              |
 col1   | character varying(8) |           |          |         | extended |             |              |
Access method: heap
Options: autovacuum_enabled=false

--插入数据
postgres=# insert into fb_dirty values(1,'test1'),(2,'test2');
postgres=# select xmin,xmax,ctid,* from fb_dirty;
 xmin | xmax | ctid  | id | col1
------+------+-------+----+-------
  765 |    0 | (0,1) |  1 | test1
  765 |    0 | (0,2) |  2 | test2
(2 rows)

--查看脏读数据,如果表在非public模式下，直接用schema.table即可
postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,col1 varchar(8));
 tableoid | ctid  | xmin | xmax | cmin | cmax | dead | id | col1
----------+-------+------+------+------+------+------+----+-------
    16394 | (0,1) |  765 |    0 |    0 |    0 | f    |  1 | test1
    16394 | (0,2) |  765 |    0 |    0 |    0 | f    |  2 | test2
(2 rows)

删除数据

postgres=# delete from fb_dirty where id=2;
DELETE 1
postgres=# select xmin,xmax,ctid,* from fb_dirty;
 xmin | xmax | ctid  | id | col1
------+------+-------+----+-------
  765 |    0 | (0,1) |  1 | test1
(1 row)

postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,col1 varchar(8));
 tableoid | ctid  | xmin | xmax | cmin | cmax | dead | id | col1
----------+-------+------+------+------+------+------+----+-------
    16394 | (0,1) |  765 |    0 |    0 |    0 | f    |  1 | test1
    16394 | (0,2) |  765 |  766 |    0 |    0 | t    |  2 | test2
(2 rows)

当我们通过delete操作将id=2的这条数据删除后，在表里我们已经看不到这条记录，但是在脏读中可以看到这条数据，但是dead那列标示这条数据已经是t，说明这条数据被删除。

更新数据

postgres=# update fb_dirty set col1='t1' where id=1;
UPDATE 1
postgres=# select xmin,xmax,ctid,* from fb_dirty;
 xmin | xmax | ctid  | id | col1
------+------+-------+----+------
  767 |    0 | (0,3) |  1 | t1
(1 row)

postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,col1 varchar(8));
 tableoid | ctid  | xmin | xmax | cmin | cmax | dead | id | col1
----------+-------+------+------+------+------+------+----+-------
    16394 | (0,1) |  765 |  767 |    0 |    0 | t    |  1 | test1
    16394 | (0,2) |  765 |  766 |    0 |    0 | t    |  2 | test2
    16394 | (0,3) |  767 |    0 |    0 |    0 | f    |  1 | t1
(3 rows)

当我们更新id=1这条数据之后，在脏读中我们可以看到id=1的事务765已经被事务767覆盖，事务765的两个操作都已经dead，事务767中的col1这列的值’test1’替换成了’t1’，同时我们也能看到这个表中的历史操作流水。

删除列

postgres=# alter table fb_dirty drop column col1;
ALTER TABLE
postgres=# select xmin,xmax,ctid,* from fb_dirty;
 xmin | xmax | ctid  | id
------+------+-------+----
  767 |    0 | (0,3) |  1
(1 row)

postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,dropped_2 varchar(8));
 tableoid | ctid  | xmin | xmax | cmin | cmax | dead | id | dropped_2
----------+-------+------+------+------+------+------+----+-----------
    16394 | (0,1) |  765 |  767 |    0 |    0 | t    |  1 | test1
    16394 | (0,2) |  765 |  766 |    0 |    0 | t    |  2 | test2
    16394 | (0,3) |  767 |    0 |    0 |    0 | f    |  1 | t1
(3 rows)

我们删除列col1后，在脏读中依然能看到之前的数据，但是这里要注意一个变更，就是col1已经没有了，需要用dropped_n来替代，n代表表中的第几列。

truncate数据

postgres=# select xmin,xmax,ctid,* from fb_dirty;
 xmin | xmax | ctid  | id
------+------+-------+----
  767 |    0 | (0,3) |  1
(1 row)

postgres=# truncate fb_dirty;
TRUNCATE TABLE
postgres=# select xmin,xmax,ctid,* from fb_dirty;
 xmin | xmax | ctid | id
------+------+------+----
(0 rows)

postgres=# select * from pg_dirtyread('fb_dirty') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,id int,dropped_2 varchar(8));
 tableoid | ctid | xmin | xmax | cmin | cmax | dead | id | dropped_2
----------+------+------+------+------+------+------+----+-----------
(0 rows)

truncate 这个操作很危险，它不但将表中的数据清除，同时也将脏读的历史流水也同步清理。

性能测试

我们直接用pgbench对同一个实例下不通的数据库做压测，其中一个数据库安装插件，看一下两个数据库大小以及pgbench的效率

环境准备

dirtyread=# \l+
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 dirtyread | postgres | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 |                       | 8569 kB | pg_default |
 postgres  | postgres | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 |                       | 8569 kB | pg_default | default administrative connection database

postgres=# \dx
                          List of installed extensions
     Name     | Version |   Schema   |               Description
--------------+---------+------------+------------------------------------------
 pg_dirtyread | 2       | public     | Read dead but unvacuumed rows from table
 plpgsql      | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)

postgres=# \c dirtyread
You are now connected to database "dirtyread" as user "postgres".
dirtyread=# \dx
                 List of installed extensions
  Name   | Version |   Schema   |         Description
---------+---------+------------+------------------------------
 plpgsql | 1.0     | pg_catalog | PL/pgSQL procedural language
(1 row)

dirtyread压测

$ pgbench -i dirtyread -p 1432 -s 10
$ pgbench -n -r -T 60 -P 1 -c 32 -j 4 -p 1432 dirtyread
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 32
number of threads: 4
duration: 60 s
number of transactions actually processed: 272811
latency average = 7.008 ms
latency stddev = 4.515 ms
initial connection time = 17.247 ms
tps = 4546.851288 (without initial connection time)
statement latencies in milliseconds:
         0.001  \set aid random(1, 100000 * :scale)
         0.001  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.354  BEGIN;
         0.460  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.423  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         1.293  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         2.880  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.413  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.183  END;

postgres压测

$ pgbench -i postgres -p 1432 -s 10
$ pgbench -n -r -T 60 -P 1 -c 32 -j 4 -p 1432 postgres
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 10
query mode: simple
number of clients: 32
number of threads: 4
duration: 60 s
number of transactions actually processed: 276637
latency average = 6.915 ms
latency stddev = 4.418 ms
initial connection time = 15.362 ms
tps = 4610.552273 (without initial connection time)
statement latencies in milliseconds:
         0.001  \set aid random(1, 100000 * :scale)
         0.000  \set bid random(1, 1 * :scale)
         0.001  \set tid random(1, 10 * :scale)
         0.000  \set delta random(-5000, 5000)
         0.344  BEGIN;
         0.450  UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
         0.414  SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
         1.275  UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
         2.849  UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
         0.404  INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
         1.176  END;

结果对比

postgres=# \l+
                                                                    List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges   |  Size   | Tablespace |                Description
-----------+----------+----------+-------------+-------------+-----------------------+---------+------------+--------------------------------------------
 dirtyread | postgres | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 |                       | 188 MB  | pg_default |
 postgres  | postgres | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 |                       | 189 MB  | pg_default | default administrative connection database
 template0 | postgres | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 | =c/postgres          +| 8409 kB | pg_default | unmodifiable empty database
           |          |          |             |             | postgres=CTc/postgres |         |            |
 template1 | postgres | UTF8     | zh_CN.UTF-8 | zh_CN.UTF-8 | =c/postgres          +| 8409 kB | pg_default | default template for new databases
           |          |          |             |             | postgres=CTc/postgres |         |            |
(4 rows)

postgres=# select count(*) from pg_dirtyread('pgbench_accounts') as foo(tableoid oid, ctid tid, xmin xid, xmax xid, cmin cid, cmax cid, dead boolean,aid int,bid int,abalance int,filler char(84));
  count
---------
 1016703

同样的命令，同一个实例，数据库的总数据量相差不大，pgbench工具用32c,在cpu跑满的情况下，相差也不是很大，从这个例子可以看出pg_dirtyread对性能影响较小。

注：此次测试仅作参考

墨力计划 postgresql

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

PostgreSQL 点点滴滴（共13篇）

PostgreSQL 数据库运维的点点滴滴