作者
digoal
日期
2022-04-01
标签
PostgreSQL , dml , hot , 收缩 , ctid
问题:
1、同一行在发生若干当前页更新后, 会产生多个tuple版本以及对应的ctid: 例如 ctid1 ... ctid6 (ctid6为最新版本可见)
2、delete from x where ctid=(select ctid from xxx where ...) -- 如果不是HOT ctid6 传给delete ; 如果是HOT, 传ctid1 ?
3、在delete发生前, 如果发生了 vacuum HOT
3.1、vacuum 后会收缩ctid链路, 索引内存储的是ctid1, 在heap page内ctid1指向ctid6, 并且所有垃圾tuple版本会全部被回收. -- page内ctid1到ctid5对应tuple都是旧版本(对当前delete语句不可见)符合vacuum条件, 因为是在delete发起之前就产生的旧版本. vacuum 收缩ctid, 入口变成index->ctid1->ctid6->tuple
3.2 如果不是vacuum HOT, 那么ctid1-ctid5都会回收, 只剩下ctid6. 也比较好理解, 不会有本文的争议.
4、假设是HOT, delete from x where ctid = (此时返回的是ctid1?) -- 能不能成功delete? 有没有问题?
1、建表和索引
create table tbl (id int, c1 int, c2 int);
create index idx_tbl_1 on tbl (c1);
2、关闭autovacuum
alter table tbl set (autovacuum_enabled =false);
3、插入1条记录
insert into tbl values (1,1,1);
4、更新10次 (非索引字段)
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2 returning ctid,*;
5、会话1:
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =on;
set enable_seqscan=off;
alter function pg_sleep(double precision) cost 100000; -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1) and exists (select 1 from pg_sleep(30));
这样计划就很清晰了, 优先执行select, 然后sleep, 然后再delete.
postgres=# explain delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1) and exists (select 1 from pg_sleep(30));
QUERY PLAN
------------------------------------------------------------------------------------------
Delete on tbl (cost=251.06..252.17 rows=0 width=0)
InitPlan 1 (returns $0)
-> Limit (cost=0.15..1.05 rows=1 width=6)
-> Index Scan using idx_tbl_1 on tbl tbl_1 (cost=0.15..9.13 rows=10 width=6)
Index Cond: (c1 = 1)
InitPlan 2 (returns $1)
-> Function Scan on pg_sleep (cost=250.00..250.01 rows=1 width=0)
-> Result (cost=0.00..1.11 rows=1 width=6)
One-Time Filter: $1
-> Tid Scan on tbl (cost=0.00..1.11 rows=1 width=6)
TID Cond: (ctid = $0)
(11 rows)
6、会话2(30秒内): vacuum 表, HOT收缩
postgres=# vacuum verbose tbl; -- 由于是HOT, 所以索引只有1个版本, 没有垃圾tuple
INFO: vacuuming "public.tbl"
INFO: table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 5431698
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
postgres=# select ctid,* from tbl;
ctid | id | c1 | c2
--------+----+----+----
(0,11) | 1 | 1 | 2
(1 row)
7、观察会话1结果? 删除成功.
DELETE 1
再次执行垃圾回收, 发现了1个垃圾索引版本, 1个垃圾tuple版本.
postgres=# vacuum verbose tbl;
INFO: vacuuming "public.tbl"
INFO: scanned index "idx_tbl_1" to remove 1 row versions
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: table "tbl": removed 1 dead item identifiers in 1 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
INFO: index "idx_tbl_1" now contains 0 row versions in 2 pages
DETAIL: 1 index row versions were removed.
0 index pages were newly deleted.
0 index pages are currently deleted, of which 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: table "tbl": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 5431699
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: table "tbl": truncated 1 to 0 pages
DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
其他, 如果采用seqscan则毫无疑问的select会直接返回ctid6, 第5步改成如下, 一样不会有问题:
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =off;
set enable_seqscan=on;
alter function pg_sleep(double precision) cost 100000; -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1) and exists (select 1 from pg_sleep(30));
1、建表和索引
create table tbl (id int, c1 int, c2 int);
create index idx_tbl_1 on tbl (c1);
2、关闭autovacuum
alter table tbl set (autovacuum_enabled =false);
3、插入1条记录
insert into tbl values (1,1,1);
4、更新10次 (非索引字段)
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2;
update tbl set c2=2 returning ctid,*;
5、会话1:
index scan
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =on;
set enable_seqscan=off;
alter function pg_sleep(double precision) cost 100000; -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));
或 seqscan
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =off;
set enable_seqscan=on;
alter function pg_sleep(double precision) cost 100000; -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));
6、会话2(30秒内): vacuum 表, HOT收缩
postgres=# vacuum verbose tbl; -- 由于是HOT, 所以索引只有1个版本, 没有垃圾tuple
INFO: vacuuming "public.tbl"
INFO: table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 5431698
Skipped 0 pages due to buffer pins, 0 frozen pages.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
7、会话1, seq scan, index scan返回的ctid都是11.
ctid
--------
(0,11)
(1 row)
HOT查询返回的并不是index中存储的ctid1, 虽然入口是ctid1.
create table tbl (id int, c1 int, c2 int);
create index idx_tbl_1 on tbl (id);
insert into tbl values (1,1,1);
set enable_indexonlyscan =off;
set enable_bitmapscan =off;
set enable_indexscan =on;
set enable_seqscan=off;
alter function pg_sleep(double precision) cost 100000; -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.
并发问题复现:
session1: update tbl set id = 3 where ctid in (select ctid FROM TBL where id=1 and exists (select 1 from pg_sleep(30))); -- 进行到一半,只进行了子查询结果为(0,1)
session2: update tbl set id=2 where id = 1; 返回更新了一行
session1: 按子查询的ctid 做变更,会找到当前id已经是2的那一行, 将那一行的id改成3 返回更新成功,修改了一行 . 因为这里存在HOT, ctid(0,1) 链到 ctid(0,2) 再到tuple2的page offset.
解决这个问题, 可以使用recheck, 或者RR模式隔离级别(相当于加锁或者抛出错误), 或者for update加锁 . 详见:
《PostgreSQL CTID 物理行号在并发DML时的隔离性问题》
《在PostgreSQL中实现update | delete limit - CTID扫描实践 (高效阅后即焚)》
期望 PostgreSQL 增加什么功能?
PolarDB for PostgreSQL云原生分布式开源数据库
PostgreSQL 解决方案集合
德哥 / digoal's github - 公益是一辈子的事.





