暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL HOT vacuum 收缩链路对DML where CTID=ctid 安全吗?

原创 digoal 2022-01-20
472

作者

digoal

日期

2022-04-01

标签

PostgreSQL , dml , hot , 收缩 , ctid


问题:

1、同一行在发生若干当前页更新后, 会产生多个tuple版本以及对应的ctid: 例如 ctid1 ... ctid6 (ctid6为最新版本可见)

2、delete from x where ctid=(select ctid from xxx where ...) -- 如果不是HOT ctid6 传给delete ; 如果是HOT, 传ctid1 ?

3、在delete发生前, 如果发生了 vacuum HOT

3.1、vacuum 后会收缩ctid链路, 索引内存储的是ctid1, 在heap page内ctid1指向ctid6, 并且所有垃圾tuple版本会全部被回收. -- page内ctid1到ctid5对应tuple都是旧版本(对当前delete语句不可见)符合vacuum条件, 因为是在delete发起之前就产生的旧版本. vacuum 收缩ctid, 入口变成index->ctid1->ctid6->tuple

3.2 如果不是vacuum HOT, 那么ctid1-ctid5都会回收, 只剩下ctid6. 也比较好理解, 不会有本文的争议.

4、假设是HOT, delete from x where ctid = (此时返回的是ctid1?) -- 能不能成功delete? 有没有问题?

1、建表和索引

create table tbl (id int, c1 int, c2 int);   
create index idx_tbl_1 on tbl (c1);  

2、关闭autovacuum

alter table tbl set (autovacuum_enabled =false);  

3、插入1条记录

insert into tbl values (1,1,1);  

4、更新10次 (非索引字段)

update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2;  
update tbl set c2=2 returning ctid,*;  

5、会话1:

set enable_indexonlyscan =off;  
set enable_bitmapscan =off;  
set enable_indexscan =on;  
set enable_seqscan=off;  
alter function pg_sleep(double precision) cost 100000;  -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.   
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1)  and exists (select 1 from pg_sleep(30));  

这样计划就很清晰了, 优先执行select, 然后sleep, 然后再delete.

postgres=# explain delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1)  and exists (select 1 from pg_sleep(30));  
                                        QUERY PLAN                                          
------------------------------------------------------------------------------------------  
 Delete on tbl  (cost=251.06..252.17 rows=0 width=0)  
   InitPlan 1 (returns $0)  
     ->  Limit  (cost=0.15..1.05 rows=1 width=6)  
           ->  Index Scan using idx_tbl_1 on tbl tbl_1  (cost=0.15..9.13 rows=10 width=6)  
                 Index Cond: (c1 = 1)  
   InitPlan 2 (returns $1)  
     ->  Function Scan on pg_sleep  (cost=250.00..250.01 rows=1 width=0)  
   ->  Result  (cost=0.00..1.11 rows=1 width=6)  
         One-Time Filter: $1  
         ->  Tid Scan on tbl  (cost=0.00..1.11 rows=1 width=6)  
               TID Cond: (ctid = $0)  
(11 rows)  

6、会话2(30秒内): vacuum 表, HOT收缩

postgres=# vacuum verbose tbl;  -- 由于是HOT, 所以索引只有1个版本, 没有垃圾tuple    
INFO:  vacuuming "public.tbl"  
INFO:  table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 5431698  
Skipped 0 pages due to buffer pins, 0 frozen pages.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
VACUUM  
postgres=# select ctid,* from tbl;  
  ctid  | id | c1 | c2   
--------+----+----+----  
 (0,11) |  1 |  1 |  2  
(1 row)  

7、观察会话1结果? 删除成功.

DELETE 1    

再次执行垃圾回收, 发现了1个垃圾索引版本, 1个垃圾tuple版本.

postgres=# vacuum verbose tbl;   
INFO:  vacuuming "public.tbl"  
INFO:  scanned index "idx_tbl_1" to remove 1 row versions  
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s  
INFO:  table "tbl": removed 1 dead item identifiers in 1 pages  
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s  
INFO:  index "idx_tbl_1" now contains 0 row versions in 2 pages  
DETAIL:  1 index row versions were removed.  
0 index pages were newly deleted.  
0 index pages are currently deleted, of which 0 are currently reusable.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
INFO:  table "tbl": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 5431699  
Skipped 0 pages due to buffer pins, 0 frozen pages.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
INFO:  table "tbl": truncated 1 to 0 pages  
DETAIL:  CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s  
VACUUM  

其他, 如果采用seqscan则毫无疑问的select会直接返回ctid6, 第5步改成如下, 一样不会有问题:

set enable_indexonlyscan =off;  
set enable_bitmapscan =off;  
set enable_indexscan =off;  
set enable_seqscan=on;  
alter function pg_sleep(double precision) cost 100000;  -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.   
delete from tbl where ctid=(select ctid from tbl where c1=1 limit 1)  and exists (select 1 from pg_sleep(30));  

1、建表和索引

create table tbl (id int, c1 int, c2 int);     
create index idx_tbl_1 on tbl (c1);    

2、关闭autovacuum

alter table tbl set (autovacuum_enabled =false);    

3、插入1条记录

insert into tbl values (1,1,1);    

4、更新10次 (非索引字段)

update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2;    
update tbl set c2=2 returning ctid,*;    

5、会话1:
index scan

set enable_indexonlyscan =off;    
set enable_bitmapscan =off;    
set enable_indexscan =on;    
set enable_seqscan=off;    
alter function pg_sleep(double precision) cost 100000;  -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.     
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));   

或 seqscan

set enable_indexonlyscan =off;    
set enable_bitmapscan =off;    
set enable_indexscan =off;    
set enable_seqscan=on;    
alter function pg_sleep(double precision) cost 100000;  -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.     
select ctid from tbl where c1=1 and exists(select 1 from pg_sleep(30));   

6、会话2(30秒内): vacuum 表, HOT收缩

postgres=# vacuum verbose tbl;  -- 由于是HOT, 所以索引只有1个版本, 没有垃圾tuple      
INFO:  vacuuming "public.tbl"    
INFO:  table "tbl": found 10 removable, 1 nonremovable row versions in 1 out of 1 pages    
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 5431698    
Skipped 0 pages due to buffer pins, 0 frozen pages.    
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.    
VACUUM    

7、会话1, seq scan, index scan返回的ctid都是11.

  ctid    
--------  
 (0,11)  
(1 row)  

HOT查询返回的并不是index中存储的ctid1, 虽然入口是ctid1.

create table tbl (id int, c1 int, c2 int);     
create index idx_tbl_1 on tbl (id);    
insert into tbl values (1,1,1);  
set enable_indexonlyscan =off;    
set enable_bitmapscan =off;    
set enable_indexscan =on;    
set enable_seqscan=off;    
alter function pg_sleep(double precision) cost 100000;  -- 把代价调大, 优先执行select, 然后sleep, 然后再delete.     

并发问题复现:

session1:  update tbl set id = 3 where ctid in (select ctid FROM TBL where id=1 and exists (select 1 from pg_sleep(30)));   --  进行到一半,只进行了子查询结果为(0,1)     
session2:  update tbl set id=2 where id = 1;  返回更新了一行    
session1:  按子查询的ctid 做变更,会找到当前id已经是2的那一行, 将那一行的id改成3 返回更新成功,修改了一行 . 因为这里存在HOT, ctid(0,1) 链到 ctid(0,2) 再到tuple2的page offset.   

解决这个问题, 可以使用recheck, 或者RR模式隔离级别(相当于加锁或者抛出错误), 或者for update加锁 . 详见:

《PostgreSQL CTID 物理行号在并发DML时的隔离性问题》

《在PostgreSQL中实现update | delete limit - CTID扫描实践 (高效阅后即焚)》

期望 PostgreSQL 增加什么功能?

PolarDB for PostgreSQL云原生分布式开源数据库

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论