PostgreSQL 14 preview - tid range scan方法, 一种page级别应用可自控的并行扫描、处理方法, 结合RR snapshot export功能

digoal 2021-01-02

1658

作者

digoal

日期

2021-02-28

背景

什么是TID?

可以参考PG的page layout, heap table是以page来组织的, 每个page里面若干tuple. TID 是tuple的寻址地址: (pageid, itemid), pageid即第几个数据块, itemid即这个page内的第几条记录.
例如pagesize=8k时, tid=(10,1)表示第11个数据块的第一条记录. (page从0开始, item从1开始)

postgres=# create table tidtest(id int, info text); CREATE TABLE postgres=# insert into tidtest select generate_series(1,10000), md5(Random()::text); INSERT 0 10000 postgres=# select ctid,* from tidtest limit 1000; ctid | id | info ---------+-----+---------------------------------- (0,1) | 1 | 37118b76c2b8c62f78e785d52403f173 (0,2) | 2 | 14ff83346ede4618d14b66822387c710 (0,3) | 3 | a208b625a31d192e5e39bdf26a76adbb ...... (0,119) | 119 | fcb98794a1a326382954f1faddf27386 (0,120) | 120 | 0f5616b4c4d620095f0fefc5e767fb22 (1,1) | 121 | 8d9300a963198f97bacdfb2850192557 (1,2) | 122 | 15d1e0f71388652b15fa1c20a440f0bb ....

PostgreSQL 14以前的版本支持tid scan, 即, 我们可以直接指定要查询的行号. 他只需要扫描一个page. 通过itemid offset直接拿到tuple, 所以速度非常快.

```
postgres=# select ctid,* from tidtest where ctid='(10,1)';
ctid | id | info
--------+------+----------------------------------
(10,1) | 1201 | 598591b1b45498bdbc05c5aea2cb09e3
(1 row)

postgres=# explain select ctid,* from tidtest where ctid='(10,1)';
QUERY PLAN

Tid Scan on tidtest (cost=0.00..1.11 rows=1 width=43)
TID Cond: (ctid = '(10,1)'::tid)
(2 rows)
```

那么有没有可能一次获取一个page的数据呢?

PostgreSQL 14可以了

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=bb437f995d47405ecd92cf66df71f7f7e40ed460

```
Add TID Range Scans to support efficient scanning ranges of TIDs
author David Rowley drowley@postgresql.org
Sat, 27 Feb 2021 09:59:36 +0000 (22:59 +1300)
committer David Rowley drowley@postgresql.org
Sat, 27 Feb 2021 09:59:36 +0000 (22:59 +1300)
commit bb437f995d47405ecd92cf66df71f7f7e40ed460
tree 0ee50f8a501e1ecc30d5cfd0eeb6ed0bcd41e2b2 tree | snapshot
parent f4adc41c4f92cc91d507b19e397140c35bb9fd71 commit | diff
Add TID Range Scans to support efficient scanning ranges of TIDs

This adds a new executor node named TID Range Scan. The query planner
will generate paths for TID Range scans when quals are discovered on base
relations which search for ranges on the table's ctid column. These
ranges may be open at either end. For example, WHERE ctid >= '(10,0)';
will return all tuples on page 10 and over.

To support this, two new optional callback functions have been added to
table AM. scan_set_tidrange is used to set the scan range to just the
given range of TIDs. scan_getnextslot_tidrange fetches the next tuple
in the given range.

For AMs were scanning ranges of TIDs would not make sense, these functions
can be set to NULL in the TableAmRoutine. The query planner won't
generate TID Range Scan Paths in that case.

Author: Edmund Horner, David Rowley
Reviewed-by: David Rowley, Tomas Vondra, Tom Lane, Andres Freund, Zhihong Yu
Discussion: https://postgr.es/m/CAMyN-kB-nFTkF=VA_JPwFNo08S0d-Yk0F741S2B7LDmYAi8eyA@mail.gmail.com
```

select * from tidtest where ctid >= '(10,0)' and ctid < '(11,0)';

range tid scan有什么应用? 我暂时能想到的, 例如