PostgreSQL实例—VACUUM收缩表数据规则

原创李先生 2023-02-06

2076

生产中遇到了VACUUM 后表大小不变，让我产生了很多疑虑：为什么VACUUM没有让PostgreSQL表变小，为什么VACUUM不收缩数据文件？

那么本文将讲解其中的秘密，了解清理后空间不会返回给操作系统的原因。

OS版本	数据库版本
redhat 7.6	PostgreSQL 15.1

元组可见性

PostgreSQL 中的 VACUUM，首先需要知道 PostgreSQL 如何处理可见性。整个概念基于一组隐藏的列，这些列是行的一部分。以下是它的工作原理：

postgres=# CREATE TABLE t_lxs (id int);
CREATE TABLE
postgres=# INSERT INTO t_lxs VALUES (1),(2),(3);
INSERT 0 3
postgres=# INSERT INTO t_lxs VALUES (4),(5),(6);
INSERT 0 3
postgres=#

为了方便测试，刚创建的表t_lxs只包含一列id，这里需要注意的是，我使用了两个单独的事务将来插入数据，每个事务分别插入了 3 行，隐藏列中就能够非常清楚看到这一点：

postgres=# SELECT ctid, xmin, xmax, cmin, cmax, * FROM t_lxs;
 ctid  | xmin | xmax | cmin | cmax | id 
-------+------+------+------+------+----
 (0,1) |  752 |    0 |    0 |    0 |  1
 (0,2) |  752 |    0 |    0 |    0 |  2
 (0,3) |  752 |    0 |    0 |    0 |  3
 (0,4) |  753 |    0 |    0 |    0 |  4
 (0,5) |  753 |    0 |    0 |    0 |  5
 (0,6) |  753 |    0 |    0 |    0 |  6
(6 rows)

postgres=#

xmin、xmax，cmin和cmax都是包含事务 ID 的隐藏列。

可以看到，前三行数据由事务号 752 写入，后三行数据是使用事务号 753 写入。
隐藏列将处理可见性，PostgreSQL将根据这些列确定某个事务是否可以看到某行。

下面再运行 UPDATE 语句，看看会发生什么：

postgres=# begin;
BEGIN
postgres=*# UPDATE t_lxs SET id = id * 10 WHERE id > 5 RETURNING *;
 id 
----
 60
(1 row)

UPDATE 1
postgres=*# SELECT ctid, xmin, xmax, cmin, cmax, * FROM t_lxs;
 ctid  | xmin | xmax | cmin | cmax | id 
-------+------+------+------+------+----
 (0,1) |  752 |    0 |    0 |    0 |  1
 (0,2) |  752 |    0 |    0 |    0 |  2
 (0,3) |  752 |    0 |    0 |    0 |  3
 (0,4) |  753 |    0 |    0 |    0 |  4
 (0,5) |  753 |    0 |    0 |    0 |  5
 (0,7) |  754 |    0 |    0 |    0 | 60
(6 rows)

postgres=*#

更改了其中一行。但我们看到了该行在磁盘上的物理位置的 CTID发生了变化，(0, 6)不见了，因为 PostgreSQL 的MVCC是复制行。如果再运行一次UPDATE，该行将被再次复制：

postgres=*# UPDATE t_lxs SET id = id * 10 WHERE id > 5 RETURNING *;
 id  
-----
 600
(1 row)

UPDATE 1
postgres=*# UPDATE t_lxs SET id = id * 10 WHERE id > 5 RETURNING *;
  id  
------
 6000
(1 row)

UPDATE 1
postgres=*#

在PostgreSQL中，这些复制行很重要，因为必须保留旧行。否则ROLLBACK将无法工作，因此必须保留旧版本。
再次查看表格：

postgres=*# SELECT ctid, xmin, xmax, cmin, cmax, * FROM t_lxs;
 ctid  | xmin | xmax | cmin | cmax |  id  
-------+------+------+------+------+------
 (0,1) |  752 |    0 |    0 |    0 |    1
 (0,2) |  752 |    0 |    0 |    0 |    2
 (0,3) |  752 |    0 |    0 |    0 |    3
 (0,4) |  753 |    0 |    0 |    0 |    4
 (0,5) |  753 |    0 |    0 |    0 |    5
 (0,9) |  754 |    0 |    2 |    2 | 6000
(6 rows)

postgres=*# commit;
COMMIT
postgres=#

物理位置(0,5) 到 (0,9) 之间的所有内容都充满了需要删除的死行。

VACUUM 清楚行

COMMIT 过程中无法清理死行。因此清理过程必须异步完成。这正是VACUUM所做的。现在运行，看看会发生什么：

postgres=# vacuum verbose t_lxs;
INFO:  vacuuming "postgres.public.t_lxs"
INFO:  finished vacuuming "postgres.public.t_lxs": index scans: 0
pages: 0 removed, 1 remain, 1 scanned (100.00% of total)
tuples: 3 removed, 6 remain, 0 are dead but not yet removable
removable cutoff: 755, which was 0 XIDs old when operation ended
new relfrozenxid: 752, which is 1 XIDs ahead of previous value
index scan not needed: 0 pages from table (0.00% of total) had 0 dead item identifiers removed
I/O timings: read: 0.047 ms, write: 0.000 ms
avg read rate: 47.637 MB/s, avg write rate: 71.456 MB/s
buffer usage: 4 hits, 4 misses, 6 dirtied
WAL usage: 3 records, 3 full page images, 10848 bytes
system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
VACUUM
postgres=#

VACUUM寻找不再被别的任何事务任何人看到的行。这些行可以位于数据文件的中间某处。发生的事情是VACUUM允许 PostgreSQL 重用该空间，但是，它不会将该空间返回给操作系统。

VACUUM 截断

但是，该规则有一个例外：

postgres=# SELECT pg_relation_size('t_lxs');
 pg_relation_size 
------------------
             8192
(1 row)

postgres=# DELETE FROM t_lxs;
DELETE 6
postgres=# SELECT pg_relation_size('t_lxs');
 pg_relation_size 
------------------
             8192
(1 row)

postgres=#

在DELETE语句之后，该表仍将保持其大小。因为清理是异步完成的。因此VACUUM可以调用删除那些行：

postgres=# VACUUM t_lxs;
VACUUM
postgres=#

这有点特殊。规则是：如果从表中的某个位置开始，ALL rows are dead，VACUUM可以截断表。

postgres=# SELECT pg_relation_size('t_lxs');
 pg_relation_size 
------------------
                0
(1 row)

postgres=#

毕竟，在大表中，数据文件的末尾总是有几行（在正常情况下）。因此，不要指望VACUUM来收缩表。

VACUUM FULL

对抗表格膨胀的一种方法是使用VACUUM FULL. 但是VACUUM FULL需要一个表锁，如果处于关键业务环境中，这可能是一个真正的问题。

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者

文章被以下合辑收录

PostgreSQL分享—李先生（共44篇）

日常的PostgreSQL分享，包含运维、升级、安装部署、疑难问题等

PostgreSQL实例—VACUUM收缩表数据规则

元组可见性

VACUUM 清楚行

VACUUM 截断

VACUUM FULL

文章被以下合辑收录

评论