PgSQL · 应用案例 · 什么情况下可能表膨胀

PostgreSQL中文社区 2020-09-01

1320

背景

PostgreSQL heap TABLE AM引擎，使用多版本来解决快照问题，版本处于当前数据文件中，有垃圾回收进程进行回收，那么哪些垃圾不能被回收呢？

WAL是PG的REDO文件，哪些WAL不能被回收重复利用？什么情况下可能会一直增长不清理呢？

heap或INDEX的膨胀有些时候并不是因为回收慢，而是有些是无法被回收的垃圾，通常被称为膨胀点。本文对膨胀点进行逐一解释(回收慢不解释，可能: worker太少,io太差,worker睡眠太长或频繁,vacuum mem太少放不下所有垃圾行CTID导致多次扫描索引,launcher唤醒周期太长,表太大未支持并行垃圾回收, …)。

除了snapshot too old以外，12新增AM例如zedstore, zheap将彻底解决heap的垃圾版本带来的膨胀问题。

全局catalog 膨胀点

全局catalog包括tbs,db,role等，如下：

postgres=# select relname from pg_class   
where reltablespace in   
(select oid from pg_tablespace where spcname='pg_global')   
and relkind='r';    
        relname          
-----------------------  
 pg_authid  
 pg_subscription  
 pg_database  
 pg_db_role_setting  
 pg_tablespace  
 pg_pltemplate  
 pg_auth_members  
 pg_shdepend  
 pg_shdescription  
 pg_replication_origin  
 pg_shseclabel  
(11 rows)

哪些垃圾不能被回收？

1、当前实例中最老事务快照之后产生的垃圾记录

2、SLOT catalog_xmin后产生的垃圾记录

3、年龄小于vacuum_defer_cleanup_age设置的垃圾记录

4、备库开启了feedback后，备库中最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录

什么时候可能膨胀？

1、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),

2、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),

3、vacuum_defer_cleanup_age 设置太大

4、整个实例中的 : 长事务, 慢SQL, 慢2pc,

库级catalog 膨胀点

库级catalog包括如下:

postgres=#   
select relname from pg_class where relkind='r'   
and relnamespace ='pg_catalog'::regnamespace   
except   
select relname from pg_class where reltablespace in   
(select oid from pg_tablespace where spcname = 'pg_global')   
and relkind='r';

         relname           
-------------------------  
 pg_language  
 pg_sequence  
 pg_largeobject  
 pg_policy  
 pg_ts_template  
 pg_attrdef  
 pg_operator  
 pg_ts_parser  
 pg_depend  
 pg_attribute  
 pg_ts_config  
 pg_conversion  
 pg_inherits  
 pg_subscription_rel  
 pg_publication  
 pg_foreign_table  
 pg_largeobject_metadata  
 pg_ts_dict  
 pg_statistic  
 pg_init_privs  
 pg_opfamily  
 pg_type  
 pg_am  
 pg_default_acl  
 pg_proc  
 pg_index  
 pg_rewrite  
 pg_statistic_ext  
 pg_constraint  
 pg_opclass  
 pg_partitioned_table  
 pg_namespace  
 pg_trigger  
 pg_enum  
 pg_amop  
 pg_event_trigger  
 pg_collation  
 pg_foreign_server  
 pg_foreign_data_wrapper  
 pg_user_mapping  
 pg_description  
 pg_cast  
 pg_publication_rel  
 pg_aggregate  
 pg_transform  
 pg_extension  
 pg_class  
 pg_seclabel  
 pg_amproc  
 pg_range  
 pg_ts_config_map  
(51 rows)

哪些垃圾不能被回收？

1、当前数据库中最老事务快照之后产生的垃圾记录

2、年龄小于vacuum_defer_cleanup_age设置的垃圾记录

3、备库开启了feedback后，备库返回的最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录

4、SLOT catalog_xmin后产生的垃圾记录(create table, drop table, pg_class, pg_att等)。影响全局（所有DB）

什么时候可能膨胀？

1、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),

2、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),

3、vacuum_defer_cleanup_age 设置太大

4、当前数据库中的 : 长事务, 慢SQL, 慢2pc,

普通对象膨胀点

用户创建的表、物化视图、索引等。

哪些垃圾不能被回收？

1、当前数据库中最老事务快照之后产生的垃圾记录

2、年龄小于vacuum_defer_cleanup_age设置的垃圾记录

3、备库开启了feedback后，备库返回的最老事务快照(仅指 global xmin)之后产生的垃圾记录。(catalog xmin无影响)

什么时候可能膨胀？

1、standby 开启了 feedback (且standby有慢事务, LONG SQL),

2、vacuum_defer_cleanup_age 设置太大

3、当前数据库中的 : 长事务, 慢SQL, 慢2pc,

WAL文件膨胀点

wal是指PG的REDO文件。

哪些WAL不能被回收或不能被重复利用？

1、从最后一次已正常结束的检查点(检查点开始时刻, 不是结束时刻)开始，所有的REDO文件都不能被回收

2、归档开启后，所有未归档的REDO。（.ready对应的redo文件）

3、启用SLOT后，还没有被SLOT消费的REDO文件

4、设置wal_keep_segments时，当REDO文件数还没有达到wal_keep_segments个时。

什么时候可能膨胀？

1、archive failed ，归档失败

2、user defined archive BUG，用户开启了归档，但是没有正常的将.ready改成.done，使得WAL堆积

3、wal_keep_segments 设置太大，WAL保留过多

4、max_wal_size设置太大，并且checkpoint_completion_target设置太大，导致检查点跨度很大，保留WAL文件很多

5、slot slow(dead) ，包括(physical | logical replication) ， restart_lsn 开始的所有WAL文件都要被保留

一些例子

1、创建slot

postgres=# select pg_create_logical_replication_slot('a','test_decoding');  
 pg_create_logical_replication_slot   
------------------------------------  
 (a,0/92C9C038)  
(1 row)

2、查看slot的位点信息

postgres=# select * from pg_get_replication_slots();  
 slot_name |    plugin     | slot_type | datoid | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn   
-----------+---------------+-----------+--------+-----------+--------+------------+------+--------------+-------------+---------------------  
 a         | test_decoding | logical   |  13585 | f         | f      |            |      |      1982645 | 0/92C9BFE8  | 0/92C9C038  
(1 row)

3、查看catalog_xmin对应XID的事务提交时间，需要开启事务时间跟踪track_commit_timestamp

postgres=# select pg_xact_commit_timestamp(xmin),pg_xact_commit_timestamp(catalog_xmin) from pg_get_replication_slots();  
psql: ERROR:  could not get commit timestamp data  
HINT:  Make sure the configuration parameter "track_commit_timestamp" is set.

4、从RESTART_LSN找到对应WAL文件，从文件中也可以查到大概的时间。

postgres=# select pg_walfile_name(restart_lsn) from pg_get_replication_slots();  
     pg_walfile_name        
--------------------------  
 000000010000000000000092  
(1 row)

postgres=# select * from pg_stat_file('pg_wal/000000010000000000000092');  
   size   |         access         |      modification      |         change         | creation | isdir   
----------+------------------------+------------------------+------------------------+----------+-------  
 16777216 | 2019-06-29 22:56:16+08 | 2019-07-01 09:50:16+08 | 2019-07-01 09:50:16+08 |          | f  
(1 row)

postgres=# select * from pg_ls_waldir() where name='000000010000000000000092';  
           name           |   size   |      modification        
--------------------------+----------+------------------------  
 000000010000000000000092 | 16777216 | 2019-07-01 09:50:16+08  
(1 row)

5、建表

postgres=# create table b(id int);  
CREATE TABLE  
postgres=# insert into b values (1);  
INSERT 0 1

6、消费SLOT WAL

postgres=# select * from pg_logical_slot_get_changes('a',pg_current_wal_lsn(),1);  
    lsn     |   xid   |      data        
------------+---------+----------------  
 0/92C9C0C0 | 1982645 | BEGIN 1982645  
 0/92CA4A40 | 1982645 | COMMIT 1982645  
(2 rows)

postgres=# select * from pg_logical_slot_get_changes('a',pg_current_wal_lsn(),1);  
    lsn     |   xid   |                 data                    
------------+---------+---------------------------------------  
 0/92CA4A78 | 1982646 | BEGIN 1982646  
 0/92CA4A78 | 1982646 | table public.b: INSERT: id[integer]:1  
 0/92CA4AE8 | 1982646 | COMMIT 1982646  
(3 rows)

7、删除记录

postgres=# delete from b;  
DELETE 1

8、垃圾回收，正常。本地表垃圾不受slot catalog_xmin影响

postgres=# vacuum verbose b;  
psql: INFO:  vacuuming "public.b"  
psql: INFO:  "b": removed 1 row versions in 1 pages  
psql: INFO:  "b": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1982648  
There were 0 unused item identifiers.  
Skipped 0 pages due to buffer pins, 0 frozen pages.  
0 pages are entirely empty.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
psql: INFO:  "b": truncated 1 to 0 pages  
DETAIL:  CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s  
VACUUM

9、建表，删表，使得CATALOG发生变化，产生CATALOG垃圾

postgres=# create table c (id int);  
CREATE TABLE  
postgres=# drop table c;  
DROP TABLE  
postgres=# create table c (id int);  
CREATE TABLE  
postgres=# drop table c;  
DROP TABLE

10、垃圾回收catalog，无法回收SLOT后产生的CATALOG垃圾，因为还需要这个CATALOG版本去解析对应WAL的LOGICAL 日志

postgres=# vacuum verbose pg_class;  
psql: INFO:  vacuuming "pg_catalog.pg_class"  
psql: INFO:  "pg_class": found 0 removable, 465 nonremovable row versions in 13 out of 13 pages  
DETAIL:  2 dead row versions cannot be removed yet, oldest xmin: 1982646  
There were 111 unused item identifiers.  
Skipped 0 pages due to buffer pins, 0 frozen pages.  
0 pages are entirely empty.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
VACUUM

catalog 受影响

postgres=# vacuum verbose pg_attribute ;  
psql: INFO:  vacuuming "pg_catalog.pg_attribute"  
psql: INFO:  "pg_attribute": found 0 removable, 293 nonremovable row versions in 6 out of 62 pages  
DETAIL:  14 dead row versions cannot be removed yet, oldest xmin: 1982646  
There were 55 unused item identifiers.  
Skipped 0 pages due to buffer pins, 55 frozen pages.  
0 pages are entirely empty.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
VACUUM

11、长事务不影响其他库的垃圾回收

postgres

postgres=# begin;  
BEGIN  
postgres=# delete from a;  
DELETE 1

db1

db1=# create table b(id int);  
CREATE TABLE  
db1=# insert into b values (1);  
INSERT 0 1  
db1=# delete from b;  
DELETE 1  
db1=# vacuum verbose b;  
psql: INFO:  vacuuming "public.b"  
psql: INFO:  "b": removed 1 row versions in 1 pages  
psql: INFO:  "b": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages  
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 1982671  
There were 0 unused item identifiers.  
Skipped 0 pages due to buffer pins, 0 frozen pages.  
0 pages are entirely empty.  
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.  
psql: INFO:  "b": truncated 1 to 0 pages  
DETAIL:  CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s  
VACUUM

小结

1 全局catalog 膨胀点

哪些垃圾不能被回收？

1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录

2、当前实例中最老事务快照之后产生的垃圾记录

3、SLOT catalog_xmin后产生的垃圾记录

4、备库开启了feedback后，备库中最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录

什么时候可能膨胀？

1、vacuum_defer_cleanup_age 设置太大

2、整个实例中的 : 长事务, 慢SQL, 慢2pc,

3、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),

4、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),

2 库级catalog 膨胀点

哪些垃圾不能被回收？

1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录

2、当前数据库中最老事务快照之后产生的垃圾记录

3、备库开启了feedback后，备库返回的最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录

4、SLOT catalog_xmin后产生的垃圾记录(create table, drop table, pg_class, pg_att等)。影响全局（所有DB）

什么时候可能膨胀？

1、vacuum_defer_cleanup_age 设置太大

2、当前数据库中的 : 长事务, 慢SQL, 慢2pc,

3、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),

4、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),

普通对象膨胀点

用户创建的表、物化视图、索引等。

哪些垃圾不能被回收？

1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录

2、当前数据库中最老事务快照之后产生的垃圾记录

3、备库开启了feedback后，备库返回的最老事务快照(仅指 global xmin)之后产生的垃圾记录。(catalog xmin无影响)

什么时候可能膨胀？

1、vacuum_defer_cleanup_age 设置太大

2、当前数据库中的 : 长事务, 慢SQL, 慢2pc,

3、standby 开启了 feedback (且standby有慢事务, LONG SQL),

WAL文件膨胀点

wal是指PG的REDO文件。

哪些WAL不能被回收或不能被重复利用？

1、从最后一次已正常结束的检查点(检查点开始时刻, 不是结束时刻)开始，所有的REDO文件都不能被回收

2、归档开启后，所有未归档的REDO。（.ready对应的redo文件）

3、启用SLOT后，还没有被SLOT消费的REDO文件

4、设置wal_keep_segments时，当REDO文件数还没有达到wal_keep_segments个时。

什么时候可能膨胀？

1、archive failed ，归档失败

2、user defined archive BUG，用户开启了归档，但是没有正常的将.ready改成.done，使得WAL堆积

3、wal_keep_segments 设置太大，WAL保留过多

4、max_wal_size设置太大，并且checkpoint_completion_target设置太大，导致检查点跨度很大，保留WAL文件很多

5、slot slow(dead) ，包括(physical | logical replication) ， restart_lsn 开始的所有WAL文件都要被保留

参考

switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))  
{  
  case HEAPTUPLE_DEAD:

    *  
     * Ordinarily, DEAD tuples would have been removed by  
     * heap_page_prune(), but it's possible that the tuple  
     * state changed since heap_page_prune() looked.  In  
     * particular an INSERT_IN_PROGRESS tuple could have  
     * changed to DEAD if the inserter aborted.  So this  
     * cannot be considered an error condition.  
     *  
     * If the tuple is HOT-updated then it must only be  
     * removed by a prune operation; so we keep it just as if  
     * it were RECENTLY_DEAD.  Also, if it's a heap-only  
     * tuple, we choose to keep it, because it'll be a lot  
     * cheaper to get rid of it in the next pruning pass than  
     * to treat it like an indexed tuple. Finally, if index  
     * cleanup is disabled, the second heap pass will not  
     * execute, and the tuple will not get removed, so we must  
     * treat it like any other dead tuple that we choose to  
     * keep.  
     *  
     * If this were to happen for a tuple that actually needed  
     * to be deleted, we'd be in trouble, because it'd  
     * possibly leave a tuple below the relation's xmin  
     * horizon alive.  heap_prepare_freeze_tuple() is prepared  
     * to detect that case and abort the transaction,  
     * preventing corruption.  
     */  
    if (HeapTupleIsHotUpdated(&tuple) ||  
        HeapTupleIsHeapOnly(&tuple) ||  
        params->index_cleanup == VACOPT_TERNARY_DISABLED)  
      nkeep += 1;  
    else  
      tupgone = true; * we can delete the tuple */  
    all_visible = false;  
    break;

  case HEAPTUPLE_RECENTLY_DEAD:

    *  
     * If tuple is recently deleted then we must not remove it  
     * from relation.  
     */  
    nkeep += 1;  
    all_visible = false;  
    break;

src/backend/access/heap/heapam_visibility.c

*       HeapTupleSatisfiesVacuum()  
 *                visible to any running transaction, used by VACUUM

/*  
 * HeapTupleSatisfiesVacuum  
 *  
 *      Determine the status of tuples for VACUUM purposes.  Here, what  
 *      we mainly want to know is if a tuple is potentially visible to *any*  
 *      running transaction.  If so, it can't be removed yet by VACUUM.  
 *  
 * OldestXmin is a cutoff XID (obtained from GetOldestXmin()).  Tuples  
 * deleted by XIDs >= OldestXmin are deemed "recently dead"; they might  
 * still be visible to some open transaction, so we can't remove them,  
 * even if we see that the deleting transaction has committed.  
 */  
HTSV_Result  
HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,  
                                                 Buffer buffer)  


        *  
         * Deleter committed, but perhaps it was recent enough that some open  
         * transactions could still see the tuple.  
         */  
        if (!TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin))  
                return HEAPTUPLE_RECENTLY_DEAD;  


                else if (TransactionIdDidCommit(xmax))  
                {  
                        *  
                         * The multixact might still be running due to lockers.  If the  
                         * updater is below the xid horizon, we have to return DEAD  
                         * regardless -- otherwise we could end up with a tuple where the  
                         * updater has to be removed due to the horizon, but is not pruned  
                         * away.  It's not a problem to prune that tuple, because any  
                         * remaining lockers will also be present in newer tuple versions.  
                         */  
                        if (!TransactionIdPrecedes(xmax, OldestXmin))  
                                return HEAPTUPLE_RECENTLY_DEAD;  

                        return HEAPTUPLE_DEAD;  
                }

https://www.postgresql.org/docs/12/protocol-replication.html

Hot Standby feedback message (F)  
Byte1('h')  
Identifies the message as a Hot Standby feedback message.  

Int64  
The client's system clock at the time of transmission, as microseconds since midnight on 2000-01-01.  

Int32  
The standby's current global xmin, excluding the catalog_xmin from any replication slots. If both this value and the following catalog_xmin are 0 this is treated as a notification that Hot Standby feedback will no longer be sent on this connection. Later non-zero messages may reinitiate the feedback mechanism.  

Int32  
The epoch of the global xmin xid on the standby.  


Int32  
The lowest catalog_xmin of any replication slots on the standby. Set to 0 if no catalog_xmin exists on the standby or if hot standby feedback is being disabled.  


Int32  
The epoch of the catalog_xmin xid on the standby.

加

入

我

们

扫描钉钉群二维码，每周免费看直播

postgresql

文章转载自PostgreSQL中文社区，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PgSQL · 应用案例 · 什么情况下可能表膨胀

背景

全局catalog 膨胀点

哪些垃圾不能被回收？

什么时候可能膨胀？

库级catalog 膨胀点

哪些垃圾不能被回收？

什么时候可能膨胀？

普通对象 膨胀点

哪些垃圾不能被回收？

什么时候可能膨胀？

WAL文件 膨胀点

哪些WAL不能被回收 或 不能被重复利用？

什么时候可能膨胀？

一些例子

小结

1 全局catalog 膨胀点

哪些垃圾不能被回收？

什么时候可能膨胀？

2 库级catalog 膨胀点

哪些垃圾不能被回收？

什么时候可能膨胀？

普通对象 膨胀点

哪些垃圾不能被回收？

什么时候可能膨胀？

WAL文件 膨胀点

哪些WAL不能被回收 或 不能被重复利用？

什么时候可能膨胀？

参考

评论

普通对象膨胀点

WAL文件膨胀点

哪些WAL不能被回收或不能被重复利用？

普通对象膨胀点

WAL文件膨胀点

哪些WAL不能被回收或不能被重复利用？