暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PgSQL · 应用案例 · 什么情况下可能表膨胀

1258

背景

PostgreSQL heap TABLE AM引擎,使用多版本来解决快照问题,版本处于当前数据文件中,有垃圾回收进程进行回收,那么哪些垃圾不能被回收呢?

WAL是PG的REDO文件,哪些WAL不能被回收重复利用?什么情况下可能会一直增长不清理呢?
heap或INDEX的膨胀有些时候并不是因为回收慢,而是有些是无法被回收的垃圾,通常被称为膨胀点。本文对膨胀点进行逐一解释(回收慢不解释,可能: worker太少,io太差,worker睡眠太长或频繁,vacuum mem太少放不下所有垃圾行CTID导致多次扫描索引,launcher唤醒周期太长,表太大未支持并行垃圾回收, …)。
除了snapshot too old以外,12新增AM例如zedstore, zheap将彻底解决heap的垃圾版本带来的膨胀问题。

全局catalog 膨胀点

全局catalog包括tbs,db,role等,如下:
    postgres=# select relname from pg_class   
    where reltablespace in
    (select oid from pg_tablespace where spcname='pg_global')
    and relkind='r';
    relname
    -----------------------
    pg_authid
    pg_subscription
    pg_database
    pg_db_role_setting
    pg_tablespace
    pg_pltemplate
    pg_auth_members
    pg_shdepend
    pg_shdescription
    pg_replication_origin
    pg_shseclabel
    (11 rows)

    哪些垃圾不能被回收?

    1、当前实例中最老事务快照之后产生的垃圾记录
    2、SLOT catalog_xmin后产生的垃圾记录
    3、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
    4、备库开启了feedback后,备库中最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录

    什么时候可能膨胀?

    1、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),
    2、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),
    3、vacuum_defer_cleanup_age 设置太大
    4、整个实例中的 : 长事务, 慢SQL, 慢2pc,

    库级catalog 膨胀点

    库级catalog包括如下:
      postgres=#   
      select relname from pg_class where relkind='r'
      and relnamespace ='pg_catalog'::regnamespace
      except
      select relname from pg_class where reltablespace in
      (select oid from pg_tablespace where spcname = 'pg_global')
      and relkind='r';

      relname
      -------------------------
      pg_language
      pg_sequence
      pg_largeobject
      pg_policy
      pg_ts_template
      pg_attrdef
      pg_operator
      pg_ts_parser
      pg_depend
      pg_attribute
      pg_ts_config
      pg_conversion
      pg_inherits
      pg_subscription_rel
      pg_publication
      pg_foreign_table
      pg_largeobject_metadata
      pg_ts_dict
      pg_statistic
      pg_init_privs
      pg_opfamily
      pg_type
      pg_am
      pg_default_acl
      pg_proc
      pg_index
      pg_rewrite
      pg_statistic_ext
      pg_constraint
      pg_opclass
      pg_partitioned_table
      pg_namespace
      pg_trigger
      pg_enum
      pg_amop
      pg_event_trigger
      pg_collation
      pg_foreign_server
      pg_foreign_data_wrapper
      pg_user_mapping
      pg_description
      pg_cast
      pg_publication_rel
      pg_aggregate
      pg_transform
      pg_extension
      pg_class
      pg_seclabel
      pg_amproc
      pg_range
      pg_ts_config_map
      (51 rows)

      哪些垃圾不能被回收?

      1、当前数据库中最老事务快照之后产生的垃圾记录
      2、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
      3、备库开启了feedback后,备库返回的最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录
      4、SLOT catalog_xmin后产生的垃圾记录(create table, drop table, pg_class, pg_att等)。影响全局(所有DB)

      什么时候可能膨胀?

      1、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),
      2、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),
      3、vacuum_defer_cleanup_age 设置太大
      4、当前数据库中的 : 长事务, 慢SQL, 慢2pc,

      普通对象 膨胀点

      用户创建的表、物化视图、索引等。

      哪些垃圾不能被回收?

      1、当前数据库中最老事务快照之后产生的垃圾记录
      2、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
      3、备库开启了feedback后,备库返回的最老事务快照(仅指 global xmin)之后产生的垃圾记录。(catalog xmin无影响)

      什么时候可能膨胀?

      1、standby 开启了 feedback (且standby有慢事务, LONG SQL),
      2、vacuum_defer_cleanup_age 设置太大
      3、当前数据库中的 : 长事务, 慢SQL, 慢2pc,

      WAL文件 膨胀点

      wal是指PG的REDO文件。

      哪些WAL不能被回收 或 不能被重复利用?

      1、从最后一次已正常结束的检查点(检查点开始时刻, 不是结束时刻)开始,所有的REDO文件都不能被回收
      2、归档开启后,所有未归档的REDO。(.ready对应的redo文件)
      3、启用SLOT后,还没有被SLOT消费的REDO文件
      4、设置wal_keep_segments时,当REDO文件数还没有达到wal_keep_segments个时。

      什么时候可能膨胀?

      1、archive failed ,归档失败
      2、user defined archive BUG,用户开启了归档,但是没有正常的将.ready改成.done,使得WAL堆积
      3、wal_keep_segments 设置太大,WAL保留过多
      4、max_wal_size设置太大,并且checkpoint_completion_target设置太大,导致检查点跨度很大,保留WAL文件很多
      5、slot slow(dead) ,包括(physical | logical replication) , restart_lsn 开始的所有WAL文件都要被保留

      一些例子

      1、创建slot
        postgres=# select pg_create_logical_replication_slot('a','test_decoding');  
        pg_create_logical_replication_slot
        ------------------------------------
        (a,0/92C9C038)
        (1 row)

        2、查看slot的位点信息

          postgres=# select * from pg_get_replication_slots();  
          slot_name | plugin | slot_type | datoid | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn
          -----------+---------------+-----------+--------+-----------+--------+------------+------+--------------+-------------+---------------------
          a | test_decoding | logical | 13585 | f | f | | | 1982645 | 0/92C9BFE8 | 0/92C9C038
          (1 row)
          3、查看catalog_xmin对应XID的事务提交时间,需要开启事务时间跟踪track_commit_timestamp
            postgres=# select pg_xact_commit_timestamp(xmin),pg_xact_commit_timestamp(catalog_xmin) from pg_get_replication_slots();  
            psql: ERROR: could not get commit timestamp data
            HINT: Make sure the configuration parameter "track_commit_timestamp" is set.
            4、从RESTART_LSN找到对应WAL文件,从文件中也可以查到大概的时间。
              postgres=# select pg_walfile_name(restart_lsn) from pg_get_replication_slots();  
              pg_walfile_name
              --------------------------
              000000010000000000000092
              (1 row)

              postgres=# select * from pg_stat_file('pg_wal/000000010000000000000092');
              size | access | modification | change | creation | isdir
              ----------+------------------------+------------------------+------------------------+----------+-------
              16777216 | 2019-06-29 22:56:16+08 | 2019-07-01 09:50:16+08 | 2019-07-01 09:50:16+08 | | f
              (1 row)

              postgres=# select * from pg_ls_waldir() where name='000000010000000000000092';
              name | size | modification
              --------------------------+----------+------------------------
              000000010000000000000092 | 16777216 | 2019-07-01 09:50:16+08
              (1 row)
              5、建表
                postgres=# create table b(id int);  
                CREATE TABLE
                postgres=# insert into b values (1);
                INSERT 0 1
                6、消费SLOT WAL
                  postgres=# select * from pg_logical_slot_get_changes('a',pg_current_wal_lsn(),1);  
                  lsn | xid | data
                  ------------+---------+----------------
                  0/92C9C0C0 | 1982645 | BEGIN 1982645
                  0/92CA4A40 | 1982645 | COMMIT 1982645
                  (2 rows)

                  postgres=# select * from pg_logical_slot_get_changes('a',pg_current_wal_lsn(),1);
                  lsn | xid | data
                  ------------+---------+---------------------------------------
                  0/92CA4A78 | 1982646 | BEGIN 1982646
                  0/92CA4A78 | 1982646 | table public.b: INSERT: id[integer]:1
                  0/92CA4AE8 | 1982646 | COMMIT 1982646
                  (3 rows)
                  7、删除记录
                    postgres=# delete from b;  
                    DELETE 1
                    8、垃圾回收,正常。本地表垃圾不受slot catalog_xmin影响
                      postgres=# vacuum verbose b;  
                      psql: INFO: vacuuming "public.b"
                      psql: INFO: "b": removed 1 row versions in 1 pages
                      psql: INFO: "b": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages
                      DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 1982648
                      There were 0 unused item identifiers.
                      Skipped 0 pages due to buffer pins, 0 frozen pages.
                      0 pages are entirely empty.
                      CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
                      psql: INFO: "b": truncated 1 to 0 pages
                      DETAIL: CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s
                      VACUUM
                      9、建表,删表,使得CATALOG发生变化,产生CATALOG垃圾
                        postgres=# create table c (id int);  
                        CREATE TABLE
                        postgres=# drop table c;
                        DROP TABLE
                        postgres=# create table c (id int);
                        CREATE TABLE
                        postgres=# drop table c;
                        DROP TABLE
                        10、垃圾回收catalog,无法回收SLOT后产生的CATALOG垃圾,因为还需要这个CATALOG版本去解析对应WAL的LOGICAL 日志
                          postgres=# vacuum verbose pg_class;  
                          psql: INFO: vacuuming "pg_catalog.pg_class"
                          psql: INFO: "pg_class": found 0 removable, 465 nonremovable row versions in 13 out of 13 pages
                          DETAIL: 2 dead row versions cannot be removed yet, oldest xmin: 1982646
                          There were 111 unused item identifiers.
                          Skipped 0 pages due to buffer pins, 0 frozen pages.
                          0 pages are entirely empty.
                          CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
                          VACUUM
                          catalog 受影响
                            postgres=# vacuum verbose pg_attribute ;  
                            psql: INFO: vacuuming "pg_catalog.pg_attribute"
                            psql: INFO: "pg_attribute": found 0 removable, 293 nonremovable row versions in 6 out of 62 pages
                            DETAIL: 14 dead row versions cannot be removed yet, oldest xmin: 1982646
                            There were 55 unused item identifiers.
                            Skipped 0 pages due to buffer pins, 55 frozen pages.
                            0 pages are entirely empty.
                            CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
                            VACUUM
                            11、长事务不影响其他库的垃圾回收
                            postgres
                              postgres=# begin;  
                              BEGIN
                              postgres=# delete from a;
                              DELETE 1
                              db1
                                db1=# create table b(id int);  
                                CREATE TABLE
                                db1=# insert into b values (1);
                                INSERT 0 1
                                db1=# delete from b;
                                DELETE 1
                                db1=# vacuum verbose b;
                                psql: INFO: vacuuming "public.b"
                                psql: INFO: "b": removed 1 row versions in 1 pages
                                psql: INFO: "b": found 1 removable, 0 nonremovable row versions in 1 out of 1 pages
                                DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 1982671
                                There were 0 unused item identifiers.
                                Skipped 0 pages due to buffer pins, 0 frozen pages.
                                0 pages are entirely empty.
                                CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
                                psql: INFO: "b": truncated 1 to 0 pages
                                DETAIL: CPU: user: 0.09 s, system: 0.00 s, elapsed: 0.09 s
                                VACUUM

                                小结

                                1 全局catalog 膨胀点

                                哪些垃圾不能被回收?

                                1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
                                2、当前实例中最老事务快照之后产生的垃圾记录
                                3、SLOT catalog_xmin后产生的垃圾记录
                                4、备库开启了feedback后,备库中最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录

                                什么时候可能膨胀?

                                1、vacuum_defer_cleanup_age 设置太大
                                2、整个实例中的 : 长事务, 慢SQL, 慢2pc,
                                3、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),
                                4、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),

                                2 库级catalog 膨胀点

                                哪些垃圾不能被回收?

                                1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
                                2、当前数据库中最老事务快照之后产生的垃圾记录
                                3、备库开启了feedback后,备库返回的最老事务快照(包括catalog_xmin, global xmin)之后产生的垃圾记录
                                4、SLOT catalog_xmin后产生的垃圾记录(create table, drop table, pg_class, pg_att等)。影响全局(所有DB)

                                什么时候可能膨胀?

                                1、vacuum_defer_cleanup_age 设置太大
                                2、当前数据库中的 : 长事务, 慢SQL, 慢2pc,
                                3、standby 开启了 feedback (且standby有慢事务, LONG SQL, 慢/dead slot),
                                4、慢/dead slot(catalog_xmin, 影响catalog垃圾回收),

                                普通对象 膨胀点

                                用户创建的表、物化视图、索引等。

                                哪些垃圾不能被回收?

                                1、年龄小于vacuum_defer_cleanup_age设置的垃圾记录
                                2、当前数据库中最老事务快照之后产生的垃圾记录
                                3、备库开启了feedback后,备库返回的最老事务快照(仅指 global xmin)之后产生的垃圾记录。(catalog xmin无影响)

                                什么时候可能膨胀?

                                1、vacuum_defer_cleanup_age 设置太大
                                2、当前数据库中的 : 长事务, 慢SQL, 慢2pc,
                                3、standby 开启了 feedback (且standby有慢事务, LONG SQL),

                                WAL文件 膨胀点

                                wal是指PG的REDO文件。

                                哪些WAL不能被回收 或 不能被重复利用?

                                1、从最后一次已正常结束的检查点(检查点开始时刻, 不是结束时刻)开始,所有的REDO文件都不能被回收
                                2、归档开启后,所有未归档的REDO。(.ready对应的redo文件)
                                3、启用SLOT后,还没有被SLOT消费的REDO文件
                                4、设置wal_keep_segments时,当REDO文件数还没有达到wal_keep_segments个时。

                                什么时候可能膨胀?

                                1、archive failed ,归档失败
                                2、user defined archive BUG,用户开启了归档,但是没有正常的将.ready改成.done,使得WAL堆积
                                3、wal_keep_segments 设置太大,WAL保留过多
                                4、max_wal_size设置太大,并且checkpoint_completion_target设置太大,导致检查点跨度很大,保留WAL文件很多
                                5、slot slow(dead) ,包括(physical | logical replication) , restart_lsn 开始的所有WAL文件都要被保留

                                参考

                                  switch (HeapTupleSatisfiesVacuum(&tuple, OldestXmin, buf))  
                                  {
                                  case HEAPTUPLE_DEAD:

                                  *
                                  * Ordinarily, DEAD tuples would have been removed by
                                  * heap_page_prune(), but it's possible that the tuple
                                  * state changed since heap_page_prune() looked. In
                                  * particular an INSERT_IN_PROGRESS tuple could have
                                  * changed to DEAD if the inserter aborted. So this
                                  * cannot be considered an error condition.
                                  *
                                  * If the tuple is HOT-updated then it must only be
                                  * removed by a prune operation; so we keep it just as if
                                  * it were RECENTLY_DEAD. Also, if it's a heap-only
                                  * tuple, we choose to keep it, because it'll be a lot
                                  * cheaper to get rid of it in the next pruning pass than
                                  * to treat it like an indexed tuple. Finally, if index
                                  * cleanup is disabled, the second heap pass will not
                                  * execute, and the tuple will not get removed, so we must
                                  * treat it like any other dead tuple that we choose to
                                  * keep.
                                  *
                                  * If this were to happen for a tuple that actually needed
                                  * to be deleted, we'd be in trouble, because it'd
                                  * possibly leave a tuple below the relation's xmin
                                  * horizon alive. heap_prepare_freeze_tuple() is prepared
                                  * to detect that case and abort the transaction,
                                  * preventing corruption.
                                  */
                                  if (HeapTupleIsHotUpdated(&tuple) ||
                                  HeapTupleIsHeapOnly(&tuple) ||
                                  params->index_cleanup == VACOPT_TERNARY_DISABLED)
                                  nkeep += 1;
                                  else
                                  tupgone = true; * we can delete the tuple */
                                  all_visible = false;
                                  break;

                                  case HEAPTUPLE_RECENTLY_DEAD:

                                  *
                                  * If tuple is recently deleted then we must not remove it
                                  * from relation.
                                  */
                                  nkeep += 1;
                                  all_visible = false;
                                  break;
                                  src/backend/access/heap/heapam_visibility.c
                                    *       HeapTupleSatisfiesVacuum()  
                                    * visible to any running transaction, used by VACUUM
                                      /*  
                                      * HeapTupleSatisfiesVacuum
                                      *
                                      * Determine the status of tuples for VACUUM purposes. Here, what
                                      * we mainly want to know is if a tuple is potentially visible to *any*
                                      * running transaction. If so, it can't be removed yet by VACUUM.
                                      *
                                      * OldestXmin is a cutoff XID (obtained from GetOldestXmin()). Tuples
                                      * deleted by XIDs >= OldestXmin are deemed "recently dead"; they might
                                      * still be visible to some open transaction, so we can't remove them,
                                      * even if we see that the deleting transaction has committed.
                                      */
                                      HTSV_Result
                                      HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId OldestXmin,
                                      Buffer buffer)


                                      *
                                      * Deleter committed, but perhaps it was recent enough that some open
                                      * transactions could still see the tuple.
                                      */
                                      if (!TransactionIdPrecedes(HeapTupleHeaderGetRawXmax(tuple), OldestXmin))
                                      return HEAPTUPLE_RECENTLY_DEAD;


                                      else if (TransactionIdDidCommit(xmax))
                                      {
                                      *
                                      * The multixact might still be running due to lockers. If the
                                      * updater is below the xid horizon, we have to return DEAD
                                      * regardless -- otherwise we could end up with a tuple where the
                                      * updater has to be removed due to the horizon, but is not pruned
                                      * away. It's not a problem to prune that tuple, because any
                                      * remaining lockers will also be present in newer tuple versions.
                                      */
                                      if (!TransactionIdPrecedes(xmax, OldestXmin))
                                      return HEAPTUPLE_RECENTLY_DEAD;

                                      return HEAPTUPLE_DEAD;
                                      }
                                      https://www.postgresql.org/docs/12/protocol-replication.html
                                        Hot Standby feedback message (F)  
                                        Byte1('h')
                                        Identifies the message as a Hot Standby feedback message.

                                        Int64
                                        The client's system clock at the time of transmission, as microseconds since midnight on 2000-01-01.

                                        Int32
                                        The standby's current global xmin, excluding the catalog_xmin from any replication slots. If both this value and the following catalog_xmin are 0 this is treated as a notification that Hot Standby feedback will no longer be sent on this connection. Later non-zero messages may reinitiate the feedback mechanism.

                                        Int32
                                        The epoch of the global xmin xid on the standby.


                                        Int32
                                        The lowest catalog_xmin of any replication slots on the standby. Set to 0 if no catalog_xmin exists on the standby or if hot standby feedback is being disabled.


                                        Int32
                                        The epoch of the catalog_xmin xid on the standby.


                                        扫描钉钉群二维码,每周免费看直播


                                        文章转载自PostgreSQL中文社区,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                        评论