
Some Small Talk About Everyday Life

dba悠然 2024-11-19





"Picking chrysanthemums by the eastern hedge,

  at ease I catch sight of the southern mountains."

           (Tao Yuanming, Jin dynasty)


"作者: 悠然 | 首发公众号: dba悠然"

<Friendly note: the content below represents the author's personal views only>




I haven't updated this public account for a while; I've genuinely been a bit busy (a short break, "in seclusion"). I recently ran into an interesting case, so let me share it with you.






PART1: Recent "Training" Results

1. Small regular script (xiaokai)


2. Running script (xingshu)


Anyone interested in calligraphy is welcome to get in touch and exchange notes.




PART2: Case Overview

In a customer's PostgreSQL environment, INSERTs into a single table were blocked and stalling. The overall system load was low, yet the backend processes running the INSERTs were maxing out their CPU, and the problem persisted after a restart:


PART3: Problem Analysis

(I) Symptoms (wait events)

1. The INSERT backends had been running for anywhere from 7 hours to about 1 day, all in the "active" state; one of them showed the wait event "buffer_content":

2. Another backend was waiting on "lock on a relation":


-- Querying for blocking locks found no blockers (no blocking chain or source); the exact query is omitted, but a sketch follows below
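
For reference, a minimal sketch of the kind of blocking/wait check that was run (column names per pg_stat_activity and pg_blocking_pids(); the exact query used on site is omitted above, and the 10-minute threshold is illustrative):

-- Long-running active backends, their wait events, and any PIDs blocking them
-- (an empty blocked_by array means no blocker was found, as in this case)
SELECT pid,
       now() - query_start AS runtime,
       state,
       wait_event_type,
       wait_event,
       pg_blocking_pids(pid) AS blocked_by,
       left(query, 60)       AS query
FROM   pg_stat_activity
WHERE  state = 'active'
  AND  now() - query_start > interval '10 minutes'
ORDER  BY runtime DESC;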


(II) Environment (reproduced in testing)

  1. Table structure:

The table has one primary key and several additional indexes.

2. Database configuration

-- remaining settings omitted

3. Table statistics were missing (no screenshot captured; a checking sketch follows below)
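
How the missing statistics can be confirmed, as a sketch (txxx is the redacted table name used later in this article; adjust to the real table):

-- Last (auto)vacuum / (auto)analyze timestamps and planner estimates for the table
SELECT s.relname,
       s.last_vacuum, s.last_autovacuum,
       s.last_analyze, s.last_autoanalyze,
       s.n_live_tup, s.n_dead_tup,
       c.reltuples, c.relpages
FROM   pg_stat_user_tables s
JOIN   pg_class c ON c.oid = s.relid
WHERE  s.relname = 'txxx';

If last_analyze/last_autoanalyze are NULL and reltuples is 0 (or -1 on newer versions), the table has effectively never been analyzed.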


(III) Pinpointing the problem


1. Tried to terminate the INSERT backend by its pid; the process would not die, and the problem persisted even after a restart (a sketch of the usual cancel/terminate calls follows this subsection)


2. Dumped the stacks with gstack & gdb

    1) The gstack backtrace showed waits inside "pg_statement.so", which was not informative (no screenshot)

    2) gdb showed "_bt"-prefixed B-tree functions on the stack: _bt_relandgetbuf() and _bt_moveright()



    3) A "_split"-related wait was also observed, but the screenshot was not saved
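
For completeness, a sketch of the usual cancel/terminate attempts referenced in step 1 above (12345 is a placeholder pid for the stuck INSERT backend; in this case neither call took effect):

-- Try to cancel the running statement first (SIGINT), then terminate the backend (SIGTERM)
SELECT pg_cancel_backend(12345);
SELECT pg_terminate_backend(12345);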

PART4: Problem Resolution

The preliminary analysis above narrows the problem down to three areas: insufficient resource configuration, missing autovacuum/statistics, and index problems.


(I) Parameter tuning


// OS: 8 cores / 32 GB RAM

1. Adjust memory parameters

shared_buffers:        1GB -> 8GB

effective_cache_size:  4GB -> 16GB


2. Adjust timeout limits

# idle-in-transaction timeout: 10 minutes

idle_in_transaction_session_timeout=600000

# statement timeout for long-running sessions: 2 hours

statement_timeout=7200000


3. Stop the application workload and restart PostgreSQL (an ALTER SYSTEM sketch of these changes follows below)
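
A minimal sketch of applying the above changes with ALTER SYSTEM (values as listed; shared_buffers only takes effect after a restart, which matches step 3):

ALTER SYSTEM SET shared_buffers = '8GB';                          -- requires restart
ALTER SYSTEM SET effective_cache_size = '16GB';
ALTER SYSTEM SET idle_in_transaction_session_timeout = '600000';  -- 10 min, in ms
ALTER SYSTEM SET statement_timeout = '7200000';                   -- 2 h, in ms
SELECT pg_reload_conf();  -- reloads the reloadable settings; shared_buffers still needs a restart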


(II) vacuumdb

Given that resources were severely under-provisioned, global autovacuum could never finish its work, leaving vacuum/analyze missing, so we tried to repair this with vacuumdb.

1) vacuumdb options used

-e     : echo the commands as they are executed

-f     : perform VACUUM FULL

-j     : number of parallel jobs

-v     : verbose output

-z     : also collect statistics (ANALYZE)


2) vacuumdb hit a locking anomaly and hung with no response for a long time


3) vacuumdb deadlock log:

vacuumdb: error: vacuuming of database "sdgold_xx_1" failed: ERROR: deadlock detected DETAIL: Process 14887 waits for AccessShareLock on relation 1249 of database 592878; blocked by process 14888. Process 14888 waits for ShareLock on transaction 24162689; blocked by process 14887. HINT: See server log for query details.


4) Rebuilding the index and then re-running vacuumdb resolved the problem (a post-check sketch follows below):

reindex table txxx concurrently;

vacuumdb -d xx -e -f -j 2 -v -z ;
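
After REINDEX ... CONCURRENTLY it is worth confirming that no invalid index was left behind (an interrupted concurrent reindex leaves entries with indisvalid = false); a sketch against the problem table txxx:

-- List any invalid or not-ready indexes on table txxx
SELECT c.relname AS index_name, i.indisvalid, i.indisready
FROM   pg_index i
JOIN   pg_class c ON c.oid = i.indexrelid
WHERE  i.indrelid = 'txxx'::regclass
  AND  (NOT i.indisvalid OR NOT i.indisready);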


PART5: Post-mortem Analysis

1. Due to time pressure, the root-cause analysis below is inferred from experience:

1) Insufficient resources: shared_buffers / effective_cache_size were far too small, and the buffer write path left index buffers in an inconsistent state (similar to a "bad block", hence the buffer_content wait), causing index insert/page-split anomalies;

2) Index anomaly: index page splits failed during inserts ("_bt_moveright"), and the INSERT backends hung (after a restart the INSERTs still stalled and the backends could not be terminated);

3) autovacuum/analyze was triggered but never actually ran, a concrete example of the harm that under-provisioned resources can do;


2. What can we learn about autovacuum?

2.1 How autovacuum works

/* autovacuum.c */


The autovacuum daemon consists of two kinds of processes:

- the autovacuum launcher, started by the postmaster, periodically (every autovacuum_naptime) arranges for autovacuum worker processes to be forked to do the vacuuming

- the autovacuum launcher maintains global information in the autovacuum shared memory area (which databases/relations need vacuuming)

- the autovacuum workers do the actual vacuuming; their number is capped by autovacuum_max_workers



1) autovacuum launcher & autovacuum workers

- the launcher cannot start or control worker processes directly; the postmaster does that (see the source excerpts in the appendix):

- the launcher attaches to shared memory and maintains the global bookkeeping of which databases and relations need vacuuming

- a worker vacuums only a single database at a time; launcher-initiated worker startup is where failures tend to show up (see the fork-failure discussion below)


2) autovacuum launcher, workers & the shared memory area

- the launcher records in the autovacuum shared memory area which databases/relations need vacuuming

- when the autovacuum conditions are met, the launcher signals the postmaster to fork a worker process to perform the vacuuming

- a newly forked worker attaches to the autovacuum shared memory area, reads the information maintained by the launcher, and then connects to the designated database to vacuum it

- if the postmaster fails to fork a worker, a flag is recorded in the shared memory area; the launcher notices it on its periodic schedule and re-signals the postmaster to try forking the worker again

- when a worker finishes vacuuming, it sends SIGUSR2 to the launcher, which re-evaluates the vacuum cost limits across the running workers and rebalances the load (delays, smoothing out peaks, etc.); see the rebalance-related excerpt in the appendix;


3) Multiple workers in one database

- when a single database has a lot of vacuum work, several workers can run in it concurrently to spread the load

- conflict detection between concurrent workers: each worker writes the object it is currently vacuuming into the shared memory area (a hash table), so parallel vacuums do not block one another

- each worker reads from pgstats the last vacuum time and the objects already marked as vacuumed, and skips them, avoiding duplicate vacuuming

- BUG: duplicate vacuuming by concurrent workers: because the relation lock is not yet held when a worker picks its next table, stale pgstat information can lead a worker to pick a table that a concurrent worker has already vacuumed; this is acknowledged as a bug in the current design:

/*
 * However, there is a small window (due to not yet holding the relation
 * lock) during which a worker may choose a table that was already
 * vacuumed; this is a bug in the current design.
 */


4) Why the postmaster's fork() can fail (i.e. why autovacuum may not kick in)

 - transient: high load, memory pressure, too many processes, etc.

 - persistent: database anomalies, failure to connect to the database, etc.



5) shared memory area

Shared-memory information maintained by the autovacuum launcher:

// relevant source functions

relation_needs_vacanalyze()

autovacuum_do_vac_analyze()

.....


6) Autovacuum parameters


postgres=# select name,setting,unit from pg_settings where name ~ 'autovacuum';

                name                 |  setting  | unit
-------------------------------------+-----------+------
 autovacuum                          | on        |      # enable autovacuum
 autovacuum_freeze_max_age           | 200000000 |
 autovacuum_multixact_freeze_max_age | 400000000 |
 autovacuum_max_workers              | 3         |      # the launcher forks at most 3 concurrent workers; the upper bound is also limited by max_worker_processes
 autovacuum_work_mem                 | -1        | kB   # -1 means use maintenance_work_mem
 autovacuum_naptime                  | 60        | s    # daemon sleep interval, default 60 s
 log_autovacuum_min_duration         | 600000    | ms   # minimum duration for an autovacuum action to be logged

.....


- the autovacuum launcher periodically (every autovacuum_naptime) starts autovacuum workers

- the default interval is 60 s (autovacuum_naptime)

- at most 3 workers run at a time (autovacuum_max_workers)

- the 3 workers vacuum different databases concurrently; with many databases, the autovacuum workers cannot keep up (a monitoring sketch follows below)
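
To watch the launcher and workers from SQL, a sketch like this can be used (backend_type is available from PG 10 onward; pg_stat_progress_vacuum additionally shows per-table vacuum progress):

-- Current autovacuum launcher / worker backends and what they are doing
SELECT pid, backend_type, datname, state, query_start, query
FROM   pg_stat_activity
WHERE  backend_type IN ('autovacuum launcher', 'autovacuum worker');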


2.2 Autovacuum trigger conditions


1) The current transaction ID reaches the freeze threshold (vacuum freeze; XIDs being consumed too fast); a checking sketch follows this list


   txid_current >= {relfrozenxid + autovacuum_freeze_max_age}


-txid_current                       #current transaction xid

-relfrozenxid                       #pg_class.relfrozenxid (per relation)

-autovacuum_freeze_max_age          #maximum freeze age for autovacuum, default 200,000,000
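
A sketch of the freeze-age check for condition 1 (age(relfrozenxid) approximates txid_current - relfrozenxid):

-- Relations closest to autovacuum_freeze_max_age (anti-wraparound vacuum trigger)
SELECT c.relname,
       age(c.relfrozenxid) AS xid_age,
       current_setting('autovacuum_freeze_max_age')::int AS freeze_max_age
FROM   pg_class c
WHERE  c.relkind = 'r'
ORDER  BY age(c.relfrozenxid) DESC
LIMIT  10;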



2) The number of dead tuples (n_dead_tup) reaches the threshold


    n_dead_tup >= {autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples}


-n_dead_tup                          #estimated number of dead tuples, pg_stat_all_tables.n_dead_tup

-autovacuum_vacuum_threshold         #base threshold for dead tuples, default 50

-autovacuum_vacuum_scale_factor      #scale factor for dead tuples, default 0.2

-reltuples                           #total number of tuples, pg_class.reltuples


3) The number of inserted tuples (n_tup_ins) reaches the threshold (PG13+)


    n_tup_ins >= {autovacuum_vacuum_insert_threshold + autovacuum_vacuum_insert_scale_factor * reltuples}


-n_tup_ins                               #estimated number of inserted tuples, pg_stat_all_tables.n_tup_ins

-autovacuum_vacuum_insert_threshold      #base threshold for inserted tuples, default 1000

-autovacuum_vacuum_insert_scale_factor   #scale factor for inserted tuples, default 0.2

-reltuples                               #total number of tuples, pg_class.reltuples



4) The number of tuples modified since the last analyze reaches the threshold (a combined checking sketch for conditions 2 to 4 follows this list)


    n_mod_since_analyze >= {autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * reltuples}


-n_mod_since_analyze                     #tuples changed (INSERT/DELETE/UPDATE) since the last analyze, pg_stat_all_tables.n_mod_since_analyze

-autovacuum_analyze_threshold            #base analyze threshold, default 50

-autovacuum_analyze_scale_factor         #analyze scale factor, default 0.1

-reltuples                               #total number of tuples, pg_class.reltuples
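
A sketch that evaluates conditions 2) to 4) for user tables from pg_stat_user_tables (thresholds are taken from the global settings only; per-table reloptions are ignored for simplicity, and n_ins_since_vacuum requires PG 13+):

-- Which tables currently satisfy the vacuum / analyze trigger formulas
SELECT s.relname,
       s.n_dead_tup,
       s.n_ins_since_vacuum,
       s.n_mod_since_analyze,
       c.reltuples,
       s.n_dead_tup >= current_setting('autovacuum_vacuum_threshold')::float
                     + current_setting('autovacuum_vacuum_scale_factor')::float * c.reltuples
           AS vacuum_due,
       s.n_mod_since_analyze >= current_setting('autovacuum_analyze_threshold')::float
                     + current_setting('autovacuum_analyze_scale_factor')::float * c.reltuples
           AS analyze_due
FROM   pg_stat_user_tables s
JOIN   pg_class c ON c.oid = s.relid
ORDER  BY s.n_dead_tup DESC;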



// Appendix: source code excerpts

src/backend/postmaster/launch_backend.c


child_process_kind child_process_kinds[] = {

        [B_INVALID] = {"invalid", NULL, false},

        [B_BACKEND] = {"backend", BackendMain, true},
        [B_AUTOVAC_LAUNCHER] = {"autovacuum launcher", AutoVacLauncherMain, true},
        [B_AUTOVAC_WORKER] = {"autovacuum worker", AutoVacWorkerMain, true},
        [B_BG_WORKER] = {"bgworker", BackgroundWorkerMain, true},

        /*
         * WAL senders start their life as regular backend processes, and change
         * their type after authenticating the client for replication.  We list it
         * here for PostmasterChildName() but cannot launch them directly.
         */
        [B_WAL_SENDER] = {"wal sender", NULL, true},
        [B_SLOTSYNC_WORKER] = {"slot sync worker", ReplSlotSyncWorkerMain, true},

        [B_STANDALONE_BACKEND] = {"standalone backend", NULL, false},

        [B_ARCHIVER] = {"archiver", PgArchiverMain, true},
        [B_BG_WRITER] = {"bgwriter", BackgroundWriterMain, true},
        [B_CHECKPOINTER] = {"checkpointer", CheckpointerMain, true},
        [B_STARTUP] = {"startup", StartupProcessMain, true},
        [B_WAL_RECEIVER] = {"wal_receiver", WalReceiverMain, true},
        [B_WAL_SUMMARIZER] = {"wal_summarizer", WalSummarizerMain, true},
        [B_WAL_WRITER] = {"wal_writer", WalWriterMain, true},

        [B_LOGGER] = {"syslogger", SysLoggerMain, false},

};



// The autovacuum launcher and workers are forked via the postmaster
// AutoVacWorkerMain(): entry point of an autovacuum worker

AutoVacWorkerMain()

         pgstat_report_autovac(dbid);

         /*
          * Connect to the selected database, specifying no particular user
          *
          * Note: if we have selected a just-deleted database (due to using
          * stale stats info), we'll fail and exit here.
          */
         InitPostgres(NULL, dbid, NULL, InvalidOid, 0, dbname);
         SetProcessingMode(NormalProcessing);
         set_ps_display(dbname);
         ereport(DEBUG1,
                 (errmsg_internal("autovacuum: processing database \"%s\"", dbname)));

         if (PostAuthDelay)
             pg_usleep(PostAuthDelay * 1000000L);

         /* And do an appropriate amount of work */
         recentXid = ReadNextTransactionId();
         recentMulti = ReadNextMultiXactId();

         /* start autovacuum */
         do_autovacuum();
     }

     /*
      * The launcher will be notified of my death in ProcKill, *if* we managed
      * to get a worker slot at all
      */

     /* All done, go away */
     proc_exit(0);
 }


// Workers rebalance the load:
autovac_recalculate_workers_for_balance()

                   /*
                    * a worker finished, or postmaster signaled failure to start a worker
                    */
                   if (got_SIGUSR2)
                   {
                            got_SIGUSR2 = false;

                            /* rebalance cost limits, if needed */
                            if (AutoVacuumShmem->av_signal[AutoVacRebalance])
                            {
                                     LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
                                     AutoVacuumShmem->av_signal[AutoVacRebalance] = false;
                                     autovac_recalculate_workers_for_balance();
                                     LWLockRelease(AutovacuumLock);
                            }

                            if (AutoVacuumShmem->av_signal[AutoVacForkFailed])
                            {
                                     /*
                                      * If the postmaster failed to start a new worker, we sleep
                                      * for a little while and resend the signal.  The new worker's
                                      * state is still in memory, so this is sufficient.  After
                                      * that, we restart the main loop.
                                      *
                                      * XXX should we put a limit to the number of times we retry?
                                      * I don't think it makes much sense, because a future start
                                      * of a worker will continue to fail in the same way.
                                      */
                                     AutoVacuumShmem->av_signal[AutoVacForkFailed] = false;
                                     pg_usleep(1000000L);   /* 1s */
                                     SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
                                     continue;
                            }
                   }



PART6: Closing Words

If you found this article helpful or inspiring, feel free to add me as a friend to chat, and to share and bookmark it. Here's to walking the road together in 2024!

select 1 from dual;


END







