暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
PostgreSQL DBA one day.pdf
763
1页
17次
2022-07-07
5墨值下载
数据库状态检查
不要⽤kill -9杀进程!会导致数据库重启
参数检查
postgresql.conf (以最后⼀次出现的参数为准)
postgresql.auto.conf (会覆盖前者的配置)
include_dir,管理多个相关但不完全相同配置的服务器
include,典型场景是服务器使⽤了CG资源隔离
进程状态检查
1、数据库进程检查,是否存在D进程
ps -aux | grep postgres
2、连接数量检查,连接数过多会影响
性能,还存在relcache消耗内存
select count(*),state from
pg_stat_activity group by 2
1
2
3
归档检查
1、归档失败会导致WAL膨胀
select * from pg_stat_archiver;
2、紧急状态可以将archive_command修改为任意可执⾏
成功的命令,⽐如 ls、date等
3、清理归档⽬录find $ARCHIVEDIR -type f -mtime +3
-a -name '*backup' -a ! -newer $CHKPOINT -delete
4
复制状态检查
1、复制延迟状态
select * from pg_stat_replication
2、复制冲突状态,常⻅的是snapshot
select * from pg_stat_database_conflicts
① 失效的复制槽,active = false
② logical decoding下的⻓事务
③ wal_keep_segments过⼤
④ 过⼩的archive_timeout
⑤ 归档失败,对应⼤量的ready⽂件
⑥ 单进程归档,归档速度跟不上 (NFS)
WAL
堆积
葵花
宝典
公众号《PostgreSQL学徒》搜索
流复制冲突类型:处理PostgreSQL中的流复制冲突
流复制冲突案例:你真的搞懂recovery conflict了吗?
流复制总览⼤全:关于流复制的⽅⽅⾯⾯
5
⽇志检查
1、pgbadger、pglogCSV⽇志分析⼯具
2、pgloggingfilter:根据⽇志等级, 错误代码过滤⽇志打印. ⽤
处挺多, 例如你不想打印⼀些业务代码的错误, ⽐如pk冲突, 约束
冲突等.
3、pg_backtrace:打印错误SQL调⽤栈内容
4、pgreplay-go、pgreplay:回放⽇志
① 使⽤file_fdw将⽇志导⼊到数据库中
② 使⽤shell分析⽇志,egrep ‘^2022-07-03’
postgresql-07-03.csv | grep -i authentication |
awk ‘{print $1 " " $2}’ |awk -F: ‘{print $1 }’ | sort
| uniq -c
开启“审计⽇志、时
间记录”带来的性能
影响有多少?
可能使⽤pg_terminate_backend
或pg_cancel_backend也杀不掉
进程? 类似于Linux中的D进程,
可以使⽤perf、strace看看卡在
那⾥
6
实例运⾏状态检查
1、数据库年龄检查,事务ID总共2^32次⽅,还要折半
2、检查点状态检查,bgwriter和checkpoint是刷脏块的主⼒,应该尽量避免
backend process主动刷脏⽽引起的等待
3、缓冲区命中率,基于clock sweep扫描算法,要注意双缓存的影响
4、⻓事务检查,会引起表膨胀、年龄回收失败、索引失效等
5、2PC检查,会引起表膨胀、年龄回收失败,危害参考⻓事务
统计剩余年龄:SELECT datname,age(datfrozenxid) AS
frozen_xid_age,ROUND(100 * (age(datfrozenxid) / 2000000000::float))
consumed_txid_pct,2 * 1024 ^ 3 - 1 - age(datfrozenxid) AS
remaining_txid,current_setting('autovacuum_freeze_max_age')::int -
age(datfrozenxid) AS remaining_aggressive_vacuum FROM pg_database;
checkpoint平均时间间隔:SELECT total_checkpoints, seconds_since_start /
total_checkpoints / 60 AS minutes_between_checkpoints FROM (SELECT
EXTRACT(EPOCH FROM (now() - pg_postmaster_start_time())) AS
seconds_since_start,(checkpoints_timed+checkpoints_req) AS
total_checkpoints FROM pg_stat_bgwriter ) AS sub;
查看数据库缓冲区命中率:s e l e c t r o u n d ( s u m ( b l k s _ h i t ) * 1 0 0 /
sum(blks_hit+blks_read),2)::numeric from pg_stat_database where datname
= current_database();
预热⼯具:pgfincore、pg_prewarm (posix_fadvise)
2PC检查:select count(*) from pg_prepared_xacts ;
查询⼤于5分钟的⻓事务:select query,state from pg_stat_activity where
state<>'idle' and (backend_xid is not null or backend_xmin is not null) and
now()-xact_start > interval '5 min' order by xact_start;
数据库对象检查
7
数据库对象检查
1、表膨胀检查,MVCC实现机制会导致表膨胀
2、索引膨胀,vacuum疲于奔命
3、索引检查,重复索引?⽆效索引?低效索引?
4、序列检查,剩余多少序列?序列的各种坑?
5、⼤对象检查,未引⽤的⼤对象
6、分区表检查,⽬前还不能⾃动创建分区
表膨胀率:select relname,coalesce(round(n_dead_tup * 100 / (case
when n_live_tup + n_dead_tup = 0 then null else n_live_tup +
n_dead_tup end ),2),0.00) as dead_tup_ratio from pg_stat_all_tables
where 1=1 and n_dead_tup >=10000 order by dead_tup_ratio desc
limit 5
索引膨胀:select leaf_fragmentation from pgstatindex('idx_name');
双缓存的存在,会使缓冲区的统计不那么准确,数据块有可能在page cache中就获取到
哪些会阻⽌年龄回收?
⻓事务
复制槽
预备事务
索引维护? https://wiki.postgresql.org/wiki/Index_Maintenance
select
now(),
query_start as started_at,
now() - query_start as query_duration,
format('[%s] %s', a.pid, a.query) as pid_and_query,
index_relid::regclass as index_name,
relid::regclass as table_name,
(pg_size_pretty(pg_relation_size(relid))) as table_size,
phase,
nullif(wait_event_type, '') || ': ' || wait_event as wait_type_and_event,
current_locker_pid,
(select nullif(left(query, 150), '') || '...' from pg_stat_activity a where a.pid =
current_locker_pid) as current_locker_query,
format(
'%s (%s of %s)',
coalesce((round(100 * lockers_done::numeric / nullif(lockers_total, 0), 2))::text || '%', 'N/A'),
coalesce(lockers_done::text, '?'),
coalesce(lockers_total::text, '?')
) as lockers_progress,
format(
'%s (%s of %s)',
coalesce((round(100 * blocks_done::numeric / nullif(blocks_total, 0), 2))::text || '%', 'N/A'),
coalesce(blocks_done::text, '?'),
coalesce(blocks_total::text, '?')
) as blocks_progress,
format(
'%s (%s of %s)',
coalesce((round(100 * tuples_done::numeric / nullif(tuples_total, 0), 2))::text || '%', 'N/A'),
coalesce(tuples_done::text, '?'),
coalesce(tuples_total::text, '?')
) as tuples_progress,
format(
'%s (%s of %s)',
coalesce((round(100 * partitions_done::numeric / nullif(partitions_total, 0), 2))::text || '%', 'N/
A'),
coalesce(partitions_done::text, '?'),
coalesce(partitions_total::text, '?')
) as partitions_progress
from pg_stat_progress_create_index p
left join pg_stat_activity a on a.pid = p.pid
; -- in psql, use "\watch 5" instead of semicolon to run in loop
索引创了半年,如何评估进度
log_statement、log_duration
如果并发⾮常⾼,⼤量写⽇志可能引
mutex lock冲突,导致性能骤降
8
pg_probackup不⽀持4GB以上的segment
pg_backrest在备库备份依赖 ssh,需要免密ssh
9
慢SQL分析
慢SQL分析
1、pg_stat_statements
2、pg_stat_kcache:可以获取单条SQL消耗了多少
CPU和内存
3、pg_stat_monitor:前两者的结合
4、pgpro_stats:AWR
1 最耗IO SQL、单次调⽤最耗IO SQL TOP 5select userid::regrole, dbid, query from
pg_stat_statements order by (blk_read_time+blk_write_time)/calls desc limit 5;
2 总最耗IO SQL TOP 5select userid::regrole, dbid, query from pg_stat_statements
order by (blk_read_time+blk_write_time) desc limit 5;
3 最耗时 SQL,单次调⽤最耗时 SQL TOP 5select userid::regrole, dbid, query from
pg_stat_statements order by mean_exec_time desc limit 5;
4 总最耗时 SQL TOP 5select userid::regrole, dbid, query from pg_stat_statements
order by total_time desc limit 5;
5 响应时间抖动最严重 SQLselect userid::regrole, dbid, query from
pg_stat_statements order by stddev_exec_time desc limit 5;
6 最耗共享内存 SQLselect userid::regrole, dbid, query from pg_stat_statements order
by (shared_blks_hit+shared_blks_dirtied) desc limit 5;
7 最耗临时空间 SQLselect userid::regrole, dbid, query from pg_stat_statements order
by temp_blks_written desc limit 5;
https://
www.timescale.co
m/blog/point-in-
time/
Your own AWR
10
慢SQL优化
减少数据访问
所有归档失败的WAL都会保留,直到撑爆磁盘
PEV
https://explain.depesz.com/
https://explain.dalibo.com/plan#
https://tatiyants.com/pev/#/plans/new
11
使⽤空间维护
使⽤空间分析
1、⽂件系统剩余空间
2、表空间剩余⼤⼩,df -h
3、Top 10 size object
select schemaname as table_schema,
relname as table_name,
pg_size_pretty(pg_total_relation_size(relid)) as
total_size,
pg_size_pretty(pg_relation_size(relid)) as
data_size,
pg_size_pretty(pg_total_relation_size(relid) -
pg_relation_size(relid))
as external_size
from pg_catalog.pg_statio_user_tables
order by pg_total_relation_size(relid) desc,
pg_relation_size(relid) desc limit 10;
通过使⽤表空间,我们可以控制磁盘的布局。例如冷热分
离,表空间位于pg_tblspc⽬录下,本质只是⼀个符号链接
可以使⽤ pg_basebackup + 归档实现PITR
归档清理最佳实践:https://github.com/Napsty/
scripts/blob/master/pgsql/walarchivecleanup.sh
最佳实
⻛险分析
数据库安全或⻛险分析
1、权限检测
2、密码复杂度检测
3、⽤户密码到期时间
4、SQL注⼊⻛险分析
5、锁等待
6、UDF稳定性检测
7、视图攻击,security_barrier
8、密码泄露检测
echo " 检查 ~/.psql_history : "
grep -i "password" ~/.psql_history|grep -i -E "role|group|user"
echo ""
echo " 检查 *.csv : "
cat *.csv | grep -E "^[0-9]" | grep -i -r -E "role|group|user" |grep -i
"password"|grep -i -E "create|alter"
echo ""
echo " 检查 $PGDATA/recovery.* : "
grep -i "password" ../recovery.*
echo ""
echo " 检查 pg_stat_statements : "
psql --pset=pager=off -c 'select query from pg_stat_statements where (query
~* $$group$$ or query ~* $$user$$ or query ~* $$role$$) and query ~* $
$password$$'
echo " 检查 pg_authid : "
psql --pset=pager=off -q -c 'select * from pg_authid where rolpassword !~ $
$^md5$$ or length(rolpassword)<>35'
SQL注⼊检测:cat *.csv | grep -E "^[0-9]" | grep exec_simple_query |awk -F ","
'{print $2" "$3" "$5" "$NF}'|sed 's/\:[0-9]*//g'|sort|uniq -c|sort -n -r
表空间在维护的时候需要注意映射关系,⽐
如pg_basebackup、pg_probackup等
Fast path⽤以减少那些需要经常获取和释放但⼜很少出现冲突的锁类
型的获取/释放负载,因为⼤多数操作都是DMLDDLDQL的频率很
低,⼤部分时间使⽤弱锁,避免频繁访问pg_locks,提升性能!
12
操作系统分析
13
1. pg_top,PostgreSQL中的top⼯具
2. top(经典的Linux任务管理⼯具)
3. vmstat(展现给定时间间隔的服务器的状态值,包括服务器的CPU使⽤率,内存使⽤)
4. pidstat(进程实时监控)
5. sar (性能监控和瓶颈检查)
6. dstat(dstat 是⼀个可以取代vmstat,iostat,netstat和ifstat这些命令的多功能产品)
7. iostat (IO实时监控)
8. iftop(实时流量监控⼯具)
9. ss(获取socket统计信息,它可以显示和netstat类似的内容)
10. tcpdump(抓包⼯具)
11. tcpflow(分析⽹络流量)
12. nload(⽤于查看linux⽹络流量状况,实时输出)
13. htop(类似top,⽐top更加⼈性化)
14. glances(类似top,基于 Python 的系统遥测监控⼯具)
15. strace(常⽤来跟踪进程执⾏时的系统调⽤和所接收的信号)
16. dtrace(动态跟踪)
17. valgrind(内存泄漏检测)
18. dmesg(内核信息)
19. slabtop(内核slab缓冲区)
14
定时任务检测
定时任务
1、crontab
2、pg_timetable
3、pg_cron
4、pg_agent
5、pg_task
6、pg_dbms_job
Tips
pg_dbms_job extension to schedules and
manages jobs in a job queue similar to
Oracle DBMS_JOB package.
权限体系就好⽐我们去⻄天取经,需要通过层层关卡,⼀层⼀
层通过,才可访问到数据。另外某个⽤户的权限是⼀个集合
Author
xiongcc@PostgreSQL
学徒
PostgreSQL是进程模型,连接数越
多,性能会线性下降(在14做了部分优
化),不建议将max_connections设置
过⼤,包括应⽤端的连接池配置
为什么备库的某些参数必须
⽐主库⼤?
⽐如max_connections、
max_locks_per_transaction
当可⽤事务ID只有100万时,数据库会变为只读,并拒绝开启任何新的事务
pg_repack、pgsqueeze、pgcompacttable
备份检查
PostgreSQL中的schema类似MySQL中的database,跨库访问需要dblink或者fdw
Oracle中的schema和user绑定,新建⽤户⾃动创建(虽然也有create schema的语法)
start
下班
SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
now() - blocked_activity.query_start AS blocked_duration,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
now() - blocking_activity.query_start AS blocking_duration,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM
pg_catalog.pg_locks AS blocked_locks
JOIN pg_catalog.pg_stat_activity AS blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks AS blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity AS blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE
NOT blocked_locks.granted;
为什么
shared_buffers推荐
设置为内存的25% ~
40%?
⼩⼼backup_label⽂件的存在 ( 15移除了)
Tips
可以提前创建⼀个占位⽂件,紧急时
刻删除释放空间,Maybe 可以救命!
!
锁阻塞检查
1 批量处理
2 内存操作:Hash/Sort/Aggregate等等
3 更多地在客户端处理运算
4 避免select *SQL解析所依赖的元数据都是存放在数据字典(系统表)中
的,减少字段可以减少CPU的解析
5 绑定变量,使⽤prepare statement。每条SQL都会经过parse -> analyze ->
rewrite -> planner -> excutor这五个步骤,通过重⽤parseanalyze这⼏个
阶段,可以减少CPU的解析
减少CPU和内存开销
减少⽹络传输
1、服务端游标 vs 客户端游标
2、减少⽹络传输
3、减少磁盘访问量
4、更多使⽤index only scan
create index concurrently 期间要格外⼩⼼⻓事务
create index concurrently可能留下⽆效索引
⽬前分区表⽗表还不⽀持create index concurrently
wechat:_xiongcc
mail:xiongcc_1994@126.com
公众号:PostgreSQL学徒
Tips
WITH RECURSIVE t_wait AS (
SELECT
a.locktype,
a.database,
a.relation,
a.page,
a.tuple,
a.classid,
a.objid,
a.objsubid,
a.pid,
a.virtualtransaction,
a.virtualxid,
a.transactionid
FROM
pg_locks a
WHERE
NOT a.granted
),
t_run AS (
SELECT
a.mode,
a.locktype,
a.database,
a.relation,
a.page,
a.tuple,
a.classid,
a.objid,
a.objsubid,
a.pid,
a.virtualtransaction,
a.virtualxid,
a.transactionid,
b.query,
b.xact_start,
b.query_start,
b.usename,
b.datname
FROM
pg_locks a,
pg_stat_activity b
WHERE
a.pid = b.pid
AND a.granted
),
w AS (
SELECT
r.pid r_pid,
w.pid w_pid
FROM
t_wait w,
t_run r
WHERE
r.locktype IS NOT DISTINCT FROM w.locktype
AND r.database IS NOT DISTINCT FROM w.database
AND r.relation IS NOT DISTINCT FROM w.relation
AND r.page IS NOT DISTINCT FROM w.page
AND r.tuple IS NOT DISTINCT FROM w.tuple
AND r.classid IS NOT DISTINCT FROM w.classid
AND r.objid IS NOT DISTINCT FROM w.objid
AND r.objsubid IS NOT DISTINCT FROM w.objsubid
AND r.transactionid IS NOT DISTINCT FROM w.transactionid
AND r.virtualxid IS NOT DISTINCT FROM w.virtualxid
),
c (
waiter, holder, root_holder, path, deep
) AS (
SELECT
w_pid,
r_pid,
r_pid,
w_pid || '->' || r_pid,
1
FROM
w
UNION
SELECT
w_pid,
r_pid,
c.holder,
w_pid || '->' || c.path,
c.deep + 1
FROM
w t,
c
WHERE
t.r_pid = c.waiter
)
SELECT
t1.waiter,
t1.holder,
t1.root_holder,
path,
t1.deep
FROM
c t1
WHERE
NOT EXISTS (
SELECT
1
FROM
c t2
WHERE
t2.path ~ t1.path
AND t1.path <> t2.path)
ORDER BY
root_holder;
锁队列检查
of 1
5墨值下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜