问题概述
近期遇到一个问题,数据库主机swap占用超50G,其中一个作为备节点的数据库进程占用36G
问题分析过程
内存使用情况,swap使用率80%
[postgres]$ free -g
total used free shared buff/cache available
Mem: 254 12 21 63 220 177
Swap: 63 50 13
查看占用swap较多的PID为数据库进程
[postgres]$ for i in $( cd /proc;ls |grep "^[0-9]"|awk ' $0 >100') ;do awk '/Swap:/{a=a+$2}END{print '"$i"',a/1024"M"}' /proc/$i/smaps 2>/dev/null ; done | sort -k2nr | head -10
2349619 38367.6M
2349618 38359.8M
2349620 38354.9M
2349616 38354.6M
2349622 38354.6M
1850974 38354.5M
3195297 38354.5M
3195791 38354.5M
1406798 9308.69M
1407289 9308.19M
[postgres]$ ps -ef|grep 2349616
postgres 1850974 2349616 0 Dec06 ? 00:00:00 postgres: user postgres ip(59370) idle
postgres 2349616 1 0 May26 ? 00:10:32 /app/pg_108_18801/postgresql/bin/postgres -D /data1/data_108_18801/data
postgres 2349617 2349616 0 May26 ? 00:01:11 postgres: logger
postgres 2349618 2349616 0 May26 ? 00:47:40 postgres: startup recovering 0000000100000064000000A2
postgres 2349619 2349616 0 May26 ? 02:35:21 postgres: checkpointer
postgres 2349620 2349616 0 May26 ? 00:01:21 postgres: background writer
postgres 2349621 2349616 0 May26 ? 00:12:14 postgres: stats collector
postgres 2349622 2349616 0 May26 ? 00:38:39 postgres: walreceiver streaming 64/A2041FD0
postgres 3195297 2349616 0 09:01 ? 00:00:00 postgres: postgres_exporter postgres ip(27830) idle
postgres 3195791 2349616 0 09:01 ? 00:00:00 postgres:user postgres ip(56748) idle
postgres 3198336 3160476 0 09:02 pts/0 00:00:00 grep --color=auto 2349616
查看smaps详细信息,找到swap多的段对应的文件是/dev/zero (deleted),这是pg共享内存使用 mmap 方式时,产生的映射。
参考:Linux 内存 PageCache 和数据库共享内存分析(中) - 墨天轮 (modb.pro)
[postgres]$ grep -C 20 39139008 /proc/2349616/smaps Locked: 0 kB THPeligible: 0 VmFlags: rd wr mr mw me ac 400030430000-401086de0000 rw-s 00000000 00:05 136256314 /dev/zero (deleted) Size: 68527808 kB KernelPageSize: 64 kB MMUPageSize: 64 kB Rss: 33856 kB Pss: 10012 kB Shared_Clean: 0 kB Shared_Dirty: 33856 kB Private_Clean: 0 kB Private_Dirty: 0 kB Referenced: 33792 kB Anonymous: 0 kB LazyFree: 0 kB AnonHugePages: 0 kB ShmemPmdMapped: 0 kB Shared_Hugetlb: 0 kB Private_Hugetlb: 0 kB Swap: 39139008 kB SwapPss: 0 kB Locked: 0 kB THPeligible: 0 VmFlags: rd wr sh mr mw me ms
查看OSW日志
swap剩余空间逐渐减少
zzz ***Wed Dec 6 17:00:31 CST 2023 SwapFree: 30736320 kB zzz ***Wed Dec 6 17:01:31 CST 2023 SwapFree: 30101440 kB zzz ***Wed Dec 6 17:02:32 CST 2023 SwapFree: 29282240 kB zzz ***Wed Dec 6 17:03:32 CST 2023 SwapFree: 28573632 kB zzz ***Wed Dec 6 17:04:33 CST 2023 SwapFree: 28237760 kB ...... zzz ***Wed Dec 6 17:57:01 CST 2023 SwapFree: 12471104 kB zzz ***Wed Dec 6 17:58:02 CST 2023 SwapFree: 12471104 kB zzz ***Wed Dec 6 17:59:02 CST 2023 SwapFree: 12469440 kB
数据库进程内存RSS逐渐减少
zzz ***Wed Dec 6 17:00:31 CST 2023 USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 2349619 2349616 19 0.0 17.1 68775488 45772288 ep_pol S May 26 02:34:41 postgres: checkpointer USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND zzz ***Wed Dec 6 17:01:31 CST 2023 USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 2349619 2349616 19 0.0 16.9 68775488 45218752 ep_pol S May 26 02:34:41 postgres: checkpointer USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND zzz ***Wed Dec 6 17:02:32 CST 2023 USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 2349619 2349616 19 0.0 16.7 68775488 44514304 ep_pol S May 26 02:34:41 postgres: checkpointer USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND zzz ***Wed Dec 6 17:03:32 CST 2023 USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 2349619 2349616 19 0.0 16.4 68775488 43899200 ep_pol S May 26 02:34:41 postgres: checkpointer ...... zzz ***Wed Dec 6 17:57:01 CST 2023 USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 2349619 2349616 19 0.0 10.6 68775488 28510848 ep_pol S May 26 02:34:43 postgres: checkpointer USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND zzz ***Wed Dec 6 17:58:02 CST 2023 USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 2349619 2349616 19 0.0 10.6 68775488 28510848 ep_pol S May 26 02:34:43 postgres: checkpointer USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND zzz ***Wed Dec 6 17:59:02 CST 2023 USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND postgres 2349619 2349616 19 0.0 10.6 68775488 28509440 ep_pol S May 26 02:34:43 postgres: checkpointer USER PID PPID PRI %CPU %MEM VSZ RSS WCHAN S STARTED TIME COMMAND
根据osw日志记录可以看到内存是逐渐交换出去的,但当时buff/cache还有很多,且内存使用率无波动,暂时不清楚内存交换的原因。
该库为备节点,基本没有会话,所以猜测数据库占用的内存,在长时间不活跃后由于某种内存管理机制被交换出去。
于是尝试让不活跃的内存重新活跃起来,找到一个40G的表,查询并统计行数。
List of relations
Schema | Name | Type | Owner | Size | Description
--------+-------------+-------+-------+-------+-------------
iom | table_name | table | | 44 GB |
(1 row)
database=# select count(*) from table_name ;
count
-----------
104062707
(1 row)
SQL执行的过程中swap used就逐渐减小,结果swap used从50减小到了17,swap空间释放
[postgres]$ free -g
total used free shared buff/cache available
Mem: 254 15 1 95 237 141
Swap: 63 17 46
继续使用该方法查询其他表,swap逐渐释放
参考文档
「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。




