
Received a disk space alert: disk usage on the mq server had reached 96%.
Logged in to the server and checked with df -Hl:
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs 7.7G 303M 7.4G 4% /dev/shm
tmpfs 7.7G 504K 7.7G 1% /run
tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/vda1 40G 39G 1.8G 96% /
/dev/vdb1 196G 123G 64G 66% /data
tmpfs 1.6G 0 1.6G 0% /run/user/1000
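This matches the alert. The next step is to find the directory or file that is filling the disk.
From the root directory, du -shc * lists the space used by each top-level directory: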
0 bin
212M boot
127G data
307M dev
23M etc
26M home
0 lib
0 lib64
0 media
0 mnt
51M opt
du: cannot access 'proc/506813/task/506813/fd/4': No such file or directory
du: cannot access 'proc/506813/task/506813/fdinfo/4': No such file or directory
du: cannot access 'proc/506813/fd/4': No such file or directory
du: cannot access 'proc/506813/fdinfo/4': No such file or directory
0 proc
5.1M root
500K run
0 sbin
0 srv
0 sys
204K tmp
3.7G usr
555M var
132G total
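Then repeat the same approach inside whichever directory stands out.
A somewhat more efficient approach is du's -d option (long form --max-depth), which sets how many directory levels to report. As the depth grows the output gets long, so filter it with grep:
# Anchor the pattern to the size column and use sort -h so human-readable sizes order correctly
du -h -d 2 | grep -E '^[0-9.]+[GT]' | sort -hr
du -h --max-depth=2 | grep -E '^[0-9.]+[GT]' | sort -hr
This finds the large directories whose size is reported in G or T and sorts them, largest first.
Alternatively, search for large files with find: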
find / -type f -size +1G -exec du -h {} \;
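Since /data sits on a separate device (/dev/vdb1), a scan for what fills / can be kept on a single filesystem. The sketch below is one possible variant of the find command above. The function name big_files and its two arguments are made up for illustration, and it assumes GNU find, du, and sort.

```shell
# big_files DIR MIN_SIZE
# List regular files under DIR larger than MIN_SIZE (find -size syntax, e.g. 1G),
# without crossing into other mounted filesystems, largest first.
# Hypothetical helper; assumes GNU find, du and sort.
big_files() {
    find "$1" -xdev -type f -size "+$2" -exec du -h {} + 2>/dev/null | sort -hr
}
```

For example, big_files / 1G would look only at the root filesystem and skip /data.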
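After a long time searching with find and du, the totals they reported were still far below the usage shown by df.
df reported 39G used, yet du -hs from the root directory added up to only about 8G on the root filesystem, and there were no hidden directories. So what was eating the space?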
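The mismatch can be put into a number by subtracting what du can see from what df reports for the same filesystem. This is a minimal sketch: fs_gap_kib is a hypothetical helper name, and the result is only meaningful when the argument is the mount point itself (for example /) and du is allowed to read everything, typically as root.

```shell
# fs_gap_kib MOUNTPOINT
# Print (space used according to df) minus (space reachable by du), in KiB.
# A large positive result hints at space that no path leads to,
# such as deleted files that a process still holds open.
# Hypothetical helper, shown as a sketch.
fs_gap_kib() {
    local used seen
    used=$(df -Pk "$1" | awk 'NR == 2 { print $3 }')        # used KiB reported by df
    seen=$(du -skx "$1" 2>/dev/null | awk '{ print $1 }')   # KiB visible to du, same filesystem only
    echo $((used - seen))
}
```

Running fs_gap_kib / as root gives the size of the gap for the root filesystem.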
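Clearly, the space is being held by files that were deleted while a process still had them open: the directory entry is gone, but the blocks are not freed until the process closes the file.
lsof is a very handy tool here. The following command lists open files whose link count is below 1, that is, deleted ones: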
lsof +L1
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NLINK NODE NAME
auditd 758 root 4r REG 253,1 6406312 0 51159825 /var/lib/sss/mc/group (deleted)
lsmd 786 libstoragemgmt 3r REG 253,1 8406312 0 51209910 /var/lib/sss/mc/passwd (deleted)
polkitd 791 polkitd 3r REG 253,1 8406312 0 51209910 /var/lib/sss/mc/passwd (deleted)
polkitd 791 polkitd 4r REG 253,1 10406312 0 51159823 /var/lib/sss/mc/initgroups (deleted)
dbus-daem 796 dbus 6r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
dbus-daem 796 dbus 20r REG 253,1 10406312 0 51159630 /var/lib/sss/mc/initgroups (deleted)
sssd 803 root 6r REG 253,1 8406312 0 51209910 /var/lib/sss/mc/passwd (deleted)
sssd 803 root 16r REG 253,1 10406312 0 51159823 /var/lib/sss/mc/initgroups (deleted)
chronyd 813 chrony 5r REG 253,1 8406312 0 51209910 /var/lib/sss/mc/passwd (deleted)
sssd_be 902 root 18r REG 253,1 8406312 0 51209910 /var/lib/sss/mc/passwd (deleted)
sssd_be 902 root 19r REG 253,1 10406312 0 51159823 /var/lib/sss/mc/initgroups (deleted)
sssd_nss 917 root 17r REG 253,1 8406312 0 51209910 /var/lib/sss/mc/passwd (deleted)
NetworkMa 985 root 13r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
tuned 988 root 3r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
systemd-r 1055 systemd-resolve 4r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
sshd 1099 root 3r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
atd 1105 root 3r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
atd 1105 root 5r REG 253,1 6406312 0 51159821 /var/lib/sss/mc/group (deleted)
agetty 1117 root 3r REG 253,1 6406312 0 51159821 /var/lib/sss/mc/group (deleted)
agetty 1118 root 3r REG 253,1 6406312 0 51159821 /var/lib/sss/mc/group (deleted)
(sd-pam) 3005 bjyw 3r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
(sd-pam) 3005 bjyw 4r REG 253,1 10406312 0 51159630 /var/lib/sss/mc/initgroups (deleted)
java 32263 root 4r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
mqcon 32657 root 3r REG 253,1 8406312 0 51159820 /var/lib/sss/mc/passwd (deleted)
bash 1404205 root cwd DIR 253,17 0 0 11797301 /data/ftp/xhzw/upload/all0606 (deleted)
java 3644606 root 40w REG 253,1 23739184311 0 67112132 /usr/local/rocketmq4.5.1/logs/rocketmqlogs/store.log (deleted)
java 3644606 root 42w REG 253,1 11493353151 0 67497735 /usr/local/rocketmq4.5.1/logs/rocketmqlogs/storeerror.log (deleted)
java 3644606 root 43w REG 253,1 104857643 0 67109251 /usr/local/rocketmq4.5.1/logs/rocketmqlogs/transaction.log (deleted)
java 3644606 root 46w REG 253,1 104870086 0 67182651 /usr/local/rocketmq4.5.1/logs/rocketmqlogs/stats.log (deleted)
java 3654826 root 37w REG 253,1 822118061 0 72548665 /usr/local/rocketmq4.5.1/logs/rocketmqlogs/broker.log (deleted)
java 3654826 root 39w REG 253,1 15208787 0 67112129 /usr/local/rocketmq4.5.1/logs/rocketmqlogs/watermark.log (deleted)
aliyun-se 4148112 root 7uW REG 253,1 0 0 8580 /tmp/AliyunAssistClientSingleLock.lock (deleted)
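The output shows a deleted log file of about 23G (store.log) and another of about 11G (storeerror.log), both still held open by the java process with PID 3644606, so their space was never released. This is a very common situation.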
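With this many rows it can help to total the deleted-but-open bytes per process. The function below is a sketch that assumes the default lsof column layout shown above (COMMAND, PID, USER, FD, TYPE, DEVICE, SIZE/OFF, NLINK, NODE, NAME) and no spaces in the COMMAND or USER columns. The name sum_deleted_by_pid is made up. A file opened on several descriptors of the same process is counted once, by device and inode.

```shell
# sum_deleted_by_pid: read `lsof +L1` output on stdin and print
# "PID COMMAND BYTES" per process, largest first.
# Hypothetical helper; assumes the default lsof column order.
sum_deleted_by_pid() {
    awk '
        # Regular files with link count 0 and a plain numeric SIZE/OFF column.
        $5 == "REG" && $8 == 0 && $7 ~ /^[0-9]+$/ {
            key = $2 SUBSEP $6 SUBSEP $9      # pid + device + inode
            if (!(key in seen)) {
                seen[key] = 1
                bytes[$2] += $7
                name[$2] = $1
            }
        }
        END {
            for (pid in bytes)
                printf "%s %s %.0f\n", pid, name[pid], bytes[pid]
        }
    ' | sort -k3,3nr
}
```

A typical invocation would be lsof +L1 2>/dev/null | sum_deleted_by_pid | head, which puts the process holding the most deleted data on the first line.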
The fix is to restart mq so that it closes those file handles and the space is released.
After restarting mq, check again:
df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.7G 0 7.7G 0% /dev
tmpfs 7.7G 307M 7.4G 4% /dev/shm
tmpfs 7.7G 500K 7.7G 1% /run
tmpfs 7.7G 0 7.7G 0% /sys/fs/cgroup
/dev/vda1 40G 5.9G 35G 15% /
/dev/vdb1 196G 128G 59G 69% /data
tmpfs 1.6G 0 1.6G 0% /run/user/1000
The space has been recovered: usage on / is down from 96% to 15%.
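A restart is what released the space here. Where restarting the process is not practical, a commonly used alternative on Linux is to truncate the deleted file through the process's /proc entry, using the PID and FD number from the lsof output (an FD of 40w means descriptor 40, open for writing). This is only a sketch and was not part of the steps above: truncate_open_fd is a hypothetical helper, it needs write permission on the file (usually root), and it discards the file's contents, so it only suits logs that are no longer needed.

```shell
# truncate_open_fd PID FD
# Empty the file behind descriptor FD of process PID via /proc (Linux only).
# The process keeps its descriptor open; only the file's data is discarded.
# Hypothetical helper, shown as a sketch.
truncate_open_fd() {
    : > "/proc/$1/fd/$2"
}
```

For the store.log entry in the lsof listing this would be truncate_open_fd 3644606 40. If the writer did not open the file in append mode, later writes continue at the old offset and leave a sparse file, so the apparent size can look large again even though few blocks are allocated.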




