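When a process on the system dumps core, you usually need to analyze the core file to locate the problem. In practice, however, the core file is often never generated, cannot be found, or has been automatically deleted or truncated by the system. How core files are generated and where they are stored can both be controlled through system configuration.
I. Two coredump mechanisms
Method 1: generating core dumps with systemd-coredump
1. Check the status of the systemd-coredump@0.service service: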
[root@dbhost01 ~]# systemctl status systemd-coredump@0.service
● systemd-coredump@0.service - Process Core Dump
Loaded: loaded (/usr/lib/systemd/system/systemd-coredump@.service; static; vendor preset: disabled)
Active: inactive (dead) since Thu 2024-01-11 15:22:30 CST; 3min 41s ago
Docs: man:systemd-coredump(8)
Process: 4749 ExecStart=/usr/lib/systemd/systemd-coredump (code=exited, status=1/FAILURE)
Main PID: 4749 (code=exited, status=1/FAILURE)
2. Edit /etc/systemd/coredump.conf
The coredump.conf file lists the default values, all commented out.
[Coredump]
#Storage=external
#Compress=yes
#ProcessSizeMax=2G
#ExternalSizeMax=2G
#JournalSizeMax=767M
#MaxUse=
#KeepFree=
ExternalSizeMax=2G means that a core file stored on disk may be at most 2G. After changing the file, run: systemctl daemon-reload
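One way to change these values without touching the main file is a drop-in under /etc/systemd/coredump.conf.d/, which systemd reads in addition to coredump.conf. The sketch below is only an illustration: the helper name write_coredump_dropin, the file name 10-size.conf, and the 8G value are assumptions, not settings taken from this article.

```shell
#!/bin/sh
# write_coredump_dropin DIR SIZE   (hypothetical helper)
# Writes a drop-in that raises both size limits to SIZE.
write_coredump_dropin() {
    dir=$1
    size=$2
    mkdir -p "$dir" || return 1
    # ProcessSizeMax is raised too: a core larger than it is not processed at all.
    cat > "$dir/10-size.conf" <<EOF
[Coredump]
Storage=external
ProcessSizeMax=$size
ExternalSizeMax=$size
EOF
}

# Real use (as root), followed by the reload step mentioned above:
#   write_coredump_dropin /etc/systemd/coredump.conf.d 8G
#   systemctl daemon-reload
```

Passing the directory as an argument makes it possible to try the helper against a scratch directory before writing to /etc.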
Core files generated through systemd-coredump can be listed and exported with coredumpctl.
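A typical coredumpctl session might look like the sketch below. The list, info, and dump subcommands and the -o option are standard; the helper name export_core, the PID 5070, and the output path are assumptions for illustration.

```shell
#!/bin/sh
# export_core PID OUTFILE   (hypothetical helper)
# Shows the metadata of the crash recorded for PID, then writes its core to OUTFILE.
export_core() {
    pid=$1
    out=$2
    if ! command -v coredumpctl >/dev/null 2>&1; then
        echo "coredumpctl not found; is systemd-coredump installed?" >&2
        return 127
    fi
    coredumpctl info "$pid" || return 1    # signal, executable, storage path
    coredumpctl -o "$out" dump "$pid"      # extract the core so gdb can open it
}

# List all recorded crashes first, then export one of them by PID:
#   coredumpctl list
#   export_core 5070 /tmp/core.5070
```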
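Method 2: the abrtd service
1. The background abrtd service generates core files for crashing processes. It is configured through /etc/abrt/ and /proc/sys/kernel/core_pattern, and it is not subject to ulimit. abrtd produces the core file by way of abrt-hook-ccpp, the handler configured in core_pattern.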
[root@dbhost01 systemd]# systemctl status abrtd
● abrtd.service - ABRT Automated Bug Reporting Tool
Loaded: loaded (/usr/lib/systemd/system/abrtd.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2024-01-11 15:22:19 CST; 1h 17min ago
Process: 1175 ExecStart=/usr/sbin/abrtd -d -s (code=exited, status=0/SUCCESS)
Main PID: 1175 (code=exited, status=0/SUCCESS)
Jan 11 15:02:45 dbhost01 systemd[1]: Starting ABRT Automated Bug Reporting Tool...
Jan 11 15:02:45 dbhost01 systemd[1]: Started ABRT Automated Bug Reporting Tool.
Jan 11 15:22:19 dbhost01 systemd[1]: Stopping ABRT Automated Bug Reporting Tool...
Jan 11 15:22:19 dbhost01 systemd[1]: abrtd.service: Succeeded.
Jan 11 15:22:19 dbhost01 systemd[1]: Stopped ABRT Automated Bug Reporting Tool.
2. The abrt-ccpp hook. To use this mechanism, you only need to stop the abrtd service. Note that this approach is subject to ulimit, so make sure ulimit -c is unlimited.
[root@dbhost01 systemd]# systemctl status abrt-ccpp
● abrt-ccpp.service - Install ABRT coredump hook
Loaded: loaded (/usr/lib/systemd/system/abrt-ccpp.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Thu 2024-01-11 15:22:19 CST; 1h 17min ago
Process: 4738 ExecStop=/usr/sbin/abrt-install-ccpp-hook uninstall (code=exited, status=0/SUCCESS)
Process: 1265 ExecStart=/usr/sbin/abrt-install-ccpp-hook install (code=exited, status=0/SUCCESS)
Main PID: 1265 (code=exited, status=0/SUCCESS)
Jan 11 15:02:45 dbhost01 systemd[1]: Starting Install ABRT coredump hook...
Jan 11 15:02:46 dbhost01 systemd[1]: Started Install ABRT coredump hook.
Jan 11 15:22:19 dbhost01 systemd[1]: Stopping Install ABRT coredump hook...
Jan 11 15:22:19 dbhost01 systemd[1]: abrt-ccpp.service: Succeeded.
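Jan 11 15:22:19 dbhost01 systemd[1]: Stopped Install ABRT coredump hook.
II. Core files on Red Hat 7
Red Hat 7 uses the abrtd (ABRT, Automated Bug Reporting Tool, daemon) service, and core files end up in /var/spool/abrt/ccpp* directories. The path and name format of generated core files can be configured with the following parameters:
kernel.core_pattern = /tmp/core_%e_%p
kernel.core_uses_pid = 0
This setting writes core files to /tmp/; %e stands for the program name and %p for the process ID. core_uses_pid controls whether the PID is appended to the core file name. To have the core written to the current working directory of the crashing process (often, but not necessarily, the directory of the executable), leave out the directory part:
kernel.core_pattern = core_%e_%p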
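To preview the file name a pattern will produce before applying it, the substitution can be imitated in the shell. This is a rough sketch that handles only %e and %p; the helper name expand_core_pattern and the sample values kingbase and 5070 are assumptions for illustration.

```shell
#!/bin/sh
# expand_core_pattern PATTERN EXE PID   (hypothetical helper)
# Imitates the kernel's substitution of %e and %p; other specifiers are left as-is.
# EXE must not contain "/" (the kernel's %e is a bare command name anyway).
expand_core_pattern() {
    printf '%s\n' "$1" | sed -e "s/%e/$2/g" -e "s/%p/$3/g"
}

expand_core_pattern '/tmp/core_%e_%p' kingbase 5070
# prints: /tmp/core_kingbase_5070

# Applying the pattern for real (as root); add both keys to /etc/sysctl.conf to persist:
#   sysctl -w kernel.core_pattern='/tmp/core_%e_%p'
#   sysctl -w kernel.core_uses_pid=0
```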
The system default is as follows:
[root@dbhost03 systemd]# sysctl -n kernel.core_pattern
|/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h
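Because the value of core_pattern decides which of the mechanisms described here handles a crash, it can help to classify it before looking for a core file. The following is a minimal sketch; the helper name classify_core_pattern and its output labels are made up for this example.

```shell
#!/bin/sh
# classify_core_pattern PATTERN   (hypothetical helper)
# A leading "|" means the kernel pipes the dump to a program instead of writing a file.
classify_core_pattern() {
    case $1 in
        "|"*systemd-coredump*) echo "systemd-coredump" ;;
        "|"*abrt-hook-ccpp*)   echo "abrt" ;;
        "|"*)                  echo "other pipe handler" ;;
        *)                     echo "plain file" ;;
    esac
}

# On a live system, classify the current setting if it is readable.
if [ -r /proc/sys/kernel/core_pattern ]; then
    classify_core_pattern "$(cat /proc/sys/kernel/core_pattern)"
fi
```

With the default shown above the helper reports abrt, so the core should be looked for under /var/spool/abrt rather than in the working directory.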
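By default abrtd stores core files under /var/spool/abrt; the location is determined by the configuration in /etc/abrt/abrt.conf.
By default, a core generated by a non-root user's own program is deleted automatically by the system. The messages log usually contains entries like the following, which show that the problem directory is removed because the executable does not belong to any package: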
Jul 24 15:50:22 dbhost03 abrt-hook-ccpp: Process 5070 (kingbase) of user 1001 killed by SIGSEGV - dumping core
Jul 24 15:50:25 dbhost03 abrt-server: Executable '/opt/Kingbase/ES/V8/Server/bin/kingbase' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Jul 24 15:50:25 dbhost03 abrt-server: 'post-create' on '/var/spool/abrt/ccpp-2021-07-24-15:50:22-5070' exited with 1
Jul 24 15:50:25 dbhost03 abrt-server: Deleting problem directory '/var/spool/abrt/ccpp-2021-07-24-15:50:22-5070'
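To keep the core from being deleted, edit /etc/abrt/abrt-action-save-package-data.conf as follows and restart the abrtd service.
OpenGPGCheck = no
ProcessUnpackaged = yes
ProcessUnpackaged = yes is needed because executables we build ourselves are usually just copied onto the host and do not belong to any package (rpm), so abrt does not process them by default. To make sure large core files fit, also adjust the following setting in /etc/abrt/abrt.conf:
# Max size for crash storage [MiB] or 0 for unlimited
# MaxCrashReportsSize = 1000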
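These edits can be scripted. The sketch below assumes the simple "Key = value" layout shown above; the helper name set_conf_option is hypothetical, and it rewrites every line that assigns the key (commented out or not), so review the result before restarting abrtd.

```shell
#!/bin/sh
# set_conf_option FILE KEY VALUE   (hypothetical helper)
# Replaces an existing "KEY = ..." line, commented out or not, or appends a new one.
# VALUE must not contain "|", which is used as the sed delimiter.
set_conf_option() {
    file=$1
    key=$2
    value=$3
    if grep -Eq "^[#[:space:]]*${key}[[:space:]]*=" "$file"; then
        tmp=$(mktemp) || return 1
        sed -E "s|^[#[:space:]]*${key}[[:space:]]*=.*|${key} = ${value}|" "$file" > "$tmp" &&
            cat "$tmp" > "$file"     # overwrite in place, keeping owner and mode
        rm -f "$tmp"
    else
        printf '%s = %s\n' "$key" "$value" >> "$file"
    fi
}

# Real use (as root); 0 means unlimited according to the comment in abrt.conf:
#   set_conf_option /etc/abrt/abrt-action-save-package-data.conf OpenGPGCheck no
#   set_conf_option /etc/abrt/abrt-action-save-package-data.conf ProcessUnpackaged yes
#   set_conf_option /etc/abrt/abrt.conf MaxCrashReportsSize 0
#   systemctl restart abrtd
```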
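III. Why core files get truncated
1. Limits set too low
Pay attention to two of the limits below (highlighted in red in the original post; most likely core file size, -c, and file size, -f, since both cap the size of a file a process may write). If either value is too small, the core file may be truncated.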
[kingbase@dbhost03 tns]$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 18501
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 4096
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
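A quick check of the two limits that can cap a core file, core file size (-c) and file size (-f), might look like this. The helper name describe_limit and its messages are made up for this sketch; run it as the user that owns the crashing process, since limits are per user and per session.

```shell
#!/bin/sh
# describe_limit NAME VALUE   (hypothetical helper)
# Turns a raw ulimit value into a short verdict.
describe_limit() {
    case $2 in
        unlimited) echo "$1: ok (unlimited)" ;;
        0)         echo "$1: limit is 0, nothing can be written" ;;
        *)         echo "$1: capped at $2 blocks, larger cores are truncated" ;;
    esac
}

describe_limit "core file size (-c)" "$(ulimit -c)"
describe_limit "file size (-f)" "$(ulimit -f)"

# Raise the soft core limit for the current shell and its children:
#   ulimit -c unlimited
# To persist it for one user, an /etc/security/limits.conf entry such as
#   kingbase  soft  core  unlimited
# is one option (the user name is only an example).
```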
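2. Problems with the coredump.conf configuration
If core files are generated through systemd-coredump@0.service, pay attention to the following parameter in coredump.conf:
ExternalSizeMax=2G
IV. A simple way to analyze a core file
1. Determine which program generated the core file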
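One common way to do this is the file utility, which for an ELF core normally reports the command line of the process that produced it. The sketch below assumes file is installed; the helper name core_origin and the sample path are hypothetical.

```shell
#!/bin/sh
# core_origin CORE   (hypothetical helper)
# Prints the description that file(1) gives for CORE, or fails if CORE is not an ELF core.
core_origin() {
    core=$1
    desc=$(file -b "$core") || return 1
    case $desc in
        *"core file"*) printf '%s\n' "$desc" ;;   # usually includes "from '<command>'"
        *) echo "not an ELF core file: $core" >&2
           return 1 ;;
    esac
}

# Example (the path is an assumption):
#   core_origin /tmp/core_kingbase_5070
```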

2. Analyze the core file with gdb
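A non-interactive first pass with gdb could look like the sketch below, which prints the thread list and a backtrace of every thread. The helper name core_backtrace and the sample paths are assumptions; the executable must be the same build that crashed, ideally with debug symbols available, or the backtrace will show little more than addresses.

```shell
#!/bin/sh
# core_backtrace EXE CORE   (hypothetical helper)
# Runs gdb in batch mode against EXE and CORE and prints all thread backtraces.
core_backtrace() {
    exe=$1
    core=$2
    if [ ! -r "$exe" ] || [ ! -r "$core" ]; then
        echo "usage: core_backtrace EXE CORE (both must be readable)" >&2
        return 2
    fi
    if ! command -v gdb >/dev/null 2>&1; then
        echo "gdb not found" >&2
        return 127
    fi
    gdb -batch -ex "info threads" -ex "thread apply all bt" "$exe" "$core"
}

# Example (the core path is an assumption):
#   core_backtrace /opt/Kingbase/ES/V8/Server/bin/kingbase /tmp/core_kingbase_5070
```

For interactive work, gdb EXE CORE opens the same session, where bt, frame N, and info locals are the usual next steps.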





