
kernel-debuginfo
kernel-debuginfo-common
版本信息
| 系统版本 | CentOS Linux release 7.4.1708 |
| 内核版本 | 3.10.0-693.2.2.el7.x86_64 |
| docker版本 | 18.06.3-ce |
| Kubernetes | v1.14.3 |
进入crash控制台
kdump生成的宕机日志存放在/var/crash/宕机时间/目录
[root@hdc4 ~]# cd /var/crash/127.0.0.1-2020-08-15-09\:45\:48/
[root@hdc4 127.0.0.1-2020-08-15-09:45:48]# ls -lrt
total 1439096
-rw-r--r-- 1 root root 1062743 Aug 15 09:45 vmcore-dmesg.txt
-rw------- 1 root root 1472563009 Aug 15 09:46 vmcore
[root@hdc4 127.0.0.1-2020-08-15-09:45:48]#
使用crash工具打开vmcore文件
[root@hdc4 127.0.0.1-2020-08-15-09:45:48]# crash vmcore /usr/lib/debug/lib/modules/3.10.0-693.2.1.el7.x86_64/vmlinux
crash 7.2.3-10.el7
Copyright (C) 2002-2017 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
WARNING: kernel version inconsistency between vmlinux and dumpfile
KERNEL: /usr/lib/debug/lib/modules/3.10.0-693.2.1.el7.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 4
DATE: Sat Aug 15 09:45:43 2020
UPTIME: 766 days, 18:41:26
LOAD AVERAGE: 0.57, 0.51, 0.78
TASKS: 2225
NODENAME: hdc4
RELEASE: 3.10.0-693.2.2.el7.x86_64
VERSION: #1 SMP Tue Sep 12 22:26:13 UTC 2017
MACHINE: x86_64 (2494 Mhz)
MEMORY: 16 GB
PANIC: "kernel BUG at lib/idr.c:1157!"
PID: 18614
COMMAND: "kworker/3:94"
TASK: ffff88015cb20fd0 [THREAD_INFO: ffff88015ae30000]
CPU: 3
STATE: TASK_RUNNING (PANIC)
crash>
PANIC为内核崩溃类型,可以看出这是内核的BUG,由kworker/3:94导致的,kworker即’Linux kernel doing work’(系统调用,processing system calls),它是内核工作线程的’占位符’进程,它实际上执行内核的大部分工作,如中断、计时器、I/O等
继续使用bt命令查看系统崩溃前内核在何处崩溃
crash> bt
PID: 18614 TASK: ffff88015cb20fd0 CPU: 3 COMMAND: "kworker/3:94"
#0 [ffff88015ae33a50] machine_kexec at ffffffff8105c4cb
#1 [ffff88015ae33ab0] __crash_kexec at ffffffff81104a32
#2 [ffff88015ae33b80] crash_kexec at ffffffff81104b20
#3 [ffff88015ae33b98] oops_end at ffffffff816ad2b8
#4 [ffff88015ae33bc0] die at ffffffff8102e97b
#5 [ffff88015ae33bf0] do_trap at ffffffff816aca00
#6 [ffff88015ae33c40] do_invalid_op at ffffffff8102b184
#7 [ffff88015ae33cf0] invalid_op at ffffffff816b689e
[exception RIP: ida_simple_remove+65]
RIP: ffffffff81324031 RSP: ffff88015ae33da0 RFLAGS: 00010286
RAX: ffff8800743e1000 RBX: 00000000ffffffff RCX: 0000000000002fff
RDX: 0000000000002fff RSI: 00000000ffffffff RDI: ffffffff81fa5ca0
RBP: ffff88015ae33db8 R8: 0000000000000000 R9: 000000000006edf0
R10: 000000000000ba68 R11: 000000000006edd1 R12: 0000000000000400
R13: ffff88043fe1d6f0 R14: 0000000000000001 R15: 0000000000000004
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
#8 [ffff88015ae33dc0] __mem_cgroup_free at ffffffff811f0654
#9 [ffff88015ae33e10] free_work at ffffffff811f0685
#10 [ffff88015ae33e20] process_one_work at ffffffff810a881a
#11 [ffff88015ae33e68] worker_thread at ffffffff810a94e6
#12 [ffff88015ae33ec8] kthread at ffffffff810b098f
#13 [ffff88015ae33f50] ret_from_fork at ffffffff816b4f58
crash>
IDA和IDR在linux内核中是整数ID管理机制,IDA是一种基于Radix的ID分配机制,IDA是对IDR的封装,IDA不需要记录指针,仅仅是分配与管理ID
使用dis命令来看一下该地址的反汇编结果
crash> dis -l ffffffff81324031
/usr/src/debug/kernel-3.10.0-693.2.1.el7/linux-3.10.0-693.2.1.el7.x86_64/lib/idr.c: 1157
0xffffffff81324031 <ida_simple_remove+65>: ud2
crash>
从上面可以看到内核在idr.c文件1157行代码崩溃,打开linux源码的相应位置,定位到1157行
1153 void ida_simple_remove(struct ida *ida, unsigned int id)
1154 {
1155 unsigned long flags;
1156
1157 BUG_ON((int)id < 0);
1158 spin_lock_irqsave(&simple_ida_lock, flags);
1159 ida_remove(ida, id);
1160 spin_unlock_irqrestore(&simple_ida_lock, flags);
1161 }
1162 EXPORT_SYMBOL(ida_simple_remove);
通过翻阅内核文档,ida_simple_remove() 函数是用于释放一个ID。参数ida指向IDA;参数 id 指向需要释放的 ID,函数直接调用ida_remove() 函数进行ID释放。在使用Kubernetes集群托管docker的系统上,系统都在同一集群中,并且有活动的Docker容器正在运行。每次出现恐慌都是由于kmemcg_id值为-1(0xffffffff)
继续所以log命令查看日志
[66249122.657112] docker0: port 18(veth455b303) entered blocking state
[66249122.657115] docker0: port 18(veth455b303) entered forwarding state
[66249123.178811] docker-runc: page allocation failure: order:6, mode:0x10c0d0
[66249123.178817] CPU: 3 PID: 20628 Comm: docker-runc Tainted: G W OE ------------ T 3.10.0-693.2.2.el7.x86_64 #1
[66249123.178818] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[66249123.178820] 000000000010c0d0 00000000eefb6afd ffff88015c307a98 ffffffff816a3db1
[66249123.178823] ffff88015c307b28 ffffffff81188810 0000000000000000 ffff88043ffd9000
[66249123.178825] 0000000000000006 000000000010c0d0 ffff88015c307b28 00000000eefb6afd
[66249123.178827] Call Trace:
[66249123.178837] [<ffffffff816a3db1>] dump_stack+0x19/0x1b
[66249123.178844] [<ffffffff81188810>] warn_alloc_failed+0x110/0x180
[66249123.178846] [<ffffffff8169fd8a>] __alloc_pages_slowpath+0x6b6/0x724
[66249123.178857] [<ffffffff8118cd85>] __alloc_pages_nodemask+0x405/0x420
[66249123.178861] [<ffffffff811d1108>] alloc_pages_current+0x98/0x110
[66249123.178863] [<ffffffff8118760e>] __get_free_pages+0xe/0x40
[66249123.178867] [<ffffffff811dcabe>] kmalloc_order_trace+0x2e/0xa0
[66249123.178870] [<ffffffff811e0651>] __kmalloc+0x211/0x230
[66249123.178873] [<ffffffff811f5d6d>] memcg_update_cache_size+0x5d/0xc0
[66249123.178877] [<ffffffff811a6b84>] memcg_update_all_caches+0x54/0x90
[66249123.178879] [<ffffffff811f5641>] memcg_update_cache_sizes+0x41/0xc0
[66249123.178880] [<ffffffff811f5c8f>] mem_cgroup_css_online+0x1cf/0x210
[66249123.178885] [<ffffffff8110a3bf>] online_css.isra.9+0x1f/0x50
[66249123.178887] [<ffffffff8110e4e7>] cgroup_mkdir+0x527/0x630
[66249123.178890] [<ffffffff8120d0d7>] vfs_mkdir+0xb7/0x160
[66249123.178893] [<ffffffff8121346a>] SyS_mkdirat+0xca/0x100
[66249123.178896] [<ffffffff816b5009>] system_call_fastpath+0x16/0x1b
[66249123.178898] Mem-Info:
[66249123.178901] active_anon:1687418 inactive_anon:3334 isolated_anon:0
active_file:611690 inactive_file:347074 isolated_file:0
unevictable:0 dirty:374 writeback:0 unstable:0
slab_reclaimable:754737 slab_unreclaimable:450917
mapped:55731 shmem:8109 pagetables:8507 bounce:0
free:138442 free_pcp:0 free_cma:0
[66249123.178904] Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[66249123.178908] lowmem_reserve[]: 0 2814 15869 15869
[66249123.178910] Node 0 DMA32 free:64312kB min:11976kB low:14968kB high:17964kB active_anon:862264kB inactive_anon:328kB active_file:168kB inactive_file:160kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3129216kB managed:2884232kB mlocked:0kB dirty:12kB writeback:0kB mapped:1036kB shmem:1492kB slab_reclaimable:1508624kB slab_unreclaimable:383868kB kernel_stack:9680kB pagetables:4512kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[66249123.178914] lowmem_reserve[]: 0 0 13054 13054
[66249123.178916] Node 0 Normal free:473548kB min:55540kB low:69424kB high:83308kB active_anon:5887408kB inactive_anon:13008kB active_file:2446592kB inactive_file:1388136kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13631488kB managed:13367588kB mlocked:0kB dirty:1484kB writeback:0kB mapped:221888kB shmem:30944kB slab_reclaimable:1510324kB slab_unreclaimable:1419800kB kernel_stack:26096kB pagetables:29516kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[66249123.178919] lowmem_reserve[]: 0 0 0 0
[66249123.178921] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15908kB
[66249123.178929] Node 0 DMA32: 1501*4kB (UEM) 498*8kB (UEM) 271*16kB (UEM) 265*32kB (UEM) 154*64kB (UEM) 93*128kB (UEM) 10*256kB (M) 6*512kB (M) 2*1024kB (EM) 6*2048kB (M) 0*4096kB = 64532kB
[66249123.178938] Node 0 Normal: 8454*4kB (UEM) 6702*8kB (UEM) 5452*16kB (UEM) 3852*32kB (UEM) 1960*64kB (UEM) 389*128kB (UEM) 3*256kB (M) 0*512kB 0*1024kB 0*2048kB 0*4096kB = 473928kB
[66249123.178946] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB
[66249123.178947] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
[66249123.178948] 966873 total pagecache pages
[66249123.178950] 0 pages in swap cache
[66249123.178951] Swap cache stats: add 0, delete 0, find 0/0
[66249123.178952] Free swap = 0kB
[66249123.178952] Total swap = 0kB
[66249123.178953] 4194174 pages RAM
[66249123.178954] 0 pages HighMem/MovableOnly
[66249123.178954] 127242 pages reserved
[66249123.187530] ------------[ cut here ]------------
[66249123.188383] kernel BUG at lib/idr.c:1157!
[66249123.188982] invalid opcode: 0000 [#1] SMP
[66249123.189629] Modules linked in: AliSecGuard(OE) nfnetlink_queue nfnetlink_log bluetooth rfkill binfmt_misc rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace sunrpc fscache vfat fat isofs xfs rbd libceph dns_resolver xt_set ip_set_hash_ipportnet ip_set_bitmap_port ip_set_hash_ipport ip_set_hash_ipportip ip_set dummy ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs xt_comment xt_mark vxlan ip6_udp_tunnel udp_tunnel xt_nat veth udp_diag ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack libcrc32c br_netfilter bridge stp llc overlay(T) tcp_diag inet_diag sb_edac edac_core iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul ppdev glue_helper ablk_helper cryptd joydev virtio_balloon
[66249123.200801] parport_pc i2c_piix4 parport pcspkr ip_tables ext4 mbcache jbd2 ata_generic pata_acpi virtio_net virtio_console virtio_blk cirrus drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crct10dif_pclmul crct10dif_common ata_piix crc32c_intel libata serio_raw virtio_pci virtio_ring i2c_core virtio floppy
[66249123.205547] CPU: 3 PID: 18614 Comm: kworker/3:94 Tainted: G W OE ------------ T 3.10.0-693.2.2.el7.x86_64 #1
[66249123.207152] Hardware name: Alibaba Cloud Alibaba Cloud ECS, BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
[66249123.210722] Workqueue: events free_work
[66249123.212525] task: ffff88015cb20fd0 ti: ffff88015ae30000 task.ti: ffff88015ae30000
[66249123.214131] RIP: 0010:[<ffffffff81324031>] [<ffffffff81324031>] ida_simple_remove+0x41/0x50
[66249123.215720] RSP: 0018:ffff88015ae33da0 EFLAGS: 00010286
[66249123.217024] RAX: ffff8800743e1000 RBX: 00000000ffffffff RCX: 0000000000002fff
[66249123.218537] RDX: 0000000000002fff RSI: 00000000ffffffff RDI: ffffffff81fa5ca0
[66249123.220093] RBP: ffff88015ae33db8 R08: 0000000000000000 R09: 000000000006edf0
[66249123.221628] R10: 000000000000ba68 R11: 000000000006edd1 R12: 0000000000000400
[66249123.223171] R13: ffff88043fe1d6f0 R14: 0000000000000001 R15: 0000000000000004
[66249123.224705] FS: 0000000000000000(0000) GS:ffff88043fd80000(0000) knlGS:0000000000000000
[66249123.226364] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[66249123.228062] CR2: 00007efe3c32d000 CR3: 00000001680b6000 CR4: 00000000003406e0
[66249123.229683] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[66249123.231302] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[66249123.232945] Stack:
[66249123.234144] ffff8800743e1000 0000000000000400 ffff88043fe1d6f0 ffff88015ae33e08
[66249123.235924] ffffffff811f0654 00000000000003d8 0000040038ae5980 ffff8800743e1000
[66249123.237708] ffff8800743e1100 ffff88013cca5f00 ffff88043fd96540 ffff88043fd9a700
[66249123.239493] Call Trace:
[66249123.240807] [<ffffffff811f0654>] __mem_cgroup_free+0x224/0x240
[66249123.242479] [<ffffffff811f0685>] free_work+0x15/0x20
[66249123.244040] [<ffffffff810a881a>] process_one_work+0x17a/0x440
[66249123.245684] [<ffffffff810a94e6>] worker_thread+0x126/0x3c0
[66249123.247315] [<ffffffff810a93c0>] ? manage_workers.isra.24+0x2a0/0x2a0
[66249123.249154] [<ffffffff810b098f>] kthread+0xcf/0xe0
[66249123.250753] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[66249123.252471] [<ffffffff816b4f58>] ret_from_fork+0x58/0x90
[66249123.254149] [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40
[66249123.255911] Code: 7a fc 81 e8 b2 7b 38 00 89 de 49 89 c5 4c 89 e7 e8 65 fe ff ff 4c 89 ee 48 c7 c7 b8 7a fc 81 e8 26 78 38 00 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 41 57
[66249123.261000] RIP [<ffffffff81324031>] ida_simple_remove+0x41/0x50
[66249123.262906] RSP <ffff88015ae33da0>
Kubernetes会为每个pod创建netns来隔离network namespace,内核初始化netns时会为其创建 nf_conntrack 表的 cache,需要申请大页内存,如果此时系统内存已经碎片化,无法分配到足够的大页内存就会报"page allocation failure" 错误。
这里是docker-runc
[66249123.178811] docker-runc: page allocation failure: order:6, mode:0x10c0d0
实际上docker容器启动的过程中,docker-runc进程是作为 docker-containerd-shim 的子进程存在的。docker-runc进程根据配置找到容器的 rootfs 并创建子进程 bash 作为容器中的第一个进程,当这一切都完成后 docker-runc 进程退出,然后容器进程 bash 由 docker-runc 的父进程 docker-containerd-shim 接管。但是docker-runc 申请不到大页内存,容器内的进程无法启动,异常退出,而Kubernetes始终保持最小的副本集,会删除或重启上一次创建的pod,但上一次创建的pod因为申请不到大页内存,所以它的ID不是>0,从而触发内核BUG。
解决办法
社区已经在内核3.10.0-1048.el7版本中修复了该BUG




