
The Seabox Database Stumbles into the Redhat 8 cgroup blkio Pitfall

Original, by FairyFar, 2025-01-27

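1. Problem

SeaboxMPP database operations staff reported a baffling problem at a customer site: queries from some resource groups fail with errors like the following: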

ERROR:  can't open file '/sys/fs/cgroup/blkio/scdb_seabox/6438/cgroup.procs': No such file or directory

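After the cluster starts, it runs smoothly. At some unpredictable point, however, users in certain resource groups suddenly start getting the error above (other groups remain fine), and the affected host varies. Restarting the Seabox cluster restores normal operation.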

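Following the error message, we logged in to the failing host and found that the directory /sys/fs/cgroup/blkio/scdb_seabox/6438/ was gone, while /sys/fs/cgroup/blkio/scdb_seabox/6441/ was still there. What we could not explain was that /sys/fs/cgroup/cpu/scdb_seabox/6438/ also still existed.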
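The manual check above can be scripted. The following is a minimal sketch, not part of Seabox: the function name `check_cgroup_group` and its argument order are hypothetical, and the cgroup root is passed explicitly so the function can also be pointed at a test directory.

```shell
# Report, for each listed cgroup v1 controller, whether a group directory exists.
#   $1    cgroup root, normally /sys/fs/cgroup
#   $2    group path relative to the controller, e.g. scdb_seabox/6438
#   $3... controller names to check, e.g. blkio cpu
# (check_cgroup_group is a hypothetical helper name, not a Seabox tool.)
check_cgroup_group() {
    local root="$1" group="$2"
    shift 2
    local ctrl
    for ctrl in "$@"; do
        if [ -d "$root/$ctrl/$group" ]; then
            printf '%s: present\n' "$ctrl"
        else
            printf '%s: MISSING\n' "$ctrl"
        fi
    done
}

# Example call (commented out so that sourcing this file has no side effects):
# check_cgroup_group /sys/fs/cgroup scdb_seabox/6438 blkio cpu
```

On the failing host described above, such a call would be expected to report blkio as missing and cpu as present for group 6438.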

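Naturally, the first suspect was a bug in the Seabox database itself, but a static walkthrough of the code turned up nothing.

Further discussion with the on-site operations engineer revealed that this cluster had been deployed only recently, configured to match an existing cluster; the difference was that the new cluster runs on openEuler 20.03. Attention therefore shifted to the operating system, with the suspicion that the domestic OS distribution had been heavily customized. The R&D team tried to reproduce the problem internally on openEuler 20.03 but, unfortunately, could not. Meanwhile, a search of online resources for similar reports unearthed a cgroup v1 bug.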

2. Root Cause

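Investigation showed that on Redhat/CentOS 8, openEuler 20.03 through 23.03, and similar operating system releases, the cgroup v1 blkio and device subsystems have a bug: user-created hierarchies under the blkio and device subsystems are deleted by the system under certain conditions.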

The bug has already been reported to Redhat. As of publication (2025-01-27) there has been no reply.

Commands related to systemctl can trigger the problem, for example systemctl daemon-reload and systemctl enable.

3. Reproduction case

Reproducing the problem on openEuler 23.03:

[root@seabox ~]# mkdir /sys/fs/cgroup/blkio/seabox
[root@seabox ~]# mkdir /sys/fs/cgroup/cpu/seabox
[root@seabox ~]# systemctl daemon-reload
[root@seabox ~]# ls /sys/fs/cgroup/blkio/seabox
ls: cannot access '/sys/fs/cgroup/blkio/seabox': No such file or directory
[root@seabox ~]# ls /sys/fs/cgroup/cpu/seabox
……

Tracing the systemd process with strace shows how the blkio subdirectory gets deleted:

1     1691148727.037709 newfstatat(AT_FDCWD, "/sys/fs/cgroup/blkio", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000007>
1     1691148727.037737 openat(AT_FDCWD, "/sys/fs/cgroup/blkio", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 22 <0.000017>
1     1691148727.037797 newfstatat(22, "", {st_mode=S_IFDIR|0555, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000009>
1     1691148727.037842 getdents64(22, 0x563751ce3a50 /* 22 entries */, 32768) = 1000 <0.000015>
1     1691148727.037889 newfstatat(22, "blkio.bfq.io_service_bytes", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000010>
1     1691148727.037931 newfstatat(22, "cgroup.procs", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000038>
1     1691148727.037997 newfstatat(22, "blkio.bfq.io_serviced", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038026 newfstatat(22, "blkio.throttle.read_iops_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038051 newfstatat(22, "blkio.throttle.io_service_bytes", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038076 newfstatat(22, "cgroup.sane_behavior", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038101 newfstatat(22, "blkio.bfq.io_service_bytes_recursive", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038126 newfstatat(22, "blkio.bfq.io_serviced_recursive", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000005>
1     1691148727.038154 newfstatat(22, "blkio.throttle.write_iops_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000009>
1     1691148727.038195 newfstatat(22, "blkio.reset_stats", {st_mode=S_IFREG|0200, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038225 newfstatat(22, "blkio.throttle.read_bps_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038249 newfstatat(22, "blkio.throttle.write_bps_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038274 newfstatat(22, "tasks", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038299 newfstatat(22, "seabox", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.038324 openat(22, "seabox", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 23 <0.000008>
1     1691148727.038351 newfstatat(23, "", {st_mode=S_IFDIR|0755, st_size=0, ...}, AT_EMPTY_PATH) = 0 <0.000005>
1     1691148727.038376 fcntl(23, F_GETFL) = 0x18800 (flags O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) <0.000004>
1     1691148727.038395 fcntl(23, F_SETFD, FD_CLOEXEC) = 0 <0.000004>
1     1691148727.038420 getdents64(23, 0x563751ceba90 /* 21 entries */, 32768) = 984 <0.000010>
1     1691148727.038450 newfstatat(23, "blkio.bfq.io_service_bytes", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000015>
1     1691148727.038485 newfstatat(23, "cgroup.procs", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000008>
1     1691148727.038513 newfstatat(23, "blkio.bfq.io_serviced", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000008>
1     1691148727.038548 newfstatat(23, "blkio.throttle.read_iops_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000008>
1     1691148727.038577 newfstatat(23, "blkio.throttle.io_service_bytes", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000009>
1     1691148727.038606 newfstatat(23, "blkio.bfq.io_service_bytes_recursive", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000009>
1     1691148727.038635 newfstatat(23, "blkio.bfq.io_serviced_recursive", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000010>
1     1691148727.038664 newfstatat(23, "blkio.throttle.write_iops_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000010>
1     1691148727.038694 newfstatat(23, "blkio.bfq.weight", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000009>
1     1691148727.038745 newfstatat(23, "blkio.reset_stats", {st_mode=S_IFREG|0200, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000016>
1     1691148727.038799 newfstatat(23, "blkio.throttle.read_bps_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000011>
1     1691148727.038840 newfstatat(23, "blkio.throttle.write_bps_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000012>
1     1691148727.038881 newfstatat(23, "tasks", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000011>
1     1691148727.038921 newfstatat(23, "notify_on_release", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000046>
1     1691148727.039006 newfstatat(23, "cgroup.clone_children", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000010>
1     1691148727.039042 newfstatat(23, "blkio.throttle.io_serviced", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000009>
1     1691148727.039071 newfstatat(23, "blkio.bfq.weight_device", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000009>
1     1691148727.039100 newfstatat(23, "blkio.throttle.io_service_bytes_recursive", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000008>
1     1691148727.039128 newfstatat(23, "blkio.throttle.io_serviced_recursive", {st_mode=S_IFREG|0444, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000009>
1     1691148727.039157 getdents64(23, 0x563751ceba90 /* 0 entries */, 32768) = 0 <0.000004>
1     1691148727.039178 close(23)       = 0 <0.000006>
1     1691148727.039199 rmdir("/sys/fs/cgroup/blkio/seabox") = 0 <0.000038>
1     1691148727.039257 newfstatat(22, "notify_on_release", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>
1     1691148727.039284 newfstatat(22, "release_agent", {st_mode=S_IFREG|0644, st_size=0, ...}, AT_SYMLINK_NOFOLLOW) = 0 <0.000006>

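Note the third line from the end: the rmdir("/sys/fs/cgroup/blkio/seabox") system call shows that it is indeed systemd that performs the deletion (the leading 1 on each line is the PID of the traced process), so the root of the problem lies in systemd.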
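In a long trace the decisive line is easy to miss, so it can help to filter the log. The sketch below assumes the trace was saved to a file, for instance with something like `strace -f -ttt -T -p 1 -o /tmp/systemd.trace`; that invocation is a plausible guess that matches the output format shown above, not necessarily the command the author ran. The helper name `find_rmdirs` and the log path are hypothetical.

```shell
# Print every successful rmdir() under a path prefix found in an strace log.
#   $1  strace log file
#   $2  path prefix to look for (defaults to /sys/fs/cgroup/)
# (find_rmdirs is a hypothetical helper name.)
find_rmdirs() {
    local log="$1" prefix="${2:-/sys/fs/cgroup/}"
    # First keep rmdir calls on the prefix, then only those that returned 0.
    grep -F "rmdir(\"$prefix" "$log" | grep -F ') = 0'
}

# Example call (commented out so that sourcing this file has no side effects):
# find_rmdirs /tmp/systemd.trace /sys/fs/cgroup/blkio/
```

Failed calls, which strace prints with a return value of -1 and an errno name, are filtered out by the second grep.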

This analysis fully explains the problem seen at the Seabox site.

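4. Solutions

4.1 Operating system

At the operating system level there is currently no good fix (cgroup v1 has been all but abandoned, and the Linux kernel now recommends cgroup v2). If the blkio subsystem is really needed, avoid running commands such as systemctl daemon-reload and systemctl enable. Alternatively, write the PID of a process that always exists into cgroup.procs of the blkio group; the corresponding hierarchy directory will then not be deleted, because a group that contains processes is not removed.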
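The PID-pinning workaround can be sketched as follows. This is an illustration under assumptions rather than a tested production procedure: the helper name `pin_pid_to_cgroup` is hypothetical, and the choice of a background `sleep infinity` as the long-lived process is only one possibility.

```shell
# Attach a PID to a cgroup v1 group by writing it to the group's cgroup.procs.
#   $1  group directory, e.g. /sys/fs/cgroup/blkio/seabox
#   $2  PID of a process that is expected to stay alive
# (pin_pid_to_cgroup is a hypothetical helper name.)
pin_pid_to_cgroup() {
    local dir="$1" pid="$2"
    if [ ! -d "$dir" ]; then
        echo "no such cgroup directory: $dir" >&2
        return 1
    fi
    printf '%s\n' "$pid" > "$dir/cgroup.procs"
}

# Example (commented out; needs root and an existing blkio group):
# sleep infinity &                  # assumed placeholder for a long-lived process
# pin_pid_to_cgroup /sys/fs/cgroup/blkio/seabox "$!"
```

If the pinned process exits, the group is empty again and can once more be removed, so the process chosen should outlive the database cluster.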

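4.2 Configuring the Seabox database

Fortunately, the blkio subsystem is not a required feature of the Seabox database, so you can set the Seabox parameter sc_resgroup_enable_blkio = off to disable the use of blkio.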

Last modified: 2025-01-27 21:06:09