During installation of the Kuboard monitoring suite, the two prometheus-k8s pods stay in the CrashLoopBackOff state. This article analyzes the cause and provides a fix.
kubernetes V1.23.5;
Kuboard V3.5.0.1;
Resource-layer monitoring suite system-monitor.addons.kuboard.cn v3.1.7;
When the Kuboard suite is installed, the two prometheus-k8s pods fail to start and their status is CrashLoopBackOff:
[root@rh-master01 ~]# kubectl get pod -n kuboard
NAME                                     READY   STATUS             RESTARTS         AGE
...
prometheus-k8s-0                         1/2     CrashLoopBackOff   34 (3m33s ago)   153m
prometheus-k8s-1                         1/2     CrashLoopBackOff   34 (3m41s ago)   153m
prometheus-operator-5d7cc5dc4d-wl97p     2/2     Running            0                156m
system-monitor-config-864f784987-ngqk2   1/1     Running            0                153m
Check the logs of one of the pods (prometheus-k8s-0):
[root@rh-master01 ~]# kubectl logs prometheus-k8s-0 -n kuboard
...
level=info ts=2022-05-26T03:07:52.844Z caller=main.go:428 msg="Starting Prometheus" version="(version=2.29.1, branch=HEAD, revision=dcb07e8eac34b5ea37cd229545000b857f1c1637)"
level=info ts=2022-05-26T03:07:52.844Z caller=main.go:433 build_context="(go=go1.16.7, user=root@364730518a4e, date=20210811-14:48:27)"
level=info ts=2022-05-26T03:07:52.844Z caller=main.go:434 host_details="(Linux 5.17.8-1.el7.elrepo.x86_64 #1 SMP PREEMPT Sat May 14 07:21:57 EDT 2022 x86_64 prometheus-k8s-0 (none))"
level=info ts=2022-05-26T03:07:52.844Z caller=main.go:435 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2022-05-26T03:07:52.844Z caller=main.go:436 vm_limits="(soft=unlimited, hard=unlimited)"
level=error ts=2022-05-26T03:07:52.845Z caller=query_logger.go:87 component=activeQueryTracker msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
...
The error reports permission denied:
msg="Error opening query log file" file=/prometheus/queries.active err="open /prometheus/queries.active: permission denied"
panic: Unable to create mmap-ed active query log
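When the log is long, the two lines above can be pulled out with a small filter. This is a minimal sketch: the function name filter_prom_errors is made up for illustration, and it only assumes that error entries start with level=error and that the Go panic line starts with panic:, as in the output shown in this article.

```shell
#!/bin/sh
# Keep only the lines that usually explain a Prometheus crash:
# logfmt entries with level=error, and Go panic lines.
# filter_prom_errors is a hypothetical helper name, not part of any tool.
filter_prom_errors() {
    grep -E '^(level=error|panic:)'
}

# Example (pod and namespace names are the ones used in this article):
#   kubectl logs prometheus-k8s-0 -n kuboard | filter_prom_errors
```

For a pod that is restarting, kubectl logs also accepts --previous to read the log of the last terminated container, which may be worth trying if the current container has not yet printed the error.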
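This is a permissions problem. The Kuboard monitoring suite is built on kube-prometheus, and in the Prometheus image the file /prometheus/queries.active is owned by the user with UID 1000. The current NFS directory prometheus-k8s-db-prometheus-k8s-0 is owned by root (which is itself a permissions risk), so the write fails.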
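The ownership mismatch described above can be checked on the NFS server before changing anything. The following is a rough sketch, assuming GNU stat (Linux) and the UID 1000 mentioned in this article; the function name writable_by_prom_uid is hypothetical, and group permissions are deliberately ignored to keep the check short.

```shell
#!/bin/sh
# UID that Prometheus runs as, according to this article (an assumption
# for any other setup).
PROM_UID=1000

# Exit 0 if the directory is owned by PROM_UID with the owner write bit set,
# or is writable by "others"; exit 1 otherwise. Group bits are not checked.
writable_by_prom_uid() {
    dir=$1
    owner=$(stat -c '%u' "$dir")   # numeric owner UID
    mode=$(stat -c '%a' "$dir")    # octal permission bits, e.g. 755
    other_bits=$(( mode % 10 ))
    owner_bits=$(( (mode / 100) % 10 ))
    if [ "$owner" = "$PROM_UID" ] && [ $(( owner_bits & 2 )) -ne 0 ]; then
        return 0
    fi
    [ $(( other_bits & 2 )) -ne 0 ]
}

# Example, with the PV directory path passed in by the caller:
#   writable_by_prom_uid "$PV_DIR" && echo "writable" || echo "not writable"
```

A directory owned by root with mode 755 fails this check, which matches the permission denied error in the log.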
Change the permissions of the PV paths to 777 so that the user with UID 1000 inside the pods can also operate on the files:
[root@rh-master01 ~]# chmod -R 777 /vm/dev-nfs/kuboard_pv/prometheus-k8s-db-prometheus-k8s-0
[root@rh-master01 ~]# chmod -R 777 /vm/dev-nfs/kuboard_pv/prometheus-k8s-db-prometheus-k8s-1
After a few minutes, prometheus-k8s-0 and prometheus-k8s-1 change to Running and the problem is resolved:
[root@rh-master01 ~]# kubectl get po -n kuboard
NAME               READY   STATUS    RESTARTS       AGE
...
prometheus-k8s-0   2/2     Running   80 (16m ago)   6h35m
prometheus-k8s-1   2/2     Running   81 (11m ago)   6h35m
...
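Mode 777 opens the directories to every user. A narrower alternative, sketched here and not verified against the environment in this article, is to hand the directory to UID 1000 and remove write access for group and others. The function name own_pv_dir is hypothetical; it would need to run as root on the NFS server, and the default UID of 1000 is taken from the analysis above.

```shell
#!/bin/sh
# Hand a PV directory to a single numeric UID instead of opening it to everyone.
# Usage: own_pv_dir <directory> [uid]   (uid defaults to 1000, per this article)
own_pv_dir() {
    dir=$1
    target_uid=${2:-1000}
    # Change the owner recursively; the group is left unchanged.
    chown -R "$target_uid" "$dir" || return 1
    # Owner gets read/write, plus execute on directories and on files that
    # are already executable (capital X); group and others lose write access.
    chmod -R u+rwX,go-w "$dir"
}

# Example, with the PV directory path passed in by the caller:
#   own_pv_dir "$PV_DIR"
```

Whether this is sufficient depends on how the NFS export maps user IDs, so it is best tried on one replica's directory first.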
Enter the pod and check the permissions of the queries.active file:
[root@rh-master01 ~]# kubectl exec -it prometheus-k8s-0 sh -n kuboard
/prometheus $ ls -l /prometheus/queries.active
-rw-r--r-- 1 1000 root 20001 May 27 00:45 /prometheus/queries.active
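To summarize: during installation of the Kuboard monitoring suite, the two prometheus-k8s pods did not run correctly. Analysis showed a permissions problem on the NFS directories backing the PVs; after those permissions were changed, the file permissions inside the pods were correct and the pods recovered.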
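Note: in production, it is advisable to configure NFS access for a non-root user to improve security. For reference, the export used in this environment is shown below; its no_root_squash option lets root on a client act as root on the share:
cat /etc/exports
/vm/dev-nfs 192.168.80.0/24(rw,sync,no_root_squash)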
References:
http://events.jianshu.io/p/8b9fd5493682
https://github.com/prometheus/prometheus/issues/9704
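Author: Wang Kun (王坤), WeChat official account: rundba. Reposting is welcome; please credit the source.
To repost on another official account, contact wx: landnow.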





