Author homepage: [ https://www.weiyigeek.top ]
Author blog: [ https://blog.weiyigeek.top ]
Q&A and learning group: follow the WeiyiGeek official account and reply 【学习交流群】
Table of Contents:
0x00 Practice 1. A real-world case of handling expired (or soon-to-expire) certificates in a Kubernetes v1.23.x cluster
1. Introduction
2. Environment
3. Certificate Renewal
4. calico && kube-proxy operations after renewing cluster certificates
5. Removing && adding control-plane (Master) nodes
6. Removing && adding worker (Work) nodes
n. Problems encountered in practice
Problem 1. Running kubectl on a master or non-master node reports "The connection to the server localhost:8080 was refused -"
Problem 2. calico-node on a master node stays 0/1 READY with "connect: connection refused ... calico/node is not ready:"
Problem 3. Joining a node to the cluster reports "bridge-nf-call-iptables does not exist"
Problem 4. Starting kubelet on a node reports "Unable to read config path" err="path does not exist, ignoring"
Problem 5. Joining a node to the cluster reports "the namespace "kube-system" error downloading the secret"
0x00 Practice 1. A real-world case of handling expired (or soon-to-expire) certificates in a Kubernetes v1.23.x cluster
Description: the steps below deal with a kubernetes cluster whose certificates have already expired or are about to. Most of what you'll find on Baidu or CSDN only half understands the problem; the procedure differs slightly between K8S versions, and following those posts verbatim may leave you rebuilding the whole cluster. Don't ask how I know — I've stepped on those mines ( Ĭ ^ Ĭ ) — so whenever you hit a problem, consult the official K8S documentation first.
1. Introduction
On the first workday after the holidays I logged into the K8S clusters as usual to check the applications one by one. On the dev/test cluster, kubectl immediately failed with a certificate-expired error. Not a great start to the day, but there was nothing for it except renewing the certificates. Being a highly available cluster it came with plenty of pitfalls, so to save myself — and fellow operators — the trouble next time, I put this article together.
# Connecting to the Api-server fails: the certificate has expired and is no longer usable.
$ kubectl get node,pod
Unable to connect to the server: x509: certificate has expired or is not yet valid: current time 2023-01-31T16:55:27+08:00 is after 2023-01-16T04:47:34Z
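Tip: even with kubectl locked out, you can read the expiry date straight off the certificate the API server presents during the TLS handshake. A minimal sketch, assuming the API server answers on 192.168.12.107:6443 (substitute your own endpoint):
# Print the validity window of the certificate served by the API server
$ echo | openssl s_client -connect 192.168.12.107:6443 2>/dev/null | openssl x509 -noout -dates
# notBefore=Jan 15 10:42:56 2022 GMT
# notAfter=Jan 15 10:42:57 2023 GMT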
2. Environment
Cluster version and nodes:
# Cluster version
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:39:51Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
# Cluster nodes (the certificates have expired, so first set the clock back to a date before the expiry)
$ sudo date -s "2023-01-01"
$ kubectl get node
NAME STATUS ROLES AGE VERSION
weiyigeek-107 Ready control-plane,master 381d v1.23.1
weiyigeek-108 Ready control-plane,master 380d v1.23.1
weiyigeek-109 Ready control-plane,master 380d v1.23.1
weiyigeek-223 Ready work 380d v1.23.1
weiyigeek-224 Ready work 380d v1.23.1
weiyigeek-225 Ready work 381d v1.23.1
weiyigeek-226 Ready work 220d v1.23.1
# On the value of keeping your setup files: back up the resource manifests whenever you build a k8s cluster.
kubectl -n kube-system get cm kubeadm-config -o yaml > kubeadm-config-v1.23.1.yaml
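On top of the kubeadm-config backup, a rough way to snapshot the workloads themselves before operating on the cluster — the file name is just an example, and note that `get all` only covers the common resource types, not literally everything:
kubectl get all --all-namespaces -o yaml > all-resources-backup.yaml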
3. Certificate Renewal
For a highly available K8S cluster, the certificate renewal workflow is as follows:
0. Take backups before doing anything, so you can roll back. I work on weiyigeek-107, one of the three master nodes; unless stated otherwise, all later commands also run there.
# Back up the old configuration.
cp -a /etc/kubernetes{,.bak}
cp -a /var/lib/kubelet{,.bak}
cp -a /var/lib/etcd /var/lib/etcd.bak
# Back up the cluster configuration (this step fails once the certificates have expired and can be skipped),
# but you can use the date command to set the system time back to before the expiry.
date -s "2023-01-01" || timedatectl set-time "2023-01-01"
kubectl -n kube-system get cm kubeadm-config -o yaml > kubeadm-init-config.yaml  # this original config file is used later.
1. Use openssl to check each certificate's validity period and details.
# The k8s cluster's ca.crt is valid for ten years.
# The apiserver.crt, kubelet.crt and etcd.crt certificates default to one year; you can extend them to ten years yourself (covered in a later article).
$ for i in $(ls /etc/kubernetes/pki/*.crt /etc/kubernetes/pki/etcd/*.crt); do echo "===== $i ====="; openssl x509 -in $i -text -noout | grep -A 3 'Validity' ; done
# for item in `find /etc/kubernetes/pki -maxdepth 2 -name "*.crt"`;do echo ======================$item===============;openssl x509 -in $item -text -noout| grep -A 3 Not;done
===== /etc/kubernetes/pki/apiserver.crt =====
Validity
Not Before: Jan 15 10:42:56 2022 GMT  # issued
Not After : Jan 15 10:42:57 2023 GMT  # expires
Subject: CN = kube-apiserver          # common name
===== /etc/kubernetes/pki/apiserver-etcd-client.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: O = system:masters, CN = kube-apiserver-etcd-client
===== /etc/kubernetes/pki/apiserver-kubelet-client.crt =====
Validity
Not Before: Jan 15 10:42:56 2022 GMT
Not After : Jan 15 10:42:57 2023 GMT
Subject: O = system:masters, CN = kube-apiserver-kubelet-client
===== /etc/kubernetes/pki/ca.crt =====
Validity
Not Before: Jan 15 10:42:56 2022 GMT
Not After : Jan 13 10:42:56 2032 GMT
Subject: CN = kubernetes
===== /etc/kubernetes/pki/etcd/ca.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 13 10:42:58 2032 GMT
Subject: CN = etcd-ca
===== /etc/kubernetes/pki/etcd/healthcheck-client.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: O = system:masters, CN = kube-etcd-healthcheck-client
===== /etc/kubernetes/pki/etcd/peer.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: CN = weiyigeek-107
===== /etc/kubernetes/pki/etcd/server.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:59 2023 GMT
Subject: CN = weiyigeek-107
===== /etc/kubernetes/pki/front-proxy-ca.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 13 10:42:58 2032 GMT
Subject: CN = front-proxy-ca
===== /etc/kubernetes/pki/front-proxy-client.crt =====
Validity
Not Before: Jan 15 10:42:58 2022 GMT
Not After : Jan 15 10:42:58 2023 GMT
Subject: CN = front-proxy-client
2. Check the cluster-wide certificate summary — certificate names, issuing CA and expiry dates; here everything has expired.
$ sudo kubeadm certs check-expiration
# [check-expiration] Reading configuration from the cluster...
# [check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
# [check-expiration] Error reading configuration from the Cluster. Falling back to default configuration
# CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
# admin.conf                 Jan 15, 2023 10:43 UTC   <invalid>       ca                      no
# apiserver                  Jan 15, 2023 10:42 UTC   <invalid>       ca                      no
# apiserver-etcd-client      Jan 15, 2023 10:42 UTC   <invalid>       etcd-ca                 no
# apiserver-kubelet-client   Jan 15, 2023 10:42 UTC   <invalid>       ca                      no
# controller-manager.conf    Jan 15, 2023 10:43 UTC   <invalid>       ca                      no
# etcd-healthcheck-client    Jan 15, 2023 10:42 UTC   <invalid>       etcd-ca                 no
# etcd-peer                  Jan 15, 2023 10:42 UTC   <invalid>       etcd-ca                 no
# etcd-server                Jan 15, 2023 10:42 UTC   <invalid>       etcd-ca                 no
# front-proxy-client         Jan 15, 2023 10:42 UTC   <invalid>       front-proxy-ca          no
# scheduler.conf             Jan 15, 2023 10:43 UTC   <invalid>       ca                      no
# CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
# ca                      Jan 13, 2032 10:42 UTC   8y              no
# etcd-ca                 Jan 13, 2032 10:42 UTC   8y              no
# front-proxy-ca          Jan 13, 2032 10:42 UTC   8y              no
Tip: if etcd was created and is managed by kubeadm, its certificates can be renewed the same way below; with an externally managed HA etcd you must update its certificate configuration by hand.
3. Use the certs renew subcommand to extend every cluster certificate by another year. The --config flag points at the initialization manifest used when the cluster was created; if you no longer have it, generate one as in step 0.
~/.k8s$ sudo kubeadm certs renew all --config=./kubeadm-init-config.yaml
# W1212 17:17:16.721037 1306627 configset.go:348] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
# certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed   (admin.conf)
# certificate for serving the Kubernetes API renewed
# certificate the apiserver uses to access etcd renewed
# certificate for the API server to connect to kubelet renewed
# certificate embedded in the kubeconfig file for the controller manager to use renewed   (controller-manager.conf)
# certificate for liveness probes to healthcheck etcd renewed
# certificate for etcd nodes to communicate with each other renewed
# certificate for serving etcd renewed
# certificate for the front proxy client renewed
# certificate embedded in the kubeconfig file for the scheduler manager to use renewed   (scheduler.conf)
# The closing message below means renewal succeeded: restart kube-apiserver, kube-controller-manager, kube-scheduler and etcd so they pick up the new certificates.
Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
# Check the renewed certificates and their new expiry dates
~/.k8s$ kubeadm certs check-expiration
# [check-expiration] Reading configuration from the cluster...
# [check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
# CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
# admin.conf                 Jan 31, 2024 09:26 UTC   364d            ca                      no
# apiserver                  Jan 31, 2024 09:26 UTC   364d            ca                      no
# apiserver-etcd-client      Jan 31, 2024 09:26 UTC   364d            etcd-ca                 no
# apiserver-kubelet-client   Jan 31, 2024 09:26 UTC   364d            ca                      no
# controller-manager.conf    Jan 31, 2024 09:26 UTC   364d            ca                      no
# etcd-healthcheck-client    Jan 31, 2024 09:26 UTC   364d            etcd-ca                 no
# etcd-peer                  Jan 31, 2024 09:26 UTC   364d            etcd-ca                 no
# etcd-server                Jan 31, 2024 09:26 UTC   364d            etcd-ca                 no
# front-proxy-client         Jan 31, 2024 09:26 UTC   364d            front-proxy-ca          no
# scheduler.conf             Jan 31, 2024 09:26 UTC   364d            ca                      no
# CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
# ca                      Jan 13, 2032 10:42 UTC   8y              no
# etcd-ca                 Jan 13, 2032 10:42 UTC   8y              no
# front-proxy-ca          Jan 13, 2032 10:42 UTC   8y              no
# Use stat to confirm the apiserver.key and apiserver.crt modification times changed
/etc/kubernetes/pki$ stat apiserver.key apiserver.crt
# File: apiserver.key
# Size: 1675  Blocks: 8  IO Block: 4096  regular file
# Device: fd00h/64768d  Inode: 3670556  Links: 1
# Access: (0600/-rw-------)  Uid: (0/root)  Gid: (0/root)
# Access: 2022-04-28 12:55:13.456040564 +0800
# Modify: 2023-01-31 17:26:51.108767670 +0800
# Change: 2023-01-31 17:26:51.108767670 +0800
# Birth: -
# File: apiserver.crt
# Size: 1338  Blocks: 8  IO Block: 4096  regular file
# Device: fd00h/64768d  Inode: 3670557  Links: 1
# Access: (0644/-rw-r--r--)  Uid: (0/root)  Gid: (0/root)
# Access: 2023-01-31 17:28:58.104917185 +0800
# Modify: 2023-01-31 17:26:51.108767670 +0800
# Change: 2023-01-31 17:26:51.108767670 +0800
# Birth: -
4. With the certificates renewed, regenerate the configuration files the master node needs under /etc/kubernetes: admin.conf, controller-manager.conf, kubelet.conf and scheduler.conf.
# Regenerate the required kubeconfig files
$ rm -rf /etc/kubernetes/*.conf
$ kubeadm init phase kubeconfig all --config=kubeadm-init-config.yaml
# [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
# [kubeconfig] Writing "admin.conf" kubeconfig file
# [kubeconfig] Writing "kubelet.conf" kubeconfig file
# [kubeconfig] Writing "controller-manager.conf" kubeconfig file
# [kubeconfig] Writing "scheduler.conf" kubeconfig file
# The file timestamps have changed
$ ls -alh /etc/kubernetes/*.conf
-rw------- 1 root root 5.6K Jan 31 21:53 /etc/kubernetes/admin.conf
-rw------- 1 root root 5.6K Jan 31 21:53 /etc/kubernetes/controller-manager.conf
-rw------- 1 root root 5.6K Jan 31 21:53 /etc/kubernetes/kubelet.conf
-rw------- 1 root root 5.5K Jan 31 21:53 /etc/kubernetes/scheduler.conf
# To keep kubelet client certificate rotation from failing (a big pitfall), delete the kubelet-client-* files;
# they are regenerated automatically when the kubelet service restarts.
# If rotation fails, you may see errors like "x509: certificate has expired or is not yet valid" in the kube-apiserver logs.
# https://kubernetes.io/zh/docs/setup/production-environment/tools/kubeadm/troubleshooting-kubeadm/#kubelet-client-cert
rm -rf /var/lib/kubelet/pki/kubelet-client-*
Extra: to generate the K8S conf files for another master node, do the following — here for the weiyigeek-108 control-plane node.
# $ mkdir -vp /tmp/kubernetes/
# $ kubeadm init phase kubeconfig all --node-name weiyigeek-108 --kubeconfig-dir /tmp/kubernetes/
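The generated kubeconfig files still need to reach that node; a minimal sketch, assuming root SSH access to weiyigeek-108:
# Copy the regenerated conf files onto the target master node
scp /tmp/kubernetes/*.conf root@weiyigeek-108:/etc/kubernetes/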
5. As the renew output instructs, restart kube-apiserver, kube-controller-manager, kube-scheduler and etcd on the current master node (weiyigeek-107).
# Install the newly generated cluster kubeconfig as ~/.kube/config
mkdir -p $HOME/.kube
echo 'yes' | sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Restart kube-apiserver, kube-controller-manager, kube-scheduler and etcd
# (delete the pods one at a time, waiting for each to come back before the next)
kubectl -n kube-system delete pod -l 'component=kube-apiserver'
kubectl -n kube-system delete pod -l 'component=kube-controller-manager'
kubectl -n kube-system delete pod -l 'component=kube-scheduler'
kubectl -n kube-system delete pod -l 'component=etcd'
# Restart the kubelet service on the node
systemctl restart kubelet.service
# Check the kubelet service status
systemctl status -l kubelet.service
# Check the kubelet logs for anomalies (resolve any you find)
journalctl -xefu kubelet
# After the restart, kubelet client certificate rotation generates a new pem automatically
ls -alh /var/lib/kubelet/pki/kubelet-client-current.pem
# lrwxrwxrwx 1 root root 59 Jan 31 22:29 /var/lib/kubelet/pki/kubelet-client-current.pem -> /var/lib/kubelet/pki/kubelet-client-2023-01-31-22-29-20.pem
6. kubectl on the weiyigeek-107 node no longer complains about expired certificates, and commands work normally.
$ kubectl get nodes
# NAME            STATUS   ROLES                  AGE     VERSION   INTERNAL-IP      EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
# weiyigeek-107   Ready    control-plane,master   381d    v1.23.1   192.168.12.107   <none>        Ubuntu 20.04.1 LTS   5.4.0-137-generic   containerd://1.4.12
# weiyigeek-108   Ready    control-plane,master   57m     v1.23.1   192.168.12.108   <none>        Ubuntu 20.04.3 LTS   5.4.0-137-generic   containerd://1.4.12
# weiyigeek-109   Ready    control-plane,master   2m19s   v1.23.1   192.168.12.109   <none>        Ubuntu 20.04.3 LTS   5.4.0-137-generic   containerd://1.4.12
# weiyigeek-223   Ready    work                   380d    v1.23.1   192.168.12.223   <none>        Ubuntu 20.04.3 LTS   5.4.0-94-generic    containerd://1.4.12
# weiyigeek-224   Ready    work                   380d    v1.23.1   192.168.12.224   <none>        Ubuntu 20.04.3 LTS   5.4.0-42-generic    containerd://1.4.12
# weiyigeek-225   Ready    work                   381d    v1.23.1   192.168.12.225   <none>        Ubuntu 20.04.3 LTS   5.4.0-94-generic    containerd://1.4.12
# weiyigeek-226   Ready    work                   220d    v1.23.1   192.168.12.226   <none>        Ubuntu 20.04.3 LTS   5.4.0-80-generic    containerd://1.4.12
$ kubectl get pod -n kube-system | egrep "kube-apiserver|kube-controller-manager|kube-scheduler|etcd"
# etcd-weiyigeek-107                      1/1   Running   1              380d
# etcd-weiyigeek-108                      1/1   Running   0              96d
# etcd-weiyigeek-109                      1/1   Running   0              380d
# kube-apiserver-weiyigeek-107            1/1   Running   0              380d
# kube-apiserver-weiyigeek-108            1/1   Running   0              380d
# kube-apiserver-weiyigeek-109            1/1   Running   0              380d
# kube-controller-manager-weiyigeek-107   1/1   Running   2 (380d ago)   380d
# kube-controller-manager-weiyigeek-108   1/1   Running   1 (15d ago)    380d
# kube-controller-manager-weiyigeek-109   1/1   Running   1 (380d ago)   380d
# kube-scheduler-weiyigeek-107            1/1   Running   3 (15d ago)    380d
# kube-scheduler-weiyigeek-108            1/1   Running   2 (15d ago)    380d
# kube-scheduler-weiyigeek-109            1/1   Running   1 (380d ago)   380d
7. Check the cluster's health: whether the scheduler, controller-manager and the etcd database are all normal.
$ kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health":"true","reason":""}
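Since ComponentStatus is deprecated, the API server's aggregated health endpoints are the more future-proof check; a quick sketch (available on v1.16+):
# Ask the API server for its itemized readiness checks, which include etcd
kubectl get --raw='/readyz?verbose'
# [+]ping ok
# [+]etcd ok
# ...
# readyz check passed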
You might think the exercise ends here. It does not: this is a highly available K8S cluster, and the pitfalls after a certificate renewal go well beyond this. If you're interested, read on to the next section.
4. calico && kube-proxy operations after renewing cluster certificates
Description: this HA cluster was installed following my earlier post 《在Ubuntu安装部署K8S高可用集群使用初体验》 ( https://blog.weiyigeek.top/2020/4-27-470.html#0x04-高可用集群使用初体验 ). There, the calico network plugin was configured to keep its data in the etcd datastore, so calico must be able to connect to etcd. The etcd-ca, etcd-cert and etcd-key fields in the calico-etcd.yaml manifest still hold the old certificates; since every component certificate has just been renewed, calico can no longer reach etcd for reads and writes, and the calico-etcd.yaml resource manifest has to be regenerated.
After the cluster certificates were renewed, calico and the business applications misbehaved; the most obvious symptom is that calico fails to start.
$ kubectl get pod -n kube-system
NAME                                      READY   STATUS              RESTARTS        AGE
calico-kube-controllers-6cf9b574f-zlrjn   0/1     Running             0               3s
calico-node-5q8lq                         0/1     CrashLoopBackOff    11 (6s ago)     25m
calico-node-62zd9                         0/1     CrashLoopBackOff    9 (5m6s ago)    25m
calico-node-85b7k                         0/1     Running             11 (66s ago)    25m
calico-node-8mt8q                         0/1     Running             3 (66s ago)     4m43s
calico-node-cdkf8                         0/1     CrashLoopBackOff    9 (5m6s ago)    25m
calico-node-jgm6q                         0/1     CrashLoopBackOff    9 (4m56s ago)   25m
calico-node-x2b9q                         0/1     CrashLoopBackOff    9 (5m6s ago)    25m
coredns-65c54cc984-7vt8m                  0/1     ContainerCreating   0               7m5s
coredns-65c54cc984-hf774                  0/1     ContainerCreating   0               25m
Steps:
Step 01. Following the article above, update the etcd certificate fields in the manifest.
# (1) Install Calico with the etcd datastore
curl https://docs.projectcalico.org/manifests/calico-etcd.yaml -O
# Wire calico-etcd up to the etcd cluster (and set the pod subnet here)
ETCD_CA=`cat /etc/kubernetes/pki/etcd/ca.crt | base64 | tr -d '\n'`
ETCD_CERT=`cat /etc/kubernetes/pki/etcd/server.crt | base64 | tr -d '\n'`
ETCD_KEY=`sudo cat /etc/kubernetes/pki/etcd/server.key | base64 | tr -d '\n'`
POD_SUBNET=`sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep cluster-cidr= | awk -F= '{print $NF}'`
sed -i "s@# etcd-key: null@etcd-key: ${ETCD_KEY}@g; s@# etcd-cert: null@etcd-cert: ${ETCD_CERT}@g; s@# etcd-ca: null@etcd-ca: ${ETCD_CA}@g" calico-etcd.yaml
sed -i 's#etcd_ca: ""#etcd_ca: "/calico-secrets/etcd-ca"#g; s#etcd_cert: ""#etcd_cert: "/calico-secrets/etcd-cert"#g; s#etcd_key: "" #etcd_key: "/calico-secrets/etcd-key" #g' calico-etcd.yaml
sed -i 's#etcd_endpoints: "http://<ETCD_IP>:<ETCD_PORT>"#etcd_endpoints: "https://192.168.12.107:2379,https://192.168.12.108:2379,https://192.168.12.109:2379"#g' calico-etcd.yaml
sed -i 's@# - name: CALICO_IPV4POOL_CIDR@- name: CALICO_IPV4POOL_CIDR@g; s@# value: "192.168.0.0/16"@  value: '"${POD_SUBNET}"'@g' calico-etcd.yaml
# (2) Re-apply calico to the K8S cluster
kubectl apply -f calico-etcd.yaml
# (3) Diffing the two manifests shows the certificates differ.
diff calico-etcd.yaml calico-etcd.yaml.bak
17,18c17,18
Step 02. On the master node (weiyigeek-107), run the following to restart the calico pods and the kube-proxy pods on every node.
# calico-node
kubectl delete pod -n kube-system -l k8s-app=calico-node
# kube-proxy
kubectl delete pod -n kube-system -l k8s-app=kube-proxy
Step 03. After a short wait, verify that the calico-node and kube-proxy services came back up.
$ kubectl get pod -n kube-system | egrep "calico|kube-proxy"
calico-kube-controllers-6cf9b574f-42jnz   1/1   Running   0   98m
calico-node-dvvxk                         1/1   Running   0   98m
calico-node-g9svc                         1/1   Running   0   98m
calico-node-ggxqp                         1/1   Running   0   98m
calico-node-jps97                         1/1   Running   0   98m
calico-node-qf7cj                         1/1   Running   0   92m
calico-node-vvw9f                         1/1   Running   0   98m
calico-node-zvz8r                         1/1   Running   0   98m
kube-proxy-25p5s                          1/1   Running   0   220d
kube-proxy-8bl7f                          1/1   Running   0   94m
kube-proxy-8jxvr                          1/1   Running   0   93m
kube-proxy-d79mp                          1/1   Running   0   381d
kube-proxy-dtdtm                          1/1   Running   0   108m
kube-proxy-l7jxp                          1/1   Running   0   93m
kube-proxy-nlgln                          1/1   Running   0   381d
Step 04. Hit a business system exposed via NodePort and verify that any node proxies and forwards the traffic.
$ kubectl get svc,pod -n devops -l app=jenkins
# NAME              TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)                          AGE
# service/jenkins   NodePort   10.109.163.223   <none>        8080:30001/TCP,50000:30634/TCP   380d
# NAME                           READY   STATUS    RESTARTS   AGE
# pod/jenkins-7fc6f4fcf6-glqxj   1/1     Running   0          118m
$ curl -sI 10.109.163.223:8080 | head -n 1
HTTP/1.1 200 OK
$ curl -s 10.109.163.223:8080 | grep -oP "<title>\S.+</title>"
$ curl -s 192.168.12.107:30001 | grep -oP "<title>\S.+</title>"
<title>Dashboard [Jenkins]</title>
Good — that problem is solved as well. Next, let's look at how to remove or add control-plane and worker nodes after the certificate renewal.
5. Removing && adding control-plane (Master) nodes
Description: sometimes after a cluster certificate renewal one control-plane node just stays unhealthy and no fix can be found; the last resort is to reset that master node and rejoin it to the K8S cluster.
Workflow
Step 01. Cordon the target master node, here the weiyigeek-108 node. A friendly reminder: back up the relevant data on the node before proceeding.
# On the weiyigeek-107 node, drain the node; it becomes unschedulable and its status changes to Ready,SchedulingDisabled
kubectl drain weiyigeek-108 --delete-local-data --delete-emptydir-data --force --ignore-daemonsets
# node/weiyigeek-108 cordoned
# WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-qf7cj, kube-system/kube-proxy-8jxvr
# node/weiyigeek-108 drained
kubectl get node weiyigeek-108
NAME            STATUS                     ROLES                  AGE     VERSION
weiyigeek-108   Ready,SchedulingDisabled   control-plane,master   3h50m   v1.23.1
# Remove the weiyigeek-108 node from the cluster
kubectl delete node weiyigeek-108
Step 02. Generate the token and the join commands for adding master and worker nodes to the K8S cluster (worth studying).
# (1) Check whether the token is still valid (default lifetime 24h); if it has expired, create a new one and print the join command
kubeadm token list
kubeadm token create --print-join-command
# kubeadm join slb-vip.k8s:16443 --token vkhqa1.t3gtrbowlalt8um5 --discovery-token-ca-cert-hash sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59   # note: used in the manifest later
# (2) Upload the control-plane certificates so new masters can join (recommended); this command runs a single phase of the init workflow
kubeadm init phase upload-certs --upload-certs
# [upload-certs] Using certificate key:
# c6a084cb06aaae2f4581145dbbe6057ce111c88fdac4ff4405a0a2db58882d76   # note: used in the manifest later
# (3) Get the CA public key hash
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
# (stdin)= bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59
# as long as the machine's CA certificate is unchanged, this sha256 value stays the same
# (4) The assembled join command for a master node looks like this:
kubeadm join slb-vip.k8s:16443 --token ejwx62.vqwog6il5p83uk7y \
--discovery-token-ca-cert-hash sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59 \
--control-plane --certificate-key c6a084cb06aaae2f4581145dbbe6057ce111c88fdac4ff4405a0a2db58882d76
Step 03. SSH into the weiyigeek-108 node and reset it.
systemctl stop kubelet
echo y | kubeadm reset
sudo rm -rf $HOME/.kube
sudo rm -rf /var/lib/cni/ /etc/cni/ /var/lib/kubelet/*
ipvsadm --clear
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
systemctl start kubelet; systemctl status kubelet
Tip: run the commands above whenever a master node failed initialization and needs to be reconfigured.
Step 04. Prepare the JoinConfiguration resource manifest for the control-plane node to join the cluster.
$ vim join-k8s.yaml
apiVersion: kubeadm.k8s.io/v1beta3
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: slb-vip.k8s:16443   # the HA APIServer address
    token: vkhqa1.t3gtrbowlalt8um5          # token generated in the step above
    caCertHashes:
    - "sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59"  # CA public key hash obtained above
  timeout: 5m0s
kind: JoinConfiguration
controlPlane:
  certificateKey: "c6a084cb06aaae2f4581145dbbe6057ce111c88fdac4ff4405a0a2db58882d76"  # certificate key obtained above
  localAPIEndpoint:
    advertiseAddress: 192.168.12.108  # this node's local APIServer address (the weiyigeek-108 machine)
    bindPort: 6443                    # this node's local APIServer port
nodeRegistration:
  criSocket: /run/containerd/containerd.sock  # important: before 1.24.x the default is docker-shim; here we use containerd
  imagePullPolicy: IfNotPresent
  name: weiyigeek-108  # important: node name
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
Tip: inside the bootstrapToken block you can also skip the caCertHashes verification by setting unsafeSkipCAVerification: true, as sketched below.
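A minimal discovery fragment with verification skipped might look like this (token reused from above; skipping the CA hash weakens the trust bootstrap, so prefer caCertHashes when possible — the worker-node manifest in section 6 uses this same field):
discovery:
  bootstrapToken:
    apiServerEndpoint: slb-vip.k8s:16443
    token: vkhqa1.t3gtrbowlalt8um5
    unsafeSkipCAVerification: true   # skip validation of the CA public key hash
  timeout: 5m0s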
Step 05. Before the removed master rejoins after the certificate renewal, the stale etcd member has to be cleaned up. A small aside: the master nodes form an HA cluster and each runs an etcd service, which is what makes the etcd database highly available.
How Kubernetes versions relate to etcd certificates:
For Kubernetes v1.9 and earlier, kubeadm's etcd does not use TLS by default — there are no etcd certificates, and only the master certificates need renewing.
From Kubernetes v1.10 on, etcd enables TLS by default, so both the etcd and the master certificates must be renewed.
Here our etcd is v3.5.x with TLS enabled. The etcd pods we restarted on the masters already picked up the new certificates; before removing and rejoining the master node, run the following, or you will hit errors (see the error examples at the end of this post):
kubectl exec -n kube-system -it etcd-weiyigeek-107 -- /bin/sh
# List the etcd cluster members
$ etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member list
# 2db31a5d67ec1034, started, weiyigeek-108, https://192.168.12.108:2380, https://192.168.12.108:2379, false
# 42efe7cca897d765, started, weiyigeek-109, https://192.168.12.109:2380, https://192.168.12.109:2379, false
# 471323846709334f, started, weiyigeek-107, https://192.168.12.107:2380, https://192.168.12.107:2379, false
# Check the data status of each etcd endpoint
$ etcdctl --endpoints https://192.168.12.107:2379,https://192.168.12.108:2379,https://192.168.12.109:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key endpoint status
# https://192.168.12.107:2379, 471323846709334f, 3.5.1, 324 MB, true, false, 8, 109396908, 109396908,
# https://192.168.12.108:2379, 2db31a5d67ec1034, 3.5.1, 324 MB, false, false, 8, 109396916, 109396916,
# https://192.168.12.109:2379, 42efe7cca897d765, 3.5.1, 324 MB, false, false, 8, 109396922, 109396922,
# Remove member 2db31a5d67ec1034 (the old weiyigeek-108 node)
$ etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt --cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key member remove 2db31a5d67ec1034
Member 2db31a5d67ec1034 removed from cluster 3a7f4f11f646b97b
Step 06. Run the join command on the weiyigeek-108 node; if it succeeds, you can verify with kubectl get nodes.
# Stop every pod on this node first to free the ports
crictl stop $(crictl ps -a -q)
# Join the cluster using the JoinConfiguration manifest; --v=5 prints a fuller log, essential for troubleshooting
kubeadm join --config=join-k8s.yaml --v=5
# On success you should see the following; otherwise, investigate the failure.
# This node has joined the cluster and a new control plane instance was created
# To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
$ kubectl get node weiyigeek-108
# NAME            STATUS   ROLES                  AGE    VERSION
# weiyigeek-108   Ready    control-plane,master   3m9s   v1.23.1
Tip: if joining the master node gets stuck at the pre-flight stage, run this check from a few of the other nodes: curl -ik https://<your APISERVER address>:6443/version
$ curl -ik https://slb-vip.k8s:16443/version
# Normal output
HTTP/2 200
audit-id: 77c614cb-0c27-42f5-a852-b5ef8415361f
cache-control: no-cache, private
content-type: application/json
x-kubernetes-pf-flowschema-uid: 41a01a35-c480-4cfd-8854-494261622406
x-kubernetes-pf-prioritylevel-uid: 4cfd380c-d39c-490f-96b0-dd4ed07be4e0
content-length: 263
date: Wed, 01 Feb 2023 07:34:01 GMT
{
  "major": "1",
  "minor": "23",
  "gitVersion": "v1.23.0",
  "gitCommit": "ab69524f795c42094a6630298ff53f3c3ebab7f4",
  "gitTreeState": "clean",
  "buildDate": "2021-12-07T18:09:57Z",
  "goVersion": "go1.17.3",
  "compiler": "gc",
  "platform": "linux/amd64"
}
And that's this part done!
6. Removing && adding worker (Work) nodes
Description: adding and removing worker nodes works almost exactly like it does for master nodes; the difference lies in the manifest used to join the cluster — compare the master and worker JoinConfiguration manifests side by side.
No more commentary here; straight to the commands.
Step 01. On a master node, cordon the weiyigeek-226 node so no new resources are scheduled onto it, and evict its pods to the other worker nodes.
# Tip: drain automatically cordons the node, so the cordon command below can be omitted
kubectl cordon weiyigeek-226
kubectl drain weiyigeek-226 --delete-emptydir-data --force --ignore-daemonsets
Step 02. On the weiyigeek-226 worker node, reset the node:
systemctl stop kubelet
echo y | kubeadm reset
sudo rm -rf $HOME/.kube; sudo rm -rf /var/lib/cni/ /etc/cni/ /var/lib/kubelet/*
ipvsadm --clear; iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
systemctl start kubelet; systemctl status kubelet
Step 03. Two ways to join the worker node to the K8S cluster.
# 1. Command-line form
kubeadm join 192.168.80.137:6443 --token <newly generated token> --discovery-token-ca-cert-hash sha256:<CA public key hash>
# 2. Manifest form
tee join-k8s.yaml <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
caCertPath: /etc/kubernetes/pki/ca.crt
discovery:
  bootstrapToken:
    apiServerEndpoint: slb-vip.k8s:16443
    token: vkhqa1.t3gtrbowlalt8um5
    unsafeSkipCAVerification: true
  timeout: 5m0s
  tlsBootstrapToken: vkhqa1.t3gtrbowlalt8um5
kind: JoinConfiguration
nodeRegistration:
  criSocket: /run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: weiyigeek-226
  taints: null
EOF
kubeadm join --config=join-k8s.yaml --v=5
Step 04. On a master node, check the newly joined worker and set its work label.
kubectl label nodes weiyigeek-226 node-role.kubernetes.io/work=test
kubectl get nodes weiyigeek-226
# NAME            STATUS   ROLES   AGE   VERSION
# weiyigeek-226   Ready    work    10m   v1.23.1
Done!
n. Problems encountered in practice
Problem 1. Running kubectl on a master or non-master node reports the error "The connection to the server localhost:8080 was refused -"
Error message:
$ kubectl get cs
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Cause: usually the current user has no ~/.kube/config and no KUBECONFIG environment variable set.
Fix:
# Option 1. Copy /etc/kubernetes/admin.conf to the current user's home directory as .kube/config
mkdir -p $HOME/.kube
echo 'yes' | sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Option 2. Point the KUBECONFIG environment variable at a list of kubeconfig files.
export KUBECONFIG=/etc/kubernetes/admin.conf:~/.kube/devops.kubeconfig
# Option 3. Pass the --kubeconfig flag on each invocation
kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes
Problem 2. calico-node on a master node stays 0/1 READY and reports the error "connect: connection refused ... calico/node is not ready:"
Error message:
$ kubectl get pod -n kube-system calico-node-v52sv
# NAME                READY   STATUS    RESTARTS   AGE
# calico-node-v52sv   0/1     Running   0          31m
$ kubectl describe pod -n kube-system calico-node-v52sv | grep "not ready"
# Warning Unhealthy 33m (x2 over 33m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
# calico/node is not ready: BIRD is not ready: BGP not established with 192.168.12.107,192.168.12.109,192.168.12.223,192.168.12.224,192.168.12.225,192.168.12.226
$ kubectl logs -f --tail 50 -n kube-system calico-node-v52sv | grep "interface"
# 2023-02-01 08:40:57.583 [INFO][69] monitor-addresses/startup.go 714: Using autodetected IPv4 address on interface br-b92e9270f33c: 172.22.0.1/16
# The calico pod fails to start with:
# Number of node(s) with BGP peering established = 0
Cause: docker is installed on this node and has created containers; Calico auto-selected the problematic br interface, so the calico-node pod cannot start.
# Calico auto-detects the node IP; the default is the first valid IP on the first valid interface:
IP_AUTODETECTION_METHOD=first-found
# A problematic interface has appeared on the node; inspect with:
ip link | grep br
Background: the calico-node daemonset's default policy is to take the IP of the first interface it finds as the calico node IP. Since interface names are not uniform across the cluster, calico may pick the wrong interface IP; in that case the IP_AUTODETECTION_METHOD field must be set to an interface-name wildcard or an IP address.
Fix:
# Option 1. Change the IP autodetection method in the yaml manifest: add these two lines under spec.containers.env (recommended)
- name: IP_AUTODETECTION_METHOD
  value: "interface=ens.*"   # adjust to your actual interface name prefix
# Option 2. Disable the problematic interface (the br-prefixed one), also workable:
ifconfig br-b92e9270f33c down
# Option 3. If nothing in the environment depends on docker, uninstall it and reboot.
sudo apt-get autoremove docker docker-ce docker-engine docker.io
Finally, restart the failing pod:
kubectl delete pod -n kube-system calico-node-v52sv
kubectl get nodes weiyigeek-108
# NAME STATUS ROLES AGE VERSION
# weiyigeek-108 Ready control-plane,master 66m v1.23.1
Problem 3. Joining a node to the cluster reports the error "bridge-nf-call-iptables does not exist"
Error message:
[preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
Fix:
# Load the br_netfilter kernel module and configure the sysctl parameters
$ cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
$ cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
$ modprobe br_netfilter && sudo sysctl --system
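To confirm the module and kernel parameters actually took effect before retrying the join, a quick check (each sysctl should print 1):
$ lsmod | grep br_netfilter
$ sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward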
Problem 4. Starting kubelet on a node reports the error "Unable to read config path" err="path does not exist, ignoring"
Error message:
Jan 16 14:27:25 weiyigeek-226 kubelet[882231]: E0116 14:27:25.496423 882231 kubelet.go:2347] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns e>
Jan 16 14:27:26 weiyigeek-226 kubelet[882231]: E0116 14:27:26.482369 882231 file_linux.go:61] "Unable to read config path" err="path does not exist, ignoring" path="/etc/kubernetes/manifests"
Fix: check that the /etc/kubernetes/manifests directory exists and has the right permissions.
mkdir -vp /etc/kubernetes/manifests
Problem 5. Joining a node to the cluster reports the error: the namespace "kube-system" error downloading the secret
Error message:
I0116 04:39:41.428788 184219 checks.go:246] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Would pull the required images (like 'kubeadm config images pull')
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
secrets "kubeadm-certs" is forbidden: User "system:bootstrap:20w21w" cannot get resource "secrets" in API group "" in the namespace "kube-system"
error downloading the secret
Fix: upload the certificates to the kubeadm-certs secret again, then rebuild the join command with the new certificate key, as sketched after the output below.
kubeadm init phase upload-certs --upload-certs
# [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
# [upload-certs] Using certificate key:
# 3a3d7610038c9d14edf377d92b9c6b44e049566ddd25b0e69bf571af58227ae7
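Since upload-certs mints a fresh certificate key, the joining control-plane node must use the new value; a sketch combining it with the token and CA hash from section 5 step 02 (your values will differ):
kubeadm join slb-vip.k8s:16443 --token vkhqa1.t3gtrbowlalt8um5 \
--discovery-token-ca-cert-hash sha256:bfc86e13da79a1ec5f53cef99661e4e3f51adda59c525cb9377cfe59956b1e59 \
--control-plane --certificate-key 3a3d7610038c9d14edf377d92b9c6b44e049566ddd25b0e69bf571af58227ae7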
This post ends here — more technical articles are on the way!
Original post: https://blog.weiyigeek.top/2022/12-28-691.html
Note: the author's abilities are limited and errors or omissions are inevitable; readers' corrections are welcome — leave your experience in the comments, email master@weiyigeek.top, or reach me via the WeiyiGeek official account.