Running the command that shows node resource usage fails with an error:
kubectl top node
error: metrics not available yet
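Before fixing the problem, let's review how Metrics Server works.
Metrics Server is the aggregator of core monitoring data in a Kubernetes cluster. It collects resource metrics from the kubelets and exposes them through the Metrics API on the Kubernetes API server, where the Horizontal Pod Autoscaler (HPA) uses them to scale workloads automatically. The same Metrics API backs kubectl top, which shows the resource usage of nodes and Pods.
Metrics Server is an important part of the Kubernetes monitoring stack and consists of two parts: the Metrics API and the server itself. The Metrics API exposes resource usage through the API server to consumers such as HPA, kubectl top, and the Kubernetes dashboard. The server follows the Kubernetes monitoring architecture: it periodically scrapes metrics from the kubelet on every node through the Summary API, then aggregates them and keeps them in memory. Only the latest values are stored, so the data is lost whenever the component restarts. Because the data lives in Metrics Server rather than in the API server, kube-aggregator forwards API server requests for the Metrics API to Metrics Server, so clients see the metrics as part of the regular Kubernetes API.
Figure: request flow of kubectl top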

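Flow description
1. kubectl top node sends a request to kube-apiserver at /apis/metrics.k8s.io/. kube-apiserver authenticates the user who issued the request and authorizes access to the requested API path.
2. kube-apiserver forwards requests for /apis/metrics.k8s.io to the Metrics API, which is an extension API. This is the job of kube-aggregator, an extension mechanism of the apiserver that lets developers write their own service and register it with the Kubernetes API as an extension API.
3. The Metrics API registration forwards the request to the Service it defines: metrics-server.
4. The Service forwards the request to its associated endpoints, and it finally reaches the Pod.
5. The Pod answers with the data it collects from the kubelet Summary API.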
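Each hop in the flow above can be probed on its own, which is the approach the rest of this walkthrough follows. The following is a minimal sketch, not a definitive procedure. It assumes metrics-server runs in kube-system behind a Service named metrics-server and carries the label k8s-app=metrics-server (adjust these to your deployment). The function name check_metrics_path and its node-name argument are hypothetical.

```shell
# Probe every hop of the "kubectl top" request path, in order.
# Assumptions: namespace kube-system, Service "metrics-server",
# label k8s-app=metrics-server. $1 is the name of one node to query.
check_metrics_path() {
  node="$1"
  # Steps 1-2: the aggregated Metrics API as served through kube-apiserver.
  kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
  # Step 3: the APIService registration and the Service it points at.
  kubectl get apiservice v1beta1.metrics.k8s.io
  # Step 4: the Service and its endpoints.
  kubectl -n kube-system get service,endpoints metrics-server
  # Step 5a: the Pod behind the Service.
  kubectl -n kube-system get pods -l k8s-app=metrics-server -o wide
  # Step 5b: the kubelet Summary API, reached through the apiserver node proxy.
  kubectl get --raw "/api/v1/nodes/${node}/proxy/stats/summary"
}
```

Calling check_metrics_path with a node name prints the result of each probe in order; the first probe that fails or returns something unexpected marks the hop to look at more closely.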
Checking step 1
Verbose output of kubectl top
# kubectl --v=8 top node
I1012 10:11:38.322658 39273 loader.go:359] Config loaded from file: /root/.kube/config
I1012 10:11:38.323804 39273 round_trippers.go:416] GET https://192.168.1.3:6443/api?timeout=32s
I1012 10:11:38.323815 39273 round_trippers.go:423] Request Headers:
I1012 10:11:38.323821 39273 round_trippers.go:426] Accept: application/json, */*
I1012 10:11:38.323827 39273 round_trippers.go:426] User-Agent: kubectl/v1.15.2 (linux/amd64) kubernetes/f627830
I1012 10:11:38.333052 39273 round_trippers.go:441] Response Status: 200 OK in 9 milliseconds
I1012 10:11:38.333063 39273 round_trippers.go:444] Response Headers:
I1012 10:11:38.333068 39273 round_trippers.go:447] Content-Type: application/json
I1012 10:11:38.333091 39273 round_trippers.go:447] Content-Length: 137
I1012 10:11:38.333095 39273 round_trippers.go:447] Date: Tue, 12 Oct 2021 02:11:38 GMT
I1012 10:11:38.333125 39273 request.go:947] Response Body: {"kind":"APIVersions","versions":["v1"],"serverAddressByClientCIDRs":[{"clientCIDR":"0.0.0.0/0","serverAddress":"172.18.213.151:6443"}]}
I1012 10:11:38.333291 39273 round_trippers.go:416] GET https://192.168.1.3:6443/apis?timeout=32s
I1012 10:11:38.333299 39273 round_trippers.go:423] Request Headers:
I1012 10:11:38.333304 39273 round_trippers.go:426] Accept: application/json, */*
I1012 10:11:38.333310 39273 round_trippers.go:426] User-Agent: kubectl/v1.15.2 (linux/amd64) kubernetes/f627830
I1012 10:11:38.334485 39273 round_trippers.go:441] Response Status: 200 OK in 1 milliseconds
I1012 10:11:38.334499 39273 round_trippers.go:444] Response Headers:
I1012 10:11:38.334504 39273 round_trippers.go:447] Content-Type: application/json
I1012 10:11:38.334509 39273 round_trippers.go:447] Date: Tue, 12 Oct 2021 02:11:38 GMT
I1012 10:11:38.334623 39273 request.go:947] Response Body: {"kind":"APIGroupList","apiVersion":"v1","groups":[{"name":"apiregistration.k8s.io","versions":[{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"},{"groupVersion":"apiregistration.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apiregistration.k8s.io/v1","version":"v1"}},{"name":"extensions","versions":[{"groupVersion":"extensions/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"extensions/v1beta1","version":"v1beta1"}},{"name":"apps","versions":[{"groupVersion":"apps/v1","version":"v1"},{"groupVersion":"apps/v1beta2","version":"v1beta2"},{"groupVersion":"apps/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"apps/v1","version":"v1"}},{"name":"events.k8s.io","versions":[{"groupVersion":"events.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"events.k8s.io/v1beta1","version":"v1beta1"}},{"name":"authentication.k8s.io","versions":[{"groupVersion":"authentication.k8s.io/v1","version":"v1"},{"groupVersion":"authenticati [truncated 4396 chars]
I1012 10:11:38.334925 39273 round_trippers.go:416] GET https://192.168.1.3:6443/apis/metrics.k8s.io/v1beta1/nodes
I1012 10:11:38.334933 39273 round_trippers.go:423] Request Headers:
I1012 10:11:38.334939 39273 round_trippers.go:426] User-Agent: kubectl/v1.15.2 (linux/amd64) kubernetes/f627830
I1012 10:11:38.334944 39273 round_trippers.go:426] Accept: application/json, */*
I1012 10:11:38.353651 39273 round_trippers.go:441] Response Status: 200 OK in 18 milliseconds
I1012 10:11:38.353659 39273 round_trippers.go:444] Response Headers:
I1012 10:11:38.353666 39273 round_trippers.go:447] Content-Type: application/json
I1012 10:11:38.353671 39273 round_trippers.go:447] Date: Tue, 12 Oct 2021 02:11:38 GMT
I1012 10:11:38.353675 39273 round_trippers.go:447] Content-Length: 137
I1012 10:11:38.353688 39273 request.go:947] Response Body: {"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/metrics.k8s.io/v1beta1/nodes"},"items":[]}
F1012 10:11:38.354504 39273 helpers.go:114] error: metrics not available yet
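The log shows that the request reached the apiserver and that the response status is 200. The response body, however, is a NodeMetricsList whose items array is empty, which is why kubectl reports "metrics not available yet". The log gives no further hint about the cause, so we keep investigating.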
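The same check can be made without reading the verbose log, by fetching the raw API response and testing whether its items array is empty. The following is a small sketch: the helper name metrics_items_empty is hypothetical, and the pattern match is a shortcut that only suits this simple response shape, not a general JSON parser.

```shell
# Succeeds (exit 0) when the JSON read from stdin contains an empty
# "items" array, as in the NodeMetricsList response in the log above.
metrics_items_empty() {
  grep -Eq '"items":[[:space:]]*\[[[:space:]]*\]'
}

# Usage against a live cluster (assumes kubectl is already configured):
#   kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | metrics_items_empty \
#     && echo "Metrics API answered, but returned no node metrics"
```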
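Checking step 2
Check whether kube-aggregator is enabled on kube-apiserver
metrics-server exposes the Metrics API, and kube-aggregator proxies the matching apiserver requests to metrics-server. The aggregation layer needs the following apiserver flags: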
--proxy-client-cert-file=/etc/kubernetes/certs/proxy.crt
--proxy-client-key-file=/etc/kubernetes/certs/proxy.key
--requestheader-client-ca-file=/etc/kubernetes/certs/proxy-ca.crt
--requestheader-allowed-names=aggregator
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
If kube-proxy is not running on the master nodes, this flag is also required:
--enable-aggregator-routing=true
Comparing these flags with the kube-apiserver startup flags shows that the aggregation layer is configured correctly.
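The comparison can be scripted. The sketch below takes the kube-apiserver command line as a string and prints every aggregation-layer flag from the list above that it does not find. The function name missing_aggregator_flags is hypothetical, and the match assumes flags are written in the --flag=value form used in this article.

```shell
# Print each aggregation-layer flag that is absent from the given
# kube-apiserver command line ($1). Prints nothing when all are present.
missing_aggregator_flags() {
  cmdline="$1"
  for flag in \
    --proxy-client-cert-file \
    --proxy-client-key-file \
    --requestheader-client-ca-file \
    --requestheader-allowed-names \
    --requestheader-extra-headers-prefix \
    --requestheader-group-headers \
    --requestheader-username-headers
  do
    case "$cmdline" in
      *"$flag="*) ;;            # flag present, nothing to report
      *) echo "$flag" ;;        # flag missing
    esac
  done
}

# Usage on a control-plane node (assumes a procps-style ps):
#   missing_aggregator_flags "$(ps -o args= -C kube-apiserver)"
```

An empty result corresponds to the outcome described above: the aggregation layer flags are all in place.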
Checking step 3
Check whether the configuration that kube-apiserver uses to forward requests to the extension apiserver for the Metrics API is correct.
Inspect the APIService v1beta1.metrics.k8s.io
# kubectl describe apiservices v1beta1.metrics.k8s.io
Name:         v1beta1.metrics.k8s.io
Namespace:
Labels:       <none>
Annotations:  kubectl.kubernetes.io/last-applied-configuration:
                {"apiVersion":"apiregistration.k8s.io/v1","kind":"APIService","metadata":{"annotations":{},"name":"v1beta1.metrics.k8s.io"},"spec":{"group...
API Version:  apiregistration.k8s.io/v1
Kind:         APIService
Metadata:
  Creation Timestamp:  2019-11-26T07:32:03Z
  Resource Version:    202235905
  Self Link:           /apis/apiregistration.k8s.io/v1/apiservices/v1beta1.metrics.k8s.io
  UID:                 d5e26540-8f0b-4823-83d1-f92c2fb46e23
Spec:
  Group:                     metrics.k8s.io
  Group Priority Minimum:    100
  Insecure Skip TLS Verify:  true
  Service:
    Name:            prometheus-adapter
    Namespace:       monitoring
    Port:            443
  Version:           v1beta1
  Version Priority:  100
Status:
  Conditions:
    Last Transition Time:  2021-10-09T07:01:36Z
    Message:               all checks passed
    Reason:                Passed
    Status:                True
    Type:                  Available
Events:                    <none>
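The v1beta1.metrics.k8s.io API turns out to be bound to a Service named prometheus-adapter rather than metrics-server. prometheus-adapter answers the request, which explains the 200 response and the Available condition, but in this cluster it returns no node metrics. This is the cause of the problem.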
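The backing Service can also be read directly with a jsonpath query instead of scanning the describe output. This is a minimal sketch: the function names are hypothetical, and kube-system/metrics-server is assumed to be the intended backend, as in this article.

```shell
# Print "<namespace>/<service>" for the Service behind an APIService.
# $1 defaults to the Metrics API registration examined above.
apiservice_backend() {
  kubectl get apiservice "${1:-v1beta1.metrics.k8s.io}" \
    -o jsonpath='{.spec.service.namespace}/{.spec.service.name}'
}

# Succeeds only when the Metrics API is backed by kube-system/metrics-server
# (an assumption about where metrics-server is deployed).
backend_is_metrics_server() {
  [ "$(apiservice_backend)" = "kube-system/metrics-server" ]
}
```

On the cluster described here, apiservice_backend would print monitoring/prometheus-adapter, matching the Service section of the describe output.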
Change the Service reference in the v1beta1.metrics.k8s.io APIService from
Service:
  Name: prometheus-adapter
  Namespace: monitoring
  Port: 443
to
Service:
  Name: metrics-server
  Namespace: kube-system
  Port: 443
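One way to apply this change is a merge patch with kubectl patch; kubectl edit works as well. The following is a sketch under the assumption that metrics-server is already deployed in kube-system and listens behind a Service on port 443. The function name repoint_metrics_apiservice is hypothetical.

```shell
# Point the Metrics API registration at kube-system/metrics-server.
# A merge patch changes only the listed fields and leaves the rest of
# the APIService spec (group, version, priorities) untouched.
repoint_metrics_apiservice() {
  patch='{"spec":{"service":{"name":"metrics-server","namespace":"kube-system","port":443}}}'
  kubectl patch apiservice v1beta1.metrics.k8s.io --type merge -p "$patch"
}
```

The last-applied-configuration annotation in the describe output suggests that this APIService was created with kubectl apply, so re-applying the original manifest would probably point it back at prometheus-adapter.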
Run kubectl top again. It now returns:
# kubectl top node
NAME              CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
192.168.213.133   2208m        13%    56386Mi         88%
192.168.213.134   989m         6%     38148Mi         59%
192.168.213.137   837m         5%     39297Mi         61%
192.168.213.140   5867m        37%    46577Mi         73%
192.168.213.151   265m         3%     7596Mi          24%