Pitfalls when deploying metrics-server on a Kubernetes 1.16 cluster


On a Kubernetes 1.16 cluster, after deploying metrics-server 0.3.6, no data ever showed up: core metrics such as CPU and memory utilization were never reported.

1. Install metrics-server

a. Get the deployment files

https://github.com/kubernetes-incubator/metrics-server/
Either wget or git clone works.

b. Apply the deployment files
cd metrics-server-0.3.6/deploy
# This cluster runs Kubernetes 1.16, which is newer than 1.8, so apply the 1.8+ directory
kubectl apply -f 1.8+/

2. Problems encountered

kubectl top nodes always returns error: metrics not available yet

Check the logs with kubectl logs metricxxx -n kube-system

a. Pitfall 1

unable to fetch metrics from Kubelet node21 (node21): Get https://node21:10250....

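This happens because the cluster was set up with kubeadm and the kubelet serving certificates on the nodes are self-signed, not issued through TLS bootstrap. The fix is to add a setting to the kubelet configuration file on every node. First find where that file lives: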

systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2019-11-04 15:34:23 CST; 6s ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 10395 (kubelet)
    Tasks: 15
   Memory: 17.6M
      CPU: 416ms
   CGroup: /system.slice/kubelet.service
           └─10395 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml (remaining flags omitted)

The --config flag gives the path:
/var/lib/kubelet/config.yaml
Add this line to it:
serverTLSBootstrap: true
Then reload systemd and restart the kubelet:
root@node21:/var/lib/kubelet# systemctl daemon-reload
root@node21:/var/lib/kubelet# systemctl restart kubelet.service
Done!
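
Since the same line has to be added on every node, the edit can be scripted. The following is only a minimal sketch: the function name enable_server_tls_bootstrap is made up for illustration, and it assumes GNU sed (for sed -i) and that serverTLSBootstrap, if already present, sits at the top level of the file with no indentation.

```shell
#!/bin/sh
# Hypothetical helper (not part of kubeadm or the kubelet):
# make sure "serverTLSBootstrap: true" is set in a kubelet config file.
enable_server_tls_bootstrap() {
    config_file="$1"
    if grep -q '^serverTLSBootstrap:' "$config_file"; then
        # Key already present: force its value to true (GNU sed assumed).
        sed -i 's/^serverTLSBootstrap:.*/serverTLSBootstrap: true/' "$config_file"
    else
        # If the file does not end with a newline, add one so the new key
        # does not get glued onto the last existing line.
        if [ -s "$config_file" ] && [ -n "$(tail -c 1 "$config_file")" ]; then
            printf '\n' >> "$config_file"
        fi
        # Key missing: append it as a top-level field.
        printf 'serverTLSBootstrap: true\n' >> "$config_file"
    fi
}
```

On a node it would be called as enable_server_tls_bootstrap /var/lib/kubelet/config.yaml, followed by the daemon-reload and restart shown above. Running it twice leaves a single serverTLSBootstrap line in the file.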

After the kubelet restarts, new CSRs show up:

kubectl get csr
xxx

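Then run
kubectl certificate approve xxx
to approve the certificate signing request. Repeat this for each worker node's CSR so that every node's certificate is reissued. Done!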
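
With several nodes, picking the pending requests out by hand gets tedious. The sketch below filters the text output of kubectl get csr for rows whose last column is Pending. The function name pending_csr_names is invented here, and it assumes the default table output in which CONDITION is the last column.

```shell
#!/bin/sh
# Hypothetical helper: read the table printed by `kubectl get csr` on stdin
# and print the NAME of every row whose last column is exactly "Pending".
pending_csr_names() {
    # NR > 1 skips the header row; $NF is the last whitespace-separated field.
    awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Possible use against a real cluster (not executed here; xargs -r is GNU):
#   kubectl get csr | pending_csr_names | xargs -r kubectl certificate approve
```

It is worth looking at the list that pending_csr_names prints before piping it into kubectl certificate approve, because approving a request that did not come from one of your own kubelets hands out a valid certificate.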

b. Pitfall 2

With the problem above fixed, the data still did not show up, and the log contained the following:

E1104 08:05:44.8 reststorage.go:135
unable to fetch node metrics for node "node21": no metrics known for node

According to the project's issues, the following arguments need to be added to the metrics-server container:

      containers:
        - name: metrics-server
          image: k8s.gcr.io/metrics-server-amd64:v0.3.6
          imagePullPolicy: Always
          args:  # added
            - --kubelet-insecure-tls  # added
            - --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname  # added
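
Before re-applying, it can help to confirm that both flags really ended up in the manifest and are not commented out. This is a rough sketch only: manifest_has_kubelet_flags is a made-up name, and the check is a plain text search, not a YAML parse, so it does not verify that the flags sit under the right container.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) only if the given manifest
# contains both kubelet-related flags on lines that are not whole-line comments.
manifest_has_kubelet_flags() {
    manifest="$1"
    # Drop lines that are entirely comments before searching.
    uncommented=$(grep -v '^[[:space:]]*#' "$manifest")
    printf '%s\n' "$uncommented" | grep -q -e '--kubelet-insecure-tls' &&
        printf '%s\n' "$uncommented" | grep -q -e '--kubelet-preferred-address-types='
}
```

It would be called with the path of the edited deployment file, for example manifest_has_kubelet_flags 1.8+/metrics-server-deployment.yaml && kubectl apply -f 1.8+/ (the file name is an assumption about the layout of the deploy directory).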

Re-apply the YAML. A few minutes after it takes effect, the data becomes available:

kubectl top node
NAME     CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
node20   355m         17%    1300Mi          68%
node21   207m         5%     1562Mi          19%
node22   512m         12%    2521Mi          31%
node23   449m         11%    1631Mi          23%
Check pod resource usage:
kubectl top pod -n kube-system
NAME                                       CPU(cores)   MEMORY(bytes)
calico-kube-controllers-6d85fdfbd8-8vsm7   3m           19Mi
calico-node-47h4w                          36m          43Mi
calico-node-g4278                          34m          43Mi
calico-node-mmh7k                          37m          67Mi
calico-node-tnfkb                          35m          42Mi
coredns-5644d7b6d9-sbxx7                   4m           18Mi
coredns-5644d7b6d9-tfnj8                   5m           21Mi
etcd-node20                                24m          82Mi
kube-apiserver-node20                      93m          585Mi
kube-controller-manager-node20             18m          69Mi
kube-proxy-49cv9                           1m           21Mi
kube-proxy-9zb5z                           1m           21Mi
kube-proxy-t266f                           1m           23Mi
kube-proxy-x5pvt                           2m           21Mi
kube-scheduler-node20                      2m           27Mi
metrics-server-8779b8f8b-qnqj6             3m           18Mi
tiller-deploy-77855d9dcf-c775s             1m           17Mi

Some of the commands used:

kubectl top nodes
kubectl top pod
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/pods"
kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"