Lab 12.3 Cannot make metrics-server work.

zhangwe · July 2020

I followed step 1 to download the package. Step 2 did not work because the package has changed, so I installed metrics-server with the command I found on the website:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
When I typed "student@master:~$ kubectl top node", I got "error: metrics not available yet".
I checked and found "metrics-server-5f956b6d5f-2lbzv 1/1 Running 0 3m58s".
I deleted the deployment, svc, and sa of "metrics-server", and used:
sudo kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
to do it again. I got the same error.
What should I do to fix this problem?

Thanks,

Wei

chrispokorni · July 2020

Hi Wei,

Typically, the metrics-server needs a good few minutes to start collecting metrics from your cluster. If you issue the kubectl top command too soon, you will see the metrics not available yet message.

Regards,
-Chris
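
Rather than guessing how long to wait, the check can be repeated until it succeeds. The following is a minimal sketch, not part of the lab; the function name retry and the attempt count and delay in the example are assumptions chosen for illustration.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "gave up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Example (needs a running cluster): poll every 15 seconds, up to 20 times.
# retry 20 15 kubectl top node
```

If the command still fails after several minutes, waiting longer is unlikely to help and the metrics-server configuration is the next thing to check.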

zhangwe · August 2020

Hi Chris,

Thank you very much for your response.
But it still did not work.

Wei

serewicz · August 2020

Hello,

I have just run the lab and it worked. Here is what I did:
git clone https://github.com/kubernetes-incubator/metrics-server.git
less metrics-server/README.md
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml
kubectl -n kube-system edit deployments.apps metrics-server
....
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-insecure-tls #<<---------------Added this insecure TLS line
image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
....
sleep 120 ; kubectl -n kube-system top pod
NAME CPU(cores) MEMORY(bytes)
calico-kube-controllers-578894d4cd-48hrz 1m 6Mi
calico-node-cx7gb 25m 25Mi
calico-node-hxfwn 30m 24Mi
coredns-66bff467f8-rqhvr 3m 6Mi
coredns-66bff467f8-sbnn4 3m 6Mi
etcd-master 19m 42Mi
....

It works.

Regards,
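
The interactive edit above could also be done non-interactively with kubectl patch and a JSON Patch. This is a sketch under two assumptions: metrics-server is the first container in the pod template, and its args list already exists, as in the excerpt above. The helper name append_arg_patch is hypothetical.

```shell
# Build a JSON Patch that appends one argument to the first container's args.
# Assumes the flag contains no characters that need JSON escaping.
append_arg_patch() {
    printf '[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"%s"}]' "$1"
}

# Example (needs a running cluster):
# kubectl -n kube-system patch deployment metrics-server --type=json \
#     -p "$(append_arg_patch --kubelet-insecure-tls)"
```

Patching the Deployment triggers a new rollout, so the same wait applies before kubectl top returns data.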

zhangwe · August 2020

It works. Thank you very much for your help.

serewicz · August 2020

Great! I've added it to the update and it should be part of the 1.19 release.

zhangwe · August 2020

The metrics-server only monitors the worker node and the pods on it, not the master node and the pods on the master node.
student@master:~$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
worker 88m 4% 839Mi 11%
master
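
A quick way to spot nodes without metrics is to filter the kubectl top node output. The sketch below assumes a row is incomplete when it has fewer than five columns, as in the output above, or when its CPU column reads <unknown>, which some kubectl versions print. The function name nodes_missing_metrics is hypothetical.

```shell
# Read `kubectl top node` output on stdin and print the name of every node
# whose row has no usable metrics.
nodes_missing_metrics() {
    awk 'NR > 1 && NF > 0 && (NF < 5 || $2 == "<unknown>") { print $1 }'
}

# Example (needs a running cluster):
# kubectl top node | nodes_missing_metrics
```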

serewicz · August 2020

Hello,
Other than waiting for a bit to see if it just needs to update the other node, there are a few things that could lead to this. What are you using for your lab environment, platform/OS/K8s version?

Please show the output of kubectl get node

Is the master node still tainted?

Regards,

mpatters72 · August 2020

I'm unclear on the "allowed" websites during the exam. "github.com/kubernetes-sigs" doesn't appear to be listed here: https://www.cncf.io/certification/cka/faq/ but seems to be needed to install metrics-server.

"in order to access assets at https://kubernetes.io/docs/ and its subdomains, https://github.com/kubernetes/ and its subdomains, or https://kubernetes.io/blog/ . No other tabs may be opened and no other sites may be navigated to."

@serewicz can you clarify please?

serewicz · August 2020

Hello,

There is only so much I can say about an exam due to the confidentiality agreement we all accept in order to take the exam. I can't talk directly about what may or may not be on the exam, other than suggest the Candidate Handbook https://training.linuxfoundation.org/go/cka-ckad-candidate-handbook or the general certification email address.

The certification team are top notch. I'm sure that they know what is available and not as they design the questions.

Regards,

fcioanca · August 2020

Please email certificationsupport@linuxfoundation.org for any exam-related question if the Candidate Handbook does not provide the answers you are looking for.

zhangwe · August 2020

Regarding the metrics-server monitoring only the worker node and its pods: I restarted all the VMs, and now it monitors only the master node and the pods on it, not the worker node and its pods.
I used two GCE VMs with the ubuntu-1804-bionic-v20200701 image.
Other info:
student@master:~$ kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master 198m 9% 1225Mi 16%
worker

student@master:~$ kubectl top pod --all-namespaces
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system calico-kube-controllers-578894d4cd-pq67q 2m 8Mi
kube-system calico-node-hxrhx 23m 64Mi
kube-system coredns-66bff467f8-2mrnn 3m 6Mi
kube-system coredns-66bff467f8-m7ls6 3m 6Mi
kube-system etcd-master 19m 47Mi
kube-system kube-apiserver-master 37m 286Mi
kube-system kube-controller-manager-master 12m 38Mi
kube-system kube-proxy-mzj9b 1m 13Mi
kube-system kube-scheduler-master 4m 11Mi
kube-system metrics-server-6c7dddc9f8-7w8w2 1m 10Mi
student@master:~$

student@master:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:38:50Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:30:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

chrispokorni · August 2020

Hi @zhangwe,

Could you provide the output of kubectl get ds,po -o wide --all-namespaces ?

Regards,
-Chris

zhangwe · August 2020

student@master:~$ kubectl get node --show-labels
NAME STATUS ROLES AGE VERSION LABELS
master Ready master 16d v1.18.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=master,kubernetes.io/os=linux,node-role.kubernetes.io/master=
worker Ready 16d v1.18.1 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=worker,kubernetes.io/os=linux

serewicz · August 2020

Hello,

It looks like the versions are okay. Could you verify you have an allow all traffic rule in the GCE VPC firewall?

As Chris mentioned it would also be good to see the output of kubectl get ds,po -o wide --all-namespaces

Regards,

zhangwe · August 2020

student@master:~$ kubectl get ds,po -o wide --all-namespaces
NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
kube-system daemonset.apps/calico-node 2 2 2 2 2 kubernetes.io/os=linux 16d calico-node calico/node:v3.15.1 k8s-app=calico-node
kube-system daemonset.apps/kube-proxy 2 2 2 2 2 kubernetes.io/os=linux 16d kube-proxy k8s.gcr.io/kube-proxy:v1.18.1 k8s-app=kube-proxy

NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/hello-1597090080-8kpd5 0/1 Completed 0 2m17s 192.168.171.107 worker
default pod/hello-1597090140-7wvc6 0/1 Completed 0 76s 192.168.171.78 worker
default pod/hello-1597090200-w8bz8 0/1 Completed 0 16s 192.168.171.95 worker
default pod/mypod 0/1 ContainerCreating 0 41m worker
default pod/nginx 0/1 Pending 0 30m
default pod/nginx-pod 0/1 Completed 0 10m 192.168.171.79 worker
kube-system pod/calico-kube-controllers-578894d4cd-pq67q 1/1 Running 12 16d 192.168.219.97 master
kube-system pod/calico-node-gqrcc 1/1 Running 13 16d 10.142.0.10 worker
kube-system pod/calico-node-hxrhx 1/1 Running 12 16d 10.142.0.9 master
kube-system pod/coredns-66bff467f8-2mrnn 1/1 Running 12 16d 192.168.219.98 master
kube-system pod/coredns-66bff467f8-m7ls6 1/1 Running 12 16d 192.168.219.102 master
kube-system pod/etcd-master 1/1 Running 12 16d 10.142.0.9 master
kube-system pod/kube-apiserver-master 1/1 Running 14 16d 10.142.0.9 master
kube-system pod/kube-controller-manager-master 1/1 Running 12 16d 10.142.0.9 master
kube-system pod/kube-proxy-kd42s 1/1 Running 12 16d 10.142.0.10 worker
kube-system pod/kube-proxy-mzj9b 1/1 Running 12 16d 10.142.0.9 master
kube-system pod/kube-scheduler-master 1/1 Running 12 16d 10.142.0.9 master
kube-system pod/metrics-server-6c7dddc9f8-24jr7 1/1 Running 2 3d22h 192.168.171.126 worker
kube-system pod/metrics-server-6c7dddc9f8-7w8w2 1/1 Running 3 4d1h 192.168.219.101 master
namespace1 pod/pod1 0/1 Completed 0 2d23h worker
namespace2 pod/pod2 0/1 Completed 0 2d23h master
namespace3 pod/pod3 0/1 Completed 0 2d23h worker
namespace4 pod/pod4 0/1 Completed 0 2d23h worker
qos-example pod/qos-demo 1/1 Running 10 10d 192.168.171.112 worker
student@master:~$

serewicz · August 2020

Hello,

Thank you for the info. I notice that you have a fair number of restarts on all of the long-running containers, and a pending container. Are you using VMs with 2 CPUs and 7.5 GB of memory, or larger?

Also, please verify you have an allow-all-traffic rule for your GCE VPC.

I just reproduced the lab and can see both VMs. I'm leaning towards something in your VPC firewall settings.

Regards,
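
The restart counts mentioned above can be pulled out of a pod listing with a small filter. This is a sketch that assumes pods-only output from kubectl get pods --all-namespaces, where RESTARTS is the fifth column; it would misread the daemonset rows of a combined ds,po listing. The function name pods_with_restarts and the threshold of 10 are illustrative.

```shell
# Read `kubectl get pods --all-namespaces` output on stdin and print
# namespace/name and restart count for pods at or above the given threshold.
pods_with_restarts() {
    awk -v min="${1:-1}" \
        'NR > 1 && $5 ~ /^[0-9]+$/ && $5 + 0 >= min { print $1 "/" $2, $5 }'
}

# Example (needs a running cluster):
# kubectl get pods --all-namespaces | pods_with_restarts 10
```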

zhangwe · August 2020

Yes, I am using 2cpu/7.5GB VMs with 15GB of disk space.

I used the same network policies and VPC firewall settings as before, and back then metrics-server worked on both nodes. I used an instance template to create the VMs.
This time, when I created metrics-server, I forgot to add "- --kubelet-insecure-tls". Following your suggestion, I added it and restarted everything. I found that metrics-server worked on only one node.

serewicz · August 2020

Hello,

Do you mean that the settings you used before allowed all traffic to all IPs? Perhaps your previous setup wasn't quite correct.

We have looked at most of the differences, and I'm not seeing why your metrics-server would not be reporting from both nodes, which leaves the VPC firewall as the primary suspect. Obviously the image is fine because it is reporting from the master. The difference is the network which connects them. Check your firewall settings.

Perhaps during Friday's office hours you can share your screen and we can look at your VPC firewall settings. I just ran the ten commands and both nodes are reporting metrics.

Regards,

chrispokorni · August 2020

Hi @zhangwe,

According to step 4 of the metrics-server lab exercise, do you have both of the following lines in the metrics-server Deployment YAML manifest, and are they properly aligned?

- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

Also, what is the output of:

kubectl -n kube-system get svc,ep

Regards,
-Chris
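
One way to confirm that both flags made it into the running Deployment is to check its container args. The sketch below assumes metrics-server is the first container in the pod template; the function name missing_flags is hypothetical. It matches flag names only and does not validate their values or the YAML indentation.

```shell
# Read container args (any text) on stdin and print each of the two
# lab flags that does not appear in it.
missing_flags() {
    args=$(cat)
    for flag in --kubelet-insecure-tls --kubelet-preferred-address-types; do
        case $args in
            *"$flag"*) ;;
            *) echo "$flag" ;;
        esac
    done
}

# Example (needs a running cluster):
# kubectl -n kube-system get deployment metrics-server \
#     -o jsonpath='{.spec.template.spec.containers[0].args}' | missing_flags
```

Empty output means both flags are present.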

zhangwe · August 2020

Hi Chris,
Yes, I had missed the line "- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname" too. After I added it, everything works well now.

Thank you very much for your help.

Wei

serewicz · August 2020

Interesting. I did not add that preferred address line to my configuration, and it works.
