LAB 12.3 metrics-server Issues
Hi Team,
I'm having issues getting the metrics-server working. Below is the output of some general commands:
ubuntu@ip-172-31-26-86:~$ kubectl -n kube-system logs metrics-server-5f4ffd464c-fg7ss
I0815 07:01:40.290885 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0815 07:01:41.111497 1 secure_serving.go:116] Serving securely on [::]:4443
ubuntu@ip-172-31-26-86:~$
ubuntu@ip-172-31-26-86:~$
ubuntu@ip-172-31-26-86:~$ kubectl top pod --all-namespaces
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
ubuntu@ip-172-31-26-86:~$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
ubuntu@ip-172-31-26-86:~$ kubectl -n kube-system get svc,ep
NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   44d
service/metrics-server            ClusterIP   10.103.33.46     <none>        443/TCP                  13m
service/traefik-ingress-service   ClusterIP   10.102.129.203   <none>        80/TCP,8080/TCP          6d

NAME                                ENDPOINTS                                                               AGE
endpoints/kube-controller-manager   <none>                                                                  44d
endpoints/kube-dns                  192.168.119.10:53,192.168.119.18:53,192.168.119.10:9153 + 3 more...     44d
endpoints/kube-scheduler            <none>                                                                  44d
endpoints/metrics-server            192.168.66.223:4443                                                     13m
endpoints/traefik-ingress-service   172.31.26.226:8080,172.31.26.86:8080,172.31.26.226:80 + 1 more...       6d
Please note that I am running Ubuntu on AWS.
ubuntu@ip-172-31-26-86:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:38:50Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:30:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Thanks in advance
Raj.
Answers
-
Hello,
Thank you for the output, that helps.
As both nodes are not replying with information, I'd first check that the edit of the metrics-server deployment was done correctly. Some other questions to investigate:
Did you run these commands to use v0.3.7?
git clone https://github.com/kubernetes-incubator/metrics-server.git
less README.md
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.7/components.yaml
kubectl -n kube-system edit deployments.apps metrics-server
Have you waited a minute or two after the edit of the deployment for the server to collect information?
Do you have these lines in the metrics-server deployment, indented the proper amount:
- --secure-port=4443
- --kubelet-insecure-tls #<<------------------------------Added this insecure TLS line
image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7 #<<-------- version 0.3.7
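For orientation, the edited fragment of the deployment would look roughly like this (a sketch of only the relevant section, not the complete manifest):

```yaml
# Fragment of the metrics-server Deployment, as reached via:
#   kubectl -n kube-system edit deployments.apps metrics-server
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls    # skip kubelet cert verification (lab setting)
```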
You may need to add this one (some have reported it's required):
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
Could you paste the output of kubectl get pods -o wide --all-namespaces?
Do your VMs have at least 2vCPU and 7.5G of memory?
After ten minutes and trying to get node and pod top info, are there any new errors in the pod logs?
Hopefully that will expose the hiccup, otherwise we'll look at some other information.
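One more check that may help narrow this down (a general suggestion, not a lab step): the ServiceUnavailable responses come from the aggregation layer, so the APIService object for metrics is worth inspecting directly:

```
kubectl get apiservices v1beta1.metrics.k8s.io
kubectl describe apiservices v1beta1.metrics.k8s.io
```

If the AVAILABLE column shows False, the message in the describe output usually explains why the apiserver cannot reach the metrics-server service.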
Regards,
1 -
Thanks for the reply..
Below is the output
kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-system calico-kube-controllers-89df8c6f8-cx6bc 1/1 Running 5 45d 192.168.119.31 ip-172-31-26-86 <none> <none>
calico-system calico-node-hkmjf 1/1 Running 5 47d 172.31.26.86 ip-172-31-26-86 <none> <none>
calico-system calico-node-lcsc7 1/1 Running 7 47d 172.31.26.226 ip-172-31-26-226 <none> <none>
calico-system calico-typha-f5f97d556-chl6n 1/1 Running 7 45d 172.31.26.226 ip-172-31-26-226 <none> <none>
calico-system calico-typha-f5f97d556-ql7f9 1/1 Running 7 45d 172.31.26.86 ip-172-31-26-86 <none> <none>
default dev-web-7b474799bb-8jd9w 1/1 Running 2 4d21h 192.168.119.29 ip-172-31-26-86 <none> <none>
default dev-web-7b474799bb-9c6h9 1/1 Running 2 4d21h 192.168.119.35 ip-172-31-26-86 <none> <none>
default dev-web-7b474799bb-cfwj7 1/1 Running 2 4d21h 192.168.119.34 ip-172-31-26-86 <none> <none>
default dev-web-7b474799bb-m7cx5 1/1 Running 2 4d21h 192.168.119.28 ip-172-31-26-86 <none> <none>
default dev-web-7b474799bb-p7lfh 1/1 Running 2 4d21h 192.168.119.14 ip-172-31-26-86 <none> <none>
default dev-web-7b474799bb-sjlbq 1/1 Running 2 4d21h 192.168.119.30 ip-172-31-26-86 <none> <none>
default ghost-6bbd97db54-l4rbp 1/1 Running 2 4d21h 192.168.119.27 ip-172-31-26-86 <none> <none>
default nginx-7d88d7b787-2qbhp 1/1 Running 2 4d21h 192.168.119.26 ip-172-31-26-86 <none> <none>
default sleepy-1595053200-crn74 0/1 Terminating 0 31d 192.168.66.231 ip-172-31-26-226 <none> <none>
kube-system coredns-66bff467f8-jj7xr 1/1 Running 5 45d 192.168.119.4 ip-172-31-26-86 <none> <none>
kube-system coredns-66bff467f8-tgl7l 1/1 Running 5 47d 192.168.119.33 ip-172-31-26-86 <none> <none>
kube-system etcd-ip-172-31-26-86 1/1 Running 5 47d 172.31.26.86 ip-172-31-26-86 <none> <none>
kube-system kube-apiserver-ip-172-31-26-86 1/1 Running 6 47d 172.31.26.86 ip-172-31-26-86 <none> <none>
kube-system kube-controller-manager-ip-172-31-26-86 1/1 Running 5 47d 172.31.26.86 ip-172-31-26-86 <none> <none>
kube-system kube-proxy-9tbk4 1/1 Running 4 47d 172.31.26.226 ip-172-31-26-226 <none> <none>
kube-system kube-proxy-qsk7f 1/1 Running 5 47d 172.31.26.86 ip-172-31-26-86 <none> <none>
kube-system kube-scheduler-ip-172-31-26-86 1/1 Running 5 47d 172.31.26.86 ip-172-31-26-86 <none> <none>
kube-system metrics-server-5f4ffd464c-fg7ss 1/1 Running 1 3d3h 192.168.66.222 ip-172-31-26-226 <none> <none>
kube-system traefik-ingress-controller-qcxlq 1/1 Running 2 4d20h 172.31.26.226 ip-172-31-26-226 <none> <none>
kube-system traefik-ingress-controller-wrp5k 1/1 Running 3 9d 172.31.26.86 ip-172-31-26-86 <none> <none>
low-usage-limit limited-hog-d9d756c45-5zs6w 1/1 Running 5 45d 192.168.119.1 ip-172-31-26-86 <none> <none>
tigera-operator tigera-operator-c9cf5b94d-wfk5k 1/1 Running 8 45d 172.31.26.86 ip-172-31-26-86 <none> <none>
This is what the metrics-server edit looks like:
labels:
  k8s-app: metrics-server
name: metrics-server
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
    imagePullPolicy: IfNotPresent
    name: metrics-server
    ports:
    - containerPort: 4443
      name: main-port
      protocol: TCP
    resources: {}
    securityContext:
      readOnlyRootFilesystem: true
      runAsNonRoot: true
      runAsUser: 1000
I'm using AWS Ubuntu m5a.large instances, which have 2 vCPU and 8 GB RAM.
Thanks
0 -
kubectl -n kube-system logs metrics-server-5f4ffd464c-fg7ss
I0818 10:05:15.137550 1 secure_serving.go:116] Serving securely on [::]:4443
0 -
Hi @RajG,
It seems that all client workload is deployed on a single node, and the only client pod (sleepy) deployed on the other node is in terminating state.
This may indicate that one of your nodes could be tainted, which would prevent workload from being evenly distributed across the cluster. What is the output of
kubectl get nodes
?
Regards,
-Chris
0 -
Thanks for the reply..
Below is the output, please look into that
ubuntu@ip-172-31-26-86:~/metrics-server$ kubectl get nodes
NAME               STATUS   ROLES    AGE   VERSION
ip-172-31-26-226   Ready    <none>   48d   v1.18.1
ip-172-31-26-86    Ready    master   48d   v1.18.1
0 -
Some more output in verbose mode
kubectl -n kube-system logs metrics-server-5f4ffd464c-fg7ss -v=9
I0819 11:17:37.116378 11380 loader.go:375] Config loaded from file: /home/ubuntu/.kube/config
I0819 11:17:37.117672 11380 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.18.1 (linux/amd64) kubernetes/7879fc1" 'https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s'
I0819 11:17:37.132460 11380 round_trippers.go:443] GET https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s 503 Service Unavailable in 14 milliseconds
I0819 11:17:37.132491 11380 round_trippers.go:449] Response Headers:
I0819 11:17:37.132509 11380 round_trippers.go:452] Content-Type: text/plain; charset=utf-8
I0819 11:17:37.132523 11380 round_trippers.go:452] X-Content-Type-Options: nosniff
I0819 11:17:37.132537 11380 round_trippers.go:452] Content-Length: 20
I0819 11:17:37.132551 11380 round_trippers.go:452] Date: Wed, 19 Aug 2020 11:17:37 GMT
I0819 11:17:37.134520 11380 request.go:1068] Response Body: service unavailable
I0819 11:17:37.136341 11380 request.go:1271] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
I0819 11:17:37.136374 11380 cached_discovery.go:78] skipped caching discovery info due to the server is currently unable to handle the request
I0819 11:17:37.136422 11380 shortcut.go:89] Error loading discovery information: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
I0819 11:17:37.136861 11380 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.18.1 (linux/amd64) kubernetes/7879fc1" 'https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s'
I0819 11:17:37.139067 11380 round_trippers.go:443] GET https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s 503 Service Unavailable in 2 milliseconds
0 -
Hi @RajG,
In an earlier exercise we explored taints and tolerations. There may be a chance that one of your nodes is still tainted and prevents scheduling of new workload.
Run
kubectl describe nodes | grep -i taint
and if a taint is found on one node then you'd need to revisit exercise 11.2 steps 8, 10 or 12 to remove it, depending on which taint(s) is/are found.
Regards,
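As a sketch (the key name below is only a placeholder; use whichever taint the describe output actually reports), a taint is removed by re-running kubectl taint with a trailing minus after the key:

```
kubectl taint nodes <node-name> <taint-key>-
# e.g., for the common control-plane taint:
kubectl taint nodes ip-172-31-26-86 node-role.kubernetes.io/master-
```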
-Chris
0 -
Hi Chris,
I found no taint on either node:
kubectl describe nodes | grep -i taint
Taints:             <none>
Taints:             <none>
Below is verbose output:
I0819 12:43:52.201098 10728 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.18.1 (linux/amd64) kubernetes/7879fc1" 'https://k8smaster:6443/apis/metrics.k8s.io/v1beta1/nodes'
I0819 12:43:52.204297 10728 round_trippers.go:443] GET https://k8smaster:6443/apis/metrics.k8s.io/v1beta1/nodes 503 Service Unavailable in 3 milliseconds
I0819 12:43:52.204382 10728 round_trippers.go:449] Response Headers:
I0819 12:43:52.204437 10728 round_trippers.go:452] Content-Type: text/plain; charset=utf-8
I0819 12:43:52.204488 10728 round_trippers.go:452] X-Content-Type-Options: nosniff
I0819 12:43:52.204505 10728 round_trippers.go:452] Content-Length: 20
I0819 12:43:52.204532 10728 round_trippers.go:452] Date: Wed, 19 Aug 2020 12:43:52 GMT
I0819 12:43:52.204574 10728 request.go:1068] Response Body: service unavailable
I0819 12:43:52.204767 10728 helpers.go:216] server response object: [{
"metadata": {},
"status": "Failure",
"message": "the server is currently unable to handle the request (get nodes.metrics.k8s.io)",
"reason": "ServiceUnavailable",
"details": {
"group": "metrics.k8s.io",
"kind": "nodes",
"causes": [
{
"reason": "UnexpectedServerResponse",
"message": "service unavailable"
}
]
},
"code": 503
}]
F0819 12:43:52.204811 10728 helpers.go:115] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
Also please see below output:
kubectl get svc -n kube-system
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   49d
metrics-server            ClusterIP   10.103.33.46     <none>        443/TCP                  4d5h
traefik-ingress-service   ClusterIP   10.102.129.203   <none>        80/TCP,8080/TCP          10d
ubuntu@ip-172-31-26-86:~/metrics-server$ ping 10.103.33.46 -c 1
PING 10.103.33.46 (10.103.33.46) 56(84) bytes of data.
I tried the below additional lines without any luck:
spec:
  containers:
  - args:
    - --cert-dir=/tmp
    - --secure-port=4443
    - --kubelet-insecure-tls
    - --requestheader-allowed-names=aggregator
    - --metric-resolution=30s
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname,InternalDNS,ExternalDNS
    image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
    imagePullPolicy: IfNotPresent
    name: metrics-server
    ports:
    - containerPort: 4443
      name: main-port
      protocol: TCP
    resources: {}
    securityContext:
0 -
Hello,
I see that your nodes have been up for 49 days, and the output now shows k8smaster. Did you just rename the node, or did you go back and ensure that you completed the proper setup of the cluster? Again, if you missed steps when setting up the cluster there could be other issues.
Please verify, did you rebuild the cluster following each step?
Are you using docker or cri-o as your container engine?
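A quick way to check this (just a general suggestion) is the CONTAINER-RUNTIME column of a wide node listing:

```
kubectl get nodes -o wide
```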
Regards,
0 -
My host file is per below
cat /etc/hosts
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
172.31.26.86 k8smaster
Yes, I created the cluster a while ago and I work on these labs whenever I have spare time.
Thanks
0 -
Hello,
Please revisit my previous posts. You did not set up your cluster correctly. Changing the name of the host does not reconfigure the Kubernetes cluster.
Please start with new, fresh, virtual machines, and complete each of the setup tasks.
Also, please verify: did you configure Docker, or did you choose cri-o? If you read the large warning at the start of Lab 12.3 you'll see:
The metrics-server is written to interact with Docker. If you chose to use crio the logs will show errors and inability to collect metrics.
Regards
0 -
Hello
Please note that I didn't change the name of the host now; it has been there from day 1.
Also, I am using Docker. Below are the commands I ran during the initial setup, from Chapter 3 steps 12-14:
kubeadm join k8smaster:6443 --token cygv1k.vp48935xisk6y147 --discovery-token-ca-cert-hash sha256:63750c6e81e9b0cd7da27ddccaef600d247747b309fc5b35e6e716f207819869 --control-plane --certificate-key 006ede7931e83aa011c037b212826ede98f38666bbda03bc4d2ebf06210ef462
systemctl enable docker.service
kubeadm join k8smaster:6443 --token cygv1k.vp48935xisk6y147 --discovery-token-ca-cert-hash sha256:63750c6e81e9b0cd7da27ddccaef600d247747b309fc5b35e6e716f207819869 --control-plane --certificate-key 006ede7931e83aa011c037b212826ede98f38666bbda03bc4d2ebf06210ef462
32 kubectl get pods
33 kubectl get pod
34 kubeadm init
35 kubectl get nodes
36 ls -l /etc/docker/daemon.json
37 mkdir -p /etc/systemd/system/docker.service.d
38 systemctl daemon-reload
39 sudo reboot
40 systemctl restart docker
41 sudo systemctl enable docker
Thanks in advance
0 -
Hi @RajG,
There are some discrepancies between the shell command history and your actual environment. From the shell history, it seems you ran the join command with the --control-plane and --certificate-key flags, which are specific to adding master nodes to the cluster. From your get nodes output, it seems that you only have one master and one worker, with no sign of additional masters. At what point did you fix your cluster, and what steps did you take when removing the additional master before adding a worker instead? During the cleanup phase, some configurations may have persisted, and those may be affecting your cluster's performance now.
Regards,
-Chris
0 -
@serewicz said:
kubectl -n kube-system edit deployments.apps metrics-server
You may need to add this one (some have reported it's required):
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
I have a local VirtualBox setup and I was seeing "unable to fetch node metrics" and "no such host" errors in the metrics-server log - this fixed my problem, thanks!
1 -
Great! Thanks for the feedback.
It's not a consistent issue; there must be some difference, perhaps in the operating system of the host, I don't know why. I will look to update the material with this item.
Regards,
0 -
I had this problem and it took a bit more than that for me. I have tried this on VMs running on both Unraid and XCP using Ubuntu 18.04.3, and both had the same issue. I had to do this to get them to work:
args:
- --cert-dir=/tmp
- --secure-port=4443
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
0 -
Thanks for the feedback!
0 -
@skl said:
@serewicz said:
kubectl -n kube-system edit deployments.apps metrics-server
You may need to add this one (some have reported it's required):
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
I have a local VirtualBox setup and I was seeing "unable to fetch node metrics" and "no such host" errors in the metrics-server log - this fixed my problem, thanks!
This also fixed my issue. Running VMware Workstation here.
0