
LAB 12.3 metrics-server Issues

Hi Team,

I'm having issues getting metrics from metrics-server. Below is the output of some general commands.

ubuntu@ip-172-31-26-86:~$ kubectl -n kube-system logs metrics-server-5f4ffd464c-fg7ss
I0815 07:01:40.290885 1 serving.go:312] Generated self-signed cert (/tmp/apiserver.crt, /tmp/apiserver.key)
I0815 07:01:41.111497 1 secure_serving.go:116] Serving securely on [::]:4443
ubuntu@ip-172-31-26-86:~$
ubuntu@ip-172-31-26-86:~$
ubuntu@ip-172-31-26-86:~$ kubectl top pod --all-namespaces
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
ubuntu@ip-172-31-26-86:~$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

ubuntu@ip-172-31-26-86:~$ kubectl -n kube-system get svc,ep
NAME                              TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
service/kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   44d
service/metrics-server            ClusterIP   10.103.33.46     <none>        443/TCP                  13m
service/traefik-ingress-service   ClusterIP   10.102.129.203   <none>        80/TCP,8080/TCP          6d

NAME                                ENDPOINTS                                                             AGE
endpoints/kube-controller-manager   <none>                                                                44d
endpoints/kube-dns                  192.168.119.10:53,192.168.119.18:53,192.168.119.10:9153 + 3 more...   44d
endpoints/kube-scheduler            <none>                                                                44d
endpoints/metrics-server            192.168.66.223:4443                                                   13m
endpoints/traefik-ingress-service   172.31.26.226:8080,172.31.26.86:8080,172.31.26.226:80 + 1 more...     6d

Please note that I use Ubuntu on AWS.

ubuntu@ip-172-31-26-86:~$ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:38:50Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.1", GitCommit:"7879fc12a63337efff607952a323df90cdc7a335", GitTreeState:"clean", BuildDate:"2020-04-08T17:30:47Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Thanks in advance
Raj.

Answers

  • RajG
    RajG Posts: 8

    Thanks for the reply..

    Below is the output

    kubectl get pods -o wide --all-namespaces

    NAMESPACE         NAME                                      READY   STATUS        RESTARTS   AGE     IP               NODE               NOMINATED NODE   READINESS GATES
    calico-system     calico-kube-controllers-89df8c6f8-cx6bc   1/1     Running       5          45d     192.168.119.31   ip-172-31-26-86    <none>           <none>
    calico-system     calico-node-hkmjf                         1/1     Running       5          47d     172.31.26.86     ip-172-31-26-86    <none>           <none>
    calico-system     calico-node-lcsc7                         1/1     Running       7          47d     172.31.26.226    ip-172-31-26-226   <none>           <none>
    calico-system     calico-typha-f5f97d556-chl6n              1/1     Running       7          45d     172.31.26.226    ip-172-31-26-226   <none>           <none>
    calico-system     calico-typha-f5f97d556-ql7f9              1/1     Running       7          45d     172.31.26.86     ip-172-31-26-86    <none>           <none>
    default           dev-web-7b474799bb-8jd9w                  1/1     Running       2          4d21h   192.168.119.29   ip-172-31-26-86    <none>           <none>
    default           dev-web-7b474799bb-9c6h9                  1/1     Running       2          4d21h   192.168.119.35   ip-172-31-26-86    <none>           <none>
    default           dev-web-7b474799bb-cfwj7                  1/1     Running       2          4d21h   192.168.119.34   ip-172-31-26-86    <none>           <none>
    default           dev-web-7b474799bb-m7cx5                  1/1     Running       2          4d21h   192.168.119.28   ip-172-31-26-86    <none>           <none>
    default           dev-web-7b474799bb-p7lfh                  1/1     Running       2          4d21h   192.168.119.14   ip-172-31-26-86    <none>           <none>
    default           dev-web-7b474799bb-sjlbq                  1/1     Running       2          4d21h   192.168.119.30   ip-172-31-26-86    <none>           <none>
    default           ghost-6bbd97db54-l4rbp                    1/1     Running       2          4d21h   192.168.119.27   ip-172-31-26-86    <none>           <none>
    default           nginx-7d88d7b787-2qbhp                    1/1     Running       2          4d21h   192.168.119.26   ip-172-31-26-86    <none>           <none>
    default           sleepy-1595053200-crn74                   0/1     Terminating   0          31d     192.168.66.231   ip-172-31-26-226   <none>           <none>
    kube-system       coredns-66bff467f8-jj7xr                  1/1     Running       5          45d     192.168.119.4    ip-172-31-26-86    <none>           <none>
    kube-system       coredns-66bff467f8-tgl7l                  1/1     Running       5          47d     192.168.119.33   ip-172-31-26-86    <none>           <none>
    kube-system       etcd-ip-172-31-26-86                      1/1     Running       5          47d     172.31.26.86     ip-172-31-26-86    <none>           <none>
    kube-system       kube-apiserver-ip-172-31-26-86            1/1     Running       6          47d     172.31.26.86     ip-172-31-26-86    <none>           <none>
    kube-system       kube-controller-manager-ip-172-31-26-86   1/1     Running       5          47d     172.31.26.86     ip-172-31-26-86    <none>           <none>
    kube-system       kube-proxy-9tbk4                          1/1     Running       4          47d     172.31.26.226    ip-172-31-26-226   <none>           <none>
    kube-system       kube-proxy-qsk7f                          1/1     Running       5          47d     172.31.26.86     ip-172-31-26-86    <none>           <none>
    kube-system       kube-scheduler-ip-172-31-26-86            1/1     Running       5          47d     172.31.26.86     ip-172-31-26-86    <none>           <none>
    kube-system       metrics-server-5f4ffd464c-fg7ss           1/1     Running       1          3d3h    192.168.66.222   ip-172-31-26-226   <none>           <none>
    kube-system       traefik-ingress-controller-qcxlq          1/1     Running       2          4d20h   172.31.26.226    ip-172-31-26-226   <none>           <none>
    kube-system       traefik-ingress-controller-wrp5k          1/1     Running       3          9d      172.31.26.86     ip-172-31-26-86    <none>           <none>
    low-usage-limit   limited-hog-d9d756c45-5zs6w               1/1     Running       5          45d     192.168.119.1    ip-172-31-26-86    <none>           <none>
    tigera-operator   tigera-operator-c9cf5b94d-wfk5k           1/1     Running       8          45d     172.31.26.86     ip-172-31-26-86    <none>           <none>

    This is what my metrics-server edits look like:

      labels:
        k8s-app: metrics-server
      name: metrics-server
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
        imagePullPolicy: IfNotPresent
        name: metrics-server
        ports:
        - containerPort: 4443
          name: main-port
          protocol: TCP
        resources: {}
        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
    

    I'm using AWS Ubuntu m5a.large instances, which have 2 vCPUs and 8 GB RAM.

    Thanks
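
    One check that may be worth running here: the endpoints listing earlier in the thread shows 192.168.66.223:4443 behind the metrics-server Service, while the pod listing above shows the pod at 192.168.66.222 after a restart. The two outputs were captured at different times, so this may well be harmless, but comparing them side by side is cheap. A minimal sketch, assuming the Service is named metrics-server and the pod carries the k8s-app=metrics-server label shown in the manifest above (compare_metrics_endpoints is only an illustrative helper name):

    ```shell
    # Print the IPs registered behind the metrics-server Service next to the
    # IPs of the pods labelled k8s-app=metrics-server.
    # compare_metrics_endpoints is a hypothetical helper, not part of the lab.
    compare_metrics_endpoints() {
      ep_ips=$(kubectl -n kube-system get endpoints metrics-server \
        -o jsonpath='{.subsets[*].addresses[*].ip}')
      pod_ips=$(kubectl -n kube-system get pods -l k8s-app=metrics-server \
        -o jsonpath='{.items[*].status.podIP}')
      echo "endpoints: ${ep_ips}"
      echo "pods:      ${pod_ips}"
      # Exit status is non-zero when the two lists differ.
      [ "${ep_ips}" = "${pod_ips}" ]
    }

    # Example: compare_metrics_endpoints || echo "endpoint and pod IPs differ"
    ```

    With a single replica the two lines should match; with several replicas the order of the IPs could differ even in a healthy cluster, so the exit status is only a rough signal.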

  • RajG
    RajG Posts: 8

    kubectl -n kube-system logs metrics-server-5f4ffd464c-fg7ss
    I0818 10:05:15.137550 1 secure_serving.go:116] Serving securely on [::]:4443

  • chrispokorni
    chrispokorni Posts: 2,606

    Hi @RajG,

    It seems that all client workload is deployed on a single node, and the only client pod (sleepy) deployed on the other node is in terminating state.

    This may indicate that one of your nodes is tainted and does not allow workload to be evenly distributed in the cluster. What is the output of kubectl get nodes?

    Regards,
    -Chris

  • RajG
    RajG Posts: 8

    Thanks for the reply..

    Below is the output; please take a look.

    ubuntu@ip-172-31-26-86:~/metrics-server$ kubectl get nodes
    NAME               STATUS   ROLES    AGE   VERSION
    ip-172-31-26-226   Ready    <none>   48d   v1.18.1
    ip-172-31-26-86    Ready    master   48d   v1.18.1

  • RajG
    RajG Posts: 8

    Some more output in verbose mode

    kubectl -n kube-system logs metrics-server-5f4ffd464c-fg7ss -v=9
    I0819 11:17:37.116378 11380 loader.go:375] Config loaded from file: /home/ubuntu/.kube/config
    I0819 11:17:37.117672 11380 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.18.1 (linux/amd64) kubernetes/7879fc1" 'https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s'
    I0819 11:17:37.132460 11380 round_trippers.go:443] GET https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s 503 Service Unavailable in 14 milliseconds
    I0819 11:17:37.132491 11380 round_trippers.go:449] Response Headers:
    I0819 11:17:37.132509 11380 round_trippers.go:452] Content-Type: text/plain; charset=utf-8
    I0819 11:17:37.132523 11380 round_trippers.go:452] X-Content-Type-Options: nosniff
    I0819 11:17:37.132537 11380 round_trippers.go:452] Content-Length: 20
    I0819 11:17:37.132551 11380 round_trippers.go:452] Date: Wed, 19 Aug 2020 11:17:37 GMT
    I0819 11:17:37.134520 11380 request.go:1068] Response Body: service unavailable
    I0819 11:17:37.136341 11380 request.go:1271] body was not decodable (unable to check for Status): couldn't get version/kind; json parse error: json: cannot unmarshal string into Go value of type struct { APIVersion string "json:\"apiVersion,omitempty\""; Kind string "json:\"kind,omitempty\"" }
    I0819 11:17:37.136374 11380 cached_discovery.go:78] skipped caching discovery info due to the server is currently unable to handle the request
    I0819 11:17:37.136422 11380 shortcut.go:89] Error loading discovery information: unable to retrieve the complete list of server APIs: metrics.k8s.io/v1beta1: the server is currently unable to handle the request
    I0819 11:17:37.136861 11380 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.18.1 (linux/amd64) kubernetes/7879fc1" 'https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s'
    I0819 11:17:37.139067 11380 round_trippers.go:443] GET https://k8smaster:6443/apis/metrics.k8s.io/v1beta1?timeout=32s 503 Service Unavailable in 2 milliseconds
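
    The 503 above is returned for the aggregated metrics.k8s.io/v1beta1 API, so it may help to see how the API server itself reports that APIService. A minimal sketch, assuming the usual APIService name v1beta1.metrics.k8s.io that metrics-server registers (apiservice_conditions is only an illustrative helper name):

    ```shell
    # Print every status condition of an APIService, one per line.
    # Defaults to the metrics API; pass another APIService name as $1 if needed.
    # apiservice_conditions is a hypothetical helper, not part of the lab.
    apiservice_conditions() {
      kubectl get apiservice "${1:-v1beta1.metrics.k8s.io}" \
        -o jsonpath='{range .status.conditions[*]}{.type}={.status} reason={.reason} message={.message}{"\n"}{end}'
    }

    # Example: apiservice_conditions
    ```

    If the Available condition is False, its reason and message usually indicate whether the API server could not reach the Service endpoint. kubectl describe apiservice v1beta1.metrics.k8s.io shows the same information in longer form.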

  • chrispokorni
    chrispokorni Posts: 2,606

    Hi @RajG,

    In an earlier exercise we explored taints and tolerations. There may be a chance that one of your nodes is still tainted and prevents scheduling of new workload.

    Run kubectl describe nodes | grep -i taint and, if a taint is found on a node, revisit exercise 11.2 steps 8, 10 or 12 to remove it, depending on which taint(s) you find.

    Regards,
    -Chris
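
    As an alternative to the grep above, a per-node listing makes it easier to see which node carries which taint. A minimal sketch using kubectl's custom-columns output (the column titles and the node_taints function name are arbitrary choices, not taken from the lab):

    ```shell
    # List every node with the keys and effects of its taints.
    # node_taints is a hypothetical helper name.
    node_taints() {
      kubectl get nodes \
        -o custom-columns='NODE:.metadata.name,TAINT_KEYS:.spec.taints[*].key,TAINT_EFFECTS:.spec.taints[*].effect'
    }

    # Example: node_taints
    ```

    Nodes without taints should show <none> in both taint columns.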

  • RajG
    RajG Posts: 8

    Hi Chris,

    I found no taints on either node:

    kubectl describe nodes | grep -i taint
    Taints:             <none>
    Taints:             <none>

    Below is the verbose output:

    I0819 12:43:52.201098 10728 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.18.1 (linux/amd64) kubernetes/7879fc1" 'https://k8smaster:6443/apis/metrics.k8s.io/v1beta1/nodes'
    I0819 12:43:52.204297 10728 round_trippers.go:443] GET https://k8smaster:6443/apis/metrics.k8s.io/v1beta1/nodes 503 Service Unavailable in 3 milliseconds
    I0819 12:43:52.204382 10728 round_trippers.go:449] Response Headers:
    I0819 12:43:52.204437 10728 round_trippers.go:452] Content-Type: text/plain; charset=utf-8
    I0819 12:43:52.204488 10728 round_trippers.go:452] X-Content-Type-Options: nosniff
    I0819 12:43:52.204505 10728 round_trippers.go:452] Content-Length: 20
    I0819 12:43:52.204532 10728 round_trippers.go:452] Date: Wed, 19 Aug 2020 12:43:52 GMT
    I0819 12:43:52.204574 10728 request.go:1068] Response Body: service unavailable
    I0819 12:43:52.204767 10728 helpers.go:216] server response object: [{
      "metadata": {},
      "status": "Failure",
      "message": "the server is currently unable to handle the request (get nodes.metrics.k8s.io)",
      "reason": "ServiceUnavailable",
      "details": {
        "group": "metrics.k8s.io",
        "kind": "nodes",
        "causes": [
          {
            "reason": "UnexpectedServerResponse",
            "message": "service unavailable"
          }
        ]
      },
      "code": 503
    }]
    F0819 12:43:52.204811 10728 helpers.go:115] Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

    Also, please see the output below:

    kubectl get svc -n kube-system
    NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
    kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   49d
    metrics-server            ClusterIP   10.103.33.46     <none>        443/TCP                  4d5h
    traefik-ingress-service   ClusterIP   10.102.129.203   <none>        80/TCP,8080/TCP          10d
    ubuntu@ip-172-31-26-86:~/metrics-server$ ping 10.103.33.46 -c 1
    PING 10.103.33.46 (10.103.33.46) 56(84) bytes of data.

    I tried the additional lines below without any luck:

    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-insecure-tls
        - --requestheader-allowed-names=aggregator
        - --metric-resolution=30s
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname,InternalDNS,ExternalDNS
        image: k8s.gcr.io/metrics-server/metrics-server:v0.3.7
        imagePullPolicy: IfNotPresent
        name: metrics-server
        ports:
        - containerPort: 4443
          name: main-port
          protocol: TCP
        resources: {}
        securityContext:

  • RajG
    RajG Posts: 8

    My hosts file is as follows:

    cat /etc/hosts
    127.0.0.1 localhost

    # The following lines are desirable for IPv6 capable hosts

    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts
    172.31.26.86 k8smaster

    Yes, I created the cluster a while ago and work on these labs whenever I have spare time.

    Thanks

  • RajG
    RajG Posts: 8

    Hello

    Please note that I haven't changed the host name; it has been there from day 1.
    Also, I am using Docker. Below are the commands I ran during the initial setup, from Chapter 3 steps 12-14:

    kubeadm join k8smaster:6443 --token cygv1k.vp48935xisk6y147 --discovery-token-ca-cert-hash sha256:63750c6e81e9b0cd7da27ddccaef600d247747b309fc5b35e6e716f207819869 --control-plane --certificate-key 006ede7931e83aa011c037b212826ede98f38666bbda03bc4d2ebf06210ef462
    systemctl enable docker.service
    kubeadm join k8smaster:6443 --token cygv1k.vp48935xisk6y147 --discovery-token-ca-cert-hash sha256:63750c6e81e9b0cd7da27ddccaef600d247747b309fc5b35e6e716f207819869 --control-plane --certificate-key 006ede7931e83aa011c037b212826ede98f38666bbda03bc4d2ebf06210ef462
    32 kubectl get pods
    33 kubectl get pod
    34 kubeadm init
    35 kubectl get nodes
    36 ls -l /etc/docker/daemon.json
    37 mkdir -p /etc/systemd/system/docker.service.d
    38 systemctl daemon-reload
    39 sudo reboot
    40 systemctl restart docker
    41 sudo systemctl enable docker

    Thanks in advance

  • chrispokorni
    chrispokorni Posts: 2,606

    Hi @RajG,

    There are some discrepancies between your shell history and your actual environment. The history shows that you ran a join command with the --control-plane and --certificate-key flags, which are specific to adding master nodes to the cluster. Your get nodes output, however, shows only one master and one worker, with no sign of additional masters. At what point did you fix your cluster, and what steps did you take to remove the additional master before adding a worker instead? Some configuration may have persisted through the cleanup and may be affecting your cluster now.

    Regards,
    -Chris

  • skl
    skl Posts: 3

    @serewicz said:
    kubectl -n kube-system edit deployments.apps metrics-server
    You may need to add this one, (some have reported it's required):
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

    I have a local VirtualBox setup and I was seeing "unable to fetch node metrics" and "no such host" errors in the metric server log - this fixed my problem, thanks!
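
    For anyone who prefers not to edit the Deployment interactively, the same flag can probably be appended with a JSON patch instead. A minimal sketch, assuming metrics-server is the first container in the pod template and already has an args list, as in the manifests posted earlier in this thread (add_metrics_server_arg is only an illustrative helper name):

    ```shell
    # Append one argument to the args list of the first container in the
    # metrics-server Deployment. Assumes container index 0 and an existing
    # args list; add_metrics_server_arg is a hypothetical helper name.
    add_metrics_server_arg() {
      kubectl -n kube-system patch deployment metrics-server --type=json \
        -p "[{\"op\":\"add\",\"path\":\"/spec/template/spec/containers/0/args/-\",\"value\":\"$1\"}]"
    }

    # Example:
    # add_metrics_server_arg --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    ```

    The patch changes the pod template, so the Deployment rolls out a new metrics-server pod; it can take a minute or so before kubectl top returns data.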

  • recentcoin
    recentcoin Posts: 21
    edited September 2020

    I had this problem and it took a bit more than that for me. I have tried this on VMs running on both Unraid and XCP using Ubuntu 18.04.03 and both had the same issue. I had to do this to get them to work:

    args:
    - --cert-dir=/tmp
    - --secure-port=4443
    command:
    - /metrics-server
    - --kubelet-insecure-tls
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

  • @skl said:

    @serewicz said:
    kubectl -n kube-system edit deployments.apps metrics-server
    You may need to add this one, (some have reported it's required):
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname

    I have a local VirtualBox setup and I was seeing "unable to fetch node metrics" and "no such host" errors in the metric server log - this fixed my problem, thanks!

    This also fixed my issue. Running vmware workstation here.
