core-dns is not getting ready on worker / minion node

Hi,

CoreDNS is not reaching the Ready state on the minion / worker node. This is causing kubeadm to fail while upgrading the cluster. Please see the following logs for more details:

baqai@k8smaster:~/util/LFS258/SOLUTIONS/s_03$ kubectl get po -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE        NOMINATED NODE   READINESS GATES
calico-kube-controllers-7dbc97f587-72gcf   1/1     Running   0          78m     192.168.16.130    k8smaster   <none>           <none>
calico-node-tmxjx                          1/1     Running   0          78m     192.168.159.145   k8smaster   <none>           <none>
calico-node-xftlm                          1/1     Running   0          4m57s   192.168.159.146   node        <none>           <none>
coredns-66bff467f8-dgtvc                   0/1     Running   0          4m17s   192.168.167.129   node        <none>           <none>
coredns-66bff467f8-l8m74                   1/1     Running   0          7m4s    192.168.16.132    k8smaster   <none>           <none>
etcd-k8smaster                             1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-apiserver-k8smaster                   1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-controller-manager-k8smaster          1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-proxy-jtlpt                           1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-proxy-n5zt9                           1/1     Running   0          4m57s   192.168.159.146   node        <none>           <none>
kube-scheduler-k8smaster                   1/1     Running   1          83m     192.168.159.145   k8smaster   <none>           <none>
baqai@k8smaster:~/util/LFS258/SOLUTIONS/s_03$ kubectl logs coredns-66bff467f8-dgtvc
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0126 10:49:28.943386       1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.942842255 +0000 UTC m=+0.024568617) (total time: 30.000427822s):
Trace[2019727887]: [30.000427822s] [30.000427822s] END
E0126 10:49:28.943419       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0126 10:49:28.943821       1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.94341507 +0000 UTC m=+0.025141409) (total time: 30.000395886s):
Trace[1427131847]: [30.000395886s] [30.000395886s] END
E0126 10:49:28.943828       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0126 10:49:28.944259       1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.943549088 +0000 UTC m=+0.025275450) (total time: 30.000697364s):
Trace[939984059]: [30.000697364s] [30.000697364s] END
E0126 10:49:28.944269       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"

Comments

  • chrispokorni Posts: 2,155
    edited January 2021

    Hi @furqanbaqai,

    It seems the connection attempts time out when accessing the kubernetes Service ClusterIP.
    Did you try deleting the Pod to force the controller to replace it?
    What happens when you run curl https://10.96.0.1:443 and kubectl describe svc kubernetes?
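
    For example, forcing a replacement could look like this (pod name taken from the listing above; the ReplicaSet will create a fresh replica, possibly on the other node):

    # delete the unready CoreDNS replica and let the controller recreate it
    kubectl -n kube-system delete pod coredns-66bff467f8-dgtvc
    # watch where the replacement lands
    kubectl -n kube-system get po -o wide -w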

    Regards,
    -Chris

  • Hi @chrispokorni ,

    Thanks for the response. All CoreDNS pods run fine on the master node; when I delete one of them, it gets scheduled on the secondary node and this error appears. For the other questions:
    1. I ran curl and got the following result:

    baqai@oftl-ub180464:/var/log$ curl https://10.96.0.1:443 -k
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {
    
      },
      "status": "Failure",
      "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
      "reason": "Forbidden",
      "details": {
    
      },
      "code": 403
    }
    
    2. Service kubernetes description:
    baqai@k8smaster:/var/log/calico/cni$ kubectl describe svc kubernetes
    Name:              kubernetes
    Namespace:         default
    Labels:            component=apiserver
                       provider=kubernetes
    Annotations:       <none>
    Selector:          <none>
    Type:              ClusterIP
    IP:                10.96.0.1
    Port:              https  443/TCP
    TargetPort:        6443/TCP
    Endpoints:         192.168.159.145:6443
    Session Affinity:  None
    Events:            <none>
    

    Just to highlight, the same pattern is observed in v1.19.5.

    Thanks in advance

  • chrispokorni Posts: 2,155

    Hi @furqanbaqai,

    Thanks for the detailed outputs. There are a few things here that seem to be causing your issues.

    1. The IP addresses of your hosts/Nodes/VMs overlap with the default Pod IP network managed by Calico, which is 192.168.0.0/16. There should be no overlap of any kind between the Node IP network and the Pod network. The recommendation is either to provision new VMs with IP addresses that do not overlap the 192.168.0.0/16 Pod network, or to re-deploy your cluster after re-configuring Calico and the kubeadm-config.yaml file with a new Pod network, so the ranges no longer overlap (see the commands sketched after this list).
    2. You may run into issues later on because of your host naming convention. k8smaster is intended to be used only as an alias for the control plane (which in the early labs is represented by the master1 node, and later by an entire cluster of 3 masters and an HAProxy server). You seem to have also used k8smaster as the hostname of your master node.
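
    One way to confirm the overlap, sketched here with stock kubectl commands (the pool name default-ipv4-ippool is the usual Calico default and is assumed here):

    # Node addresses appear in the INTERNAL-IP column
    kubectl get nodes -o wide
    # Pod subnet recorded by kubeadm at init time
    kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i podsubnet
    # Pod CIDR actually used by Calico (requires the Calico CRDs)
    kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o yaml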

    Regards,
    -Chris

  • Hi @chrispokorni ,

    Thanks for your response. Let me try this out and get back to you with the results.

  • Hi @chrispokorni ,

    Thank you for your help and support. This is to confirm that after changing the CIDR in my kubeadm-config.yaml file to a range that does not conflict with the IP range of the local VMs, CoreDNS now gets scheduled on the second node as well. I'll proceed and upgrade the cluster to the newer version according to the exercise.

    For reference, the output of kubectl get po --all-namespaces, followed by my kubeadm-config.yaml:

    NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE       NOMINATED NODE   READINESS GATES
    calico-kube-controllers-7dbc97f587-9p6v8   1/1     Running   0          16m     10.6.0.67         lfs25801   <none>           <none>
    calico-node-9qfrw                          1/1     Running   0          16m     192.168.159.145   lfs25801   <none>           <none>
    calico-node-djdch                          1/1     Running   0          7m17s   192.168.159.146   lfs25802   <none>           <none>
    coredns-66bff467f8-2ljdn                   1/1     Running   0          4m2s    10.6.0.193        lfs25802   <none>           <none>
    coredns-66bff467f8-sp5pc                   1/1     Running   0          18m     10.6.0.66         lfs25801   <none>           <none>
    etcd-lfs25801                              1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-apiserver-lfs25801                    1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-controller-manager-lfs25801           1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-proxy-kxlht                           1/1     Running   0          7m17s   192.168.159.146   lfs25802   <none>           <none>
    kube-proxy-w9djn                           1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-scheduler-lfs25801                    1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    kubernetesVersion: 1.18.15
    controlPlaneEndpoint: "lfs25801:6443"
    networking:
      podSubnet: 10.6.0.0/24
    
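
    For completeness, a file like this is fed to kubeadm init roughly as follows (exact flags such as --upload-certs depend on the lab steps):

    # initialize the control plane from the manifest above
    sudo kubeadm init --config=kubeadm-config.yaml --upload-certs
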
  • chrispokorni Posts: 2,155

    Hi @furqanbaqai,

    The IP addresses of your pods and nodes look good this time.

    However, you may have missed 2 steps in Lab 3.1 (see the sketch after this list):
    1. Step 12, where an alias is mapped to the Private IP address of the Master Node in the /etc/hosts file. The same alias and IP pair is expected to be used again in Lab 3.2 Step 6, in the /etc/hosts file of the Minion Node.
    2. Step 13, where the alias (not the hostname of the Master Node) from Step 12 is included in the kubeadm-config.yaml manifest.
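
    As a rough sketch of what those two steps produce, using the k8smaster alias and the 192.168.159.145 master IP seen earlier in this thread (substitute your own private IP):

    # /etc/hosts on the Master Node (Lab 3.1 Step 12) and the Minion Node (Lab 3.2 Step 6)
    192.168.159.145 k8smaster

    # kubeadm-config.yaml (Lab 3.1 Step 13) references the alias, not the hostname
    controlPlaneEndpoint: "k8smaster:6443"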

    For consistency, I presume the calico.yaml manifest has been updated with:

    - name: CALICO_IPV4POOL_CIDR
      value: "10.6.0.0/24"
    

    Regards,
    -Chris
