
CoreDNS is not getting ready on the worker / minion node

Hi,

CoreDNS is not getting to the Ready state on the minion / worker node. This is causing kubeadm to fail while upgrading the cluster. Please see the following logs for more details:

baqai@k8smaster:~/util/LFS258/SOLUTIONS/s_03$ kubectl get po -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE        NOMINATED NODE   READINESS GATES
calico-kube-controllers-7dbc97f587-72gcf   1/1     Running   0          78m     192.168.16.130    k8smaster   <none>           <none>
calico-node-tmxjx                          1/1     Running   0          78m     192.168.159.145   k8smaster   <none>           <none>
calico-node-xftlm                          1/1     Running   0          4m57s   192.168.159.146   node        <none>           <none>
coredns-66bff467f8-dgtvc                   0/1     Running   0          4m17s   192.168.167.129   node        <none>           <none>
coredns-66bff467f8-l8m74                   1/1     Running   0          7m4s    192.168.16.132    k8smaster   <none>           <none>
etcd-k8smaster                             1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-apiserver-k8smaster                   1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-controller-manager-k8smaster          1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-proxy-jtlpt                           1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
kube-proxy-n5zt9                           1/1     Running   0          4m57s   192.168.159.146   node        <none>           <none>
kube-scheduler-k8smaster                   1/1     Running   1          83m     192.168.159.145   k8smaster   <none>           <none>
baqai@k8smaster:~/util/LFS258/SOLUTIONS/s_03$ kubectl logs coredns-66bff467f8-dgtvc
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0126 10:49:28.943386       1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.942842255 +0000 UTC m=+0.024568617) (total time: 30.000427822s):
Trace[2019727887]: [30.000427822s] [30.000427822s] END
E0126 10:49:28.943419       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0126 10:49:28.943821       1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.94341507 +0000 UTC m=+0.025141409) (total time: 30.000395886s):
Trace[1427131847]: [30.000395886s] [30.000395886s] END
E0126 10:49:28.943828       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0126 10:49:28.944259       1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.943549088 +0000 UTC m=+0.025275450) (total time: 30.000697364s):
Trace[939984059]: [30.000697364s] [30.000697364s] END
E0126 10:49:28.944269       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
[INFO] plugin/ready: Still waiting on: "kubernetes"

Comments

  • chrispokorni (Posts: 2,372)
    edited January 2021

    Hi @furqanbaqai,

    It seems the connection attempts are timing out when accessing the kubernetes Service ClusterIP.
    Did you try deleting the Pod to force the controller to replace it?
    What happens when you run curl https://10.96.0.1:443 and kubectl describe svc kubernetes?
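
    For reference, a minimal sketch of the commands suggested above, using the CoreDNS Pod name from the earlier output (the Deployment controller recreates a deleted Pod automatically):

    # delete the stuck Pod so the ReplicaSet schedules a fresh one
    kubectl -n kube-system delete pod coredns-66bff467f8-dgtvc
    # probe the kubernetes Service ClusterIP from the host
    # (-k skips TLS verification; a 403 from system:anonymous still proves connectivity)
    curl -k https://10.96.0.1:443
    # check the Service and its endpoint (should point at the API server on port 6443)
    kubectl describe svc kubernetes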

    Regards,
    -Chris

  • Hi @chrispokorni ,

    Thanks for the response. All CoreDNS pods run fine on the master node; when I delete one of them, it gets scheduled on the secondary node and this error appears. For your other questions:
    1. I ran curl and got the following result:

    baqai@oftl-ub180464:/var/log$ curl https://10.96.0.1:443 -k
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {
    
      },
      "status": "Failure",
      "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
      "reason": "Forbidden",
      "details": {
    
      },
      "code": 403
    }
    
    2. The kubernetes Service description:
    baqai@k8smaster:/var/log/calico/cni$ kubectl describe svc kubernetes
    Name:              kubernetes
    Namespace:         default
    Labels:            component=apiserver
                       provider=kubernetes
    Annotations:       <none>
    Selector:          <none>
    Type:              ClusterIP
    IP:                10.96.0.1
    Port:              https  443/TCP
    TargetPort:        6443/TCP
    Endpoints:         192.168.159.145:6443
    Session Affinity:  None
    Events:            <none>
    

    Just to highlight, the same pattern is observed in v1.19.5.
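
    For completeness, the failing path can also be exercised from inside a Pod rather than from the host. A rough sketch, assuming a curl-capable test image such as curlimages/curl and that, with the default control plane taint, the Pod lands on the worker node:

    # start a throwaway Pod; with the control plane tainted it should land on the worker
    kubectl run nettest --image=curlimages/curl --restart=Never --command -- sleep 3600
    # try the same ClusterIP the CoreDNS Pod is timing out on (-m 5 caps the wait at 5s)
    kubectl exec nettest -- curl -k -m 5 https://10.96.0.1:443
    # clean up
    kubectl delete pod nettest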

    Thanks in advance

  • chrispokorni (Posts: 2,372)

    Hi @furqanbaqai,

    Thanks for the detailed outputs. There are a few things here that seem to be causing your issues.

    1. The IP addresses of your hosts/Nodes/VMs overlap with the default Pod IP network managed by Calico, which is 192.168.0.0/16. There should be no overlap of any kind between the Node IP network and the Pod network. The recommendation is either to provision new VMs with IP addresses that do not overlap the 192.168.0.0/16 Pod network, or to re-deploy your cluster with Calico and the kubeadm-config.yaml file re-configured for a new Pod network, so that the overlap is avoided (a quick way to check for it is sketched after this list).
    2. You may run into issues later on because of your host naming convention. k8smaster is intended to be used only as an alias for the control plane (which in the early labs is represented by the master1 node, and later by an entire cluster of 3 masters and an HAProxy server). You appear to have also used k8smaster as the hostname of your master node.
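
    A rough way to check for such an overlap, as a sketch (the IPPool lookup assumes Calico was installed from the standard calico.yaml manifest):

    # Node/VM addresses are in the INTERNAL-IP column
    kubectl get nodes -o wide
    # Pod network handed to kubeadm (shows up as the controller-manager's --cluster-cidr flag)
    kubectl cluster-info dump | grep -m 1 cluster-cidr
    # CIDR of the Calico IP pool actually used for Pod addresses
    kubectl get ippools.crd.projectcalico.org -o yaml | grep cidr
    # none of the Node IPs should fall inside the two CIDRs above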

    Regards,
    -Chris

  • Hi @chrispokorni ,

    Thanks for your response. Let me try this out and report back.

  • Hi @chrispokorni ,

    Thank you for your help and support. This is to confirm that after changing the CIDR in my kubeadm-config.yaml file to a range that does not conflict with the IP range of the local VMs, CoreDNS gets scheduled on the second node as well. I'll proceed and upgrade the cluster to the newer version according to the exercise.

    For reference, the output of kubectl get po --all-namespaces and my kubeadm-config.yaml:

    NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE       NOMINATED NODE   READINESS GATES
    calico-kube-controllers-7dbc97f587-9p6v8   1/1     Running   0          16m     10.6.0.67         lfs25801   <none>           <none>
    calico-node-9qfrw                          1/1     Running   0          16m     192.168.159.145   lfs25801   <none>           <none>
    calico-node-djdch                          1/1     Running   0          7m17s   192.168.159.146   lfs25802   <none>           <none>
    coredns-66bff467f8-2ljdn                   1/1     Running   0          4m2s    10.6.0.193        lfs25802   <none>           <none>
    coredns-66bff467f8-sp5pc                   1/1     Running   0          18m     10.6.0.66         lfs25801   <none>           <none>
    etcd-lfs25801                              1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-apiserver-lfs25801                    1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-controller-manager-lfs25801           1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-proxy-kxlht                           1/1     Running   0          7m17s   192.168.159.146   lfs25802   <none>           <none>
    kube-proxy-w9djn                           1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-scheduler-lfs25801                    1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    kubernetesVersion: 1.18.15
    controlPlaneEndpoint: "lfs25801:6443"
    networking:
      podSubnet: 10.6.0.0/24
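
    For context, a config file like this is normally consumed when (re-)initializing the control plane, roughly as follows (a sketch; it assumes the node was cleaned up with kubeadm reset beforehand):

    # run on the control plane node
    sudo kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    # the end of kubeadm-init.out contains the kubeadm join command for the worker node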
    
  • chrispokorni (Posts: 2,372)

    Hi @furqanbaqai,

    The IP addresses of your pods and nodes look good this time.

    However, you may have missed 2 steps in Lab 3.1:
    1. Step 12, where an alias is mapped to the Private IP address of the Master Node in the /etc/hosts file. The same alias and IP pair is expected to be used later, in Lab 3.2 Step 6, in the /etc/hosts file of the Minion Node.
    2. Step 13, where the alias (not the hostname of the Master Node) from Step 12 is included in the kubeadm-config.yaml manifest. A sketch of both steps follows below.
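
    A minimal sketch of those two steps, using the Master Node's Private IP from the outputs above (192.168.159.145) purely as an example:

    # Lab 3.1 Step 12 / Lab 3.2 Step 6: map the alias to the Master's Private IP on both nodes
    echo "192.168.159.145 k8smaster" | sudo tee -a /etc/hosts
    # Lab 3.1 Step 13: reference the alias (not the hostname) in kubeadm-config.yaml, e.g.
    #   controlPlaneEndpoint: "k8smaster:6443"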

    For consistency, I presume the calico.yaml manifest has been updated with:

    - name: CALICO_IPV4POOL_CIDR
      value: "10.6.0.0/24"
    

    Regards,
    -Chris
