Welcome to the Linux Foundation Forum!

CoreDNS is not getting ready on worker / minion node

Hi,

CoreDNS is not reaching the ready state on the minion / worker node. This is causing kubeadm to fail while upgrading the cluster. Please see the following logs for more details:

  baqai@k8smaster:~/util/LFS258/SOLUTIONS/s_03$ kubectl get po -n kube-system -o wide
  NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE        NOMINATED NODE   READINESS GATES
  calico-kube-controllers-7dbc97f587-72gcf   1/1     Running   0          78m     192.168.16.130    k8smaster   <none>           <none>
  calico-node-tmxjx                          1/1     Running   0          78m     192.168.159.145   k8smaster   <none>           <none>
  calico-node-xftlm                          1/1     Running   0          4m57s   192.168.159.146   node        <none>           <none>
  coredns-66bff467f8-dgtvc                   0/1     Running   0          4m17s   192.168.167.129   node        <none>           <none>
  coredns-66bff467f8-l8m74                   1/1     Running   0          7m4s    192.168.16.132    k8smaster   <none>           <none>
  etcd-k8smaster                             1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
  kube-apiserver-k8smaster                   1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
  kube-controller-manager-k8smaster          1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
  kube-proxy-jtlpt                           1/1     Running   0          83m     192.168.159.145   k8smaster   <none>           <none>
  kube-proxy-n5zt9                           1/1     Running   0          4m57s   192.168.159.146   node        <none>           <none>
  kube-scheduler-k8smaster                   1/1     Running   1          83m     192.168.159.145   k8smaster   <none>           <none>
  baqai@k8smaster:~/util/LFS258/SOLUTIONS/s_03$ kubectl logs coredns-66bff467f8-dgtvc
  .:53
  [INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
  CoreDNS-1.6.7
  linux/amd64, go1.13.6, da7f65b
  [INFO] plugin/ready: Still waiting on: "kubernetes"
  [INFO] plugin/ready: Still waiting on: "kubernetes"
  [INFO] plugin/ready: Still waiting on: "kubernetes"
  I0126 10:49:28.943386       1 trace.go:116] Trace[2019727887]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.942842255 +0000 UTC m=+0.024568617) (total time: 30.000427822s):
  Trace[2019727887]: [30.000427822s] [30.000427822s] END
  E0126 10:49:28.943419       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
  I0126 10:49:28.943821       1 trace.go:116] Trace[1427131847]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.94341507 +0000 UTC m=+0.025141409) (total time: 30.000395886s):
  Trace[1427131847]: [30.000395886s] [30.000395886s] END
  E0126 10:49:28.943828       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
  I0126 10:49:28.944259       1 trace.go:116] Trace[939984059]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2021-01-26 10:48:58.943549088 +0000 UTC m=+0.025275450) (total time: 30.000697364s):
  Trace[939984059]: [30.000697364s] [30.000697364s] END
  E0126 10:49:28.944269       1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
  [INFO] plugin/ready: Still waiting on: "kubernetes"
  [INFO] plugin/ready: Still waiting on: "kubernetes"


Comments

  • Hi @furqanbaqai,

    It seems the connection attempts time out when accessing the kubernetes Service ClusterIP.
    Did you try deleting the Pod to force the controller to replace it?
    What happens when you run curl https://10.96.0.1:443 and kubectl describe svc kubernetes?

    Regards,
    -Chris

  • Hi @chrispokorni,

    Thanks for the response. All CoreDNS pods run perfectly on the master node; when I delete one of them, it is rescheduled on the secondary node and this error appears. To answer your questions:
    1. The curl command gave the following result:

    baqai@oftl-ub180464:/var/log$ curl https://10.96.0.1:443 -k
    {
      "kind": "Status",
      "apiVersion": "v1",
      "metadata": {

      },
      "status": "Failure",
      "message": "forbidden: User \"system:anonymous\" cannot get path \"/\"",
      "reason": "Forbidden",
      "details": {

      },
      "code": 403
    }
    2. The kubernetes Service description:
    baqai@k8smaster:/var/log/calico/cni$ kubectl describe svc kubernetes
    Name:              kubernetes
    Namespace:         default
    Labels:            component=apiserver
                       provider=kubernetes
    Annotations:       <none>
    Selector:          <none>
    Type:              ClusterIP
    IP:                10.96.0.1
    Port:              https  443/TCP
    TargetPort:        6443/TCP
    Endpoints:         192.168.159.145:6443
    Session Affinity:  None
    Events:            <none>

    Just to highlight, the same pattern is observed in v1.19.5 as well.

    Thanks in advance

  • Hi @furqanbaqai,

    Thanks for the detailed outputs. A few things here seem to be causing your issues.

    1. The IP addresses of your hosts/Nodes/VMs overlap with the default Pod IP network managed by Calico, which is 192.168.0.0/16. There should be no overlap of any kind between the Node IP network and the Pod network. The recommendation is to either provision new VMs with IP addresses that do not overlap the 192.168.0.0/16 Pod network, OR to re-deploy your cluster while re-configuring Calico and the kubeadm-config.yaml file with a new Pod network, so the two ranges no longer overlap (see the sketch after this list).
    2. You may run into issues later on because of your host naming convention. k8smaster is intended to be used only as an alias for the control plane (which in early labs is represented by the master1 node, and later by an entire cluster of 3 masters and an HAProxy server). You also appear to have used k8smaster as the hostname of your master node.
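
    For example, a minimal sketch of the two changes (the 10.6.0.0/24 range and the k8smaster alias are only illustrations; any private range that does not overlap your node IPs will do):

      # kubeadm-config.yaml - pick a Pod network that does not overlap the node IPs
      apiVersion: kubeadm.k8s.io/v1beta2
      kind: ClusterConfiguration
      kubernetesVersion: 1.18.15
      controlPlaneEndpoint: "k8smaster:6443"
      networking:
        podSubnet: 10.6.0.0/24

      # calico.yaml - the Calico IPv4 pool must reference the same range as podSubnet
      - name: CALICO_IPV4POOL_CIDR
        value: "10.6.0.0/24"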

    Regards,
    -Chris

  • Hi @chrispokorni,

    Thanks for your response. Let me try this out and report back.

  • Hi @chrispokorni,

    Thank you for your help and support. This is to confirm that after changing the Pod CIDR in my kubeadm-config.yaml file to a range that does not conflict with the IP range of the local VMs, CoreDNS is now getting scheduled on the second node as well. I'll proceed to upgrade the cluster to the newer version according to the exercise.

    For reference, the output of kubectl get po --all-namespaces and my kubeadm-config.yaml:

    NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE       NOMINATED NODE   READINESS GATES
    calico-kube-controllers-7dbc97f587-9p6v8   1/1     Running   0          16m     10.6.0.67         lfs25801   <none>           <none>
    calico-node-9qfrw                          1/1     Running   0          16m     192.168.159.145   lfs25801   <none>           <none>
    calico-node-djdch                          1/1     Running   0          7m17s   192.168.159.146   lfs25802   <none>           <none>
    coredns-66bff467f8-2ljdn                   1/1     Running   0          4m2s    10.6.0.193        lfs25802   <none>           <none>
    coredns-66bff467f8-sp5pc                   1/1     Running   0          18m     10.6.0.66         lfs25801   <none>           <none>
    etcd-lfs25801                              1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-apiserver-lfs25801                    1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-controller-manager-lfs25801           1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-proxy-kxlht                           1/1     Running   0          7m17s   192.168.159.146   lfs25802   <none>           <none>
    kube-proxy-w9djn                           1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>
    kube-scheduler-lfs25801                    1/1     Running   0          18m     192.168.159.145   lfs25801   <none>           <none>

    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    kubernetesVersion: 1.18.15
    controlPlaneEndpoint: "lfs25801:6443"
    networking:
      podSubnet: 10.6.0.0/24
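
    As a quick sanity check (just a sketch, with "dnstest" as a throwaway pod name), DNS resolution from a pod can be verified with:

      kubectl run dnstest --image=busybox:1.28 --restart=Never -- sleep 3600
      kubectl exec dnstest -- nslookup kubernetes.default
      kubectl delete pod dnstest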
  • Hi @furqanbaqai,

    The IP addresses of your pods and nodes look good this time.

    However, you may have missed two steps in Lab 3.1:
    1. Step 12, where an alias is mapped to the Private IP address of the Master Node in the /etc/hosts file. The same alias and IP pair is expected to be used later in Lab 3.2 Step 6 in the /etc/hosts file of the Minion Node.
    2. Step 13, where the alias (not the hostname of the Master Node) from Step 12 is included in the kubeadm-config.yaml manifest (see the sketch after this list).
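
    A minimal sketch of those two steps, assuming the master's private IP is still 192.168.159.145 and k8smaster is kept as the alias (adjust to your actual values):

      # /etc/hosts on both the master and the minion node
      192.168.159.145   k8smaster

      # kubeadm-config.yaml - use the alias, not the hostname
      controlPlaneEndpoint: "k8smaster:6443"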

    For consistency, I presume the calico.yaml manifest has been updated with:

    - name: CALICO_IPV4POOL_CIDR
      value: "10.6.0.0/24"

    Regards,
    -Chris
