Lab 10.1 - need a little assistance - coredns issue?


My post was too long with all the debugging info, so I attached the logs from the lab as a text file.

Any help would be appreciated; to my untrained eye it looks like CoreDNS is confused by the lab machines having two NICs.

Comments

  • chrispokorni

    Hi, are your VM private IPs and the Pod network IPs overlapping in any way? If they are, then you would run into resolution problems. As for the coredns pods, you can delete the ones in trouble and allow new ones to be created; this tends to fix minor CoreDNS issues.
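
    A minimal example of that, assuming a default kubeadm setup where the CoreDNS pods carry the k8s-app=kube-dns label:

    # list the CoreDNS pods and spot the unhealthy ones
    kubectl -n kube-system get pods -l k8s-app=kube-dns
    # delete a troubled pod; its Deployment recreates it automatically
    kubectl -n kube-system delete pod <coredns-pod-name>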

    Regards,
    -Chris

  • slfav

    192.168.58.* is used for node communication. 10.* is used to get out to the internet. Here is some more info about the setup:

    [kate@knode1 s_10]$ kubectl get ingresses,service,deployments,endpoints,pods -o wide
    NAME                              HOSTS             ADDRESS   PORTS   AGE
    ingress.extensions/ingress-test   www.example.com             80      5m7s
    
    NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE    SELECTOR
    service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        69d    <none>
    service/secondapp    NodePort    10.107.187.19   <none>        80:31959/TCP   6m5s   app=secondapp
    
    NAME                              READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS   IMAGES   SELECTOR
    deployment.extensions/secondapp   1/1     1            1           6m18s   nginx        nginx    app=secondapp
    
    NAME                   ENDPOINTS             AGE
    endpoints/kubernetes   192.168.58.101:6443   69d
    endpoints/secondapp    192.168.2.68:80       6m5s
    
    NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE               NOMINATED NODE   READINESS GATES
    pod/secondapp-677d65d8bd-hlj48   1/1     Running   0          6m18s   192.168.2.68   knode3.k8s.local   <none>           <none>
    

    Note that the ingress "ADDRESS" column is blank. This makes me think it might be some issue with coredns.

    [kate@knode1 s_10]$ which kls
    alias kls='kubectl get ingress,services,endpoints,deployments,daemonsets,pods -o wide'
        /usr/bin/kubectl
    [kate@knode1 s_10]$ kls -A | grep dns
    
    
    kube-system   service/kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   69d     k8s-app=kube-dns
    
    kube-system   endpoints/kube-dns                  192.168.0.27:53,192.168.0.27:53,192.168.0.27:9153                   69d
    
    
    kube-system   deployment.extensions/coredns        1/2     2            1           69d     coredns        k8s.gcr.io/coredns:1.3.1   k8s-app=kube-dns
    kube-system   pod/coredns-5c98db65d4-ctnl7                   0/1     CrashLoopBackOff   343        63d     192.168.1.56     knode2.k8s.local   <none>           <none>
    kube-system   pod/coredns-5c98db65d4-dj9xg                   0/1     CrashLoopBackOff   292        46d     192.168.3.47     knode4.k8s.local   <none>           <none>
    kube-system   pod/coredns-fb8b8dccf-dzvxg                    1/1     Running            18         69d     192.168.0.27     knode1.k8s.local   <none>           <none>
    

    Note that the cluster IP for kube-dns is on the 10.x block, but everything else is on the 192.168.x block. I don't know anything about coredns, but "CrashLoopBackOff" looks bad to me.
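
    For reference, the ranges the cluster was configured with can be read out of the static manifests on the control-plane node (assuming a default kubeadm install; I have not verified these paths on my setup):

    # Service CIDR handed to the API server (the block the kube-dns ClusterIP comes from)
    grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml
    # Pod network CIDR handed to the controller manager (the block the pod IPs come from)
    grep cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml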

    This is me fumbling around trying to get dns logs:

    [kate@knode1 s_10]$ kubectl get pods -A
    NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
    default       secondapp-677d65d8bd-hlj48                 1/1     Running            0          11m
    kube-system   calico-node-79v6w                          2/2     Running            33         69d
    kube-system   calico-node-7lntr                          2/2     Running            32         69d
    kube-system   calico-node-dtjwb                          2/2     Running            29         69d
    kube-system   calico-node-g2hrq                          2/2     Running            28         63d
    kube-system   coredns-5c98db65d4-ctnl7                   0/1     CrashLoopBackOff   344        63d
    kube-system   coredns-5c98db65d4-dj9xg                   0/1     CrashLoopBackOff   293        46d
    kube-system   coredns-fb8b8dccf-dzvxg                    1/1     Running            18         69d
    kube-system   etcd-knode1.k8s.local                      1/1     Running            15         63d
    kube-system   kube-apiserver-knode1.k8s.local            1/1     Running            4          12d
    kube-system   kube-controller-manager-knode1.k8s.local   1/1     Running            3          12d
    kube-system   kube-proxy-9zv42                           1/1     Running            3          12d
    kube-system   kube-proxy-d5tjx                           1/1     Running            3          12d
    kube-system   kube-proxy-flmnl                           1/1     Running            2          12d
    kube-system   kube-proxy-s9nhr                           1/1     Running            3          12d
    kube-system   kube-scheduler-knode1.k8s.local            1/1     Running            3          12d
    kube-system   traefik-ingress-controller-cxkcl           1/1     Running            0          10m
    kube-system   traefik-ingress-controller-p49jh           1/1     Running            0          10m
    kube-system   traefik-ingress-controller-v5jcg           1/1     Running            0          10m
    [kate@knode1 s_10]$ kubectl -n kube-system get logs coredns-fb8b8dccf-dzvxg
    error: the server doesn't have a resource type "logs"
    [kate@knode1 s_10]$ kubectl -n kube-system logs coredns-fb8b8dccf-dzvxg
    .:53
    2019-08-27T13:04:21.765Z [INFO] CoreDNS-1.3.1
    2019-08-27T13:04:21.765Z [INFO] linux/amd64, go1.11.4, 6b56a9c
    CoreDNS-1.3.1
    linux/amd64, go1.11.4, 6b56a9c
    2019-08-27T13:04:21.765Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
    [kate@knode1 s_10]$ kubectl -n kube-system logs coredns-5c98db65d4-dj9xg
    .:53
    2019-08-27T13:46:47.540Z [INFO] CoreDNS-1.3.1
    2019-08-27T13:46:47.540Z [INFO] linux/amd64, go1.11.4, 6b56a9c
    CoreDNS-1.3.1
    linux/amd64, go1.11.4, 6b56a9c
    2019-08-27T13:46:47.540Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
    E0827 13:47:12.540775       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E0827 13:47:12.540775       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-5c98db65d4-dj9xg.unknownuser.log.ERROR.20190827-134712.1: no such file or directory
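
    Presumably the previous container's log and the pod events would show more detail for the crash-looping pods; something like this (pod name taken from the output above):

    # log of the last crashed container instance
    kubectl -n kube-system logs coredns-5c98db65d4-dj9xg --previous
    # the Events section at the bottom usually explains the restarts
    kubectl -n kube-system describe pod coredns-5c98db65d4-dj9xg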
    
  • chrispokorni

    Hi @slfav,

    The Node IP range, 192.168.58.x, overlaps with the Pod IP range, 192.168.0.0/16, which was set when kubeadm init was issued and is also configured in the Calico network plugin YAML file.
    The issue starts as a networking problem and carries over into the behavior of the DNS server: the DNS server is confused by the overlapping subnets.
    I suggest comparing your IP ranges and then separating them so the subnets no longer overlap.
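
    For instance, a quick way to put the two ranges side by side (just a sketch; the podCIDR field is only populated when kubeadm allocates per-node Pod CIDRs):

    # node addresses on the 192.168.58.x host-only network (INTERNAL-IP column)
    kubectl get nodes -o wide
    # per-node slices carved out of the 192.168.0.0/16 Pod network
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'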

    The web tool below shows the IP ranges for a particular CIDR block, and could be a good starting point for troubleshooting the issue:
    jodies.de/ipcalc
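
    The same calculator is also available as a command-line tool on many distributions (the package name and output format vary, so treat this as a rough illustration):

    # show the network, broadcast and host range for each block
    ipcalc 192.168.0.0/16
    ipcalc 192.168.58.0/24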

    Where are the VMs running? Are they local VMs (a hypervisor on your workstation) or cloud instances?
    Which version of the Lab book are you following?

    Regards,
    -Chris

  • slfav

    These machines are virtual machines on a local LAN, hosted with VirtualBox.

    I do not know which version of the training material I am on; it is whatever is currently published in the online course material.

    Assuming I can adjust the IP ranges manually in VirtualBox, or as root on the hosts, and that they are all static IP addresses, what do I need to do to fix the machine configuration so that resolution works correctly?

    I don't fully understand what Calico is doing with the network. Here is what I think I know:

    There are 4 nodes, each with two nics:

    enp0s3: NAT, outbound to the internet
    enp0s8: host-only network for node-to-node traffic

    What I am trying to achieve is internal traffic happening only on enp0s8. What files do I need to edit, or what commands do I need to run, on which machines to repair the installation? I can select a different network block for enp0s8 if need be, if there is no easy way to change Calico.
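
    From reading around, these seem to be the relevant knobs for a rebuild (a sketch of my understanding rather than something I have verified; the addresses are placeholders, and the Pod CIDR just needs to be a block that does not overlap any of the VM networks):

    # on every node: pin the kubelet to the host-only NIC (the enp0s8 address),
    # e.g. in /etc/sysconfig/kubelet or /etc/default/kubelet
    KUBELET_EXTRA_ARGS=--node-ip=192.168.58.101

    # on the control plane: advertise the API server on enp0s8 and choose a
    # non-overlapping Pod network at init time
    kubeadm init --apiserver-advertise-address=192.168.58.101 \
                 --pod-network-cidr=10.244.0.0/16

    # the same CIDR then has to be set as CALICO_IPV4POOL_CIDR in the Calico YAML
    # before applying it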

  • chrispokorni

    Hi @slfav,
    Calico's official documentation provides a guide on how to change the pod-network-cidr on a live cluster.

    https://docs.projectcalico.org/v3.8/networking/changing-ip-pools
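
    The gist of that procedure, as I understand it (only an outline; the exact steps and the calicoctl setup are in the page above, and the file names below are placeholders):

    # 1. create a new, non-overlapping pool (e.g. 10.244.0.0/16)
    calicoctl create -f new-ippool.yaml
    # 2. disable (do not delete yet) the old 192.168.0.0/16 pool so new pods stop using it
    calicoctl apply -f default-ippool-disabled.yaml
    # 3. recreate the workloads so they pick up addresses from the new pool,
    #    then remove the old pool
    calicoctl delete ippool default-ipv4-ippool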

    Regards,
    -Chris

  • slfav

    When I first started the course, I was following the guide for k8s 1.14. At some point it was updated, so I ended up rebuilding the whole lab using the updated guide for k8s 1.15, this time with a single network card instead of two. Following the updated procedure, Traefik never did find the correct IP address for the ingress, but it did generate a service that worked, and I now have a working CoreDNS, so I think I can continue with the new lab setup.
