Welcome to the Linux Foundation Forum!

Lab 10.1 - need a little assistance - coredns issue?

My post was too long with all the debugging info, so I attached the logs from the lab as a text file.

Any help would be appreciated; to my untrained eye it looks like CoreDNS is confused by the lab machine having two NICs.


Comments

  • Posts: 2,434

Hi, are your VM private IPs and the Pod network IPs overlapping in any way? If they are, you would run into resolution problems. As for the CoreDNS pods, you can delete the ones in trouble and allow new ones to be created; this tends to fix minor CoreDNS issues.

    Regards,
    -Chris
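
    A minimal sketch of that suggestion, assuming the pods carry the standard k8s-app=kube-dns label (they do, per the kls output later in this thread):

    # Delete the troubled CoreDNS pods; the Deployment recreates them
    kubectl -n kube-system delete pod -l k8s-app=kube-dns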

  • Posts: 6

    192.168.58.* is used for node communication. 10.* is used to get out to the internet. Here is some more info about the setup:

    [kate@knode1 s_10]$ kubectl get ingresses,service,deployments,endpoints,pods -o wide
    NAME                              HOSTS             ADDRESS   PORTS   AGE
    ingress.extensions/ingress-test   www.example.com             80      5m7s

    NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE    SELECTOR
    service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        69d    <none>
    service/secondapp    NodePort    10.107.187.19   <none>        80:31959/TCP   6m5s   app=secondapp

    NAME                              READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS   IMAGES   SELECTOR
    deployment.extensions/secondapp   1/1     1            1           6m18s   nginx        nginx    app=secondapp

    NAME                   ENDPOINTS             AGE
    endpoints/kubernetes   192.168.58.101:6443   69d
    endpoints/secondapp    192.168.2.68:80       6m5s

    NAME                         READY   STATUS    RESTARTS   AGE     IP             NODE               NOMINATED NODE   READINESS GATES
    pod/secondapp-677d65d8bd-hlj48   1/1   Running   0        6m18s   192.168.2.68   knode3.k8s.local   <none>           <none>

    Note that "address" is blank. This makes me think it might be some issue with coredns.

    [kate@knode1 s_10]$ which kls
    alias kls='kubectl get ingress,services,endpoints,deployments,daemonsets,pods -o wide'
            /usr/bin/kubectl
    [kate@knode1 s_10]$ kls -A | grep dns
    kube-system   service/kube-dns   ClusterIP   10.96.0.10   <none>   53/UDP,53/TCP,9153/TCP   69d   k8s-app=kube-dns
    kube-system   endpoints/kube-dns   192.168.0.27:53,192.168.0.27:53,192.168.0.27:9153   69d
    kube-system   deployment.extensions/coredns   1/2   2   1   69d   coredns   k8s.gcr.io/coredns:1.3.1   k8s-app=kube-dns
    kube-system   pod/coredns-5c98db65d4-ctnl7   0/1   CrashLoopBackOff   343   63d   192.168.1.56   knode2.k8s.local   <none>   <none>
    kube-system   pod/coredns-5c98db65d4-dj9xg   0/1   CrashLoopBackOff   292   46d   192.168.3.47   knode4.k8s.local   <none>   <none>
    kube-system   pod/coredns-fb8b8dccf-dzvxg    1/1   Running            18    69d   192.168.0.27   knode1.k8s.local   <none>   <none>

    Note that the cluster IP for kube-dns is in the 10.x block, but everything else is in the 192.168.x block. I don't know anything about CoreDNS, but "CrashLoopBackOff" looks bad to me.

    This is me fumbling around trying to get the DNS logs:

    [kate@knode1 s_10]$ kubectl get pods -A
    NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
    default       secondapp-677d65d8bd-hlj48                 1/1     Running            0          11m
    kube-system   calico-node-79v6w                          2/2     Running            33         69d
    kube-system   calico-node-7lntr                          2/2     Running            32         69d
    kube-system   calico-node-dtjwb                          2/2     Running            29         69d
    kube-system   calico-node-g2hrq                          2/2     Running            28         63d
    kube-system   coredns-5c98db65d4-ctnl7                   0/1     CrashLoopBackOff   344        63d
    kube-system   coredns-5c98db65d4-dj9xg                   0/1     CrashLoopBackOff   293        46d
    kube-system   coredns-fb8b8dccf-dzvxg                    1/1     Running            18         69d
    kube-system   etcd-knode1.k8s.local                      1/1     Running            15         63d
    kube-system   kube-apiserver-knode1.k8s.local            1/1     Running            4          12d
    kube-system   kube-controller-manager-knode1.k8s.local   1/1     Running            3          12d
    kube-system   kube-proxy-9zv42                           1/1     Running            3          12d
    kube-system   kube-proxy-d5tjx                           1/1     Running            3          12d
    kube-system   kube-proxy-flmnl                           1/1     Running            2          12d
    kube-system   kube-proxy-s9nhr                           1/1     Running            3          12d
    kube-system   kube-scheduler-knode1.k8s.local            1/1     Running            3          12d
    kube-system   traefik-ingress-controller-cxkcl           1/1     Running            0          10m
    kube-system   traefik-ingress-controller-p49jh           1/1     Running            0          10m
    kube-system   traefik-ingress-controller-v5jcg           1/1     Running            0          10m
    [kate@knode1 s_10]$ kubectl -n kube-system get logs coredns-fb8b8dccf-dzvxg
    error: the server doesn't have a resource type "logs"
    [kate@knode1 s_10]$ kubectl -n kube-system logs coredns-fb8b8dccf-dzvxg
    .:53
    2019-08-27T13:04:21.765Z [INFO] CoreDNS-1.3.1
    2019-08-27T13:04:21.765Z [INFO] linux/amd64, go1.11.4, 6b56a9c
    CoreDNS-1.3.1
    linux/amd64, go1.11.4, 6b56a9c
    2019-08-27T13:04:21.765Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
    [kate@knode1 s_10]$ kubectl -n kube-system logs coredns-5c98db65d4-dj9xg
    .:53
    2019-08-27T13:46:47.540Z [INFO] CoreDNS-1.3.1
    2019-08-27T13:46:47.540Z [INFO] linux/amd64, go1.11.4, 6b56a9c
    CoreDNS-1.3.1
    linux/amd64, go1.11.4, 6b56a9c
    2019-08-27T13:46:47.540Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
    E0827 13:47:12.540775       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E0827 13:47:12.540775       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-5c98db65d4-dj9xg.unknownuser.log.ERROR.20190827-134712.1: no such file or directory
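
    For reference, a couple of standard ways to pull the same information without guessing at subcommands (a sketch; the label k8s-app=kube-dns comes from the kls output above, and the pod name is one of the crashing pods):

    # Logs from all CoreDNS pods at once, by label rather than pod name
    kubectl -n kube-system logs -l k8s-app=kube-dns --tail=50

    # Events for a crashing pod often say more than its logs
    kubectl -n kube-system describe pod coredns-5c98db65d4-dj9xg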
  • Hi @slfav,

    The IP range of the nodes (192.168.58.x) overlaps with the IP range of the Pods (192.168.0.0/16), which was set when kubeadm init was issued and is also configured in the Calico network plugin YAML file.
    The issue starts as a networking problem and then shows up in the behavior of the DNS server as well; the DNS server is confused by the overlapping subnets.
    I suggest comparing your IP ranges and then separating them so that the subnets no longer overlap.

    The web tool below shows the IP ranges for a particular CIDR block and could be a good starting point for troubleshooting the issue:
    jodies.de/ipcalc
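
    For example, ipcalc reports that 192.168.0.0/16 spans 192.168.0.0-192.168.255.255, which contains the node subnet 192.168.58.0/24. The same comparison can be made from the control plane; a sketch (the grep pattern assumes a kubeadm-built cluster like this one):

    # Node IPs (here, the 192.168.58.x host-only network)
    kubectl get nodes -o wide

    # Pod network CIDR the cluster was initialized with
    kubectl cluster-info dump | grep -m 1 cluster-cidr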

    Where are the VMs running? Are they local VMs (a hypervisor on your workstation) or cloud instances?
    Which version of the Lab book are you following?

    Regards,
    -Chris

• These machines are virtual machines on a local LAN, hosted with VirtualBox.

    I do not know which version of the training it is; it is whatever is currently published as the online course material.

    Assuming I can adjust the IP ranges manually in VirtualBox, or as root on the hosts, and that they are all static IP addresses, what do I need to do to fix the host machine configuration so that they resolve correctly?

    I don't fully understand what Calico is doing with the network. Here is what I think I know:

    There are 4 nodes, each with two NICs:

    enp0s3: NAT bridge, outbound to the internet
    enp0s8: host-only network for node-to-node traffic

    What I am trying to achieve is internal traffic happening only on enp0s8. What files do I need to edit, or what commands do I need to run, on which machines to repair the installation? I can select a different network block for enp0s8 if there is no easy way to change Calico.
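
    Separately from the CIDR overlap, one commonly suggested step for a two-NIC VirtualBox setup is pinning the kubelet to the host-only interface; a hedged sketch (the address is illustrative, and the file path assumes a Red Hat-family kubeadm install):

    # /etc/sysconfig/kubelet on each node, using that node's own enp0s8 address
    KUBELET_EXTRA_ARGS=--node-ip=192.168.58.101

    # then restart the kubelet
    sudo systemctl restart kubelet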

• Hi @slfav,
    Calico's official documentation provides a guide on how to change the pod-network-cidr on a live cluster.

    https://docs.projectcalico.org/v3.8/networking/changing-ip-pools
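
    Condensed, that guide's procedure looks roughly like the following (a sketch only: the pool name new-pool, the replacement CIDR 10.244.0.0/16, and the default pool name default-ipv4-ippool are assumptions to adapt to your cluster):

    # 1. Create a new, non-overlapping pool
    calicoctl create -f - <<EOF
    apiVersion: projectcalico.org/v3
    kind: IPPool
    metadata:
      name: new-pool
    spec:
      cidr: 10.244.0.0/16
      ipipMode: Always
      natOutgoing: true
    EOF

    # 2. Disable the old pool (edit the saved YAML to add "disabled: true")
    calicoctl get ippool -o yaml > pools.yaml
    calicoctl apply -f pools.yaml

    # 3. Recreate workloads so they get addresses from the new pool,
    #    then remove the old pool
    calicoctl delete ippool default-ipv4-ippool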

    Regards,
    -Chris

• When I first started the course, I was following the guide for k8s 1.14. At some point it was updated, so I ended up rebuilding the whole lab using the updated guide for k8s 1.15, this time with a single network card instead of two. Following the updated procedure, Traefik never did find the correct IP address for the ingress, but it did generate a service that works, and I now have a working CoreDNS, so I think I can continue with the new lab setup.
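
    For anyone rebuilding the same way, the overlap can also be avoided at init time by choosing a pod CIDR outside the node network; a hedged example (the addresses are illustrative, and the CALICO_IPV4POOL_CIDR value in the Calico manifest must be changed to match):

    # On the control-plane node
    sudo kubeadm init \
        --apiserver-advertise-address=192.168.58.101 \
        --pod-network-cidr=10.244.0.0/16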
