Lab 10.1 - need a little assistance - coredns issue?


My post was too long with all the debugging info, so I attached the logs from the lab as a text file.

Any help would be appreciated; to my untrained eye it looks like CoreDNS is confused by the lab machines having two NICs.

Comments

  • chrispokorni

    Hi, are your VM private IPs and the Pod network IPs overlapping in any way? If they are, then you would run into resolution problems. As for the coredns pods, you can delete the ones in trouble and allow new ones to be created; this tends to fix minor CoreDNS issues.
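
    A minimal example of that, assuming a default kubeadm setup where the CoreDNS pods carry the k8s-app=kube-dns label:

    # list the CoreDNS pods and spot the unhealthy ones
    kubectl -n kube-system get pods -l k8s-app=kube-dns
    # delete a troubled pod; its Deployment recreates it automatically
    kubectl -n kube-system delete pod <coredns-pod-name>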

    Regards,
    -Chris

  • slfav

    192.168.58.* is used for node communication. 10.* is used to get out to the internet. Here is some more info about the setup:

    [kate@knode1 s_10]$ kubectl get ingresses,service,deployments,endpoints,pods -o wide
    NAME                              HOSTS             ADDRESS   PORTS   AGE
    ingress.extensions/ingress-test   www.example.com             80      5m7s
    
    NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE    SELECTOR
    service/kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP        69d    <none>
    service/secondapp    NodePort    10.107.187.19   <none>        80:31959/TCP   6m5s   app=secondapp
    
    NAME                              READY   UP-TO-DATE   AVAILABLE   AGE     CONTAINERS   IMAGES   SELECTOR
    deployment.extensions/secondapp   1/1     1            1           6m18s   nginx        nginx    app=secondapp
    
    NAME                   ENDPOINTS             AGE
    endpoints/kubernetes   192.168.58.101:6443   69d
    endpoints/secondapp    192.168.2.68:80       6m5s
    
    NAME                             READY   STATUS    RESTARTS   AGE     IP             NODE               NOMINATED NODE   READINESS GATES
    pod/secondapp-677d65d8bd-hlj48   1/1     Running   0          6m18s   192.168.2.68   knode3.k8s.local   <none>           <none>
    

    Note that the ingress "ADDRESS" column is blank. This makes me think it might be some issue with coredns.

    [kate@knode1 s_10]$ which kls
    alias kls='kubectl get ingress,services,endpoints,deployments,daemonsets,pods -o wide'
        /usr/bin/kubectl
    [kate@knode1 s_10]$ kls -A | grep dns
    
    
    kube-system   service/kube-dns                  ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   69d     k8s-app=kube-dns
    
    kube-system   endpoints/kube-dns                  192.168.0.27:53,192.168.0.27:53,192.168.0.27:9153                   69d
    
    
    kube-system   deployment.extensions/coredns        1/2     2            1           69d     coredns        k8s.gcr.io/coredns:1.3.1   k8s-app=kube-dns
    kube-system   pod/coredns-5c98db65d4-ctnl7                   0/1     CrashLoopBackOff   343        63d     192.168.1.56     knode2.k8s.local   <none>           <none>
    kube-system   pod/coredns-5c98db65d4-dj9xg                   0/1     CrashLoopBackOff   292        46d     192.168.3.47     knode4.k8s.local   <none>           <none>
    kube-system   pod/coredns-fb8b8dccf-dzvxg                    1/1     Running            18         69d     192.168.0.27     knode1.k8s.local   <none>           <none>
    

    Note that the cluster IP for kube-dns is on the 10.x block, but everything else is on the 192.168.x block. I don't know anything about coredns, but "CrashLoopBackOff" looks bad to me.
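
    For reference, the ranges the cluster was configured with can be read out of the static manifests on the control-plane node (assuming a default kubeadm install; I have not verified these paths on my setup):

    # Service CIDR handed to the API server (the block the kube-dns ClusterIP comes from)
    grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml
    # Pod network CIDR handed to the controller manager (the block the pod IPs come from)
    grep cluster-cidr /etc/kubernetes/manifests/kube-controller-manager.yaml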

    This is me fumbling around trying to get dns logs:

    [kate@knode1 s_10]$ kubectl get pods -A
    NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
    default       secondapp-677d65d8bd-hlj48                 1/1     Running            0          11m
    kube-system   calico-node-79v6w                          2/2     Running            33         69d
    kube-system   calico-node-7lntr                          2/2     Running            32         69d
    kube-system   calico-node-dtjwb                          2/2     Running            29         69d
    kube-system   calico-node-g2hrq                          2/2     Running            28         63d
    kube-system   coredns-5c98db65d4-ctnl7                   0/1     CrashLoopBackOff   344        63d
    kube-system   coredns-5c98db65d4-dj9xg                   0/1     CrashLoopBackOff   293        46d
    kube-system   coredns-fb8b8dccf-dzvxg                    1/1     Running            18         69d
    kube-system   etcd-knode1.k8s.local                      1/1     Running            15         63d
    kube-system   kube-apiserver-knode1.k8s.local            1/1     Running            4          12d
    kube-system   kube-controller-manager-knode1.k8s.local   1/1     Running            3          12d
    kube-system   kube-proxy-9zv42                           1/1     Running            3          12d
    kube-system   kube-proxy-d5tjx                           1/1     Running            3          12d
    kube-system   kube-proxy-flmnl                           1/1     Running            2          12d
    kube-system   kube-proxy-s9nhr                           1/1     Running            3          12d
    kube-system   kube-scheduler-knode1.k8s.local            1/1     Running            3          12d
    kube-system   traefik-ingress-controller-cxkcl           1/1     Running            0          10m
    kube-system   traefik-ingress-controller-p49jh           1/1     Running            0          10m
    kube-system   traefik-ingress-controller-v5jcg           1/1     Running            0          10m
    [kate@knode1 s_10]$ kubectl -n kube-system get logs coredns-fb8b8dccf-dzvxg
    error: the server doesn't have a resource type "logs"
    [kate@knode1 s_10]$ kubectl -n kube-system logs coredns-fb8b8dccf-dzvxg
    .:53
    2019-08-27T13:04:21.765Z [INFO] CoreDNS-1.3.1
    2019-08-27T13:04:21.765Z [INFO] linux/amd64, go1.11.4, 6b56a9c
    CoreDNS-1.3.1
    linux/amd64, go1.11.4, 6b56a9c
    2019-08-27T13:04:21.765Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
    [kate@knode1 s_10]$ kubectl -n kube-system logs coredns-5c98db65d4-dj9xg
    .:53
    2019-08-27T13:46:47.540Z [INFO] CoreDNS-1.3.1
    2019-08-27T13:46:47.540Z [INFO] linux/amd64, go1.11.4, 6b56a9c
    CoreDNS-1.3.1
    linux/amd64, go1.11.4, 6b56a9c
    2019-08-27T13:46:47.540Z [INFO] plugin/reload: Running configuration MD5 = 599b9eb76b8c147408aed6a0bbe0f669
    E0827 13:47:12.540775       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    E0827 13:47:12.540775       1 reflector.go:134] github.com/coredns/coredns/plugin/kubernetes/controller.go:322: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
    log: exiting because of error: log: cannot create log: open /tmp/coredns.coredns-5c98db65d4-dj9xg.unknownuser.log.ERROR.20190827-134712.1: no such file or directory
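
    Presumably the previous container's log and the pod events would show more detail for the crash-looping pods; something like this (pod name taken from the output above):

    # log of the last crashed container instance
    kubectl -n kube-system logs coredns-5c98db65d4-dj9xg --previous
    # the Events section at the bottom usually explains the restarts
    kubectl -n kube-system describe pod coredns-5c98db65d4-dj9xg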
    
  • chrispokorni

    Hi @slfav,

    The Node IP range, 192.168.58.x, overlaps with the Pod IP range, 192.168.0.0/16, which was set when kubeadm init was issued and is also configured in the Calico network plugin YAML file.
    The issue starts as a networking problem and carries over into the behavior of the DNS server: the DNS server is confused by the overlapping subnets.
    I suggest comparing your IP ranges and then separating them so the subnets no longer overlap.
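
    For instance, a quick way to put the two ranges side by side (just a sketch; the podCIDR field is only populated when kubeadm allocates per-node Pod CIDRs):

    # node addresses on the 192.168.58.x host-only network (INTERNAL-IP column)
    kubectl get nodes -o wide
    # per-node slices carved out of the 192.168.0.0/16 Pod network
    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'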

    The web tool below shows the IP ranges for a particular CIDR block, and could be a good starting point for troubleshooting the issue:
    jodies.de/ipcalc
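
    The same calculator is also available as a command-line tool on many distributions (the package name and output format vary, so treat this as a rough illustration):

    # show the network, broadcast and host range for each block
    ipcalc 192.168.0.0/16
    ipcalc 192.168.58.0/24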

    Where are the VMs running? Are they local VMs (a hypervisor on your workstation) or cloud instances?
    Which version of the Lab book are you following?

    Regards,
    -Chris

  • slfav

    These machines are virtual machines on a local LAN, hosted with VirtualBox.

    I do not know which version of the training material I am on; it is whatever is currently published in the online course material.

    Assuming I can adjust the IP ranges manually in VirtualBox, or as root on the hosts, and that they are all static IP addresses, what do I need to do to fix the machine configuration so that resolution works correctly?

    I don't fully understand what Calico is doing with the network. Here is what I think I know:

    There are 4 nodes, each with two nics:

    enp0s3: NAT, outbound to the internet
    enp0s8: host-only network for node-to-node traffic

    What I am trying to achieve is internal traffic happening only on enp0s8. What files do I need to edit, or what commands do I need to run, on which machines to repair the installation? I can select a different network block for enp0s8 if need be, if there is no easy way to change Calico.
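
    From reading around, these seem to be the relevant knobs for a rebuild (a sketch of my understanding rather than something I have verified; the addresses are placeholders, and the Pod CIDR just needs to be a block that does not overlap any of the VM networks):

    # on every node: pin the kubelet to the host-only NIC (the enp0s8 address),
    # e.g. in /etc/sysconfig/kubelet or /etc/default/kubelet
    KUBELET_EXTRA_ARGS=--node-ip=192.168.58.101

    # on the control plane: advertise the API server on enp0s8 and choose a
    # non-overlapping Pod network at init time
    kubeadm init --apiserver-advertise-address=192.168.58.101 \
                 --pod-network-cidr=10.244.0.0/16

    # the same CIDR then has to be set as CALICO_IPV4POOL_CIDR in the Calico YAML
    # before applying it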

  • chrispokorni

    Hi @slfav,
    Calico's official documentation provides a guide on how to change the pod-network-cidr on a live cluster.

    https://docs.projectcalico.org/v3.8/networking/changing-ip-pools
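
    The gist of that procedure, as I understand it (only an outline; the exact steps and the calicoctl setup are in the page above, and the file names below are placeholders):

    # 1. create a new, non-overlapping pool (e.g. 10.244.0.0/16)
    calicoctl create -f new-ippool.yaml
    # 2. disable (do not delete yet) the old 192.168.0.0/16 pool so new pods stop using it
    calicoctl apply -f default-ippool-disabled.yaml
    # 3. recreate the workloads so they pick up addresses from the new pool,
    #    then remove the old pool
    calicoctl delete ippool default-ipv4-ippool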

    Regards,
    -Chris

  • slfav

    When I first started the course, I was following the guide for k8s 1.14. At some point it was updated, so I ended up rebuilding the whole lab using the updated guide for k8s 1.15, this time with a single network card instead of two. Following the updated procedure, Traefik never did find the correct IP address for the ingress, but it did generate a service that worked, and I now have a working CoreDNS, so I think I can continue with the new lab setup.
