Welcome to the Linux Foundation Forum!

Lab 3.3 - coredns CrashLoopBackOff

Following hardware issues, I had to reinstall the Kubernetes master and workers on another PC. I'm running an Ubuntu 20.04 host with QEMU/KVM Ubuntu 18.04 server guests: master and worker1 to worker4.

Things I did in addition to the lab tutorial: I commented out the swap entry in /etc/fstab. Networking is handled by NetworkManager using static IPs. The master also runs a bind9 DNS server (see further down).

All VMs are connected to a bridged network bridge0. ufw firewall is disabled on the VMs and the host.

Here is the output of

  kubectl get pods --all-namespaces
  NAMESPACE     NAME                                       READY   STATUS                  RESTARTS   AGE
  kube-system   calico-kube-controllers-578894d4cd-mrj4l   1/1     Running                 4          25h
  kube-system   calico-node-9vsvc                          0/1     Init:CrashLoopBackOff   7          25h
  kube-system   calico-node-g9q9x                          0/1     Running                 4          134m
  kube-system   calico-node-knppq                          0/1     Completed               2          93m
  kube-system   calico-node-wpfzq                          1/1     Running                 4          24h
  kube-system   coredns-66bff467f8-gqgt9                   0/1     Completed               0          57m
  kube-system   coredns-66bff467f8-qnsjk                   0/1     CrashLoopBackOff        11         56m
  kube-system   etcd-master                                1/1     Running                 4          25h
  kube-system   kube-apiserver-master                      1/1     Running                 6          25h
  kube-system   kube-controller-manager-master             1/1     Running                 7          25h
  kube-system   kube-proxy-8wshb                           1/1     Running                 4          134m
  kube-system   kube-proxy-gxnjw                           0/1     Error                   2          93m
  kube-system   kube-proxy-hr92t                           1/1     Running                 4          24h
  kube-system   kube-proxy-z8cx6                           1/1     Running                 4          25h
  kube-system   kube-scheduler-master                      1/1     Running                 7          25h

I now disabled the ufw firewall on the host and deleted the calico-node-... pods. Then I deleted the coredns-... pods, and this is the result:

  kubectl get pods --all-namespaces
  NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
  kube-system   calico-kube-controllers-578894d4cd-mrj4l   1/1     Running            4          25h
  kube-system   calico-node-knppq                          1/1     Running            4          110m
  kube-system   calico-node-kt5lh                          1/1     Running            0          2m25s
  kube-system   calico-node-wpfzq                          1/1     Running            4          24h
  kube-system   calico-node-z8h2t                          1/1     Running            0          107s
  kube-system   coredns-66bff467f8-9sjgq                   0/1     CrashLoopBackOff   1          16s
  kube-system   coredns-66bff467f8-hfk5b                   0/1     Running            0          33s

I removed the bind9 DNS server on the master, but this led to other problems; among others, names sometimes resolve and sometimes don't. Right now name resolution doesn't work, even though I tried to reverse the steps and have systemd-resolved up and running.

I guess I will be reinstalling the host, then the VMs and see if that solves the issues. I'm afraid the bind9 server on the master VM didn't help.

The other possible issue could be libvirt networking. I had manually configured a bridged network which usually works fine when editing the xml guest config files to enable bridged networking. Next time I will try to configure the bridge within virt-manager and see if it makes a difference.

Any suggestions regarding the above CrashLoopBackOff errors are welcome. Perhaps I'm looking in the wrong place altogether.

Comments

  • Posts: 1,000

    Hello Heiko,

    I would agree with you that the issue is tied to how QEMU/KVM is passing the traffic. With both calico-node and coredns failing, I would guess that on the worker node the host is not properly passing all the network traffic back to the master. As the image was loaded by Docker, we can tell that overall the nodes have access to the Internet, so the issue may only be between nodes.

    If you start nginx or busybox with many replicas, do they reach Ready state on both the master and the worker? Does the output of kubectl get pod -o wide show any other issues that only happen on the worker node?
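One way to run that replica test, as a rough sketch: create a deployment, scale it up, and count how many pods are Ready on each node. The deployment name `nettest`, the replica count of 10, and the helper `ready_per_node` are made up for illustration. The helper assumes the column layout that `kubectl get pods -o wide --no-headers` prints in v1.18 (READY in column 2, NODE in column 7); newer kubectl versions may format the RESTARTS column differently.

```shell
#!/bin/sh
# Summarize `kubectl get pods -o wide --no-headers` output read from stdin:
# prints "<node> <ready pods>/<total pods>" for each node, sorted by node name.
# Assumes READY is column 2 and NODE is column 7 (v1.18 layout).
ready_per_node() {
    awk '{
        split($2, r, "/")                 # READY looks like "1/1"
        total[$7]++
        if (r[1] == r[2]) ready[$7]++     # every container in the pod is ready
    }
    END { for (n in total) printf "%s %d/%d\n", n, ready[n], total[n] }' | sort
}

# Against a live cluster (hypothetical deployment name "nettest"):
#   kubectl create deployment nettest --image=nginx
#   kubectl scale deployment nettest --replicas=10
#   kubectl get pods -o wide --no-headers | ready_per_node
```

If one node consistently shows fewer ready pods than it hosts, that points at node-to-node traffic rather than at a single misbehaving pod.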

    To troubleshoot the issue I would start Wireshark on the primary interface of each node. Is there only one interface per VM? Multiple interfaces can be an issue as well. If you terminate a calico or coredns pod on the worker, you should see traffic going from the worker to the master. Of course it is all using TLS, so you could add these two flags: --insecure-port to set a port which will be bound in insecure mode, and --insecure-bind-address to set the interface/IP to use.

    You can also set the --bind-interface if you have multiple interfaces on the nodes, to narrow down where traffic goes.

    I have a feeling that the calico and coredns traffic is not being sent to the master. If traffic from other pods works, once you can get them running, I'd lean towards a bug.

    Another idea is to put in a virtual switch with OvS. If it works, then we know the issue is QEMU/KVM networking and can explore options for changing the network type inside QEMU.

    Regards,

  • Posts: 1,000

    Hello Heiko,

    I spun up two U18 VMs on my RHEL 8 system. Fresh install, then set up the cluster. No issue. It could be either something buggy in U20 (which uses SELinux AND AppArmor and may have eBPF in there), or something about the networking of your QEMU/KVM instances. This is what I see:

  • Posts: 99

    Hello Tim, thanks for the detailed answers and the effort to replicate the problem.

    In the meantime I reinstalled the host and the master. Since I already had installed a master and multiple workers on another PC and that went fine, I started to suspect some VM config issues. In fact I had taken the configs from my other PC without change and noticed that I had over-provisioned the VMs. The PC I'm using now has only 32Gig memory and 6 cores / 12 threads, making it borderline specs for a master and 4 workers. In fact, the whole PC would freeze at times, often followed by a crash of a worker VM.

    I changed the VM configs but still get errors. So now I deleted the nodes and will try to recreate them.

    I had configured my nodes to use static IPs so I could easily access them via SSH. After I modified /etc/netplan/0...yaml to activate NetworkManager, I used nmcli to configure the VM network interfaces. Here is an example for the master:

    nmcli con add con-name static0 type ethernet ifname enp1s0 autoconnect yes ipv4.addresses 192.168.0.130/24 gw4 192.168.0.1 ipv4.method manual ipv4.dns 8.8.8.8

    Unfortunately this didn't solve the issues. I then created a bridge on the host (bridge0), which is used in the VM configurations:

    In the past I had always configured a bridge on the host to be used for communication between VMs and host and VM to VM. Never had any issue. Seeing that you used virt-manager, I have some questions:

    1. Did you set up a virtual network in virt-manager, other than the default network with the virbr0 device that uses the DHCP range 192.168.122.2 - 192.168.122.254?
    2. Did you create a bridge on the host, or leave networking at the default?

    I will first kubectl delete node worker1... and recreate them to see if that helps. If not, I'm grudgingly going to try the default network setup with DHCP.

    Thanks again for the help.

  • Posts: 99

    Seems like I created a mess that needed some cleaning up. If you look at my IP range for the VMs (host: 192.168.0.129, VMs: 192.168.0.130-134), it's the same as the calico range (192.168.0.0).

    I edited the calico.yaml and kubeadm-config.yaml files and changed the address range to 192.168.1.0/16 (in the calico file that meant uncommenting two lines).
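A quick way to check whether a pod range collides with the VM subnet is to compare the first and last address of each block. The sketch below does that in plain sh arithmetic; the function names are hypothetical, and 10.244.0.0/16 is only an example of a range outside 192.168.x.x, not something used in this thread. One thing it makes visible: a /16 mask ignores the third octet, so 192.168.1.0/16 describes the same network as 192.168.0.0/16 and still contains 192.168.0.130.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the first and last address (as integers) covered by a CIDR block.
cidr_range() {
    addr=${1%/*}
    bits=${1#*/}
    ip=$(ip_to_int "$addr")
    mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
    first=$(( $ip & $mask ))
    last=$(( $first | (~$mask & 0xFFFFFFFF) ))
    echo "$first $last"
}

# Print "overlap" if the two CIDR blocks share any address, else "disjoint".
cidr_overlap() {
    set -- $(cidr_range "$1") $(cidr_range "$2")   # first1 last1 first2 last2
    if [ "$1" -le "$4" ] && [ "$3" -le "$2" ]; then
        echo overlap
    else
        echo disjoint
    fi
}

cidr_overlap 192.168.0.0/16 192.168.0.0/24   # Calico default pool vs. VM subnet: overlap
cidr_overlap 192.168.1.0/16 192.168.0.0/24   # /16 still covers 192.168.0.x: overlap
cidr_overlap 10.244.0.0/16 192.168.0.0/24    # example alternative pool: disjoint
```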

    Since I had used kubeadm reset on the master and the workers several times, I finally RTFM-ed the output of that command. It reminds us to remove the files in /etc/cni/net.d/*, but more importantly it mentions that the iptables rules are NOT deleted. Moreover, /var/lib contains remnants of the deployment that should be removed.

    So here is what I did on the master and each worker:

    kubeadm reset
    rm /etc/cni/net.d/*
    cd /var/lib
    ls
    rm -rf calico
    rm -rf cni
    rm -rf kubelet
    cd /etc/kubernetes/
    ls
    ls manifests/
    ls pki/
    cd
    iptables -F
    iptables -L
    systemctl restart docker.service
    iptables -L
    kubeadm join k8smaster:6443 --token 4id98t.tzleaeq49ew8ahgm --discovery-token-ca-cert-hash sha256:1727ba505a5fe1dec308530497c56109bb7c92263c1464e78b6f19401ae1ec23

    iptables -F flushes the iptables rules. The restart of docker.service is necessary to create new iptables rules for docker.
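The destructive part of those steps could be collected into one function, sketched here with a dry-run default so nothing is executed unless DRY_RUN=0 is set. The helper names `run` and `reset_node` are hypothetical. The nat-table flush is an addition that was not part of the steps above (plain `iptables -F` only touches the filter table), so treat it as an assumption about what a fuller cleanup might include.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Clean up a node after a stale kubeadm deployment (needs root when executed).
reset_node() {
    run kubeadm reset
    run sh -c 'rm -rf /etc/cni/net.d/*'          # leftover CNI configuration
    run rm -rf /var/lib/calico /var/lib/cni /var/lib/kubelet
    run iptables -F                              # flush the filter table
    run iptables -t nat -F                       # assumption: not in the original steps
    run systemctl restart docker.service         # lets Docker recreate its own rules
}

reset_node
```

The kubeadm join command is left out on purpose, since the token and hash are specific to each cluster.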

    Here is the result:

    kubectl get nodes
    NAME      STATUS   ROLES    AGE     VERSION
    master    Ready    master   51m     v1.18.1
    worker1   Ready    <none>   35m     v1.18.1
    worker2   Ready    <none>   28m     v1.18.1
    worker3   Ready    <none>   8m45s   v1.18.1
    worker4   Ready    <none>   3m12s   v1.18.1

    Unfortunately my hardware cannot handle more than 4 nodes, as it freezes and crashes a worker node once I try to run the 5th node (4th worker node).

    kubectl get pods --all-namespaces
    NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
    kube-system   calico-kube-controllers-578894d4cd-dtdzx   1/1     Running   0          41m
    kube-system   calico-node-2x7qz                          1/1     Running   0          4m40s
    kube-system   calico-node-4qqlc                          1/1     Running   0          41m
    kube-system   calico-node-lgdhf                          0/1     Running   1          30m
    kube-system   calico-node-nhr62                          1/1     Running   0          36m
    kube-system   calico-node-w29hn                          1/1     Running   0          10m
    kube-system   coredns-66bff467f8-2x44x                   1/1     Running   0          52m
    kube-system   coredns-66bff467f8-9s6hc                   1/1     Running   0          52m
    kube-system   etcd-master                                1/1     Running   0          52m
    kube-system   kube-apiserver-master                      1/1     Running   0          52m
    kube-system   kube-controller-manager-master             1/1     Running   2          52m
    kube-system   kube-proxy-c45nx                           1/1     Running   0          36m
    kube-system   kube-proxy-d2z9k                           1/1     Running   1          30m
    kube-system   kube-proxy-rpmtk                           1/1     Running   0          10m
    kube-system   kube-proxy-tjwc5                           1/1     Running   0          4m40s
    kube-system   kube-proxy-z2ph4                           1/1     Running   0          52m
    kube-system   kube-scheduler-master                      1/1     Running   2

    Still I'd say it works as advertised. I will remove the 4th worker and see how stable it is.

  • Posts: 99

    So I got rid of worker4:

    kubectl delete nodes worker4

    But that doesn't seem to be the right way, as calico still runs pods for the removed node:

    NAMESPACE     NAME                                       READY   STATUS                  RESTARTS   AGE
    kube-system   calico-kube-controllers-578894d4cd-dtdzx   1/1     Running                 1          57m
    kube-system   calico-node-4qqlc                          1/1     Running                 1          57m
    kube-system   calico-node-lgdhf                          0/1     Init:CrashLoopBackOff   2          45m
    kube-system   calico-node-nhr62                          0/1     Running                 2          52m
    kube-system   calico-node-w29hn                          0/1     Running                 1          25m
    kube-system   coredns-66bff467f8-2x44x                   1/1     Running                 1          67m
    kube-system   coredns-66bff467f8-9s6hc                   1/1     Running                 1          67m

    This also affects the running pods, as you can see from the 0/1 READY state above. So I deleted the pods, which finally gave me 1/1 results:

    NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
    kube-system   calico-kube-controllers-578894d4cd-dtdzx   1/1     Running   1          73m
    kube-system   calico-node-4qqlc                          1/1     Running   1          73m
    kube-system   calico-node-57cpc                          1/1     Running   0          62s
    kube-system   calico-node-pqg6d                          1/1     Running   0          62s
    kube-system   calico-node-rlbtk                          1/1     Running   0          102s
    kube-system   coredns-66bff467f8-2x44x                   1/1     Running   1          84m
    kube-system   coredns-66bff467f8-9s6hc                   1/1     Running   1          84m
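
For comparison, a common sequence for retiring a worker is to drain it first, then delete the node object, and finally run kubeadm reset on the machine itself so that its kubelet stops. The sketch below only prints the commands unless DRY_RUN=0 is set; `run` and `retire_node` are hypothetical helper names, while `kubectl drain` with `--ignore-daemonsets` and `kubectl delete node` are standard kubectl usage.

```shell
#!/bin/sh
# Dry-run wrapper: prints each command by default; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Drain a worker, then remove its node object from the cluster.
retire_node() {
    node=$1
    run kubectl drain "$node" --ignore-daemonsets   # evict workloads; DaemonSet pods stay
    run kubectl delete node "$node"
    echo "# then, on $node itself: kubeadm reset"
}

retire_node worker4
```

Depending on the workloads, the drain may refuse to proceed and ask for additional flags; that is not covered here.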

  • Posts: 1,000

    Great Heiko! I'm glad you got it working.

    Sounds like a combination of leftover config files and some resource issues. Congrats, you found the two most difficult things to troubleshoot :smile:

    If you want to add the fifth node, one idea would be to lower the resources on all the workers and the proxy to fit. If you don't have any big deployments, the worker nodes and the HAProxy node don't use all that many resources. I always suggest folks use the same size, as there is less chance of running out of resources with a high replica count, but the lab should work with a smaller worker/proxy.

    Regards,

  • Posts: 99

    Now that the cluster is running I've been testing it until the smoke comes out.

    In exercise 3.4 I deployed the nginx container and web service. However, when I tried to access the nginx pod on worker1 from the master using curl IP-address, it didn't work. I then realised that there was no tunl0 interface on the master. The workers are fine and all have the tunl0 interface. I suppose this isn't normal? I probably forgot to delete some stuff before I ran the kubeadm init command again. Is there a simple way to (re)activate the tunl0 interface?

    Having fun with running
    for i in {1..10000}; do curl 10.110.201.77:80 &>/dev/null; done

    in one terminal window, watching tcpdump in another
    sudo tcpdump -i tunl0

    and watching and deleting the pods in a third window

    kubectl get pods -o wide
    kubectl delete pod nginx-d46f5678b-xwdlj

    The IP address in the curl command is the cluster-ip so the tcpdump shows different IPs for the end-points. In the worst case I managed to create a "service blackout" of 5-7 seconds in between pod deletion and recreation. In some instances the curl command would employ 2 nodes.
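To put a number on such a blackout, the curl loop could log a timestamp and HTTP status per attempt, and a small awk filter could report the longest stretch without a 200 response. This is only a sketch: `probe` and `longest_outage` are hypothetical names, the one-second interval and timeout are arbitrary choices, and `date +%s` is assumed to be available (it is on GNU and BSD systems).

```shell
#!/bin/sh
# Probe a URL once per second; print "<epoch-seconds> <http-code>" per attempt.
# curl reports code 000 when the connection fails or times out.
probe() {
    url=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        code=$(curl -s -o /dev/null -m 1 -w '%{http_code}' "$url")
        echo "$(date +%s) $code"
        i=$((i + 1))
        sleep 1
    done
}

# Read probe output on stdin; print the longest outage in seconds, measured
# from the first failed sample to the next successful one.
longest_outage() {
    awk '
        $2 != "200" { if (start == "") start = $1; last = $1; next }
        start != "" { d = $1 - start; if (d > max) max = d; start = "" }
        END {
            # an outage still running at the end counts up to its last sample
            if (start != "") { d = last - start + 1; if (d > max) max = d }
            print max + 0
        }'
}

# Example against the cluster IP from this post, while deleting pods elsewhere:
#   probe 10.110.201.77:80 120 | longest_outage
```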

  • Posts: 99

    One more thing: You mentioned multiple interfaces could be an issue. Well, here is what I have on the master and the workers:

    nmcli con show
    NAME     UUID                                  TYPE      DEVICE
    docker0  fa09a716-5e2c-4f26-9be4-20641e117a28  bridge    docker0
    static2  ce36f612-d8da-423b-bc98-cde20aec5de7  ethernet  enp1s0

    There are two interfaces, one that I created (static2) and one that docker created. Is that OK like that?

  • Posts: 2,451
    edited August 2020

    Hi @heiko_s,

    Once installed, Docker will create the docker bridge on the nodes where it is running. However, that bridge does not get utilized by Kubernetes, as it uses a third-party networking plugin. In other words, the docker bridge is harmless.

    Regards,
    -Chris

  • Posts: 1,000

    Hello,

    Indeed. I was speaking of outbound interfaces, something like having eth0 and eth1 etc. I don't think that's the case with your setup, but there are extra considerations if your instances have multiple outbound interfaces.

    I have found that an existing cluster which goes through a kubeadm reset will have odd networking issues. I would rebuild with freshly installed VMs. That way you know that what you are working with is a typical setup, instead of an unknown and rarely exercised combination of kubeadm reset plus some less typical networking configuration.

    Regards,

  • Posts: 99

    Thanks Chris for clarifying the Docker bridge. Makes sense.

    Tim, today I booted the PC and VMs and the master now has a tunl0 interface. At first everything seems to be running fine:

    kubectl get nodes -o wide
    NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
    master    Ready    master   19h   v1.18.1   192.168.0.130   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6
    worker1   Ready    <none>   19h   v1.18.1   192.168.0.131   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6
    worker2   Ready    <none>   19h   v1.18.1   192.168.0.132   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6
    worker3   Ready    <none>   18h   v1.18.1   192.168.0.133   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6

    kubectl get pods -o wide
    NAME                    READY   STATUS    RESTARTS   AGE   IP                NODE      NOMINATED NODE   READINESS GATES
    nginx-d46f5678b-2mk4s   1/1     Running   1          11h   192.168.189.70    worker2   <none>           <none>
    nginx-d46f5678b-d9v2v   1/1     Running   1          11h   192.168.189.69    worker2   <none>           <none>
    nginx-d46f5678b-js92b   1/1     Running   1          10h   192.168.235.137   worker1   <none>           <none>

    kubectl get service nginx
    NAME    TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
    nginx   LoadBalancer   10.105.137.239   <pending>     80:30747/TCP   10h

    kubectl get ep nginx
    NAME    ENDPOINTS                                                AGE
    nginx   192.168.189.69:80,192.168.189.70:80,192.168.235.137:80   10h

    kubectl get deployments.apps nginx -o wide
    NAME    READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES   SELECTOR
    nginx   3/3     3            3           13h   nginx        nginx    app=nginx

    heiko@master:~$ curl 10.105.137.239:80
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>

    "oldy" is the host:

    heiko@oldy:~$ curl 192.168.0.130:30747
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>

    nginx is running on worker1 and worker2. I can access the webpage from the master now, using the cluster-ip. Access also works from outside (from host "oldy") using the LoadBalancer.

    As I type this, the host froze again, I got an OOM, and worker1 was killed (qemu-... process). I don't understand why a worker running in a VM would cause an OOM on the host. I need to look into that. Any ideas are welcome.

    P.S.: All nodes / pods were running idle, with no workload except Kubernetes and the Firefox browser on the host.

    P.P.S.: Something is wrong with this web forum page. I get delays and whatnot, and when accessing it from the Macbook it warns me that the page uses a lot of energy and suggests closing the window. This is very odd, and indeed when I'm on this forum page the laptop draws more battery power and slows down. Someone needs to look into that.

  • Posts: 99

    OOM problem solved: It was of course my mistake. I had reserved 24Gig huge pages out of a total of 32Gig at boot time. But I forgot to edit the worker VM configurations to add:

    <memoryBacking>
      <hugepages/>
    </memoryBacking>

    Without the above option, the VMs use regular memory, not the huge pages. No wonder that everything ground to a halt when running 4 VMs each 4Gig and having only 8Gig left on the host, plus swap space. The host was essentially using memory and swap space to provide the memory that I had allocated.

    I discovered this when I ran the stress container in lab 4.1 and scaled to 3 replicas, with an eye on the host memory:

    watch -n 3 free -h

    It quickly shrank to 460Mi (from 8Gig total).
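The arithmetic behind that OOM can be written out as a small sketch, using the figures from this post (32 GiB total, 24 GiB reserved as huge pages, 4 VMs of 4 GiB each). The function name `regular_headroom` is hypothetical, and the calculation ignores the memory needs of the host OS itself.

```shell
#!/bin/sh
# Regular (non-huge-page) host memory left over, in GiB.
# Args: total GiB, GiB reserved as huge pages, number of VMs, GiB per VM,
#       "yes" if the VMs are backed by huge pages, anything else if not.
regular_headroom() {
    total=$1 huge=$2 vms=$3 per_vm=$4 backed=$5
    regular=$(( total - huge ))
    if [ "$backed" = "yes" ]; then
        echo "$regular"                        # VMs live in the huge-page pool
    else
        echo $(( regular - vms * per_vm ))     # VMs compete for regular memory
    fi
}

regular_headroom 32 24 4 4 no    # prints -8: 16 GiB of VMs squeezed into 8 GiB
regular_headroom 32 24 4 4 yes   # prints 8: the VMs fit in the 24 GiB pool

# On the host, the huge-page counters can be inspected with:
#   grep -i hugepages /proc/meminfo
```

A negative result means the host has to cover the difference from swap, which matches the freezes described above.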

  • Posts: 1,000

    Hi Heiko,

    Glad you found the issue as I couldn't think of what could be causing a host OOM. As far as the forum causing issues, it may be tied to the browser you are using. Are you using chrome? Does a different browser have the same high utilization?

    I only have Linux systems to test with, and don't typically use Chrome. But when I do use Chrome, and especially if I have more than one tab open, I will notice it consumes a lot of resources.

    Regards,

  • Posts: 99

    @serewicz said:
    Hi Heiko,

    ...As far as the forum causing issues, it may be tied to the browser you are using. Are you using chrome? Does a different browser have the same high utilization?

    I only have Linux systems to test with, and don't typically use Chrome. But when I do use Chrome, and especially if I have more than one tab open, I will notice it consumes a lot of resources.

    Regards,

    I have several systems that I'm using, mostly Linux:
    On the Linux systems I run Firefox.
    My Macbook runs Safari.

    I never use Chrome, so it seems we have the same experience. I typically have multiple tabs open (sometimes several dozen). However, until now this never caused any issue.

    The message I got on the Macbook with the "using a lot of energy" was a first for me.

  • Posts: 2,451

    Hi @heiko_s,

    I experienced similar behavior with Chrome on Ubuntu, and I suspected it had something to do with my browser plugins. I disabled most of them, yet I was still experiencing page freezes with the forum. I always thought it must be an isolated case caused by my setup and did not think much else of it.

    Could it be related to the caching mechanism responsible for saving drafts?

    Regards,
    -Chris

  • I had the same issue when using Ubuntu 18.04.5 LTS for the master and worker. Everything worked fine until I applied the calico.yaml. Then docker was not able to pull the nginx image or do any DNS lookups; they all timed out. I found out it was due to systemd-resolved. When I removed calico.yaml again, docker was able to pull the nginx image and DNS lookups worked fine again. I solved it by using Debian 10 instead of Ubuntu. Never liked Ubuntu.
