Lab 3.3 - coredns CrashLoopBackoff

heiko_s · August 2020

Following hardware issues I had to reinstall Kubernetes master and workers on another PC. I'm running an Ubuntu 20.04 based host with QEMU / kvm Ubuntu 18.04 server guests: master, worker1 to worker4.

Things I did in addition to the lab tutorial: I commented out swap creation in /etc/fstab. Networking is done by NetworkManager using static IPs. The master also runs a bind9 DNS server (see further down).

All VMs are connected to a bridged network bridge0. ufw firewall is disabled on the VMs and the host.

Here is the output of

kubectl get pods --all-namespaces 
NAMESPACE     NAME                                       READY   STATUS                  RESTARTS   AGE
kube-system   calico-kube-controllers-578894d4cd-mrj4l   1/1     Running                 4          25h
kube-system   calico-node-9vsvc                          0/1     Init:CrashLoopBackOff   7          25h
kube-system   calico-node-g9q9x                          0/1     Running                 4          134m
kube-system   calico-node-knppq                          0/1     Completed               2          93m
kube-system   calico-node-wpfzq                          1/1     Running                 4          24h
kube-system   coredns-66bff467f8-gqgt9                   0/1     Completed               0          57m
kube-system   coredns-66bff467f8-qnsjk                   0/1     CrashLoopBackOff        11         56m
kube-system   etcd-master                                1/1     Running                 4          25h
kube-system   kube-apiserver-master                      1/1     Running                 6          25h
kube-system   kube-controller-manager-master             1/1     Running                 7          25h
kube-system   kube-proxy-8wshb                           1/1     Running                 4          134m
kube-system   kube-proxy-gxnjw                           0/1     Error                   2          93m
kube-system   kube-proxy-hr92t                           1/1     Running                 4          24h
kube-system   kube-proxy-z8cx6                           1/1     Running                 4          25h
kube-system   kube-scheduler-master                      1/1     Running                 7          25h

I now disabled the ufw firewall on the host and deleted the calico-node-... pods. Then I deleted the coredns-... pods and this is the result:

kubectl get pods --all-namespaces 
NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
kube-system   calico-kube-controllers-578894d4cd-mrj4l   1/1     Running            4          25h
kube-system   calico-node-knppq                          1/1     Running            4          110m
kube-system   calico-node-kt5lh                          1/1     Running            0          2m25s
kube-system   calico-node-wpfzq                          1/1     Running            4          24h
kube-system   calico-node-z8h2t                          1/1     Running            0          107s
kube-system   coredns-66bff467f8-9sjgq                   0/1     CrashLoopBackOff   1          16s
kube-system   coredns-66bff467f8-hfk5b                   0/1     Running            0          33s

I removed the bind9 DNS server on the master, but this led to other problems: among other things, name resolution works at some times and fails at others. Right now name resolution doesn't work, even though I tried to reverse the steps and have systemd-resolved up and running.

I guess I will be reinstalling the host, then the VMs and see if that solves the issues. I'm afraid the bind9 server on the master VM didn't help.

The other possible issue could be libvirt networking. I had manually configured a bridged network which usually works fine when editing the xml guest config files to enable bridged networking. Next time I will try to configure the bridge within virt-manager and see if it makes a difference.

Any suggestions as to the above CrashLoopBackoff errors are welcome. Perhaps I'm looking in the wrong place altogether.

serewicz · August 2020

Hello Heiko,

I would agree with you that the issue is tied to how QEMU/KVM is passing the traffic. With both calico-node and coredns failing, I would guess that on the worker node, the host is not properly passing all the network traffic back to the master. As the image was loaded by Docker, we can tell that overall the nodes have access to the Internet, so the issue may only be between nodes.

If you start nginx or busybox with many replicas, do they reach Ready state on both the master and the worker? Does the output of kubectl get pod -o wide show any other issues that only happen on the worker node?
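A minimal sketch of the replica check suggested above, assuming the column layout of the kubectl get pods -o wide output shown in this thread (v1.18.1), where NODE is the 7th column. The helper name ready_per_node is made up for illustration; it only summarizes text on stdin and does not call kubectl itself.

```shell
# Hypothetical helper (not part of kubectl): summarize the output of
# `kubectl get pods -o wide` read from stdin. For each node it prints
# how many pods are fully ready out of the total scheduled there.
# Assumes the v1.18-style layout: READY is column 2, NODE is column 7.
ready_per_node() {
    awk 'NR > 1 {
        split($2, r, "/")              # READY column, e.g. "1/1"
        total[$7]++
        if (r[1] == r[2]) ready[$7]++
    }
    END { for (n in total) printf "%s %d/%d\n", n, ready[n], total[n] }' | sort
}
```

It could be fed like this, using an nginx deployment as the test load: kubectl create deployment nginx --image=nginx, then kubectl scale deployment nginx --replicas=10, then kubectl get pods -o wide | ready_per_node. A node that stays at 0/N while the others fill up would point at that node's networking.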

To troubleshoot the issue I would start Wireshark on the primary interface of each node. Is there only one interface per VM? Multiple interfaces can be an issue as well. If you terminate a calico or coredns pod on the worker, you should see traffic going from worker to master. Of course it is all using TLS, so you could add two flags: --insecure-port to set a port which will be bound in insecure mode, and --insecure-bind-address to set the interface/IP to use.

You can also set the --bind-interface if you have multiple interfaces on the nodes, to narrow down where traffic goes.

I have a feeling that the calico and coredns traffic is not being sent to the master. If other pod traffic works, if you can get them running, I'd lean towards a bug.

Another idea is to put in a virtual switch with OvS. If it works, then we know the issue is QEMU/KVM networking and can explore options for changing the network type inside QEMU.

Regards,

serewicz · August 2020

Hello Heiko,

I spun up two U18 VMs on my RHEL 8 system. Fresh install, then set up the cluster. No issue. It could be either something buggy in U20 (which uses SELinux AND apparmor and may have eBPF in there), or something about the networking of your QEMU/KVM instances. This is what I see:

heiko_s · August 2020

Hello Tim, thanks for the detailed answers and the effort to replicate the problem.

In the meantime I reinstalled the host and the master. Since I already had installed a master and multiple workers on another PC and that went fine, I started to suspect some VM config issues. In fact I had taken the configs from my other PC without change and noticed that I had over-provisioned the VMs. The PC I'm using now has only 32Gig memory and 6 cores / 12 threads, making it borderline specs for a master and 4 workers. In fact, the whole PC would freeze at times, often followed by a crash of a worker VM.

I changed the VM configs but still get errors. So now I deleted the nodes and will try to recreate them.

I had configured my nodes to use static IPs so I could easily access them via SSH. After I modified /etc/netplan/0...yaml to activate NetworkManager, I used nmcli to configure the VM network interfaces. Here is an example for the master:

nmcli con add con-name static0 type ethernet ifname enp1s0 autoconnect yes ipv4.addresses 192.168.0.130/24 gw4 192.168.0.1 ipv4.method manual ipv4.dns 8.8.8.8

Unfortunately this didn't solve the issues. I then created a bridge on the host (bridge0), which is used in the VM configurations:

In the past I had always configured a bridge on the host to be used for communication between VMs and host and VM to VM. Never had any issue. Seeing that you used virt-manager, I have some questions:

Did you setup a virtual network in virt-manager, other than the default network with virbr0 device that uses DHCP range 192.168.122.2 - 192.168.122.254?
Did you create a bridge on the host, or leave networking at the default?

I will first kubectl delete node worker1... and recreate them to see if that helps. If not, I'm grudgingly going to try the default network setup with DHCP.

Thanks again for the help.

heiko_s · August 2020

Seems like I created a mess that needed some cleaning up. If you look at my IP range for the VMs (host: 192.168.0.129, VMs: 192.168.0.130-134), it falls inside the calico range (192.168.0.0).

I edited the calico.yaml and kubeadm-config.yaml files and changed the address range to 192.168.1.0/16 (in the calico file that meant uncommenting two lines).
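A minimal sketch for checking whether a node network and a pod network collide, written in plain POSIX sh arithmetic. The function names (ip_to_int, cidr_range, cidrs_overlap) are made up for illustration, and the /24 mask for the VM network is taken from the nmcli example earlier in the thread.

```shell
# Hypothetical helpers for comparing two IPv4 CIDR blocks.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
    addr=${1%/*}
    prefix=${1#*/}
    base=$(ip_to_int "$addr")
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    first=$(( $base & $mask ))
    last=$(( $first | (~$mask & 0xFFFFFFFF) ))
    echo "$first $last"
}

# Succeed (exit status 0) when the two CIDR blocks share any address.
cidrs_overlap() {
    set -- $(cidr_range "$1") $(cidr_range "$2")
    # Two ranges overlap when each one starts before the other ends.
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

if cidrs_overlap 192.168.0.0/24 192.168.0.0/16; then
    echo "node network and pod network overlap"
fi
```

Note that a /16 prefix covers 192.168.0.0 through 192.168.255.255 whatever the third octet is, so under this check 192.168.1.0/16 still overlaps 192.168.0.0/24. A range outside 192.168.x.x, for example 10.244.0.0/16 (an illustrative value, not one used in this thread), does not.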

Since I had used kubeadm reset on the master and the workers several times, I finally RTFM-ed the output of that command. It reminds us to remove the files in /etc/cni/net.d/*, but more importantly it mentions that the iptables rules are NOT deleted. Moreover, /var/lib contains remnants of the deployment that should be removed.

So here is what I did on the master and each worker:

kubeadm reset
rm /etc/cni/net.d/*
cd /var/lib
ls
rm -rf calico
rm -rf cni
rm -rf kubelet
cd /etc/kubernetes/
ls
ls manifests/
ls pki/
cd
iptables -F
iptables -L
systemctl restart docker.service 
iptables -L
kubeadm join k8smaster:6443 --token 4id98t.tzleaeq49ew8ahgm     --discovery-token-ca-cert-hash sha256:1727ba505a5fe1dec308530497c56109bb7c92263c1464e78b6f19401ae1ec23

iptables -F flushes the iptables rules. The restart of docker.service is necessary to create new iptables rules for docker.
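A small sketch for checking whether the cleanup above left anything behind. The paths are the ones named in this post; the function name leftover_state and its optional root-prefix argument (handy for trying it on a scratch directory first) are assumptions.

```shell
# Hypothetical helper: print kubeadm/CNI leftovers that still exist.
# An optional first argument is prepended to every path, so the
# function can be tried against a scratch directory instead of /.
leftover_state() {
    root=${1:-}
    for p in /var/lib/calico /var/lib/cni /var/lib/kubelet; do
        [ -e "$root$p" ] && echo "$p"
    done
    # /etc/cni/net.d may exist, but should be empty after the cleanup.
    for f in "$root"/etc/cni/net.d/*; do
        [ -e "$f" ] && echo "/etc/cni/net.d/${f##*/}"
    done
    return 0
}
```

Running leftover_state with no argument on a node after kubeadm reset should print nothing once the directories are gone. It does not look at iptables rules; iptables -L, as used above, remains the check for those.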

Here is the result:

kubectl get nodes
NAME      STATUS   ROLES    AGE     VERSION
master    Ready    master   51m     v1.18.1
worker1   Ready    <none>   35m     v1.18.1
worker2   Ready    <none>   28m     v1.18.1
worker3   Ready    <none>   8m45s   v1.18.1
worker4   Ready    <none>   3m12s   v1.18.1

Unfortunately my hardware cannot handle more than 4 nodes, as it freezes and crashes a worker node once I try to run the 5th node (4th worker node).

kubectl get pods --all-namespaces 
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-578894d4cd-dtdzx   1/1     Running   0          41m
kube-system   calico-node-2x7qz                          1/1     Running   0          4m40s
kube-system   calico-node-4qqlc                          1/1     Running   0          41m
kube-system   calico-node-lgdhf                          0/1     Running   1          30m
kube-system   calico-node-nhr62                          1/1     Running   0          36m
kube-system   calico-node-w29hn                          1/1     Running   0          10m
kube-system   coredns-66bff467f8-2x44x                   1/1     Running   0          52m
kube-system   coredns-66bff467f8-9s6hc                   1/1     Running   0          52m
kube-system   etcd-master                                1/1     Running   0          52m
kube-system   kube-apiserver-master                      1/1     Running   0          52m
kube-system   kube-controller-manager-master             1/1     Running   2          52m
kube-system   kube-proxy-c45nx                           1/1     Running   0          36m
kube-system   kube-proxy-d2z9k                           1/1     Running   1          30m
kube-system   kube-proxy-rpmtk                           1/1     Running   0          10m
kube-system   kube-proxy-tjwc5                           1/1     Running   0          4m40s
kube-system   kube-proxy-z2ph4                           1/1     Running   0          52m
kube-system   kube-scheduler-master                      1/1     Running   2

Still I'd say it works as advertised. I will remove the 4th worker and see how stable it is.

heiko_s · August 2020

So I got rid of worker4:

kubectl delete nodes worker4

But that doesn't seem the right way, as calico still runs pods for the removed node:

NAMESPACE     NAME                                       READY   STATUS                  RESTARTS   AGE
kube-system   calico-kube-controllers-578894d4cd-dtdzx   1/1     Running                 1          57m
kube-system   calico-node-4qqlc                          1/1     Running                 1          57m
kube-system   calico-node-lgdhf                          0/1     Init:CrashLoopBackOff   2          45m
kube-system   calico-node-nhr62                          0/1     Running                 2          52m
kube-system   calico-node-w29hn                          0/1     Running                 1          25m
kube-system   coredns-66bff467f8-2x44x                   1/1     Running                 1          67m
kube-system   coredns-66bff467f8-9s6hc                   1/1     Running                 1          67m

This also affects the running pods, as you can see from the 0/1 READY state above. So I deleted the pods, which finally gave me 1/1 results:

NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-578894d4cd-dtdzx   1/1     Running   1          73m
kube-system   calico-node-4qqlc                          1/1     Running   1          73m
kube-system   calico-node-57cpc                          1/1     Running   0          62s
kube-system   calico-node-pqg6d                          1/1     Running   0          62s
kube-system   calico-node-rlbtk                          1/1     Running   0          102s
kube-system   coredns-66bff467f8-2x44x                   1/1     Running   1          84m
kube-system   coredns-66bff467f8-9s6hc                   1/1     Running   1          84m
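For removing a node such as worker4, the sequence usually described in the Kubernetes documentation is to drain the node before deleting it. A minimal sketch, assuming kubectl is configured on the master; the helper names run and remove_node are made up, and with DRY_RUN=1 the commands are only printed. It is not claimed that this would have avoided the 0/1 state shown above.

```shell
# Hypothetical wrapper: run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: drain a node, then delete its Node object.
remove_node() {
    node=$1
    # Evict regular pods; DaemonSet-managed pods such as calico-node
    # and kube-proxy are skipped rather than evicted.
    run kubectl drain "$node" --ignore-daemonsets || return 1
    # Remove the Node object from the cluster.
    run kubectl delete node "$node"
}
```

A dry run would look like DRY_RUN=1 followed by remove_node worker4, which prints the two kubectl commands without touching the cluster. The removed machine itself still needs kubeadm reset and the cleanup described earlier in the thread before it can join again.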

serewicz · August 2020

Great Heiko! I'm glad you got it working.

Sounds like a combo of leftover config files and some resource issues. Congrats, you found the two most difficult things to troubleshoot.

If you want to add the fifth node, one idea would be to lower the resources on all the workers and the proxy to fit. If you don't have any big deployments the worker node and the HAProxy node don't use all that many resources. I always suggest folks use the same size as there is less chance of running out of resources with a high replica count, but the lab should work with a smaller worker/proxy.

Regards,

heiko_s · August 2020

Now that the cluster is running I've been testing it until the smoke comes out.

In exercise 3.4 I deployed the nginx container and web service. However, when I tried to access the nginx pod on worker1 from the master using curl IP-address it didn't work. I then realised that there was no tunl0 interface on the master. The workers are fine and all have the tunl0 interface. I suppose this isn't normal? I probably forgot to delete some stuff before I ran the kubeadm init command again. Is there a simple way to (re)activate the tunl0 interface?

Having fun with running
for i in {1..10000}; do curl 10.110.201.77:80 &>/dev/null; done

in one terminal window, watching tcpdump in another
sudo tcpdump -i tunl0

and watching and deleting the pods in a third window

kubectl get pods -o wide
kubectl delete pod nginx-d46f5678b-xwdlj

The IP address in the curl command is the cluster-ip so the tcpdump shows different IPs for the end-points. In the worst case I managed to create a "service blackout" of 5-7 seconds in between pod deletion and recreation. In some instances the curl command would employ 2 nodes.
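A rough sketch of how the length of such a blackout could be measured instead of eyeballed. The function names probe and longest_outage are made up; the 1-second curl timeout is an assumption, and date +%s (epoch seconds) is assumed to be available, as it is with GNU and BSD date.

```shell
# Hypothetical helper: request a URL <count> times and print one line
# per attempt: "<epoch-seconds> <http-code>". curl reports the code
# as 000 when the request fails or times out.
probe() {
    url=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        code=$(curl -s -o /dev/null -m 1 -w '%{http_code}' "$url")
        echo "$(date +%s) $code"
        i=$((i + 1))
    done
}

# Hypothetical helper: read probe output on stdin and print the longest
# outage in seconds, measured from the first non-200 response to the
# next 200 response. An outage still open at end of input is ignored.
longest_outage() {
    awk '
        $2 != 200 { if (start == "") start = $1; next }
        start != "" { d = $1 - start; if (d > max) max = d; start = "" }
        END { print max + 0 }'
}
```

With the cluster IP from the loop above, probe 10.110.201.77:80 10000 | longest_outage could run in one terminal while pods are deleted in another. The resolution is whole seconds, so very short gaps show up as 0 or 1.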

heiko_s · August 2020

One more thing: You mentioned multiple interfaces could be an issue. Well, here is what I have on the master and the workers:

nmcli con show
NAME     UUID                                  TYPE      DEVICE  
docker0  fa09a716-5e2c-4f26-9be4-20641e117a28  bridge    docker0 
static2  ce36f612-d8da-423b-bc98-cde20aec5de7  ethernet  enp1s0

There are two interfaces, one that I created (static2) and one that docker created. Is that OK like that?

chrispokorni · August 2020

Hi @heiko_s,

Once installed, Docker will create the docker bridge on the nodes where it is running. However, that bridge does not get utilized by Kubernetes, as it uses a third-party networking plugin. In other words, the docker bridge is harmless.

Regards,
-Chris

serewicz · August 2020

Hello,

Indeed. I was speaking of outbound interfaces, something like having eth0 and eth1 etc. I don't think that's the case with your setup, but there are extra considerations if your instances have multiple outbound interfaces.

I have found that an existing cluster which goes through a kubeadm reset will have odd networking issues. I would rebuild with freshly installed VMs. That way you know what you are working with is a typical setup instead of an unknown and rarely tested combination of kubeadm reset plus some less typical networking configuration.

Regards,

heiko_s · August 2020

Thanks Chris for clarifying the Docker bridge. Makes sense.

Tim, today I booted the PC and VMs and the master now has a tunl0 interface. At first everything seems to be running fine:

kubectl get nodes -o wide
NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
master    Ready    master   19h   v1.18.1   192.168.0.130   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6
worker1   Ready    <none>   19h   v1.18.1   192.168.0.131   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6
worker2   Ready    <none>   19h   v1.18.1   192.168.0.132   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6
worker3   Ready    <none>   18h   v1.18.1   192.168.0.133   <none>        Ubuntu 18.04.5 LTS   4.15.0-112-generic   docker://19.3.6

kubectl get pods -o wide
NAME                    READY   STATUS    RESTARTS   AGE   IP                NODE      NOMINATED NODE   READINESS GATES
nginx-d46f5678b-2mk4s   1/1     Running   1          11h   192.168.189.70    worker2   <none>           <none>
nginx-d46f5678b-d9v2v   1/1     Running   1          11h   192.168.189.69    worker2   <none>           <none>
nginx-d46f5678b-js92b   1/1     Running   1          10h   192.168.235.137   worker1   <none>           <none>

kubectl get service nginx 
NAME    TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
nginx   LoadBalancer   10.105.137.239   <pending>     80:30747/TCP   10h

kubectl get ep nginx 
NAME    ENDPOINTS                                                AGE
nginx   192.168.189.69:80,192.168.189.70:80,192.168.235.137:80   10h

kubectl get deployments.apps nginx -o wide
NAME    READY   UP-TO-DATE   AVAILABLE   AGE   CONTAINERS   IMAGES   SELECTOR
nginx   3/3     3            3           13h   nginx        nginx    app=nginx

heiko@master:~$ curl 10.105.137.239:80
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

"oldy" is the host:

heiko@oldy:~$ curl 192.168.0.130:30747
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>

nginx is running on worker1 and worker2. I can access the webpage from the master now, using the cluster-ip. Access also works from outside (from host "oldy") using the LoadBalancer.

As I type this, the host froze again, I got an OOM and worker1 was killed (qemu-... process). I don't understand why a worker running in a VM would cause an OOM on the host. Need to look into that. Any ideas are welcome.

P.S.: All nodes / pods were running idle - no work load except kubernetes and the Firefox browser on the host.

P.P.S.: Something is wrong with this web forum page. I get delays and what not, and when accessing it from the Macbook it warns me that the page uses a lot of energy and suggests closing the window. This is very odd and indeed when I'm on this forum page the laptop draws more battery power and slows down. Someone needs to look into that.

heiko_s · August 2020

OOM problem solved: It was of course my mistake. I had reserved 24Gig huge pages out of a total of 32Gig at boot time. But I forgot to edit the worker VM configurations to add:

  <memoryBacking>
    <hugepages/>
  </memoryBacking>

Without the above option, the VMs use regular memory, not the huge pages. No wonder that everything ground to a halt when running 4 VMs each 4Gig and having only 8Gig left on the host, plus swap space. The host was essentially using memory and swap space to provide the memory that I had allocated.
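The arithmetic behind this, as a small sketch. The figures (32 GiB total, 24 GiB reserved as huge pages, 4 VMs of 4 GiB each) are from the post; the function name memory_shortfall is made up, and the host's own memory use is ignored, so the real shortfall is larger.

```shell
# Hypothetical helper: GiB by which the VMs exceed regular (non-huge-page)
# host memory. Arguments: total GiB, GiB reserved as huge pages,
# number of VMs, GiB per VM.
memory_shortfall() {
    total=$1
    hugepages=$2
    vms=$3
    per_vm=$4
    regular=$(( total - hugepages ))   # memory usable without hugepage backing
    needed=$(( vms * per_vm ))         # what the guests ask for in total
    echo $(( needed - regular ))
}

memory_shortfall 32 24 4 4   # prints 8
```

So even before counting the host's own processes, the guests wanted 16 GiB from an 8 GiB pool, and the remaining 8 GiB had to come from swap.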

I discovered this when I ran the stress container in lab 4.1 and scaled to 3 replicas, with an eye on the host memory:

watch -n 3 free -h

It quickly shrank to 460Mi (from 8Gig total).
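A quick way to see whether running guests are actually backed by huge pages is to compare the HugePages_Total and HugePages_Free counters in /proc/meminfo. A minimal sketch; the function name hugepages_used is made up, and it only parses text on stdin.

```shell
# Hypothetical helper: read /proc/meminfo-style text on stdin and print
# "<pages in use>/<pages reserved>" for huge pages.
hugepages_used() {
    awk '/^HugePages_Total:/ { total = $2 }
         /^HugePages_Free:/  { free = $2 }
         END { printf "%d/%d\n", total - free, total }'
}
```

On the host, hugepages_used < /proc/meminfo while the VMs are up would show something like 0/12288 if none of the guests use the reserved pool (12288 pages of 2 MiB being 24 GiB, assuming the common 2 MiB page size).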

serewicz · August 2020

Hi Heiko,

Glad you found the issue as I couldn't think of what could be causing a host OOM. As far as the forum causing issues, it may be tied to the browser you are using. Are you using chrome? Does a different browser have the same high utilization?

I only have Linux systems to test with, and don't typically use Chrome. But when I do use Chrome, and especially if I have more than one tab open, I will notice it consumes a lot of resources.

Regards,

heiko_s · August 2020

@serewicz said:
Hi Heiko,

...As far as the forum causing issues, it may be tied to the browser you are using. Are you using chrome? Does a different browser have the same high utilization?

I only have Linux systems to test with, and don't typically use Chrome. But when I do use Chrome, and especially if I have more than one tab open, I will notice it consumes a lot of resources.

Regards,

I have several systems that I'm using, mostly Linux:
On the Linux systems I run Firefox.
My Macbook runs Safari.

I never use Chrome - seems we have the same experience. I typically have multiple tabs open (sometimes several dozen). However, until now this never caused any issue.

The message I got on the Macbook with the "using a lot of energy" was a first for me.

chrispokorni · August 2020

Hi @heiko_s,

I experienced similar behavior with Chrome on Ubuntu, and I suspected it had something to do with my browser plugins. I disabled most of them, yet I was still experiencing page freezes with the forum. I always thought it must be an isolated case caused by my setup and did not think much else of it.

Could it be related to the caching mechanism responsible for saving drafts?

Regards,
-Chris

RonaldHeirbaut · February 2021

I had the same issue when using Ubuntu 18.04.5 LTS for the master and worker. Everything worked fine until I applied the calico.yaml. Then docker was not able to pull the nginx image nor do any DNS lookup; they all timed out. I found out it was due to systemd-resolved. When I removed calico.yaml again, docker was able to pull the nginx image again and DNS lookups worked fine again. I solved it by using Debian 10 instead of Ubuntu. Never liked Ubuntu.
