
Lab 3.2 Grow the Cluster - Cilium crashloop with worker nodes

The issue:

When adding workers, the Cilium pods are stuck in CrashLoopBackOff during the initialization process.

  • Instructions from Lab 3.1 to Lab 3.2 were followed to the letter.
  • kubectl, kubeadm, and kubelet 1.31.1, with apt-mark hold applied
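For reference, the hold mentioned above is a single command, and apt-mark showhold confirms it took effect:

sudo apt-mark hold kubeadm kubelet kubectl
apt-mark showhold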

Specs

  • One computer with 3 VirtualBox machines (32 GB RAM, 2 TB storage)
  • Each node has 2 cores, 4 GB RAM, and 20 GB storage

Network

  • NAT used to provide easy internet access, for wget and apt install.
  • Host-only adapter used to connect the nodes. Promiscuous mode: "deny". Each machine can ping the others.

Network IPs

  • Host-only Ethernet IPv4: 192.168.56.1 /24
  • CP: 192.168.56.11/24
  • Worker1: 192.168.56.21/24
  • Worker2: 192.168.56.22/24

Cilium
cluster-pool-ipv4-cidr: 192.168.0.0 /16

What I tried
I was worried that Cilium might not be using k8scp, despite it being used in the init file.
cp > cat /etc/hosts
127.0.0.1 localhost
10.0.2.15 cp
192.168.56.11 k8scp

worker1 > cat /etc/hosts
127.0.0.1 localhost
10.0.2.15 worker1
192.168.56.11 k8scp cp

I noticed the node's "internal IP" was bound to my NAT adapter rather than the host-only network.

So I worked around that by manually specifying which IP my kubelets should use.

sudo vim /etc/default/kubelet
KUBELET_EXTRA_ARGS="--node-ip=192.168.56.11"
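(On each worker the --node-ip would be its own host-only address, 192.168.56.21 or 192.168.56.22.) The kubelet has to be restarted to pick up the change, and the registered addresses can then be checked, roughly like this:

sudo systemctl daemon-reload
sudo systemctl restart kubelet
kubectl get nodes -o wide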

It now shows the host-only addresses as the nodes' internal IPs.

But my pods are still in CrashLoopBackOff...

I believe I may need to set the default interface (eth0 or something) to my desired host-only adapter. Or maybe there is a Cilium-specific config that I must change.

Any help is welcome. If you have a preferred/recommended VirtualBox setup, please share it. There were no recommendations on how to configure the VM networking in the course, so I followed standard home-lab practices.

Best Answer

  • chrispokorni (Posts: 2,449)
    edited April 28 · Answer ✓

    Hi @fasmy,

    As a follow-up to your earlier post, this paints a much clearer picture of your infrastructure and your Kubernetes installation.

    • Each node has 2 cores, 4 GB RAM, and 20 GB storage

    I would recommend provisioning at least the cp VM with 8 GB RAM, while the two workers should support the light workload of the lab exercises at 4 GB RAM each (pay close attention to exercises 4.2 and 4.3 when working with memory constraints).

    • NAT used to provide easy internet access, for wget and apt install.
    • Host-only Adapter used to connect between Nodes...

    While a mix of NAT and host-only networks may seem to satisfy "standard home-lab practices", a single bridged network adapter per VM has worked very well for VirtualBox VMs for the purposes of this lab environment. The bridged network adapter simultaneously supports the network modes of the NAT and host-only adapters.

    ... Promiscuous mode: "deny"...

    Promiscuous mode should be set to "allow-all" so that traffic can reach the VMs, even with the recommended bridged adapter.
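    If you prefer the command line, a single bridged adapter with promiscuous mode set to allow-all can be configured on a powered-off VM with something like the following (the VM name "cp" and host interface "eth0" are only placeholders for your own values):

    VBoxManage modifyvm "cp" --nic1 bridged --bridgeadapter1 eth0 --nicpromisc1 allow-all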

    ... Each machine can ping one another.

    Ping only proves that the ICMP protocol is allowed - a protocol not used by Kubernetes. Instead, Kubernetes relies on TCP and UDP protocols.
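    A quick way to verify actual TCP reachability is to probe the control plane's API server port (6443 by default) from a worker with netcat, for example:

    nc -zv 192.168.56.11 6443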

    Host-only Ethernet IPv4: 192.168.56.1 /24
    CP: 192.168.56.11/24
    Worker1: 192.168.56.21/24
    Worker2: 192.168.56.22/24
    cluster-pool-ipv4-cidr: 192.168.0.0 /16

    Overlapping VM IP addresses with the pod IP pool is detrimental to routing within the cluster. By default, VirtualBox uses the 192.168.56.0/24 IP range for VMs. The pod IP pool (range) should be distinct, not overlapping the VM range. In addition to setting one single bridged adapter per VM and setting promiscuous mode to "allow-all", I recommend setting cluster-pool-ipv4-cidr to 10.200.0.0/16 in the cilium-cni.yaml manifest, and changing the podSubnet entry to the same 10.200.0.0/16 CIDR in the kubeadm-config.yaml manifest.
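    In outline, the two edits would look like this (only the relevant fragments are shown; the cluster-pool-ipv4-cidr key is typically found in the cilium-config ConfigMap inside cilium-cni.yaml, and podSubnet sits under the networking section of the ClusterConfiguration in kubeadm-config.yaml):

    # cilium-cni.yaml (cilium-config ConfigMap)
    cluster-pool-ipv4-cidr: "10.200.0.0/16"

    # kubeadm-config.yaml (ClusterConfiguration)
    networking:
      podSubnet: 10.200.0.0/16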

    Regards,
    -Chris

Answers

  • fasmy (Posts: 26)

    Thank you very much! :D

    I'll keep you posted. I might be off work soon enough to fix it tomorrow!

  • fasmy (Posts: 26)

    It works, it just works! HOORAY!


    I tried tweaking the IP/CIDR ranges to keep things tight—around 60 nodes with 128 IPs each (I really can't remember exactly). But in the end, I stopped playing games and just went with the bridged setup.

    OH MY BRIDGE!
    I was even able to reuse my existing virtual machines—I DIDN’T HAVE TO START FROM SCRATCH! :wink:

    Quick advice for anyone trying to learn from the ground up and doing it with a non-standard setup:

    Keep snapshots at every step!

    1. After SSH setup

    2. After all Kubernetes packages are installed, /etc/hostname and /etc/hosts are set, and swapoff -a is done
      → From here, clone your first machine (named cp, for control plane) twice—once for each worker.
      2.1. On each worker, set /etc/hostname and /etc/hosts, then save a snapshot

    3. After cp init (verify everything is OK with kubectl get -A pods)
      → Don't forget: kubectl needs to be configured before it will work. The kubeadm init output literally gives you the commands—run them (a sketch follows this list)!

    4. After Cilium is installed on the cp (check with kubectl get -A pods)
      → One control plane pod will be stuck at 0/1 until a second node joins—Next Step!

    5. After both worker nodes have joined
      → Check kubectl get -A pods again and make sure everything shows 1/1 and looks clean!

    AND YOU ARE D-D-D-D-D-D-DONE!
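    For step 3, the commands in question are roughly these (copy the exact ones from your own kubeadm init output):

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config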

    If you're following a strict tutorial, you might get through this in a few minutes and wonder why it ever seems hard. But if you're not an everyday Linux user, you’ll need to learn how to tweak configs (Ubuntu uses Netplan now), set up SSH (both server and client), and secure access properly.
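    As an example of the Netplan part, a static address on the adapter used for the cluster looks roughly like this (the file name and the interface name enp0s8 are placeholders for your own setup; apply the change with sudo netplan apply):

    # /etc/netplan/01-k8s.yaml
    network:
      version: 2
      ethernets:
        enp0s8:
          addresses: [192.168.56.11/24]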

    SIMULATE your full working environment—from start to finish.
    Break things on purpose. Try weird setups. Learn what fails and why.

    Referential learning is the backbone of my job, but some things you really have to do once before you can just follow instructions blindly.

    To whoever’s reading this: good luck!
    And yes—follow the tutorial strictly if you don't want to waste time. I had to take the long way, because I have big bare-metal plans for the future, so there was no shortcut for me.
