
Lab 3.1 POD cilium-XXXXX always with STATUS: CrashLoopBackOff (Ubuntu 22.04)

Hi there,
I am having difficulties with the installation of Kubernetes on Ubuntu 22.04 LTS.
Everything went fine until step 23. I followed all the previous steps and used the configuration files provided in the course tarball.

After step 23:

root@ubuntu:~# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out

the pods look like this:

ubuntu@ubuntu:~$ kubectl -n kube-system get pods
NAME                               READY   STATUS             RESTARTS        AGE
cilium-6d8fr                       0/1     CrashLoopBackOff   9 (4m40s ago)   26m
cilium-operator-788c7d7585-n9f26   1/1     Running            0               26m
cilium-operator-788c7d7585-rhqtx   0/1     Pending            0               26m
coredns-5d78c9869d-7sj44           0/1     Pending            0               29m
coredns-5d78c9869d-mbq28           0/1     Pending            0               29m
etcd-ubuntu                        1/1     Running            0               29m
kube-apiserver-ubuntu              1/1     Running            0               29m
kube-controller-manager-ubuntu     1/1     Running            0               29m
kube-proxy-rwgr4                   1/1     Running            0               29m
kube-scheduler-ubuntu              1/1     Running            0               29m

In the description of cilium-6d8fr I am getting the following events:

ubuntu@ubuntu:~$ kubectl -n kube-system describe pod cilium-6d8fr
...
  Warning  Unhealthy  35m (x3 over 36m)    kubelet            Startup probe failed: Get "http://127.0.0.1:9879/healthz": dial tcp 127.0.0.1:9879: connect: connection refused
  Warning  BackOff    77s (x178 over 35m)  kubelet            Back-off restarting failed container cilium-agent in pod cilium-6d8fr_kube-system(33fd142a-b736-4a08-b8c2-4d1bc18f9f0a)

Can somebody explain or give me directions on what to analyse next? Currently I am out of ideas and cannot find anything useful on the internet about this issue.

Thanks

Comments

  • Hi @okanovic.eldin,

    Ubuntu 22.04 LTS is known to interfere with the networking configuration of the Kubernetes components.
    Please use the recommended OS - Ubuntu 20.04 LTS.

    Regards,
    -Chris

  • Hi @chrispokorni

    Thank you for your comment.

    I forgot to mention in my question that I am testing on Ubuntu 22.04 LTS with a minimal kernel configuration, and that was the problem all along. It does not meet the Cilium requirements for the Linux kernel. After analysing the cilium pod logs I found out that most of the kernel modules Cilium needs are missing.

    So I had to install the generic kernel packages, which bring in the missing kernel modules:

    sudo apt-get update
    sudo apt install linux-generic
    sudo apt install linux-generic-hwe-22.04
    

    And TADA:

    ubuntu@ubuntu:~$ kubectl -n kube-system get pods
    NAME                               READY   STATUS    RESTARTS      AGE
    cilium-operator-788c7d7585-k7lbd   0/1     Pending   0             16m
    cilium-operator-788c7d7585-xdkvq   1/1     Running   0             16m
    cilium-vdblt                       1/1     Running   0             16m
    coredns-5d78c9869d-kdnsm           1/1     Running   0             24m
    coredns-5d78c9869d-vntqd           1/1     Running   0             24m
    etcd-ubuntu                        1/1     Running   1 (18m ago)   24m
    kube-apiserver-ubuntu              1/1     Running   1 (18m ago)   24m
    kube-controller-manager-ubuntu     1/1     Running   1 (18m ago)   24m
    kube-proxy-dgkh9                   1/1     Running   1 (18m ago)   24m
    kube-scheduler-ubuntu              1/1     Running   1 (18m ago)   24m
    

    Just to avoid confusion: I reinstalled Kubernetes from scratch, so the pod names above are different from those in my first post.
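
    For anyone hitting the same symptom: a rough way to check the running kernel before reinstalling (the authoritative list of required options is in the Cilium system requirements documentation; the ones below are just a representative subset):

    # Check a few of the BPF-related kernel options Cilium expects;
    # anything reported as "missing" points to a too-minimal kernel build.
    for opt in CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_NET_CLS_BPF \
               CONFIG_BPF_JIT CONFIG_NET_CLS_ACT CONFIG_NET_SCH_INGRESS; do
      grep -q "^${opt}=[ym]" "/boot/config-$(uname -r)" || echo "${opt} missing"
    done

    # After installing linux-generic and rebooting, confirm the new kernel is active:
    uname -r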

    Regards,
    Eldin

  • yorgji

    Hello,
    I got a similar problem even though I'm running Ubuntu 20.04.6 LTS.
    Any help would be really appreciated.

    cilium-ml8lj 0/1 Init:0/6 6 (52s ago) 9m

  • chrispokorni

    Hi @yorgji,

    It would be helpful if you also shared details of your lab environment. What cloud or hypervisor is hosting your VMs, size of the VMs (CPU, RAM, disk - fully allocated or dynamically allocated), how many network interfaces per VM (what type), how is the firewall filtering inbound traffic to the VMs? What are the private IP addresses of the VMs?

    Please run the following commands and supply their outputs:

    kubectl get nodes -o wide
    kubectl get pods -A -o wide
    kubectl -n kube-system describe pod cilium-ml8lj
    

    Regards,
    -Chris

  • yorgji

    Hello @chrispokorni,
    Thanks for replying so quickly.
    Some of the info that you requested is in the body of this message; the rest can be found in the attached file.

    [PDF attachment removed for cybersecurity concerns]

    This is the vagrant file that I'm using.

    Vagrant.configure("2") do |config|
      # Define common settings for both VMs
      config.vm.box = "ubuntu/focal64"  # Ubuntu 20.04
      config.vm.provider "virtualbox" do |vb|
        vb.memory = "8192"
        vb.cpus = 2
      end
    
      # Control Plane Node
      config.vm.define "cp-node" do |cp|
        cp.vm.hostname = "cp-node"
        cp.vm.network "private_network", type: "static", ip: "192.168.56.10"
        cp.vm.provider "virtualbox" do |vb|
        end
        cp.vm.provision "shell", inline: <<-SHELL
          sudo swapoff -a
          sudo sed -i '/ swap / s/^/#/' /etc/fstab
          sudo systemctl disable --now ufw
          sudo systemctl stop apparmor
          sudo systemctl disable --now apparmor
        SHELL
      end
    
      # Worker Node
      config.vm.define "worker-node" do |worker|
        worker.vm.hostname = "worker-node"
        worker.vm.network "private_network", type: "static", ip: "192.168.56.11"
        worker.vm.provision "shell", inline: <<-SHELL
          sudo swapoff -a
          sudo sed -i '/ swap / s/^/#/' /etc/fstab
          sudo systemctl disable --now ufw
          sudo systemctl stop apparmor
          sudo systemctl disable --now apparmor
        SHELL
      end
    end
    

    Hypervisor: VirtualBox
    2 CPUs
    8 GB RAM
    40 GB is the default storage of the focal64 image (probably static)
    192.168.56.10 is the IP of the control plane node
    192.168.56.11 is the IP of the worker node

    !!! I had to edit the k8scp.sh. I added "--apiserver-advertise-address=192.168.56.10" to the kubeadm init command, otherwise the worker node could not join. And then I manually copied the .kube/config from the cp-node to the worker in order to be able to use the API.

    Cilium doesn't need to be installed on every node?

    Let me know if you need more info.

  • yorgji

    Thanks in advance. I posted it twice by mistake.

  • chrispokorni

    Hi @yorgji,

    I was unable to open the attached pdf document; please provide a txt file instead - it is a safer attachment format. Since vagrant is a personal choice, feel free to make adjustments as necessary to implement the following recommendations.

    On the VirtualBox hypervisor, please ensure that:

    • each VM is provisioned with only one bridged network adapter, with promiscuous mode enabled to allow all incoming traffic
    • the 40GB vdisk is fully allocated, and not dynamic. All this needs to be explicitly declared. Assumptions with "probably" do not lead to successful outcomes

    Editing the kubeadm-config.yaml manifest:

    • the controlPlaneEndpoint entry is recommended over the --apiserver-advertise-address option due to its flexibility when it comes to HA cluster configuration in a later chapter. Ensure the .../hosts file is correctly populated with control plane VM IP address and the k8scp alias (same entry on both VMs)
    • in order to avoid IP range overlapping, change the subnet value as such podSubnet: 10.200.0.0/16

    Editing the cilium-cni.yaml manifest before launching the Cilium CNI:

    • around line 222 update the cidr value as such cluster-pool-ipv4-cidr: "10.200.0.0/16"
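
    Putting the above together, a rough sketch of these edits (file names as referenced in the lab guide; substitute your own control plane VM's private IP for 192.168.56.10, and adjust if your copies of the manifests are formatted differently):

    # /etc/hosts on BOTH VMs - control plane private IP plus the k8scp alias:
    echo "192.168.56.10 k8scp" | sudo tee -a /etc/hosts

    # kubeadm-config.yaml - control plane endpoint and pod subnet:
    #   controlPlaneEndpoint: "k8scp:6443"
    #   networking:
    #     podSubnet: 10.200.0.0/16

    # cilium-cni.yaml - pod IP pool (the cluster-pool-ipv4-cidr entry around line 222):
    sed -i 's|cluster-pool-ipv4-cidr:.*|cluster-pool-ipv4-cidr: "10.200.0.0/16"|' cilium-cni.yaml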

    Cilium doesn't need to be installed on every node?

    No.

    And then I manually copied the .kube/config from the cp-node to the worker in order to be able to use the API.

    Not a recommended step, but not a showstopper either.

    Regards,
    -Chris

  • yorgji

    Hello @chrispokorni,
    Thanks for the help, but it seems way too complicated.
    Is there a tutorial for setting up the environment locally using VMs instead of using a cloud provider?

    1. kubectl get nodes -o wide
      NAME          STATUS     ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
      cp-node       Ready      control-plane   18m    v1.31.1   10.0.2.15     <none>        Ubuntu 20.04.6 LTS   5.4.0-205-generic   containerd://1.7.25
      worker-node   NotReady   <none>          9m9s   v1.31.1   10.0.2.15     <none>        Ubuntu 20.04.6 LTS   5.4.0-205-generic   containerd://1.7.25

    2. kubectl get pods -A -o wide
      NAMESPACE     NAME                               READY   STATUS     RESTARTS        AGE     IP           NODE          NOMINATED NODE   READINESS GATES
      kube-system   cilium-bpdkv                       0/1     Init:0/6   6 (46s ago)     9m44s   10.0.2.15    worker-node   <none>           <none>
      kube-system   cilium-envoy-2nnb5                 1/1     Running    1 (4m51s ago)   9m44s   10.0.2.15    worker-node   <none>           <none>
      kube-system   cilium-envoy-frtsz                 1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>
      kube-system   cilium-kkjtm                       1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>
      kube-system   cilium-operator-5c7867ccd5-7zvvd   1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>
      kube-system   coredns-7c65d6cfc9-d99tl           1/1     Running    0               19m     10.0.0.30    cp-node       <none>           <none>
      kube-system   coredns-7c65d6cfc9-tpvd5           1/1     Running    0               19m     10.0.0.123   cp-node       <none>           <none>
      kube-system   etcd-cp-node                       1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>
      kube-system   kube-apiserver-cp-node             1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>
      kube-system   kube-controller-manager-cp-node    1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>
      kube-system   kube-proxy-lvkpc                   1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>
      kube-system   kube-proxy-m5ntc                   1/1     Running    1 (4m51s ago)   9m44s   10.0.2.15    worker-node   <none>           <none>
      kube-system   kube-scheduler-cp-node             1/1     Running    0               19m     10.0.2.15    cp-node       <none>           <none>

    3. kubectl -n kube-system describe pod cilium-bpdkv
      Name:                 cilium-bpdkv
      Namespace:            kube-system
      Priority:             2000001000
      Priority Class Name:  system-node-critical
      ...
      Normal   Scheduled       10m                    default-scheduler  Successfully assigned kube-system/cilium-bpdkv to worker-node
      Normal   Pulling         10m                    kubelet            Pulling image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39"
      Normal   Pulled          9m18s                  kubelet            Successfully pulled image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39" in 33.006s (48.034s including waiting). Image size: 223002645 bytes.
      Warning  BackOff         7m6s                   kubelet            Back-off restarting failed container config in pod cilium-bpdkv_kube-system(aa383527-9bec-4324-9f80-b8b676126a5a)
      Normal   Created         6m51s (x3 over 9m18s)  kubelet            Created container config
      Normal   Started         6m51s (x3 over 9m18s)  kubelet            Started container config
      Normal   Pulled          6m51s (x2 over 8m12s)  kubelet            Container image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39" already present on machine
      Normal   SandboxChanged  5m17s                  kubelet            Pod sandbox changed, it will be killed and re-created.
      Warning  BackOff         47s (x6 over 4m10s)    kubelet            Back-off restarting failed container config in pod cilium-bpdkv_kube-system(aa383527-9bec-4324-9f80-b8b676126a5a)
      Normal   Pulled          33s (x4 over 5m17s)    kubelet            Container image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39" already present on machine
      Normal   Created         33s (x4 over 5m16s)    kubelet            Created container config
      Normal   Started         33s (x4 over 5m16s)    kubelet            Started container config

  • chrispokorni

    Hi @yorgji,

    Video guides have only been generated for the AWS and GCP cloud providers.
    Based on the output provided above, your worker node is NotReady, a state which prevents your cluster from fully initializing and deploying any workload. Unfortunately there is no band-aid solution here; a correctly configured cluster is required to move forward and successfully complete this training.

    As far as VirtualBox goes, its easy-to-use interface allows me to quickly provision the two required VMs to the specs described above. Following all the installation and config steps from the lab guide, with the few custom edits mentioned above, should yield a fully functional cluster. I have done this countless times. You only have to build the cluster once, then stop the VMs when not actively working on lab exercises.

    Regards,
    -Chris

  • yorgji

    Hello @chrispokorni,
    So, after I run the k8scp.sh without any modifications:
    1. should I set the controlPlaneEndpoint to 192.168.56.10 (the control plane IP)?
    2. and should the /etc/hosts file on each node contain

    192.168.56.10 k8scp

    192.168.56.10 cp-node

    ?
    3. And then should I update the cluster-pool-ipv4-cidr to cluster-pool-ipv4-cidr: "10.200.0.0/16"? And if yes, in which file?

  • chrispokorni

    Hi @yorgji,

    Here are the details from my earlier response to you, with a minor edit for clarification:

    On the VirtualBox hypervisor, please ensure that:

    • each VM is provisioned with only one bridged network adapter, with promiscuous mode enabled to allow all incoming traffic
    • the 40GB vdisk is fully allocated, and not dynamic. All this needs to be explicitly declared. Assumptions with "probably" do not lead to successful outcomes

    Editing the kubeadm-config.yaml manifest:

    • the controlPlaneEndpoint: "k8scp:6443" entry is recommended over the --apiserver-advertise-address option due to its flexibility when it comes to HA cluster configuration in a later chapter. Ensure the .../hosts file is correctly populated with control plane VM IP address and the k8scp alias (same entry on both VMs)
    • in order to avoid IP range overlapping, change the subnet value as such podSubnet: 10.200.0.0/16

    Editing the cilium-cni.yaml manifest before launching the Cilium CNI:

    • around line 222 update the cidr value as such cluster-pool-ipv4-cidr: "10.200.0.0/16"

    Regards,
    -Chris

  • yorgji

    Hello @chrispokorni ,

    I really appreciate your help.

    I updated the kubeadm init command in the k8scp.sh file ->

    sudo kubeadm init --control-plane-endpoint "k8scp:6443" --pod-network-cidr=10.200.0.0/16
    

    I also updated the cilium install command in the k8scp.sh file ->

    cilium install --version 1.16.1 \
      --set k8sServiceHost=k8scp \
      --set k8sServicePort=6443 \
      --set ipam.operator.clusterPoolIPv4PodCIDRList=10.200.0.0/16
    

    and I updated the /etc/hosts as per your instruction.

    And there is an improvement: both nodes are now Ready, but cilium still reports errors, like this one:

    cilium             cilium-zpzqf    unable to retrieve cilium endpoint information: command failed (pod=kube-system/cilium-zpzqf, container=cilium-agent): unable to upgrade connection: pod does not exist
    

    but it is running:

    cilium-zpzqf   1/1     Running   0               13m
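
    I guess the next things to check are the pod's events, the cilium-agent log, and the overall CNI status. Something like (standard kubectl / cilium-cli commands, not steps from the lab guide):

    # Pod events for the affected agent pod:
    kubectl -n kube-system describe pod cilium-zpzqf

    # Tail of the cilium-agent container log:
    kubectl -n kube-system logs cilium-zpzqf -c cilium-agent --tail=50

    # Overall Cilium health as reported by the CLI:
    cilium status --wait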
    
  • yorgji

    This is the result of kubectl describe pod cilium-zpzqf:

     Warning  Unhealthy       24m (x3 over 24m)      kubelet            Startup probe failed: Get "http://127.0.0.1:9879/healthz": dial tcp 127.0.0.1:9879: connect: connection refused
    
  • yorgji

    It turns out that one of the coredns pods is failing too: Readiness probe failed: HTTP probe failed with statuscode: 503

  • chrispokorni

    Hi @yorgji,

    There are a few things that puzzle me from your most recent comments.
    Please clarify for me what course you are following - LFS258 Kubernetes Fundamentals or LFD259 Kubernetes for Developers?

    This forum is for the LFS258 course, yet some of your remarks hint that you are enrolled in a different course, LFD259. The cluster bootstrapping is different between the two courses, and the troubleshooting help we provide here is very targeted.

    Regards,
    -Chris

  • yorgji

    Hello @chrispokorni,
    You are right, I'm enrolled in LFD259 Kubernetes for Developers.
    Sorry about that, I didn't notice it.

  • chrispokorni

    Hi @yorgji,

    In that case you do not need the k8scp alias entry in any of the hosts files, and you should remove it from any command where you supplied it as an argument. Populating the hosts files on both machines with the cp and worker hostnames and their respective private IP addresses may also help.
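
    A minimal sketch of the hosts entries I mean, using the hostnames and private IPs from the Vagrantfile posted earlier in this thread (adjust to your own VMs). The same two lines go on BOTH machines:

    # Append the cp and worker entries to /etc/hosts on each VM:
    echo "192.168.56.10 cp-node"     | sudo tee -a /etc/hosts
    echo "192.168.56.11 worker-node" | sudo tee -a /etc/hosts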

    Regards,
    -Chris
