Lab 3.1: Pod cilium-XXXXX always in STATUS: CrashLoopBackOff (Ubuntu 22.04)

Hi there,
I am having difficulties with the installation of Kubernetes on Ubuntu 22.04 LTS.
Everything went fine until step 23. I followed all the previous steps and used the configuration files provided in the course tarball.
After running step 23:
root@ubuntu:~# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
the pods look like this:
ubuntu@ubuntu:~$ kubectl -n kube-system get pods
NAME                               READY   STATUS             RESTARTS        AGE
cilium-6d8fr                       0/1     CrashLoopBackOff   9 (4m40s ago)   26m
cilium-operator-788c7d7585-n9f26   1/1     Running            0               26m
cilium-operator-788c7d7585-rhqtx   0/1     Pending            0               26m
coredns-5d78c9869d-7sj44           0/1     Pending            0               29m
coredns-5d78c9869d-mbq28           0/1     Pending            0               29m
etcd-ubuntu                        1/1     Running            0               29m
kube-apiserver-ubuntu              1/1     Running            0               29m
kube-controller-manager-ubuntu     1/1     Running            0               29m
kube-proxy-rwgr4                   1/1     Running            0               29m
kube-scheduler-ubuntu              1/1     Running            0               29m
In the description of cilium-6d8fr I am getting the following events:
ubuntu@ubuntu:~$ kubectl -n kube-system describe pod cilium-6d8fr
...
Warning  Unhealthy  35m (x3 over 36m)    kubelet  Startup probe failed: Get "http://127.0.0.1:9879/healthz": dial tcp 127.0.0.1:9879: connect: connection refused
Warning  BackOff    77s (x178 over 35m)  kubelet  Back-off restarting failed container cilium-agent in pod cilium-6d8fr_kube-system(33fd142a-b736-4a08-b8c2-4d1bc18f9f0a)
Can somebody explain this or give me directions on what to analyse next? Currently I am out of ideas and cannot find anything useful on the internet about this issue.
Thanks
Comments
-
Hi @okanovic.eldin,
Ubuntu 22.04 LTS is known to interfere with the networking configuration of the Kubernetes components.
Please use the recommended OS - Ubuntu 20.04 LTS.

Regards,
-Chris
-
Thank you for your comment.
I forgot to mention in my question that I am testing on Ubuntu 22.04 LTS with a minimal kernel configuration, and that was the problem all the time. It doesn't meet the Cilium requirements for the Linux kernel. After analysing the cilium pod logs I found that most of the kernel modules Cilium needs are missing.
So I had to install the extra Linux kernel modules with:
sudo apt-get update
sudo apt install linux-generic
sudo apt install linux-generic-hwe-22.04
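For anyone hitting the same thing, here is a rough way to confirm that the kernel is the culprit (the options listed are only a sample of the Cilium base requirements; the Cilium system-requirements documentation has the full set):

# check the cilium-agent logs for complaints about missing kernel features/modules
kubectl -n kube-system logs ds/cilium -c cilium-agent | grep -iE 'module|kernel'
# spot-check a few of the BPF-related options Cilium expects in the running kernel config
for opt in CONFIG_BPF CONFIG_BPF_SYSCALL CONFIG_BPF_JIT CONFIG_NET_CLS_BPF CONFIG_NET_CLS_ACT CONFIG_NET_SCH_INGRESS; do
  grep -q "^${opt}=" "/boot/config-$(uname -r)" || echo "${opt} not set"
done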
And TADA:
ubuntu@ubuntu:~$ kubectl -n kube-system get pods
NAME                               READY   STATUS    RESTARTS      AGE
cilium-operator-788c7d7585-k7lbd   0/1     Pending   0             16m
cilium-operator-788c7d7585-xdkvq   1/1     Running   0             16m
cilium-vdblt                       1/1     Running   0             16m
coredns-5d78c9869d-kdnsm           1/1     Running   0             24m
coredns-5d78c9869d-vntqd           1/1     Running   0             24m
etcd-ubuntu                        1/1     Running   1 (18m ago)   24m
kube-apiserver-ubuntu              1/1     Running   1 (18m ago)   24m
kube-controller-manager-ubuntu     1/1     Running   1 (18m ago)   24m
kube-proxy-dgkh9                   1/1     Running   1 (18m ago)   24m
kube-scheduler-ubuntu              1/1     Running   1 (18m ago)   24m
Just to avoid confusion, I have installed Kubernetes from scratch, so the pod names above are different from those in my first post.
Regards,
Eldin
-
Hello,
I got a similar problem even though I'm running Ubuntu 20.04.6 LTS.
Any help would be really appreciated.

cilium-ml8lj   0/1   Init:0/6   6 (52s ago)   9m
-
Hi @yorgji,
It would be helpful if you also shared details of your lab environment. What cloud or hypervisor is hosting your VMs, size of the VMs (CPU, RAM, disk - fully allocated or dynamically allocated), how many network interfaces per VM (what type), how is the firewall filtering inbound traffic to the VMs? What are the private IP addresses of the VMs?
Please run the following commands and supply their outputs:
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl -n kube-system describe pod cilium-ml8lj
Regards,
-Chris
-
Hello @chrispokorni,
Thanks for replying so quickly.
Some of the info that you requested is in the body of this message; the rest can be found in the attached file. [PDF attachment removed for cybersecurity concerns]
This is the Vagrantfile that I'm using.
Vagrant.configure("2") do |config|
  # Define common settings for both VMs
  config.vm.box = "ubuntu/focal64" # Ubuntu 20.04
  config.vm.provider "virtualbox" do |vb|
    vb.memory = "8192"
    vb.cpus = 2
  end

  # Control Plane Node
  config.vm.define "cp-node" do |cp|
    cp.vm.hostname = "cp-node"
    cp.vm.network "private_network", type: "static", ip: "192.168.56.10"
    cp.vm.provider "virtualbox" do |vb|
    end
    cp.vm.provision "shell", inline: <<-SHELL
      sudo swapoff -a
      sudo sed -i '/ swap / s/^/#/' /etc/fstab
      sudo systemctl disable --now ufw
      sudo systemctl stop apparmor
      sudo systemctl disable --now apparmor
    SHELL
  end

  # Worker Node
  config.vm.define "worker-node" do |worker|
    worker.vm.hostname = "worker-node"
    worker.vm.network "private_network", type: "static", ip: "192.168.56.11"
    worker.vm.provision "shell", inline: <<-SHELL
      sudo swapoff -a
      sudo sed -i '/ swap / s/^/#/' /etc/fstab
      sudo systemctl disable --now ufw
      sudo systemctl stop apparmor
      sudo systemctl disable --now apparmor
    SHELL
  end
end
Hypervisor : virtualbox
2 CPUs
8 GB RAM
40 GB is the default storage of the focal64 image (probably static)
192.168.56.10 is the IP of the control plane node
192.168.56.11 is the IP of the worker node
!!! I had to edit the k8scp.sh. I added "--apiserver-advertise-address=192.168.56.10" to the kubeadm init command, otherwise the worker node could not join. I then manually copied the .kube/config from the cp-node to the worker node in order to be able to use the API.
Cilium doesn't need to be installed on every node?
Let me know if you need more info.
-
Thanks in advance. I posted it twice by mistake.
-
Hi @yorgji,
I was unable to open the attached pdf document; please provide a txt file instead - it is a safer attachment format. Since vagrant is a personal choice, feel free to make adjustments as necessary to implement the following recommendations.
On the VirtualBox hypervisor, please ensure that:
- each VM is provisioned with only one bridged network adapter, with promiscuous mode enabled to allow all incoming traffic
- the 40GB vdisk is fully allocated, and not dynamic. All this needs to be explicitly declared; assumptions with "probably" do not lead to successful outcomes. (A command-line sketch of these settings follows below.)
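These settings can also be applied from the command line; here is a minimal VBoxManage sketch (the VM name, host interface, and disk file names are placeholders, not taken from your setup, and the VMs must be powered off first):

# one bridged adapter per VM, with promiscuous mode set to allow all traffic
VBoxManage modifyvm "cp-node" --nic1 bridged --bridgeadapter1 enp0s3 --nicpromisc1 allow-all
# convert a dynamically allocated vdisk into a fully allocated (fixed-size) one
VBoxManage clonemedium disk dynamic.vdi fixed.vdi --variant Fixed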
Editing the kubeadm-config.yaml manifest:
- the controlPlaneEndpoint entry is recommended over the --apiserver-advertise-address option due to its flexibility when it comes to HA cluster configuration in a later chapter. Ensure the .../hosts file is correctly populated with the control plane VM IP address and the k8scp alias (same entry on both VMs)
- in order to avoid IP range overlapping, change the subnet value as such: podSubnet: 10.200.0.0/16
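For reference, after those edits the relevant part of kubeadm-config.yaml would look roughly like the sketch below (the surrounding fields follow the standard kubeadm ClusterConfiguration layout and are not copied from the course file):

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.31.1              # whichever release the lab guide pins
controlPlaneEndpoint: "k8scp:6443"     # resolves via the k8scp entry in /etc/hosts on both VMs
networking:
  podSubnet: 10.200.0.0/16             # changed from the default to avoid overlapping the VM network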
Editing the cilium-cni.yaml manifest before launching the Cilium CNI:
- around line 222, update the cidr value as such: cluster-pool-ipv4-cidr: "10.200.0.0/16"
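In that manifest the value should sit in the cilium-config ConfigMap; after the edit the excerpt would read along these lines (the exact line number varies between Cilium versions):

# excerpt from the cilium-config ConfigMap in cilium-cni.yaml
data:
  cluster-pool-ipv4-cidr: "10.200.0.0/16"   # must match podSubnet in kubeadm-config.yaml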
"Cilium doesn't need to be installed on every node?"
No.
"And then manually copy the .kube/config from the cp-node to the worker in order to be able to use the api."
Not a recommended step, but not a showstopper either.
Regards,
-Chris
-
Hello @chrispokorni,
Thanks for the help but it seems way too complicated.
Is there a tutorial to set up the environment locally using VMs instead of using a cloud provider?

kubectl get nodes -o wide
NAME          STATUS     ROLES           AGE    VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
cp-node       Ready      control-plane   18m    v1.31.1   10.0.2.15                   Ubuntu 20.04.6 LTS   5.4.0-205-generic   containerd://1.7.25
worker-node   NotReady                   9m9s   v1.31.1   10.0.2.15                   Ubuntu 20.04.6 LTS   5.4.0-205-generic   containerd://1.7.25

kubectl get pods -A -o wide
NAMESPACE     NAME                               READY   STATUS     RESTARTS        AGE     IP           NODE          NOMINATED NODE   READINESS GATES
kube-system   cilium-bpdkv                       0/1     Init:0/6   6 (46s ago)     9m44s   10.0.2.15    worker-node
kube-system   cilium-envoy-2nnb5                 1/1     Running    1 (4m51s ago)   9m44s   10.0.2.15    worker-node
kube-system   cilium-envoy-frtsz                 1/1     Running    0               19m     10.0.2.15    cp-node
kube-system   cilium-kkjtm                       1/1     Running    0               19m     10.0.2.15    cp-node
kube-system   cilium-operator-5c7867ccd5-7zvvd   1/1     Running    0               19m     10.0.2.15    cp-node
kube-system   coredns-7c65d6cfc9-d99tl           1/1     Running    0               19m     10.0.0.30    cp-node
kube-system   coredns-7c65d6cfc9-tpvd5           1/1     Running    0               19m     10.0.0.123   cp-node
kube-system   etcd-cp-node                       1/1     Running    0               19m     10.0.2.15    cp-node
kube-system   kube-apiserver-cp-node             1/1     Running    0               19m     10.0.2.15    cp-node
kube-system   kube-controller-manager-cp-node    1/1     Running    0               19m     10.0.2.15    cp-node
kube-system   kube-proxy-lvkpc                   1/1     Running    0               19m     10.0.2.15    cp-node
kube-system   kube-proxy-m5ntc                   1/1     Running    1 (4m51s ago)   9m44s   10.0.2.15    worker-node
kube-system   kube-scheduler-cp-node             1/1     Running    0               19m     10.0.2.15    cp-node

kubectl -n kube-system describe pod cilium-bpdkv
Name: cilium-bpdkv
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
...
Normal   Scheduled       10m                     default-scheduler  Successfully assigned kube-system/cilium-bpdkv to worker-node
Normal   Pulling         10m                     kubelet            Pulling image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39"
Normal   Pulled          9m18s                   kubelet            Successfully pulled image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39" in 33.006s (48.034s including waiting). Image size: 223002645 bytes.
Warning  BackOff         7m6s                    kubelet            Back-off restarting failed container config in pod cilium-bpdkv_kube-system(aa383527-9bec-4324-9f80-b8b676126a5a)
Normal   Created         6m51s (x3 over 9m18s)   kubelet            Created container config
Normal   Started         6m51s (x3 over 9m18s)   kubelet            Started container config
Normal   Pulled          6m51s (x2 over 8m12s)   kubelet            Container image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39" already present on machine
Normal   SandboxChanged  5m17s                   kubelet            Pod sandbox changed, it will be killed and re-created.
Warning  BackOff         47s (x6 over 4m10s)     kubelet            Back-off restarting failed container config in pod cilium-bpdkv_kube-system(aa383527-9bec-4324-9f80-b8b676126a5a)
Normal   Pulled          33s (x4 over 5m17s)     kubelet            Container image "quay.io/cilium/cilium:v1.16.1@sha256:0b4a3ab41a4760d86b7fc945b8783747ba27f29dac30dd434d94f2c9e3679f39" already present on machine
Normal   Created         33s (x4 over 5m16s)     kubelet            Created container config
Normal   Started         33s (x4 over 5m16s)     kubelet            Started container config
-
Hi @yorgji,
Video guides have only been generated for the AWS and GCP cloud providers.
Based on the output provided above, your worker node is NotReady, a state which prevents your cluster from fully initializing and deploying any workload. Unfortunately there is no band-aid solution here; a correctly configured cluster is required to move forward and successfully complete this training.

As far as VirtualBox goes, its easy-to-use interface allows me to quickly provision the two required VMs to the specs described above. Following all the installation and config steps from the lab guide, with the few custom edits mentioned above, should yield a fully functional cluster. I have done this countless times. You only have to build the cluster once, then stop the VMs when not actively working on lab exercises.
Regards,
-Chris
-
Hello @chrispokorni,
So, after I run the k8scp.sh without any modifications, should I:
1. set the controlPlaneEndpoint to 192.168.56.10 (the control plane IP)?
2. make sure the /etc/hosts file on each node contains the following?
192.168.56.10 k8scp
192.168.56.10 cp-node
3. and then update the cluster-pool-ipv4-cidr to cluster-pool-ipv4-cidr: "10.200.0.0/16"? If yes, in which file?
-
Hi @yorgji,
Here are the details from my earlier response to you, with a minor edit for clarification:
On the VirtualBox hypervisor, please ensure that:
- each VM is provisioned with only one bridged network adapter, with promiscuous mode enabled to allow all incoming traffic
- the 40GB vdisk is fully allocated, and not dynamic. All this needs to be explicitly declared. Assumptions with "probably" do not lead to successful outcomes
Editing the kubeadm-config.yaml manifest:
- the controlPlaneEndpoint: "k8scp:6443" entry is recommended over the --apiserver-advertise-address option due to its flexibility when it comes to HA cluster configuration in a later chapter. Ensure the .../hosts file is correctly populated with the control plane VM IP address and the k8scp alias (same entry on both VMs)
- in order to avoid IP range overlapping, change the subnet value as such: podSubnet: 10.200.0.0/16
Editing the cilium-cni.yaml manifest before launching the Cilium CNI:
- around line 222, update the cidr value as such: cluster-pool-ipv4-cidr: "10.200.0.0/16"
Regards,
-Chris
-
Hello @chrispokorni,
I really appreciate your help.
I updated the kubeadm init command in the k8scp.sh file:
sudo kubeadm init --control-plane-endpoint "k8scp:6443" --pod-network-cidr=10.200.0.0/16
I also updated the cilium install command in the k8scp.sh file:
cilium install --version 1.16.1 \
  --set k8sServiceHost=k8scp \
  --set k8sServicePort=6443 \
  --set ipam.operator.clusterPoolIPv4PodCIDRList=10.200.0.0/16
and I updated the /etc/hosts as per your instruction.
And there is an improvement: now both nodes are Ready, but cilium still reports errors, like this one:
cilium cilium-zpzqf unable to retrieve cilium endpoint information: command failed (pod=kube-system/cilium-zpzqf, container=cilium-agent): unable to upgrade connection: pod does not exist
but it is running:
cilium-zpzqf 1/1 Running 0 13m
-
This is the result of kubectl describe pod cilium-zpzqf:
Warning Unhealthy 24m (x3 over 24m) kubelet Startup probe failed: Get "http://127.0.0.1:9879/healthz": dial tcp 127.0.0.1:9879: connect: connection refused
-
Turns out that one of the coredns pods is failing too: Readiness probe failed: HTTP probe failed with statuscode: 503
-
Hi @yorgji,
There are a few things that puzzle me from your most recent comments.
Please clarify for me which course you are following - LFS258 Kubernetes Fundamentals or LFD259 Kubernetes for Developers?

This forum is for the LFS258 course, yet some of your remarks hint that you are enrolled in a different course, the LFD259. The cluster bootstrapping is different between the two courses, and the troubleshooting help we provide here is very targeted.
Regards,
-Chris
-
Hello @chrispokorni,
You are right, I'm enrolled in LFD259 Kubernetes for Developers.
Sorry about that, I didn't notice it.
-
Hi @yorgji,
In that case you do not need the k8scp alias entry in any of the hosts files, and remove it from any command if you supplied it as an argument. Perhaps populating the hosts files on both machines with the cp and worker hostnames and their respective private IP addresses will also help.
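A minimal sketch of those hosts entries, assuming the hostnames and private IPs from the Vagrantfile shared earlier in this thread:

# append to /etc/hosts on BOTH VMs
192.168.56.10   cp-node
192.168.56.11   worker-node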
Regards,
-Chris