LAB 3.2 v09.05 dead loop on virtual device cilium-vxlan, fix it urgently!

Adding one node works fine.
When the second node runs the exact same join command, the master goes into a loop and the console repeatedly shows:
"dead loop on virtual device cilium-vxlan, fix it urgently!"
It may allow the second node to be added, but then it fails.
I lost several hours of my training day trying to make this work, without success.
How can I fix it?
Comments
-
Even when adding just one node, the join message says it was added, but its status stays NotReady, and after a while k8scp goes down.
kubectl describe node node02
Name: node02
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node02
kubernetes.io/os=linux
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Tue, 12 Sep 2023 03:40:34 +0000
Taints: node.kubernetes.io/not-ready:NoExecute
node.cilium.io/agent-not-ready:NoSchedule
node.kubernetes.io/not-ready:NoSchedule
Unschedulable: false
Lease:
HolderIdentity: node02
AcquireTime:
RenewTime: Tue, 12 Sep 2023 03:41:55 +0000
Conditions:
Type            Status  LastHeartbeatTime                Last TransitionTime              Reason                      Message
----            ------  -----------------                -------------------              ------                      -------
MemoryPressure  False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory available
DiskPressure    False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
PIDPressure     False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletHasSufficientPID     kubelet has sufficient PID available
Ready           False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletNotReady             container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
InternalIP: 192.168.1.22
Hostname: node02
Capacity:
cpu: 2
ephemeral-storage: 64188044Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1964840Ki
pods: 110
Allocatable:
cpu: 2
ephemeral-storage: 59155701253
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 1862440Ki
pods: 110
System Info:
Machine ID: fd76f417257946eca2e98aab8cc4434f
System UUID: 16f1ff4c-f455-fc43-a6da-13a2eb9f2b63
Boot ID: c5425f60-2b97-4f41-9a7c-227d09add390
Kernel Version: 5.4.0-150-generic
OS Image: Ubuntu 20.04.6 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: containerd://1.6.22
Kubelet Version: v1.27.1
Kube-Proxy Version: v1.27.1
PodCIDR: 192.168.1.0/24
PodCIDRs: 192.168.1.0/24
Non-terminated Pods: (3 in total)
Namespace    Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
---------    ----                              ------------  ----------  ---------------  -------------  ---
kube-system  cilium-operator-788c7d7585-rfdt6  0 (0%)        0 (0%)      0 (0%)           0 (0%)         4h49m
kube-system  cilium-xv4t2                      100m (5%)     0 (0%)      100Mi (5%)       0 (0%)         84s
kube-system  kube-proxy-7x7bl                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         84s
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 100m (5%) 0 (0%)
memory 100Mi (5%) 0 (0%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Starting 62s kube-proxy
Normal RegisteredNode 84s node-controller Node node02 event: Registered Node node02 in Controller
Normal NodeHasSufficientMemory 84s (x5 over 86s) kubelet Node node02 status is now: NodeHasSufficientMemory
Normal NodeHasNoDiskPressure 84s (x5 over 86s) kubelet Node node02 status is now: NodeHasNoDiskPressure
Normal NodeHasSufficientPID 84s (x5 over 86s) kubelet Node node02 status is now: NodeHasSufficientPID
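For anyone hitting the same NotReady / "cni plugin not initialized" condition shown above, a few generic checks may help narrow things down. These are standard kubectl and system tools, nothing lab-specific; the cilium pod name is taken from the describe output above and will differ on other clusters:
kubectl -n kube-system get pods -o wide | grep cilium    # is the Cilium agent pod on node02 actually Running?
kubectl -n kube-system logs cilium-xv4t2                 # Cilium agent logs; pod name from the output above
sudo journalctl -u kubelet --no-pager | tail -50         # run on node02: kubelet's view of the CNI plugin
sudo dmesg | grep -i "dead loop"                         # kernel log for the message quoted in the title
-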
Hi @porrascarlos80,
Please provide details about your environment: the cloud provider or hypervisor used to provision the VMs, the guest OS release/version, VM CPU, RAM, and disk, how many network interfaces per VM (private/public, bridged/NAT), the private subnet range for the VMs, and whether all ingress traffic is allowed (from all sources, to all destination ports, all protocols).
This may help us to reproduce the behavior reported above.
Regards,
-Chris
-
The problem appears if I follow the instructions in the lab guide, Lab 3.1 step 23,
V 2023-09-05,
applying the cilium yaml.
As a workaround, I joined the master and the two nodes first,
then installed Cilium using this method: https://docs.cilium.io/en/stable/installation/k8s-install-kubeadm/. Now the nodes and the master are in Ready state with no errors, and all pods are up and running!
This is how my hosts file looks:
192.168.1.20 k8scp
192.168.1.21 node01
192.168.1.22 node02
127.0.0.1 localhost
127.0.1.1 master01
# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
I also used this guide for troubleshooting the NotReady state:
https://komodor.com/learn/how-to-fix-kubernetes-node-not-ready-error/
-
Hi @porrascarlos80,
Thank you for the details provided above. While they do not answer the earlier questions, they provide enough information about your cluster in general.
The installation method from docs.cilium.io installs Cilium differently than the course lab guide intends: it implements the Pod network and uses guest OS components differently, so some later exercises may behave differently as a result.
However, based on the hosts file entries provided, make sure that k8scp is an alias of the control plane node, and not the actual hostname of the control plane node.
The IP addresses of the node VMs are from the 192.168.1.0 subnet. This subnet overlaps with the Pod network implemented by the cilium network plugin (192.168.0.0/16). Such overlaps should be avoided. The nodes network (aaa.bbb.ccc.ddd), the Pods network (192.168.0.0/16), and the Services network (10.96.0.0/12) should be distinct. Because of this overlap, the installation method from the lab guide did not complete successfully on your cluster.
If you are using a local hypervisor, managing the DHCP server is pretty straight forward, and all inbound traffic can be easily allowed from the hypervisor's settings.
Regards,
-Chris
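As a rough illustration of the two points above (not taken from the lab guide): the hostname check below is a generic way to confirm that k8scp is only an alias and not the node's real hostname, and the 10.200.0.0/16 pod CIDR is an arbitrary example value chosen only because it overlaps neither the node subnet 192.168.1.0/24 nor the service subnet 10.96.0.0/12:
hostname                    # on the control plane: should print its real hostname (e.g., master01), not k8scp
grep k8scp /etc/hosts       # k8scp should appear only as an extra name for the control plane IP
kubeadm-config.yaml (example values only):
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.27.1
controlPlaneEndpoint: "k8scp:6443"
networking:
  podSubnet: 10.200.0.0/16        # must not overlap the node or service networks
  serviceSubnet: 10.96.0.0/12
The same example value would then go into the cluster-pool-ipv4-cidr setting of the cilium yaml discussed later in this thread.
-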
@chrispokorni said:
However, based on the hosts file entries provided, make sure that k8scp is an alias of the control plane node, and not the actual hostname of the control plane node. The IP addresses of the node VMs are from the 192.168.1.0 subnet. This subnet overlaps with the Pod network implemented by the cilium network plugin 192.168.0.0/16. Such overlaps should be avoided. The nodes network (aaa.bbb.ccc.ddd), the Pods network (192.168.0.0/16), and the Services network (10.96.0.0/12) should be distinct. Because of this overlap the installation method from the lab guide did not complete successfully on your cluster.
-Chris
I'd recommend updating the text in Lab Guide 3.x to explicitly state the above cilium yaml edits.
I ran into the same time-waster when I originally ran Section 3. Although it was just a matter of reading the logs, then reading the yaml and making the edits to ensure each subnet was different, it's something that could overwhelm brand-new readers.
Thanks.
-
This exact issue got me too.
k8scp must point to the control plane (first) node's IP address. In my case it was on eth0, which was 192.168.1.225.
This will clash with cilium's subnet, so you have to change cluster-pool-ipv4-cidr in the cilium yaml to "192.169.0.0/16" and podSubnet in kubeadm-config.yaml to 192.169.0.0/16.
I would have loved to have these notes in the lab, as I wasted a bit of time with this too.
-
Hi @mxsxs2,
Please keep in mind that 192.169.0.0/16 is not a private CIDR (the RFC 1918 private ranges are 10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16). The pod network should use a private range.
Regards,
-Chris
-
I changed values and parameters in the file kubeadm-config.yaml:
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.27.1
controlPlaneEndpoint: "k8scp:6443"
networking:
  podSubnet: 10.10.0.0/16      <-- changed
  serviceSubnet: 10.96.0.0/12  <-- added
I also followed this link, mainly the install with Helm:
https://docs.cilium.io/en/stable/installation/k8s-install-kubeadm/
Install Helm:
https://helm.sh/es/docs/intro/install/
Set up the Helm repository:
helm repo add cilium https://helm.cilium.io/
Deploy the Cilium release via Helm:
helm install cilium cilium/cilium --version 1.14.4 --namespace kube-system
With this it worked for me!
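For completeness, a quick way to verify the end state described in this last comment, using plain kubectl commands (nothing lab-specific):
kubectl get nodes -o wide                 # all nodes should report Ready
kubectl -n kube-system get pods -o wide   # cilium, cilium-operator, kube-proxy, and coredns pods should be Running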