Cilium pod is isolating the cp node.

andriesadelina · August 2024

Hello,

After applying cilium-cni.yaml, the cp node becomes isolated: no connections are allowed on any port, and only the console still works.

It seems the Cilium deployment has not finished; one of the two pods is not running.

From what I can see, there is an issue between the cilium-cni.yaml file and AppArmor.

ade@cp:~$ kubectl get nodes -o wide
NAME      STATUS     ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
cp        Ready      control-plane   28m     v1.29.1   192.168.0.201   <none>        Ubuntu 20.04.6 LTS   5.4.0-190-generic   containerd://1.7.19
worker1   NotReady   <none>          7m51s   v1.29.1   192.168.0.27    <none>        Ubuntu 20.04.6 LTS   5.4.0-190-generic   containerd://1.7.19
ade@cp:~$ ping 192.168.0.27
PING 192.168.0.27 (192.168.0.27) 56(84) bytes of data.
64 bytes from 192.168.0.27: icmp_seq=1 ttl=64 time=0.284 ms
64 bytes from 192.168.0.27: icmp_seq=2 ttl=64 time=0.411 ms
^C
--- 192.168.0.27 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.284/0.347/0.411/0.063 ms

ade@cp:~$ kubectl apply -f /home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml
serviceaccount/cilium unchanged
serviceaccount/cilium-operator unchanged
secret/cilium-ca unchanged
secret/hubble-server-certs unchanged
configmap/cilium-config unchanged
clusterrole.rbac.authorization.k8s.io/cilium unchanged
clusterrole.rbac.authorization.k8s.io/cilium-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/cilium unchanged
clusterrolebinding.rbac.authorization.k8s.io/cilium-operator unchanged
role.rbac.authorization.k8s.io/cilium-config-agent unchanged
rolebinding.rbac.authorization.k8s.io/cilium-config-agent unchanged
service/hubble-peer unchanged
daemonset.apps/cilium created
deployment.apps/cilium-operator created
ade@cp:~$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS              RESTARTS      AGE
kube-system   cilium-7d2wv                       0/1     Init:0/6            0             7s
kube-system   cilium-f7s4q                       0/1     Init:4/6            0             7s
kube-system   cilium-operator-56bdb99ff6-vqntk   0/1     ContainerCreating   0             7s
kube-system   cilium-operator-56bdb99ff6-zm658   1/1     Running             0             7s
kube-system   coredns-76f75df574-7pdsg           0/1     Unknown             0             30m
kube-system   coredns-76f75df574-s5jnz           0/1     Unknown             0             30m
kube-system   etcd-cp                            1/1     Running             1 (13m ago)   30m
kube-system   kube-apiserver-cp                  1/1     Running             1 (13m ago)   30m
kube-system   kube-controller-manager-cp         1/1     Running             1 (13m ago)   30m
kube-system   kube-proxy-6thx7                   1/1     Running             1 (13m ago)   30m
kube-system   kube-proxy-7qgl9                   1/1     Running             0             10m
kube-system   kube-scheduler-cp                  1/1     Running             1 (13m ago)   30m

andriesadelina · August 2024

I am using Ubuntu 20.04 installed as a VM on Proxmox.
I installed Cilium as a normal (non-root) user.

chrispokorni · August 2024

Hi @andriesadelina,

I suspect your VM IP addresses overlap the Cilium CNI pod network 192.168.0.0/16 that is set in the cilium-cni.yaml manifest. This can be changed at line 198 to a different subnet, one that is distinct both from your VM subnet and from the default Kubernetes Service subnet 10.96.0.0/12.

You could try 10.200.0.0/16 for the pod network to avoid any future issues.
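
The overlap described above can be checked with Python's standard ipaddress module. This is only a sketch: the node IPs and CIDRs are the values quoted in this thread, and the helper name conflicts is made up for illustration.

```python
import ipaddress


def conflicts(pod_cidr, node_ips, service_cidr="10.96.0.0/12"):
    """Return a list of human-readable conflicts for a candidate pod CIDR.

    Hypothetical helper: checks that no node IP falls inside the pod
    network and that the pod network does not overlap the Service subnet.
    """
    pod_net = ipaddress.ip_network(pod_cidr)
    svc_net = ipaddress.ip_network(service_cidr)
    problems = []
    for ip in node_ips:
        if ipaddress.ip_address(ip) in pod_net:
            problems.append(f"node {ip} is inside pod network {pod_net}")
    if pod_net.overlaps(svc_net):
        problems.append(f"pod network {pod_net} overlaps Service network {svc_net}")
    return problems


# Node IPs taken from the "kubectl get nodes -o wide" output in this thread.
nodes = ["192.168.0.201", "192.168.0.27"]

print(conflicts("192.168.0.0/16", nodes))  # default pod network in the manifest
print(conflicts("10.200.0.0/16", nodes))   # suggested replacement
```

With these inputs, the first call reports both node addresses as conflicting, and the second returns an empty list.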

First, remove cilium:
kubectl delete -f /home/student/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

Then edit the cilium-cni.yaml manifest at line 198 with a desired pod network, then re-deploy the cilium-cni.yaml manifest.

Regards,
-Chris

andriesadelina · August 2024

Hello @chrispokorni ,

I have modified the following, but the issue persists:

In file "cilium-cni.yaml"
198 cluster-pool-ipv4-cidr: "10.200.0.0/16"
In file "/root/kubeadm-config.yaml"
networking:
  podSubnet: 10.10.0.0/16
  serviceSubnet: 10.96.0.0/12

After that, I ran the command below:
ade@cp:~$ kubectl delete -f /home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": serviceaccounts "cilium" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": serviceaccounts "cilium-operator" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": secrets "cilium-ca" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": secrets "hubble-server-certs" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": configmaps "cilium-config" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterroles.rbac.authorization.k8s.io "cilium" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterroles.rbac.authorization.k8s.io "cilium-operator" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterrolebindings.rbac.authorization.k8s.io "cilium" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterrolebindings.rbac.authorization.k8s.io "cilium-operator" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": roles.rbac.authorization.k8s.io "cilium-config-agent" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": rolebindings.rbac.authorization.k8s.io "cilium-config-agent" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": services "hubble-peer" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": daemonsets.apps "cilium" not found
Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": deployments.apps "cilium-operator" not found

ade@cp:~$ kubectl get all -A -o wide
NAMESPACE     NAME                             READY   STATUS    RESTARTS       AGE    IP              NODE      NOMINATED NODE   READINESS GATES
kube-system   pod/etcd-cp                      1/1     Running   12 (10m ago)   2d2h   192.168.0.201   cp        <none>           <none>
kube-system   pod/kube-apiserver-cp            1/1     Running   12 (10m ago)   2d2h   192.168.0.201   cp        <none>           <none>
kube-system   pod/kube-controller-manager-cp   1/1     Running   12 (10m ago)   2d2h   192.168.0.201   cp        <none>           <none>
kube-system   pod/kube-proxy-6thx7             1/1     Running   12 (10m ago)   2d2h   192.168.0.201   cp        <none>           <none>
kube-system   pod/kube-proxy-7qgl9             1/1     Running   1 (61m ago)    2d2h   192.168.0.27    worker1   <none>           <none>
kube-system   pod/kube-scheduler-cp            1/1     Running   12 (10m ago)   45m    192.168.0.201   cp        <none>           <none>

NAMESPACE     NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE    SELECTOR
default       service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP                  2d2h   <none>
kube-system   service/kube-dns     ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   2d2h   k8s-app=kube-dns

NAMESPACE     NAME                        DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR            AGE    CONTAINERS   IMAGES                               SELECTOR
kube-system   daemonset.apps/kube-proxy   2         2         1       2            1           kubernetes.io/os=linux   2d2h   kube-proxy   registry.k8s.io/kube-proxy:v1.29.1   k8s-app=kube-proxy

Then I applied cilium-cni.yaml again:
ade@cp:~$ kubectl apply -f /home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml
serviceaccount/cilium created
serviceaccount/cilium-operator created
secret/cilium-ca created
secret/hubble-server-certs created
configmap/cilium-config created
clusterrole.rbac.authorization.k8s.io/cilium created
clusterrole.rbac.authorization.k8s.io/cilium-operator created
clusterrolebinding.rbac.authorization.k8s.io/cilium created
clusterrolebinding.rbac.authorization.k8s.io/cilium-operator created
role.rbac.authorization.k8s.io/cilium-config-agent created
rolebinding.rbac.authorization.k8s.io/cilium-config-agent created
service/hubble-peer created
daemonset.apps/cilium created
deployment.apps/cilium-operator created

The two pods assigned to the worker1 node are in "Pending" status:

On worker1, I am receiving the error below:

Could you please tell me what could be the issue?

Thank you very much for your time!

chrispokorni · August 2024

Hi @andriesadelina,

There seems to be a mix of data in your source files. You seem to be following LFS258, while the cluster resources are being applied from the LFS458 solutions.

Please follow the training material (lectures, lab guide and lab resources/solutions) released for the course you enrolled in - LFS258.

Please ensure that the hypervisor has a single bridged network adapter enabled per VM. Keeping the VMs' IP addresses on the 192.168.0.0/16 network will avoid further confusion. Also, ensure the hypervisor does not block any ingress (inbound) traffic to the VMs (meaning that ALL protocols should be allowed to ALL destination ports from ALL sources).

In an attempt to salvage your current cluster, please complete the following:

1 - remove the Cilium installation (assuming you downloaded the correct LFS258 SOLUTIONS tarball, and replace "student" in the path with your user ID):
student@cp:~$ kubectl delete -f /home/student/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

2 - delete the worker1 node from the cluster (run command on control plane node, as regular non-root user)
student@cp:~$ kubectl delete node worker1

3 - reset the worker1 node (run the command as root on the worker1 node)
root@worker1:~# kubeadm reset
confirm the reset when prompted

4 - reset the cp node (run the command as root on the cp node)
root@cp:~# kubeadm reset
confirm the reset when prompted

5 - edit the /etc/hosts file on the control plane node with the required control plane alias k8scp assigned to the private IP of the control plane node (not the control plane node hostname) [lab guide exercise 3.1 step 19]

...
192.168.x.x k8scp
...

6 - correct the /root/kubeadm-config.yaml manifest on the control plane node as such (control plane alias and desired pod subnet CIDR for the CNI plugin) [lab guide exercise 3.1 step 20]

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: 1.29.1
controlPlaneEndpoint: "k8scp:6443"
networking:
  podSubnet: 10.200.0.0/16

7 - initialize the control plane (run the command as root on the cp node) [lab guide exercise 3.1 step 21]
root@cp:~# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out

8 - reset the cluster admin credentials for the non-root user [lab guide exercise 3.1 step 22]

student@cp:~$ rm $HOME/.kube/config
student@cp:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
student@cp:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config

9 - update the cilium-cni.yaml manifest on line 198 with the desired pod CIDR, to match the one supplied in the kubeadm-config.yaml manifest earlier

...
cluster-pool-ipv4-cidr: "10.200.0.0/16"
...

10 - deploy cilium again [lab guide exercise 3.1 step 23]
student@cp:~$ kubectl apply -f /path/to/cilium-cni.yaml

11 - extract and copy the join command from the /root/kubeadm-init.out file generated by the init command on the control plane node (or just run the following command to generate it):
student@cp:~$ sudo kubeadm token create --print-join-command

12 - edit the /etc/hosts file on the worker1 node with the required control plane alias k8scp assigned to the private IP of the control plane node (not the control plane node hostname, not the worker1 node hostname, not worker1 node private IP address) [lab guide exercise 3.2 step 12]

13 - run the join command on the worker1 node (run the command as root on the worker1 node) [lab guide exercise 3.2 step 13]
root@worker1:~# kubeadm join ....................
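
Steps 5, 6, 9 and 12 depend on three values agreeing: the k8scp alias in /etc/hosts, the podSubnet in kubeadm-config.yaml, and cluster-pool-ipv4-cidr in cilium-cni.yaml. A rough pre-flight sketch in Python follows. It uses plain regular expressions rather than a YAML parser, the function names are hypothetical, and the sample strings are abbreviated fragments based on values quoted in this thread (192.168.0.201 is the cp node's INTERNAL-IP from the earlier output), not the full files.

```python
import ipaddress
import re


def find_cidr(text, key):
    """Return the CIDR assigned to `key` in YAML-like text, or None.

    Regex-based on purpose, so it only handles simple "key: value" lines.
    """
    pattern = r'^\s*' + re.escape(key) + r':\s*"?([0-9./]+)"?\s*$'
    match = re.search(pattern, text, re.MULTILINE)
    return ipaddress.ip_network(match.group(1)) if match else None


def cidrs_match(kubeadm_text, cilium_text):
    """True when podSubnet and cluster-pool-ipv4-cidr are both set and equal."""
    pod_subnet = find_cidr(kubeadm_text, "podSubnet")
    cilium_pool = find_cidr(cilium_text, "cluster-pool-ipv4-cidr")
    return pod_subnet is not None and pod_subnet == cilium_pool


def alias_ip(hosts_text, alias="k8scp"):
    """Return the IP that /etc/hosts-style text maps `alias` to, or None."""
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split columns
        if len(fields) >= 2 and alias in fields[1:]:
            return fields[0]
    return None


# Abbreviated sample fragments (assumptions for illustration).
kubeadm_earlier = "networking:\n  podSubnet: 10.10.0.0/16\n  serviceSubnet: 10.96.0.0/12\n"
kubeadm_fixed = "networking:\n  podSubnet: 10.200.0.0/16\n"
cilium_fragment = '  cluster-pool-ipv4-cidr: "10.200.0.0/16"\n'
hosts_fragment = "127.0.0.1 localhost\n192.168.0.201 k8scp\n"

print(cidrs_match(kubeadm_earlier, cilium_fragment))
print(cidrs_match(kubeadm_fixed, cilium_fragment))
print(alias_ip(hosts_fragment) == "192.168.0.201")
```

The first check fails for the earlier attempt in this thread, where kubeadm-config.yaml used 10.10.0.0/16 while the Cilium manifest used 10.200.0.0/16; the other two pass for the values given in steps 6 and 9.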

If completing these steps does not produce a working cluster, please decommission the VMs, and start over with two new VMs, provisioned to match the networking requirements described above.

Regards,
-Chris

andriesadelina · August 2024

Hello @chrispokorni ,

After I performed all of the steps above, the issue was fixed.

Thank you so much for your help!
