Cilium pod is isolating the cp node.

Hello,

After applying cilium-cni.yaml, the cp node becomes isolated: no connections are allowed on any port, and only the console still works.

It seems that the Cilium deployment did not finish; one of the two Cilium pods is not running.

From what I can see, there is an issue between the Cilium manifest and AppArmor.

ade@cp:~$ kubectl get nodes -o wide
NAME      STATUS     ROLES           AGE     VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
cp        Ready      control-plane   28m     v1.29.1   192.168.0.201   <none>        Ubuntu 20.04.6 LTS   5.4.0-190-generic   containerd://1.7.19
worker1   NotReady   <none>          7m51s   v1.29.1   192.168.0.27    <none>        Ubuntu 20.04.6 LTS   5.4.0-190-generic   containerd://1.7.19
ade@cp:~$
ade@cp:~$ ping 192.168.0.27
PING 192.168.0.27 (192.168.0.27) 56(84) bytes of data.
64 bytes from 192.168.0.27: icmp_seq=1 ttl=64 time=0.284 ms
64 bytes from 192.168.0.27: icmp_seq=2 ttl=64 time=0.411 ms
^C
--- 192.168.0.27 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1019ms
rtt min/avg/max/mdev = 0.284/0.347/0.411/0.063 ms

ade@cp:~$ kubectl apply -f /home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml
serviceaccount/cilium unchanged
serviceaccount/cilium-operator unchanged
secret/cilium-ca unchanged
secret/hubble-server-certs unchanged
configmap/cilium-config unchanged
clusterrole.rbac.authorization.k8s.io/cilium unchanged
clusterrole.rbac.authorization.k8s.io/cilium-operator unchanged
clusterrolebinding.rbac.authorization.k8s.io/cilium unchanged
clusterrolebinding.rbac.authorization.k8s.io/cilium-operator unchanged
role.rbac.authorization.k8s.io/cilium-config-agent unchanged
rolebinding.rbac.authorization.k8s.io/cilium-config-agent unchanged
service/hubble-peer unchanged
daemonset.apps/cilium created
deployment.apps/cilium-operator created
ade@cp:~$
ade@cp:~$ kubectl get pods -A
NAMESPACE     NAME                               READY   STATUS              RESTARTS      AGE
kube-system   cilium-7d2wv                       0/1     Init:0/6            0             7s
kube-system   cilium-f7s4q                       0/1     Init:4/6            0             7s
kube-system   cilium-operator-56bdb99ff6-vqntk   0/1     ContainerCreating   0             7s
kube-system   cilium-operator-56bdb99ff6-zm658   1/1     Running             0             7s
kube-system   coredns-76f75df574-7pdsg           0/1     Unknown             0             30m
kube-system   coredns-76f75df574-s5jnz           0/1     Unknown             0             30m
kube-system   etcd-cp                            1/1     Running             1 (13m ago)   30m
kube-system   kube-apiserver-cp                  1/1     Running             1 (13m ago)   30m
kube-system   kube-controller-manager-cp         1/1     Running             1 (13m ago)   30m
kube-system   kube-proxy-6thx7                   1/1     Running             1 (13m ago)   30m
kube-system   kube-proxy-7qgl9                   1/1     Running             0             10m
kube-system   kube-scheduler-cp                  1/1     Running             1 (13m ago)   30m
ade@cp:~$



Answers

  • I am using Ubuntu 20.04, installed as a VM on Proxmox.
    I installed Cilium as a normal user (not root).

  • chrispokorni
    chrispokorni Posts: 2,270

    Hi @andriesadelina,

    I suspect your VM IP addresses overlap the Cilium CNI pod network 192.168.0.0/16 that is set in the cilium-cni.yaml manifest. This can be changed at line 198 to a different subnet, one that is distinct from both your VM subnet and the default Kubernetes Service subnet 10.96.0.0/12.

    You could try 10.200.0.0/16 for the pod network to avoid any future issues.

    First, remove cilium:
    kubectl delete -f /home/student/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

    Then edit the cilium-cni.yaml manifest at line 198 with a desired pod network, then re-deploy the cilium-cni.yaml manifest.

    Regards,
    -Chris
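
The overlap suspected in the reply above can be checked offline before editing any manifest. The following is a minimal sketch using Python's standard `ipaddress` module. The helper name `find_overlaps` is hypothetical, and the /24 mask for the VM subnet is an assumption (the thread only shows the node addresses 192.168.0.201 and 192.168.0.27).

```python
import ipaddress


def find_overlaps(networks):
    """Return sorted (name, name) pairs whose CIDR ranges overlap."""
    parsed = {
        name: ipaddress.ip_network(cidr, strict=False)
        for name, cidr in networks.items()
    }
    names = sorted(parsed)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if parsed[a].overlaps(parsed[b])
    ]


# CIDRs taken from this thread; the /24 VM mask is an assumption.
before = {"vm": "192.168.0.0/24", "pod": "192.168.0.0/16", "service": "10.96.0.0/12"}
after = {"vm": "192.168.0.0/24", "pod": "10.200.0.0/16", "service": "10.96.0.0/12"}

print("original manifest:", find_overlaps(before))
print("with 10.200.0.0/16:", find_overlaps(after))
```

With the original manifest the pod and VM ranges collide, while the suggested 10.200.0.0/16 pod network stays clear of both the VM subnet and the Service subnet.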

  • andriesadelina
    andriesadelina Posts: 5
    edited August 7

    Hello @chrispokorni ,

    I have modified the following, but the issue persists:

    • In file "cilium-cni.yaml"
      198 cluster-pool-ipv4-cidr: "10.200.0.0/16"

    • In file "/root/kubeadm-config.yaml"
      networking:
        podSubnet: 10.10.0.0/16
        serviceSubnet: 10.96.0.0/12

    After that, I ran the command below:
    ade@cp:~$ kubectl delete -f /home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": serviceaccounts "cilium" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": serviceaccounts "cilium-operator" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": secrets "cilium-ca" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": secrets "hubble-server-certs" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": configmaps "cilium-config" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterroles.rbac.authorization.k8s.io "cilium" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterroles.rbac.authorization.k8s.io "cilium-operator" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterrolebindings.rbac.authorization.k8s.io "cilium" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": clusterrolebindings.rbac.authorization.k8s.io "cilium-operator" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": roles.rbac.authorization.k8s.io "cilium-config-agent" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": rolebindings.rbac.authorization.k8s.io "cilium-config-agent" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": services "hubble-peer" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": daemonsets.apps "cilium" not found
    Error from server (NotFound): error when deleting "/home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml": deployments.apps "cilium-operator" not found

    `ade@cp:~$ kubectl get all -A -o wide
    NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    kube-system pod/etcd-cp 1/1 Running 12 (10m ago) 2d2h 192.168.0.201 cp
    kube-system pod/kube-apiserver-cp 1/1 Running 12 (10m ago) 2d2h 192.168.0.201 cp
    kube-system pod/kube-controller-manager-cp 1/1 Running 12 (10m ago) 2d2h 192.168.0.201 cp
    kube-system pod/kube-proxy-6thx7 1/1 Running 12 (10m ago) 2d2h 192.168.0.201 cp
    kube-system pod/kube-proxy-7qgl9 1/1 Running 1 (61m ago) 2d2h 192.168.0.27 worker1
    kube-system pod/kube-scheduler-cp 1/1 Running 12 (10m ago) 45m 192.168.0.201 cp

    NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
    default service/kubernetes ClusterIP 10.96.0.1 443/TCP 2d2h
    kube-system service/kube-dns ClusterIP 10.96.0.10 53/UDP,53/TCP,9153/TCP 2d2h k8s-app=kube-dns

    NAMESPACE NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE CONTAINERS IMAGES SELECTOR
    kube-system daemonset.apps/kube-proxy 2 2 1 2 1 kubernetes.io/os=linux 2d2h kube-proxy registry.k8s.io/kube-proxy:v1.29.1 k8s-app=kube-proxy`

    Then I applied cilium-cni.yaml again:
    kubectl apply -f /home/ade/LFS458/SOLUTIONS/s_03/cilium-cni.yaml
    serviceaccount/cilium created
    serviceaccount/cilium-operator created
    secret/cilium-ca created
    secret/hubble-server-certs created
    configmap/cilium-config created
    clusterrole.rbac.authorization.k8s.io/cilium created
    clusterrole.rbac.authorization.k8s.io/cilium-operator created
    clusterrolebinding.rbac.authorization.k8s.io/cilium created
    clusterrolebinding.rbac.authorization.k8s.io/cilium-operator created
    role.rbac.authorization.k8s.io/cilium-config-agent created
    rolebinding.rbac.authorization.k8s.io/cilium-config-agent created
    service/hubble-peer created
    daemonset.apps/cilium created
    deployment.apps/cilium-operator created

    The two pods assigned to the worker1 node are in "Pending" status:

    On worker1, I am receiving the error below:

    Could you please tell me what could be the issue?

    Thank you very much for your time!
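
Note that the two values quoted in the post above differ: podSubnet is 10.10.0.0/16 in kubeadm-config.yaml, while cluster-pool-ipv4-cidr is 10.200.0.0/16 in cilium-cni.yaml. The following is a minimal sketch of a consistency check, assuming the two settings appear as simple `key: value` lines. The helper names `extract_cidr` and `pod_cidrs_match` are hypothetical, and the regular expression is a simplification rather than a full YAML parser.

```python
import ipaddress
import re


def extract_cidr(text, key):
    """Return the network that follows `key:` in YAML-like text, or None."""
    match = re.search(rf'{re.escape(key)}:\s*"?([0-9./]+)"?', text)
    return ipaddress.ip_network(match.group(1)) if match else None


def pod_cidrs_match(kubeadm_text, cilium_text):
    """True only when both files declare the same pod CIDR."""
    kubeadm_pods = extract_cidr(kubeadm_text, "podSubnet")
    cilium_pods = extract_cidr(cilium_text, "cluster-pool-ipv4-cidr")
    return kubeadm_pods is not None and kubeadm_pods == cilium_pods


# Snippets copied from the post above.
kubeadm_snippet = """
networking:
  podSubnet: 10.10.0.0/16
  serviceSubnet: 10.96.0.0/12
"""
cilium_snippet = 'cluster-pool-ipv4-cidr: "10.200.0.0/16"'

print("pod CIDRs match:", pod_cidrs_match(kubeadm_snippet, cilium_snippet))
```

In practice the two snippets would be read from the real files. Whether this mismatch alone explains the Pending pods is not established by the thread, so treat the check as one thing to rule out.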

  • chrispokorni
    chrispokorni Posts: 2,270

    Hi @andriesadelina,

    There seems to be a mix of data in your source files. You seem to be following LFS258, while the cluster resources are being applied from LFS458.

    Please follow the training material (lectures, lab guide and lab resources/solutions) released for the course you enrolled in - LFS258.

    Please ensure that the hypervisor has a single bridged network adapter enabled per VM. Keeping the VMs' IP addresses on the 192.168.0.0/16 network will avoid further confusion. Also, ensure the hypervisor does not block any ingress (inbound) traffic to the VMs (meaning that ALL protocols should be allowed to ALL port destinations from ALL sources).

    In an attempt to salvage your current cluster, please complete the following:

    1 - remove the Cilium installation (assuming you downloaded the correct LFS258 SOLUTIONS tarball, and replace "student" in the path with your user ID):
    student@cp:~$ kubectl delete -f /home/student/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

    2 - delete the worker1 node from the cluster (run command on control plane node, as regular non-root user)
    student@cp:~$ kubectl delete node worker1

    3 - reset the worker1 node (run the command as root on the worker1 node)
    root@worker1:~# kubeadm reset
    confirm the reset when prompted

    4 - reset the cp node (run the command as root on the cp node)
    root@cp:~# kubeadm reset
    confirm the reset when prompted

    5 - edit the /etc/hosts file on the control plane node with the required control plane alias k8scp assigned to the private IP of the control plane node (not the control plane node hostname) [lab guide exercise 3.1 step 19]

    ...
    192.168.x.x k8scp
    ...
    

    6 - correct the /root/kubeadm-config.yaml manifest on the control plane node as such (control plane alias and desired pod subnet CIDR for the CNI plugin) [lab guide exercise 3.1 step 20]

    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    kubernetesVersion: 1.29.1
    controlPlaneEndpoint: "k8scp:6443"
    networking:
      podSubnet: 10.200.0.0/16
    

    7 - initialize the control plane (run the command as root on the cp node) [lab guide exercise 3.1 step 21]
    root@cp:~# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out

    8 - reset the cluster admin credentials for the non-root user [lab guide exercise 3.1 step 22]

    student@cp:~$ rm $HOME/.kube/config
    student@cp:~$ sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    student@cp:~$ sudo chown $(id -u):$(id -g) $HOME/.kube/config
    

    9 - update the cilium-cni.yaml manifest on line 198 with the desired pod CIDR, to match the one supplied in the kubeadm-config.yaml manifest earlier

    ...
    cluster-pool-ipv4-cidr: "10.200.0.0/16"
    ...
    

    10 - deploy cilium again [lab guide exercise 3.1 step 23]
    student@cp:~$ kubectl apply -f /path/to/cilium-cni.yaml

    11 - extract and copy the join command from the /root/kubeadm-init.out file generated by the init command on the control plane node (or run the following command on the control plane node to generate a new one)
    student@cp:~$ sudo kubeadm token create --print-join-command

    12 - edit the /etc/hosts file on the worker1 node with the required control plane alias k8scp assigned to the private IP of the control plane node (not the control plane node hostname, not the worker1 node hostname, not worker1 node private IP address) [lab guide exercise 3.2 step 12]

    13 - run the join command on the worker1 node (run the command as root on the worker1 node) [lab guide exercise 3.2 step 13]
    root@worker1:~# kubeadm join ....................

    If completing these steps does not produce a working cluster, please decommission the VMs, and start over with two new VMs, provisioned to match the networking requirements described above.

    Regards,
    -Chris
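
Steps 5 and 12 above both depend on the k8scp alias pointing at the control plane node's private IP on every node. The following is a minimal sketch for checking that, assuming the usual /etc/hosts layout of an IP followed by one or more names. The helper name `resolve_alias` is hypothetical, and the sample entry uses the cp address 192.168.0.201 shown earlier in this thread.

```python
def resolve_alias(hosts_text, alias):
    """Return the IP that `alias` maps to in /etc/hosts-style text, or None."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into IP and host names.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and alias in fields[1:]:
            return fields[0]
    return None


# Sample content; on a real node, read /etc/hosts instead.
sample_hosts = """
127.0.0.1 localhost
192.168.0.201 k8scp  # control plane private IP from this thread
"""

expected_ip = "192.168.0.201"
actual_ip = resolve_alias(sample_hosts, "k8scp")
print("k8scp ->", actual_ip, "(ok)" if actual_ip == expected_ip else "(mismatch)")
```

Running the same check on worker1 should report the same control plane address, not the worker's own IP.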

  • Hello @chrispokorni ,

    After I performed all of the steps above, the issue was fixed.

    Thank you so much for your help!
