
LAB 3.2 v09.05 dead loop on virtual device cilium-vxlan, fix it urgently!


Adding the first node works fine.
When the second node is joined with the exact same command, the master goes into a loop and the console shows

"dead loop on virtual device cilium-vxlan, fix it urgently!"

It may allow the second node to be added, but then it fails.

I lost several hours of my training day trying to make this work, without success.

How can I fix it?
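
For reference, that message is printed by the kernel, so it also shows up in the kernel log; a quick way to confirm it and to see which interface it refers to (generic commands, not part of the lab guide):

dmesg -T | grep -i 'dead loop'
ip -d link show type vxlan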

Comments

  • porrascarlos80

    Even when adding just one node, the message shows it was added, but its status stays NotReady, and after a while k8scp goes down.

    kubectl describe node node02
    Name: node02
    Roles:
    Labels: beta.kubernetes.io/arch=amd64
    beta.kubernetes.io/os=linux
    kubernetes.io/arch=amd64
    kubernetes.io/hostname=node02
    kubernetes.io/os=linux
    Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
    node.alpha.kubernetes.io/ttl: 0
    volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp: Tue, 12 Sep 2023 03:40:34 +0000
    Taints: node.kubernetes.io/not-ready:NoExecute
    node.cilium.io/agent-not-ready:NoSchedule
    node.kubernetes.io/not-ready:NoSchedule
    Unschedulable: false
    Lease:
    HolderIdentity: node02
    AcquireTime:
    RenewTime: Tue, 12 Sep 2023 03:41:55 +0000
    Conditions:
    Type             Status  LastHeartbeatTime                LastTransitionTime               Reason                      Message
    ----             ------  -----------------                ------------------               ------                      -------
    MemoryPressure   False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletHasSufficientMemory  kubelet has sufficient memory available
    DiskPressure     False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletHasNoDiskPressure    kubelet has no disk pressure
    PIDPressure      False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletHasSufficientPID     kubelet has sufficient PID available
    Ready            False   Tue, 12 Sep 2023 03:41:04 +0000  Tue, 12 Sep 2023 03:40:34 +0000  KubeletNotReady             container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
    Addresses:
    InternalIP: 192.168.1.22
    Hostname: node02
    Capacity:
    cpu: 2
    ephemeral-storage: 64188044Ki
    hugepages-1Gi: 0
    hugepages-2Mi: 0
    memory: 1964840Ki
    pods: 110
    Allocatable:
    cpu: 2
    ephemeral-storage: 59155701253
    hugepages-1Gi: 0
    hugepages-2Mi: 0
    memory: 1862440Ki
    pods: 110
    System Info:
    Machine ID: fd76f417257946eca2e98aab8cc4434f
    System UUID: 16f1ff4c-f455-fc43-a6da-13a2eb9f2b63
    Boot ID: c5425f60-2b97-4f41-9a7c-227d09add390
    Kernel Version: 5.4.0-150-generic
    OS Image: Ubuntu 20.04.6 LTS
    Operating System: linux
    Architecture: amd64
    Container Runtime Version: containerd://1.6.22
    Kubelet Version: v1.27.1
    Kube-Proxy Version: v1.27.1
    PodCIDR: 192.168.1.0/24
    PodCIDRs: 192.168.1.0/24
    Non-terminated Pods: (3 in total)
    Namespace    Name                              CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
    ---------    ----                              ------------  ----------  ---------------  -------------  ---
    kube-system  cilium-operator-788c7d7585-rfdt6  0 (0%)        0 (0%)      0 (0%)           0 (0%)         4h49m
    kube-system  cilium-xv4t2                      100m (5%)     0 (0%)      100Mi (5%)       0 (0%)         84s
    kube-system  kube-proxy-7x7bl                  0 (0%)        0 (0%)      0 (0%)           0 (0%)         84s
    Allocated resources:
    (Total limits may be over 100 percent, i.e., overcommitted.)
    Resource Requests Limits
    -------- -------- ------
    cpu 100m (5%) 0 (0%)
    memory 100Mi (5%) 0 (0%)
    ephemeral-storage 0 (0%) 0 (0%)
    hugepages-1Gi 0 (0%) 0 (0%)
    hugepages-2Mi 0 (0%) 0 (0%)
    Events:
    Type    Reason                   Age                From             Message
    ----    ------                   ----               ----             -------
    Normal  Starting                 62s                kube-proxy
    Normal  RegisteredNode           84s                node-controller  Node node02 event: Registered Node node02 in Controller
    Normal  NodeHasSufficientMemory  84s (x5 over 86s)  kubelet          Node node02 status is now: NodeHasSufficientMemory
    Normal  NodeHasNoDiskPressure    84s (x5 over 86s)  kubelet          Node node02 status is now: NodeHasNoDiskPressure
    Normal  NodeHasSufficientPID     84s (x5 over 86s)  kubelet          Node node02 status is now: NodeHasSufficientPID
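
    The Ready=False condition above points at the CNI ("cni plugin not initialized"), so the next thing I'd look at is the Cilium pods themselves; a minimal set of generic kubectl checks (my own, not from the lab guide):

    kubectl -n kube-system get pods -o wide | grep cilium
    kubectl -n kube-system logs -l k8s-app=cilium --tail=50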

  • chrispokorni
    chrispokorni Posts: 2,189
    edited September 2023

    Hi @porrascarlos80,

    Please provide details about your environment, such as the cloud provider or hypervisor used to provision the VMs, the guest OS release/version, VM CPU, VM RAM, VM disk, how many network interfaces per VM, private/public, network bridged/nat, private subnet range for the VMs, whether all ingress traffic is allowed (from all sources, to all port destinations, all protocols).

    This may help us to reproduce the behavior reported above.

    Regards,
    -Chris

  • porrascarlos80

    The problem appears if I follow the instructions in the lab guide (lab 3.1, step 23, V 2023-09-05) and apply the cilium yaml.
    As a workaround, I joined the master and the two nodes first,
    then did the installation using this method: https://docs.cilium.io/en/stable/installation/k8s-install-kubeadm/

    Now the nodes and master are in Ready state with no errors. All pods are up and running!
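
    A quick way to confirm that state, in case it helps anyone else (plain kubectl, nothing lab-specific):

    kubectl get nodes -o wide
    kubectl -n kube-system get pods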

    This is how my hosts file looks:

    192.168.1.20 k8scp
    192.168.1.21 node01
    192.168.1.22 node02
    127.0.0.1 localhost
    127.0.1.1 master01

    # The following lines are desirable for IPv6 capable hosts

    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters

    I also used this guide for troubleshooting the NotReady state:
    https://komodor.com/learn/how-to-fix-kubernetes-node-not-ready-error/

  • chrispokorni

    Hi @porrascarlos80,

    Thank you for the details provided above. While they do not answer the earlier questions, they provide enough information about your cluster in general.

    The installation method from docs.cilium.io installs cilium differently than the course lab guide intends. It implements the Pod network and uses guest OS components differently than the lab guide does, and some later exercises may behave differently as a result.

    However, based on the hosts file entries provided, make sure that k8scp is an alias of the control plane node, and not the actual hostname of the control plane node.

    The IP addresses of the node VMs are from the 192.168.1.0 subnet. This subnet overlaps with the Pod network implemented by the cilium network plugin 192.168.0.0/16. Such overlaps should be avoided. The nodes network (aaa.bbb.ccc.ddd), the Pods network (192.168.0.0/16), and the Services network (10.96.0.0/12) should be distinct. Because of this overlap the installation method from the lab guide did not complete successfully on your cluster.

    If you are using a local hypervisor, managing the DHCP server is pretty straightforward, and all inbound traffic can easily be allowed from the hypervisor's settings.

    Regards,
    -Chris

  • malloc_failed

    @chrispokorni said:
    However, based on the hosts file entries provided, make sure that k8scp is an alias of the control plane node, and not the actual hostname of the control plane node.

    The IP addresses of the node VMs are from the 192.168.1.0 subnet. This subnet overlaps with the Pod network implemented by the cilium network plugin 192.168.0.0/16. Such overlaps should be avoided. The nodes network (aaa.bbb.ccc.ddd), the Pods network (192.168.0.0/16), and the Services network (10.96.0.0/12) should be distinct. Because of this overlap the installation method from the lab guide did not complete successfully on your cluster.
    -Chris

    I'd recommend updating the text in Lab Guide 3.x to explicitly state the above cilium yaml edits.

    I ran into the same time-waster when I originally ran section 3. Although it was just a matter of reading the logs, then reading the yaml and making the edits to ensure each subnet was different, it is something that could overwhelm brand-new readers.
    Thanks.

  • mxsxs2

    This exact issue got me too.

    k8scp must point to the IP address of the control plane (first) node. In my case it was on eth0, which was 192.168.1.225.

    This will clash with cilium's subnet, so you have to change cluster-pool-ipv4-cidr in the cilium yaml to "192.169.0.0/16" and podSubnet in kubeadm-config.yaml to 192.169.0.0/16.
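
    Roughly, the two edits look like this (a sketch only; I am using 10.10.0.0/16 here because, as the next comment points out, 192.169.0.0/16 is not a private range, and any private range that does not overlap the node network should do):

    # in the cilium yaml (typically the cilium-config ConfigMap):
    cluster-pool-ipv4-cidr: "10.10.0.0/16"

    # in kubeadm-config.yaml:
    networking:
      podSubnet: 10.10.0.0/16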

    I would have loved it if these notes had been in the lab, as I wasted a bit of time on this too.

  • chrispokorni

    Hi @mxsxs2,

    Please keep in mind that 192.169.0.0/16 is not a private CIDR. The pod network should be private.

    Regards,
    -Chris

  • jeruso76
    jeruso76 Posts: 1
    edited November 2023

    I changed values and parameters in the file kubeadm-config.yaml:

    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    kubernetesVersion: 1.27.1
    controlPlaneEndpoint: "k8scp:6443"
    networking:
      podSubnet: 10.10.0.0/16      # <-- changed
      serviceSubnet: 10.96.0.0/12  # <-- added

    Then follow this link, mainly the install with Helm:
    https://docs.cilium.io/en/stable/installation/k8s-install-kubeadm/

    Install Helm
    https://helm.sh/es/docs/intro/install/

    Set up the Helm repository:
    helm repo add cilium https://helm.cilium.io/

    Deploy Cilium release via Helm:
    helm install cilium cilium/cilium --version 1.14.4 --namespace kube-system

    With this it worked for me!
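
    If it helps, a couple of quick checks after the Helm install (my own addition; k8s-app=cilium is the label the upstream chart applies to the agent pods):

    kubectl -n kube-system get pods -l k8s-app=cilium
    kubectl get nodes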

  • marco.ferretti

    I don't get why it is so hard to provide a Vagrant image with the correct networking set up, or at least a script to handle the VM fiddling. It took me days (I am not a sysadmin) to fix the lab setup.

  • ryangauthier

    I paid good money for this course and honestly would like a refund. I have spent a few hours working in the first few chapter sections and a couple of days sorting through the bugs and reading forum posts involving all sorts of issues with a supposed step by step setup of a k8s cluster. I can't even finish chapter 3. This is painstaking and frustrating.

  • fcioanca
    fcioanca Posts: 1,924

    Hi @ryangauthier

    You can use this forum to ask course-related questions, especially when you need assistance with lab exercises. The forums are moderated by course instructors and they will work with you to understand your lab environment setup and what may cause issues, and then provide guidance on how to move forward.

    Regards,
    Flavia
    The Linux Foundation Training Team

  • denismcx
    denismcx Posts: 1
    edited April 28

    Got the same issue, as my home lab DHCP is configured to hand out 192.168.1.0/24.
    To fix it and set the 10.10.0.0/16 CIDR for cilium, simply run this command and continue the guide:
    sed -i 's#cluster-pool-ipv4-cidr: "192.168.0.0/16"#cluster-pool-ipv4-cidr: "10.10.0.0/16"#g' $(find $HOME -name cilium-cni.yaml)
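
    To double-check that the edit took effect before applying the file (my own addition, assuming the same cilium-cni.yaml file name as above):

    grep cluster-pool-ipv4-cidr $(find $HOME -name cilium-cni.yaml)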

    It would indeed be nice if the course called out the CIDR used by Cilium in the config file and gave the necessary instructions for changing it.

    Best
    Denis
