
Lab 3.3. - 3.4. calico/node is not ready

I have been following the LFS258 lab exercises from 3.1 through 3.3.
Installing the controller/master and worker nodes and growing the cluster were successful, and the worker node reported its status as Ready.
I used Docker as the container engine:
docker version - 19.03.6
kube client & master version - v1.18.1

In the 'ip a' output on the controller/master node I see that a 'tunl0' interface has been created.

In chapter/exercise 3.4, when I execute 'kubectl create deployment nginx --image=nginx', the pod gets stuck in 'ContainerCreating' status.
The output of 'kubectl -n default describe pod nginx-f89759699-frqxj' shows:

"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "64b6688978e818556f498fcd33b6d17f50a2f2da8ba8740a514801734adc810b" network for pod "nginx-f89759699-frqxj": networkPlugin cni failed to set up pod "nginx-f89759699-frqxj_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"

When I check the calico pod on the worker node I see that it is crash-looping. 'kubectl -n kube-system describe pod calico-node-jcdz5' shows:

"Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1"
"Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused"
"Container calico-node failed liveness probe, will be restarted"
"Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory"

I have followed the instructions step by step, without any exceptions, so far.
What am I doing wrong?

Comments

  • chrispokorni Posts: 1,267
    edited October 2020

    Hi @zrks,

    Calico is highly dependent on the node-to-node networking of your infrastructure. What is your infrastructure and how is it configured? Are you on a local hypervisor, cloud VMs? What type of firewalls do you have at infra level and/or at VM OS level?

    If it is just a glitch, you may try to delete the misbehaving pod and allow the controller to re-create it, or kubectl delete -f calico.yaml and then kubectl apply -f calico.yaml to re-deploy all calico related artifacts. If this does not resolve your issue, then you need to investigate the inter-node networking configuration.
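
As a starting point for investigating inter-node networking, here is a minimal Python sketch that tests whether TCP ports on the other node are reachable. The port list is an assumption on my part: 6443 is the usual kube-apiserver port, 179 is the BGP port used by Calico's BIRD, and 10250 is the kubelet API. The function names are made up for illustration. Note that it cannot test IP-in-IP traffic (the 'tunl0' interface), which is not TCP.

```python
import socket
from typing import Dict

# Assumed port list (not from the lab guide): adjust to your setup.
PORTS = {"kube-apiserver": 6443, "calico-bgp": 179, "kubelet": 10250}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


def check_peer(host: str, ports: Dict[str, int]) -> Dict[str, bool]:
    """Check every named port on one peer node and report the result per name."""
    return {name: tcp_reachable(host, port) for name, port in ports.items()}
```

Run it from each node against the other node's IP, for example `check_peer("<other-node-ip>", PORTS)`. A port that shows False in one direction only often points at a firewall rule.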

    Regards,
    -Chris

  • Hi, I have the same problem.

    I'm using two VirtualBox machines.
    Before joining the node I get this:
    kubectl get pods -n kube-system
    NAME READY STATUS RESTARTS AGE
    calico-kube-controllers-86bddfcff-jgqrl 1/1 Running 0 3m2s
    calico-node-f97ml 1/1 Running 0 3m2s
    coredns-f9fd979d6-9xj86 1/1 Running 0 6m26s
    coredns-f9fd979d6-ccs6l 1/1 Running 0 6m26s
    etcd-k8smaster 1/1 Running 0 6m42s
    kube-apiserver-k8smaster 1/1 Running 0 6m41s
    kube-controller-manager-k8smaster 1/1 Running 0 6m42s
    kube-proxy-vtsxq 1/1 Running 0 6m26s
    kube-scheduler-k8smaster 1/1 Running 0 6m42s

    And everything looks right. After joining the node I get this:

    vagrant$ kubectl get pods -n kube-system
    NAME READY STATUS RESTARTS AGE
    calico-kube-controllers-86bddfcff-jgqrl 1/1 Running 0 5m48s
    calico-node-74bbx 0/1 Running 0 51s
    calico-node-f97ml 1/1 Running 0 5m48s
    coredns-f9fd979d6-9xj86 1/1 Running 0 9m12s
    coredns-f9fd979d6-ccs6l 1/1 Running 0 9m12s
    etcd-k8smaster 1/1 Running 0 9m28s
    kube-apiserver-k8smaster 1/1 Running 0 9m27s
    kube-controller-manager-k8smaster 1/1 Running 0 9m28s
    kube-proxy-hnklj 1/1 Running 0 51s
    kube-proxy-vtsxq 1/1 Running 0 9m12s
    kube-scheduler-k8smaster 1/1 Running 0 9m28s

    The pod calico-node-74bbx is not ready, and when I look at the events I see:

    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedScheduling 6m36s (x5 over 6m52s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
    Normal Scheduled 6m26s default-scheduler Successfully assigned kube-system/calico-kube-controllers-86bddfcff-jgqrl to k8smaster
    Warning FailedCreatePodSandBox 6m25s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "70731438fd40ce4cde0f82936903a5372d703554150f20d08e62ac148f965f35" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
    Warning FailedCreatePodSandBox 6m20s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "45a9225d424a1e7ee246b09ec1b4b8a96a442c560928e23a2402d6aff329650e" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
    Warning FailedCreatePodSandBox 6m18s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f7a67cb7e42081870a94e469ccd99e93bf8cfb1c92cae1c60b209617013584f5" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
    Normal SandboxChanged 6m15s (x4 over 6m24s) kubelet, k8smaster Pod sandbox changed, it will be killed and re-created.
    Warning FailedCreatePodSandBox 6m15s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c1a72c68263c8f6fcadaa32f6fa31b9518224404c1ac447324b4278b854a6ab2" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
    Normal Pulling 6m13s kubelet, k8smaster Pulling image "docker.io/calico/kube-controllers:v3.17.2"
    Normal Pulled 6m6s kubelet, k8smaster Successfully pulled image "docker.io/calico/kube-controllers:v3.17.2" in 7.401631705s
    Normal Created 6m6s kubelet, k8smaster Created container calico-kube-controllers
    Normal Started 6m5s kubelet, k8smaster Started container calico-kube-controllers

    And it is true:
    stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/

    There is no nodename file there.

    I have tried to delete and apply Calico again, but that does not resolve the problem.

    I agree that it must be a problem with the network, but I really don't know what to check (everything seems OK).

    Any suggestions?

    Thanks

  • Hi @emiliano.sutil,

    Every automation tool added to the infrastructure provisioning and Kubernetes cluster bootstrapping processes introduces extra configuration options that may adversely impact the cluster build. It could be any number of things: the VM virtual hardware profile (CPU, memory, disk, NIC), guest OS, networking options, IP address management, host and guest firewall rules, permissions, etc.

    When provisioning VirtualBox VMs, it is important to follow the VM sizing guide found in the Overview section of Lab 3.1, use the suggested guest OS, and, most importantly, ensure that the VM IP addresses assigned by the VirtualBox hypervisor do not overlap with the Pod IP network 192.168.0.0/16 managed by the Calico network plugin. For VM networking configuration it is recommended to enable promiscuous mode and set it to allow all inbound traffic. Disabling all guest OS firewalls is also recommended for the labs. In a few similar cases the host OS firewall rules have also adversely impacted Kubernetes cluster bootstrapping on VirtualBox.
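
The overlap condition can be checked quickly with Python's standard `ipaddress` module. This is only a sketch: the two sample addresses in the loop are hypothetical, and the function name is made up for illustration. Only the pod network 192.168.0.0/16 comes from the lab.

```python
import ipaddress

# Pod network managed by Calico in the lab setup.
POD_NETWORK = ipaddress.ip_network("192.168.0.0/16")


def overlaps_pod_network(node_cidr: str, pod_network=POD_NETWORK) -> bool:
    """Return True if the node's address or subnet intersects the pod network."""
    # strict=False accepts a host address with a prefix, e.g. 10.0.2.15/24.
    node_net = ipaddress.ip_network(node_cidr, strict=False)
    return node_net.overlaps(pod_network)


# Hypothetical VM addresses, for illustration only.
for cidr in ("192.168.1.10/24", "10.0.2.15/24"):
    status = "OVERLAPS the pod network" if overlaps_pod_network(cidr) else "ok"
    print(f"{cidr}: {status}")
```

Any VM address inside 192.168.0.0/16 is reported as overlapping. In that case, either move the VMs to a different subnet or choose a different pod network.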

    I noticed that you deviated from the installation steps, and are using k8smaster as the hostname of your master/control plane node. That is not the intent of the lab guide. k8smaster is assumed to be only an alias set in the /etc/hosts files, to help with cluster bootstrapping at first, and later with High Availability (HA) configuration.

    Regards,
    -Chris

  • Thanks @chrispokorni for your answer.

    I suspect the problem is the overlapping IPs (they do, in fact, overlap ;-) ). I'm going to test that hypothesis.
    I'll let you know if this fixes my problem.

    Regards.

  • Hello @chrispokorni

    I think I have found my problem. I'll try to explain (perhaps this could be useful for someone):
    First, I'm using VirtualBox with Vagrant.
    I have seen that all my machines have 2 interfaces:
    enp0s3: 10.0.2.15
    enp0s8: 10.128.1.3 (on my first try, this IP overlapped with the Calico settings)

    All the machines have the same IP on interface enp0s3 (10.0.2.15), and when I init the cluster, kubeadm takes that IP as the node's address. For example, you can see this in the output:
    ...
    [certs] etcd/server serving cert is signed for DNS names [localhost master] and IPs [10.0.2.15 127.0.0.1 ::1]
    [certs] Generating "etcd/peer" certificate and key
    [certs] etcd/peer serving cert is signed for DNS names [localhost master] and IPs [10.0.2.15 127.0.0.1 ::1]
    ...
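
To illustrate why the shared address cannot work as a node IP, here is a small Python sketch that finds addresses appearing on more than one machine. As far as I understand, kubeadm defaults to the address of the interface holding the default route, which in this setup is the NAT interface. The worker's second address below is hypothetical, and the function names are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def shared_addresses(nodes: Dict[str, List[str]]) -> List[str]:
    """Return the addresses that appear on more than one node."""
    counts = Counter(ip for ips in nodes.values() for ip in set(ips))
    return sorted(ip for ip, n in counts.items() if n > 1)


def usable_addresses(nodes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Per node, keep only the addresses that are unique in the cluster."""
    shared = set(shared_addresses(nodes))
    return {name: [ip for ip in ips if ip not in shared]
            for name, ips in nodes.items()}


nodes = {
    "master": ["10.0.2.15", "10.128.1.3"],
    "worker": ["10.0.2.15", "10.128.1.4"],  # second address is hypothetical
}
print(shared_addresses(nodes))
print(usable_addresses(nodes))
```

Only an address that is unique to a node can be used as its advertise address, because the other nodes must be able to reach it.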

    Well, I have run this command to init the cluster:
    kubeadm init --apiserver-advertise-address=10.128.1.3 --apiserver-cert-extra-sans=10.128.1.3 --node-name k8smaster --pod-network-cidr=192.168.0.0/16

    And now it all works as expected.

    My only question now is: how can I init the cluster with these parameters in the kubeadm-config.yaml?
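
One possible answer, as a sketch rather than a tested lab configuration: the four flags map onto fields of the kubeadm configuration API. To my knowledge, `--apiserver-advertise-address` and `--node-name` belong to `InitConfiguration` (`localAPIEndpoint.advertiseAddress`, `nodeRegistration.name`), while `--apiserver-cert-extra-sans` and `--pod-network-cidr` belong to `ClusterConfiguration` (`apiServer.certSANs`, `networking.podSubnet`). The Python helper below is made up for illustration and only renders the file text. The `v1beta2` API version is an assumption that should be checked against your kubeadm release.

```python
from typing import List


def render_kubeadm_config(advertise_address: str, node_name: str,
                          pod_subnet: str, extra_sans: List[str],
                          api_version: str = "kubeadm.k8s.io/v1beta2") -> str:
    """Render a two-document kubeadm config equivalent to the init flags."""
    sans = "\n".join(f"    - {san}" for san in extra_sans)
    return f"""apiVersion: {api_version}
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: {advertise_address}
nodeRegistration:
  name: {node_name}
---
apiVersion: {api_version}
kind: ClusterConfiguration
apiServer:
  certSANs:
{sans}
networking:
  podSubnet: {pod_subnet}
"""


# Values taken from the kubeadm init command above.
print(render_kubeadm_config("10.128.1.3", "k8smaster",
                            "192.168.0.0/16", ["10.128.1.3"]))
```

The printed text can be saved as kubeadm-config.yaml and passed to `kubeadm init --config kubeadm-config.yaml` in place of the individual flags.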

    Do you see any problem with this?

    Thanks.

  • thinzaung Posts: 2

    Hi @emiliano.sutil, thanks so much for the solution. I had exactly the same problem as you: I built my VMs using Vagrant and ended up with 2 NICs. I used the same kubeadm init command that you mentioned above and it works for me. Let me play around more with how to add this configuration to the kubeadm config YAML file.

    Thanks
    Thin

  • torin42 Posts: 1

    I had the same problem with VirtualBox running Ubuntu 20.04. Both of the VMs have 2 network interfaces: a NAT for internet access and an Internal Network for communicating with each other. Since the NAT is the first interface, kubeadm picked that IP, which caused the issue. @emiliano.sutil's solution worked perfectly to solve the problem.

  • @emiliano.sutil Thanks so much for your post. I also used VirtualBox with Vagrant, with 2 interfaces defined ([email protected] and [email protected]). I tried to run kubeadm init with a predefined config file, but the pod on my worker-1 node was always in "CrashLoopBackOff".

    After running the following init command, my worker-1 node successfully joined the control node.
    kubeadm init --apiserver-advertise-address=192.168.5.11 --apiserver-cert-extra-sans=192.168.5.11 --node-name master-1 --pod-network-cidr=192.168.0.0/16

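    I would definitely give the kubeadm config YAML file a try, probably with something like the following:
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: InitConfiguration
    # localAPIEndpoint is a field of InitConfiguration, not ClusterConfiguration
    localAPIEndpoint:
      advertiseAddress: 192.168.5.11
      # 6443 matches the port used in controlPlaneEndpoint below
      bindPort: 6443
    nodeRegistration:
      name: master-1
    ---
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    kubernetesVersion: 1.21.1
    controlPlaneEndpoint: "k8scp:6443"
    networking:
      podSubnet: 192.168.0.0/16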
    Thanks
    Shao
