Lab 3.3. - 3.4. calico/node is not ready

zrks · October 2020

I have been following the LFS258 lab exercises from 3.1 through 3.3.
The controller/master and worker node installation and growing the cluster were successful, and the worker node reported its status as Ready.
I used docker as container engine:
docker version - 19.03.6
kube client & master version - v1.18.1

In the 'ip a' output on the controller/master node I can see that a 'tunl0' interface has been created.

In chapter/exercise 3.4, when I execute 'kubectl create deployment nginx --image=nginx', the pod gets stuck in 'ContainerCreating' status.
The output of 'kubectl -n default describe pod nginx-f89759699-frqxj' shows:

"Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "64b6688978e818556f498fcd33b6d17f50a2f2da8ba8740a514801734adc810b" network for pod "nginx-f89759699-frqxj": networkPlugin cni failed to set up pod "nginx-f89759699-frqxj_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/"

When I check the calico pod on the worker node I see that it is crash-looping. 'kubectl -n kube-system describe pod calico-node-jcdz5' shows:

"Liveness probe failed: calico/node is not ready: bird/confd is not live: exit status 1"
"Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp 127.0.0.1:9099: connect: connection refused"
"Container calico-node failed liveness probe, will be restarted"
"Readiness probe failed: calico/node is not ready: BIRD is not ready: Failed to stat() nodename file: stat /var/lib/calico/nodename: no such file or directory"

I have followed the instructions step by step, without any exceptions, so far.
What am I doing wrong?

chrispokorni · October 2020

Hi @zrks,

Calico is highly dependent on the node-to-node networking of your infrastructure. What is your infrastructure and how is it configured? Are you on a local hypervisor, cloud VMs? What type of firewalls do you have at infra level and/or at VM OS level?

If it is just a glitch, you may try deleting the misbehaving pod and allowing the controller to re-create it, or running 'kubectl delete -f calico.yaml' followed by 'kubectl apply -f calico.yaml' to re-deploy all Calico-related artifacts. If this does not resolve your issue, then you need to investigate the inter-node networking configuration.

Regards,
-Chris

emiliano.sutil · February 2021

Hi, I have the same problem.

I'm using 2 VirtualBox machines.
Before joining the node I get this:
kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-86bddfcff-jgqrl   1/1     Running   0          3m2s
calico-node-f97ml                         1/1     Running   0          3m2s
coredns-f9fd979d6-9xj86                   1/1     Running   0          6m26s
coredns-f9fd979d6-ccs6l                   1/1     Running   0          6m26s
etcd-k8smaster                            1/1     Running   0          6m42s
kube-apiserver-k8smaster                  1/1     Running   0          6m41s
kube-controller-manager-k8smaster         1/1     Running   0          6m42s
kube-proxy-vtsxq                          1/1     Running   0          6m26s
kube-scheduler-k8smaster                  1/1     Running   0          6m42s

Everything looks right. After joining the node I get this:

vagrant$ kubectl get pods -n kube-system
NAME                                      READY   STATUS    RESTARTS   AGE
calico-kube-controllers-86bddfcff-jgqrl   1/1     Running   0          5m48s
calico-node-74bbx                         0/1     Running   0          51s
calico-node-f97ml                         1/1     Running   0          5m48s
coredns-f9fd979d6-9xj86                   1/1     Running   0          9m12s
coredns-f9fd979d6-ccs6l                   1/1     Running   0          9m12s
etcd-k8smaster                            1/1     Running   0          9m28s
kube-apiserver-k8smaster                  1/1     Running   0          9m27s
kube-controller-manager-k8smaster         1/1     Running   0          9m28s
kube-proxy-hnklj                          1/1     Running   0          51s
kube-proxy-vtsxq                          1/1     Running   0          9m12s
kube-scheduler-k8smaster                  1/1     Running   0          9m28s
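
In a listing like this, the READY column is what gives the problem pod away. As a rough sketch (not part of the original post, and using only a shortened copy of the listing as sample input), a few lines of Python can pick out pods whose ready count is below the total:

```python
# Sketch: flag pods whose READY column shows fewer ready containers than expected.
# SAMPLE is a shortened copy of the listing from this post, used only for illustration.
SAMPLE = """\
NAME READY STATUS RESTARTS AGE
calico-kube-controllers-86bddfcff-jgqrl 1/1 Running 0 5m48s
calico-node-74bbx 0/1 Running 0 51s
calico-node-f97ml 1/1 Running 0 5m48s
kube-proxy-hnklj 1/1 Running 0 51s
"""


def not_ready_pods(listing):
    """Return the names of pods whose READY count is below the total."""
    pods = []
    for line in listing.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        name, ready = fields[0], fields[1]
        ready_count, total = (int(part) for part in ready.split("/"))
        if ready_count < total:
            pods.append(name)
    return pods


print(not_ready_pods(SAMPLE))
```

The helper name not_ready_pods is hypothetical; the same information can of course be read directly off the kubectl output.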

The pod calico-node-74bbx is not ready (0/1), and when I look at the events I see:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 6m36s (x5 over 6m52s) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 6m26s default-scheduler Successfully assigned kube-system/calico-kube-controllers-86bddfcff-jgqrl to k8smaster
Warning FailedCreatePodSandBox 6m25s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "70731438fd40ce4cde0f82936903a5372d703554150f20d08e62ac148f965f35" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 6m20s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "45a9225d424a1e7ee246b09ec1b4b8a96a442c560928e23a2402d6aff329650e" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Warning FailedCreatePodSandBox 6m18s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "f7a67cb7e42081870a94e469ccd99e93bf8cfb1c92cae1c60b209617013584f5" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Normal SandboxChanged 6m15s (x4 over 6m24s) kubelet, k8smaster Pod sandbox changed, it will be killed and re-created.
Warning FailedCreatePodSandBox 6m15s kubelet, k8smaster Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "c1a72c68263c8f6fcadaa32f6fa31b9518224404c1ac447324b4278b854a6ab2" network for pod "calico-kube-controllers-86bddfcff-jgqrl": networkPlugin cni failed to set up pod "calico-kube-controllers-86bddfcff-jgqrl_kube-system" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
Normal Pulling 6m13s kubelet, k8smaster Pulling image "docker.io/calico/kube-controllers:v3.17.2"
Normal Pulled 6m6s kubelet, k8smaster Successfully pulled image "docker.io/calico/kube-controllers:v3.17.2" in 7.401631705s
Normal Created 6m6s kubelet, k8smaster Created container calico-kube-controllers
Normal Started 6m5s kubelet, k8smaster Started container calico-kube-controllers

And the message is accurate:
stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/

There is no nodename file there.

I have tried deleting and re-applying calico, and it does not resolve the problem.

I agree that it must be a problem with the network, but I really don't know what to check (everything seems OK).

Any suggestions?

Thanks

chrispokorni · February 2021

Hi @emiliano.sutil,

Every automation tool added into the infrastructure provisioning and then the Kubernetes cluster bootstrapping processes introduces extra configuration options that may adversely impact the cluster build process. It could be any number of things: the VM virtual hardware profile (CPU, mem, disk, NIC), guest OS, networking options, IP address management, host and guest firewall rules, permissions, etc.

When provisioning VirtualBox VMs, it is important to follow the VM sizing guide found in the Overview section of Lab 3.1, use the suggested guest OS, and most importantly ensure that the VM IP addresses assigned by the VirtualBox hypervisor do not overlap with the Pod IP network 192.168.0.0/16 managed by the Calico network plugin. For VM networking configuration it is recommended to enable promiscuous mode and set it to allow all inbound traffic. Also, disabling all guest OS firewalls is recommended for the labs. In a few similar cases the host OS firewall rules have also adversely impacted the Kubernetes cluster bootstrapping on VirtualBox.
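
The overlap check can be sketched with Python's standard ipaddress module. This is only an illustration, not part of the lab guide: the helper name overlaps_pod_network and the two sample VM addresses are made up for the example, while 192.168.0.0/16 is the Pod network mentioned above.

```python
import ipaddress

# Pod network managed by Calico in the lab, as stated above.
POD_CIDR = ipaddress.ip_network("192.168.0.0/16")


def overlaps_pod_network(node_ip, pod_cidr=POD_CIDR):
    """Return True if a node (VM) address falls inside the Pod network."""
    return ipaddress.ip_address(node_ip) in pod_cidr


# Hypothetical VM addresses, for illustration only.
for ip in ("10.128.1.3", "192.168.56.10"):
    print(ip, "overlaps" if overlaps_pod_network(ip) else "ok")
```

Any VM address that falls inside the Pod network is a candidate for the kind of conflict described above.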

I noticed that you deviated from the installation steps, and are using k8smaster as the hostname of your master/control plane node. That is not the intent of the lab guide. k8smaster is assumed to be only an alias set in the /etc/hosts files, to help with cluster bootstrapping at first, and later with High Availability (HA) configuration.

Regards,
-Chris

emiliano.sutil · February 2021

thanks @chrispokorni for your answer

I suspect the problem is the overlap of IPs (in fact, the overlap ;-) ). I'm going to test that hypothesis.
I'll let you know if this fixes my problem.

Regards.

emiliano.sutil · February 2021

Hello @chrispokorni

I think I have found my problem. I'll try to explain (perhaps this could be useful for someone):
First of all, I'm using VirtualBox with Vagrant.
I have seen that all my machines have 2 interfaces:
enp0s3: 10.0.2.15
enp0s8: 10.128.1.3 (on the first try, the ip overlaps with the calico settings)

All the machines have the same IP on interface enp0s3 (10.0.2.15), and when I init the cluster, kubeadm takes that IP as the node's address. For example, you can see this in the output:
...
[certs] etcd/server serving cert is signed for DNS names [localhost master] and IPs [10.0.2.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master] and IPs [10.0.2.15 127.0.0.1 ::1]
...

Well, I have run this command to init the cluster:
kubeadm init --apiserver-advertise-address=10.128.1.3 --apiserver-cert-extra-sans=10.128.1.3 --node-name k8smaster --pod-network-cidr=192.168.0.0/16

And now it all works as expected.
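
The choice of address can be sketched in Python as follows. This is a rough illustration, not a tool from the lab: the helper names are hypothetical, and it assumes that the shared 10.0.2.15 address comes from VirtualBox's default NAT network 10.0.2.0/24. It picks the first interface address outside that network and assembles the same kubeadm init command as above.

```python
import ipaddress

# Assumption: VirtualBox's default NAT network, which hands every VM 10.0.2.15.
VBOX_NAT = ipaddress.ip_network("10.0.2.0/24")


def pick_advertise_address(interfaces):
    """Return the first interface address that is not on the shared NAT network."""
    for _name, addr in interfaces.items():
        if ipaddress.ip_address(addr) not in VBOX_NAT:
            return addr
    raise ValueError("no non-NAT address found")


def kubeadm_init_command(addr, node_name, pod_cidr):
    """Assemble the kubeadm init command line used in this post."""
    return (
        f"kubeadm init --apiserver-advertise-address={addr} "
        f"--apiserver-cert-extra-sans={addr} "
        f"--node-name {node_name} --pod-network-cidr={pod_cidr}"
    )


# Interface names and addresses as listed in this post.
interfaces = {"enp0s3": "10.0.2.15", "enp0s8": "10.128.1.3"}
addr = pick_advertise_address(interfaces)
command = kubeadm_init_command(addr, "k8smaster", "192.168.0.0/16")
print(command)
```

The sketch only builds the command string; it does not run kubeadm.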

My only question now is: how can I init the cluster with these parameters in the kubeadm-config.yaml?

Do you see any problem with this?

Thanks.

thinzaung · May 2021

Hi @emiliano.sutil, thanks so much for the solution. I had exactly the same problem as you: I built my VMs using Vagrant and ended up with 2 NICs. I used the same kubeadm init command that you mentioned above and it works for me. Let me play around some more with how to add this configuration to the kubeadm config YAML file.

Thanks
Thin

torin42 · July 2021

I had the same problem with VirtualBox running Ubuntu 20.04. Both of the VMs have 2 network interfaces, a NAT for internet and an Internal Network for communicating with each other. Since the NAT is the first interface, kubeadm picked that IP and caused the issue. @emiliano.sutil's solution worked perfectly to solve the problem.

caishaoping · December 2021

@emiliano.sutil Thanks so much for your post. I also used VirtualBox with Vagrant, with 2 interfaces defined (enp0s3@10.0.2.15 and enp0s8@192.168.5.11). I tried to run kubeadm init with a predefined config file, but my pod on the worker-1 node was always in "CrashLoopBackOff".

After running the following init command, my worker-1 node successfully joined the control plane node.
kubeadm init --apiserver-advertise-address=192.168.5.11 --apiserver-cert-extra-sans=192.168.5.11 --node-name master-1 --pod-network-cidr=192.168.0.0/16

I would definitely give the kubeadm config YAML file a try, probably with something like the following:
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.21.1
controlPlaneEndpoint: "k8scp:6443"
networking:
  podSubnet: 192.168.0.0/16
localAPIEndpoint:
  advertiseAddress: 192.168.5.11
  bindPort: 443

Thanks
Shao

wanch · February 2022

After reading the kubeadm init documentation and kubeadm configuration for v1beta2, comparing with the above working kubeadm init command and testing, I managed to create a kubeadm-config.yaml that appears to be working.

apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.128.0.100
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.21.1
controlPlaneEndpoint: "k8scp:6443"
networking:
  podSubnet: 192.168.0.0/16

where 10.128.0.100 is the IP of the second network interface of my Vagrant VirtualBox VM. In /etc/hosts, k8scp is mapped to that IP.
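
For anyone with a different second-interface address, the same two-document file can be generated instead of edited by hand. The following is a minimal Python sketch, assuming the layout above is what you want; the function name render_kubeadm_config and its defaults are hypothetical and simply mirror the values in this post.

```python
import ipaddress

# Template mirroring the two-document kubeadm-config.yaml shown in this post.
TEMPLATE = """\
apiVersion: kubeadm.k8s.io/v1beta2
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: {advertise_address}
  bindPort: 6443
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: {kubernetes_version}
controlPlaneEndpoint: "{endpoint}:6443"
networking:
  podSubnet: {pod_subnet}
"""


def render_kubeadm_config(advertise_address, kubernetes_version="1.21.1",
                          endpoint="k8scp", pod_subnet="192.168.0.0/16"):
    """Fill in the template, rejecting malformed addresses and subnets."""
    ipaddress.ip_address(advertise_address)  # raises ValueError if malformed
    ipaddress.ip_network(pod_subnet)         # raises ValueError if malformed
    return TEMPLATE.format(
        advertise_address=advertise_address,
        kubernetes_version=kubernetes_version,
        endpoint=endpoint,
        pod_subnet=pod_subnet,
    )


config = render_kubeadm_config("10.128.0.100")
print(config)
```

The printed text can be saved as kubeadm-config.yaml and passed to kubeadm init with its --config option.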
