Welcome to the Linux Foundation Forum!

Exercise 3.3 - Worker node not ready

I have built the lab in Google Cloud as suggested, and I believe I followed the instructions to the letter. However, after I joined my worker node to the cluster (Exercise 3.2), the worker node went into a "NotReady" state and has been like that for a few hours. I suspect the network connectivity between the nodes: when I try to ping one from the other I get "No route to host". Yet if I deploy a third VM on the same network, both nodes can ping the third VM and the third VM can ping the other two. Cheers.

Comments

  • chrispokorni Posts: 2,112

    Hi @jhurlstone,

    Are all VMs in the same custom VPC and subnet, and does the VPC firewall rule allow all inbound traffic, as per the demo video from the introductory chapter?

    Regards,
    -Chris

  • I believe that I followed the video exactly. I have rebooted both the master and worker VMs, and they can now ping each other. I have been running "kubectl get nodes" on the master periodically; very occasionally it reports that the worker is Ready, then a few seconds later it goes back to "NotReady".

  • chrispokorni Posts: 2,112

    Hi @jhurlstone,

    What are the outputs of

    kubectl get nodes -o wide

    kubectl get pods -A -o wide

    Regards,
    -Chris

  • kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    master Ready control-plane 5h37m v1.25.1 10.2.0.6 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    worker NotReady 3h44m v1.25.1 10.2.0.7 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18

    kubectl get pods -A -o wide
    NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
    kube-system calico-kube-controllers-74677b4c5f-p9c9c 1/1 Running 1 (128m ago) 5h29m 10.2.219.69 master
    kube-system calico-node-qhv2t 0/1 Running 1 (30m ago) 40m 10.2.0.7 worker
    kube-system calico-node-tv5cx 0/1 Running 1 (128m ago) 5h29m 10.2.0.6 master
    kube-system coredns-565d847f94-885qq 1/1 Running 1 (128m ago) 5h38m 10.2.219.68 master
    kube-system coredns-565d847f94-rc2l2 1/1 Running 1 (128m ago) 5h38m 10.2.219.70 master
    kube-system etcd-master 1/1 Running 1 (128m ago) 5h38m 10.2.0.6 master
    kube-system kube-apiserver-master 1/1 Running 1 (128m ago) 5h38m 10.2.0.6 master
    kube-system kube-controller-manager-master 1/1 Running 1 (128m ago) 5h38m 10.2.0.6 master
    kube-system kube-proxy-2xfv5 1/1 Running 4 (30m ago) 3h44m 10.2.0.7 worker
    kube-system kube-proxy-rlsxq 1/1 Running 1 (128m ago) 5h38m 10.2.0.6 master
    kube-system kube-scheduler-master 1/1 Running 1 (128m ago) 5h38m 10.2.0.6 master
    root@master:~#

  • chrispokorni Posts: 2,112

    Hi @jhurlstone,

    What is the machine type of your GCE VMs?

    Are you running kubectl as root? Why?

    Regards,
    -Chris

  • I may have spotted a typo in my installation: I did not replace "controlPlaneEndpoint: k8scp:6443" with the actual hostname when I used "kubeadm-config.yaml". The hostname of my control plane VM is "master".

  • chrispokorni Posts: 2,112

    You don't have to replace it. As long as the alias is in the /etc/hosts file, it should all work. Also, make sure you have the correct control plane node IP in the /etc/hosts files of both the cp and the worker.
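
    For reference, the minimal /etc/hosts fragment matching this thread's addresses (10.2.0.6 is this lab's control plane IP; substitute your own) would look like this on both nodes:

    ```
    10.2.0.6 k8scp
    ```

    With that entry present, "controlPlaneEndpoint: k8scp:6443" resolves correctly without editing the kubeadm config.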

  • The machine type is "e2-standard-2" for both VMs.

  • chrispokorni Posts: 2,112

    Hi @jhurlstone,

    Since you modified your installation and configured kubectl for root (not a good practice), what other changes have you made?

    When running the following commands:

    kubectl describe node worker

    kubectl -n kube-system describe pod calico-node-tv5cx

    What are the Events at the very bottom of both outputs?

    Regards,
    -Chris

  • I have double-checked /etc/hosts, which contains the same entry on both nodes ("10.2.0.6 k8scp"). I have also just run "kubectl get nodes -o wide" multiple times, and as you can see the worker does occasionally come up "Ready", then flicks back to "NotReady":

    student@master:~$ kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    master Ready control-plane 6h1m v1.25.1 10.2.0.6 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    worker Ready 4h8m v1.25.1 10.2.0.7 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    student@master:~$ kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    master Ready control-plane 6h1m v1.25.1 10.2.0.6 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    worker Ready 4h8m v1.25.1 10.2.0.7 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    student@master:~$ kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    master Ready control-plane 6h2m v1.25.1 10.2.0.6 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    worker Ready 4h8m v1.25.1 10.2.0.7 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    student@master:~$ kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    master Ready control-plane 6h2m v1.25.1 10.2.0.6 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    worker NotReady 4h9m v1.25.1 10.2.0.7 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    student@master:~$ kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    master Ready control-plane 6h2m v1.25.1 10.2.0.6 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    worker NotReady 4h9m v1.25.1 10.2.0.7 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    student@master:~$ kubectl get nodes -o wide
    NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
    master Ready control-plane 6h2m v1.25.1 10.2.0.6 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    worker NotReady 4h9m v1.25.1 10.2.0.7 Ubuntu 20.04.5 LTS 5.15.0-1030-gcp containerd://1.6.18
    student@master:~$

  • kubectl describe node worker

    most recent events

    Normal Starting 8m57s kubelet Starting kubelet.
    Warning InvalidDiskCapacity 8m57s kubelet invalid capacity 0 on image filesystem
    Normal NodeAllocatableEnforced 8m54s kubelet Updated Node Allocatable limit across pods
    Normal NodeHasNoDiskPressure 7m35s (x7 over 8m57s) kubelet Node worker status is now: NodeHasNoDiskPressure
    Normal NodeHasSufficientPID 7m35s (x7 over 8m57s) kubelet Node worker status is now: NodeHasSufficientPID
    Normal RegisteredNode 6m50s node-controller Node worker event: Registered Node worker in Controller
    Normal NodeHasSufficientMemory 3m10s (x10 over 8m57s) kubelet Node worker status is now: NodeHasSufficientMemory
    Normal NodeNotReady 2m30s (x2 over 6m10s) node-controller Node worker status is now: NodeNotReady

    kubectl -n kube-system describe pod calico-node-tv5cx

    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning Unhealthy 15m (x396 over 144m) kubelet (combined from similar events): Readiness probe failed: 2023-03-14 16:55:26.722 [INFO][25020] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Normal SandboxChanged 9m33s kubelet Pod sandbox changed, it will be killed and re-created.
    Normal Pulled 9m33s kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
    Normal Created 9m33s kubelet Created container upgrade-ipam
    Normal Started 9m33s kubelet Started container upgrade-ipam
    Normal Pulled 9m31s kubelet Container image "docker.io/calico/cni:v3.25.0" already present on machine
    Normal Created 9m31s kubelet Created container install-cni
    Normal Started 9m31s kubelet Started container install-cni
    Normal Pulled 9m27s kubelet Container image "docker.io/calico/node:v3.25.0" already present on machine
    Normal Created 9m27s kubelet Created container mount-bpffs
    Normal Started 9m27s kubelet Started container mount-bpffs
    Normal Pulled 9m26s kubelet Container image "docker.io/calico/node:v3.25.0" already present on machine
    Normal Created 9m26s kubelet Created container calico-node
    Normal Started 9m26s kubelet Started container calico-node
    Warning Unhealthy 9m25s kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
    Warning Unhealthy 9m23s (x2 over 9m24s) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
    Warning Unhealthy 9m13s kubelet Readiness probe failed: 2023-03-14 17:01:32.575 [INFO][248] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Warning Unhealthy 9m3s kubelet Readiness probe failed: 2023-03-14 17:01:42.940 [INFO][272] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Warning Unhealthy 5m23s kubelet Readiness probe failed: 2023-03-14 17:05:22.612 [INFO][946] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Warning Unhealthy 2m13s kubelet Readiness probe failed: 2023-03-14 17:08:32.587 [INFO][1457] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Warning Unhealthy 2m3s kubelet Readiness probe failed: 2023-03-14 17:08:42.513 [INFO][1489] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Warning Unhealthy 113s kubelet Readiness probe failed: 2023-03-14 17:08:52.532 [INFO][1515] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Warning Unhealthy 113s kubelet Readiness probe failed: 2023-03-14 17:08:52.765 [INFO][1537] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7
    Warning Unhealthy 93s (x2 over 103s) kubelet (combined from similar events): Readiness probe failed: 2023-03-14 17:09:12.519 [INFO][1578] confd/health.go 180: Number of node(s) with BGP peering established = 0
    calico/node is not ready: BIRD is not ready: BGP not established with 10.2.0.7

  • chrispokorni Posts: 2,112

    Hi @jhurlstone,

    I would recommend revisiting the infra provisioning steps and following the custom VPC, subnet, and firewall config as presented in the video. Once that is fixed, BGP should be established, the two calico-node pods should show 1/1 Running, and the worker should become Ready as well.
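
    As a rough sketch of the "allow all inbound" firewall rule the video sets up (the network and rule names below are placeholders, not the course's own):

    ```shell
    # Placeholder names; adjust to your own VPC. Allows all protocols
    # from any source into the lab network, as in the demo video.
    gcloud compute firewall-rules create lab-allow-all \
        --network=lab-vpc \
        --direction=INGRESS \
        --action=ALLOW \
        --rules=all \
        --source-ranges=0.0.0.0/0
    ```

    This is wide open and only appropriate for a throwaway lab environment.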

    Regards,
    -Chris

  • Hi @chrispokorni

    I have run through the video and checked that all the settings are the same, with the following exceptions: for "Region" I chose a region local to me in the UK, and in the firewall setup the instructor selected "IP Ranges" as the source filter, which is not an available option, so I chose "IPv4 ranges". These are the only differences I can see. As for routing, there is an option for "Dynamic routing mode", which in my setup is configured identically to the video, i.e. set to "Regional".

    Many thanks for your ongoing assistance with this.

  • Just to let you know, I have got it working by following the instructions at this link:

    https://www.unixcloudfusion.in/2022/02/solved-caliconode-is-not-ready-bird-is.html

    Cheers
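
    For readers hitting the same "BGP not established" symptom: the fix at that link generally amounts to telling Calico which network interface to use for node IP autodetection, since the default first-found behaviour can pick the wrong one. A sketch, assuming the GCE interfaces follow the "ens*" naming (the interface pattern is an assumption, not from this thread; check with "ip addr" on your nodes first):

    ```shell
    # Assumed interface pattern; verify with `ip addr` before applying.
    # This sets Calico's IP autodetection to match ens* interfaces; the
    # calico-node pods restart and BGP peering should then come up.
    kubectl -n kube-system set env daemonset/calico-node \
        IP_AUTODETECTION_METHOD=interface=ens.*
    ```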

  • chrispokorni Posts: 2,112

    Hi @jhurlstone,

    Thank you for posting the solution that worked for you.
    Unfortunately, I was not able to reproduce the issue you reported, so I could not test the suggested solution either, but I will keep it in mind for the future.

    Regards,
    -Chris

  • Hi @chrispokorni

    Another thought: when I built my VMs following the instructions in the video and selected "Ubuntu 20.04 LTS", I was forced to select the x86/64 architecture. Whether this affected Calico's ability to identify the Ethernet interface correctly ("eth" vs "ens") would be a guess; at the moment I am just happy I could resolve the issue and continue with the training. Cheers, Jonathan.
