
Cannot remove node.kubernetes.io/not-ready:NoSchedule taint

oliveriom Posts: 5
edited December 2018 in LFD259 Class Forum

In Chapter 2.1 Labs/Exercise 2.1: Deploy a New Cluster, I am trying to remove the node.kubernetes.io/not-ready:NoSchedule taint, but without success.

I ran kubectl taint nodes --all node.kubernetes.io/not-ready- many times and got

node/kubemaster untainted
node/kubeworker untainted

as the output. But when I then run kubectl describe nodes | grep -i Taint, I get

Taints:             node.kubernetes.io/not-ready:NoSchedule
Taints:             node.kubernetes.io/not-ready:NoSchedule

and kubectl get nodes returns NotReady:

NAME         STATUS     ROLES    AGE   VERSION
kubemaster   NotReady   master   78m   v1.12.1
kubeworker   NotReady   <none>   76m   v1.12.1
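
A quick way to see why the nodes stay NotReady - which is what keeps the taint in place - is to check the node conditions and the kube-system pods (a sketch, using the node names above):

kubectl describe node kubemaster | grep -A 10 Conditions
kubectl get pods -n kube-system -o wide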

Comments

  • Resetting it with sudo kubeadm reset and re-running the whole config fixed it; a rough sketch of that sequence follows.
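
    A minimal sketch of that reset-and-reconfigure sequence, assuming a plain kubeadm setup (the init flags are assumptions and depend on the lab configuration):

    # tear down the existing cluster state (run on every node)
    sudo kubeadm reset -f
    # re-initialize the control plane; the pod CIDR shown is an assumption
    sudo kubeadm init --pod-network-cidr=192.168.0.0/16
    # restore kubectl access for the non-root user
    mkdir -p $HOME/.kube
    sudo cp /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config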

  • serewicz Posts: 1,000

    Thank you for the update.

    There seems to be a quirk where it takes about a minute between attempts for the taint to be fully removed. I believe the notes mention this behavior in the 1.12.x versions.
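
    If you want to script around that delay, a rough sketch (the 60-second pause is an assumption based on the behavior described above):

    for attempt in 1 2 3; do
        kubectl taint nodes --all node.kubernetes.io/not-ready- 2>/dev/null
        sleep 60
        kubectl describe nodes | grep -i Taint
    done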

    Regards,

  • @oliveriom
    A few times I ran into similar behavior, but most times the taint removal worked as expected. When it did not, a reset and reconfiguration worked without any issues.
    -Chris

  • madhu91s Posts: 8
    edited February 2023

    I have this problem right now and I did a "sudo kubeadm reset". But now I cannot run the k8scp.sh script to set up the master again.
    student@master:~$ rm cp.out
    student@master:~$ bash k8scp.sh | tee $HOME/cp.out
    WARNING!
    /k8scp_run exists. Script has already been run on control plane.
    Can someone please help me?

    [Edit] I deleted the file /k8scp_run and ran the bash script again, but the problem still persists.
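
    A fuller cleanup before re-running the script might look like this (a sketch; the guard file path is taken from the warning above, and the kubeadm reset assumes the script drives kubeadm):

    sudo kubeadm reset -f      # tear down the partial control plane
    sudo rm -f /k8scp_run      # remove the script's run guard
    bash k8scp.sh | tee $HOME/cp.out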

  • Hi @madhu91s,

    Unlike the control-plane and master taints, the not-ready taint you are seeing is not directly removable. It is placed on nodes as a result of misconfiguration - it simply means that none of the nodes are ready to run control plane or worker tasks. Once the issues are fixed, the taints will automatically be lifted and the nodes will reach the Ready status.
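
    To watch the taint clear on its own once a node becomes Ready, something like this works (a sketch; master is a placeholder node name):

    kubectl get node master -o jsonpath='{.spec.taints}'

    kubectl get nodes -w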

    What are the outputs of

    kubectl get nodes -o wide

    kubectl get pods -A -o wide

    Also, where are you running your cluster - on cloud VMs or local VMs? What are the sizes of your VMs, and what guest OS are they running?
    Did you happen to watch the demo videos from the intro chapter that show the VM provisioning process together with all recommended network settings?

    Regards,
    -Chris

  • madhu91s Posts: 8
    edited February 2023

    Hello Chris, thank you for such a quick response. I am also just going through the other posts in this forum, as I am new here. You mentioned somewhere that one could delete the coredns pod. I did that, but the container is still in the "ContainerCreating" state. The exact result is -
    Events:
      Type     Reason            Age                  From               Message
      ----     ------            ---                  ----               -------
      Warning  FailedScheduling  3m10s (x3 over 13m)  default-scheduler  0/2 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
      Normal   Scheduled         29s                  default-scheduler  Successfully assigned kube-system/coredns-565d847f94-cjcsv to master
      Warning  FailedMount       14s (x6 over 29s)    kubelet            MountVolume.SetUp failed for volume "config-volume" : object "kube-system"/"coredns" not registered
      Warning  NetworkNotReady   1s (x15 over 29s)    kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
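
    That last warning means no CNI configuration is in place yet; a quick check (a sketch, assuming the standard kubeadm paths):

    # CNI plugins write their configuration here once installed
    ls /etc/cni/net.d/
    # the kubelet log repeats the same NetworkPluginNotReady error
    sudo journalctl -u kubelet | grep -i network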

    I am doing exactly what is instructed in the video; I have set up my lab on console.cloud.google.com with the same configuration mentioned by the instructor. I skipped the video explaining the lab setup on AWS since I am not using that environment.

    [Edit] I did not disable AppArmor. I probably missed this since I did not watch the AWS setup video. Do you recommend doing this?
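
    For reference, AppArmor can be stopped on Ubuntu with the commands below (a sketch; check the lab guide for what the setup script actually expects):

    sudo systemctl stop apparmor
    sudo systemctl disable apparmor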

  • chrispokorni Posts: 2,155
    edited February 2023

    Hi @madhu91s,

    Please run the following command on your control plane node as the student user, and then provide the outputs from the commands requested in my previous response:

    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
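
    Once the manifest is applied, the Calico pods should move to Running and the nodes to Ready; the progress can be watched with:

    kubectl get pods -n kube-system -w

    kubectl get nodes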

    Regards,
    -Chris

  • The previous manifest URL was returning an error:

    error: unable to read URL "https://docs.projectcalico.org/manifests/calico.yaml", server reported 404 Not Found, status code=404

    I figured out what the problem was - the updated URL:

    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

    this fixed my problem. Thank you!

  • kanchana0808 Posts: 2
    edited October 2023

    @chrispokorni said:
    Hi @madhu91s,

    Please run the following command on your control plane node as the student user, and then provide the outputs from the commands requested in my previous response:

    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

    Regards,
    -Chris

    Hi Chris,
    I am on the versions below and am setting up the cluster behind a proxy.
    Ubuntu 20.04.6 LTS
    docker v20.10.13
    containerd v1.6.24
    kubelet kubeadm kubectl v1.26.2

    Logs for your reference:
    kubectl describe nodes

    kubectl describe pod calico-kube-controllers-57b57c56f-lcn74 -n kube-system
    Events:
      Type     Reason            Age                  From               Message
      ----     ------            ---                  ----               -------
      Warning  FailedScheduling  5m56s (x2 over 11m)  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
      Normal   Scheduled         94s                  default-scheduler  Successfully assigned kube-system/calico-kube-controllers-57b57c56f-lcn74 to master
      Warning  NetworkNotReady   47s (x25 over 94s)   kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

    kubectl describe pod coredns-787d4945fb-4qjzj -n kube-system
    Events:
      Type     Reason            Age                  From               Message
      ----     ------            ---                  ----               -------
      Warning  FailedScheduling  12m                  default-scheduler  0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
      Normal   Scheduled         11m                  default-scheduler  Successfully assigned kube-system/coredns-787d4945fb-4qjzj to master
      Warning  FailedMount       11m (x7 over 11m)    kubelet            MountVolume.SetUp failed for volume "config-volume" : object "kube-system"/"coredns" not registered
      Warning  NetworkNotReady   98s (x302 over 11m)  kubelet            network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

    Could you please help here?

  • chrispokorni Posts: 2,155

    Hi @kanchana0808,

    The recommendations for the Calico network plugin installation were made for an earlier release of the training material.
    Since then, the training material has been updated to the Cilium network plugin. Please download the latest release of the lab guide for the most up-to-date installation and configuration instructions.

    The "not-ready" taint is assigned to nodes that do not satisfy the node readiness conditions for scheduling purposes, and it cannot be simply removed. From the warnings visible in the events, the network plugin seems to be the main reason for the taint and the noted failures. In order to determine why the network plugin fails, please provide details about the infrastructure hosting your cluster, such as cloud VM service or local hypervisor, VM size (CPU, MEM, disk), how many network interfaces per VM and the attached network type (nat, host, bridged,...), the VM IP address, any firewall rules protecting ingress/egress traffic of the VMs.

    You may inspect the kubelet, containerd and/or docker service logs with journalctl, to uncover additional error and failure messages:

    journalctl -u kubelet
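
    Useful variations (assuming systemd-managed services, as on Ubuntu):

    journalctl -u kubelet --since "30 min ago"   # recent kubelet entries only
    journalctl -u containerd -f                  # follow containerd live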

    Regards,
    -Chris

  • kanchana0808 Posts: 2
    edited December 2023

    Hi Chris,
    Thank you very much for your explanation. It helps me a lot.

    Would like to close my issue with the workarounds I tried. My nodes were behind a proxy, so the Docker daemon and containerd require specific proxy settings in /etc/systemd/system/docker.service.d/http-proxy.conf and /etc/systemd/system/containerd.service.d/http-proxy.conf. Additionally, I configured a proxy for Docker in ~/.docker/config.json. The Kubernetes pod and service CIDRs (10.244.0.0/16, 10.96.0.0/12) are passed in no_proxy so that cluster traffic does not go through the proxy. With this, the coredns ContainerCreating issue was resolved.
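
    For anyone hitting the same issue, here is a sketch of the containerd drop-in described above (the proxy address is a placeholder; the docker.service.d file takes the same form):

    # /etc/systemd/system/containerd.service.d/http-proxy.conf
    [Service]
    Environment="HTTP_PROXY=http://proxy.example.com:3128"
    Environment="HTTPS_PROXY=http://proxy.example.com:3128"
    Environment="NO_PROXY=localhost,127.0.0.1,10.244.0.0/16,10.96.0.0/12"

    After editing, reload systemd and restart the service:

    sudo systemctl daemon-reload
    sudo systemctl restart containerd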
