
Cannot remove node.kubernetes.io/not-ready:NoSchedule taint

oliveriom Posts: 5
edited December 2018 in LFD259 Class Forum

In the Chapter 2 labs (Exercise 2.1: Deploy a New Cluster), I am trying to remove the node.kubernetes.io/not-ready:NoSchedule taint, but without success.

I have run kubectl taint nodes --all node.kubernetes.io/not-ready- many times, and I get

node/kubemaster untainted
node/kubeworker untainted

as the output. But when I then run kubectl describe nodes | grep -i Taint, I get

Taints:             node.kubernetes.io/not-ready:NoSchedule
Taints:             node.kubernetes.io/not-ready:NoSchedule

and kubectl get nodes still shows both nodes as NotReady:

NAME         STATUS     ROLES    AGE   VERSION
kubemaster   NotReady   master   78m   v1.12.1
kubeworker   NotReady   <none>   76m   v1.12.1
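
(A generic way to see why a node reports NotReady - not part of the exercise - is to inspect its Ready condition:

kubectl describe node kubemaster | grep -A8 Conditions

The message on the Ready condition usually names the cause, most often an uninitialized CNI network plugin.)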

Comments

  • Resetting it with sudo kubeadm reset and re-running the whole configuration fixed it.
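
    A minimal sketch of that reset-and-rebuild flow, assuming a plain kubeadm cluster (the exact flags come from the exercise, not from here):

    sudo kubeadm reset                  # on every node: tear down the existing state
    sudo kubeadm init ...               # on the control plane, with the lab's flags
    sudo kubeadm join ...               # on each worker, as printed by kubeadm init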

  • serewicz Posts: 1,000

    Thank you for the update.

    There seems to be a quirk where it takes about a minute between attempts for the taint to be fully removed. I believe the notes mention this behavior in the 1.12.x versions.
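
    If the taint appears to stick, a small retry loop (a generic shell sketch, not from the lab guide) makes the delay harmless:

    # retry the removal once a minute until no node carries the taint
    until ! kubectl describe nodes | grep -q 'node.kubernetes.io/not-ready'; do
        kubectl taint nodes --all node.kubernetes.io/not-ready- 2>/dev/null
        sleep 60
    done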

    Regards,

  • @oliveriom
    A few times I ran into similar behavior, but most times the taint removal worked as expected. When it did not, a reset and reconfiguration worked without any issues.
    -Chris

  • madhu91s Posts: 8
    edited February 2023

    I have this problem right now, and I did a sudo kubeadm reset. But now I cannot run the k8scp.sh file to set up the master again.
    student@master:~$ rm cp.out
    student@master:~$ bash k8scp.sh | tee $HOME/cp.out
    WARNING!
    /k8scp_run exists. Script has already been run on control plane.
    Can someone please help me?

    [Edit] I deleted the file /k8scp_run and ran the bash script again, but the problem still persists.

  • Hi @madhu91s,

    Unlike the control-plane and master taints, the not-ready taint you are seeing cannot be removed manually. It is placed on nodes as a result of misconfiguration - it simply means that none of the nodes are ready to run control plane or worker tasks. Once the issues are fixed, the taints are automatically lifted and the nodes reach the Ready status.
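
    For example, the reason a node reports not ready can be read straight from its Ready condition (a generic jsonpath query, shown only as an illustration):

    kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.status.conditions[?(@.type=="Ready")].message}{"\n"}{end}'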

    What are the outputs of

    kubectl get nodes -o wide

    kubectl get pods -A -o wide

    Also, where are you running your cluster - on cloud VMs or local VMs? What are the sizes of your VMs, and what guest OS are they running?
    Did you happen to watch the demo videos from the intro chapter that show the VM provisioning process together with all recommended network settings?

    Regards,
    -Chris

  • madhu91s
    madhu91s Posts: 8
    edited February 2023

    Hello Chris, thank you for such a quick response. I am also just going through the other posts in this forum, as I am new here. You mentioned somewhere that one could delete the coredns pod. I did that, but the pod is still in the "ContainerCreating" state. The exact events are:
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedScheduling 3m10s (x3 over 13m) default-scheduler 0/2 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
    Normal Scheduled 29s default-scheduler Successfully assigned kube-system/coredns-565d847f94-cjcsv to master
    Warning FailedMount 14s (x6 over 29s) kubelet MountVolume.SetUp failed for volume "config-volume" : object "kube-system"/"coredns" not registered
    Warning NetworkNotReady 1s (x15 over 29s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
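
    (The "cni plugin not initialized" message usually means no CNI configuration is present on the node yet. A quick check, assuming the default CNI configuration path, is:

    ls /etc/cni/net.d/

    An empty directory means the network plugin manifest was never applied, or its pods never started.)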

    I am doing exactly what is instructed in the video; I have set up my lab on console.cloud.google.com with the same configuration the instructor mentions. I skipped the video explaining the AWS lab setup since I am not using that environment.

    [Edit] I did not disable AppArmor. I probably missed this, as I did not watch the AWS setup video. Do you recommend doing this?

  • chrispokorni Posts: 2,349
    edited February 2023

    Hi @madhu91s,

    Please run the following command on your control plane node as the student user, and then provide the outputs of the commands requested in my previous response:

    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml
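
    Once the manifest applies, the Calico pods and the node status can be watched until everything settles:

    kubectl get pods -n kube-system -w

    kubectl get nodes -w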

    Regards,
    -Chris

  • error: unable to read URL "https://docs.projectcalico.org/manifests/calico.yaml", server reported 404 Not Found, status code=404

    I figured out what the problem was - the older docs.projectcalico.org manifest URL now returns a 404. Running

    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

    fixed my problem. Thank you!

  • kanchana0808 Posts: 2
    edited October 2023

    @chrispokorni said:
    Hi @madhu91s,

    Please run the following command on your control plane node as the student user, and then provide the outputs of the commands requested in my previous response:

    kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.25.0/manifests/calico.yaml

    Regards,
    -Chris

    Hi Chris,
    I am on the versions below and I am setting up the cluster behind a proxy.
    Ubuntu 20.04.6 LTS
    docker v20.10.13
    containerd v1.6.24
    kubelet kubeadm kubectl v1.26.2

    Logs for your reference:
    kubectl describe nodes

    kubectl describe pod calico-kube-controllers-57b57c56f-lcn74 -n kube-system
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedScheduling 5m56s (x2 over 11m) default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
    Normal Scheduled 94s default-scheduler Successfully assigned kube-system/calico-kube-controllers-57b57c56f-lcn74 to master
    Warning NetworkNotReady 47s (x25 over 94s) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

    kubectl describe pod coredns-787d4945fb-4qjzj -n kube-system
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedScheduling 12m default-scheduler 0/1 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/not-ready: }. preemption: 0/1 nodes are available: 1 Preemption is not helpful for scheduling..
    Normal Scheduled 11m default-scheduler Successfully assigned kube-system/coredns-787d4945fb-4qjzj to master
    Warning FailedMount 11m (x7 over 11m) kubelet MountVolume.SetUp failed for volume "config-volume" : object "kube-system"/"coredns" not registered
    Warning NetworkNotReady 98s (x302 over 11m) kubelet network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized

    Could you please help here?

  • chrispokorni Posts: 2,349

    Hi @kanchana0808,

    The recommendations for the Calico network plugin installation were made for an earlier release of the training material.
    Since then, the training material has been updated to the Cilium network plugin. Please download the latest release of the lab guide for the most up-to-date installation and configuration instructions.

    The "not-ready" taint is assigned to nodes that do not satisfy the node readiness conditions for scheduling purposes, and it cannot be simply removed. From the warnings visible in the events, the network plugin seems to be the main reason for the taint and the noted failures. In order to determine why the network plugin fails, please provide details about the infrastructure hosting your cluster, such as cloud VM service or local hypervisor, VM size (CPU, MEM, disk), how many network interfaces per VM and the attached network type (nat, host, bridged,...), the VM IP address, any firewall rules protecting ingress/egress traffic of the VMs.

    You may inspect the kubelet, containerd and/or docker service logs with journalctl, to uncover additional error and failure messages:

    journalctl -u kubelet
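
    and likewise for the container runtime services:

    journalctl -u containerd

    journalctl -u docker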

    Regards,
    -Chris

  • kanchana0808 Posts: 2
    edited December 2023

    Hi Chris,
    Thank you very much for your explanation. It helped me a lot.

    I would like to close my issue with the workarounds I tried. My nodes were behind a proxy, so the docker daemon and containerd require specific proxy settings in /etc/systemd/system/docker.service.d/http-proxy.conf and /etc/systemd/system/containerd.service.d/http-proxy.conf. Additionally, I configured a proxy for docker in ~/.docker/config.json. The Kubernetes pod and service CIDRs (10.244.0.0/16, 10.96.0.0/12) are passed as no_proxy so that cluster-internal traffic does not go through the proxy. With this, the coredns ContainerCreating issue was resolved.
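
    For reference, a containerd drop-in along those lines looks like this (the proxy host and port below are placeholders for the site-specific values):

    # /etc/systemd/system/containerd.service.d/http-proxy.conf
    [Service]
    Environment="HTTP_PROXY=http://proxy.example.com:3128"
    Environment="HTTPS_PROXY=http://proxy.example.com:3128"
    Environment="NO_PROXY=localhost,127.0.0.1,10.244.0.0/16,10.96.0.0/12"

    followed by sudo systemctl daemon-reload and sudo systemctl restart containerd to pick up the change.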
