lab2.1 kubectl untainted not working

raghava.cheruku Posts: 7
edited July 2018 in LFD259 Class Forum

Hi,

Can anyone explain the mistake here? I searched https://kubernetes.io/docs for some hint on this "untainted" but found no information.

$ kubectl taint nodes --all node-role.kubernetes.io/master-node “k8master” untainted
error: at least one taint update is required

However, I went on to the next command, deploying the firstpod, and it got deployed on my k8sseond node.

Can someone explain how to make this untaint step work?

regards

Raghava

 

Comments

  • chrispokorni Posts: 388
    edited July 2018

    Hi, 

    When you use kubeadm to build a cluster with master and minion nodes, the master is tainted in order to prevent the cluster from scheduling pods on the master. Since we are only using a few nodes for these labs, in Lab 2 we remove that taint from the master, allowing the cluster to schedule pods on the master as well. 

    The lab manual (and I hope we are both looking at the same version/content) suggests using the following command in order to remove the taint (to untaint the master):


    kubectl taint nodes --all node-role.kubernetes.io/master-

    (note the "-" at the end of the command, it instructs kubectl to "remove" the taint)

    and then the output is the following:


    node "ckad-1" untainted
    taint "node-role.kubernetes.io/master:" not found

    where the first line of the output is a confirmation of the node "ckad-1" (master) being successfully untainted, and the second line is the attempt to untaint the second node, but no taint is being found (note the "--all" option used above, which instructs kubectl to remove the taint from "all" nodes, both master and minions).

    From the official documentation I found this page:

    https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/

    and there are quite a few hits if you search for "taint". Searching for "untaint" will not return many results.

    Good luck!

    -Chris 
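
A minimal sketch of the removal syntax described above, assuming a POSIX shell. The helper name taint_removal_arg is hypothetical and not part of the lab materials; it only builds the argument (the taint key followed by the trailing minus sign) and never contacts a cluster. The kubectl call is left commented out because it needs a running cluster.

```shell
# Hypothetical helper: build the argument that tells kubectl to remove a
# taint, i.e. the taint key followed by a trailing minus sign.
taint_removal_arg() {
  # $1 = taint key, e.g. node-role.kubernetes.io/master
  printf '%s-\n' "$1"
}

# Intended use against a live cluster (commented out on purpose):
# kubectl taint nodes --all "$(taint_removal_arg node-role.kubernetes.io/master)"
```

Anything typed after the minus sign, such as the node name or the word "untainted", is output from the lab manual and not part of the command.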

  • I know why you made the mistake. The lab script is really confusing - I made the same mistake...

    The steps make you think you have to type this command:
    kubectl taint nodes --all node-role.kubernetes.io/master-node “ckad-1” untainted

    But the last part is actually the output. You only type this:
    kubectl taint nodes --all node-role.kubernetes.io/master-

    And the output is:
    node/master untainted
    error: taint "node-role.kubernetes.io/master:" not found

    All in all the lab doco is pretty bad I think.

  • bixue Posts: 6

    Hi, I was unable to remove the taint node.kubernetes.io/not-ready on my minion node.

    command:
    kubectl taint nodes --all node.kubernetes.io/not-ready:NoSchedule-
    result:
    node/pc1-node2 untainted
    error: taint "node.kubernetes.io/not-ready:NoSchedule" not found

    I have run this thousands of times, but every time it prints the same output. Then I ran

    command:
    kubectl describe nodes | grep -i taint
    result:
    Taints: <none>
    Taints: node.kubernetes.io/not-ready:NoSchedule

    The taint is so stubborn that it does not go away. Could someone tell me how I can debug this issue?

    Regards,
    Bin

  • chrispokorni Posts: 388
    edited November 2018

    Hi Bin,
    "NoSchedule" is the effect of the taint and it should not be a part of the taint removal command.
    Look closely at the lab exercise, and compare with your command provided above.

    -Chris

  • serewicz Posts: 553

    I think if you read the paragraph of the step, you should find a sentence "Note the minus sign (-) at the end, which is the syntax to remove a taint". As well there is an extra space to indicate the end of the command and the output.
    Regards,

  • bixue Posts: 6
    edited November 2018

    Hi Chris,

    Apologies, I tried both
    command
    kubectl taint nodes --all node.kubernetes.io/not-ready-

    and

    kubectl taint nodes --all node.kubernetes.io/not-ready:NoSchedule-

    neither worked :(

    Both of them returned
    node/pc1-node2 untainted
    error: taint "node.kubernetes.io/not-ready:" not found

    It's strange that the command returned success but the taint is actually not removed.

    Regards,
    Bin

  • bixue Posts: 6

    Hi serewicz,

    I had the minus sign (-) at the end. What do you mean by the extra space? I tried

    kubectl taint nodes --all node.kubernetes.io/not-ready-

    with an extra space at the end of the command, but the result is

    node/pc1-node2 untainted
    error: taint "node.kubernetes.io/not-ready:" not found

    Regards,
    Bin

  • chrispokorni Posts: 388
    edited November 2018

    Hi Bin,
    From both responses above the very first command seems to be correct.
    The lab instructions mention that it takes a while and a few attempts for the taint removal to be successful.
    It is worth noting that once the command is successful and the taint is removed, any subsequent attempt would produce that same output: the error that the taint is not found.
    Did you verify that the taints are still there or have been removed?
    Run the first command in step 12. Can you provide that output?

    Thanks,
    -Chris

  • serewicz Posts: 553

    Hello Bin,

    The extra space is seen between the command and the output in the book. Typically command output is on the very next line, but the book shows an extra space to help illustrate that the last character to be typed would be the minus sign (-), and what follows in the book is output.

    The output you posted, showing

    node/pc1-node2 untainted

    error: taint "node.kubernetes.io/not-ready:" not found

    indicates the taint was removed from the first node, but was not on the second node. If you had three nodes in your cluster you would see three lines of output. If the taint has not been set on a node you would get a "not found" output. If it is set on the node you would see the "untainted" output.

    Regards,
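
The per-node reading of the output can be sketched as a small filter, assuming a POSIX shell with awk. The function name summarize_taint_output is hypothetical; it reads saved kubectl taint output on standard input and counts the nodes where the taint was removed ("untainted") and the nodes where it was not present ("not found"). It relies only on the line endings shown in the posts above.

```shell
# Hypothetical helper: summarize `kubectl taint ... --all` output read
# from stdin, one line per node as described in the post above.
summarize_taint_output() {
  awk '
    / untainted$/ { removed++ }   # taint existed on this node and was removed
    / not found$/ { absent++ }    # taint was not set on this node
    END { printf "removed=%d absent=%d\n", removed, absent }
  '
}

# Intended use against a live cluster (commented out on purpose):
# kubectl taint nodes --all node-role.kubernetes.io/master- 2>&1 | summarize_taint_output
```

A result such as removed=1 absent=1 on a two-node cluster means the command did its job, even though one line starts with "error:".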

  • bixue Posts: 6

    Hi Chris,

    The output is
    Taints: <none>
    Taints: node.kubernetes.io/not-ready:NoSchedule

    I have been running
    kubectl taint nodes --all node.kubernetes.io/not-ready-
    since yesterday and the output is always like above

    Regards,
    Bin

  • bixue Posts: 6
    edited November 2018

    Hi Serewicz,

    Thank you very much for the detailed explanation. I totally understand what the book is trying to say. It's just that my setup does not seem to produce the expected result. Btw, I'm using VirtualBox on Mac with 2 Ubuntu 16.04 nodes.

    Regards,
    Bin

  • bixue Posts: 6

    Hi Guys,

    Thanks for the help. I reset the cluster and set it up again, and everything started working. Appreciate your time :)

    Regards,
    Bin

  • suser Posts: 10

    Hello,
    I have the same problem here: I am not able to untaint my nodes during lab 2.2. I use a qemu VM on proxmox. Is the only solution for this problem really to reset the whole cluster?
    I tried many times, waiting several minutes, and I typed the command with no extra spaces.

    kubectl get node
    NAME STATUS ROLES AGE VERSION
    kmaster NotReady master 25h v1.17.1
    kw1 NotReady <none> 23h v1.17.1

    kubectl taint nodes --all node.kubernetes.io/not-ready-
    node/kmaster untainted
    node/kw1 untainted

    kubectl get node
    NAME STATUS ROLES AGE VERSION
    kmaster NotReady master 25h v1.17.1
    kw1 NotReady <none> 23h v1.17.1

    kubectl describe nodes | grep -i taints
    Taints: node.kubernetes.io/not-ready:NoSchedule
    Taints: node.kubernetes.io/not-ready:NoSchedule

    Regards,
    Stefan

  • chrispokorni Posts: 388

    Hello Stefan,

    The taints found on your nodes are generated by the cluster to indicate that the nodes are not ready, and they will automatically be removed once your nodes become ready. Issuing kubectl describe node <node-name> command may indicate why your nodes are not ready. Please provide the output of that command and the output of kubectl get pods --all-namespaces.

    Those outputs may help us troubleshoot your cluster.

    Regards,
    -Chris
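
Because the not-ready taint follows the node status, a first step is to list which nodes are NotReady and then describe each of them. The following is a minimal sketch assuming a POSIX shell with awk and the default column layout of kubectl get nodes (NAME first, STATUS second). The function name not_ready_nodes is hypothetical.

```shell
# Hypothetical helper: read `kubectl get nodes` output from stdin and
# print the names of nodes whose STATUS column starts with NotReady.
not_ready_nodes() {
  awk 'NR > 1 && $2 ~ /^NotReady/ { print $1 }'   # NR > 1 skips the header row
}

# Intended use against a live cluster (commented out on purpose):
# for n in $(kubectl get nodes | not_ready_nodes); do
#   kubectl describe node "$n"
# done
```

The Conditions and Events sections of the describe output are usually where the reason for NotReady shows up.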

  • suser Posts: 10
    edited March 30

    Thanks Chris,

    Meanwhile I ran kubeadm reset and have re-created only the master node so far, and I am still unable to untaint my kmaster node.
    Here is the output of kubectl get pods --all-namespaces and related commands:

    $ kubectl describe nodes | grep -i taints
    Taints: node.kubernetes.io/not-ready:NoExecute
    $ kubectl describe nodes | grep -i taints
    Taints: node.kubernetes.io/not-ready:NoExecute
    $ kubectl get nodes
    NAME STATUS ROLES AGE VERSION
    kmaster NotReady master 8m25s v1.17.1
    $ kubectl get pods --all-namespaces
    NAMESPACE NAME READY STATUS RESTARTS AGE
    kube-system coredns-6955765f44-rvdv7 0/1 Pending 0 8m13s
    kube-system coredns-6955765f44-sszvz 0/1 Pending 0 8m13s
    kube-system etcd-kmaster 1/1 Running 0 8m8s
    kube-system kube-apiserver-kmaster 1/1 Running 0 8m8s
    kube-system kube-controller-manager-kmaster 1/1 Running 0 8m8s
    kube-system kube-proxy-z24hm 1/1 Running 0 8m13s
    kube-system kube-scheduler-kmaster 1/1 Running 0 8m8s

    My attempts to untaint keep reading:

    $ kubectl taint nodes --all node.kubernetes.io/not-ready-
    node/kmaster untainted
    $ kubectl describe nodes | grep -i taints
    Taints: node-role.kubernetes.io/master:NoSchedule

    I also attached the output of kubectl describe node kmaster command

    Thanks in advance!

    Stefan

  • chrispokorni Posts: 388
    edited March 30

    Hi Stefan,

    It seems your coredns pods are not running. That is the reason why your node never reaches ready state. Delete both coredns pods and allow the cluster to re-create them for you, while keeping an eye on their state. Once they show a running state, check your master node again. It should now show ready.

    Regards,
    -Chris

  • serewicz Posts: 553

    Hello,

    Notice that your coredns pods are both showing as Pending. This would cause the node to be listed as NoSchedule.

    What environment are you using to run the labs? GCE, AWS, DigitalOcean?

    When you run kubectl describe for one of the coredns pods, what are the messages in the output at the end?

    The issue is these pods, not the taint.

    Regards,

  • suser Posts: 10

    Hi Chris,

    I use a Qemu VM on a local proxmox type 1 hypervisor with plenty of resources, so latency shouldn't be an issue.

    $ sudo kubectl describe coredns-6955765f44-rvdv7
    error: the server doesn't have a resource type "coredns-6955765f44-rvdv7"

    Now the pods show a different status:

    $ kubectl get pods --all-namespaces
    NAMESPACE NAME READY STATUS RESTARTS AGE
    kube-system coredns-6955765f44-rvdv7 0/1 ContainerCreating 0 89m
    kube-system coredns-6955765f44-sszvz 0/1 ContainerCreating 0 89m
    kube-system etcd-kmaster 1/1 Running 0 89m
    kube-system kube-apiserver-kmaster 1/1 Running 0 89m
    kube-system kube-controller-manager-kmaster 1/1 Running 0 89m
    kube-system kube-proxy-dqb6x 1/1 Running 0 38m
    kube-system kube-proxy-z24hm 1/1 Running 0 89m
    kube-system kube-scheduler-kmaster 1/1 Running 0 89m

    Stefan

  • chrispokorni Posts: 388

    Coredns pods not running has nothing to do with latency issues. They cannot run because they never receive IP addresses, which should be provided by calico. You have no calico pods running. Did calico get downloaded together with the required rbac file? Have the calico pods been started and the rbac permissions created?

    Regards,
    -Chris
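
A quick way to confirm the missing network plugin is to count the Calico pods in the pod listing. This is a minimal sketch assuming a POSIX shell with awk, and assuming that Calico pod names begin with "calico" (for example calico-node-...), which may differ in other Calico versions. The function name calico_pod_count is hypothetical.

```shell
# Hypothetical helper: read `kubectl get pods --all-namespaces` output from
# stdin and print how many pods have a name starting with "calico".
# Assumption: Calico pods follow the calico-* naming pattern.
calico_pod_count() {
  awk 'NR > 1 && $2 ~ /^calico/ { n++ } END { print n + 0 }'   # $2 is the NAME column
}

# Intended use against a live cluster (commented out on purpose):
# kubectl get pods --all-namespaces | calico_pod_count
```

A count of 0, as in the listing posted above, suggests the Calico manifests were never applied successfully.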

  • suser Posts: 10

    Chris,

    I ran the init command as given in the lab 2.2 file, "sudo kubeadm init --kubernetes-version 1.17.1 --pod-network-cidr 192.168.0.0/16", but the VM IP is 10.1.10.30. Could this be the issue?

    Stefan

  • suser Posts: 10
    edited March 31

    Chris,
    I ran the whole Lab 2.2 script k8sMaster.sh again step by step and discovered that the lines
    wget --no-check-certificate https://tinyurl.com/yb4xturm -O rbac-kdd.yaml
    and
    wget --no-check-certificate https://tinyurl.com/y2vqsobb -O calico.yaml
    do not download any yaml file, but some GIF files instead.
    Therefore the lines
    kubectl apply -f rbac-kdd.yaml
    kubectl apply -f calico.yaml
    generate format-related errors, and I am not able to set up the environment correctly using the given materials.
    Can you help with the correct files rbac-kdd.yaml and calico.yaml?

    Thank you!

  • chrispokorni Posts: 388

    It seems to be a strange behavior, which I was not able to reproduce. However, the script in the Solutions tarball does not use the --no-check-certificate option.

    It could be that your instances do not correctly resolve the URLs. Did you check the resolv.conf files of your instances?

    If all else fails, just download the correct files by clicking on the two working links above and then create the yaml files manually.

    Regards,
    -Chris
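
Before running kubectl apply, it can help to check what wget actually saved. The following is a minimal sketch assuming a POSIX shell and a head that supports -c. It relies on the fact that GIF files start with the ASCII signature GIF87a or GIF89a. The function name download_kind is hypothetical.

```shell
# Hypothetical helper: report whether a downloaded file is really a GIF
# image by inspecting its first four bytes.
download_kind() {
  # $1 = path to the downloaded file
  case "$(head -c 4 "$1")" in
    GIF8) echo gif ;;     # matches both GIF87a and GIF89a signatures
    *)    echo other ;;   # not a GIF; may or may not be valid YAML
  esac
}

# Intended use after the wget steps (commented out on purpose):
# download_kind calico.yaml
# download_kind rbac-kdd.yaml
```

A result of "gif" means the short URL was not redirected to the manifest, so the file should be fetched another way before it is applied.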

  • serewicz Posts: 553

    Hello Stefan,

    Please share, what operating system are you using?

    Where are you running the labs? GCE, AWS, Digital Ocean?

    Regards,

  • suser Posts: 10
    edited March 31

    Hi Chris,

    I run the labs on a VM which runs on a local hypervisor (proxmox, as I mentioned); I do not use any of the vendors you mentioned.
    I use Ubuntu 18 LTS on the VM.

    Stefan

  • suser Posts: 10

    @chrispokorni Hello, I have to add the flag --no-check-certificate because the site https://tinyurl.com/y2vqsobb uses a self signed certificate, and I cannot connect without this flag from my end and I guess this is the normal default behavior. Maybe the class materials are not good.

    Stefan

  • serewicz Posts: 553
    edited March 31

    Hello,

    Thank you for letting us know what service and OS you are using for the labs; I must have missed that before. I'm not familiar with proxmox, but I'll take a look. In my experience, when these sorts of issues happen it ends up being some feature or security setting which blocks the appropriate messages from being sent.

    When the lab is run using GCE, AWS, Digital Ocean, VirtualBox, QEMU/KVM, and bare metal it works as written. This would lead me to believe there is something unknown inside of proxmox which is causing the issues. With the assumption you are using copy and paste, and not typing URLs by hand, my first guess is something network related: that something is blocked between nodes, or VMs. My second guess is the hypervisor is not translating the commands as expected through LXC, or not presenting the network interfaces in an expected manner.

    Are you in a secure and locked-down environment such that you cannot use self-signed certificates?

    Regards,

  • suser Posts: 10

    Hello,
    Yes, behind my firewall I cannot accept a self-signed certificate, but I am fine with adding the flag manually. Even so, I am not getting the yaml files this way. Can you supply them here for me?
    @Chris What DNS setting should I use? I can change that. (Currently my VM uses the host dns 1.1.1.1 and 1.0.0.1 through the router).

    Stefan

  • chrispokorni Posts: 388
    edited March 31

    Hi Stefan,

    Since you have the links, just open them in a browser, copy the files over and you should be able to produce the yaml manifests in your environment. It seems to be a simple workaround since your environment does not permit the download of such files. Providing them "here" would change the yaml formatting which would not help you in any way.

    So far I have not seen such behavior on GCE, AWS EC2, DigitalOcean, Virtualbox, or even on Minikube. Using different environment variations may imply slightly different configuration and installation options. That is to be expected since each environment treats features differently: networking, firewall rules, provisioning and some of the installation options. Once the cluster is up and running, then all else would work as presented.

    Regards,
    -Chris

  • suser Posts: 10

    Thank you very much. I got the files using a VPN; at this point I am unable to allow tinyurl redirections to the full URL from my end. I will let you know how it goes.

    Stefan

  • suser Posts: 10
    edited March 31

    Thanks @serewicz and @chrispokorni, it worked smoothly once I had the yaml files. I had never used tinyurl before; please note that it is not suitable for every environment.
    Stefan
