LFD259 Lab 2.2 Paragraph 8

Hello - I ran the control plane script on the control plane node and the worker script on the worker node. I then ran the grep command on the cp.out file on the control plane node to get the relevant "join" command. As per the instructions, I copied this command line by line into the terminal of my worker node to join the worker to the cluster. A couple of things. First, I had to move the IP address from where it appeared in the cp.out file. Second, the cp.out file did not contain the \ character shown in the instructions. If I don't make these changes, the command doesn't work at all. This is the command I ran (I omit the IP address and part of the SHA hash):
sudo kubeadm join --token a4ata2.2xf8dh10foz6w9v7 \ XXX.XX.XX.XX:6443 --discovery-token-ca-cert-hash \ sha256:da9e5763788a3d6193e269125871f9c453c06ca7b234d49bf5a6fdXXXXXXXXXX

When I run it I get these errors:

[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-10250]: Port 10250 is in use
[ERROR HTTPProxy]: parse "https:// 172.31.19.95": invalid URL escape "%20"
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher

I have no idea what is wrong.

Best Answer

  • chrispokorni Posts: 2,341
    Answer ✓

    Hi @rasputin312,

    It seems you have four distinct security groups. One inbound sg should be enough, with both EC2 instances sharing the same sg.

    I would recommend rewatching the AWS video guide from the introductory chapter of this course, which shows the desired configuration for the lab environment.

    Regards,
    -Chris

Answers

  • chrispokorni Posts: 2,341

    Hi @rasputin312,

    The "port in use" error means you ran the sudo kubeadm join command several times in a row. It is recommended to run sudo kubeadm reset on the worker node prior to running the join command once again.

    The "invalid URL" error is caused by a typo in the URL. You need to remove the unnecessary space from the URL.

    The backslash character " \ " is necessary when you want to split a long single-line command into shorter pieces and execute it as a multi-line command. The join command as generated and stored in the cp.out file can be selected in its entirety, then copied and pasted with no need to add (except sudo in front of it) or remove anything from it.
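
To illustrate the point about the backslash, here is a minimal sketch in plain shell. `show_args` is a hypothetical helper, and the address and token are placeholders, not values from this thread. It prints each argument in brackets so the single-line and multi-line forms can be compared. The last case hints at one plausible source of the stray space in the "invalid URL" error: a backslash followed by a space, rather than by a line break, escapes the space and makes it part of the next argument.

```shell
# Hypothetical helper: print every argument on its own line, in brackets.
show_args() { printf '[%s]\n' "$@"; }

# Single-line form.
one_line=$(show_args join 10.0.0.1:6443 --token abc)

# Multi-line form: the backslash must be the LAST character on the line,
# so that it cancels the line break and nothing else.
multi_line=$(show_args join 10.0.0.1:6443 \
    --token abc)

# Backslash followed by a space in the middle of a line: the space is
# escaped and becomes the first character of the next argument.
mid_line=$(show_args join --token abc \ 10.0.0.1:6443)

printf 'one line:\n%s\n\nmulti line:\n%s\n\nmid-line backslash:\n%s\n' \
    "$one_line" "$multi_line" "$mid_line"
```

The first two forms produce identical argument lists, while the third passes the address with a leading space.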

    Regards,
    -Chris

  • Sorry, I finished work and started this up again, and it still doesn't work. This is what my cp.out file on the control plane node shows:

    Then you can join any number of worker nodes by running the following on each as root:

    kubeadm join 172.31.19.85:6443 --token a4ata2.2xf8dh10foz6w9v7 \
    --discovery-token-ca-cert-hash sha256:da9e5763788a3d6193e269135871f9c453c06ca7b234d49bf5a6fd3308f80c73

    Now, if I go to my worker node and literally copy and paste from cp.out, the shell shows a > caret at the left-hand side:

    sudo kubeadm join 172.31.19.95:6443 --token a4ata2.2xf8dh10foz6w9v7 \
        --discovery-token-ca-cert-hash sha256:da9e5763788a3d6193e269125871f9c453c06ca7b234d49bf5a6fd3308f80c73

    and then it gives me the port error again, no matter how many times I run sudo kubeadm reset.

    I thought the caret might be the issue, so I combined the commands into one line by removing the backslash, and then it tells me I have three arguments and it only accepts one.

    I would point out that the format of the text in cp.out does not exactly match the format in the written instructions. In the written instructions the IP address comes after the token, but in cp.out the IP address comes before the token.

    Again, have no idea how to move past this.

  • chrispokorni Posts: 2,341

    Hi @rasputin312,

    One thing I can suggest is to provision a new worker node. Then, instead of copying and pasting from the cp.out file, generate a new bootstrap token and a new join command on the cp node (this avoids any possible extra characters):

    sudo kubeadm token create --print-join-command

    The output should be an entire join command with no additional characters.

    Regards,
    -Chris

  • I literally just type the above without all this? So step by step: (i) I delete the old instance; (ii) create a new instance; (iii) run the worker script; and (iv) then I just run the above line of script - that is it? I ignore this step below:

    kubeadm join 172.31.19.85:6443 --token a4ata2.2xf8dh10foz6w9v7 \
    --discovery-token-ca-cert-hash sha256:da9e5763788a3d6193e269135871f9c453c06ca7b234d49bf5a6fd3308f80c73

  • chrispokorni Posts: 2,341

    Hi @rasputin312,

    You generate a complete join command on the cp node with this command: sudo kubeadm token create --print-join-command. Then you copy the output, paste it on the worker node, and run it (with sudo). The join command is not ignored; it is still required to run on the worker node.

    Regards,
    -Chris

  • Thanks for your help, but this still doesn't seem to be working. I terminated the old worker node, set up a new worker node, and ran the worker script. I think that worked. Then I ran the join instruction. I get this:

    ubuntu@ip-172-31-23-82:~$ sudo kubeadm join 172.31.19.95:6443 --token a4ata2.2xf8dh10foz6w9v7 --discovery-token-ca-cert-hash sha256:da9e5763788a3d6193e269125871f9c453c06ca7b234d49bf5a6fd3308f80c73
    [preflight] Running pre-flight checks
    error execution phase preflight: couldn't validate the identity of the API Server: Get "https://172.31.19.95:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
    To see the stack trace of this error execute with --v=5 or higher

    I looked at the other messages and I saw your comments about the security group and the firewalls. As far as I can discern -- I am enabling all traffic from all ports. See below for both nodes. So I have no idea why this is holding up.

    MASTER [screenshot not preserved]

    WORKER [screenshot not preserved]

    If you have any other ideas - I am all ears. Otherwise . . . I think I am just out the tuition.
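
A timeout like the one in the post above often points to the worker being unable to open a TCP connection to port 6443 on the control plane node. The following is a minimal sketch for checking that from the worker before retrying the join. `can_reach` is a hypothetical helper, not part of kubeadm or the course scripts, and it assumes bash and the coreutils `timeout` command are installed.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within the given number of seconds (default 5).
# Assumes bash (for /dev/tcp) and coreutils `timeout` are available.
can_reach() {
    host=$1
    port=$2
    secs=${3:-5}
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # $0 and $1 inside the quoted script are the host and port.
    timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example, run on the worker (address taken from the join command above):
#   if can_reach 172.31.19.95 6443; then
#       echo "API server port is reachable"
#   else
#       echo "cannot connect: check security group and firewall rules"
#   fi
```

If the check fails while both instances are running, the inbound rules of the security group are a reasonable first place to look.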

  • Thanks. I terminated all the instances and rebuilt from scratch again today and finally got in. Just FYI . . . the EC2 user interface has changed since they did the video and so I thought I did what he indicated, but if you don't click some extra buttons you don't see that they ALREADY give you a default security configuration and so you need to delete that and then do your own.

  • For anyone else experiencing the ERROR Port 10250 (same thing for Port 10257/10259)

    I had this issue and could not figure it out for the life of me without extensive research.

    The culprit was kubelite using the ports, so no amount of kubeadm reset (with sudo or without) could resolve it in my case,

    until I did:

    sudo netstat -tlpen | grep 10259 # or any of the ports mentioned above; this revealed that kubelite was using the port

    Next:

    pkill kubelite # This kills the kubelite process, and if you run the kubeadm init process again, it works!
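
The port check above can also be done with `ss`, which is available on hosts where netstat is not installed. This is a minimal sketch under stated assumptions: `listeners_on_port` is a hypothetical helper, and it assumes the usual `ss -tlnp` layout, where the fourth column is the local address and port and the process appears as users:(("name",pid=...,fd=...)).

```shell
# Hypothetical helper: read `ss -tlnp` output on stdin and print the name
# of each process listening on the given TCP port.
listeners_on_port() {
    port=$1
    # Keep rows whose local address (4th column) ends in ":PORT",
    # then extract the process name from the users:(("name",...)) field.
    awk -v p=":$port" '$4 ~ (p "$")' |
        sed -n 's/.*users:(("\([^"]*\)".*/\1/p'
}

# Example (root is needed to see processes owned by other users):
#   sudo ss -tlnp | listeners_on_port 10250
```

If the name printed is something other than kubelet, for example kubelite, then kubeadm reset alone will not free the port.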

    Notes: I was using vmware fusion for my virtualisation and running ubuntu 22.04 from a Mac

    Hopefully this saves you the pain I had to go through.

  • chunkuoli Posts: 3
    edited September 2023

    For me, these are the 2 steps to solve this issue.

    (1) In the security group shared by the two EC2 instances (CP and worker), make sure it has these 3 inbound rules: [screenshot not preserved]

    (2) Once the 2 nodes are up and running, at paragraph 8 of lab 2.2, add the following line to .bashrc:
    "export KUBECONFIG=/etc/kubernetes/kubelet.conf"
    and then run ". .bashrc"

    After those 2 steps, paragraph 8 should run through successfully.

    There could also be some file permission issues. However, these are straightforward to solve with "sudo chmod +rwx [filename]"

    Hope this helps.
