Unable to join the 2nd node to the master with the kubeadm join command

I was able to create the master node with k8sMaster.sh and can see it in my instance.
kubectl get node
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   3d22h   v1.16.1
After executing k8sSecond.sh, I am not able to join the 2nd node to the same cluster the master is in. The command just hangs!
sudo kubeadm join 10.128.0.2:6443 --token <token i got from master.out> --discovery-token-ca-cert-hash sha256:<value i got from master.out> --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
    [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING Port-10250]: Port 10250 is in use
    [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
Any help is appreciated.
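Those warnings typically mean kubeadm has already been run on this node (an init, or an earlier join attempt). A quick check for leftover state before joining, as a sketch (these commands are not from the lab scripts):

# Run on the node you are about to join; all of these should come back empty
ls /etc/kubernetes/manifests        # should be an empty directory on a fresh worker
ls /etc/kubernetes/kubelet.conf     # should not exist yet
sudo ss -lntp | grep 10250          # nothing should already be listening on the kubelet port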
Comments
I tried resetting the master node and ran everything again. Steps I have followed:
1. sudo kubeadm reset
2. bash k8sMaster.sh
3. mkdir -p $HOME/.kube
4. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
5. sudo chown $(id -u):$(id -g) $HOME/.kube/config
6. kubectl apply -f calico.yaml
7. bash k8sSecond.sh
So after this, when I run the kubeadm join command, even though it executes successfully, I don't see the node in the cluster!
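To narrow this down, it can help to check both sides (a sketch using standard commands, not the lab scripts):

# On the master: did the worker register at all?
kubectl get nodes -o wide

# On the worker: did the kubelet actually start?
sudo systemctl status kubelet
sudo journalctl -u kubelet --no-pager | tail -20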
Hi,
The first issue was caused by kubeadm init and kubeadm join being run on the same node. Also, keep in mind that the token issued by the master expires after a certain time (24 hours by default), and an expired token will prevent another host from using it to join the cluster.
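To check a token's remaining lifetime, or mint a fresh one together with the full join command, run this on the master (standard kubeadm subcommands, not lab-specific):

sudo kubeadm token list
sudo kubeadm token create --print-join-command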
On the second attempt, did you also run the rbac file for calico? Did you reset the worker node before running kubeadm join?
Are your firewalls disabled and traffic allowed to all ports for all protocols?
Are your nodes sized accordingly?
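For the firewall question, checks along these lines can confirm it (the GCP rule name and source range below are examples only; match them to your VPC):

# On each Ubuntu node: the host firewall should be inactive for the lab
sudo ufw status

# On GCP: a permissive rule for the lab network
gcloud compute firewall-rules create lab-allow-all \
    --network=default --allow=all --source-ranges=10.128.0.0/9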
Regards,
-Chris
Yes, you were right. I created another VM instance and was able to create the minion node.
I had to run the rbac file as well.
I am now able to create the master and minion nodes and set them up in the cluster. Thanks!
This might be useful to someone: use the calico.yaml from the official documentation - https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml
With the one from the course, so many RBAC issues kept showing up!
Sorry, I posted the wrong link. Here is the latest version - https://docs.projectcalico.org/v3.11/manifests/calico.yaml
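For what it's worth, applying it is a single step, and as far as I can tell the unified v3.11 manifest already bundles its own RBAC resources, so a separate rbac file shouldn't be needed with it:

kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml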
Did you apply the included rbac yaml file?
I have a similar issue. Created the master node successfully. When attempting to create the minion node, kubeadm gets stuck in the preflight checks.
Created a new kubeadm token since it had been more than 2 hours since I created the master node.
On the minion node, it just gets stuck:
Found the following traces in /var/log/syslog:
Any help would be great!
Hi @AteivJain,
Is the
/etc/hosts
file configured correctly on your worker node?
-Chris
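For reference, the worker's /etc/hosts entry from the lab looks something like this (the k8smaster alias and the IP are the lab's example values; substitute your master's internal IP):

10.128.0.2   k8smaster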
Hi @chrispokorni
I configured it correctly based on the lab. It's the same as what is configured on the master node:
I even tried to nuke the worker node, but still getting the same error.
From your syslog:
... /var/lib/kubelet/config.yaml: no such file or directory
Maybe you already tried regenerating the token on the master via
sudo kubeadm token create --print-join-command
and re-joining from the worker?
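That command prints the complete join command; typical output looks like this (all values here are placeholders):

kubeadm join 10.128.0.2:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:<64-hex-digit hash>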
Hi @gfalasca
The syslog output is from an older deployment which I tried when I first posted here. I deployed the master again from scratch and then tried to join the worker node. This time I didn't create a new token, as it was within 2 hours. I still see the same error in syslog:
From Master node:
Hi @AteivJain,
If you only bootstrapped the master from scratch, and not the worker, then after every unsuccessful
sudo kubeadm join ...
command, a
sudo kubeadm reset
is required on the worker node to clear the partially configured environment before the next
sudo kubeadm join ...
Regards,
-Chris
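The full cycle on the worker would look like this sketch (the token and hash come from the master, as described above):

# Clear any partial state from earlier attempts
sudo kubeadm reset

# Then retry with the join command printed by the master
sudo kubeadm join 10.128.0.2:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>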
Hello,
When troubleshooting, it can be helpful to see the command you typed as well as the output. One note: I have encountered some folks who copy the join command but paste it into a word pad first; they then have issues, as an extra carriage return gets inserted.
Start with fresh instances and follow the labs as written. When it comes time to copy and paste, be careful to select one line at a time and also omit the backslash. I find that if this works, the issue was an accidental insertion by the tools used to copy and paste. The issue seems to be tied to some versions of WordPad on Microsoft Windows as well as some notepad options on Macs.
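Concretely, kubeadm often prints the join command wrapped across lines with backslashes; retyping it as a single line avoids the stray carriage-return problem entirely (values are placeholders):

sudo kubeadm join 10.128.0.2:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>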
Regards,
@chrispokorni I bootstrapped both the master and worker by deleting the instance group on GCP and then re-deploying.
@serewicz Thanks for pointing that out. I did try the lab from scratch and tried my best not to make mistakes while copying the commands. For longer commands, I copied them onto the node using vim. Not sure what else I can try...
Hmm. Okay. I will put together a quick video of the installation of the master node and the worker. It will take me a bit. Perhaps once you see it work for me you can tell what we are doing differently.
Would that be helpful?
Regards,
@serewicz That would be great. In the meantime, I'll give it a try again and keep troubleshooting. Will keep the thread updated. Thanks again for all the help!
@serewicz I tried again from scratch. This time I typed all the commands, making sure I didn't miss anything. While joining the cluster, I enabled verbose logging and found the following traces. Not sure if it points anywhere.
Found the error. I had the wrong DNS entry on the worker node. Thanks for all the input. @serewicz
Hi @AteivJain, where did you check for the DNS entry on the worker node?
Hello @amitraisharma
The lab has edits to the /etc/hosts file. Check there.
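A quick way to verify on the worker (assuming the lab's k8smaster alias; adjust to whatever name you used):

grep k8smaster /etc/hosts
getent hosts k8smaster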
Regards,
What I stupidly did while completing the lab for LFS258 was running
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
on the worker VM instead of the master VM. Once I realized my mistake, I ran it again on the master VM and then used the generated hash in the kubeadm join command on the worker VM, which then joined fine.
Hope this helps someone
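One way to sanity-check this: run both of the following on the master; the hex digest printed by the first should match the sha256:<hash> value in the join command printed by the second (a sketch using standard commands):

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
sudo kubeadm token create --print-join-command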
I had a similar issue while using Calico for my local cluster setup (based on HyperKit). Replacing Calico with Weave Net solved it for me.
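For reference, the Weave Net install documented at the time was a single kubectl apply (this is the command from Weave's own docs of that era; check their current docs before relying on it):

kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"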
Thank you for the feedback. Is HyperKit still something only for macOS, or has it been ported to other platforms?