Unsuccessful in joining the 2nd node to the master with the kubeadm join command
I was able to create the master node with k8sMaster.sh and can see it on my instance:
kubectl get node
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   3d22h   v1.16.1
After executing k8sSecond.sh, I am not able to join the 2nd node to the same cluster the master is in. The command just hangs!
sudo kubeadm join 10.128.0.2:6443 --token <token i got from master.out> --discovery-token-ca-cert-hash sha256:<value i got from master.out> --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
    [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING Port-10250]: Port 10250 is in use
    [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
Any help is appreciated.
Comments
-
I tried resetting the master node and ran everything again. These are the steps I followed:
1. sudo kubeadm reset
2. bash k8sMaster.sh
3. mkdir -p $HOME/.kube
4. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
5. sudo chown $(id -u):$(id -g) $HOME/.kube/config
6. kubectl apply -f calico.yaml
7. bash k8sSecond.sh
So after this, I ran the kubeadm join command; even though it executed successfully, I don't see the node in the cluster!
sudo kubeadm join 10.128.0.2:6443 --token ### --discovery-token-ca-cert-hash sha256:### --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
    [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING Port-10250]: Port 10250 is in use
    [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

username@ubuntu-bionic-1:~$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   5m47s   v1.16.1
-
Hi,
The first issue was caused by kubeadm being run twice on the same node; the preflight warnings show Kubernetes had already been configured there. Also, keep in mind that the token issued by the master expires after a certain time, which will prevent another host from using it to join the cluster.
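As a sketch, checking and regenerating the token on the master looks like this (standard kubeadm subcommands; the printed join command includes a matching hash):

# On the master: list existing bootstrap tokens and their expiry
sudo kubeadm token list

# Mint a fresh token and print the complete worker join command
sudo kubeadm token create --print-join-command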
On the second attempt, did you also run the rbac file for calico? Did you reset the worker node before running kubeadm join?
Are your firewalls disabled and traffic allowed to all ports for all protocols?
Are your nodes sized accordingly?
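On the firewall question, a sketch of what to check on the Ubuntu nodes (assuming ufw is the host firewall; on GCP the VPC firewall rules also need to allow all traffic between the nodes):

# On each node: confirm the host firewall is inactive or permissive
sudo ufw status

# Inspect iptables rules that might be dropping node-to-node traffic
sudo iptables -L -n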
Regards,
-Chris
-
Yes, you were right. I created another VM instance and was able to create the minion node.
I had to run the rbac file as well.
I am now able to create the master and minion node and set them up in the cluster. Thanks!
-
Might be useful to someone. Use the calico.yaml from the official documentation - https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml
With the one from the course, so many rbac issues kept showing up!
-
Sorry, I posted the wrong link. Here is the latest version - https://docs.projectcalico.org/v3.11/manifests/calico.yaml
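Applying it is a single command (a sketch; note the v3.11 manifest bundles its own rbac definitions, unlike the older split rbac file mentioned elsewhere in this thread):

# On the master, after kubeadm init
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml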
-
Did you apply the included rbac yaml file?
-
I have a similar issue. Created the master node successfully, but when attempting to create the minion node, kubeadm gets stuck in the preflight checks.
- Tried to restart the master --> didn't help
- Created a new kubeadm token, since it had been more than 2 hours since I created the master node.
student@lfd259-ateiv-htql:~$ sudo kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
6vmwmi.wto27a2jk26xey22   13h   2020-01-09T16:38:07Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
v150sj.mawz91pqhcd4h7ng   23h   2020-01-10T02:55:28Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
On the minion node, it just gets stuck:
root@lfd259-ateiv-tbjs:~# kubeadm join --token v150sj.mawz91pqhcd4h7ng k8smaster:6443 --discovery-token-ca-cert-hash sha256:9a51c55cda7ba19e173c93d9587b9aad8914d10b4ebb8749104a897b370960ef
[preflight] Running pre-flight checks
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Found the following traces in /var/log/syslog:
Jan  9 03:37:37 lfd259-ateiv-tbjs kubelet[2883]: F0109 03:37:37.492973    2883 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Unit entered failed state.
Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Failed with result 'exit-code'.
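A sketch of watching the kubelet directly while retrying the join, using standard systemd tooling (note that /var/lib/kubelet/config.yaml is normally written by the kubelet-start phase of kubeadm join, so its absence means the join never got that far):

# On the worker: check the current kubelet service state
systemctl status kubelet

# Follow the kubelet logs while re-running kubeadm join
journalctl -u kubelet -f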
Any help would be great!
-
I configured it correctly based on the lab. It's the same as what's configured on the master node:
root@lfs258-example-td92:~# cat /etc/hosts
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
169.254.169.254 metadata.google.internal metadata
10.168.0.3 k8smaster
10.168.0.3 lfs258-example-td92.us-west2-c.c.astral-reef-264415.internal lfs258-example-td92  # Added by Google
169.254.169.254 metadata.google.internal  # Added by Google
I even tried to nuke the worker node, but I'm still getting the same error.
-
From your syslog
... /var/lib/kubelet/config.yaml: no such file or directory
Maybe you already tried regenerating the token on the master via
sudo kubeadm token create --print-join-command
and re-joining from the worker.
-
Hi @gfalasca
The syslog output is from an older deployment which I tried when I first posted here. I deployed the master again from scratch and then tried to join the worker node. This time I didn't create a new token, as it was within 2 hours. I still see the same error in syslog:
Jan  9 18:35:43 lfs258-example-td92 kubelet[1157]: F0109 18:35:43.594741    1157 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Unit entered failed state.
Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan  9 18:35:53 lfs258-example-td92 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan  9 18:35:53 lfs258-example-td92 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan  9 18:35:53 lfs258-example-td92 systemd[1]: Started kubelet: The Kubernetes Node Agent.
From Master node:
student@lfs258-example-d5pm:~$ sudo kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
ytkmwy.ksw6bk75zhngzb4c   10h   2020-01-10T05:16:14Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token

Output of kubeadm init on the master node:

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405 \
    --control-plane --certificate-key 96de61174ccf620108892e24665f8b5053346634693c22a5e9f82ff1d311b66b

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405
-
Hi @AteivJain,
If you only bootstrapped the master from scratch, and not the worker, then after every unsuccessful
sudo kubeadm join ...
command, a
sudo kubeadm reset
is required on the worker node to clear the partially configured environment before the next
sudo kubeadm join ...
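Putting that together, a sketch of the worker-side retry (token and hash are whatever the master printed):

# On the worker: clear any partially configured state from earlier attempts
sudo kubeadm reset

# Then re-run the join exactly as printed by the master
sudo kubeadm join k8smaster:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>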
Regards,
-Chris
-
Hello,
When troubleshooting, it can be helpful to see the command you typed as well as the output. One note: I have encountered some folks who copy the join command but paste it into a word pad first; they have issues because an extra carriage return gets inserted.
Start with fresh instances and follow the labs as written. When it comes time to copy and paste, be careful to select one line at a time and also omit the backslash. I find that if this works, the issue was an accidental insertion by the tools used to copy and paste. The issue seems to be tied to some versions of WordPad on Microsoft Windows as well as some notepad options on Macs.
Regards,
-
@chrispokorni I bootstrapped both the master and the worker by killing the instance group on GCP and then re-deploying.
@serewicz Thanks for pointing that out. I did try the lab from scratch and tried my best not to make mistakes while copying the commands. For longer commands, I copied them onto the node using vim. Not sure what else I can try...
-
Hmm. Okay. I will put together a quick video of the installation of the master node and the worker. It will take me a bit. Perhaps once you see it work for me you can tell what we are doing differently.
Would that be helpful?
Regards,
-
@serewicz I tried again from scratch. This time I typed all the commands, making sure I didn't miss anything. While joining the cluster, I enabled verbose logging and found the following traces. Not sure if it points anywhere.
root@lfs258-ateiv-vnc1:~# kubeadm join k8smaster:6443 --token dow8xr.t7jyriiot6rezfyr --discovery-token-ca-cert-hash sha256:e65cf99c97ccd86a612de073b0912851ed56ce9443e3fd204145bcaaea44ee2c --v=5
I0109 22:00:27.212537   15476 join.go:363] [preflight] found NodeName empty; using OS hostname as NodeName
I0109 22:00:27.212638   15476 initconfiguration.go:102] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0109 22:00:27.212732   15476 preflight.go:90] [preflight] Running general checks
[preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH
I0109 22:00:27.212878   15476 checks.go:250] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0109 22:00:27.212946   15476 checks.go:287] validating the existence of file /etc/kubernetes/kubelet.conf
I0109 22:00:27.212963   15476 checks.go:287] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0109 22:00:27.212975   15476 checks.go:377] validating the presence of executable crictl
I0109 22:00:27.213011   15476 checks.go:336] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0109 22:00:27.213047   15476 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0109 22:00:27.213105   15476 checks.go:650] validating whether swap is enabled or not
I0109 22:00:27.213142   15476 checks.go:377] validating the presence of executable ip
I0109 22:00:27.213174   15476 checks.go:377] validating the presence of executable iptables
I0109 22:00:27.213202   15476 checks.go:377] validating the presence of executable mount
I0109 22:00:27.213231   15476 checks.go:377] validating the presence of executable nsenter
I0109 22:00:27.213254   15476 checks.go:377] validating the presence of executable ebtables
I0109 22:00:27.213281   15476 checks.go:377] validating the presence of executable ethtool
I0109 22:00:27.213308   15476 checks.go:377] validating the presence of executable socat
I0109 22:00:27.213330   15476 checks.go:377] validating the presence of executable tc
I0109 22:00:27.213357   15476 checks.go:377] validating the presence of executable touch
I0109 22:00:27.213383   15476 checks.go:521] running all checks
I0109 22:00:27.227485   15476 checks.go:407] checking whether the given node name is reachable using net.LookupHost
I0109 22:00:27.227739   15476 checks.go:619] validating kubelet version
I0109 22:00:27.291576   15476 checks.go:129] validating if the service is enabled and active
I0109 22:00:27.299743   15476 checks.go:202] validating availability of port 10250
I0109 22:00:27.299955   15476 checks.go:287] validating the existence of file /etc/kubernetes/pki/ca.crt
I0109 22:00:27.299978   15476 checks.go:433] validating if the connectivity type is via proxy or direct
[preflight] Some fatal errors occurred:
    [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
    [ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:237
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:424
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:209
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:169
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:830
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:200
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1337
root@lfs258-ateiv-vnc1:~#
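Those two fatal preflight errors usually mean the br_netfilter module is not loaded and IP forwarding is disabled; a sketch of the standard fix (not taken from the lab scripts):

# Load the bridge netfilter module so the bridge-nf-call-iptables sysctl exists
sudo modprobe br_netfilter

# Set and persist the kernel parameters that kubeadm checks for
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

The warning about docker missing from $PATH in the same log is also worth checking before retrying.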
-
@AteivJain said:
Found the error. I had the wrong DNS entry on the worker node. Thanks for all the input. @serewicz

Hi @AteivJain, where did you check for the DNS entry on the worker node?
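In these labs the k8smaster alias lives in /etc/hosts (shown earlier in this thread); a sketch of verifying it from the worker:

# On the worker: confirm the k8smaster alias points at the master's internal IP
grep k8smaster /etc/hosts

# Confirm the name actually resolves
getent hosts k8smaster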
-
What I stupidly did while completing the lab for LFS258 was running
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
on the worker VM instead of the master VM. Once I realized my mistake, I ran it again on the master VM and then used the generated hash in the kubeadm join command on the worker VM, which then joined fine.
Hope this helps someone
-
I had a similar issue while using calico for my local cluster setup (based on hyperkit). Replacing calico with Weave Net solved it for me.
-
Thank you for the feedback. Is hyperkit still something only for macOS, or has it been ported to other places?