Unable to join the 2nd node to the master with the kubeadm join command
I was able to create the master node with k8sMaster.sh and can see it in my instance.
kubectl get node
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   3d22h   v1.16.1
After executing k8sSecond.sh, I am not able to join the 2nd node to the same cluster as the master. The command just hangs!
sudo kubeadm join 10.128.0.2:6443 --token <token i got from master.out> --discovery-token-ca-cert-hash sha256:<value i got from master.out> --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
        [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
        [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING Port-10250]: Port 10250 is in use
        [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
Any help is appreciated.
I tried resetting the master node and ran everything again. Steps I followed:
sudo kubeadm reset
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f calico.yaml
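One thing worth noting: after sudo kubeadm reset, the control plane has to be re-created with kubeadm init before the kubectl steps will work. A minimal sketch of the full sequence follows; the init flag shown is an assumption (a pod CIDR commonly used with Calico), and the course script may use different values:

```shell
sudo kubeadm reset                                   # wipe previous control-plane state
sudo kubeadm init --pod-network-cidr=192.168.0.0/16  # CIDR is an assumed Calico default
mkdir -p $HOME/.kube                                 # set up kubectl for the current user
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
kubectl apply -f calico.yaml                         # install the pod network
```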
So after this, when I run the kubeadm join command, even though the command executed successfully, I don't see the node in the cluster!
sudo kubeadm join 10.128.0.2:6443 --token ### --discovery-token-ca-cert-hash sha256:### --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
        [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
        [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING Port-10250]: Port 10250 is in use
        [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

[email protected]:~$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   5m47s   v1.16.1
The first issue was caused by both kubeadm commands being run sequentially on the same node. Also, keep in mind that the token issued by the master expires after a certain time; an expired token cannot be used by another host to join the cluster.
On the second attempt, did you also run the rbac file for calico? Did you reset the worker node before running kubeadm join?
Are your firewalls disabled and traffic allowed to all ports for all protocols?
Are your nodes sized accordingly?
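Some of these checks can be run directly from the command line. A sketch using standard kubeadm and Ubuntu tooling (ufw is assumed as the host firewall; these commands are not from the lab scripts):

```shell
# On the master: confirm the bootstrap token is still valid (check the TTL column)
sudo kubeadm token list

# If it has expired, print a fresh, complete join command
sudo kubeadm token create --print-join-command

# On both nodes (Ubuntu): confirm the host firewall is not blocking traffic
sudo ufw status
```

On GCP, also verify that the VPC firewall rules allow all protocols and ports between the nodes.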
Yes, you were right. I created another VM instance and was able to create the minion node.
I had to run the rbac file as well.
I am now able to create the master and minion node and set them up in the cluster. Thanks!
Might be useful to someone. Use the calico.yaml from the official documentation - https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml
If you use the one from the course, so many RBAC issues keep showing up!
Sorry, posted the wrong link. Here is the latest version - https://docs.projectcalico.org/v3.11/manifests/calico.yaml
Did you apply the included rbac yaml file?
I have a similar issue. Created the master node successfully. When attempting to create the minion node, kubeadm gets stuck in precheck.
- Tried restarting the master --> didn't help
Created new kubeadm token since it had been more than 2 hours since I created the master node.
[email protected]:~$ sudo kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
6vmwmi.wto27a2jk26xey22   13h   2020-01-09T16:38:07Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
v150sj.mawz91pqhcd4h7ng   23h   2020-01-10T02:55:28Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
On the minion node, it just gets stuck:
[email protected]:~# kubeadm join --token v150sj.mawz91pqhcd4h7ng k8smaster:6443 --discovery-token-ca-cert-hash sha256:9a51c55cda7ba19e173c93d9587b9aad8914d10b4ebb8749104a897b370960ef
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Found the following traces in /var/log/syslog:
Jan 9 03:37:37 lfd259-ateiv-tbjs kubelet: F0109 03:37:37.492973 2883 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Jan 9 03:37:37 lfd259-ateiv-tbjs systemd: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 9 03:37:37 lfd259-ateiv-tbjs systemd: kubelet.service: Unit entered failed state.
Jan 9 03:37:37 lfd259-ateiv-tbjs systemd: kubelet.service: Failed with result 'exit-code'.
Any help would be great!
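For what it's worth, the missing /var/lib/kubelet/config.yaml error is expected before a successful join: that file is only written by kubeadm join, so the kubelet crash-loops until the join completes. A diagnostic sketch using standard systemd tooling, run on the worker:

```shell
# Check whether the kubelet is crash-looping and why
sudo systemctl status kubelet
sudo journalctl -u kubelet --no-pager | tail -n 20
```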
Is the /etc/hosts file configured correctly on your worker node?
I configured it correctly based on the lab. It's the same as what is configured on the master node:
[email protected]:~# cat /etc/hosts
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
169.254.169.254 metadata.google.internal metadata
10.168.0.3 k8smaster
10.168.0.3 lfs258-example-td92.us-west2-c.c.astral-reef-264415.internal lfs258-example-td92  # Added by Google
169.254.169.254 metadata.google.internal  # Added by Google
I even tried to nuke the worker node, but still getting the same error.
From your syslog
... /var/lib/kubelet/config.yaml: no such file or directory
Maybe you already tried regenerating the token on the master via

sudo kubeadm token create --print-join-command

and re-joining from the worker?
The syslog output is from an older deployment which I tried when I first posted here. I deployed the master again from scratch and then tried to join worker node. This time I didn't create a new token as it was within 2 hours. I still see the same error in syslog:
Jan 9 18:35:43 lfs258-example-td92 kubelet: F0109 18:35:43.594741 1157 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Jan 9 18:35:43 lfs258-example-td92 systemd: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 9 18:35:43 lfs258-example-td92 systemd: kubelet.service: Unit entered failed state.
Jan 9 18:35:43 lfs258-example-td92 systemd: kubelet.service: Failed with result 'exit-code'.
Jan 9 18:35:53 lfs258-example-td92 systemd: kubelet.service: Service hold-off time over, scheduling restart.
Jan 9 18:35:53 lfs258-example-td92 systemd: Stopped kubelet: The Kubernetes Node Agent.
Jan 9 18:35:53 lfs258-example-td92 systemd: Started kubelet: The Kubernetes Node Agent.
From Master node:
[email protected]:~$ sudo kubeadm token list
TOKEN                     TTL   EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
ytkmwy.ksw6bk75zhngzb4c   10h   2020-01-10T05:16:14Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token

Output of kubeadm init on master node:

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405 \
    --control-plane --certificate-key 96de61174ccf620108892e24665f8b5053346634693c22a5e9f82ff1d311b66b

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405
If you only bootstrapped the master from scratch, and not the worker, then after every unsuccessful sudo kubeadm join ... a sudo kubeadm reset is required on the worker node to clear the partially configured environment before the next sudo kubeadm join ...
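As a sketch, the retry cycle on the worker looks like this (the token and hash are placeholders to be taken from the current output on the master):

```shell
# Clear the partial state left behind by the failed join attempt
sudo kubeadm reset

# Retry with a currently valid token and hash obtained from the master
sudo kubeadm join k8smaster:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>
```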
When troubleshooting, it can be helpful to see the command you typed as well as the output. One note: I have encountered some folks who copy and then paste the join command, but insert it into a word processor first, and they have issues because an extra carriage return gets inserted.
Start with fresh instances and follow the labs as written. When it comes time to copy and paste, be careful to select one line at a time and also omit the backslash. If this works, the issue was an accidental insertion by the tools used to copy and paste. The issue seems to be tied to some versions of WordPad on Microsoft Windows as well as some notepad options on Macs.
@chrispokorni I bootstrapped both the master and the worker by killing the instance group on GCP and then re-deploying.
@serewicz Thanks for pointing that out. I did try the lab from scratch and tried my best not to make mistakes while copying the commands. For longer commands, I copied them onto the node using vim. Not sure what else I can try...
Hmm. Okay. I will put together a quick video of the installation of the master node and the worker. It will take me a bit. Perhaps once you see it work for me you can tell what we are doing differently.
Would that be helpful?
@serewicz I tried again from scratch. This time I typed all commands, making sure I didn't miss anything. While joining the cluster, I enabled verbose logging and found the following traces. Not sure if it points anywhere.
[email protected]:~# kubeadm join k8smaster:6443 --token dow8xr.t7jyriiot6rezfyr --discovery-token-ca-cert-hash sha256:e65cf99c97ccd86a612de073b0912851ed56ce9443e3fd204145bcaaea44ee2c --v=5
I0109 22:00:27.212537   15476 join.go:363] [preflight] found NodeName empty; using OS hostname as NodeName
I0109 22:00:27.212638   15476 initconfiguration.go:102] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0109 22:00:27.212732   15476 preflight.go:90] [preflight] Running general checks
        [preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH
I0109 22:00:27.212878   15476 checks.go:250] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0109 22:00:27.212946   15476 checks.go:287] validating the existence of file /etc/kubernetes/kubelet.conf
I0109 22:00:27.212963   15476 checks.go:287] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0109 22:00:27.212975   15476 checks.go:377] validating the presence of executable crictl
I0109 22:00:27.213011   15476 checks.go:336] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0109 22:00:27.213047   15476 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0109 22:00:27.213105   15476 checks.go:650] validating whether swap is enabled or not
I0109 22:00:27.213142   15476 checks.go:377] validating the presence of executable ip
I0109 22:00:27.213174   15476 checks.go:377] validating the presence of executable iptables
I0109 22:00:27.213202   15476 checks.go:377] validating the presence of executable mount
I0109 22:00:27.213231   15476 checks.go:377] validating the presence of executable nsenter
I0109 22:00:27.213254   15476 checks.go:377] validating the presence of executable ebtables
I0109 22:00:27.213281   15476 checks.go:377] validating the presence of executable ethtool
I0109 22:00:27.213308   15476 checks.go:377] validating the presence of executable socat
I0109 22:00:27.213330   15476 checks.go:377] validating the presence of executable tc
I0109 22:00:27.213357   15476 checks.go:377] validating the presence of executable touch
I0109 22:00:27.213383   15476 checks.go:521] running all checks
I0109 22:00:27.227485   15476 checks.go:407] checking whether the given node name is reachable using net.LookupHost
I0109 22:00:27.227739   15476 checks.go:619] validating kubelet version
I0109 22:00:27.291576   15476 checks.go:129] validating if the service is enabled and active
I0109 22:00:27.299743   15476 checks.go:202] validating availability of port 10250
I0109 22:00:27.299955   15476 checks.go:287] validating the existence of file /etc/kubernetes/pki/ca.crt
I0109 22:00:27.299978   15476 checks.go:433] validating if the connectivity type is via proxy or direct
[preflight] Some fatal errors occurred:
        [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
        [ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:237
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:424
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:209
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:169
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:830
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
        /usr/local/go/src/runtime/proc.go:200
runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1337
[email protected]:~#
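The two fatal preflight errors in that log are usually cleared by loading the br_netfilter module and enabling the two kernel settings. A sketch using standard Linux tooling (requires root; not from the lab scripts):

```shell
# Create /proc/sys/net/bridge/bridge-nf-call-iptables by loading the module
sudo modprobe br_netfilter

# Enable the settings kubeadm's preflight checks require
sudo sysctl net.bridge.bridge-nf-call-iptables=1
sudo sysctl net.ipv4.ip_forward=1

# Persist the settings across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
```

Also note the warning in the log that docker was not found in $PATH: the container runtime install may not have completed on this node.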
Found the error. I had the wrong DNS entry on the worker node. Thanks for all the input. @serewicz
Hi @AteivJain, where did you check for the DNS entry on the worker node?
The lab has edits to the /etc/hosts file. Check there.
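A quick diagnostic sketch for the worker (k8smaster is the alias used in the lab's /etc/hosts edits; adjust if yours differs):

```shell
# The alias must resolve to the master's internal IP on the worker as well
grep k8smaster /etc/hosts || echo "k8smaster missing from /etc/hosts"
getent hosts k8smaster   || echo "k8smaster does not resolve"
```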
What I stupidly did while completing the lab for LFS258 was running

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'

on the worker VM instead of the master VM.
Once I realized my mistake, I ran it again on the master VM and then used the generated hash in the kubeadm join command on the worker VM which then joined fine.
Hope this helps someone
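To illustrate how that hash is derived, here is the same pipeline run against a throwaway self-signed certificate (the /tmp paths and CN are made up for the demo; on a real cluster the input is /etc/kubernetes/pki/ca.crt on the master):

```shell
# Generate a throwaway CA certificate purely for demonstration
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" \
  -keyout /tmp/demo-ca.key -out /tmp/demo-ca.crt -days 1 2>/dev/null

# Extract the public key, DER-encode it, and print its SHA-256 digest --
# this is the value that goes after sha256: in the kubeadm join command
openssl x509 -pubkey -in /tmp/demo-ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```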
I had a similar issue while using Calico for my local cluster setup (based on hyperkit). Replacing Calico with Weave Net solved it for me.
Thank you for the feedback. Is hyperkit still something only for macOS, or has it been ported to other places?