Welcome to the Linux Foundation Forum!

Unsuccessful in joining the 2nd node to the master with the kubeadm join command

I was able to create the master node with k8sMaster.sh and can see it in my instance.

  1. kubectl get node
  2. NAME STATUS ROLES AGE VERSION
  3. ubuntu-bionic-1 Ready master 3d22h v1.16.1

After executing k8sSecond.sh, I am not able to join the 2nd node to the same cluster as the master. The command just hangs!

  1. sudo kubeadm join 10.128.0.2:6443 --token <token i got from master.out> --discovery-token-ca-cert-hash sha256:<value i got from master.out> --ignore-preflight-errors='all'
  2. [preflight] Running pre-flight checks
  3. [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
  4. [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
  5. [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
  6. [WARNING Port-10250]: Port 10250 is in use
  7. [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists

Any help is appreciated.


Comments

  • I tried resetting the master node and ran everything again. Steps I followed:
    1. sudo kubeadm reset
    2. bash k8sMaster.sh
    3. mkdir -p $HOME/.kube
    4. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    5. sudo chown $(id -u):$(id -g) $HOME/.kube/config
    6. kubectl apply -f calico.yaml
    7. bash k8sSecond.sh

    So after this, when I ran the kubeadm join command, even though it executed successfully, I don't see the node in the cluster!

    1. sudo kubeadm join 10.128.0.2:6443 --token ### --discovery-token-ca-cert-hash sha256:### --ignore-preflight-errors='all'
    2. [preflight] Running pre-flight checks
    3. [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    4. [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    5. [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    6. [WARNING Port-10250]: Port 10250 is in use
    7. [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
    8. [preflight] Reading configuration from the cluster...
    9. [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
    10. [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
    11. [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    12. [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    13. [kubelet-start] Activating the kubelet service
    14. [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
    15.  
    16. This node has joined the cluster:
    17. * Certificate signing request was sent to apiserver and a response was received.
    18. * The Kubelet was informed of the new secure connection details.
    19.  
    20. Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
    21.  
    22. username@ubuntu-bionic-1:~$ kubectl get nodes
    23. NAME STATUS ROLES AGE VERSION
    24. ubuntu-bionic-1 Ready master 5m47s v1.16.1
  • Hi,

    The first issue was caused by kubeadm being run repeatedly on the same node. Also, keep in mind that the token issued by the master expires after a certain time, which will prevent another host from using it to join the cluster.
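
    For example, token validity can be checked and a fresh join command generated on the master; a minimal sketch, assuming kubeadm 1.16 as used in the lab:

        sudo kubeadm token list                          # shows TTL/expiry of existing tokens
        sudo kubeadm token create --print-join-command   # prints a ready-to-use kubeadm join line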

    On the second attempt, did you also run the rbac file for calico? Did you reset the worker node before running kubeadm join?

    Are your firewalls disabled and traffic allowed to all ports for all protocols?

    Are your nodes sized accordingly?
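
    If the join just hangs, two quick checks from the worker can help narrow it down; a sketch, assuming netcat is available and 10.128.0.2 is the master's private IP:

        nc -vz 10.128.0.2 6443    # verifies the API server port is reachable through the firewall
        sudo kubeadm join 10.128.0.2:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --v=5    # verbose output shows which step stalls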

    Regards,
    -Chris

  • Yes, you were right. I created another VM instance and was able to create the minion node.

    I had to run the rbac file as well.

    I am now able to create the master and minion node and set them up in the cluster. Thanks!

  • Might be useful to someone. Use the calico.yaml from the official documentation - https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml

    If you use the one from the course, a lot of RBAC issues keep showing up!

  • Sorry posted the wrong link. Here is the latest version - https://docs.projectcalico.org/v3.11/manifests/calico.yaml
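
    If it helps, that manifest can be applied straight from the URL once kubectl is working on the master:

        kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml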

  • Did you apply the included rbac yaml file?

  • I have a similar issue. Created the master node successfully. When attempting to create the minion node, kubeadm gets stuck in the preflight checks.

    1. Tried to restart the master --> didn't help
    2. Created a new kubeadm token, since it had been more than 2 hours since I created the master node.

      1. student@lfd259-ateiv-htql:~$ sudo kubeadm token list
      2. TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
      3. 6vmwmi.wto27a2jk26xey22 13h 2020-01-09T16:38:07Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
      4. v150sj.mawz91pqhcd4h7ng 23h 2020-01-10T02:55:28Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token

    On the minion node, it just gets stuck:

    1. root@lfd259-ateiv-tbjs:~# kubeadm join --token v150sj.mawz91pqhcd4h7ng k8smaster:6443 --discovery-token-ca-cert-hash sha256:9a51c55cda7ba19e173c93d9587b9aad8914d10b4ebb8749104a897b370960ef
    2. [preflight] Running pre-flight checks
    3. [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

    Found the following traces in /var/log/syslog:

    1. Jan 9 03:37:37 lfd259-ateiv-tbjs kubelet[2883]: F0109 03:37:37.492973 2883 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
    2. Jan 9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
    3. Jan 9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Unit entered failed state.
    4. Jan 9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Failed with result 'exit-code'.

    Any help would be great!

  • Hi @AteivJain,

    Is the /etc/hosts file configured correctly on your worker node?
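
    For reference, the lab adds an alias for the master's private IP to /etc/hosts on both nodes, along these lines (the IP here is just an example):

        10.128.0.2   k8smaster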

    -Chris

  • Hi @chrispokorni

    I configured it correctly based on the lab. It's the same as what's configured on the master node:

    1. root@lfs258-example-td92:~# cat /etc/hosts
    2. 127.0.0.1 localhost
    3.  
    4. # The following lines are desirable for IPv6 capable hosts
    5. ::1 ip6-localhost ip6-loopback
    6. fe00::0 ip6-localnet
    7. ff00::0 ip6-mcastprefix
    8. ff02::1 ip6-allnodes
    9. ff02::2 ip6-allrouters
    10. ff02::3 ip6-allhosts
    11. 169.254.169.254 metadata.google.internal metadata
    12. 10.168.0.3 k8smaster
    13. 10.168.0.3 lfs258-example-td92.us-west2-c.c.astral-reef-264415.internal lfs258-example-td92 # Added by Google
    14. 169.254.169.254 metadata.google.internal # Added by Google

    I even tried to nuke the worker node, but still getting the same error.

  • From your syslog: ... /var/lib/kubelet/config.yaml: no such file or directory.
    Maybe you already tried regenerating the token on the master via sudo kubeadm token create --print-join-command and re-joining from the worker?

  • Hi @gfalasca

    The syslog output is from an older deployment, which I tried when I first posted here. I deployed the master again from scratch and then tried to join the worker node. This time I didn't create a new token, since it was within 2 hours. I still see the same error in syslog:

    1. Jan 9 18:35:43 lfs258-example-td92 kubelet[1157]: F0109 18:35:43.594741 1157 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
    2. Jan 9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
    3. Jan 9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Unit entered failed state.
    4. Jan 9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Failed with result 'exit-code'.
    5. Jan 9 18:35:53 lfs258-example-td92 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
    6. Jan 9 18:35:53 lfs258-example-td92 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
    7. Jan 9 18:35:53 lfs258-example-td92 systemd[1]: Started kubelet: The Kubernetes Node Agent.

    From Master node:

    1. student@lfs258-example-d5pm:~$ sudo kubeadm token list
    2. TOKEN TTL EXPIRES USAGES DESCRIPTION EXTRA GROUPS
    3. ytkmwy.ksw6bk75zhngzb4c 10h 2020-01-10T05:16:14Z authentication,signing <none> system:bootstrappers:kubeadm:default-node-token
    4.  
    5. Output of kubeadm init on master node:
    6.  
    7. You can now join any number of the control-plane node running the following command on each as root:
    8.  
    9. kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    10. --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405 \
    11. --control-plane --certificate-key 96de61174ccf620108892e24665f8b5053346634693c22a5e9f82ff1d311b66b
    12.  
    13. Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
    14. As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use
    15. "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
    16.  
    17. Then you can join any number of worker nodes by running the following on each as root:
    18.  
    19. kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    20. --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405
  • Hi @AteivJain,

    If you only bootstrapped the master from scratch, and not the worker, then after every unsuccessful sudo kubeadm join ... command, a sudo kubeadm reset is required on the worker node to clear the partially configured environment before the next sudo kubeadm join ....
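
    In other words, the sequence on the worker would look something like this (a sketch):

        sudo kubeadm reset        # clears the state left behind by the failed join attempt
        sudo kubeadm join k8smaster:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>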

    Regards,
    -Chris

  • Hello,

    When troubleshooting, it can be helpful to see the command you typed as well as the output. One note: I have encountered some folks who copy the join command but paste it into a word processor first, and they run into issues because an extra carriage return gets inserted.

    Start with fresh instances and follow the labs as written. When it comes time to copy and paste, be careful to select one line at a time and also omit the backslash. If this works, the issue was an accidental insertion by the tools being used to copy and paste. The issue seems to be tied to some versions of WordPad on Windows as well as some notepad options on Macs.
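
    One way to spot a stray carriage return is to paste the command into a scratch file and inspect the non-printing characters; a sketch, where join.txt is just an example file name:

        cat -A join.txt    # a ^M before the end of a line indicates a carriage return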

    Regards,

  • @chrispokorni I bootstrapped both the master and the worker by killing the instance group on GCP and then re-deploying.

    @serewicz Thanks for pointing that out. I did try the lab from scratch and tried my best not to make mistakes while copying the commands. For longer commands, I copied them onto the node using vim. Not sure what else I can try...

  • Hmm. Okay. I will put together a quick video of the installation of the master node and the worker. It will take me a bit. Perhaps once you see it work for me you can tell what we are doing differently.

    Would that be helpful?

    Regards,

  • @serewicz That would be great. In the meantime, I'll give it a try again and keep troubleshooting. Will keep the thread updated. Thanks again for all the help!

  • @serewicz I tried again from scratch. This time I typed all the commands, making sure I didn't miss anything. While joining the cluster, I enabled verbose logging and found the following traces. Not sure if they point to anything.

    1. root@lfs258-ateiv-vnc1:~# kubeadm join k8smaster:6443 --token dow8xr.t7jyriiot6rezfyr --discovery-token-ca-cert-hash sha256:e65cf99c97ccd86a612de073b0912851ed56ce9443e3fd204145bcaaea44ee2c --v=5
    2. I0109 22:00:27.212537 15476 join.go:363] [preflight] found NodeName empty; using OS hostname as NodeName
    3. I0109 22:00:27.212638 15476 initconfiguration.go:102] detected and using CRI socket: /var/run/dockershim.sock
    4. [preflight] Running pre-flight checks
    5. I0109 22:00:27.212732 15476 preflight.go:90] [preflight] Running general checks
    6. [preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH
    7. I0109 22:00:27.212878 15476 checks.go:250] validating the existence and emptiness of directory /etc/kubernetes/manifests
    8. I0109 22:00:27.212946 15476 checks.go:287] validating the existence of file /etc/kubernetes/kubelet.conf
    9. I0109 22:00:27.212963 15476 checks.go:287] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
    10. I0109 22:00:27.212975 15476 checks.go:377] validating the presence of executable crictl
    11. I0109 22:00:27.213011 15476 checks.go:336] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
    12. I0109 22:00:27.213047 15476 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward
    13. I0109 22:00:27.213105 15476 checks.go:650] validating whether swap is enabled or not
    14. I0109 22:00:27.213142 15476 checks.go:377] validating the presence of executable ip
    15. I0109 22:00:27.213174 15476 checks.go:377] validating the presence of executable iptables
    16. I0109 22:00:27.213202 15476 checks.go:377] validating the presence of executable mount
    17. I0109 22:00:27.213231 15476 checks.go:377] validating the presence of executable nsenter
    18. I0109 22:00:27.213254 15476 checks.go:377] validating the presence of executable ebtables
    19. I0109 22:00:27.213281 15476 checks.go:377] validating the presence of executable ethtool
    20. I0109 22:00:27.213308 15476 checks.go:377] validating the presence of executable socat
    21. I0109 22:00:27.213330 15476 checks.go:377] validating the presence of executable tc
    22. I0109 22:00:27.213357 15476 checks.go:377] validating the presence of executable touch
    23. I0109 22:00:27.213383 15476 checks.go:521] running all checks
    24. I0109 22:00:27.227485 15476 checks.go:407] checking whether the given node name is reachable using net.LookupHost
    25. I0109 22:00:27.227739 15476 checks.go:619] validating kubelet version
    26. I0109 22:00:27.291576 15476 checks.go:129] validating if the service is enabled and active
    27. I0109 22:00:27.299743 15476 checks.go:202] validating availability of port 10250
    28. I0109 22:00:27.299955 15476 checks.go:287] validating the existence of file /etc/kubernetes/pki/ca.crt
    29. I0109 22:00:27.299978 15476 checks.go:433] validating if the connectivity type is via proxy or direct
    30. [preflight] Some fatal errors occurred:
    31. [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
    32. [ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
    33. [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    34. error execution phase preflight
    35. k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    36. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:237
    37. k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    38. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:424
    39. k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    40. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:209
    41. k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
    42. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:169
    43. k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    44. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:830
    45. k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    46. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
    47. k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    48. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
    49. k8s.io/kubernetes/cmd/kubeadm/app.Run
    50. /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
    51. main.main
    52. _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
    53. runtime.main
    54. /usr/local/go/src/runtime/proc.go:200
    55. runtime.goexit
    56. /usr/local/go/src/runtime/asm_amd64.s:1337
    57. root@lfs258-ateiv-vnc1:~#
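
    For reference, those two fatal preflight errors are usually cleared on the worker by loading the br_netfilter module and enabling the sysctls; a sketch:

        sudo modprobe br_netfilter
        sudo sysctl -w net.bridge.bridge-nf-call-iptables=1
        sudo sysctl -w net.ipv4.ip_forward=1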
  • Found the error. I had the wrong DNS entry on worker node. Thanks for all the input. @serewicz

  • @AteivJain said:
    Found the error. I had the wrong DNS entry on worker node. Thanks for all the input. @serewicz

    Hi @AteivJain, where did you check for the DNS entry on the worker node?

  • Hello @amitraisharma

    The lab has edits to the /etc/hosts file. Check there.
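
    A quick check on each node would be something like:

        grep k8smaster /etc/hosts    # should show the master's private IP on both master and worker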

    Regards,

  • What I stupidly did while completing the lab for LFS258 was running openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //' on the worker VM instead of the master VM.

    Once I realized my mistake, I ran it again on the master VM and then used the generated hash in the kubeadm join command on the worker VM, which then joined fine.
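
    As a sketch, the hash can be generated on the master and then dropped into the join command on the worker:

        # on the master VM
        openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
        # copy the printed hash, then on the worker VM
        sudo kubeadm join k8smaster:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>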

    Hope this helps someone :)

  • I had a similar issue while using calico for my local cluster setup (based on hyperkit). Replacing calico with Weave Net solved it for me.
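
    In case it's useful, the Weave Net manifest could be applied with something like the command from the Weave docs at the time (worth verifying against the current instructions):

        kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"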

  • Thank you for the feedback. Is hyperkit still something only for macOS, or has it been ported to other platforms?
