Unable to join the second node to the master with the kubeadm join command

I was able to create the master node with k8sMaster.sh and can see it in my instance.

kubectl get node
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   3d22h   v1.16.1

After executing k8sSecond.sh, I am not able to join the second node to the same cluster as the master. The command just hangs!

sudo kubeadm join 10.128.0.2:6443 --token <token i got from master.out> --discovery-token-ca-cert-hash sha256:<value i got from master.out>  --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
    [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING Port-10250]: Port 10250 is in use
    [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists

Any help is appreciated.

Comments

  • I tried resetting the master node and ran everything again. These are the steps I followed:
    1. sudo kubeadm reset
    2. bash k8sMaster.sh
    3. mkdir -p $HOME/.kube
    4. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    5. sudo chown $(id -u):$(id -g) $HOME/.kube/config
    6. kubectl apply -f calico.yaml
    7. bash k8sSecond.sh

    So after this, when I run the kubeadm join command, even though the command executed successfully, I don't see the node in the cluster!

    sudo kubeadm join 10.128.0.2:6443 --token ### --discovery-token-ca-cert-hash sha256:### --ignore-preflight-errors='all'
    [preflight] Running pre-flight checks
        [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
        [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
        [WARNING Port-10250]: Port 10250 is in use
        [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
    [preflight] Reading configuration from the cluster...
    [preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
    [kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Activating the kubelet service
    [kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
    
    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.
    
    Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
    
    username@ubuntu-bionic-1:~$ kubectl get nodes
    NAME              STATUS   ROLES    AGE     VERSION
    ubuntu-bionic-1   Ready    master   5m47s   v1.16.1
    
  • Hi,

    The first issue was caused by kubeadm being run sequentially on the same node. Also, keep in mind that the token issued by the master expires after a certain time, and it will prevent another host from utilizing it to join a cluster.

    On the second attempt, did you also run the rbac file for calico? Did you reset the worker node before running kubeadm join?

    Are your firewalls disabled and traffic allowed to all ports for all protocols?

    Are your nodes sized accordingly?

    Regards,
    -Chris
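
The preflight warnings in the first post (/etc/kubernetes/manifests not empty, kubelet.conf and pki/ca.crt already present) are what you would expect to see when kubeadm has already configured the node. Here is a minimal sketch for checking that before a join, assuming only the paths named in those warnings. leftover_kubeadm_state is a hypothetical helper name, not a kubeadm command.

```shell
# Report leftover kubeadm state under a root directory (default: /).
# The paths are the ones named in the preflight warnings in this thread.
# leftover_kubeadm_state is a hypothetical helper, not part of kubeadm.
leftover_kubeadm_state() {
    root=${1:-}
    rc=1    # 1 = nothing found, 0 = leftover state found
    for f in etc/kubernetes/kubelet.conf etc/kubernetes/pki/ca.crt; do
        if [ -e "$root/$f" ]; then
            echo "exists: /$f"
            rc=0
        fi
    done
    # kubeadm init writes the control-plane static pod manifests here.
    if [ -n "$(ls -A "$root/etc/kubernetes/manifests" 2>/dev/null)" ]; then
        echo "not empty: /etc/kubernetes/manifests"
        rc=0
    fi
    return $rc
}

# Usage on the node you are about to join (not run here):
#   leftover_kubeadm_state && echo "run 'sudo kubeadm reset' first"
```

If it prints anything, that node has probably already been through kubeadm init or kubeadm join.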

  • Yes, you were right. I created another VM instance and was able to create the minion node.

    I had to run the rbac file as well.

    I am now able to create the master and minion node and set them up in the cluster. Thanks!

  • Might be useful to someone. Use the calico.yaml from the official documentation - https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml

    With the one from the course, RBAC issues kept showing up!

  • Sorry, I posted the wrong link. Here is the latest version - https://docs.projectcalico.org/v3.11/manifests/calico.yaml

  • serewicz

    Did you apply the included rbac yaml file?

  • AteivJain
    edited January 2020

    I have a similar issue. I created the master node successfully. When attempting to create the minion node, kubeadm gets stuck in the preflight checks.

    1. Tried restarting the master; it didn't help.
    2. Created a new kubeadm token, since more than 2 hours had passed since I created the master node.

      student@lfd259-ateiv-htql:~$ sudo kubeadm token list
      TOKEN                     TTL       EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
      6vmwmi.wto27a2jk26xey22   13h       2020-01-09T16:38:07Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
      v150sj.mawz91pqhcd4h7ng   23h       2020-01-10T02:55:28Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
      

    On the minion node, it just gets stuck:

    root@lfd259-ateiv-tbjs:~# kubeadm join --token v150sj.mawz91pqhcd4h7ng k8smaster:6443 --discovery-token-ca-cert-hash sha256:9a51c55cda7ba19e173c93d9587b9aad8914d10b4ebb8749104a897b370960ef
    [preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    

    I found the following traces in /var/log/syslog:

    Jan  9 03:37:37 lfd259-ateiv-tbjs kubelet[2883]: F0109 03:37:37.492973    2883 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
    Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
    Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Unit entered failed state.
    Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Failed with result 'exit-code'.
    

    Any help would be great!

  • chrispokorni

    Hi @AteivJain,

    Is the /etc/hosts file configured correctly on your worker node?

    -Chris

  • Hi @chrispokorni

    I configured it correctly based on the lab. It's the same as what is configured on the master node:

        root@lfs258-example-td92:~# cat /etc/hosts
        127.0.0.1 localhost
    
        # The following lines are desirable for IPv6 capable hosts
        ::1 ip6-localhost ip6-loopback
        fe00::0 ip6-localnet
        ff00::0 ip6-mcastprefix
        ff02::1 ip6-allnodes
        ff02::2 ip6-allrouters
        ff02::3 ip6-allhosts
        169.254.169.254 metadata.google.internal metadata
        10.168.0.3 k8smaster
        10.168.0.3 lfs258-example-td92.us-west2-c.c.astral-reef-264415.internal lfs258-example-td92  # Added by Google
        169.254.169.254 metadata.google.internal  # Added by Google
    

    I even tried nuking the worker node, but I still get the same error.

  • gfalasca
    edited January 2020

    Your syslog shows: /var/lib/kubelet/config.yaml: no such file or directory
    Have you already tried regenerating the token on the master via sudo kubeadm token create --print-join-command
    and re-joining from the worker?
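
Token expiry can be checked from the kubeadm token list output before retrying. This is a rough sketch, assuming the column layout shown earlier in the thread (TOKEN, TTL, EXPIRES, ...) and UTC timestamps in the EXPIRES column. valid_tokens is a made-up helper name.

```shell
# Print tokens from `kubeadm token list` output (stdin) that expire after $1.
# $1 is a UTC timestamp in the same format as the EXPIRES column, so a plain
# string comparison orders the two values correctly.
# Assumes the column layout shown in this thread: TOKEN TTL EXPIRES ...
valid_tokens() {
    awk -v now="$1" 'NR > 1 && NF >= 3 && ($3 "") > (now "") { print $1 }'
}

# Usage on the control plane (not run here):
#   sudo kubeadm token list | valid_tokens "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
# If nothing is printed, create a fresh token and join command with:
#   sudo kubeadm token create --print-join-command
```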

  • Hi @gfalasca

    The syslog output is from an older deployment, which I tried when I first posted here. I deployed the master again from scratch and then tried to join the worker node. This time I didn't create a new token, as it was within 2 hours. I still see the same error in syslog:

    Jan  9 18:35:43 lfs258-example-td92 kubelet[1157]: F0109 18:35:43.594741    1157 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
    Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
    Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Unit entered failed state.
    Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Failed with result 'exit-code'.
    Jan  9 18:35:53 lfs258-example-td92 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
    Jan  9 18:35:53 lfs258-example-td92 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
    Jan  9 18:35:53 lfs258-example-td92 systemd[1]: Started kubelet: The Kubernetes Node Agent.
    

    From Master node:

        student@lfs258-example-d5pm:~$ sudo kubeadm token list
        TOKEN                     TTL       EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
        ytkmwy.ksw6bk75zhngzb4c   10h       2020-01-10T05:16:14Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
    
    Output of kubeadm init on master node:
    
    You can now join any number of the control-plane node running the following command on each as root:
    
      kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
        --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405 \
        --control-plane --certificate-key 96de61174ccf620108892e24665f8b5053346634693c22a5e9f82ff1d311b66b
    
    Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
    As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use 
    "kubeadm init phase upload-certs --upload-certs" to reload certs afterward.
    
    Then you can join any number of worker nodes by running the following on each as root:
    
    kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
        --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405 
    
  • chrispokorni

    Hi @AteivJain,

    If you only bootstrapped the master from scratch, and not the worker, then after every unsuccessful sudo kubeadm join ... command a sudo kubeadm reset is required on the worker node to clear the partially configured environment before the next sudo kubeadm join ....

    Regards,
    -Chris
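
The reset-then-join sequence described above could be wrapped like this. It is only a sketch: rejoin_worker and the DRY_RUN variable are hypothetical names, and the join arguments are whatever kubeadm init or kubeadm token create --print-join-command printed on the master.

```shell
# Reset the worker, then run a fresh join.
# Arguments: the full "kubeadm join ..." command copied from the master.
# With DRY_RUN set to a non-empty value the commands are printed, not run.
# rejoin_worker and DRY_RUN are hypothetical names used for illustration.
rejoin_worker() {
    run=${DRY_RUN:+echo}
    # -f skips the interactive confirmation prompt of kubeadm reset.
    $run sudo kubeadm reset -f || return 1
    $run sudo "$@"
}

# Usage on the worker (not run here):
#   DRY_RUN=1 rejoin_worker kubeadm join k8smaster:6443 --token <token> \
#       --discovery-token-ca-cert-hash sha256:<hash>
```

Doing a dry run first makes it easy to eyeball the join command for stray characters before anything touches the node.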

  • serewicz

    Hello,

    When troubleshooting, it can be helpful to see the command you typed as well as the output. One note: I have encountered some folks who copy the join command and paste it into WordPad first, then have issues because an extra carriage return is inserted.

    Start with fresh instances and follow the labs as written. When it comes time to copy and paste, be careful to select one line at a time and omit the backslash. If this works, the issue was probably an accidental insertion by the tools used to copy and paste. It seems to be tied to some versions of WordPad on Windows as well as some notepad options on Macs.

    Regards,
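
If you suspect the paste itself, one option is to clean the command up before running it. This is a small sketch, assuming the pasted text is available on stdin. clean_join_cmd is a hypothetical helper that strips carriage returns and trailing backslashes and puts everything on one line.

```shell
# Normalise a pasted multi-line join command read from stdin:
#   1. drop carriage returns added by some editors,
#   2. remove trailing backslash line continuations,
#   3. collapse everything onto a single, single-spaced line.
# clean_join_cmd is a hypothetical helper name.
clean_join_cmd() {
    tr -d '\r' \
        | sed -e 's/[[:space:]]*\\[[:space:]]*$//' \
        | tr '\n' ' ' \
        | tr -s ' ' \
        | sed -e 's/^ //' -e 's/ $//'
}

# Usage (pasted.txt is a hypothetical file holding the copied command):
#   clean_join_cmd < pasted.txt
```

Review the cleaned line by eye before running it with sudo.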

  • @chrispokorni I bootstrapped both the master and the worker by killing the instance group on GCP and then redeploying.

    @serewicz Thanks for pointing that out. I did try the lab from scratch and tried my best not to make mistakes while copying the commands. For longer commands, I copied them onto the node using vim. Not sure what else I can try...

  • serewicz

    Hmm. Okay. I will put together a quick video of the installation of the master node and the worker. It will take me a bit. Perhaps once you see it work for me you can tell what we are doing differently.

    Would that be helpful?

    Regards,

  • @serewicz That would be great. In the meantime, I'll give it a try again and keep troubleshooting. Will keep the thread updated. Thanks again for all the help!

  • @serewicz I tried again from scratch. This time I typed all the commands, making sure I didn't miss anything. While joining the cluster, I enabled verbose logging and found the following traces. Not sure if they point anywhere.

    root@lfs258-ateiv-vnc1:~# kubeadm join k8smaster:6443 --token dow8xr.t7jyriiot6rezfyr --discovery-token-ca-cert-hash sha256:e65cf99c97ccd86a612de073b0912851ed56ce9443e3fd204145bcaaea44ee2c --v=5
    I0109 22:00:27.212537   15476 join.go:363] [preflight] found NodeName empty; using OS hostname as NodeName
    I0109 22:00:27.212638   15476 initconfiguration.go:102] detected and using CRI socket: /var/run/dockershim.sock
    [preflight] Running pre-flight checks
    I0109 22:00:27.212732   15476 preflight.go:90] [preflight] Running general checks
    [preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH
    I0109 22:00:27.212878   15476 checks.go:250] validating the existence and emptiness of directory /etc/kubernetes/manifests
    I0109 22:00:27.212946   15476 checks.go:287] validating the existence of file /etc/kubernetes/kubelet.conf
    I0109 22:00:27.212963   15476 checks.go:287] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
    I0109 22:00:27.212975   15476 checks.go:377] validating the presence of executable crictl
    I0109 22:00:27.213011   15476 checks.go:336] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
    I0109 22:00:27.213047   15476 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward
    I0109 22:00:27.213105   15476 checks.go:650] validating whether swap is enabled or not
    I0109 22:00:27.213142   15476 checks.go:377] validating the presence of executable ip
    I0109 22:00:27.213174   15476 checks.go:377] validating the presence of executable iptables
    I0109 22:00:27.213202   15476 checks.go:377] validating the presence of executable mount
    I0109 22:00:27.213231   15476 checks.go:377] validating the presence of executable nsenter
    I0109 22:00:27.213254   15476 checks.go:377] validating the presence of executable ebtables
    I0109 22:00:27.213281   15476 checks.go:377] validating the presence of executable ethtool
    I0109 22:00:27.213308   15476 checks.go:377] validating the presence of executable socat
    I0109 22:00:27.213330   15476 checks.go:377] validating the presence of executable tc
    I0109 22:00:27.213357   15476 checks.go:377] validating the presence of executable touch
    I0109 22:00:27.213383   15476 checks.go:521] running all checks
    I0109 22:00:27.227485   15476 checks.go:407] checking whether the given node name is reachable using net.LookupHost
    I0109 22:00:27.227739   15476 checks.go:619] validating kubelet version
    I0109 22:00:27.291576   15476 checks.go:129] validating if the service is enabled and active
    I0109 22:00:27.299743   15476 checks.go:202] validating availability of port 10250
    I0109 22:00:27.299955   15476 checks.go:287] validating the existence of file /etc/kubernetes/pki/ca.crt
    I0109 22:00:27.299978   15476 checks.go:433] validating if the connectivity type is via proxy or direct
    [preflight] Some fatal errors occurred:
        [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
        [ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    error execution phase preflight
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:237
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:424
    k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:209
    k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:169
    k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:830
    k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
    k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
    k8s.io/kubernetes/cmd/kubeadm/app.Run
        /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
    main.main
        _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
    runtime.main
        /usr/local/go/src/runtime/proc.go:200
    runtime.goexit
        /usr/local/go/src/runtime/asm_amd64.s:1337
    root@lfs258-ateiv-vnc1:~# 
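
The two fatal preflight errors in the output above refer to kernel settings that can be checked directly. This is a minimal sketch, assuming the two /proc paths from those error lines. check_forwarding is a hypothetical helper, and the remedies in the trailing comments are commonly used commands, not something taken from the lab scripts.

```shell
# Check the two kernel settings named in the preflight errors.
# $1 is the proc root (default /proc); the parameter exists only so the
# check can be pointed at a test directory.
# check_forwarding is a hypothetical helper name.
check_forwarding() {
    proc=${1:-/proc}
    status=0
    for f in sys/net/bridge/bridge-nf-call-iptables sys/net/ipv4/ip_forward; do
        if [ ! -e "$proc/$f" ]; then
            echo "missing: $f"
            status=1
        elif [ "$(cat "$proc/$f")" != "1" ]; then
            echo "not set to 1: $f"
            status=1
        fi
    done
    return $status
}

# Commonly used remedies, as root (assumptions, not from the lab scripts):
#   modprobe br_netfilter                          # provides /proc/sys/net/bridge/*
#   sysctl -w net.ipv4.ip_forward=1
#   sysctl -w net.bridge.bridge-nf-call-iptables=1
```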
    
  • Found the error. I had the wrong DNS entry on worker node. Thanks for all the input. @serewicz

  • @AteivJain said:
    Found the error. I had the wrong DNS entry on worker node. Thanks for all the input. @serewicz

    Hi @AteivJain, where did you check for the DNS entry on the worker node?

  • serewicz

    Hello @amitraisharma

    The lab has edits to the /etc/hosts file. Check there.

    Regards,
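
A quick way to see what a hosts file really maps k8smaster to is sketched below. hosts_ip is a hypothetical helper, and the file path defaults to /etc/hosts. In the hosts file posted earlier in the thread, k8smaster and the node's own hostname (lfs258-example-td92) both map to 10.168.0.3, which may well be the wrong entry mentioned above.

```shell
# Print the first address that a hosts file maps NAME to.
# $1 = name to look up, $2 = hosts file (default /etc/hosts).
# hosts_ip is a hypothetical helper name.
hosts_ip() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # ignore comments
        { for (i = 2; i <= NF; i++) if ($i == name) { print $1; exit } }
    ' "${2:-/etc/hosts}"
}

# Usage on the worker (not run here):
#   hosts_ip k8smaster            # should print the master's IP
#   hosts_ip "$(hostname)"        # this node's own IP
# If both print the same address, k8smaster points at the worker itself.
```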

  • What I stupidly did while completing the lab for LFS258 was run openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //' on the worker VM instead of the master VM.

    Once I realized my mistake, I ran it again on the master VM and then used the generated hash in the kubeadm join command on the worker VM which then joined fine.

    Hope this helps someone :)
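
The same pipeline can be wrapped in a function so the hash is easy to compare with the one in the join command. This is a sketch only: ca_cert_hash is a hypothetical name, the certificate path is a parameter that defaults to the path from the post above, and an RSA CA key is assumed because the pipeline uses openssl rsa.

```shell
# Compute the value expected after "sha256:" in --discovery-token-ca-cert-hash.
# Same openssl pipeline as in the post above; run it on the master, because
# that is where the cluster CA certificate lives.
# $1 = CA certificate path (default /etc/kubernetes/pki/ca.crt).
# ca_cert_hash is a hypothetical helper name; an RSA CA key is assumed.
ca_cert_hash() {
    openssl x509 -pubkey -in "${1:-/etc/kubernetes/pki/ca.crt}" \
        | openssl rsa -pubin -outform der 2>/dev/null \
        | openssl dgst -sha256 -hex \
        | sed 's/^.* //'
}

# Usage on the master (not run here):
#   echo "sha256:$(ca_cert_hash)"
```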

  • I had a similar issue while using calico for my local cluster setup (based on hyperkit). Replacing calico with Weave Net solved it for me.

  • serewicz

    Thank you for the feedback. Is hyperkit still only available for macOS, or has it been ported to other platforms?
