Unsuccessful in joining the 2nd node with master with kubeadm join command

seetha33tmr · December 2019

I was able to create the master node with k8sMaster.sh and can see it in my instance.

kubectl get node
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   3d22h   v1.16.1

After executing k8sSecond.sh, I am not able to join the 2nd node to the same cluster as the master. The command just hangs!

sudo kubeadm join 10.128.0.2:6443 --token <token i got from master.out> --discovery-token-ca-cert-hash sha256:<value i got from master.out>  --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
    [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING Port-10250]: Port 10250 is in use
    [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists

Any help is appreciated.

seetha33tmr · December 2019

I tried resetting the master node and ran everything again. These are the steps I followed:
1. sudo kubeadm reset
2. bash k8sMaster.sh
3. mkdir -p $HOME/.kube
4. sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
5. sudo chown $(id -u):$(id -g) $HOME/.kube/config
6. kubectl apply -f calico.yaml
7. bash k8sSecond.sh

So after this, when I run the kubeadm join command, even though the command executed successfully, I don't see the node in the cluster!

sudo kubeadm join 10.128.0.2:6443 --token ### --discovery-token-ca-cert-hash sha256:### --ignore-preflight-errors='all'
[preflight] Running pre-flight checks
    [WARNING DirAvailable--etc-kubernetes-manifests]: /etc/kubernetes/manifests is not empty
    [WARNING FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [WARNING Port-10250]: Port 10250 is in use
    [WARNING FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

username@ubuntu-bionic-1:~$ kubectl get nodes
NAME              STATUS   ROLES    AGE     VERSION
ubuntu-bionic-1   Ready    master   5m47s   v1.16.1

chrispokorni · December 2019

Hi,

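The first issue was caused by running kubeadm init and then kubeadm join on the same node: the preflight warnings show that /etc/kubernetes already holds the master's files and that port 10250 is in use. Also, keep in mind that the token issued by the master expires (after 24 hours by default), and an expired token will prevent another host from using it to join the cluster.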

On the second attempt, did you also run the rbac file for calico? Did you reset the worker node before running kubeadm join?

Are your firewalls disabled and traffic allowed to all ports for all protocols?

Are your nodes sized accordingly?

Regards,
-Chris

seetha33tmr · December 2019

Yes, you were right. I created another VM instance and was able to create the minion node.

I had to run the rbac file as well.

I am now able to create the master and minion node and set them up in the cluster. Thanks!

seetha33tmr · December 2019

Might be useful to someone. Use the calico.yaml from the official documentation - https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml

When I used the one from the course, RBAC issues kept showing up!

seetha33tmr · December 2019

Sorry, I posted the wrong link. Here is the latest version - https://docs.projectcalico.org/v3.11/manifests/calico.yaml

serewicz · December 2019

Did you apply the included rbac yaml file?

AteivJain · January 2020

I have a similar issue. I created the master node successfully. When attempting to create the minion node, kubeadm gets stuck in the preflight checks.

I tried restarting the master; that didn't help.

I created a new kubeadm token, since it had been more than 2 hours since I created the master node.

student@lfd259-ateiv-htql:~$ sudo kubeadm token list
TOKEN                     TTL       EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
6vmwmi.wto27a2jk26xey22   13h       2020-01-09T16:38:07Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token
v150sj.mawz91pqhcd4h7ng   23h       2020-01-10T02:55:28Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token

On the minion node, it just gets stuck:

root@lfd259-ateiv-tbjs:~# kubeadm join --token v150sj.mawz91pqhcd4h7ng k8smaster:6443 --discovery-token-ca-cert-hash sha256:9a51c55cda7ba19e173c93d9587b9aad8914d10b4ebb8749104a897b370960ef
[preflight] Running pre-flight checks
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/

Found the following traces in /var/log/syslog:

Jan  9 03:37:37 lfd259-ateiv-tbjs kubelet[2883]: F0109 03:37:37.492973    2883 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Unit entered failed state.
Jan  9 03:37:37 lfd259-ateiv-tbjs systemd[1]: kubelet.service: Failed with result 'exit-code'.

Any help would be great!

chrispokorni · January 2020

Hi @AteivJain,

Is the /etc/hosts file configured correctly on your worker node?

-Chris

AteivJain · January 2020

Hi @chrispokorni

I configured it correctly based on the lab. It's configured the same way as on the master node:

    root@lfs258-example-td92:~# cat /etc/hosts
    127.0.0.1 localhost

    # The following lines are desirable for IPv6 capable hosts
    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts
    169.254.169.254 metadata.google.internal metadata
    10.168.0.3 k8smaster
    10.168.0.3 lfs258-example-td92.us-west2-c.c.astral-reef-264415.internal lfs258-example-td92  # Added by Google
    169.254.169.254 metadata.google.internal  # Added by Google

I even tried to nuke the worker node, but I'm still getting the same error.

gfalasca · January 2020

From your syslog: ... /var/lib/kubelet/config.yaml: no such file or directory
Have you already tried regenerating the token on the master with sudo kubeadm token create --print-join-command
and re-joining from the worker?
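
For reference, a minimal sketch of checking whether the token in a join command is still known to the master before retrying. kubeadm bootstrap tokens default to a 24-hour TTL; the two-hour limit printed by kubeadm init applies to the uploaded certificates used for control-plane joins, not to the token. The `token_listed` helper is hypothetical (it is not part of kubeadm) and only scans the text that `kubeadm token list` prints.

```shell
# token_listed TOKEN: reads `kubeadm token list` output on stdin and
# succeeds if TOKEN appears in the first column (the header line is skipped).
token_listed() {
  awk -v t="$1" 'NR > 1 && $1 == t { found = 1 } END { exit !found }'
}

# Assumed usage on the master (needs a running control plane):
#   sudo kubeadm token list | token_listed "<token>" \
#     || sudo kubeadm token create --print-join-command
```

If the token is missing from the list, the last command prints a complete join line with a new token and the matching CA hash.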

AteivJain · January 2020

Hi @gfalasca

The syslog output is from an older deployment that I tried when I first posted here. I deployed the master again from scratch and then tried to join the worker node. This time I didn't create a new token, as it was within 2 hours. I still see the same error in syslog:

Jan  9 18:35:43 lfs258-example-td92 kubelet[1157]: F0109 18:35:43.594741    1157 server.go:196] failed to load Kubelet config file /var/lib/kubelet/config.yaml, error failed to read kubelet config file "/var/lib/kubelet/config.yaml", error: open /var/lib/kubelet/config.yaml: no such file or directory
Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Unit entered failed state.
Jan  9 18:35:43 lfs258-example-td92 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan  9 18:35:53 lfs258-example-td92 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan  9 18:35:53 lfs258-example-td92 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan  9 18:35:53 lfs258-example-td92 systemd[1]: Started kubelet: The Kubernetes Node Agent.

From Master node:

    student@lfs258-example-d5pm:~$ sudo kubeadm token list
    TOKEN                     TTL       EXPIRES                USAGES                   DESCRIPTION   EXTRA GROUPS
    ytkmwy.ksw6bk75zhngzb4c   10h       2020-01-10T05:16:14Z   authentication,signing   <none>        system:bootstrappers:kubeadm:default-node-token

Output of kubeadm init on master node:

You can now join any number of the control-plane node running the following command on each as root:

  kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405 \
    --control-plane --certificate-key 96de61174ccf620108892e24665f8b5053346634693c22a5e9f82ff1d311b66b

Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use 
"kubeadm init phase upload-certs --upload-certs" to reload certs afterward.

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join k8smaster:6443 --token ytkmwy.ksw6bk75zhngzb4c \
    --discovery-token-ca-cert-hash sha256:a2e823cce22278d250b8d9c3d6adc51fcbbe8193604664b26f0a97b01d4ee405

chrispokorni · January 2020

Hi @AteivJain,

If you only bootstrapped the master from scratch, and not the worker, then after every unsuccessful sudo kubeadm join ... command a sudo kubeadm reset is required on the worker node to clear the partially configured environment before the next sudo kubeadm join ....

Regards,
-Chris
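
The reset-then-join sequence described above can be sketched as a small wrapper. This is an illustration under assumptions, not a lab script: `run` and `reset_and_join` are hypothetical helper names, and `DRY_RUN=1` only prints the commands so the sequence can be reviewed before touching the node.

```shell
# run CMD...: prints the command when DRY_RUN=1, otherwise executes it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# reset_and_join JOIN_CMD...: clears a partially configured worker, then
# retries the join. Pass the full join command printed by the master.
reset_and_join() {
  run sudo kubeadm reset --force && run sudo "$@"
}

# Assumed usage on the worker:
#   DRY_RUN=1 reset_and_join kubeadm join k8smaster:6443 --token <token> \
#     --discovery-token-ca-cert-hash sha256:<hash>
```

Run it on the worker only; resetting the master would tear down the control plane.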

serewicz · January 2020

Hello,

When troubleshooting, it helps to see the command you typed as well as the output. One note: I have seen people run into problems when they copy the join command and paste it into a word processor such as WordPad first, because an extra carriage return gets inserted.

Start with fresh instances and follow the labs as written. When it comes time to copy and paste, select one line at a time and omit the backslash. If this works, the issue was an accidental insertion by the tools used to copy and paste. It seems to be tied to some versions of WordPad on Windows as well as some notepad options on Macs.

Regards,
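
One way to rule out stray carriage returns and line-continuation backslashes is to normalize the pasted text into a single line before running it. A minimal sketch follows; `normalize_join_command` is a hypothetical helper name, and it assumes the pasted command arrives on stdin.

```shell
# normalize_join_command: reads a pasted multi-line command on stdin,
# strips carriage returns and trailing backslashes, trims each line,
# and prints everything joined into one line.
normalize_join_command() {
  awk '{
    gsub(/\r/, "")                  # drop carriage returns from the paste
    sub(/[ \t]*\\$/, "")            # drop a trailing line-continuation backslash
    gsub(/^[ \t]+|[ \t]+$/, "")     # trim leading and trailing whitespace
    if ($0 != "") line = line (line != "" ? " " : "") $0
  } END { print line }'
}

# Assumed usage: save the pasted text to a file, then inspect the result
# before running it with sudo.
#   normalize_join_command < join.txt
```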

AteivJain · January 2020

@chrispokorni I bootstrapped both the master and the worker by deleting the instance group on GCP and re-deploying.

@serewicz Thanks for pointing that out. I did try the lab from scratch and tried my best not to make mistakes while copying the commands. For longer commands, I pasted them on the node using vim. Not sure what else I can try...

serewicz · January 2020

Hmm. Okay. I will put together a quick video of the installation of the master node and the worker. It will take me a bit. Perhaps once you see it work for me you can tell what we are doing differently.

Would that be helpful?

Regards,

AteivJain · January 2020

@serewicz That would be great. In the meantime, I'll give it a try again and keep troubleshooting. Will keep the thread updated. Thanks again for all the help!

AteivJain · January 2020

@serewicz I tried again from scratch. This time I typed all the commands, making sure I didn't miss anything. While joining the cluster, I enabled verbose logging and found the following traces. Not sure whether they point anywhere.

root@lfs258-ateiv-vnc1:~# kubeadm join k8smaster:6443 --token dow8xr.t7jyriiot6rezfyr --discovery-token-ca-cert-hash sha256:e65cf99c97ccd86a612de073b0912851ed56ce9443e3fd204145bcaaea44ee2c --v=5
I0109 22:00:27.212537   15476 join.go:363] [preflight] found NodeName empty; using OS hostname as NodeName
I0109 22:00:27.212638   15476 initconfiguration.go:102] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0109 22:00:27.212732   15476 preflight.go:90] [preflight] Running general checks
[preflight] WARNING: Couldn't create the interface used for talking to the container runtime: docker is required for container runtime: exec: "docker": executable file not found in $PATH
I0109 22:00:27.212878   15476 checks.go:250] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0109 22:00:27.212946   15476 checks.go:287] validating the existence of file /etc/kubernetes/kubelet.conf
I0109 22:00:27.212963   15476 checks.go:287] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0109 22:00:27.212975   15476 checks.go:377] validating the presence of executable crictl
I0109 22:00:27.213011   15476 checks.go:336] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0109 22:00:27.213047   15476 checks.go:336] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0109 22:00:27.213105   15476 checks.go:650] validating whether swap is enabled or not
I0109 22:00:27.213142   15476 checks.go:377] validating the presence of executable ip
I0109 22:00:27.213174   15476 checks.go:377] validating the presence of executable iptables
I0109 22:00:27.213202   15476 checks.go:377] validating the presence of executable mount
I0109 22:00:27.213231   15476 checks.go:377] validating the presence of executable nsenter
I0109 22:00:27.213254   15476 checks.go:377] validating the presence of executable ebtables
I0109 22:00:27.213281   15476 checks.go:377] validating the presence of executable ethtool
I0109 22:00:27.213308   15476 checks.go:377] validating the presence of executable socat
I0109 22:00:27.213330   15476 checks.go:377] validating the presence of executable tc
I0109 22:00:27.213357   15476 checks.go:377] validating the presence of executable touch
I0109 22:00:27.213383   15476 checks.go:521] running all checks
I0109 22:00:27.227485   15476 checks.go:407] checking whether the given node name is reachable using net.LookupHost
I0109 22:00:27.227739   15476 checks.go:619] validating kubelet version
I0109 22:00:27.291576   15476 checks.go:129] validating if the service is enabled and active
I0109 22:00:27.299743   15476 checks.go:202] validating availability of port 10250
I0109 22:00:27.299955   15476 checks.go:287] validating the existence of file /etc/kubernetes/pki/ca.crt
I0109 22:00:27.299978   15476 checks.go:433] validating if the connectivity type is via proxy or direct
[preflight] Some fatal errors occurred:
    [ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
    [ERROR FileContent--proc-sys-net-ipv4-ip_forward]: /proc/sys/net/ipv4/ip_forward contents are not set to 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
error execution phase preflight
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run.func1
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:237
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).visitAll
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:424
k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow.(*Runner).Run
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/phases/workflow/runner.go:209
k8s.io/kubernetes/cmd/kubeadm/app/cmd.NewCmdJoin.func1
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/cmd/join.go:169
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:830
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:914
k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:864
k8s.io/kubernetes/cmd/kubeadm/app.Run
    /workspace/anago-v1.16.1-beta.0.37+d647ddbd755faf/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/app/kubeadm.go:50
main.main
    _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubeadm/kubeadm.go:25
runtime.main
    /usr/local/go/src/runtime/proc.go:200
runtime.goexit
    /usr/local/go/src/runtime/asm_amd64.s:1337
root@lfs258-ateiv-vnc1:~#
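
The two fatal preflight errors in this output refer to kernel settings that kubeadm reads from /proc. A minimal sketch of checking them by hand is below, assuming a Linux node; `flag_enabled` is a hypothetical helper, and the suggested fixes in the comments are the usual ones (loading the br_netfilter module and enabling IP forwarding), not something confirmed as the cause in this thread. The "docker: executable file not found" warning in the same output suggests the node setup itself may not have completed.

```shell
# flag_enabled FILE: succeeds only if FILE is readable and contains exactly 1.
flag_enabled() {
  [ -r "$1" ] && [ "$(cat "$1")" = "1" ]
}

# Assumed usage on the worker:
#   flag_enabled /proc/sys/net/bridge/bridge-nf-call-iptables \
#     || sudo modprobe br_netfilter
#   flag_enabled /proc/sys/net/ipv4/ip_forward \
#     || sudo sysctl -w net.ipv4.ip_forward=1
```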

AteivJain · January 2020

Found the error. I had the wrong DNS entry on worker node. Thanks for all the input. @serewicz

amitraisharma · January 2020

@AteivJain said:
Found the error. I had the wrong DNS entry on worker node. Thanks for all the input. @serewicz

Hi @AteivJain, where did you check for the DNS entry on the worker node?

serewicz · January 2020

Hello @amitraisharma

The lab has edits to the /etc/hosts file. Check there.

Regards,
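
A quick way to check the entry is to print the address that a hosts file maps to the k8smaster alias and compare it with the master's actual IP. The sketch below is illustrative; `hosts_ip` is a hypothetical helper, and it reads /etc/hosts unless another file is given.

```shell
# hosts_ip NAME [FILE]: prints the first address that FILE (default
# /etc/hosts) maps to NAME, ignoring comments; prints nothing if absent.
hosts_ip() {
  awk -v name="$1" '
    { sub(/#.*/, "") }                       # strip trailing comments
    { for (i = 2; i <= NF; i++)
        if ($i == name) { print $1; exit } }
  ' "${2:-/etc/hosts}"
}

# Assumed usage on the worker; the output should be the master's IP:
#   hosts_ip k8smaster
```

On the worker, the alias must point at the master's address, not at the worker's own.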

itsconquest · February 2020

What I stupidly did while completing the lab for LFS258 was run openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //' on the worker VM instead of the master VM.

Once I realized my mistake, I ran it again on the master VM and then used the generated hash in the kubeadm join command on the worker VM which then joined fine.

Hope this helps someone
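
For readability, the same pipeline can be wrapped in a function and run on the master. This is a sketch: `ca_cert_hash` is a hypothetical name, the added `-noout` flag only suppresses the certificate text, and the pipeline assumes an RSA CA key, which is what the command above already requires.

```shell
# ca_cert_hash [CERT]: prints the SHA-256 hash of the certificate's public
# key, i.e. the value that follows "sha256:" in
# --discovery-token-ca-cert-hash. Defaults to the kubeadm CA path.
ca_cert_hash() {
  openssl x509 -pubkey -noout -in "${1:-/etc/kubernetes/pki/ca.crt}" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# Assumed usage on the master (not the worker):
#   echo "sha256:$(ca_cert_hash)"
```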

wolfgangbecker · July 2020

I had a similar issue while using calico for my local cluster setup (based on hyperkit). Replacing calico with Weave Net solved it for me.

serewicz · July 2020

Thank you for the feedback. Is hyperkit still only for macOS, or has it been ported to other platforms?
