Lab2.2: kubeadm join failed because of connection refused

caishaoping · September 2022

Hi There,

I need help resolving a "connection refused" error that I get when I try to join the worker node with the kubeadm join command. This is the error:

vagrant@worker:~$ sudo kubeadm join 10.0.2.15:6443 --token yajnah.8v1n4d2ivgbo6hlx \
--discovery-token-ca-cert-hash sha256:84ba35a6760a1f74c9b1876fc34ce066e0c6c07e7d88890e3c24d23080519f09

[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: Get "https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 10.0.2.15:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher

I did not run the following step because I can see that the Calico network pods are already provisioned. If I do need to run it, how do I find the exact name of the .yaml file?

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/

LAN IPs for CP and Worker nodes:
172.16.0.100 CP-node, 172.16.0.102 Worker Node
I can ping CP node from Worker Node:
vagrant@worker:~$ ping 172.16.0.100
PING 172.16.0.100 (172.16.0.100) 56(84) bytes of data.
64 bytes from 172.16.0.100: icmp_seq=1 ttl=64 time=0.583 ms
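
A successful ping only shows that ICMP reaches 172.16.0.100, while the join command above dials TCP port 6443 on 10.0.2.15. A minimal sketch of a TCP reachability check follows, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command are available on the VM. The name port_open is a hypothetical helper, not part of kubeadm.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: hypothetical helper. Succeeds only if a TCP
# connection to HOST:PORT can be opened within 3 seconds.
port_open() {
  local host=$1 port=$2
  # bash treats /dev/tcp/HOST/PORT as a TCP connection attempt
  timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Usage on the worker node (addresses taken from this thread):
#   port_open 10.0.2.15 6443    && echo "10.0.2.15:6443 reachable"
#   port_open 172.16.0.100 6443 && echo "172.16.0.100:6443 reachable"
```

If the first check fails while the second succeeds, the API server is probably reachable only on the private address.
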
CP node PODS info:

vagrant@cp:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-6799f5f4b4-6xbkb   1/1     Running   0          48m
kube-system   calico-node-trznz                          1/1     Running   0          48m
kube-system   coredns-6d4b75cb6d-jxmtz                   1/1     Running   0          48m
kube-system   coredns-6d4b75cb6d-k6cf8                   1/1     Running   0          48m
kube-system   etcd-cp                                    1/1     Running   0          48m
kube-system   kube-apiserver-cp                          1/1     Running   0          48m
kube-system   kube-controller-manager-cp                 1/1     Running   0          48m
kube-system   kube-proxy-67mbw                           1/1     Running   0          48m
kube-system   kube-scheduler-cp                          1/1     Running   0          48m

CP node IP info:
vagrant@cp:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:a2:6b:fd brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
valid_lft 79051sec preferred_lft 79051sec
inet6 fe80::a00:27ff:fea2:6bfd/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:13:16:91 brd ff:ff:ff:ff:ff:ff
inet 172.16.0.100/24 brd 172.16.0.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe13:1691/64 scope link
valid_lft forever preferred_lft forever
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.242.64/32 scope global tunl0
valid_lft forever preferred_lft forever
7: cali549db1682f5@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-4645a764-3f02-3264-4374-c7257cf21be1
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
8: cali2d70479e511@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-71509c2a-11e2-3cd0-0ce6-098d2a3f091f
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
9: cali8bdaed7ef27@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-c7333fb0-3f78-f392-0a72-1d0159f59103
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
vagrant@cp:~$

Thanks
Shao

chrispokorni · September 2022

Hi @caishaoping,

Similar issues have already been reported and resolved several times in the forum.

On VMs with multiple network interfaces, the control plane gets advertised on the default interface, in this case the one with IP 10.0.2.15. However, it seems that the intent may have been to use the 172.16.x.x private IP address. One solution would be to ensure your VMs only receive a single network interface each during provisioning, connected to a bridged network (promiscuous mode set to allow all). If both interfaces are needed, then the kubeadm init command from the k8scp.sh script file should include the --apiserver-advertise-address=CP-node-private-IP option.
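
To make the multiple-interface situation concrete, here is a minimal sketch that lists the non-loopback IPv4 addresses on a node and prints (without running) an init command for a chosen one. It assumes the one-line output format of `ip -o -4 addr show`; list_ipv4 and print_init_cmd are hypothetical helper names, and the pod CIDR is the course default used elsewhere in this thread.

```shell
#!/usr/bin/env bash
# list_ipv4: hypothetical helper. Reads `ip -o -4 addr show` output on
# stdin and prints "<interface> <IPv4 address>" per line, skipping loopback.
list_ipv4() {
  awk '$2 != "lo" { split($4, a, "/"); print $2, a[1] }'
}

# print_init_cmd ADDRESS: hypothetical helper. Only prints the command,
# so the chosen address can be reviewed before anything is run.
print_init_cmd() {
  echo "sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=$1"
}

# Usage on the CP node:
#   ip -o -4 addr show | list_ipv4
#   print_init_cmd 172.16.0.100
```
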

The network plugin is installed as part of the same k8scp.sh script, so there is no need to install it manually. I would recommend inspecting both script files, k8scp.sh and k8sWorker.sh, to understand what they install and configure on each VM.

Regards,
-Chris

caishaoping · September 2022

Thanks.

I checked k8scp.sh and it has the following. If I assign a 192.168.x.x private address, will this also help avoid the issue?

# Configure the cluster

sudo kubeadm init --pod-network-cidr=192.168.0.0/16

Regards

chrispokorni · September 2022

Hi @caishaoping,

Please ensure that there is no overlap between subnets of nodes, pods, and services.

By default services use 10.96.0.0/12 managed by the cluster, and the default pod subnet is 192.168.0.0/16 managed with the pod network plugin - calico. With that in mind, the desired nodes subnet should not overlap the services and pods subnets.
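
The overlap rule can be checked mechanically. Below is a minimal sketch in bash, assuming IPv4 CIDRs written without leading zeros; ip_to_int, cidr_range and cidrs_overlap are hypothetical helper names, not kubeadm or calico tooling.

```shell
#!/usr/bin/env bash
# ip_to_int A.B.C.D: prints the address as a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo "$(( (a << 24) | (b << 16) | (c << 8) | d ))"
}

# cidr_range A.B.C.D/N: prints the first and last address of the
# subnet as two integers separated by a space.
cidr_range() {
  local ip=${1%/*} prefix=${1#*/}
  local base mask start end
  base=$(ip_to_int "$ip")
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  start=$(( base & mask ))
  end=$(( start | (~mask & 0xFFFFFFFF) ))
  echo "$start $end"
}

# cidrs_overlap CIDR1 CIDR2: succeeds if the two ranges share any address.
cidrs_overlap() {
  local s1 e1 s2 e2
  read -r s1 e1 <<< "$(cidr_range "$1")"
  read -r s2 e2 <<< "$(cidr_range "$2")"
  (( s1 <= e2 && s2 <= e1 ))
}

# Usage with the subnets from this thread:
#   cidrs_overlap 172.16.0.0/24 192.168.0.0/16 || echo "nodes and pods: no overlap"
#   cidrs_overlap 172.16.0.0/24 10.96.0.0/12   || echo "nodes and services: no overlap"
```

With these helpers, a node subnet such as 192.168.56.0/24 (a hypothetical example) would be reported as overlapping the default pod subnet.
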

Regards,
-Chris

caishaoping · September 2022

Thanks @chrispokorni. This helped me understand why the VMs' private IPs should not be in 192.168.x.x. After a few tries, I am now able to join my worker node to the control-plane node:

vagrant@cp:~$ kubectl get nodes
NAME     STATUS   ROLES           AGE     VERSION
cp       Ready    control-plane   27m     v1.24.1
worker   Ready    <none>          5m34s   v1.24.1
vagrant@cp:~$

On the control-plane node (CP node): after sudo kubeadm reset, I did the following before "sudo kubeadm init ....", roughly following the k8scp.sh script:
sudo systemctl enable --now kubelet
sudo swapoff -a
..
sudo systemctl restart containerd
sudo systemctl enable containerd

sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=172.16.0.100

Then I generated a new token for the worker node to join:
sudo kubeadm token create --print-join-command

On the worker node, it is simple:
sudo systemctl enable --now kubelet
sudo swapoff -a
sudo kubeadm reset
sudo kubeadm join .....

Thanks
Shao

caishaoping · September 2022

Hello again @chrispokorni, I want to take this thread further with one related question.

Today, after restarting my Windows host machine, I started up the two VMs (CP and worker nodes), but I am not able to reach the cluster with kubectl commands such as "kubectl get nodes":

vagrant@cp:~$ kubectl get nodes
The connection to the server 172.16.0.100:6443 was refused - did you specify the right host or port?
vagrant@cp:~$

The cluster was set up with the steps from my previous post, including "sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=172.16.0.100".

This is what I get from "sudo systemctl status kubelet":

vagrant@cp:~$ sudo systemctl status kubelet

● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Sat 2022-09-10 01:53:56 UTC; 6s ago
Docs: https://kubernetes.io/docs/home/
Process: 11595 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILUR>
Main PID: 11595 (code=exited, status=1/FAILURE)

Sep 10 01:53:56 cp systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 01:53:56 cp systemd[1]: kubelet.service: Failed with result 'exit-code'.
lines 1-11/11 (END)

Where should I start troubleshooting? Or did I miss any of the earlier steps when setting up the CP and worker nodes?

Thanks
Shao

caishaoping · September 2022

Hi there, here is a quick update to my previous question. I guess the cause may be that my VM nodes are really slow to start up.

After spending a few minutes reading past threads for ideas, I tried again: the CP node was up and ready, followed by the worker node a couple of minutes later.

vagrant@cp:~$ kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
cp       Ready    control-plane   29h   v1.24.1
worker   Ready    <none>          28h   v1.24.1

I did run "sudo swapoff -a" on both VMs, though I am not sure whether it helped. With my limited Linux admin knowledge, my question is: do I need to run "swapoff -a" after every reboot?

Thanks
Shao

caishaoping · September 2022

Hi There,
Sorry for asking before thinking it through. Let me follow up to conclude this thread.
When I started my VMs today, swap was active again, so I did need to run "swapoff -a". Here is the check:

This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento
Last login: Sat Sep 10 01:42:55 2022 from 172.16.0.1

to check if swap is active, yes, it is actually active
vagrant@worker:~$ sudo swapon -s
Filename Type Size Used Priority
/swap.img file 1999868 0 -2
to disable swap
vagrant@worker:~$ sudo swapoff -a
to recheck if swap is off, yes, it is off now
vagrant@worker:~$ sudo swapon -s
vagrant@worker:~$
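
The same check can be scripted. This is a minimal sketch, assuming a Linux node where /proc/swaps uses the same layout as the `swapon -s` output above (one header line, then one line per active swap area); swap_active is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# swap_active [FILE]: hypothetical helper. Succeeds if FILE (default
# /proc/swaps) lists at least one swap area below the header line.
swap_active() {
  local swaps_file=${1:-/proc/swaps}
  awk 'NR > 1 { found = 1 } END { exit !found }' "$swaps_file"
}

# Usage on a node:
#   if swap_active; then
#     echo "swap is on; run: sudo swapoff -a"
#   else
#     echo "swap is off"
#   fi
```
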
Furthermore, I edited /etc/fstab (sudo vim /etc/fstab) and commented out the swap-related line, like below:

#/swap.img none swap sw 0 0
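
For anyone who prefers not to edit the file by hand, here is a minimal sketch of the same change with sed. It assumes swap entries have the literal word swap as a whitespace-separated field, as in the line above; comment_swap_lines is a hypothetical helper name. It only prints the result, so the output can be reviewed before replacing /etc/fstab.

```shell
#!/usr/bin/env bash
# comment_swap_lines FILE: hypothetical helper. Prints FILE with a "#"
# placed in front of every swap entry that is not already commented out.
# The file itself is not modified.
comment_swap_lines() {
  sed -E '/^[^#].*[[:space:]]swap[[:space:]]/ s/^/#/' "$1"
}

# Usage (review the result before copying it over /etc/fstab):
#   comment_swap_lines /etc/fstab > /tmp/fstab.new
#   diff /etc/fstab /tmp/fstab.new
```
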

After rebooting the VMs, I verified that swap now stays disabled across reboots:

This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento
Last login: Sat Sep 10 16:12:07 2022 from 172.16.0.1

vagrant@cp:~$ sudo swapon -s
vagrant@cp:~$

vagrant@cp:~$ kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
cp       Ready    control-plane   43h   v1.24.1
worker   Ready    <none>          43h   v1.24.1

Happy Ending! Thanks to all!
