Lab2.2: kubeadm join failed because of connection refused
Hi There,
I need help resolving a connection refused error when I try to join the worker node with the kubeadm join command. This is the error from the command:
vagrant@worker:~$ sudo kubeadm join 10.0.2.15:6443 --token yajnah.8v1n4d2ivgbo6hlx \
--discovery-token-ca-cert-hash sha256:84ba35a6760a1f74c9b1876fc34ce066e0c6c07e7d88890e3c24d23080519f09
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: Get "https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 10.0.2.15:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher
- I did not run the following step because the calico network pods are already provisioned. If I do need to run it, how do I find the exact name of the .yaml file?
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
LAN IPs for CP and Worker nodes:
172.16.0.100 CP node, 172.16.0.102 Worker node
I can ping the CP node from the Worker node:
vagrant@worker:~$ ping 172.16.0.100
PING 172.16.0.100 (172.16.0.100) 56(84) bytes of data.
64 bytes from 172.16.0.100: icmp_seq=1 ttl=64 time=0.583 ms
CP node pods info:
vagrant@cp:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-6799f5f4b4-6xbkb   1/1     Running   0          48m
kube-system   calico-node-trznz                          1/1     Running   0          48m
kube-system   coredns-6d4b75cb6d-jxmtz                   1/1     Running   0          48m
kube-system   coredns-6d4b75cb6d-k6cf8                   1/1     Running   0          48m
kube-system   etcd-cp                                    1/1     Running   0          48m
kube-system   kube-apiserver-cp                          1/1     Running   0          48m
kube-system   kube-controller-manager-cp                 1/1     Running   0          48m
kube-system   kube-proxy-67mbw                           1/1     Running   0          48m
kube-system   kube-scheduler-cp                          1/1     Running   0          48m
- CP node IP info:
vagrant@cp:~$ ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:a2:6b:fd brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15/24 brd 10.0.2.255 scope global dynamic eth0
valid_lft 79051sec preferred_lft 79051sec
inet6 fe80::a00:27ff:fea2:6bfd/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:13:16:91 brd ff:ff:ff:ff:ff:ff
inet 172.16.0.100/24 brd 172.16.0.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::a00:27ff:fe13:1691/64 scope link
valid_lft forever preferred_lft forever
4: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.242.64/32 scope global tunl0
valid_lft forever preferred_lft forever
7: cali549db1682f5@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-4645a764-3f02-3264-4374-c7257cf21be1
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
8: cali2d70479e511@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-71509c2a-11e2-3cd0-0ce6-098d2a3f091f
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
9: cali8bdaed7ef27@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-c7333fb0-3f78-f392-0a72-1d0159f59103
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
vagrant@cp:~$
Thanks
Shao
Answers
Hi @caishaoping,
Similar issues have already been reported and resolved several times in the forum.
On VMs with multiple network interfaces, the control plane gets advertised on the default interface, in this case the one with IP 10.0.2.15. However, it seems that the intent may have been to use the 172.16.x.x private IP address. One solution would be to ensure your VMs only receive a single network interface each during provisioning, connected to a bridged network (promiscuous mode set to allow all). If both interfaces are needed, then the kubeadm init command from the k8scp.sh script file should include the --apiserver-advertise-address=CP-node-private-IP option.
The network plugin is installed as part of the same k8scp.sh script, so there is no need to manually install the plugin. I would recommend inspecting both script files, k8scp.sh and k8sWorker.sh, to understand what they are doing in terms of installation and configuration on each VM.
Regards,
-Chris
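For anyone following along, here is a minimal sketch of the second option above: look up the IPv4 address on the private interface and build the matching kubeadm init command. The helper names first_ipv4 and print_init_cmd are made up for illustration, eth1 is the private interface seen in this thread, and the pod CIDR default simply mirrors the one in k8scp.sh. The command is only printed, not executed, so it can be reviewed first.

```shell
# Sketch only: helper names are hypothetical, not part of k8scp.sh.

# Extract the first IPv4 address from `ip -4 -o addr show dev <iface>` output.
# Reads that output on stdin so the parsing can be tried without a VM.
first_ipv4() {
  awk '$3 == "inet" { split($4, a, "/"); print a[1]; exit }'
}

# Print (do not run) the init command for a given advertise address.
# $1 = advertise address, $2 = pod CIDR (defaults to the value used in the thread).
print_init_cmd() {
  printf 'sudo kubeadm init --pod-network-cidr=%s --apiserver-advertise-address=%s\n' \
    "${2:-192.168.0.0/16}" "$1"
}

# Usage on the CP node (assumes eth1 is the private interface):
#   addr="$(ip -4 -o addr show dev eth1 | first_ipv4)"
#   print_init_cmd "$addr"
```

Printing the command instead of running it keeps the sketch harmless on a node that is already initialized; a reset would still be needed before re-running kubeadm init.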
Thanks.
I checked k8scp.sh and it has the following. If I assign a 192.168.x.x private address to the VMs, will this also help avoid the issue?
Configure the cluster
sudo kubeadm init --pod-network-cidr=192.168.0.0/16
Regards
Hi @caishaoping,
Please ensure that there is no overlap between subnets of nodes, pods, and services.
By default, services use 10.96.0.0/12, managed by the cluster, and the default pod subnet is 192.168.0.0/16, managed by the pod network plugin, calico. With that in mind, the desired nodes subnet should not overlap the services and pods subnets.
Regards,
-Chris
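The overlap rule above can be checked by hand, but a small sketch may help. The two functions below, cidr_range and cidrs_overlap, are hypothetical helpers written for this thread (not part of kubeadm or calico); they turn each IPv4 CIDR into a numeric first/last address pair and compare the ranges.

```shell
# Sketch only: hypothetical helpers for comparing IPv4 CIDR ranges.

# Print "<first> <last>" (as integers) for an IPv4 CIDR such as 10.96.0.0/12.
cidr_range() {
  echo "$1" | awk -F'[./]' '{
    ip = (($1 * 256 + $2) * 256 + $3) * 256 + $4   # dotted quad -> integer
    size = 2 ^ (32 - $5)                           # number of addresses in the block
    first = ip - (ip % size)                       # align down to the network address
    printf "%.0f %.0f\n", first, first + size - 1
  }'
}

# Succeed (exit 0) if the two CIDRs share at least one address.
cidrs_overlap() {
  # After this, $1 $2 = first/last of A and $3 $4 = first/last of B.
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Usage with the subnets from this thread:
#   cidrs_overlap 172.16.0.0/24 192.168.0.0/16 || echo "nodes and pods do not overlap"
#   cidrs_overlap 172.16.0.0/24 10.96.0.0/12   || echo "nodes and services do not overlap"
```

With the 172.16.0.x node addresses used here, neither check reports an overlap, whereas a node subnet such as 192.168.56.0/24 would collide with the default pod subnet.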
Thanks @chrispokorni. This helped me understand why the VMs' private IPs should not be in 192.168... After a few tries, I am now able to join my worker node to the control-plane node:
vagrant@cp:~$ kubectl get nodes
NAME     STATUS   ROLES           AGE     VERSION
cp       Ready    control-plane   27m     v1.24.1
worker   Ready    <none>          5m34s   v1.24.1
vagrant@cp:~$
- On the control-plane node (CP node): after sudo kubeadm reset, I did the following before "sudo kubeadm init ....", roughly following the steps in the k8scp.sh file:
sudo systemctl enable --now kubelet
sudo swapoff -a
..
sudo systemctl restart containerd
sudo systemctl enable containerd
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=172.16.0.100
Then, got the new token for worker node to join:
sudo kubeadm token create --print-join-command
- On the worker node, it is simple:
sudo systemctl enable --now kubelet
sudo swapoff -a
sudo kubeadm reset
sudo kubeadm join .....
Thanks
Shao
Hello again @chrispokorni, I want to take this thread further with one related question:
Today, after restarting my Windows host machine, I started up the two VMs (CP + Worker nodes), but I am not able to connect to the cluster with kubectl commands such as "kubectl get nodes":
vagrant@cp:~$ kubectl get nodes
The connection to the server 172.16.0.100:6443 was refused - did you specify the right host or port?
vagrant@cp:~$
This is with the steps mentioned in my previous post, which included the kubeadm init command "sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address=172.16.0.100".
Following is the message I get when running "sudo systemctl status kubelet":
vagrant@cp:~$ sudo systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since Sat 2022-09-10 01:53:56 UTC; 6s ago
Docs: https://kubernetes.io/docs/home/
Process: 11595 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILUR>
Main PID: 11595 (code=exited, status=1/FAILURE)
Sep 10 01:53:56 cp systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Sep 10 01:53:56 cp systemd[1]: kubelet.service: Failed with result 'exit-code'.
lines 1-11/11 (END)
Where should I start troubleshooting? Or did I miss any of the earlier steps for setting up the CP and worker nodes?
Thanks
Shao
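A possible starting point for this kind of kubelet restart loop, sketched under assumptions: read the kubelet's own log, then check whether swap came back on after the reboot, since kubelet by default refuses to start with swap enabled. The function name swap_is_active is hypothetical, and its optional path argument exists only so the check can be tried against a sample file.

```shell
# Sketch only: swap_is_active is a hypothetical helper.

# Succeed (exit 0) if the given swaps file lists at least one active swap area.
# Defaults to /proc/swaps, which is where Linux reports active swap.
swap_is_active() {
  # The first line is the column header; any further line is a swap area.
  awk 'NR > 1 { n++ } END { exit !(n > 0) }' "${1:-/proc/swaps}"
}

# Typical use on a node where kubelet keeps restarting:
#   sudo journalctl -u kubelet --no-pager -n 50   # last 50 kubelet log lines
#   swap_is_active && echo "swap is on; try: sudo swapoff -a"
```

The journalctl output usually names the actual reason for the exit code, so the swap check is only one candidate cause, not a diagnosis.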
Hi there, here is a quick update to my previous observation and question. I guess the reason might be that my VM nodes are really slow to start up.
After spending a few minutes going through past threads for ideas, I tried again, and my CP node came up Ready, followed by the worker node a couple of minutes later.
vagrant@cp:~$ kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
cp       Ready    control-plane   29h   v1.24.1
worker   Ready    <none>          28h   v1.24.1
My question is: I did run "sudo swapoff -a" on both VMs, but I am not sure whether this helped. With my limited knowledge of Linux administration, do I need to run "swapoff -a" every time after rebooting?
Thanks
Shao
Hi There,
Sorry for asking the question without thinking it through, so let me follow up to conclude this thread:
When I started my VMs today, swap was indeed active, so I needed to run "swapoff -a". Here is the check:
This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento
Last login: Sat Sep 10 01:42:55 2022 from 172.16.0.1
To check if swap is active (yes, it is actually active):
vagrant@worker:~$ sudo swapon -s
Filename    Type   Size      Used   Priority
/swap.img   file   1999868   0      -2
To disable swap:
vagrant@worker:~$ sudo swapoff -a
To recheck if swap is off (yes, it is off now):
vagrant@worker:~$ sudo swapon -s
vagrant@worker:~$
Further, I ran sudo vim /etc/fstab and commented out the swap-related lines, like below:
#/swap.img none swap sw 0 0
- Rebooted the VMs and verified that swap now stays disabled after a reboot:
This system is built by the Bento project by Chef Software
More information can be found at https://github.com/chef/bento
Last login: Sat Sep 10 16:12:07 2022 from 172.16.0.1
vagrant@cp:~$ sudo swapon -s
vagrant@cp:~$
vagrant@cp:~$ kubectl get nodes
NAME     STATUS   ROLES           AGE   VERSION
cp       Ready    control-plane   43h   v1.24.1
worker   Ready    <none>          43h   v1.24.1
Happy ending! Thanks to all!
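For readers who prefer not to edit /etc/fstab by hand, the same change can be sketched with sed. This is only an illustration of the manual edit described above: comment_out_swap is a hypothetical helper name, it assumes fstab entries start in the first column, and it keeps a .bak copy next to the file.

```shell
# Sketch only: comment_out_swap is a hypothetical helper.

# Comment out every active swap entry in an fstab-style file given as $1.
# Taking the path as an argument allows a dry run on a copy first.
comment_out_swap() {
  # Match non-comment lines whose third field (filesystem type) is "swap"
  # and prefix them with "#". The original file is saved as <file>.bak.
  sed -i.bak -E \
    's/^([^#[:space:]][^[:space:]]*[[:space:]]+[^[:space:]]+[[:space:]]+swap[[:space:]].*)$/#\1/' \
    "$1"
}

# Usage, from a root shell on each VM:
#   swapoff -a                     # turn swap off for the running system
#   comment_out_swap /etc/fstab    # keep it off after the next reboot
```

Running it against a copy of /etc/fstab first and comparing the result with diff is a cheap way to confirm that only the swap line changes.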