AWS: unable to add worker node

mohachan · October 2025

Hi,

Initially, sudo kubeadm join was hanging. I followed the resolution mentioned in a similar earlier issue. However, the command is still failing.
The resolution suggested by @chrispokorni mentions "You probably missed a step in the lab exercise". I searched through lab exercise 2 for any references to "/etc/hosts" and there were none. Hence, either a) there are auxiliary instructions that are not listed; b) the instructions are incomplete; or c) I was careless and missed the particular step to update the hosts file. Can you direct me to the exact location where updating the hosts file is mentioned?
Following the aforementioned issue, I updated the hosts file on both the cp and worker nodes (only the cp hosts file is shown; the worker hosts file is identical):

cp:/home/ubuntu/LFD259/SOLUTIONS/s_02
$ cat /etc/hosts
127.0.0.1 localhost

# The following lines are desirable for IPv6 capable hosts
::1 ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
ff02::3 ip6-allhosts
172.31.53.130 k8scp

The IP address is derived from:

$ kc get pod -A  -o wide
NAMESPACE     NAME                              READY   STATUS    RESTARTS   AGE   IP              NODE   NOMINATED NODE   READINESS GATES
kube-system   cilium-577p7                      1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>
kube-system   cilium-envoy-brmjn                1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>
kube-system   cilium-operator-65ddcfbdc-bp7fp   1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>
kube-system   coredns-66bc5c9577-p6vhw          1/1     Running   0          49m   10.0.0.228      cp     <none>           <none>
kube-system   coredns-66bc5c9577-rsbq6          1/1     Running   0          49m   10.0.0.198      cp     <none>           <none>
kube-system   etcd-cp                           1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>
kube-system   kube-apiserver-cp                 1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>
kube-system   kube-controller-manager-cp        1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>
kube-system   kube-proxy-wh7gh                  1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>
kube-system   kube-scheduler-cp                 1/1     Running   0          49m   172.31.53.130   cp     <none>           <none>

and

$ kc get node -o wide
NAME   STATUS   ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
cp     Ready    control-plane   50m   v1.34.1   172.31.53.130   <none>        Ubuntu 22.04.5 LTS   6.8.0-1035-aws   containerd://1.7.28
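
The comparison described above (the k8scp entry in /etc/hosts against the INTERNAL-IP that kubectl reports) could be sketched as a small shell helper. This is only a sketch: the function name hosts_ip_for is hypothetical and not part of the course scripts, and it parses the file directly rather than exercising the system resolver.

```shell
#!/usr/bin/env bash
# hosts_ip_for NAME [FILE]
# Print the address of the first entry in a hosts-format file (default
# /etc/hosts) that lists NAME as a hostname or alias. Prints nothing if
# NAME is absent. Hypothetical helper, not part of the course scripts.
hosts_ip_for() {
  local name="$1" file="${2:-/etc/hosts}"
  awk -v n="$name" '
    { sub(/#.*/, "") }                 # strip comments before matching
    NF >= 2 {
      for (i = 2; i <= NF; i++)        # field 1 is the address, the rest are names
        if ($i == n) { print $1; exit }
    }
  ' "$file"
}

# Example (expected_ip is an assumption copied from the kubectl output above):
#   expected_ip=172.31.53.130
#   [ "$(hosts_ip_for k8scp)" = "$expected_ip" ] && echo "k8scp OK" || echo "k8scp mismatch"
```

Running the same check on the worker would confirm that both nodes map k8scp to the same address. A matching entry only shows that the name resolves; it says nothing about whether the network lets traffic through.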

I reran the kubeadm join command and it failed. If I include the output, I am blocked by VANILLA NETWORK. THIS IS PROVING TO BE QUITE FRUSTRATING ON MANY LEVELS.
I suspect the entry 172.31.53.130 k8scp is invalid. Where is k8scp derived from? I do not find it defined in the script, nor has @chrispokorni explained how to derive it in the earlier aforementioned issue.
If this is a repeated issue, why isn't it being addressed by the k8scp and k8sworker scripts?

Thanks

mohachan · October 2025

Sorryyyyy. My fault. I used the incorrect security group, which didn't allow all inbound traffic.
The node is connected after fixing the sg.

cp:/home/ubuntu/LFD259/SOLUTIONS/s_02
$ kc get node -o wide
NAME     STATUS   ROLES           AGE    VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION   CONTAINER-RUNTIME
cp       Ready    control-plane   5m1s   v1.34.1   172.31.58.177   <none>        Ubuntu 22.04.5 LTS   6.8.0-1035-aws   containerd://1.7.28
worker   Ready    <none>          32s    v1.34.1   172.31.53.94    <none>        Ubuntu 22.04.5 LTS   6.8.0-1035-aws   containerd://1.7.28

If I may suggest an improvement to the instructions, add a step to ensure one can ping from one node to the other before running the k8s scripts.
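
A pre-flight check along those lines might look like the sketch below. The helper names tcp_open and report_port are hypothetical, and port 6443 is assumed because it is the kubeadm default for the API server. Since a security group can allow or block ICMP separately from TCP, the sketch tests the TCP port directly in addition to ping.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not part of the course scripts.

# tcp_open HOST PORT [TIMEOUT_SECONDS]
# Succeeds only if a TCP connection to HOST:PORT opens within the timeout.
# Uses bash's /dev/tcp redirection, so no extra packages are needed.
tcp_open() {
  timeout "${3:-3}" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# report_port HOST PORT
# Print a one-line verdict for HOST:PORT.
report_port() {
  if tcp_open "$1" "$2"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 unreachable"
  fi
}

# Example, run on the worker before kubeadm join. The IP is the cp
# INTERNAL-IP from the output above; 6443 is an assumed default port.
#   ping -c 3 172.31.58.177
#   report_port 172.31.58.177 6443
```

If ping works but the port check reports unreachable (or the reverse), the security group rules are a likely place to look first.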

Thanks!

chrispokorni · October 2025

Hi @mohachan,

Please watch the demo videos from the introductory chapter for critical details about the cloud infrastructure. Also, ensure the recommended guest OS release is running on your cloud VMs.

In an ideal scenario, the learner follows the given instructions and the infrastructure works as expected. However, when learners deviate from the instructions in ways that staff cannot predict, and then omit to fully describe such deviations while providing limited information about their scenario in their request for assistance, one can only assume what the problem could be, and a resolution is suggested accordingly, aiming to guide the learner towards a working environment.

It is clear, however, that not all deviations are the same, and a given solution may not fix all issues.

Regards,
-Chris
