
master node not available after rebooting EC2


I stopped my EC2 VMs, and after restarting them the master node is not available:

ubuntu@ip-xxx-xxx-xx-xxx:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2019-01-08 19:12:13 UTC; 9min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 1288 (kubelet)
    Tasks: 21
   Memory: 95.0M
      CPU: 1min 7.658s
   CGroup: /system.slice/kubelet.service
           ├─1288 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni
           └─4535 /opt/cni/bin/calico

Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.267101    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.432768    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.597614    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] calico.go 341: Extracted identifiers ContainerID="2af3731145389d511fb6c156e6fbcf5adb586d7290aa32f7d523ea80dceeb45b" Node="ip-xxx-xxx-xx-xxx" Orchestr
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] client.go 202: Loading config from environment
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.169439    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.334586    1288 azure_dd.go:147] failed to get azure cloud in GetVolumeLimits, plugin.host: ip-xxx-xxx-xx-xxx
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.824445    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.945939    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:05 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:05.071637    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found

I assume it should be possible to reboot, or stop and restart, the VMs.

Comments

  • chrispokorni Posts: 2,165

    Hi @crixo,
    It has been a while since I worked on AWS, but I remember being able to stop instances and then start them back up when I was ready to continue with my labs.
    -Chris

  • chrispokorni Posts: 2,165

    @crixo
    I went through lab 2.1 on two EC2 instances on AWS and had no trouble completing it, stopping and then starting my instances. Aside from being assigned new public IPs, the node IPs remained the same, while the pod IPs changed. I was also able to retest the service by accessing the nginx webserver via curl and a browser.
    Can you provide any other details?
    Can you look into the bootstrap-config, kubelet-config or config files to see whether the master IP has changed?
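    For example, something along these lines (just a sketch, assuming the default kubeadm paths) should show whether the addresses baked into the configs still match the node:

    hostname -I
    sudo grep -n "server:" /etc/kubernetes/kubelet.conf
    sudo grep -n "advertise-address" /etc/kubernetes/manifests/kube-apiserver.yaml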
    Thanks,
    -Chris

  • crixo Posts: 31

    Hi @chrispokorni,
    I destroyed the previous VM and, after creating a new one, I was able to reboot and continue working with the cluster.
    I noticed the VM went wild because kswapd0 was using most of the CPU.
    Since the lab setup script suggests executing "sudo swapoff -a", I added the same command to /etc/rc.local on the AWS VM so it is executed after each reboot (rough sketch below).
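
    Roughly, the change looks like this (a minimal sketch; on Ubuntu, rc-local.service only runs /etc/rc.local if the file exists and is executable, so the chmod matters):

    #!/bin/sh -e
    # /etc/rc.local - turn swap off on every boot so the kubelet does not complain
    swapoff -a
    exit 0

    sudo chmod +x /etc/rc.local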

  • serewicz Posts: 1,000

    AWS instances typically have swap disabled by default, at least the ones I've looked at and used. Perhaps it was something else? Did any of the containers restart? If there is a lot of activity and not enough resources - for example, when only one node in the cluster is ready - the termination of running containers due to OOM issues can cause a stampede of restarts. After a reboot, the worker node is ready to share the workload again and things work better.
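
    To check whether that happened, something along these lines (generic commands, not specific to the lab) would show container restarts and any kernel OOM kills:

    kubectl get pods --all-namespaces     # look at the RESTARTS column
    sudo dmesg | grep -i "out of memory"  # kernel OOM-killer messages, if any
    sudo journalctl -k | grep -i oom      # same messages via the systemd journal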

    Glad it's working now.

    Regards,

  • crixo Posts: 31

    Hi @serewicz, thanks a lot for clarifying. How do I check whether the AWS instance has swap disabled?
    If it is disabled, is it expected that kswapd0 is still active, actually very active, among the processes listed by top?

  • chrispokorni Posts: 2,165
    edited January 2019

    Hi @crixo,
    You can verify swap by running one of the following:

    cat /etc/fstab

    cat /proc/swaps

    swapon -s

    free -h
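
    If any of those show active swap, the usual fix (just a sketch, not something from the lab guide) is to turn it off and comment out the swap entry in /etc/fstab so it stays off across reboots:

    sudo swapoff -a
    sudo sed -i '/ swap / s/^/#/' /etc/fstab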

    There seems to be a known issue with kswapd0 using a lot of CPU, and there are a few solutions posted online, but I have not tried any of them, so I am not sure what works and what doesn't.

    Regards,
    -Chris
