master node not available after rebooting EC2

I stopped my EC2 VMs, and after restarting them the master node is not available:

ubuntu@ip-xxx-xxx-xx-xxx:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Tue 2019-01-08 19:12:13 UTC; 9min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 1288 (kubelet)
    Tasks: 21
   Memory: 95.0M
      CPU: 1min 7.658s
   CGroup: /system.slice/kubelet.service
           ├─1288 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni
           └─4535 /opt/cni/bin/calico

Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.267101    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.432768    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.597614    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] calico.go 341: Extracted identifiers ContainerID="2af3731145389d511fb6c156e6fbcf5adb586d7290aa32f7d523ea80dceeb45b" Node="ip-xxx-xxx-xx-xxx" Orchestr
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] client.go 202: Loading config from environment
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.169439    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.334586    1288 azure_dd.go:147] failed to get azure cloud in GetVolumeLimits, plugin.host: ip-xxx-xxx-xx-xxx
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.824445    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.945939    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:05 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:05.071637    1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found

I assume it should be possible to reboot or stop and restart the VMs.

Comments

  • chrispokorni
    chrispokorni Posts: 2,155

    Hi @crixo,
    It has been a while since I worked on AWS, but I remember being able to stop instances and then start them back up when I was ready to continue with my labs.
    -Chris

  • chrispokorni
    chrispokorni Posts: 2,155

    @crixo
    I went through Lab 2.1 on two EC2 instances on AWS and had no trouble completing the lab, stopping and then starting my instances. Aside from being assigned new public IPs, the node IPs remained the same, but the pod IPs changed. I was also able to retest the service by accessing the nginx webserver via curl and a browser.
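
    For reference, the kind of checks I mean look roughly like this (a sketch; the nginx service name and the placeholder address are illustrative, not exact lab values):

    kubectl get nodes -o wide            # INTERNAL-IP of each node should be unchanged
    kubectl get pods -o wide             # pod IPs may differ after a restart
    kubectl get svc nginx                # note the ClusterIP and port
    curl http://<cluster-ip>:<port>      # retest the webserver
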
    Can you provide any other details?
    Can you look into the bootstrap-kubelet.conf, kubelet.conf, or other config files to see whether the master IP has changed?
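
    For example, the API server address the kubelet talks to can be checked like this (paths taken from the kubelet command line in your output above; a sketch, not exact lab instructions):

    sudo grep server: /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/kubelet.conf
    hostname -I    # compare against the node's current private IP
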
    Thanks,
    -Chris

  • crixo
    crixo Posts: 31

    Hi @chrispokorni,
    I destroyed the previous VM, and after creating a new one I was able to reboot and continue working with the cluster.
    I noticed the VM went wild because kswapd0 was using most of the CPU.
    Following the suggestion in the lab setup script to run "sudo swapoff -a", I added the same command to /etc/rc.local on the AWS VM so that it runs after each reboot.
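
    For example, a minimal /etc/rc.local along these lines (a sketch; on newer Ubuntu releases the file may need to be created and marked executable first):

    #!/bin/sh -e
    # Disable swap at every boot so the kubelet keeps working after a restart
    swapoff -a
    exit 0

    Commenting out the swap entry in /etc/fstab achieves the same result permanently.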

  • serewicz
    serewicz Posts: 1,000

    AWS instances typically have swap disabled by default, at least the ones I've looked at and used. Perhaps it was something else? Did any of the containers restart? If there is a lot of activity and not enough resources - like when only one node in a cluster is ready - the termination of running containers due to OOM issues can cause a stampede. After rebooting, the worker node is ready to share the workload and things work better.
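
    One way to check whether OOM kills are happening is something like this (illustrative commands, not taken from the lab):

    dmesg -T | grep -i 'killed process'    # kernel OOM killer events
    kubectl get pods --all-namespaces      # look at the RESTARTS column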

    Glad it's working now.

    Regards,

  • crixo
    crixo Posts: 31

    Hi @serewicz, thanks a lot for clarifying. How do I check whether the AWS instance has swap disabled?
    And if swap is disabled, is it expected that kswapd0 is active, in fact very active, among the processes listed by top?

  • chrispokorni
    chrispokorni Posts: 2,155
    edited January 2019

    Hi @crixo,
    You can verify swap by running one of the following:

    cat /etc/fstab

    cat /proc/swaps

    swapon -s

    free -h
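
    If swap is disabled, no swap devices should be listed; free -h, for example, would report something like this (illustrative numbers):

                   total        used        free      shared  buff/cache   available
    Mem:            3.8G        1.2G        1.4G         11M        1.2G        2.4G
    Swap:             0B          0B          0B

    and cat /proc/swaps would print only its header line.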

    There seems to be a known issue with kswapd0 using a lot of CPU, and there are a few solutions posted online, but I have not tried any of them, so I am not sure what works and what doesn't.

    Regards,
    -Chris
