
master node not available after rebooting EC2

I stopped the EC2 VMs, and after restarting them the master node is not available:

    ubuntu@ip-xxx-xxx-xx-xxx:~$ systemctl status kubelet
    kubelet.service - kubelet: The Kubernetes Node Agent
    Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
    Active: active (running) since Tue 2019-01-08 19:12:13 UTC; 9min ago
    Docs: https://kubernetes.io/docs/home/
    Main PID: 1288 (kubelet)
    Tasks: 21
    Memory: 95.0M
    CPU: 1min 7.658s
    CGroup: /system.slice/kubelet.service
            ├─1288 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni
            └─4535 /opt/cni/bin/calico

    Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.267101 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
    Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.432768 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
    Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.597614 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
    Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] calico.go 341: Extracted identifiers ContainerID="2af3731145389d511fb6c156e6fbcf5adb586d7290aa32f7d523ea80dceeb45b" Node="ip-xxx-xxx-xx-xxx" Orchestr
    Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] client.go 202: Loading config from environment
    Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.169439 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
    Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.334586 1288 azure_dd.go:147] failed to get azure cloud in GetVolumeLimits, plugin.host: ip-xxx-xxx-xx-xxx
    Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.824445 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
    Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.945939 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
    Jan 08 19:22:05 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:05.071637 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found

I assume it should be possible to reboot, or stop and restart, the VMs.

Comments

  • chrispokorni (Posts: 2,453)

    Hi @crixo,
    It has been a while since I worked on AWS, but I remember being able to stop instances and then start them back up when I was ready to continue with my labs.
    -Chris

  • chrispokorni (Posts: 2,453)

    @crixo
    I went through lab 2.1 on 2 EC2 instances on AWS and had no trouble completing the lab, then stopping and starting my instances. Aside from the instances being assigned new public IPs, the node (private) IPs remained the same, although the pod IPs changed. I was also able to retest the service by accessing the nginx webserver via curl and a browser.
    Can you provide any other details?
    Can you look into the bootstrap-config, kubelet-config or config files to see whether the master IP has changed?
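
    For example, a quick check along these lines might show whether the kubelet is still pointing at the old API server address (this assumes the default kubeadm file locations):

    # API server address the kubelet was configured with
    grep 'server:' /etc/kubernetes/kubelet.conf

    # IPs the node currently has, for comparison
    hostname -I
    ip -4 addr show
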
    Thanks,
    -Chris

  • crixo (Posts: 31)

    Hi @chrispokorni,
    I destroyed the previous VM, and after creating a new one I was able to reboot and continue working with the cluster.
    I noticed the VM went wild because kswapd0 was using most of the CPU.
    Since the lab setup script suggests executing "sudo swapoff -a", I added the same command to /etc/rc.local on the AWS VM so it runs after each reboot.
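
    A minimal /etc/rc.local of that kind might look like the sketch below, assuming the distro still executes rc.local at boot and the file is marked executable:

    #!/bin/sh -e
    # Turn swap off on every boot so the kubelet does not run with swap enabled
    swapoff -a
    exit 0

    Commenting out the swap entry in /etc/fstab achieves the same result without relying on rc.local.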

  • serewicz (Posts: 1,000)

    AWS instances typically have swap disabled by default, at least the ones I've looked at and used. Perhaps it was something else? Did any of the containers restart? If there is a lot of activity and not enough resources, like when only one node in a cluster is ready, the termination of running containers due to OOM issues can cause a stampede of restarts. After rebooting, the worker node is ready to share the workload and things work better.
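
    One quick way to check for restarts, assuming kubectl is configured on the master, is to look at the RESTARTS column and at recent events:

    # Restart counts for every pod in the cluster
    kubectl get pods --all-namespaces -o wide

    # Any OOM-related events that were recorded
    kubectl get events --all-namespaces | grep -i oom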

    Glad it's working now.

    Regards,

  • crixo (Posts: 31)

    Hi @serewicz, thanks a lot for clarifying. How do I check whether the AWS instance has swap disabled?
    If it is disabled, is it normal that kswapd0 is still active, actually very active, among the processes listed by top?

  • chrispokorni (Posts: 2,453)
    edited January 2019

    Hi @crixo,
    You can verify swap by running one of the following:

    cat /etc/fstab

    cat /proc/swaps

    swapon -s

    free -h
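
    With swap disabled, free -h reports 0B on the Swap line; the output looks roughly like this (memory figures are only illustrative):

    $ free -h
                  total        used        free      shared  buff/cache   available
    Mem:           7.8G        1.2G        4.9G         16M        1.7G        6.3G
    Swap:            0B          0B          0B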

    There seems to be a known issue with kswapd0 using a lot of CPU. There are a few solutions posted online, but I have not tried any of them, so I am not sure what works and what doesn't.

    Regards,
    -Chris
