Master node not available after rebooting EC2
I stopped my EC2 VMs, and after restarting them the master node is not available.
ubuntu@ip-xxx-xxx-xx-xxx:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Tue 2019-01-08 19:12:13 UTC; 9min ago
Docs: https://kubernetes.io/docs/home/
Main PID: 1288 (kubelet)
Tasks: 21
Memory: 95.0M
CPU: 1min 7.658s
CGroup: /system.slice/kubelet.service
├─1288 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=cgroupfs --network-plugin=cni
└─4535 /opt/cni/bin/calico
Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.267101 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.432768 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:03 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:03.597614 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] calico.go 341: Extracted identifiers ContainerID="2af3731145389d511fb6c156e6fbcf5adb586d7290aa32f7d523ea80dceeb45b" Node="ip-xxx-xxx-xx-xxx" Orchestr
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: 2019-01-08 19:22:04.089 [INFO][4535] client.go 202: Loading config from environment
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.169439 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.334586 1288 azure_dd.go:147] failed to get azure cloud in GetVolumeLimits, plugin.host: ip-xxx-xxx-xx-xxx
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.824445 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:04 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:04.945939 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
Jan 08 19:22:05 ip-xxx-xxx-xx-xxx kubelet[1288]: E0108 19:22:05.071637 1288 kubelet.go:2236] node "ip-xxx-xxx-xx-xxx" not found
I assume it should be possible to reboot, or stop and restart, the VMs.
Comments
Hi @crixo ,
It has been a while since I worked on AWS, but I remember being able to stop instances and then start them back up when I was ready to continue with my labs.
-Chris
@crixo
I went through lab 2.1 on 2 EC2 instances on AWS and had no trouble completing the lab, then stopping and starting my instances. Aside from the instances being assigned new public IPs, the node IPs remained the same, although the pod IPs changed. I was also able to retest the service by accessing the nginx web server via curl and a browser.
Can you provide any other details?
Can you look into the bootstrap-config, kubelet-config or config files to see whether the master IP has changed?
Thanks,
-Chris
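One way to check the configured master address, as a minimal sketch: kubeconfig-style files record the API server as a `server:` line, so that value can be printed and compared with the instance's current private IP. The function name `api_server_of` is made up for illustration, and the default path is simply the `--kubeconfig` value from the kubelet command line in the question.

```shell
# Hypothetical helper: print the API server URL(s) recorded in a kubeconfig-style file.
# The default path is the --kubeconfig value shown in the kubelet command line above.
api_server_of() {
    # kubeconfig files carry a line of the form "server: https://<host>:<port>"
    awk '$1 == "server:" { print $2 }' "${1:-/etc/kubernetes/kubelet.conf}"
}
```

Running `sudo cat /etc/kubernetes/kubelet.conf` first may be needed if the file is not readable by your user. The printed host can then be compared with the output of `hostname -I` on the master.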
Hi @chrispokorni,
I destroyed the previous VM, and after creating a new one I was able to reboot and continue to work with the cluster.
I noticed the VM went wild because kswapd0 was using most of the CPU.
Since the lab setup script suggests running "sudo swapoff -a", I added the same command to /etc/rc.local on the AWS VM so that it runs after each reboot.
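An alternative to the /etc/rc.local approach, sketched under the assumption that swap on the VM is configured through /etc/fstab (this thread does not confirm that): comment out the swap entries so swap is not enabled at boot in the first place. The helper name `comment_out_swap` is hypothetical, and it only prints the edited text so the result can be reviewed before anything is changed.

```shell
# Hypothetical helper: print an fstab-format file with active swap entries commented out.
# Writes to stdout only; the input file is not modified.
comment_out_swap() {
    # Match uncommented lines whose third field (filesystem type) is "swap".
    awk '!/^[[:space:]]*#/ && $3 == "swap" { print "#" $0; next } { print }' \
        "${1:-/etc/fstab}"
}
```

If the printed output looks right, it could be saved to a temporary file and copied over /etc/fstab with sudo after taking a backup of the original.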
Hi @crixo,
You can verify swap by running one of the following:
cat /etc/fstab
cat /proc/swaps
swapon -s
free -h
There seems to be a known issue with kswapd0 using a lot of CPU, and there are a few solutions posted online, but I have not tried any of them, so I am not sure what works and what doesn't.
Regards,
-Chris
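The /proc/swaps check can also be scripted, for example to confirm after a reboot that swap really is off. This is a rough sketch: the function name `swap_active` is made up here, and it relies on /proc/swaps containing one header line followed by one line per active swap area.

```shell
# Hypothetical helper: succeed (exit status 0) if the given /proc/swaps-format file
# lists at least one active swap area, fail (exit status 1) otherwise.
swap_active() {
    # The first line is a column header; any non-empty line after it is a swap area.
    awk 'NR > 1 && NF > 0 { found = 1 } END { exit !found }' "${1:-/proc/swaps}"
}
```

Example use: `if swap_active; then echo "swap is still on"; else echo "swap is off"; fi`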