Recommendation: 20GB+ of disk space per node (VM) to avoid the "Evicted pod death-spiral"
Hopefully this helps others who have worked through the LFS258 material over a longer period and ended up with dozens of pods in the "Evicted" state around Lab 11.1, 11.2, or later.
Long-running pods from earlier labs gradually increase their local disk usage until the nodes start to evict any newly scheduled pods. This becomes a "death-spiral" because the only way to fully reclaim the space is to undeploy ~everything. Since the labs use the CP node as an additional worker node, this impacts ~everything else as well.
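To see how far the spiral has gone, something like the sketch below may help. It assumes kubectl is already configured for the lab cluster; the function names list_evicted and delete_failed_pods are just made up for illustration. Note that deleting the evicted pod records only tidies up the pod list, it does not give the disk space back.

```shell
# List evicted pods as namespace/name.
# With -A, the columns are NAMESPACE NAME READY STATUS ..., so STATUS is column 4.
list_evicted() {
  kubectl get pods -A --no-headers | awk '$4 == "Evicted" { print $1 "/" $2 }'
}

# Delete every pod in the Failed phase, in all namespaces.
# Caution: evicted pods are in the Failed phase, but so are other failed pods,
# and this removes those too.
delete_failed_pods() {
  kubectl delete pods -A --field-selector=status.phase=Failed
}
```

Run list_evicted first to check what is there, then delete_failed_pods once you are happy to lose those entries.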
One solution is simply not to leave your environment running overnight after linkerd and linkerd-viz are deployed, since that is the first lab that really consumes a lot of disk space over time.
Alternatively, increase the disk space on your VMs to at least 20GB, which should give you enough buffer to get through the end of the labs. (Search "growpart" and "lvextend" online, and "qemu-img resize" if you're on KVM.) In my case, adding the ingress controller pushed things into an endless spiral of ephemeral disk space being claimed and pods then being evicted, so it was far faster to resize the VMs and "throw space at the problem" than to reconfigure external volumes, edit logging, etc.
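For anyone on KVM with an LVM root volume, the resize can be sketched roughly as below. This is only an outline under assumptions: the image path, disk device, partition number, and logical volume path are all placeholders that will differ per setup (check with lsblk and lvs first), and the function names grow_image and grow_guest_lvm are hypothetical.

```shell
# On the KVM host, with the VM shut down: grow the disk image.
# usage: grow_image /path/to/node.qcow2 +10G   (path and size are placeholders)
grow_image() {
  qemu-img resize "$1" "$2"
}

# Inside the guest after it boots: grow the partition into the new space,
# then extend the logical volume and resize its filesystem (-r) in one step.
# usage: grow_guest_lvm /dev/vda 3 /dev/ubuntu-vg/ubuntu-lv   (placeholders)
grow_guest_lvm() {
  sudo growpart "$1" "$2"
  sudo lvextend -r -l +100%FREE "$3"
}
```

If the root filesystem is not on LVM, the lvextend step does not apply and the filesystem would need to be grown with its own tool instead.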
Hopefully that helps someone else too.