Compute node reboot cadence
I had a discussion with co-workers the other day. They were adamant that k8s nodes should be rebooted at least every quarter to alleviate application memory leaks, among other things. Their argument was that applications, as well as the cgroups on the Linux server, would continuously leak memory, and that the only way to clean up the compute nodes was to reboot them. The question I put forth, which no one could answer, was: shouldn't a memory leak in the application be taken care of by the developers? Also, if cgroups were such a problem that the Linux OS needed to be rebooted because of them, why haven't I read any bulletins pointing out such a case?
So I figured I would pose the question here: is this a needed process for a k8s cluster, or is it overkill? Background on the environment: a k8s cluster running v1.18.17 on CentOS 7.9.
Comments
Hi @jmik618,
For earlier Kubernetes releases in combination with some very specific application deployments there are a few blog posts and forum threads discussing cluster agents' and/or applications' memory leaks. However, they were describing very specific conditions, which cannot be generalized.
If cluster node reboots are desired, such reboots should be performed in a rolling fashion to minimize and/or eliminate any negative impact to the workload deployed to the cluster. And if control plane reboots are also planned, the control plane should be configured for HA.
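As a rough illustration of what "rolling" means here, the sketch below builds the ordered list of commands for rebooting nodes one at a time: drain, reboot, wait for the node to report Ready, then uncordon. It is only a sketch under assumptions: the function name `rolling_reboot_plan`, the node names, the use of `ssh` with `sudo systemctl reboot`, and the 600s timeout are all hypothetical and not from this thread. It only prints the plan and does not execute anything.

```python
from typing import List


def rolling_reboot_plan(nodes: List[str], ready_timeout: str = "600s") -> List[List[str]]:
    """Return the commands, in order, for rebooting nodes one at a time.

    Hypothetical helper: it only builds the command list, it runs nothing.
    """
    plan: List[List[str]] = []
    for node in nodes:
        # Cordon the node and evict its pods; DaemonSet pods are left in place.
        plan.append(["kubectl", "drain", node, "--ignore-daemonsets"])
        # Assumption: the node is reachable over SSH under its node name.
        plan.append(["ssh", node, "sudo", "systemctl", "reboot"])
        # Block until the node reports Ready again. A real run should first
        # confirm the node actually went NotReady, since its status can lag
        # behind the reboot.
        plan.append([
            "kubectl", "wait", "--for=condition=Ready",
            f"node/{node}", f"--timeout={ready_timeout}",
        ])
        # Allow scheduling on the node again before moving to the next one.
        plan.append(["kubectl", "uncordon", node])
    return plan


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    for command in rolling_reboot_plan(["worker-1", "worker-2"]):
        print(" ".join(command))
```

Because each node is fully drained, rebooted, and uncordoned before the next one starts, at most one node is out of service at any time, provided the remaining nodes have capacity for the evicted pods.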
Regards,
-Chris0