Chapter 16 - Cluster High Availability Proofing (LFS258)

hybby · February 2023

Some of the wording in Chapter 16, especially on the "Cluster High Availability" page, makes it very difficult to understand the concepts being introduced.

For example:

As long as the database services the cluster will continue to run and catch up with kubelet information should the cp node go down and be brought back online.

What does this mean? That the cluster can keep running as long as another database instance is available on a different control plane (CP) node? Or that a node running the only database will catch up from kubelet after its operations are interrupted? It's really not clear what's being said here.

Additionally:

Three instances are required for etcd to be able to determine quorum if the data is accurate, or if the data is corrupt, the database could become unavailable.

I get that three nodes are needed to determine quorum, and if quorum is achieved then the data can be considered accurate. But where does corruption come into things? If nodes disagree about the state of the database, what happens and how is it resolved? Does it just "become corrupt"?
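For reference, this is the quorum arithmetic as I understand it, written out as a minimal sketch. It assumes the usual majority rule (more than half of the voting members must agree). The function names are my own and are not taken from the course material or from etcd itself.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def fault_tolerance(members: int) -> int:
    """Number of members that can fail while a majority still remains."""
    return members - quorum(members)


# Compare a few cluster sizes: size, majority needed, failures tolerated.
for n in (1, 2, 3, 5):
    print(n, quorum(n), fault_tolerance(n))
# prints:
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

Under that assumption, two instances tolerate no more failures than one, which would explain why three is the smallest count that survives losing a member. It still doesn't tell me where corruption fits in.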

Also, on the "Collocated Databases" page:

Should a node fail, you would lose both a control plane and a database. As the database is the one object that cannot be rebuilt, this may not be an important issue.

I think this should be "this is an important issue, if you are only running one CP", unless there's another meaning?

Could the phrasing be cleared up?

Thanks,
Drew
