Office hours - Jun 8 (LFS242 / LFS258)

Hello,

We had a visitor today who shared his experience with a known etcd bug. The bug has apparently been reported on GitHub against etcd v3.4: the leader control plane nodes of a highly available Kubernetes cluster would randomly cordon themselves in production and become unavailable, forcing the remaining etcd instances into a leader election.

etcd v3.4 is part of Kubernetes release 1.22. One recommended fix is to upgrade only etcd, to v3.5 or later, while keeping the production cluster at v1.22. This, however, calls for careful compatibility testing between the more recent etcd release and the older kube-apiserver. Another recommendation is a full cluster upgrade to Kubernetes v1.23, which would also install etcd v3.5+, a release that no longer exhibits the known bug from etcd v3.4.
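
For anyone who wants to check their own clusters before planning either upgrade path, here is a minimal sketch of a version check. It assumes the affected range is exactly what was described above (the v3.4 series affected, v3.5 and later not), which I have not verified against the upstream issue. The helper names `parse_version` and `needs_etcd_upgrade` are hypothetical, and the version string is assumed to come from something like the output of `etcd --version` or the etcd image tag.

```python
import re

# Assumption taken from the discussion above: the v3.4 series is affected,
# v3.5 and later are not. Not verified against the upstream issue.
FIRST_FIXED = (3, 5)


def parse_version(text):
    """Extract (major, minor, patch) from strings such as 'v3.4.13' or '3.5.1-0'."""
    match = re.search(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def needs_etcd_upgrade(etcd_version):
    """Return True when the etcd version predates the first fixed minor release."""
    return parse_version(etcd_version)[:2] < FIRST_FIXED


if __name__ == "__main__":
    # Hypothetical version strings, for illustration only.
    for version in ("v3.4.13", "3.5.1-0"):
        print(version, "needs upgrade:", needs_etcd_upgrade(version))
```

This only flags which etcd members fall in the affected series; it does not replace the compatibility testing between etcd and kube-apiserver mentioned above.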

While not entirely related to the LFS258 Kubernetes Fundamentals course, this seemed to be an interesting topic worth sharing.

Regards,
-Chris
