etcd restore overview on a Kubernetes cluster

Hi!

I would like some clarification on the general process of recovering a permanently failed etcd cluster for which we have a snapshot.
For the sake of simplicity, let's assume we have a single master node running a single etcd instance, plus a single worker node.
From what I read in the class content and in the documentation about etcd restore, the overall procedure to restore a failed master node would be:

1 - Deploy a brand new VM (I am assuming the master node is a VM)
2 - Run kubeadm init to bootstrap a new cluster
3 - Stop the kubelet and all the control plane processes (API server, etc.)
4 - Restore the etcd database from the snapshot (see the sketch after this list)
5 - Restart the kubelet and the control plane processes (API server, etc.)
6 - Check and verify the cluster status
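
For step 4, here is a minimal sketch of what the restore itself might look like with etcdctl, assuming a kubeadm-built control plane that runs etcd as a static pod with manifests under /etc/kubernetes/manifests, and a snapshot saved at /var/backup/etcd-snapshot.db (both paths are placeholders for illustration, adjust to your setup):

    # Stop the control plane: the kubelet stops the static pods
    # when their manifests disappear from the manifests directory
    sudo mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak

    # Restore the snapshot into a fresh, empty data directory
    sudo ETCDCTL_API=3 etcdctl snapshot restore /var/backup/etcd-snapshot.db \
        --data-dir /var/lib/etcd-restored

    # Edit etcd.yaml in the saved manifests so its hostPath volume
    # points at /var/lib/etcd-restored, then move the manifests back
    # so the kubelet restarts etcd and the rest of the control plane
    sudo mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests

    # Once the API server is reachable again, verify the cluster state
    kubectl get nodes
    kubectl get pods --all-namespaces

Note that on newer etcd releases the snapshot restore subcommand has moved to etcdutl; etcdctl still accepts it but marks it deprecated.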

Would that be close to a valid "restore procedure"?

Many thanks!
