Etcd restore
Hi,
This question is related to etcd backup and restore.
I've set up a Kubernetes cluster with kubeadm using the stacked etcd topology: two control-plane nodes and two worker nodes. I end up with the static pod manifests in /etc/kubernetes/manifests, and the control-plane services are running as pods. Only kubelet is running as a systemd service.
I've created a snapshot of my etcd databases on both control-plane nodes.
I simulate a data failure:
1. Stop API servers by removing the manifest files.
2. Delete /var/lib/etcd/ on both control-plane nodes.
Now I want to do a complete restore of the etcd database.
I'm doing it with etcdctl on both control-plane nodes.
It seems to work and I get the output:
2022-07-01 09:05:14.228623 I | mvcc: restore compact to 42333
2022-07-01 09:05:14.233684 I | etcdserver/membership: added member a874c87fd42044f [https://127.0.0.1:2380] to cluster c9be114fc2da2776
kubectl get nodes shows all nodes as Ready. However, when I make a single change, such as scaling a deployment, something goes badly wrong: the API servers suddenly disagree on node health, and the replica change is not applied.
Am I doing this wrong?
Best Answer
@oleksazhel Thank you for your willingness to help.
I think my error was taking separate etcd snapshots, one on each node, and then restoring each node from its own snapshot.
I read in the etcd documentation that "all that is needed is a single snapshot “db” file" and "all members should restore using the same snapshot."
https://etcd.io/docs/v3.5/op-guide/recovery/
Doing it like this made it work:
- Make one snapshot from a control-plane node.
- Stop etcd, api-server, kube-scheduler, kube-controller on both nodes.
- Delete etcd data-dir on both nodes to simulate data loss.
- Restore etcd data-dir on the first node using etcdctl snapshot restore
- Copy the snapshot to the second node and restore using etcdctl snapshot restore
- Start etcd and verify with etcdctl endpoint health, etcdctl endpoint status, etcdctl member list
- Start api-server, kube-controller and kube-scheduler
- Restart kubelet
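The two restore steps above can be sketched roughly as follows. The thread does not show which flags were used, so the flags here follow the etcd disaster recovery guide rather than the poster's exact commands. The member names (cp1, cp2), IP addresses, snapshot path, and cluster token are hypothetical placeholders. The script only prints one command per member; it does not run etcdctl itself. On newer etcd releases the restore subcommand may live in etcdutl instead.

```shell
#!/bin/sh
# Sketch only: names, IPs, token and snapshot path are placeholders,
# not values taken from the thread.
set -eu

SNAPSHOT=/tmp/etcd-snapshot.db   # assumed location of the single snapshot file
DATA_DIR=/var/lib/etcd           # etcd data-dir mentioned in the question
TOKEN=etcd-restore-1             # hypothetical token; must be identical on both nodes
CLUSTER="cp1=https://10.0.0.11:2380,cp2=https://10.0.0.12:2380"  # hypothetical members

# Print the restore command for one member.
# $1 = member name, $2 = that member's peer URL.
restore_cmd() {
  printf 'ETCDCTL_API=3 etcdctl snapshot restore %s --name %s --initial-cluster %s --initial-cluster-token %s --initial-advertise-peer-urls %s --data-dir %s\n' \
    "$SNAPSHOT" "$1" "$CLUSTER" "$TOKEN" "$2" "$DATA_DIR"
}

# Every member restores from the same snapshot file with the same cluster
# definition; only the name and the advertised peer URL differ per node.
restore_cmd cp1 https://10.0.0.11:2380
restore_cmd cp2 https://10.0.0.12:2380
```

Run each printed line on the matching node after copying the same snapshot file there. Using one --initial-cluster list and one token on both nodes is what should make the restored members form a single cluster rather than two independent ones.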
Answers
@pnts Could you provide output of
ETCDCTL_API=3 etcdctl -w table \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  member list
and
ETCDCTL_API=3 etcdctl -w table \
  --endpoints <CP1_IP_ADDRESS>:2379,<CP2_IP_ADDRESS>:2379 \
  --cacert /etc/kubernetes/pki/etcd/ca.crt \
  --cert /etc/kubernetes/pki/etcd/server.crt \
  --key /etc/kubernetes/pki/etcd/server.key \
  endpoint status