
Lab 4.1 - etcd DB backup issues

chrispokorni
chrispokorni Posts: 2,346
edited December 2020 in LFS258 Class Forum

@dctheroux, your latest comments from a previous discussion thread, reporting issues on LFS258 Lab 4.1, have been moved here, to keep discussions organized and relevant to a specific topic.

The two prior comments from the other discussion thread shall be removed.

Please continue posting on this discussion thread with any additional issues encountered with the etcd DB backup exercise of Lab 4.1.

STEP 4:

Chris, I am getting this error when I try to see how many databases are in the cluster. I am using the command:

kubectl -n kube-system exec -it etcd-master -- sh -c \
"ETCDCTL_API=3 etcdctl --cert=./peer.crt --key=./peer.key --cacert=./ca.crt \
--endpoints=https://127.0.0.1:2379 member list"

After I initiate the command, it gives me the error:

Error: open ./peer.crt: no such file or directory
command terminated with exit code 128

I read that OpenShift has a bug with this. This is exercise 4.1, and the printout it says I should be getting is:

fb50b7ddbf4930ba, started, master, https://10.128.0.35:2380, https://10.128.0.35:2379, false

Instead I am getting the error. I think perhaps it is a bug, or maybe the command spacing again. I tried typing the command and copying and pasting it, to no avail. The chart works fine; I copied and pasted it and typed it for practice. I checked, and all the files are where I saved them from in the etcd shell. Please advise? Thank you!

STEP 6:

I can't get past item 6 in the first lab of chapter 4. It might be working, but none of the output looks like what you have printed out on the screen. Number 6 seems to bring me into a sub-shell of some sort, and that is this command:

kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 \
ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt \
ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 \
snapshot save /var/lib/etcd/snapshot.db

This command just puts me in another shell, and I am not sure what to do with it.

Comments

  • @dctheroux,

    In step 4 you need to ensure you are using the correct etcd pod name. The same etcd pod name you used in step 2 of this exercise has to be reused in subsequent steps 3, 4, 5, and 6. In the lab manual, etcd-master is the name of the author's pod - yours will have a slightly different name.
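
    On a kubeadm cluster the etcd static Pod typically carries the component=etcd label, so (this is not a step from the lab, just a quick sanity check) you could confirm the exact pod name with:

    kubectl -n kube-system get pods -l component=etcd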

    In step 6 it seems that you are missing a closing double-quote (") at the very end of your command, right after ... snapshot.db.
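
    With that closing quote added, the whole command (same as yours otherwise, with your own pod name substituted for etcd-master) would read:

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 snapshot save /var/lib/etcd/snapshot.db"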

    Regards,
    -Chris

  • Thank you Chris. Sorry for that again; I will make sure to start a new thread for each one. Let me try the things you have suggested. I used tab completion and it gave me master, which is the name of my master instance.

  • Hi @dctheroux,

    You could try providing the absolute path from Step 2 (b) when running Step 3, updating the ./peer... and ./ca.crt paths to /etc/kubernetes/pki/etcd/peer... and /etc/kubernetes/pki/etcd/ca.crt.
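
    Applied to the member list command, that would look something like this (substituting your own etcd pod name for etcd-master):

    kubectl -n kube-system exec -it etcd-master -- sh -c \
    "ETCDCTL_API=3 etcdctl --cert=/etc/kubernetes/pki/etcd/peer.crt \
    --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --endpoints=https://127.0.0.1:2379 member list"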

    Regards,
    -Chris

  • I did the absolute path and it worked.

  • deepakgcp
    deepakgcp Posts: 10
    edited December 2020

    This step needs to be reviewed and re-uploaded to the course material. I don't understand what they mean by "50" in the second sentence, and the command with ./ won't work; we need to provide the absolute path.

    Poor documentation from the Linux Foundation. It seems like nobody reviewed the lab exercises.

  • coop
    coop Posts: 916

    Poor documentation from the Linux Foundation. It seems like nobody reviewed the lab exercises.

    K8s is updated every 3 months, and so are the exams and the courses (even more often), while the upstream is constantly shifting in ways outside the course maintainers' control. Your snarky attitude is not helpful. If you have very specific suggestions, they are always welcome, but the course material is put through extensive review and testing at every step, so your criticism is both inaccurate and unfair. I do not maintain this specific course, but I am familiar with the process. Your tone is not one that welcomes productive collaboration between teacher and student, so please be more respectful; everyone else is.

  • I'm having issues on
    Exercise 4.1: Basic Node Maintenance

    1. Check the health of the database using the loopback IP and port 2379. You will need to pass the peer cert and key as
      well as the Certificate Authority as environmental variables. The command is commented; you do not need to type out
      the comments or the backslashes.
      student@master:~$ kubectl -n kube-system exec -it etcd-master -- sh \ #Same as before
      -c "ETCDCTL_API=3 \ #Version to use
      ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt \ # Pass the certificate authority
      ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt \ #Pass the peer cert and key
      ETCDCTL_KEY=/etc/kubernete

    Could you kindly help, please?

  • serewicz
    serewicz Posts: 1,000

    @dino.farinha what is the error or issue you are encountering?

  • The step 4 command I was able to run is shown below. I think it matters where you put the word etcdctl between the double quotes (").

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key etcdctl --endpoints=https://127.0.0.1:2379 member list"

  • serewicz
    serewicz Posts: 1,000

    Hello,

    Correct, the placement of the command and how the shell parses the variables does have an effect. Good catch!
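
    As a minimal illustration of that parsing behavior: in sh, a VAR=value assignment only becomes an environment variable for a command when it appears before the command name.

    sh -c "ETCDCTL_API=3 etcdctl version"   # etcdctl sees ETCDCTL_API in its environment
    sh -c "etcdctl version ETCDCTL_API=3"   # ETCDCTL_API=3 is passed as a plain argument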

    Regards,

  • tjghost
    tjghost Posts: 2
    edited February 2021

    I'm having a different problem at Step 3.

    kubectl -n kube-system exec -it etcd-lfs-main -- sh
    # cd /etc/kubernetes/pki/etcd
    # ls -la
    total 40
    drwxr-xr-x 2 root root 4096 Nov  8 22:20 .
    drwxr-xr-x 3 root root 4096 Feb  9 15:17 ..
    -rw-r--r-- 1 root root 1017 Nov  8 22:20 ca.crt
    -rw------- 1 root root 1675 Nov  8 22:20 ca.key
    -rw-r--r-- 1 root root 1094 Nov  8 22:20 healthcheck-client.crt
    -rw------- 1 root root 1679 Nov  8 22:20 healthcheck-client.key
    -rw-r--r-- 1 root root 1131 Nov  8 22:20 peer.crt
    -rw------- 1 root root 1679 Nov  8 22:20 peer.key
    -rw-r--r-- 1 root root 1131 Nov  8 22:20 server.crt
    -rw------- 1 root root 1679 Nov  8 22:20 server.key
    
    augspies@lfs-main:~$ kubectl -n kube-system exec -it etcd-lfs-main -- sh -c "ETCDCTL_API=3 \ #Version to use
    ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt \ # Pass the certificate authority
    ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt \ #Pass the peer cert and key
    ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key \
    etcdctl endpoint health"
    

    Gives this output:

    sh: 1:  #Version: not found
    sh: 2:  #: not found
    sh: 3:  #Pass: not found
    Error: KeyFile and CertFile must both be present[key: /etc/kubernetes/pki/etcd/server.key, cert: ]
    command terminated with exit code 128
    
  • Hi @tjghost,

    From the error, it seems to be complaining about the comments following each line of the command. Removing the comments may help, and/or converting the entire command into a single-line command may save you some headaches too :wink:
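
    For example, your endpoint health command as a single line, with the comments stripped (same variables and paths you already used):

    kubectl -n kube-system exec -it etcd-lfs-main -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl endpoint health"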

    Regards,
    -Chris

  • rosaiah
    rosaiah Posts: 3
    edited February 2021

    Hi Team,
    I'm not getting the output for Step 5 as shown in the document, which displays the cluster info in table format.

    I ran the command below to view it in table format.

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379"

    I also modified the command, but I'm still unable to get the table output.

    k8scka@master:~$ kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 --write-out="table" "

  • Hi @rosaiah,

    The Discussion on the same topic was removed in order to eliminate duplicates. This keeps all relevant comments on the same thread.

    Due to continuous changes in the etcd container image, the etcdctl command shifts in behavior as well. The following command worked for me:

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 member list --write-out=table"

    If this does not work, you may try to replace /server. with /peer.

    Regards,
    -Chris

  • Thank you @chrispokorni, it works. I appreciate your assistance.

  • rshimizu
    rshimizu Posts: 1
    edited February 2021

    Thank you for your post, @rosaiah !

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 endpoint status --write-out=table"

    also worked for me :)
    (This command returns a result similar to the one in the lab text.)

  • Where do I find kubeadm-config.yaml? I don't see it in the location mentioned in the lab. Thanks.

    ubuntu@ip-172-31-46-10:~$ find kubeadm-config.yaml
    find: ‘kubeadm-config.yaml’: No such file or directory

  • chrispokorni
    chrispokorni Posts: 2,346

    Hi @vishwas2f4u,

    Please post your questions in Discussions that are on the same topic as your issue, or create a new Discussion if necessary - assuming it is a completely new issue. The current Discussion thread is for Lab 4, whereas your issue is related to Lab 3.

    However, it seems that your find command is incomplete. You should try the find $HOME -name <file-name> syntax instead. For additional help on the usage of find you may try find --help, or man find, or info find.
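
    For example (assuming the file was saved somewhere under your home directory):

    find $HOME -name kubeadm-config.yaml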

    Regards,
    -Chris

  • chrsyng
    chrsyng Posts: 1

    I can't seem to get lab 4.1 step 6 to work

    Step 6

    chris@k8s-ctrl-node-1:~$ kubectl -v=1 -n kube-system exec -it etcd-k8s-ctrl-node-1 -- sh -c "ETCDCTL_API=3 ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt etcdctl --endpoints=https://127.0.0.1:2379 member list snapshot save /var/lib/etcd/snapshot.db"
    7d4266b35b46001e, started, k8s-ctrl-node-1, https://192.168.122.150:2380, https://192.168.122.150:2379, false
    

    Increasing the verbosity to -v=10 didn't seem to give any relevant info.

    Step 7

    chris@k8s-ctrl-node-1:~$ sudo ls -l /var/lib/etcd/
    [sudo] password for chris: 
    total 4
    drwx------ 4 root root 4096 Jul 27 10:09 member
    
  • chrispokorni
    chrispokorni Posts: 2,346

    Hi @chrsyng,

    It seems that your command is a mix of the commands from steps 5 and 6. Ensure you are only using the command with the options from step 6 to save the snapshot.
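
    In other words, drop member list and keep only the snapshot save part, along these lines (with your own pod name):

    kubectl -n kube-system exec -it etcd-k8s-ctrl-node-1 -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 snapshot save /var/lib/etcd/snapshot.db"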

    Regards,
    -Chris

  • Hi,

    I am on Lab 4.1, Basic Node Maintenance, and I'm stuck at step 2.
    2) Log into the etcd container and look at the options etcdctl provides. Use tab to complete the container name.

    student@cp:~$ kubectl -n kube-system exec -it etcd- -- sh

    So when I press the Tab key, nothing happens, meaning there is no such directory. I'm on the master node (cp). Anything else I need to do here?

    Thank you!
    Gaurav

  • chrispokorni
    chrispokorni Posts: 2,346

    Hi @gaurav4978,

    The expectation here is that TAB will help autocomplete the name of the etcd Pod. Once you have typed etcd-, pressing TAB should complete the etcd Pod name, typically with the hostname of the node where etcd is running (your control-plane node).

    If autocomplete does not behave as expected, I would recommend revisiting steps 18 and 19 of Lab Exercise 3.1, where kubectl completion is enabled and then validated.
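
    For reference, enabling kubectl shell completion in bash generally looks like the following (the exact steps in the lab may differ slightly; the bash-completion package name assumes an Ubuntu/Debian node):

    sudo apt-get install -y bash-completion                 # provides the completion framework
    source <(kubectl completion bash)                       # enable for the current shell
    echo "source <(kubectl completion bash)" >> ~/.bashrc   # persist for future shells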

    Regards,
    -Chris

  • Hi,

    I'm stuck at step 4 of chapter 4. I've copied the output for your reference.

    step3 -
    ubuntu@ip-xx-xx-x-xxx:~$ kubectl -n kube-system exec -it etcd-ip-xx-xx-x-xxx -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl endpoint health"

    127.0.0.1:2379 is healthy: successfully committed proposal: took = 22.589832ms

    step4
    ubuntu@ip-xx-xx-x-xxx:~$ kubectl -n kube-system exec -it etcd-ip-xx-xx-x-xxx -- sh -c "ETCDCTL_API=3 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt etcdctl --endpoints=https://127.0.0.1:2379 member list"

    sh: --cert=/etc/kubernetes/pki/etcd/peer.crt: No such file or directory
    command terminated with exit code 127

    step5 - was successful


    When I ran the ls command, I can see the files exist.

    ubuntu@ip-xx-xx-x-xxx:~$ ls -l /etc/kubernetes/pki/etcd/
    total 32
    -rw-r--r-- 1 root root 1058 Sep 7 17:36 ca.crt
    -rw------- 1 root root 1679 Sep 7 17:36 ca.key
    -rw-r--r-- 1 root root 1139 Sep 7 17:36 healthcheck-client.crt
    -rw------- 1 root root 1679 Sep 7 17:36 healthcheck-client.key
    -rw-r--r-- 1 root root 1196 Sep 7 17:36 peer.crt
    -rw------- 1 root root 1679 Sep 7 17:36 peer.key
    -rw-r--r-- 1 root root 1196 Sep 7 17:36 server.crt
    -rw------- 1 root root 1679 Sep 7 17:36 server.key

  • Hi @sairameshpv,

    I would recommend following the same notation found in Steps 3 and 5 - use VARIABLES instead of --options, by replacing --cert with ETCDCTL_CERT, then --key and --cacert respectively.

    If the issue still persists, the peer key and crt can be replaced with server key and crt.

    Regards,
    -Chris

  • Hi @chrispokorni

    After changing to
    kubectl -n kube-system exec -it etcd-ip-xxx-xx-x-xxx -- sh -c "ETCDCTL_API=3 ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt etcdctl --endpoints=https://127.0.0.1:2379 member list"

    It worked.

    Thanks

  • Today (Nov 17, 2021) the course documentation has this command in the 4th step:

    student@cp:~$ kubectl -n kube-system exec -it etcd-k8scp -- sh -c \
    "ETCDCTL_API=3 --cert=./peer.crt --key=./peer.key --cacert=./ca.crt \
    etcdctl --endpoints=https://127.0.0.1:2379 member list"

    This command does not work and its output is the following:

    sh: --cert=./peer.crt: No such file or directory

    I think there are two problems:

    1) The current path "./" where sh executes the command is not "/etc/kubernetes/pki/etcd".
    2) I think --cert, --key, and --cacert are OPTIONS of the etcdctl program (see etcdctl -h), and all of these OPTIONS should go after the etcdctl command, not before it.

    My solution:

    kubectl -n kube-system exec -it etcd-error404 -- sh -c \
    "ETCDCTL_API=3 etcdctl \
    --cert=/etc/kubernetes/pki/etcd/peer.crt \
    --key=/etc/kubernetes/pki/etcd/peer.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --endpoints=https://127.0.0.1:2379 member list"

    Note: etcd-error404 is the name of my pod.

  • pawelzajac
    pawelzajac Posts: 5
    edited January 2022

    @hatimsue you are right, your solution works. It is just that the order matters.
    The course documentation is incorrect in step 4.
    Well spotted.

    BTW, it seems that using the server or peer key renders the same output; not sure if step 4 is some kind of inner course test point :smile:

    I also tried the ENV variable syntax and it works:

    kubectl -n kube-system exec -it etcd-cp -- sh -c \
    "ETCDCTL_API=3 ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt \
    ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key \
    ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt \
    etcdctl --endpoints=https://127.0.0.1:2379 member list"

  • headkaze
    headkaze Posts: 15

    I think the following changes to LAB 4.1 make it easier to read and understand:

    $ kubectl -n kube-system exec -it etcd-<Tab> -- sh
    # export ETCDCTL_API=3
    # export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
    # export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
    # export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
    # etcdctl endpoint health
    # etcdctl --endpoints=https://127.0.0.1:2379 member list
    # etcdctl --endpoints=https://127.0.0.1:2379 member list -w table
    # etcdctl --endpoints=https://127.0.0.1:2379 snapshot save /var/lib/etcd/snapshot.db
    # exit
    

    Also the following line assumes the cluster was installed using a yaml file:

    $ sudo cp /root/kubeadm-config.yaml $HOME/backup/
    

    It makes better sense to get the backup using kubectl:

    $ kubectl get cm kubeadm-config -n kube-system -o yaml >$HOME/backup/kubeadm-config.yaml
    
