Lab 4.1 - etcd DB backup issues

chrispokorni
chrispokorni Posts: 2,383
edited December 2020 in LFS258 Class Forum

@dctheroux, your latest comments from a previous discussion thread, reporting issues on LFS258 Lab 4.1, have been moved here, to keep discussions organized and relevant to a specific topic.

The two prior comments from the other discussion thread shall be removed.

Please continue posting on this discussion thread with any additional issues encountered with the etcd DB backup exercise of Lab 4.1.

STEP 4:

Chris, I am getting an error when I try to see how many databases there are. I am using this command:

kubectl -n kube-system exec -it etcd-master -- sh -c \
"ETCDCTL_API=3 etcdctl --cert=./peer.crt --key=./peer.key --cacert=./ca.crt \
--endpoints=https://127.0.0.1:2379 member list"

After I run the command, it gives me this error:

Error: open ./peer.crt: no such file or directory
command terminated with exit code 128

I read that OpenShift has a bug with this. This is exercise 4.1, and the printout says I should be getting this:

fb50b7ddbf4930ba, started, master, https://10.128.0.35:2380, https://10.128.0.35:2379, false

Instead I am getting the error. I think perhaps it is a bug, or maybe the command spacing again. I tried typing it and copying and pasting it, to no avail. The chart works fine, and I copied and pasted and typed it for practice. I checked and it has all the files where I saved them from in the etcd shell. Please advise. Thank you!

STEP 6:

I can't get past item 6 in the first lab of chapter 4. It might be working, but none of the output looks like what you have printed out on the screen. Number 6 seems to bring me into a sub shell of some sort. This is the command:

kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 \
ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt \
ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 \
snapshot save /var/lib/etcd/snapshot.db

This command just puts me in another shell and I am not sure what to do with it.

Comments

  • @dctheroux,

    In step 4 you need to ensure you are using the correct etcd pod name. The same etcd pod name you used in step 2 of this exercise has to be reused in subsequent steps 3, 4, 5, and 6. In the lab manual, etcd-master is the name of the author's pod; yours will have a slightly different name.

    In step 6 it seems that you are missing a closing double-quote (") at the very end of your command, right after ... snapshot.db.

    Regards,
    -Chris
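
The missing closing quote from step 6 can be reproduced without a cluster. The following is a minimal sketch, assuming a POSIX sh; the echo command is only a stand-in for the real kubectl/etcdctl call, and sh -n parses the script without running it. An interactive shell that sees an unterminated quote prints a continuation prompt and waits for more input, which is likely the "other shell" described in the question.

```shell
# Stand-ins for the step 6 command; echo replaces kubectl/etcdctl (assumption).
unterminated='echo "snapshot save /var/lib/etcd/snapshot.db'
terminated='echo "snapshot save /var/lib/etcd/snapshot.db"'

# sh -n parses without executing; a non-zero status means a syntax error.
bad_status=0
printf '%s\n' "$unterminated" | sh -n 2>/dev/null || bad_status=$?

good_status=0
printf '%s\n' "$terminated" | sh -n || good_status=$?

echo "unterminated: $bad_status, terminated: $good_status"
```

If the prompt changes to ">" after pressing Enter, Ctrl-C cancels the unfinished command so it can be retyped with the closing quote.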

  • Thank you Chris. Sorry for that again; I will make sure to start a new thread for each one. Let me try the things you have suggested. I used tab completion and it gave me master, which is the name of my master instance.

  • Hi @dctheroux,

    You could try providing the absolute path from Step 2 (b) when running Step 3, and update the ./peer... and ./ca.crt with /etc/kubernetes/pki/etcd/peer... and /etc/kubernetes/pki/etcd/ca.crt.

    Regards,
    -Chris
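
Why the relative paths fail can be sketched locally. The example below assumes only a POSIX shell; the temporary directory stands in for /etc/kubernetes/pki/etcd and the empty peer.crt is a placeholder, not a real certificate. A relative path such as ./peer.crt is resolved against the working directory of the process, and a command started through kubectl exec does not necessarily start in the certificate directory.

```shell
# Temporary directory standing in for /etc/kubernetes/pki/etcd (assumption).
certdir=$(mktemp -d)
touch "$certdir/peer.crt"

# From a different working directory, the relative path does not resolve.
relative=$(cd / && if [ -e ./peer.crt ]; then echo found; else echo missing; fi)

# The absolute path resolves regardless of the working directory.
absolute=$(cd / && if [ -e "$certdir/peer.crt" ]; then echo found; else echo missing; fi)

echo "relative: $relative, absolute: $absolute"
rm -r "$certdir"
```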

  • I used the absolute path and it worked.

  • deepakgcp
    deepakgcp Posts: 10
    edited December 2020

    This step needs to be reviewed and re-uploaded to the course material. I don't understand what they mean by "50" in the second sentence, and the command with ./ won't work. We need to provide the absolute path.

    Poor Documentation from Linux Foundation.. It seems like nobody reviewed the Lab Exercises..

  • coop
    coop Posts: 916

    Poor Documentation from Linux Foundation.. It seems like nobody reviewed the Lab Exercises..

    K8s is updated every 3 months and so are the exams and the courses (even more often) and the upstream is constantly shifting in ways out of the course maintainers' control. Your snarky attitude is not helpful, so if you have very specific suggestions they are always welcome, but the course material is put through extensive review and testing at every step so your criticism is both inaccurate and unfair. I do not maintain this specific course but I am familiar with the process. Your tone is not one that welcomes productive collaboration between teacher and student. So please be more respectful, everyone else is.

  • I'm having issues on
    Exercise 4.1: Basic Node Maintenance

    1. Check the health of the database using the loopback IP and port 2379. You will need to pass the peer cert and key as
      well as the Certificate Authority as environmental variables. The command is commented, you do not need to type out
      the comments or the backslashes.
      student@master:~$ kubectl -n kube-system exec -it etcd-master -- sh \ #Same as before
      -c "ETCDCTL_API=3 \ #Version to use
      ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt \ # Pass the certificate authority
      ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt \ #Pass the peer cert and key
      ETCDCTL_KEY=/etc/kubernete

    Could you kindly help please

  • serewicz
    serewicz Posts: 1,000

    @dino.farinha what is the error or issue you are encountering?

  • The step 4 command that I was able to run is below. I think it matters where you put the word etcdctl between the double quotes (").

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key etcdctl --endpoints=https://127.0.0.1:2379 member list"

  • serewicz
    serewicz Posts: 1,000

    Hello,

    Correct, the placement of the command and how the shell parses the variables does have an effect. Good catch!

    Regards,
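
How the shell treats the leading NAME=value words can be tried locally. This is a minimal sketch, assuming a POSIX sh with printenv available; DEMO_CERT is a hypothetical variable standing in for ETCDCTL_CERT, and printenv stands in for etcdctl. Assignments placed directly before a command name are passed to that one command as environment variables and do not persist for later commands.

```shell
# DEMO_CERT is hypothetical; it plays the role of ETCDCTL_CERT.
# The assignment prefixes printenv, so printenv sees it in its environment.
first=$(sh -c "DEMO_CERT=/etc/kubernetes/pki/etcd/peer.crt printenv DEMO_CERT")

# Here the assignment prefixes "true"; the later printenv no longer sees it.
second=$(sh -c "DEMO_CERT=/etc/kubernetes/pki/etcd/peer.crt true; printenv DEMO_CERT || echo unset")

echo "same command: $first"
echo "next command: $second"
```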

  • tjghost
    tjghost Posts: 2
    edited February 2021

    I'm having a different problem at Step 3.

    kubectl -n kube-system exec -it etcd-lfs-main -- sh
    # cd /etc/kubernetes/pki/etcd
    # ls -la
    total 40
    drwxr-xr-x 2 root root 4096 Nov  8 22:20 .
    drwxr-xr-x 3 root root 4096 Feb  9 15:17 ..
    -rw-r--r-- 1 root root 1017 Nov  8 22:20 ca.crt
    -rw------- 1 root root 1675 Nov  8 22:20 ca.key
    -rw-r--r-- 1 root root 1094 Nov  8 22:20 healthcheck-client.crt
    -rw------- 1 root root 1679 Nov  8 22:20 healthcheck-client.key
    -rw-r--r-- 1 root root 1131 Nov  8 22:20 peer.crt
    -rw------- 1 root root 1679 Nov  8 22:20 peer.key
    -rw-r--r-- 1 root root 1131 Nov  8 22:20 server.crt
    -rw------- 1 root root 1679 Nov  8 22:20 server.key
    
    augspies@lfs-main:~$ kubectl -n kube-system exec -it etcd-lfs-main -- sh -c "ETCDCTL_API=3 \ #Version to use
    ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt \ # Pass the certificate authority
    ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt \ #Pass the peer cert and key
    ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key \
    etcdctl endpoint health"
    

    This gives the following output:

    sh: 1:  #Version: not found
    sh: 2:  #: not found
    sh: 3:  #Pass: not found
    Error: KeyFile and CertFile must both be present[key: /etc/kubernetes/pki/etcd/server.key, cert: ]
    command terminated with exit code 128
    
  • Hi @tjghost,

    From the error, the shell seems to be complaining about the comments that follow each line of the command. Removing the comments may help, and converting the entire command into a single line may save you some headaches too :wink:

    Regards,
    -Chris
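
The effect of the trailing comments can be reproduced without a cluster. The sketch below assumes a POSIX sh with printenv available; GREETING is a hypothetical variable and printenv stands in for etcdctl. Inside double quotes, a backslash followed directly by a newline is a line continuation, but a backslash followed by a space and a comment is passed through to the inner shell, which then tries to run the comment text as a command.

```shell
# Broken: text follows the backslash, so there is no line continuation.
# The inner sh receives "\ #comment" and tries to run " #comment" as a command.
broken=$(sh -c "GREETING=hello \ #comment
printenv GREETING" 2>&1) || true

# Fixed: the backslash is the last character on the line, so both lines
# are joined into: GREETING=hello printenv GREETING
fixed=$(sh -c "GREETING=hello \
printenv GREETING")

echo "broken: $broken"
echo "fixed: $fixed"
```

The broken variant also explains the final etcdctl error in the post: each variable before a comment was applied to the failed "comment command" instead of etcdctl, so only the last variable reached etcdctl.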

  • rosaiah
    rosaiah Posts: 3
    edited February 2021

    Hi Team,
    I'm not getting the output for Step 5 as mentioned in the document to view cluster info in Table format.

    I ran below command to view in Table format.

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379"

    I also modified the command, but I still cannot get the table output:

    k8scka@master:~$ kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 --write-out="table" "

  • Hi @rosaiah,

    The Discussion on the same topic was removed in order to eliminate duplicates. This keeps all relevant comments on the same thread.

    Due to continuous changes in the etcd container image, the etcdctl command shifts in behavior as well. The following command worked for me:

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 member list --write-out=table"

    If this does not work, you may try to replace /server. with /peer.

    Regards,
    -Chris
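
For readers retyping this long command at each step, a small helper can make the position of the etcdctl subcommand explicit. The sketch below is hypothetical and not part of the lab: etcdctl_cmd is an invented function name, it only prints the command instead of running it, and it assumes the pod name and certificate paths used in this thread.

```shell
# Hypothetical helper: etcdctl_cmd POD SUBCOMMAND [ARGS...]
# Prints (does not run) the kubectl command with the subcommand placed last.
etcdctl_cmd() {
  pod=$1
  shift
  pki=/etc/kubernetes/pki/etcd
  env_vars="ETCDCTL_API=3 ETCDCTL_CACERT=$pki/ca.crt ETCDCTL_CERT=$pki/server.crt ETCDCTL_KEY=$pki/server.key"
  printf 'kubectl -n kube-system exec -it %s -- sh -c "%s etcdctl --endpoints=https://127.0.0.1:2379 %s"\n' \
    "$pod" "$env_vars" "$*"
}

printed=$(etcdctl_cmd etcd-master member list --write-out=table)
echo "$printed"
```

Calling it with no subcommand prints a command that ends right after the --endpoints option, which matches the commands in the question that produced no table.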

  • Thank you @chrispokorni.
    It works. Thanks for your assistance.

  • rshimizu
    rshimizu Posts: 1
    edited February 2021

    Thank you for your post, @rosaiah !

    kubectl -n kube-system exec -it etcd-master -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl --endpoints=https://127.0.0.1:2379 endpoint status --write-out=table"

    also worked for me :)
    (this command will return a result similar to the text.)

  • Where to find kubeadm-config.yaml? I don't see that in the location mentioned in the lab. Thanks

    ubuntu@ip-172-31-46-10:~$ find kubeadm-config.yaml
    find: ‘kubeadm-config.yaml’: No such file or directory

  • chrispokorni
    chrispokorni Posts: 2,383

    Hi @vishwas2f4u,

    Please post your questions in Discussions that are on the same topic as your issue, or create a new Discussion if necessary - assuming it is a completely new issue. The current Discussion thread is for Lab 4, whereas your issue is related to Lab 3.

    However, it seems that your find command is incomplete. You should try the find $HOME -name <file-name> syntax instead. For additional help on the usage of find you may try find --help, or man find, or info find.

    Regards,
    -Chris
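
The find syntax mentioned above can be tried safely in a scratch directory. This is a minimal sketch, assuming a POSIX shell with mktemp; the temporary directory stands in for $HOME and the empty kubeadm-config.yaml is a placeholder created only for the demonstration.

```shell
# Scratch directory standing in for $HOME (assumption).
workdir=$(mktemp -d)
mkdir -p "$workdir/backup"
touch "$workdir/backup/kubeadm-config.yaml"

# find takes a starting directory first, then -name with the file name.
found=$(find "$workdir" -name kubeadm-config.yaml)
echo "$found"

rm -r "$workdir"
```

Without a starting directory and -name, find treats its first argument as a path to search, which is why the original command reported that no such file or directory exists.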

  • chrsyng
    chrsyng Posts: 1

    I can't seem to get lab 4.1 step 6 to work

    Step 6

    chris@k8s-ctrl-node-1:~$ kubectl -v=1 -n kube-system exec -it etcd-k8s-ctrl-node-1 -- sh -c "ETCDCTL_API=3 ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt etcdctl --endpoints=https://127.0.0.1:2379 member list snapshot save /var/lib/etcd/snapshot.db"
    7d4266b35b46001e, started, k8s-ctrl-node-1, https://192.168.122.150:2380, https://192.168.122.150:2379, false
    

    Increasing the output to v=10 didn't seem to give any relevant info.

    Step 7

    chris@k8s-ctrl-node-1:~$ sudo ls -l /var/lib/etcd/
    [sudo] password for chris: 
    total 4
    drwx------ 4 root root 4096 Jul 27 10:09 member
    
  • chrispokorni
    chrispokorni Posts: 2,383

    Hi @chrsyng,

    It seems that your command is a mix of the commands from steps 5 and 6. Ensure you are only using the command with the options from step 6 to save the snapshot.

    Regards,
    -Chris

  • Hi,

    I am on Lab 4.1. Basic Node Maintenance and I'm stuck at step 2.
    2) Log into the etcd container and look at the options etcdctl provides. Use tab to complete the container name.

    student@cp:~$ kubectl -n kube-system exec -it etcd- -- sh

    When I press the Tab key nothing happens, meaning there is no such directory. I'm on a master node (cp). Is there anything else I need to do here?

    Thank you!
    Gaurav

  • chrispokorni
    chrispokorni Posts: 2,383

    Hi @gaurav4978,

    The expectation here is that TAB will autocomplete the name of the etcd Pod. Once you have typed etcd-, pressing TAB should complete the etcd Pod name, typically with the hostname of the node where etcd is running (your control-plane node).

    If autocomplete does not behave as expected, I would recommend revisiting steps 18 and 19 of Lab Exercise 3.1, where kubectl completion is enabled and then validated.

    Regards,
    -Chris

  • Hi,

    I'm stuck at step 4 of chapter 4. I've copied the output for your reference.

    step3 -
    ubuntu@ip-xx-xx-x-xxx:~$ kubectl -n kube-system exec -it etcd-ip-xx-xx-x-xxx -- sh -c "ETCDCTL_API=3 ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key etcdctl endpoint health"

    127.0.0.1:2379 is healthy: successfully committed proposal: took = 22.589832ms

    step4
    ubuntu@ip-xx-xx-x-xxx:~$ kubectl -n kube-system exec -it etcd-ip-xx-xx-x-xxx -- sh -c "ETCDCTL_API=3 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt etcdctl --endpoints=https://127.0.0.1:2379 member list"

    sh: --cert=/etc/kubernetes/pki/etcd/peer.crt: No such file or directory
    command terminated with exit code 127

    step5 - was successful


    When I ran the ls command, I could see that the file exists.

    ubuntu@ip-xx-xx-x-xxx:~$ ls -l /etc/kubernetes/pki/etcd/
    total 32
    -rw-r--r-- 1 root root 1058 Sep 7 17:36 ca.crt
    -rw------- 1 root root 1679 Sep 7 17:36 ca.key
    -rw-r--r-- 1 root root 1139 Sep 7 17:36 healthcheck-client.crt
    -rw------- 1 root root 1679 Sep 7 17:36 healthcheck-client.key
    -rw-r--r-- 1 root root 1196 Sep 7 17:36 peer.crt
    -rw------- 1 root root 1679 Sep 7 17:36 peer.key
    -rw-r--r-- 1 root root 1196 Sep 7 17:36 server.crt
    -rw------- 1 root root 1679 Sep 7 17:36 server.key

  • Hi @sairameshpv,

    I would recommend following the same notation found in Steps 3 and 5 - use VARIABLES instead of --options, by replacing --cert with ETCDCTL_CERT, then --key and --cacert respectively.

    If the issue still persists, the peer key and crt can be replaced with server key and crt.

    Regards,
    -Chris

  • Hi @chrispokorni

    After changing to
    kubectl -n kube-system exec -it etcd-ip-xxx-xx-x-xxx -- sh -c "ETCDCTL_API=3 ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt etcdctl --endpoints=https://127.0.0.1:2379 member list"

    It worked.

    Thanks

  • Today (Nov 17, 2021) the course documentation has this command in the 4th step:

    student@cp:~$ kubectl -n kube-system exec -it etcd-k8scp -- sh -c \
    "ETCDCTL_API=3 --cert=./peer.crt --key=./peer.key --cacert=./ca.crt \
    etcdctl --endpoints=https://127.0.0.1:2379 member list"

    This command does not work and its output is the following:

    sh: --cert=./peer.crt: No such file or directory

    I think there are two problems:

    1) The current path "./" where sh executes the command is not "/etc/kubernetes/pki/etcd".
    2) I think --cert, --key and --cacert are OPTIONS of the etcdctl program (see etcdctl -h), and all of these OPTIONS should go after the etcdctl command, not before it.

    My solution:

    kubectl -n kube-system exec -it etcd-error404 -- sh -c \
    "ETCDCTL_API=3 etcdctl \
    --cert=/etc/kubernetes/pki/etcd/peer.crt \
    --key=/etc/kubernetes/pki/etcd/peer.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --endpoints=https://127.0.0.1:2379 member list"

    Note: etcd-error404 is the name of my pod.
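
The second point can be checked locally with a stand-in command. This is a minimal sketch, assuming a POSIX sh; DEMO_API is a hypothetical variable and echo plays the role of etcdctl. After any leading NAME=value assignments, the shell takes the next word as the command to run, so an option placed there is treated as a program name.

```shell
# Options before the program name: the first word that is not an assignment
# (--cert=./peer.crt) is taken as the command, which does not exist.
status_before=0
sh -c "DEMO_API=3 --cert=./peer.crt echo member list" 2>/dev/null || status_before=$?

# Options after the program name: echo (standing in for etcdctl) runs and
# receives the options as arguments.
out_after=$(sh -c "DEMO_API=3 echo args: --cert=./peer.crt member list")

echo "before: exit $status_before"
echo "after: $out_after"
```

The first status is the shell's "command not found" code, the same exit code 127 reported earlier in this thread for the step 4 command.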

  • pawelzajac
    pawelzajac Posts: 5
    edited January 2022

    @hatimsue you are right, your solution works. It is just the order that matters.
    The course documentation is incorrect in step 4.
    Well spotted.

    BTW, it seems that using the server or the peer key renders the same output. Not sure if step 4 is some kind of inner course test point :smile:

    I also tried the ENV variable syntax and it works:

    kubectl -n kube-system exec -it etcd-cp -- sh -c \
    "ETCDCTL_API=3 ETCDCTL_CERT=/etc/kubernetes/pki/etcd/peer.crt \
    ETCDCTL_KEY=/etc/kubernetes/pki/etcd/peer.key \
    ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt \
    etcdctl --endpoints=https://127.0.0.1:2379 member list"

  • headkaze
    headkaze Posts: 15

    I think the following changes to LAB 4.1 make it easier to read and understand:

    $ kubectl -n kube-system exec -it etcd-<Tab> -- sh
    # export ETCDCTL_API=3
    # export ETCDCTL_CACERT=/etc/kubernetes/pki/etcd/ca.crt
    # export ETCDCTL_CERT=/etc/kubernetes/pki/etcd/server.crt
    # export ETCDCTL_KEY=/etc/kubernetes/pki/etcd/server.key
    # etcdctl endpoint health
    # etcdctl --endpoints=https://127.0.0.1:2379 member list
    # etcdctl --endpoints=https://127.0.0.1:2379 member list -w table
    # etcdctl --endpoints=https://127.0.0.1:2379 snapshot save /var/lib/etcd/snapshot.db
    # exit
    

    Also the following line assumes the cluster was installed using a yaml file:

    $ sudo cp /root/kubeadm-config.yaml $HOME/backup/
    

    It makes better sense to get the backup using kubectl:

    $ kubectl get cm kubeadm-config -n kube-system -o yaml >$HOME/backup/kubeadm-config.yaml
    
