
LFD259 - New version live on Kubernetes v1.21.1 (5.21.2021)

fcioanca
fcioanca Posts: 1,144
edited May 2021 in LFD259 Class Forum

Hi,

A new course version of LFD259 went live today. In this release, all labs were updated to use Kubernetes v1.21.1, and are now using Podman. The updates also cover most of the upcoming CKAD exam updates.

To ensure you have access to the latest version, please clear your cache.

Regards,
Flavia
The Linux Foundation Training Team

Comments

  • smartinj
    smartinj Posts: 2

    Hi, I just tried to create a new cluster with the provided installation scripts (v1.21.1) on Ubuntu 18 (KVM VMs), but it's not working. When executing the script 'k8sMaster.sh', I get the following error:

    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.

    Unfortunately, an error has occurred:
        timed out waiting for the condition
    
    This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    
    If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'
    
    Additionally, a control plane component may have crashed or exited when started by the container runtime.
    To troubleshoot, list all containers using your preferred container runtimes CLI.
    
    Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
        - 'crictl --runtime-endpoint /var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint /var/run/crio/crio.sock logs CONTAINERID'
    

    error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster

    The script continues its execution without checking the result of the previous step:
    sudo kubeadm init --config=$(find / -name kubeadm.yaml 2>/dev/null)
    sleep 5
    echo "Running the steps explained at the end of the init output for you"
    mkdir -p $HOME/.kub
    ...

    Does anyone know what the issue is?

    Thanks.
    KR,
    Javier.
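
As a rough illustration of the point about the script not checking the result of the previous step, here is a minimal sketch of a guard. The helper name `run_or_abort` is hypothetical and is not part of the course scripts; the idea is only that a failed `kubeadm init` should stop the script before the later steps run.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the course scripts): run a command and
# report failure instead of silently continuing.
run_or_abort() {
    if ! "$@"; then
        echo "Step failed: $*" >&2
        return 1
    fi
}

# Possible use inside an install script (assumption, shown as comments only):
#   cfg=$(find / -name kubeadm.yaml 2>/dev/null)
#   run_or_abort sudo kubeadm init --config="$cfg" || exit 1
```

With a guard like this, the script would exit right after the `wait-control-plane` timeout rather than continuing with the post-init steps.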

  • leifsegen
    leifsegen Posts: 10

    Is there a high-level summary of differences? (Or could there be?)

  • I got the same error, running on a fresh Ubuntu 18.04.5 server (VirtualBox).

    [control-plane] Creating static Pod manifest for "kube-scheduler"
    [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.
    
            Unfortunately, an error has occurred:
                    timed out waiting for the condition
    
            This error is likely caused by:
                    - The kubelet is not running
                    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    ...
    ...
    

    Any ideas about this? It's blocking setting up the cluster. @fcioanca ?

  • leifsegen
    leifsegen Posts: 10

    @lesnyrumcajs I'm sorry I don't have an answer regarding your configuration. Though, I am also using VirtualBox to set up my cluster. Some things I changed in my process when switching from the V2021-01-26 version so far are:

    • download the newer version (make sure the URL used by wget has V2021-05-21 in it)
    • make sure the names of the two nodes match what is expected by the documentation (master and worker)
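
A small sketch of the second check, assuming the expected names are `master` and `worker` as stated above. The function name `name_matches` is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the actual node name with the expected one.
name_matches() {
    [ "$1" = "$2" ]
}

# Possible use on the control plane node (shown as comments only):
#   name_matches "$(hostname)" master || echo "hostname is $(hostname), expected master" >&2
```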

    @leifsegen said:
    Is there a high-level summary of differences? (Or could there be?)

    Regarding this question, so far I notice that the initialization seems to use cri-o instead of Docker. I notice the previous scripts are still in there, though. Are they intended to remain compatible with the current version of the course? I'm running into trouble getting the k8sMaster.sh script to run as well:

    [kubelet-check] Initial timeout of 40s passed.
    
    
            Unfortunately, an error has occurred:
                    timed out waiting for the condition
    
            This error is likely caused by:
                    - The kubelet is not running
                    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
            If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                    - 'systemctl status kubelet'
                    - 'journalctl -xeu kubelet'
    
            Additionally, a control plane component may have crashed or exited when started by the container runtime.
            To troubleshoot, list all containers using your preferred container runtimes CLI.
    
            Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
                    - 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
                    Once you have found the failing container, you can inspect its logs with:
                    - 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock logs CONTAINERID'
    
    $ systemctl status kubelet
    ● kubelet.service - kubelet: The Kubernetes Node Agent
       Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
      Drop-In: /etc/systemd/system/kubelet.service.d
               └─10-kubeadm.conf
       Active: active (running) since Sun 2021-05-30 14:16:14 UTC; 5min ago
         Docs: https://kubernetes.io/docs/home/
     Main PID: 22653 (kubelet)
        Tasks: 14 (limit: 2316)
       CGroup: /system.slice/kubelet.service
               └─22653 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --co
    
    May 30 14:21:21 master kubelet[22653]: E0530 14:21:21.244234   22653 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    # duplicates omitted
    
    $ journalctl -xeu kubelet
    May 30 14:21:51 master kubelet[22653]: E0530 14:21:51.869400   22653 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    May 30 14:21:51 master kubelet[22653]: E0530 14:21:51.969776   22653 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    # duplicates omitted
    May 30 14:21:52 master kubelet[22653]: E0530 14:21:52.396559   22653 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.0.18:6443/apis/coordination.k8s.io/v1/namesp
    # etc.
    
    $ sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause
    # no output
    
  • serewicz
    serewicz Posts: 990

    Hello,

    The cri-o based nodes take a bit longer to start, but they typically come up within the listed 4-minute timeout.

    When you look at kubelet, is it running?

    What does the output of crictl show?

    Do your nodes have the required amount of resources as listed in Exercise 2.1: Overview and Preliminaries?

    Regards,

  • lesnyrumcajs
    lesnyrumcajs Posts: 5
    edited June 2021

    Hello @serewicz , thanks for responding.

    When you look at kubelet, is it running?

    $ systemctl status kubelet
    ● kubelet.service - kubelet: The Kubernetes Node Agent
       Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
      Drop-In: /etc/systemd/system/kubelet.service.d
               └─10-kubeadm.conf
       Active: active (running) since Wed 2021-06-02 20:27:29 UTC; 6min ago
         Docs: https://kubernetes.io/docs/home/
     Main PID: 19648 (kubelet)
        Tasks: 15 (limit: 4915)
       CGroup: /system.slice/kubelet.service
               └─19648 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --
    
    Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.582356   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.696327   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.796635   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.897810   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    Jun 02 20:34:06 k8s-master kubelet[19648]: I0602 20:34:06.968908   19648 kubelet_node_status.go:71] "Attempting to register node" node="master
    Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.969847   19648 kubelet_node_status.go:93] "Unable to register node with API server"
    Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.998438   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    Jun 02 20:34:07 k8s-master kubelet[19648]: E0602 20:34:07.098588   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    Jun 02 20:34:07 k8s-master kubelet[19648]: E0602 20:34:07.199096   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    Jun 02 20:34:07 k8s-master kubelet[19648]: E0602 20:34:07.300501   19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
    

    Seems something is off here, no?

    What does the output of crictl show?

    I'm not sure exactly which subcommand you mean. For example, sudo crictl info:

    $ sudo crictl info
    {
      "status": {
        "conditions": [
          {
            "type": "RuntimeReady",
            "status": true,
            "reason": "",
            "message": ""
          },
          {
            "type": "NetworkReady",
            "status": true,
            "reason": "",
            "message": ""
          }
        ]
      }
    }
    
    

    Do your nodes have the required amount of resources as listed in Exercise 2.1: Overview and Preliminaries?

    I assigned 8 GB RAM and 20 GB disk + 3 vCPUs for good measure.

    I attached the master.out output from the initial script. Plus some output from the journal.

    Completely unrelated to the problem - the link in the course pdf seems incorrect (it's https://training.linuxfoundation.org/cm/LFD259/LFD259V2021-05-21SOLUTIONS.tar.xz , should be https://training.linuxfoundation.org/cm/LFD259/LFD259_V2021-05-21_SOLUTIONS.tar.xz).

  • fcioanca
    fcioanca Posts: 1,144

    @lesnyrumcajs The solutions link in the course pdf is actually correct. However, when you copy/paste from the pdf, depending on your software, underscores tend to disappear - this is also mentioned in the Note immediately under the command to download the files.
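
To illustrate the copy/paste pitfall, here is a minimal sketch that checks whether a pasted URL still contains the underscores before it is handed to wget. The function name `has_underscores` is hypothetical, and the pattern is only an assumption based on the two URLs quoted in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed only if the file name still has the
# underscores around the version string.
has_underscores() {
    case "$1" in
        *LFD259_V*_SOLUTIONS.tar.xz) return 0 ;;
        *) return 1 ;;
    esac
}

# Possible use (shown as comments only):
#   has_underscores "$url" || echo "Underscores missing, retype the URL by hand" >&2
```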

  • chrispokorni
    chrispokorni Posts: 1,417

    Hi @lesnyrumcajs,

    These error messages ("node not found", "connection refused") indicate a networking issue with your VirtualBox VMs. Misconfigured VM networking in the VirtualBox hypervisor often leads to these issues during the Kubernetes cluster init process.

    How many network interfaces do you have on each VM, and of what type?

    Regards,
    -Chris

  • @fcioanca Thanks, sorry for the noise then.

    @chrispokorni It's a brand new VirtualBox VM with the default NAT (and the SSH port forwarded to the host) and defaults everywhere else.

    $ ifconfig
    enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
            inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
            inet6 fe80::a00:27ff:fe25:2708  prefixlen 64  scopeid 0x20<link>
            ether 08:00:27:25:27:08  txqueuelen 1000  (Ethernet)
            RX packets 52549  bytes 78891443 (78.8 MB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 2874  bytes 186963 (186.9 KB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
            inet 127.0.0.1  netmask 255.0.0.0
            inet6 ::1  prefixlen 128  scopeid 0x10<host>
            loop  txqueuelen 1000  (Local Loopback)
            RX packets 124  bytes 10160 (10.1 KB)
            RX errors 0  dropped 0  overruns 0  frame 0
            TX packets 124  bytes 10160 (10.1 KB)
            TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
    
    
  • chrispokorni
    chrispokorni Posts: 1,417

    Hi @lesnyrumcajs,

    The VBox NAT and defaults are not sufficient for a Kubernetes cluster. If you look closely at the NAT properties in the VBox networking documentation, you will see that NAT prevents guests from talking to each other, which is something Kubernetes relies on. I would recommend a bridged network instead, with promiscuous mode enabled and set to allow all traffic.

    Regards,
    -Chris
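
For readers who prefer the command line over the VirtualBox GUI, the recommendation above could look roughly like the sketch below. It only prints the `VBoxManage modifyvm` command rather than running it. The VM name `master` and the host adapter name `eth0` are assumptions, as is the use of the first network adapter (`nic1`); the VM has to be powered off when the printed command is run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the VBoxManage command that switches the
# first NIC of a VM to bridged mode with promiscuous mode "allow-all".
#   $1 = VM name (assumption), $2 = host network adapter (assumption)
bridged_nic_cmd() {
    printf 'VBoxManage modifyvm %s --nic1 bridged --bridgeadapter1 %s --nicpromisc1 allow-all\n' "$1" "$2"
}

# Possible use (shown as comments only):
#   bridged_nic_cmd master eth0
#   bridged_nic_cmd worker eth0
```

Printing the command first makes it easy to review the adapter name before changing the VM.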

  • @chrispokorni Thanks for the suggestions. I actually tried it originally (because I didn't want to bother with port forwarding :smile: ). Unfortunately the error persists, just with my local network IP (172.16.x.x).

  • chrispokorni
    chrispokorni Posts: 1,417

    Hi @lesnyrumcajs,

    The suggested configuration works for me with VBox VMs. I am wondering whether the host OS has an active firewall that may block traffic.

    Regards,
    -Chris
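
One way to narrow down a firewall question is to test whether the API server port answers at all. The sketch below is only an assumption about how such a check might be done: it uses bash's `/dev/tcp` redirection and the coreutils `timeout` command, and the function name `port_open` is made up. Port 6443 is the API server port that appears in the kubelet log earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 3 seconds, fail otherwise.
#   $1 = host or IP, $2 = port
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Possible use from the worker or the host (IP is a placeholder):
#   port_open <cp-ip> 6443 && echo "API server port reachable" || echo "blocked or not listening"
```

A failure here does not distinguish between a firewall and an API server that never started, so it is best read together with the crictl output.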

  • ioef
    ioef Posts: 4
    edited June 2021

    Hello.

    I am facing the same issue as mentioned above. I have created two brand new Ubuntu 18.04 LTS Server VMs using the ISO "ubuntu-18.04.5-live-server-amd64.iso". One VM is for the cp and the other for the worker, with 2 CPUs, 8 GB RAM and 25 GB disks each. The networking has been configured as bridged with promiscuous mode, obtaining IP addresses from my local DHCP server. The cp obtains the IP 192.168.0.190, whereas the worker obtains 192.168.0.191.

    However, when executing ./k8scp.sh on the cp node, I am stuck at the following phase:

    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    [kubeconfig] Writing "admin.conf" kubeconfig file
    [kubeconfig] Writing "kubelet.conf" kubeconfig file
    [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    [kubeconfig] Writing "scheduler.conf" kubeconfig file
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Starting the kubelet
    [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    [control-plane] Creating static Pod manifest for "kube-apiserver"
    [control-plane] Creating static Pod manifest for "kube-controller-manager"
    [control-plane] Creating static Pod manifest for "kube-scheduler"
    [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.
    
        Unfortunately, an error has occurred:
            timed out waiting for the condition
    
        This error is likely caused by:
            - The kubelet is not running
            - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    
        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
            - 'systemctl status kubelet'
            - 'journalctl -xeu kubelet'
    
    

    Could you please help?

    Thanks a lot!

  • chrispokorni
    chrispokorni Posts: 1,417

    Hi @ioef,

    The course author posted the following:

    https://forum.linuxfoundation.org/discussion/859506/k8scp-sh-install-script-issues

    Regards,
    -Chris

  • ioef
    ioef Posts: 4

    Hello!

    Thank you for the heads up. I have also commented in the indicated thread with a workaround that worked for me.

  • The workaround worked fine for me, thanks!
