LFD259 - New version live on Kubernetes v1.21.1 (5.21.2021)
Hi,
A new version of LFD259 went live today. In this release, all labs were updated to Kubernetes v1.21.1 and now use Podman. The release also covers most of the upcoming CKAD exam changes.
To ensure you have access to the latest version, please clear your cache.
Regards,
Flavia
The Linux Foundation Training Team
Comments
-
Hi, I just tried to create a new cluster with the provided installation scripts (v1.21.1) on Ubuntu 18 (KVM VMs), but it's not working. When executing the script 'k8sMaster.sh', I get the following error:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint /var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint /var/run/crio/crio.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
The script continues executing without checking the result of the previous step:

sudo kubeadm init --config=$(find / -name kubeadm.yaml 2>/dev/null)
sleep 5
echo "Running the steps explained at the end of the init output for you"
mkdir -p $HOME/.kube
...

Does anyone know what the issue is?
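For what it's worth, a guard around that step might look like the sketch below (my own hardening, not the actual course script):

#!/bin/bash
# Hypothetical hardening: abort instead of running the follow-up steps
# when kubeadm init fails.
CONFIG=$(find / -name kubeadm.yaml 2>/dev/null)

if ! sudo kubeadm init --config="$CONFIG"; then
    echo "kubeadm init failed; check 'journalctl -xeu kubelet' before rerunning" >&2
    exit 1
fi

# Standard post-init steps, as printed at the end of a successful kubeadm init:
mkdir -p "$HOME/.kube"
sudo cp -i /etc/kubernetes/admin.conf "$HOME/.kube/config"
sudo chown "$(id -u):$(id -g)" "$HOME/.kube/config"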
Thanks.
KR,
Javier.
-
Is there a high-level summary of differences? (Or could there be?)
-
Got the same error, running on fresh Ubuntu 18.04.5 server (VirtualBox).
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
...
...
Any ideas about this? It's blocking the cluster setup. @fcioanca?
-
@lesnyrumcajs I'm sorry, I don't have an answer regarding your configuration. I am also using VirtualBox to set up my cluster, though. Some things I changed in my process when switching from the V2021-01-26 version so far:
- download the newer version (make sure the URL used by wget has V2021-05-21 in it; see the example below)
- make sure the names of the two nodes match what is expected by the documentation (master and worker)
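For example, the solutions URL quoted later in this thread carries the new version string (any credential flags required by the course PDF are omitted here):

wget https://training.linuxfoundation.org/cm/LFD259/LFD259_V2021-05-21_SOLUTIONS.tar.xz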
@leifsegen said:
Is there a high-level summary of differences? (Or could there be?)
Regarding this question, so far I notice that the initialization seems to use cri-o instead of docker. I notice the previous scripts are still in there, though. Are they intended to still be compatible with the current version of the course? I'm running into trouble getting the k8sMaster.sh script to run as well:
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
- 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock logs CONTAINERID'
master@master:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2021-05-30 14:16:14 UTC; 5min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 22653 (kubelet)
    Tasks: 14 (limit: 2316)
   CGroup: /system.slice/kubelet.service
           └─22653 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --co

May 30 14:21:21 master kubelet[22653]: E0530 14:21:21.244234 22653 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
# duplicates omitted

master@master:~$ journalctl -xeu kubelet
May 30 14:21:51 master kubelet[22653]: E0530 14:21:51.869400 22653 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
May 30 14:21:51 master kubelet[22653]: E0530 14:21:51.969776 22653 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
# duplicates omitted
May 30 14:21:52 master kubelet[22653]: E0530 14:21:52.396559 22653 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.0.18:6443/apis/coordination.k8s.io/v1/namesp
# etc.

master@master:~$ sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause
# no output
-
Hello,
The cri-o based nodes take a bit longer to start, but typically come up within the listed 4-minute timeout.
When you look at kubelet, is it running?
What does the output of crictl show?
Do your nodes have the required amount of resources as listed in Exercise 2.1: Overview and Preliminaries? (The commands below cover the first two checks.)
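That is, roughly, the commands already suggested in the kubeadm output:

sudo systemctl status kubelet
sudo journalctl -xeu kubelet
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause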
Regards,
-
Hello @serewicz , thanks for responding.
When you look at kubelet, is it running?
student@k8s-master:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Wed 2021-06-02 20:27:29 UTC; 6min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 19648 (kubelet)
    Tasks: 15 (limit: 4915)
   CGroup: /system.slice/kubelet.service
           └─19648 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --

Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.582356 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.696327 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.796635 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.897810 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Jun 02 20:34:06 k8s-master kubelet[19648]: I0602 20:34:06.968908 19648 kubelet_node_status.go:71] "Attempting to register node" node="master
Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.969847 19648 kubelet_node_status.go:93] "Unable to register node with API server"
Jun 02 20:34:06 k8s-master kubelet[19648]: E0602 20:34:06.998438 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Jun 02 20:34:07 k8s-master kubelet[19648]: E0602 20:34:07.098588 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Jun 02 20:34:07 k8s-master kubelet[19648]: E0602 20:34:07.199096 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Jun 02 20:34:07 k8s-master kubelet[19648]: E0602 20:34:07.300501 19648 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
Seems something is off here, no?
What does the output of crictl show?
Not sure which subcommand you mean, exactly? E.g. sudo crictl info:

student@k8s-master:~$ sudo crictl info
{
  "status": {
    "conditions": [
      {
        "type": "RuntimeReady",
        "status": true,
        "reason": "",
        "message": ""
      },
      {
        "type": "NetworkReady",
        "status": true,
        "reason": "",
        "message": ""
      }
    ]
  }
}
Do your nodes have the required amount of resources as listed in Exercise 2.1: Overview and Preliminaries?
I assigned 8 GB RAM and 20 GB disk + 3 vCPUs for good measure.
I attached the master.out output from the initial script, plus some output from the journal.
Completely unrelated to the problem: the link in the course PDF seems incorrect (it's https://training.linuxfoundation.org/cm/LFD259/LFD259V2021-05-21SOLUTIONS.tar.xz, but should be https://training.linuxfoundation.org/cm/LFD259/LFD259_V2021-05-21_SOLUTIONS.tar.xz).
-
@lesnyrumcajs The solutions link in the course PDF is actually correct. However, when you copy/paste from the PDF, depending on your software, underscores tend to disappear. This is also mentioned in the Note immediately under the command to download the files.
-
Hi @lesnyrumcajs,
These error messages ("node not found", "connection refused") indicate a networking issue with your VirtualBox VMs. Misconfigured VM networking in the VirtualBox hypervisor often leads to these failures during the Kubernetes cluster init process.
How many network interfaces do you have on each VM, and of what type?
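For a quick look, this standard iproute2 command lists each interface with its state and addresses:

ip -brief addr show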
Regards,
-Chris
-
@fcioanca Thanks, sorry for the noise then.
@chrispokorni It's a brand new VirtualBox VM with default NAT (and SSH port forwarded to the host) and defaults everywhere, nothing else.
student@k8s-master:~$ ifconfig
enp0s3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.2.15  netmask 255.255.255.0  broadcast 10.0.2.255
        inet6 fe80::a00:27ff:fe25:2708  prefixlen 64  scopeid 0x20<link>
        ether 08:00:27:25:27:08  txqueuelen 1000  (Ethernet)
        RX packets 52549  bytes 78891443 (78.8 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2874  bytes 186963 (186.9 KB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 124  bytes 10160 (10.1 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 124  bytes 10160 (10.1 KB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
-
Hi @lesnyrumcajs,
The VBox NAT defaults are not sufficient for a Kubernetes cluster. If you examine the NAT properties closely in the VBox networking documentation, you will see it prevents guests from talking to each other, a capability that Kubernetes relies on. I would recommend a bridged network instead, with promiscuous mode also enabled and set to allow all traffic (a rough sketch follows below).
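For instance, with the VM powered off, something along these lines via the VBoxManage CLI (the VM name "master" and host adapter "eth0" are placeholders; substitute your own values):

# Hypothetical example: attach NIC 1 to a bridged network on the given
# host adapter and allow all promiscuous-mode traffic.
VBoxManage modifyvm "master" --nic1 bridged --bridgeadapter1 eth0 --nicpromisc1 allow-all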
Regards,
-Chris
-
@chrispokorni Thanks for the suggestions. I actually tried it originally (because I didn't want to bother with port forwarding). Unfortunately, the error persists, just with my local network IP (172.16.x.x).
-
Hi @lesnyrumcajs,
The suggested configuration works for me with VBox VMs. I am wondering whether the host OS has an active firewall that may block traffic; a quick way to check on an Ubuntu host is sketched below.
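Assuming ufw is the firewall frontend on the host (adjust for your distro):

# Show whether the host firewall is active, and its rules:
sudo ufw status verbose
# On a disposable lab host, disabling it entirely rules it out:
sudo ufw disable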
Regards,
-Chris
-
Hello.
I am facing the same issue as mentioned above. I have created two brand new Ubuntu 18.04 LTS Server VMs using the ISO "ubuntu-18.04.5-live-server-amd64.iso". One VM is for the cp and the other for the worker, with 2 CPUs, 8 GB RAM and a 25 GB disk each. Networking has also been configured as bridged with promiscuous mode, obtaining IP addresses from my local DHCP server. The cp obtains the IP 192.168.0.190, while the worker obtains 192.168.0.191.
However, when executing ./k8scp.sh on the cp node, I am stuck at the following phase:
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Could you please help?
Thanks a lot!
-
Hi @ioef,
The course author posted the following:
https://forum.linuxfoundation.org/discussion/859506/k8scp-sh-install-script-issues
Regards,
-Chris
-
Hello!
Thank you for the heads-up. I have also commented in the indicated thread with a workaround that worked for me.
-
The workaround worked fine for me, thanks!