LFS258_V2021-09-20 - Cannot initialize cluster with kubeadm 1.21.1 and crio 1.21.3

olmorupert · October 2021

Experiencing an issue initializing the cluster with kubeadm and cri-o

Trying to provision the exact system from the lab ( s_03 )

LFS258_V2021-09-20_SOLUTIONS.tar.xz, LFS258-labs_V2021-09-20.pdf

2 vCPU, 8 GB, Ubuntu 18.04.6 LTS
running on vsphere, 1 interface, no swap

installed

kubeadm 1.21.1-00
kubectl 1.21.1-00
kubelet 1.21.1-00
kubernetes-cni 0.8.7-00
cri-o 1.21.3~0
cri-o-runc 1.0.1~0

configured the system and crio, enabled and started it, according to the latest PDF, and verified against:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cri-o
(the instructions for Ubuntu 18.04)

cgroup driver is systemd

/etc/hosts
updated

172.21.90.50 te-olmo-k8m0101 k8scp
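
Since kubeadm and the kubelet both depend on the k8scp alias, the mapping can be checked directly. This is a minimal sketch; the helper name resolves_to is hypothetical, and the address and alias are the ones quoted above.

```shell
#!/bin/sh
# resolves_to NAME IP [HOSTS_FILE]: succeed if NAME appears on a line for IP.
# Hypothetical helper; it reads a hosts-format file directly (default /etc/hosts).
resolves_to() {
    name=$1 ip=$2 file=${3:-/etc/hosts}
    awk -v n="$name" -v ip="$ip" '
        /^[[:space:]]*#/ { next }    # skip comment lines
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit found ? 0 : 1 }
    ' "$file"
}

if resolves_to k8scp 172.21.90.50; then
    echo "k8scp -> 172.21.90.50"
else
    echo "k8scp is not mapped to 172.21.90.50 in /etc/hosts" >&2
fi
```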

using kubeadm config:
LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml
updated:
podNetwork: 100.68.0.0/16

Init:

kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out

kubelet fails to start
[kubelet-check] Initial timeout of 40s passed.

log: "Error getting node" err="node \"k8scp\" not found"

Tried adding the crio.conf from the lab tar as /etc/crio/crio.conf.
I could not find anything in the PDF about this file; I basically stumbled on it in the tar.

Init:

kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out

Journal:
okt 26 23:21:49 te-olmo-k8m0101 kubelet[25616]: E1026 23:21:49.526385 25616 kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"
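
The "Error getting node" line repeats many times and tends to bury whatever the kubelet complained about first. A small filter, sketched below, keeps only the other error-level lines. The function name first_real_errors is hypothetical, and it assumes the journal line format shown above (kubelet[PID]: followed by a klog E-prefixed record).

```shell
#!/bin/sh
# first_real_errors: read kubelet journal lines on stdin, keep error-level klog
# lines ("E" followed by MMDD) and drop the repetitive "Error getting node" ones.
first_real_errors() {
    grep -E 'kubelet\[[0-9]+\]: E[0-9]{4} ' | grep -v '"Error getting node"' | head -n 20
}

# Only query the journal when journalctl is available on this machine.
if command -v journalctl >/dev/null 2>&1; then
    journalctl -u kubelet --no-pager 2>/dev/null | first_real_errors || true
fi
```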

I am also reading up on crio's documentation. Supposedly I also have to add /etc/cni/net.d/<some-crio-bridge.conf>, but I am still working through the crio docs at the moment, as the lab is totally unclear on this.

is there another version set that is supposed to work?

can we expect any questions about cri-o or are we expected to be able to configure it?

spent 5 hours last night with no success. the documentation/lab seems unclear. quite frustrating.

olmorupert · October 2021

Also added

/etc/default/kubelet

KUBELET_EXTRA_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
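
With these arguments the kubelet can only reach the runtime if the socket actually exists, so it may be worth checking that first. A minimal sketch, assuming the endpoint is given as a unix:// URL in /etc/default/kubelet as above; the helper name endpoint_socket is hypothetical.

```shell
#!/bin/sh
# endpoint_socket FILE: print the filesystem path of the unix:// endpoint passed
# via --container-runtime-endpoint in FILE (hypothetical helper).
endpoint_socket() {
    sed -n 's|.*--container-runtime-endpoint=unix://\([^" ]*\).*|\1|p' "$1"
}

defaults=/etc/default/kubelet
if [ -r "$defaults" ]; then
    sock=$(endpoint_socket "$defaults")
    if [ -S "$sock" ]; then
        echo "runtime socket present: $sock"
    else
        echo "runtime socket missing: ${sock:-<not configured>} (is crio running?)" >&2
    fi
fi
```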

olmorupert · October 2021

attached kubelet logs and kubeadm.yaml

olmorupert · November 2021

Hi @serewicz thanks for replying.

I'm sure I've executed all those steps. I am going to restart again completely from scratch and follow your described procedure.

The only thing different should be my hostname, and respectively, the /etc/hosts file:

olmo@te-olmo-k8m0101:~$ cat /etc/hosts
127.0.0.1       localhost
172.21.90.50    te-olmo-k8m0101.my.domain te-olmo-k8m0101 k8scp
172.21.90.51    te-olmo-k8m0102.my.domain      te-olmo-k8m0102
172.21.90.52    te-olmo-k8m0103.my.domain      te-olmo-k8m0103
172.21.90.53    te-olmo-k8w0101.my.domain      te-olmo-k8w0101
172.21.90.54    te-olmo-k8w0102.my.domain      te-olmo-k8w0102
172.21.90.55    te-olmo-hap0101.my.domain      te-olmo-hap0101

# The following lines are desirable for IPv6 capable hosts
::1     localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

which should be fine...

ftr:

olmo@te-olmo-k8m0101:~$ cat /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
olmo@te-olmo-k8m0101:~$ sudo sysctl -a | grep "bridge-nf-call\|ip_forward"
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_use_pmtu = 0

root@te-olmo-k8m0101:/etc/modules-load.d# cat 001-kubernetes.conf
overlay
br_netfilter
root@te-olmo-k8m0101:/root# lsmod | grep 'overlay\|br_netfilter'
br_netfilter           24576  0
bridge                155648  1 br_netfilter
overlay

olmo@te-olmo-k8m0101:~$ ping k8scp
PING te-olmo-k8m0101 (172.21.90.50) 56(84) bytes of data.
64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=2 ttl=64 time=0.049 ms

Regarding the config file that is not mentioned: I meant the crio.conf file, not the kubeadm-crio.yaml file:

s_03  ❯  ls -l
total 36
-rw-r--r-- 1 ruperto ruperto   121 Nov  2  2020 99-kubernetes-cri.conf
-rw-r--r-- 1 ruperto ruperto 10200 Aug 23 14:32 crio.conf
-rw-r--r-- 1 ruperto ruperto   958 Nov  2  2020 first.yaml
-rw-r--r-- 1 ruperto ruperto   163 Sep 20 15:36 kubeadm-config.yaml
-rw-r--r-- 1 ruperto ruperto  1699 Aug 23 14:32 kubeadm-crio.yaml
-rw-r--r-- 2 ruperto ruperto   206 Oct 23  2020 low-resource-range.yaml
-rw-r--r-- 1 ruperto ruperto  2469 Aug 23 14:32 second.yaml

however placing or not placing the file ( /etc/crio/crio.conf ) did not make a difference.

i'm going to try again and get back on this.

thanks for the reply.

olmorupert · November 2021

following your procedure at step # systemctl start crio, i get:

nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.498563086Z" level=info msg="Starting CRI-O, version: 1.21.3, git: ff0b7feb8e12509076b4b0e338b6334ce466b293(clean)"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499271441Z" level=info msg="Node configuration value for hugetlb cgroup is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499451353Z" level=info msg="Node configuration value for pid cgroup is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499652779Z" level=error msg="Node configuration validation for memoryswap cgroup failed: node not configured with memory swap"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499828792Z" level=info msg="Node configuration value for memoryswap cgroup is false"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.505779558Z" level=info msg="Node configuration value for systemd CollectMode is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.517452800Z" level=info msg="Node configuration value for systemd AllowedCPUs is false"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.609768863Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.610176582Z" level=fatal msg="Validating runtime config: runtime validation: \"runc\" not found in $PATH: exec: \"runc\": executable file not found in $PATH"
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Failed with result 'exit-code'.
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Failed to start Container Runtime Interface for OCI (CRI-O).

adding this made crio happier:

root@te-olmo-k8m0101:/etc/crio/crio.conf.d# cat 10-runc.conf
[crio.runtime]
  default_runtime = "runc"
  [crio.runtime.runtimes]
    [crio.runtime.runtimes.runc]
      runtime_path="/usr/lib/cri-o-runc/sbin/runc"
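
To confirm that each configured runtime_path really points at an executable, something like the following could be run. It is a rough sketch with a line-based parse rather than a real TOML parser, and the helper name runtime_paths is hypothetical.

```shell
#!/bin/sh
# runtime_paths FILE...: print every runtime_path value found in the given
# CRI-O config files (simple line-based parse, not a full TOML parser).
runtime_paths() {
    sed -n 's/^[[:space:]]*runtime_path[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$@"
}

for f in /etc/crio/crio.conf /etc/crio/crio.conf.d/*.conf; do
    [ -r "$f" ] || continue          # also skips an unmatched glob pattern
    runtime_paths "$f" | while read -r p; do
        # An empty value is skipped here; going by the log above, CRI-O then
        # searches $PATH for the runtime instead.
        [ -z "$p" ] && continue
        if [ -x "$p" ]; then echo "ok:      $p ($f)"; else echo "missing: $p ($f)"; fi
    done
done
```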

also installed conntrack ( crio asked about it... )

nov 01 22:53:12 te-olmo-k8m0101 crio[2540]: W1101 22:53:12.999043    2540 hostport_manager.go:71] The binary conntrack is not installed, this can cause failures in network connection cleanup.

however, following the exact steps above I again got the same results.

after finding this issue: https://github.com/cri-o/cri-o/issues/3631

I also updated

/etc/containers/storage.conf
[storage.options.overlay]
#mountopt = "nodev,metacopy=on"
mountopt = "nodev"

after this

# kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out

worked
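
The linked issue concerns the metacopy=on overlay mount option, which kernels predating overlayfs metacopy support do not accept. The check below is only a sketch: it assumes that a kernel with metacopy support exposes /sys/module/overlay/parameters/metacopy, and the helper name has_metacopy_opt is hypothetical.

```shell
#!/bin/sh
# has_metacopy_opt FILE: succeed if an uncommented mountopt line in FILE
# requests metacopy=on.
has_metacopy_opt() {
    grep -Eq '^[[:space:]]*mountopt[[:space:]]*=.*metacopy=on' "$1"
}

conf=/etc/containers/storage.conf
# Assumption: the overlay module exposes this file when it supports metacopy.
param=/sys/module/overlay/parameters/metacopy

if [ -r "$conf" ] && has_metacopy_opt "$conf" && [ ! -e "$param" ]; then
    echo "storage.conf asks for metacopy=on but the overlay module does not expose it" >&2
    echo "kernel: $(uname -r)" >&2
fi
```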

olmorupert · November 2021

however...

you are correct. and i screwed up an installation earlier.

i removed containers-common and deleted /etc/crio* /etc/containers and /etc/cni

and tried again and it worked.

crd477 · December 2021

Hello,

I had exactly the same initial problem as @olmorupert and I found this thread.

I am working in a very similar environment and am following along with the materials as they exist in LFS258_V2021-09-20_SOLUTIONS.tar.xz.

As the OP indicated, I had to make the following changes in order for this to work:

sudo sed -i 's/,metacopy=on//g' /etc/containers/storage.conf
sudo systemctl restart crio

(see also: https://forum.linuxfoundation.org/discussion/comment/31994#Comment_31994)

$ diff -u LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml kubeadm-crio.yaml
--- LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml 2021-08-23 08:32:54.000000000 -0400
+++ kubeadm-crio.yaml 2021-12-10 11:05:41.350953400 -0500

dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
-kubernetesVersion: 1.20.0
+kubernetesVersion: 1.21.1
networking:
dnsDomain: cluster.local
serviceSubnet: 10.96.0.0/12

Without these two changes, I was also getting the:

kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"

error message after the kubeadm init invocation.

Everything is working for me up to this point now. Hopefully this helps someone else who is struggling with the cri-o method at least at the time of this writing.
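
The kubernetesVersion mismatch is easy to miss, so a quick comparison against the installed kubeadm may help. A minimal sketch, assuming the config sits in the current directory as kubeadm-crio.yaml; the helper name config_version is hypothetical.

```shell
#!/bin/sh
# config_version FILE: print the kubernetesVersion value from a kubeadm config,
# without surrounding quotes and without a leading "v".
config_version() {
    sed -n 's/^[[:space:]]*kubernetesVersion:[[:space:]]*"\{0,1\}v\{0,1\}\([0-9][0-9.]*\).*/\1/p' "$1" | head -n 1
}

cfg=kubeadm-crio.yaml
if [ -r "$cfg" ] && command -v kubeadm >/dev/null 2>&1; then
    want=$(config_version "$cfg")
    have=$(kubeadm version -o short | sed 's/^v//')
    [ "$want" = "$have" ] || echo "config says $want, kubeadm is $have" >&2
fi
```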

olmorupert · January 2022

still stuck with the same issue when trying to go HA

tried to reinstall from scratch

and came back to my own post. however, after

removing containers-common and running rm -rf /etc/crio* /etc/containers /etc/cni

i am unable to install crio....

installing crio reinstalls containers-common as it should, however for some reason this does not create all the files listed in the package.

$ dpkg -L containers-common
/.
/etc
/etc/containers
/etc/containers/policy.json
/etc/containers/registries.conf
/etc/containers/registries.conf.d
/etc/containers/registries.conf.d/000-shortnames.conf
/etc/containers/registries.d
/etc/containers/registries.d/default.yaml
/etc/containers/storage.conf
..

$ find /etc/containers/
/etc/containers/
/etc/containers/registries.d
/etc/containers/registries.conf.d
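
The comparison above can be automated. One possible explanation, offered as an assumption rather than a diagnosis: files under /etc are normally dpkg conffiles, and by default dpkg does not recreate a conffile that was deleted by hand when the package is reinstalled. The sketch below only lists what is missing; the helper name missing_paths is hypothetical.

```shell
#!/bin/sh
# missing_paths [ROOT]: read absolute paths on stdin (e.g. from `dpkg -L PKG`)
# and print those that do not exist under ROOT (default: the real filesystem).
missing_paths() {
    root=${1:-}
    while read -r p; do
        case $p in /*) ;; *) continue ;; esac   # ignore non-absolute lines
        [ -e "$root$p" ] || echo "$p"
    done
}

if command -v dpkg >/dev/null 2>&1 && dpkg -s containers-common >/dev/null 2>&1; then
    dpkg -L containers-common | missing_paths
fi

# If the conffile assumption holds, dpkg can be told to restore missing
# conffiles on reinstall (not run here):
#   sudo apt-get install --reinstall -o Dpkg::Options::="--force-confmiss" containers-common
```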

and crio fails with runc error, after setting it like above
/etc/crio/crio.conf.d/10-runc.conf
[crio.runtime]
  default_runtime = "runc"
  [crio.runtime.runtimes]
    [crio.runtime.runtimes.runc]
      runtime_path="/usr/lib/cri-o-runc/sbin/runc"

it mentions the missing policy.json

so it's getting late and will try again tomorrow.

totally unclear why containers-common is not installing all required files.

i can rebootstrap the entire box, as my other prepared machines still have the correct content in /etc/containers, but i wish to know what causes a plain crio install to fail at this stage.

olmorupert · January 2022

one thing i found is that the cri-o repo implicitly installs 1.21.4 whilst i'm installing kubernetes 1.21.1 according to the docs.

VER=1.21
echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/ /" | tee -a /etc/apt/sources.list.d/cri-0.list

aligning the kubernetes packages to 1.21.4 makes it work.
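
To see at a glance which versions actually got installed, the package versions can be listed side by side. A minimal sketch for a Debian/Ubuntu host; the helper names strip_revision and pkg_version are hypothetical.

```shell
#!/bin/sh
# strip_revision: reduce a Debian version on stdin to its upstream part,
# e.g. "1.21.3~0" -> "1.21.3" and "1.21.1-00" -> "1.21.1".
strip_revision() { sed 's/[~-].*$//'; }

# pkg_version PKG: print the installed upstream version of PKG (empty if absent).
pkg_version() { dpkg-query -W -f='${Version}\n' "$1" 2>/dev/null | strip_revision; }

if command -v dpkg-query >/dev/null 2>&1; then
    for p in kubeadm kubelet kubectl cri-o; do
        printf '%-8s %s\n' "$p" "$(pkg_version "$p")"
    done
fi
```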
