LFS258_V2021-09-20 - Cannot initialize cluster with kubeadm 1.21.1 and crio 1.21.3
Experiencing an issue initializing a cluster with kubeadm and cri-o
Trying to provision the exact system from the lab ( s_03 )
LFS258_V2021-09-20_SOLUTIONS.tar.xz, LFS258-labs_V2021-09-20.pdf
2 vCPU, 8 GB, Ubuntu 18.04.6 LTS
running on vsphere, 1 interface, no swap
installed
kubeadm 1.21.1-00
kubectl 1.21.1-00
kubelet 1.21.1-00
kubernetes-cni 0.8.7-00
cri-o 1.21.3~0
cri-o-runc 1.0.1~0
configured system and crio, enabled and started, according to the latest pdf, and verified from:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cri-o
for ubuntu 18.04
cgroup driver is systemd
/etc/hosts
updated
172.21.90.50 te-olmo-k8m0101 k8scp
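Since the k8scp alias has to resolve on the node, it can help to confirm which address a hosts file actually maps it to. A minimal sketch, assuming the usual whitespace-separated hosts format; `hosts_lookup` is a hypothetical helper name, not something from the lab material:

```shell
# Print the address that a hosts file maps to a given name or alias.
# Usage: hosts_lookup NAME [HOSTS_FILE]   (HOSTS_FILE defaults to /etc/hosts)
hosts_lookup() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v name="$name" '
        { sub(/#.*/, "") }                 # drop trailing comments
        {
            for (i = 2; i <= NF; i++)      # fields 2..NF are names and aliases
                if ($i == name) { print $1; found = 1; exit }
        }
        END { exit !found }                # non-zero status when nothing matched
    ' "$file"
}
```

For example, `hosts_lookup k8scp` should print 172.21.90.50 on this node; a non-zero exit status means the alias is missing.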
using kubeadm config:
LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml
updated:
podNetwork: 100.68.0.0/16
Init:
kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
kubelet fails to start
[kubelet-check] Initial timeout of 40s passed.
log: "Error getting node" err="node \"k8scp\" not found"
Tried adding the crio.conf from the lab tar as /etc/crio/crio.conf.
I could not find anything in the PDF about this file; I basically found it by chance in the tar.
Init:
kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
Journal:
okt 26 23:21:49 te-olmo-k8m0101 kubelet[25616]: E1026 23:21:49.526385 25616 kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"
I am now reading up on cri-o's documentation. Supposedly I also have to add /etc/cni/net.d/<some-crio-bridge.conf>, but I am still reading into the cri-o docs, as the lab is totally unclear on this.
Is there another version set that is supposed to work?
Can we expect any questions about cri-o, or are we expected to be able to configure it?
I spent 5 hours last night with no success. The documentation/lab seems unclear. Quite frustrating.
Comments
-
Also added
/etc/default/kubelet
KUBELET_EXTRA_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
0 -
-
Hi @serewicz thanks for replying.
I'm sure I've executed all those steps. I am going to restart again completely from scratch and follow your described procedure.
The only thing different should be my hostname, and respectively, the /etc/hosts file:
olmo@te-olmo-k8m0101:~$ cat /etc/hosts
127.0.0.1 localhost
172.21.90.50 te-olmo-k8m0101.my.domain te-olmo-k8m0101 k8scp
172.21.90.51 te-olmo-k8m0102.my.domain te-olmo-k8m0102
172.21.90.52 te-olmo-k8m0103.my.domain te-olmo-k8m0103
172.21.90.53 te-olmo-k8w0101.my.domain te-olmo-k8w0101
172.21.90.54 te-olmo-k8w0102.my.domain te-olmo-k8w0102
172.21.90.55 te-olmo-hap0101.my.domain te-olmo-hap0101
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
which should be fine...
ftr:
olmo@te-olmo-k8m0101:~$ cat /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
olmo@te-olmo-k8m0101:~$ sudo sysctl -a | grep "bridge-nf-call\|ip_forward"
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_use_pmtu = 0
root@te-olmo-k8m0101:/etc/modules-load.d# cat 001-kubernetes.conf
overlay
br_netfilter
root@te-olmo-k8m0101:/root# lsmod | grep 'overlay\|br_netfilter'
br_netfilter 24576 0
bridge 155648 1 br_netfilter
overlay
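For anyone repeating these checks across several nodes, the sysctl file check could be scripted roughly as below. This is only a sketch: `missing_sysctl_keys` is a hypothetical helper name, and it inspects the file only, not the live kernel values.

```shell
# Report each required kernel setting that a sysctl.d-style file does not set to 1.
# Usage: missing_sysctl_keys FILE
missing_sysctl_keys() {
    for key in net.bridge.bridge-nf-call-iptables \
               net.bridge.bridge-nf-call-ip6tables \
               net.ipv4.ip_forward; do
        # Accept "key = 1" or "key=1" with optional surrounding whitespace.
        # The dots in the key are regex wildcards here, which is loose but harmless.
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$1" \
            || echo "$key"
    done
}
```

Running `missing_sysctl_keys /etc/sysctl.d/99-kubernetes-cri.conf` prints nothing when all three keys are present; the live values still need `sysctl -a` (or `sysctl -n KEY`) as shown above.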
olmo@te-olmo-k8m0101:~$ ping k8scp
PING te-olmo-k8m0101 (172.21.90.50) 56(84) bytes of data.
64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=2 ttl=64 time=0.049 ms
Regarding the config file that is not mentioned: I meant the crio.conf file, not the kubeadm-crio.yaml file:
s_03 ❯ ls -l
total 36
-rw-r--r-- 1 ruperto ruperto 121 Nov 2 2020 99-kubernetes-cri.conf
-rw-r--r-- 1 ruperto ruperto 10200 Aug 23 14:32 crio.conf
-rw-r--r-- 1 ruperto ruperto 958 Nov 2 2020 first.yaml
-rw-r--r-- 1 ruperto ruperto 163 Sep 20 15:36 kubeadm-config.yaml
-rw-r--r-- 1 ruperto ruperto 1699 Aug 23 14:32 kubeadm-crio.yaml
-rw-r--r-- 2 ruperto ruperto 206 Oct 23 2020 low-resource-range.yaml
-rw-r--r-- 1 ruperto ruperto 2469 Aug 23 14:32 second.yaml
however placing or not placing the file ( /etc/crio/crio.conf ) did not make a difference.
i'm going to try again and get back on this.
thanks for the reply.
0 -
following your procedure at step # systemctl start crio, i get:
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.498563086Z" level=info msg="Starting CRI-O, version: 1.21.3, git: ff0b7feb8e12509076b4b0e338b6334ce466b293(clean)"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499271441Z" level=info msg="Node configuration value for hugetlb cgroup is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499451353Z" level=info msg="Node configuration value for pid cgroup is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499652779Z" level=error msg="Node configuration validation for memoryswap cgroup failed: node not configured with memory swap"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499828792Z" level=info msg="Node configuration value for memoryswap cgroup is false"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.505779558Z" level=info msg="Node configuration value for systemd CollectMode is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.517452800Z" level=info msg="Node configuration value for systemd AllowedCPUs is false"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.609768863Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.610176582Z" level=fatal msg="Validating runtime config: runtime validation: \"runc\" not found in $PATH: exec: \"runc\": executable file not found in $PATH"
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Failed with result 'exit-code'.
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Failed to start Container Runtime Interface for OCI (CRI-O).
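The line that matters in a log like this is the level=fatal one about runc. A small filter can pull the error and fatal entries out of the noise. This is just a sketch; `crio_problems` is a made-up name, and it assumes the `level=...` field format seen in the output above.

```shell
# Print only the error and fatal entries from CRI-O log text read on stdin.
# Exits non-zero when no such entry is present (standard grep behaviour).
crio_problems() {
    grep -E 'level=(error|fatal)'
}

# Typical use on the affected host (not executed here):
#   journalctl -u crio --no-pager | crio_problems
```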
adding this made crio happier:
root@te-olmo-k8m0101:/etc/crio/crio.conf.d# cat 10-runc.conf
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes]
[crio.runtime.runtimes.runc]
runtime_path="/usr/lib/cri-o-runc/sbin/runc"
also installed conntrack ( crio asked about it... )
nov 01 22:53:12 te-olmo-k8m0101 crio[2540]: W1101 22:53:12.999043 2540 hostport_manager.go:71] The binary conntrack is not installed, this can cause failures in network connection cleanup.
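The runc drop-in shown above could be generated with a small function like the one below. It is a sketch only: `write_runc_dropin` is a hypothetical helper, the runc path is the one from this post and may differ on other installs, and writing to the real /etc/crio/crio.conf.d needs root.

```shell
# Write a CRI-O drop-in that points the default runtime at an explicit runc binary.
# Usage: write_runc_dropin RUNC_PATH [CONF_DIR]
write_runc_dropin() {
    runc_path="$1"
    conf_dir="${2:-/etc/crio/crio.conf.d}"
    mkdir -p "$conf_dir"
    # Same tables as the hand-written 10-runc.conf above.
    cat > "$conf_dir/10-runc.conf" <<EOF
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes]
[crio.runtime.runtimes.runc]
runtime_path = "$runc_path"
EOF
}
```

For example, `write_runc_dropin /usr/lib/cri-o-runc/sbin/runc` followed by `systemctl restart crio` would reproduce the manual step.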
However, following the exact steps above, I again got the same result.
After finding this issue: https://github.com/cri-o/cri-o/issues/3631
I also updated
/etc/containers/storage.conf
[storage.options.overlay]
#mountopt = "nodev,metacopy=on"
mountopt = "nodev"
after this
# kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
worked
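Removing `,metacopy=on` from storage.conf can be done with a backup kept alongside, so the original file is easy to restore. A rough sketch; `strip_metacopy` is a hypothetical name, and the default path is the standard /etc/containers/storage.conf location.

```shell
# Remove ",metacopy=on" from a containers storage.conf, keeping FILE.bak as a backup.
# Usage: strip_metacopy [FILE]
strip_metacopy() {
    file="${1:-/etc/containers/storage.conf}"
    cp -p "$file" "$file.bak"
    # Rewrite from the backup so the edit works with any POSIX sed (no -i needed).
    sed 's/,metacopy=on//g' "$file.bak" > "$file"
}
```

After running it as root, crio needs a restart (`systemctl restart crio`) before retrying kubeadm init.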
0 -
however...
You are correct, and I had messed up an installation earlier.
I removed containers-common, deleted /etc/crio*, /etc/containers and /etc/cni,
tried again, and it worked.
0 -
Hello,
I had exactly the same initial problem as @olmorupert and I found this thread.
I am working in a very similar environment and am following along with the materials as they exist in
LFS258_V2021-09-20_SOLUTIONS.tar.xz.
As the OP indicated, I had to make the following changes in order for this to work:
sudo sed -i 's/,metacopy=on//g' /etc/containers/storage.conf
sudo systemctl restart crio
(see also: https://forum.linuxfoundation.org/discussion/comment/31994#Comment_31994)
$ diff -u LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml kubeadm-crio.yaml
--- LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml 2021-08-23 08:32:54.000000000 -0400
+++ kubeadm-crio.yaml 2021-12-10 11:05:41.350953400 -0500
dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
-kubernetesVersion: 1.20.0
+kubernetesVersion: 1.21.1
networking:
dnsDomain: cluster.local
serviceSubnet: 10.96.0.0/12
Without these two changes, I was also getting the:
kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"
error message after the kubeadm init invocation.
Everything is working for me up to this point now. Hopefully this helps someone else who is struggling with the cri-o method, at least at the time of this writing.
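A quick way to catch a version mismatch between the config file and the installed kubeadm before running init might look like this. It is a sketch under assumptions: both function names are made up, and the parser only handles a plain, unquoted `kubernetesVersion: X.Y.Z` line.

```shell
# Print the kubernetesVersion value from a kubeadm config file.
# Usage: config_k8s_version FILE
config_k8s_version() {
    awk '$1 == "kubernetesVersion:" { print $2; exit }' "$1"
}

# Strip a leading "v" so that "v1.21.1" and "1.21.1" compare equal.
normalize_version() {
    printf '%s\n' "${1#v}"
}

# Typical comparison on the node (not executed here):
#   want=$(normalize_version "$(config_k8s_version kubeadm-crio.yaml)")
#   have=$(normalize_version "$(kubeadm version -o short)")
#   [ "$want" = "$have" ] || echo "config says $want, kubeadm is $have"
```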
0 -
I was still stuck with the same issue when trying to go HA,
so I tried to reinstall from scratch
and came back to my own post. However, after
removing containers-common and deleting /etc/crio*, /etc/containers and /etc/cni with rm -rf,
I am unable to install crio.
Installing crio reinstalls containers-common as it should; however, for some reason this does not create all the files listed in the package.
$ dpkg -L containers-common
/.
/etc
/etc/containers
/etc/containers/policy.json
/etc/containers/registries.conf
/etc/containers/registries.conf.d
/etc/containers/registries.conf.d/000-shortnames.conf
/etc/containers/registries.d
/etc/containers/registries.d/default.yaml
/etc/containers/storage.conf
..
$ find /etc/containers/
/etc/containers/
/etc/containers/registries.d
/etc/containers/registries.conf.d
and crio fails with the runc error. After setting it as above in
/etc/crio/crio.conf.d/10-runc.conf:
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes]
[crio.runtime.runtimes.runc]
runtime_path="/usr/lib/cri-o-runc/sbin/runc"
it complains about the missing policy.json.
It's getting late, so I will try again tomorrow.
It is totally unclear to me why containers-common is not installing all the required files.
I can rebootstrap the entire box, as my other prepared machines still have the correct content in /etc/containers, but I would like to know what causes crio to fail to install at this stage.
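To see exactly which packaged files are absent, the dpkg -L listing can be compared against the filesystem. A minimal sketch; `missing_paths` is a hypothetical helper, and its optional ROOT argument exists only so it can be tried against a scratch directory.

```shell
# Read absolute paths on stdin and print the ones that do not exist.
# Usage: dpkg -L PACKAGE | missing_paths [ROOT]
missing_paths() {
    root="${1:-}"
    while IFS= read -r path; do
        [ "$path" = "/." ] && continue          # dpkg -L lists the root as "/."
        [ -e "$root$path" ] || printf '%s\n' "$path"
    done
}

# On the affected host (not executed here):
#   dpkg -L containers-common | missing_paths
```

One likely explanation, not confirmed in this thread: the files under /etc/containers are conffiles, and dpkg treats a deleted conffile as a deliberate local change, so a plain reinstall does not recreate it. Something like `apt-get install --reinstall -o Dpkg::Options::="--force-confmiss" containers-common` asks dpkg to restore missing conffiles.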
0 -
One thing I found is that cri-o is implicitly installed as 1.21.4, whilst I was installing Kubernetes 1.21.1 according to the docs.
VER=1.21
echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/ /" | tee -a /etc/apt/sources.list.d/cri-0.list
Aligning the Kubernetes packages to 1.21.4 makes it work.
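Since the repository line only pins the minor version (1.21), the patch level can drift between cri-o and the Kubernetes packages. A small check could flag that. This is a sketch with made-up function names, and it assumes package versions start with X.Y.Z followed by a suffix such as "-00" or "~0".

```shell
# Reduce a package version such as "1.21.1-00" or "1.21.3~0" to "X.Y.Z".
upstream_version() {
    printf '%s\n' "$1" | sed -E 's/^([0-9]+\.[0-9]+\.[0-9]+).*/\1/'
}

# Succeed only when two package versions share the same X.Y.Z.
versions_aligned() {
    [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# On the node (not executed here), the installed versions can be read with:
#   dpkg-query -W -f='${Version}\n' kubeadm
#   dpkg-query -W -f='${Version}\n' cri-o
```

For example, `versions_aligned 1.21.1-00 '1.21.4~0'` fails, which matches the combination that did not work here.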