LFS258_V2021-09-20 - Cannot initialize cluster with kubeadm 1.21.1 and crio 1.21.3
I am experiencing an issue initializing a cluster with kubeadm and CRI-O.
I am trying to provision the exact system from the lab (s_03):
LFS258_V2021-09-20_SOLUTIONS.tar.xz, LFS258-labs_V2021-09-20.pdf
2 vCPU, 8 GB, Ubuntu 18.04.6 LTS
running on vSphere, 1 interface, no swap
Installed:
kubeadm 1.21.1-00
kubectl 1.21.1-00
kubelet 1.21.1-00
kubernetes-cni 0.8.7-00
cri-o 1.21.3~0
cri-o-runc 1.0.1~0
Configured the system and CRI-O (enabled and started) according to the latest PDF, and verified against the instructions for Ubuntu 18.04 at:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cri-o
The cgroup driver is systemd.
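A quick sanity check that both sides actually use the systemd driver (the crio.conf.d path is an assumption based on the stock package layout, and the kubelet file only exists after kubeadm init has run once):
grep -R cgroup_manager /etc/crio/crio.conf /etc/crio/crio.conf.d/ 2>/dev/null
grep cgroupDriver /var/lib/kubelet/config.yaml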
/etc/hosts updated:
172.21.90.50 te-olmo-k8m0101 k8scp
Using kubeadm config:
LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml
updated with:
podNetwork: 100.68.0.0/16
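As an aside, in kubeadm's ClusterConfiguration the pod CIDR field is named networking.podSubnet; assuming that is the line I edited, a quick grep confirms the change landed:
grep -A3 '^networking:' kubeadm-crio.yaml
# expecting output along the lines of:
#   dnsDomain: cluster.local
#   podSubnet: 100.68.0.0/16
#   serviceSubnet: 10.96.0.0/12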
Init:
kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
The kubelet fails to start:
[kubelet-check] Initial timeout of 40s passed.
log: Error getting node err="node \"k8scp\" not found"
I tried adding the crio.conf from the lab tar to /etc/crio/crio.conf;
I could not find anything in the PDF about this file, I basically just stumbled across it in the tar.
Init:
kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
Journal:
okt 26 23:21:49 te-olmo-k8m0101 kubelet[25616]: E1026 23:21:49.526385 25616 kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"
I have also been reading up on CRI-O's documentation; supposedly I also have to add /etc/cni/net.d/<some-crio-bridge.conf>, but I am still working through the CRI-O docs, as the lab is totally unclear on this.
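For what it's worth, these are the checks I ran while debugging (the crictl endpoint flag and the CNI directory are the stock CRI-O defaults, as far as I can tell):
sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock info   # is the runtime up and answering CRI calls?
ls -l /etc/cni/net.d/                  # the Kubic cri-o packages drop a default bridge config here; Calico replaces it later
sudo journalctl -u kubelet --no-pager | tail -n 30   # why is the node failing to register?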
Is there another version set that is supposed to work?
Can we expect any questions about cri-o, or are we expected to be able to configure it?
I spent 5 hours last night with no success. The documentation/lab seems unclear, which is quite frustrating.
Comments
-
I also added:
/etc/default/kubelet
KUBELET_EXTRA_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
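After editing that file the kubelet needs a restart to pick it up:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
(If I read the solution's kubeadm-crio.yaml right, the CRI socket is already passed via nodeRegistration.criSocket, so this override may well be redundant; that is my reading of the file, not something I verified.)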
-
Hello,
I think you may need a closer examination of the lab. For example, you said there was no mention of the kubeadm-crio.yaml file, but it is specifically mentioned in step 14 and step 15. Also, the error about not finding k8scp means you did not edit /etc/hosts properly.
You may need to edit the kubeadm-crio.yaml file to use a matching version, such as 1.21.1. Otherwise, I have just run the exact steps from the lab and it worked. Here is my command history, copied and pasted. There is even an error where I did not edit the command, to illustrate that the lab worked as written.
root@cp:~# history
1 apt-get update && apt-get upgrade -y
2 apt-get install -y vim
3 modprobe overlay
4 modprobe br_netfilter
5 vim /etc/sysctl.d/99-kubernetes-cri.conf
6 sysctl --system
7 export OS=xUbuntu_18.04
8 export VER=1.21
9 echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/ /" | tee -a /etc/apt/sources.list.d/cri-0.list
10 curl -L http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/Release.key | apt-key add -
11 echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /" | tee -a /etc/apt/sources.list.d/libcontainers.list
12 curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | apt-key add -
13 apt-get update
14 apt-get install -y cri-o cri-o-runc
15 systemctl daemon-reload
16 systemctl enable crio
17 systemctl start crio
18 systemctl status crio
19 vim /etc/apt/sources.list.d/kubernetes.list
20 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
21 apt-get update
22 apt-get install -y kubeadm=1.21.1-00 kubelet=1.21.1-00 kubectl=1.21.1-00
23 apt-mark hold kubelet kubeadm kubectl
24 wget https://docs.projectcalico.org/manifests/calico.yaml
25 hostname -i
26 vim /etc/hosts
27 find /home -name kubeadm-crio.yaml
28 cp /home/student/LFS458/SOLUTIONS/s_03/kubeadm-crio.yaml .
29 kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out #<<-copy paste error
30 kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
31 vim kubeadm-crio.yaml
32 kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
33 history
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.128.0.34:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:d0d0db476a0cfedf3aed709c23146630d73596c01996fd8879b5e68a08cfd9ee
Regards,
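For reference, the edit at history step 31 was presumably the kubernetesVersion line; something like the following would do it (the sed pattern is an assumption about the file contents):
sed -i 's/^kubernetesVersion: .*/kubernetesVersion: 1.21.1/' kubeadm-crio.yaml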
-
Hi @serewicz, thanks for replying.
I'm sure I've executed all those steps. I am going to restart completely from scratch and follow your described procedure.
The only thing different should be my hostname and, correspondingly, the /etc/hosts file:
olmo@te-olmo-k8m0101:~$ cat /etc/hosts
127.0.0.1 localhost
172.21.90.50 te-olmo-k8m0101.my.domain te-olmo-k8m0101 k8scp
172.21.90.51 te-olmo-k8m0102.my.domain te-olmo-k8m0102
172.21.90.52 te-olmo-k8m0103.my.domain te-olmo-k8m0103
172.21.90.53 te-olmo-k8w0101.my.domain te-olmo-k8w0101
172.21.90.54 te-olmo-k8w0102.my.domain te-olmo-k8w0102
172.21.90.55 te-olmo-hap0101.my.domain te-olmo-hap0101
# The following lines are desirable for IPv6 capable hosts
::1 localhost ip6-localhost ip6-loopback
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters
which should be fine...
For the record:
olmo@te-olmo-k8m0101:~$ cat /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.bridge.bridge-nf-call-ip6tables = 1
olmo@te-olmo-k8m0101:~$ sudo sysctl -a | grep "bridge-nf-call\|ip_forward"
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
net.ipv4.ip_forward_use_pmtu = 0
root@te-olmo-k8m0101:/etc/modules-load.d# cat 001-kubernetes.conf
overlay
br_netfilter
root@te-olmo-k8m0101:/root# lsmod | grep 'overlay\|br_netfilter'
br_netfilter 24576 0
bridge 155648 1 br_netfilter
overlay
olmo@te-olmo-k8m0101:~$ ping k8scp
PING te-olmo-k8m0101 (172.21.90.50) 56(84) bytes of data.
64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=2 ttl=64 time=0.049 ms
Regarding the not-mentioned config file: I meant the crio.conf file, not the kubeadm-crio.yaml file:
s_03 ❯ ls -l
total 36
-rw-r--r-- 1 ruperto ruperto   121 Nov  2  2020 99-kubernetes-cri.conf
-rw-r--r-- 1 ruperto ruperto 10200 Aug 23 14:32 crio.conf
-rw-r--r-- 1 ruperto ruperto   958 Nov  2  2020 first.yaml
-rw-r--r-- 1 ruperto ruperto   163 Sep 20 15:36 kubeadm-config.yaml
-rw-r--r-- 1 ruperto ruperto  1699 Aug 23 14:32 kubeadm-crio.yaml
-rw-r--r-- 2 ruperto ruperto   206 Oct 23  2020 low-resource-range.yaml
-rw-r--r-- 1 ruperto ruperto  2469 Aug 23 14:32 second.yaml
However, placing or not placing the file (/etc/crio/crio.conf) did not make a difference.
I'm going to try again and get back on this.
thanks for the reply.
-
Following your procedure, at the systemctl start crio step I get:
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.498563086Z" level=info msg="Starting CRI-O, version: 1.21.3, git: ff0b7feb8e12509076b4b0e338b6334ce466b293(clean)"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499271441Z" level=info msg="Node configuration value for hugetlb cgroup is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499451353Z" level=info msg="Node configuration value for pid cgroup is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499652779Z" level=error msg="Node configuration validation for memoryswap cgroup failed: node not configured with memory swap"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499828792Z" level=info msg="Node configuration value for memoryswap cgroup is false"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.505779558Z" level=info msg="Node configuration value for systemd CollectMode is true"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.517452800Z" level=info msg="Node configuration value for systemd AllowedCPUs is false"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.609768863Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.610176582Z" level=fatal msg="Validating runtime config: runtime validation: \"runc\" not found in $PATH: exec: \"runc\": executable file not found in $PATH"
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Failed with result 'exit-code'.
nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Failed to start Container Runtime Interface for OCI (CRI-O).
Adding this made crio happier:
root@te-olmo-k8m0101:/etc/crio/crio.conf.d# cat 10-runc.conf
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes]
[crio.runtime.runtimes.runc]
runtime_path = "/usr/lib/cri-o-runc/sbin/runc"
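To double-check the drop-in matches reality before restarting (the path is where the Ubuntu cri-o-runc package puts the binary):
ls -l /usr/lib/cri-o-runc/sbin/runc
sudo systemctl restart crio && sudo systemctl status crio --no-pager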
I also installed conntrack (crio complained about it):
nov 01 22:53:12 te-olmo-k8m0101 crio[2540]: W1101 22:53:12.999043 2540 hostport_manager.go:71] The binary conntrack is not installed, this can cause failures in network connection cleanup.
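For completeness, the fix for that warning was simply:
sudo apt-get install -y conntrack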
However, following the exact steps above, I again had the same results.
After finding this issue: https://github.com/cri-o/cri-o/issues/3631
I also updated /etc/containers/storage.conf (the metacopy=on overlay mount option reportedly needs a newer kernel than Ubuntu 18.04 ships with):
[storage.options.overlay]
#mountopt = "nodev,metacopy=on"
mountopt = "nodev"
After this,
# kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
worked.
-
However...
You are correct, and I screwed up an installation earlier.
I removed containers-common and deleted /etc/crio*, /etc/containers, and /etc/cni,
tried again, and it worked.
-
Hello,
I had exactly the same initial problem as @olmorupert and I found this thread.
I am working in a very similar environment and am following along with the materials as they exist in
LFS258_V2021-09-20_SOLUTIONS.tar.xz.
As the OP indicated, I had to make the following changes in order for this to work:
sudo sed -i 's/,metacopy=on//g' /etc/containers/storage.conf
sudo systemctl restart crio
(see also: https://forum.linuxfoundation.org/discussion/comment/31994#Comment_31994)
$ diff -u LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml kubeadm-crio.yaml
--- LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml 2021-08-23 08:32:54.000000000 -0400
+++ kubeadm-crio.yaml 2021-12-10 11:05:41.350953400 -0500
dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
-kubernetesVersion: 1.20.0
+kubernetesVersion: 1.21.1
networking:
dnsDomain: cluster.local
serviceSubnet: 10.96.0.0/12
Without these two changes, I was also getting the:
kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"
error message after the
kubeadm init
invocation.
Everything is working for me up to this point now. Hopefully this helps someone else who is struggling with the cri-o method, at least at the time of this writing.
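For anyone landing here: combining the OP's runc drop-in with the two changes above, the whole delta on top of the stock lab steps looks roughly like this (a sketch from my notes; adjust paths and versions to your setup):
# 1. tell CRI-O where the packaged runc lives
sudo tee /etc/crio/crio.conf.d/10-runc.conf <<'EOF'
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes]
[crio.runtime.runtimes.runc]
runtime_path = "/usr/lib/cri-o-runc/sbin/runc"
EOF
# 2. drop the metacopy overlay option the 18.04 kernel rejects
sudo sed -i 's/,metacopy=on//g' /etc/containers/storage.conf
sudo systemctl restart crio
# 3. make kubernetesVersion in kubeadm-crio.yaml match the installed packages
sed -i 's/^kubernetesVersion: .*/kubernetesVersion: 1.21.1/' kubeadm-crio.yaml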
-
I am still stuck with the same issue when trying to go HA,
so I tried to reinstall from scratch
and came back to my own post. However, after I removed containers-common and deleted (rm -rf) /etc/crio*, /etc/containers, and /etc/cni,
I now fail to install crio...
Installing crio pulls containers-common back in as it should; however, for some reason this does not create all the files listed in the package:
$ dpkg -L containers-common
/.
/etc
/etc/containers
/etc/containers/policy.json
/etc/containers/registries.conf
/etc/containers/registries.conf.d
/etc/containers/registries.conf.d/000-shortnames.conf
/etc/containers/registries.d
/etc/containers/registries.d/default.yaml
/etc/containers/storage.conf
$ find /etc/containers/
/etc/containers/
/etc/containers/registries.d
/etc/containers/registries.conf.d
And crio fails with the runc error, even after setting it as above in
/etc/crio/crio.conf.d/10-runc.conf:
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes]
[crio.runtime.runtimes.runc]
runtime_path = "/usr/lib/cri-o-runc/sbin/runc"
It mentions the missing policy.json.
It is getting late, so I will try again tomorrow.
It is totally unclear why containers-common is not installing all the required files.
I can re-bootstrap the entire box, as my other prepared machines still have the correct content of /etc/containers, but I would like to know why simply installing crio fails at this stage.
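One likely explanation: the files under /etc/containers are dpkg conffiles, and dpkg deliberately does not recreate conffiles you deleted by hand on a plain reinstall. Forcing them back should work (flag per dpkg(1); untested here):
sudo apt-get install --reinstall -o Dpkg::Options::="--force-confmiss" containers-common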
0 -
One thing I found is that cri-o implicitly installs 1.21.4, whilst I am installing Kubernetes 1.21.1 per the docs:
VER=1.21
echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/ /" | tee -a /etc/apt/sources.list.d/cri-0.list
Aligning the Kubernetes packages to 1.21.4 makes it work.