
LFS258_V2021-09-20 - Cannot initialize cluster with kubeadm 1.21.1 and crio 1.21.3

Experiencing an issue initializing the cluster with kubeadm and cri-o

Trying to provision the exact system from the lab ( s_03 )

LFS258_V2021-09-20_SOLUTIONS.tar.xz, LFS258-labs_V2021-09-20.pdf

2 vCPU, 8 GB, Ubuntu 18.04.6 LTS
running on vsphere, 1 interface, no swap

installed

kubeadm 1.21.1-00
kubectl 1.21.1-00
kubelet 1.21.1-00
kubernetes-cni 0.8.7-00
cri-o 1.21.3~0
cri-o-runc 1.0.1~0

configured system and crio, enabled and started, according to the latest pdf, and verified from:
https://kubernetes.io/docs/setup/production-environment/container-runtimes/#cri-o
for ubuntu 18.04

cgroup driver is systemd

/etc/hosts
updated

172.21.90.50 te-olmo-k8m0101 k8scp
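Since the later "node \"k8scp\" not found" error is easy to blame on name resolution, a quick check of the hosts entry can rule that out early. The following is only a sketch: `hosts_has_name` is a hypothetical helper name, not something from the lab material.

```shell
#!/usr/bin/env bash
# hosts_has_name FILE NAME  (hypothetical helper, not part of the lab)
# Succeeds if NAME appears as a hostname or alias on a non-comment line of FILE.
hosts_has_name() {
  local file=$1 name=$2
  awk -v name="$name" '
    { sub(/#.*/, "") }                 # drop comments before looking at fields
    NF >= 2 {
      # field 1 is the address, fields 2..NF are the names
      for (i = 2; i <= NF; i++) if ($i == name) found = 1
    }
    END { exit found ? 0 : 1 }
  ' "$file"
}

# Assumed usage on the control-plane node:
#   hosts_has_name /etc/hosts k8scp && echo "k8scp is mapped"
#   getent hosts k8scp     # what the resolver actually returns
```

If both the file check and `getent hosts k8scp` succeed, the alias itself is probably not the cause and the container runtime is the next thing to look at.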

using kubeadm config:
LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml
updated:
podNetwork: 100.68.0.0/16

Init:

kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out

kubelet fails to start
[kubelet-check] Initial timeout of 40s passed.

log: "Error getting node" err="node \"k8scp\" not found"

Tried adding the crio.conf from the lab tar as /etc/crio/crio.conf.
I could not find anything in the PDF about this file; I just happened to find it in the tar.

Init:

kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out

Journal:
okt 26 23:21:49 te-olmo-k8m0101 kubelet[25616]: E1026 23:21:49.526385 25616 kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"

I am also reading up on cri-o's documentation. Supposedly I also have to add /etc/cni/net.d/<some-crio-bridge.conf>, but I am still working through the cri-o docs, as the lab is unclear on this.

is there another version set that is supposed to work?

can we expect any questions about cri-o or are we expected to be able to configure it?

spent 5 hours last night with no success. documentation/lab seems unclear. quite frustrating.

Comments

  • olmorupert
    olmorupert Posts: 14
    edited October 2021

    Also added

    /etc/default/kubelet

    KUBELET_EXTRA_ARGS="--container-runtime=remote --container-runtime-endpoint=unix:///var/run/crio/crio.sock"
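To confirm that the endpoint in these kubelet arguments points at a socket that actually exists, something like the sketch below may help. `endpoint_from_args` is a hypothetical helper; it only parses the argument string and does not talk to cri-o.

```shell
#!/usr/bin/env bash
# endpoint_from_args ARGS  (hypothetical helper)
# Prints the value of --container-runtime-endpoint found in the ARGS string.
endpoint_from_args() {
  printf '%s\n' "$1" | tr ' ' '\n' | sed -n 's/^--container-runtime-endpoint=//p'
}

# Assumed usage:
#   . /etc/default/kubelet
#   sock=$(endpoint_from_args "$KUBELET_EXTRA_ARGS")
#   sock=${sock#unix://}
#   [ -S "$sock" ] || echo "no socket at $sock (is crio running?)"
```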

  • attached kubelet logs and kubeadm.yaml

  • serewicz
    serewicz Posts: 1,000

    Hello,

    I think you may need a closer examination of the lab. For example you said there was no mention of the kubeadm-crio.yaml file but it is specifically mentioned in step 14 and step 15. Also the error about not finding k8scp means you did not edit /etc/hosts properly.

    You may need to edit the kubeadm-crio.yaml file to be a matching version, such as 1.21.1. Otherwise I have just run the exact steps from the lab and it worked. Here is my command history as copy and paste. There is even an error where I did not edit, to illustrate that the lab worked as written.

    root@cp:~# history
    1 apt-get update && apt-get upgrade -y
    2 apt-get install -y vim
    3 modprobe overlay
    4 modprobe br_netfilter
    5 vim /etc/sysctl.d/99-kubernetes-cri.conf
    6 sysctl --system
    7 export OS=xUbuntu_18.04
    8 export VER=1.21
    9 echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/ /" | tee -a /etc/apt/sources.list.d/cri-0.list
    10 curl -L http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/Release.key | apt-key add -
    11 echo "deb https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/ /" | tee -a /etc/apt/sources.list.d/libcontainers.list
    12 curl -L https://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable/$OS/Release.key | apt-key add -
    13 apt-get update
    14 apt-get install -y cri-o cri-o-runc
    15 systemctl daemon-reload
    16 systemctl enable crio
    17 systemctl start crio
    18 systemctl status crio
    19 vim /etc/apt/sources.list.d/kubernetes.list
    20 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
    21 apt-get update
    22 apt-get install -y kubeadm=1.21.1-00 kubelet=1.21.1-00 kubectl=1.21.1-00
    23 apt-mark hold kubelet kubeadm kubectl
    24 wget https://docs.projectcalico.org/manifests/calico.yaml
    25 hostname -i
    26 vim /etc/hosts
    27 find /home -name kubeadm-crio.yaml
    28 cp /home/student/LFS458/SOLUTIONS/s_03/kubeadm-crio.yaml .
    29 kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out #<<-copy paste error
    30 kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
    31 vim kubeadm-crio.yaml
    32 kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
    33 history

    Your Kubernetes control-plane has initialized successfully!
    To start using your cluster, you need to run the following as a regular user:
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    Alternatively, if you are the root user, you can run:
    export KUBECONFIG=/etc/kubernetes/admin.conf
    You should now deploy a pod network to the cluster.
    Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
    https://kubernetes.io/docs/concepts/cluster-administration/addons/
    Then you can join any number of worker nodes by running the following on each as root:
    kubeadm join 10.128.0.34:6443 --token abcdef.0123456789abcdef \
    --discovery-token-ca-cert-hash sha256:d0d0db476a0cfedf3aed709c23146630d73596c01996fd8879b5e68a08cfd9ee

    Regards,

  • Hi @serewicz thanks for replying.

    I'm sure I've executed all those steps. I am going to restart again completely from scratch and follow your described procedure.

    The only thing different should be my hostname, and respectively, the /etc/hosts file:

    olmo@te-olmo-k8m0101:~$ cat /etc/hosts
    127.0.0.1       localhost
    172.21.90.50    te-olmo-k8m0101.my.domain te-olmo-k8m0101 k8scp
    172.21.90.51    te-olmo-k8m0102.my.domain      te-olmo-k8m0102
    172.21.90.52    te-olmo-k8m0103.my.domain      te-olmo-k8m0103
    172.21.90.53    te-olmo-k8w0101.my.domain      te-olmo-k8w0101
    172.21.90.54    te-olmo-k8w0102.my.domain      te-olmo-k8w0102
    172.21.90.55    te-olmo-hap0101.my.domain      te-olmo-hap0101
    
    # The following lines are desirable for IPv6 capable hosts
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    

    which should be fine...

    ftr:

    olmo@te-olmo-k8m0101:~$ cat /etc/sysctl.d/99-kubernetes-cri.conf
    net.bridge.bridge-nf-call-iptables = 1
    net.ipv4.ip_forward = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    olmo@te-olmo-k8m0101:~$ sudo sysctl -a | grep "bridge-nf-call\|ip_forward"
    net.bridge.bridge-nf-call-arptables = 1
    net.bridge.bridge-nf-call-ip6tables = 1
    net.bridge.bridge-nf-call-iptables = 1
    net.ipv4.ip_forward = 1
    net.ipv4.ip_forward_use_pmtu = 0
    
    root@te-olmo-k8m0101:/etc/modules-load.d# cat 001-kubernetes.conf
    overlay
    br_netfilter
    root@te-olmo-k8m0101:/root# lsmod | grep 'overlay\|br_netfilter'
    br_netfilter           24576  0
    bridge                155648  1 br_netfilter
    overlay
    
    
    
    

    olmo@te-olmo-k8m0101:~$ ping k8scp
    PING te-olmo-k8m0101 (172.21.90.50) 56(84) bytes of data.
    64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=1 ttl=64 time=0.032 ms
    64 bytes from te-olmo-k8m0101 (172.21.90.50): icmp_seq=2 ttl=64 time=0.049 ms

    Regarding the config file that is not mentioned: I meant the crio.conf file, not the kubeadm-crio.yaml file:

    s_03  ❯  ls -l
    total 36
    -rw-r--r-- 1 ruperto ruperto   121 Nov  2  2020 99-kubernetes-cri.conf
    -rw-r--r-- 1 ruperto ruperto 10200 Aug 23 14:32 crio.conf
    -rw-r--r-- 1 ruperto ruperto   958 Nov  2  2020 first.yaml
    -rw-r--r-- 1 ruperto ruperto   163 Sep 20 15:36 kubeadm-config.yaml
    -rw-r--r-- 1 ruperto ruperto  1699 Aug 23 14:32 kubeadm-crio.yaml
    -rw-r--r-- 2 ruperto ruperto   206 Oct 23  2020 low-resource-range.yaml
    -rw-r--r-- 1 ruperto ruperto  2469 Aug 23 14:32 second.yaml
    

    however placing or not placing the file ( /etc/crio/crio.conf ) did not make a difference.

    i'm going to try again and get back on this.

    thanks for the reply.

  • following your procedure at step # systemctl start crio, i get:

    nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Starting Container Runtime Interface for OCI (CRI-O)...
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.498563086Z" level=info msg="Starting CRI-O, version: 1.21.3, git: ff0b7feb8e12509076b4b0e338b6334ce466b293(clean)"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499271441Z" level=info msg="Node configuration value for hugetlb cgroup is true"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499451353Z" level=info msg="Node configuration value for pid cgroup is true"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499652779Z" level=error msg="Node configuration validation for memoryswap cgroup failed: node not configured with memory swap"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.499828792Z" level=info msg="Node configuration value for memoryswap cgroup is false"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.505779558Z" level=info msg="Node configuration value for systemd CollectMode is true"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.517452800Z" level=info msg="Node configuration value for systemd AllowedCPUs is false"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.609768863Z" level=info msg="Using default capabilities: CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_FSETID, CAP_FOWNER, CAP_SETGID, CAP_SETUID, CAP_SETPCAP, CAP_NET_BIND_SERVICE, CAP_KILL"
    nov 01 22:34:44 te-olmo-k8m0101 crio[1975]: time="2021-11-01 22:34:44.610176582Z" level=fatal msg="Validating runtime config: runtime validation: \"runc\" not found in $PATH: exec: \"runc\": executable file not found in $PATH"
    nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
    nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: crio.service: Failed with result 'exit-code'.
    nov 01 22:34:44 te-olmo-k8m0101 systemd[1]: Failed to start Container Runtime Interface for OCI (CRI-O).
    

    adding this made crio happier:

    root@te-olmo-k8m0101:/etc/crio/crio.conf.d# cat 10-runc.conf
    [crio.runtime]
      default_runtime = "runc"
      [crio.runtime.runtimes]
        [crio.runtime.runtimes.runc]
          runtime_path="/usr/lib/cri-o-runc/sbin/runc"
    

    also installed conntrack ( crio asked about it... )

    nov 01 22:53:12 te-olmo-k8m0101 crio[2540]: W1101 22:53:12.999043    2540 hostport_manager.go:71] The binary conntrack is not installed, this can cause failures in network connection cleanup.
    

    however, following the exact steps above, I again got the same result.

    after finding this issue: https://github.com/cri-o/cri-o/issues/3631

    I also updated

    /etc/containers/storage.conf
    [storage.options.overlay]
    #mountopt = "nodev,metacopy=on"
    mountopt = "nodev"
    

    after this

    # kubeadm init --config=kubeadm-crio.yaml --upload-certs | tee kubeadm-init.out
    

    worked
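The storage.conf change described in this post can be previewed before the real file is touched. This is a minimal sketch; `strip_metacopy` is a hypothetical name and the function only prints the rewritten content.

```shell
#!/usr/bin/env bash
# strip_metacopy FILE  (hypothetical helper)
# Prints FILE with every ",metacopy=on" removed; the file itself is not modified.
strip_metacopy() {
  sed 's/,metacopy=on//g' "$1"
}

# Assumed usage: review the diff first, then edit the file and restart crio.
#   strip_metacopy /etc/containers/storage.conf | diff -u /etc/containers/storage.conf -
#   sudo systemctl restart crio
```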

  • however...

    you are correct. and i screwed up an installation earlier.

    i removed containers-common and deleted /etc/crio* /etc/containers and /etc/cni

    and tried again and it worked.

  • Hello,

    I had exactly the same initial problem as @olmorupert and I found this thread.

    I am working in a very similar environment and am following along with the materials as they exist in LFS258_V2021-09-20_SOLUTIONS.tar.xz.

    As the OP indicated, I had to make the following changes in order for this to work:

    sudo sed -i 's/,metacopy=on//g' /etc/containers/storage.conf
    sudo systemctl restart crio

    (see also: https://forum.linuxfoundation.org/discussion/comment/31994#Comment_31994)

    $ diff -u LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml kubeadm-crio.yaml
    --- LFS258/SOLUTIONS/s_03/kubeadm-crio.yaml 2021-08-23 08:32:54.000000000 -0400
    +++ kubeadm-crio.yaml 2021-12-10 11:05:41.350953400 -0500

    dataDir: /var/lib/etcd
    imageRepository: k8s.gcr.io
    kind: ClusterConfiguration
    -kubernetesVersion: 1.20.0
    +kubernetesVersion: 1.21.1
    networking:
    dnsDomain: cluster.local
    serviceSubnet: 10.96.0.0/12

    Without these two changes, I was also getting the:

    kubelet.go:2291] "Error getting node" err="node \"k8scp\" not found"

    error message after the kubeadm init invocation.

    Everything is working for me up to this point now. Hopefully this helps someone else who is struggling with the cri-o method at least at the time of this writing.
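The version mismatch in the diff above can be spotted before running kubeadm init. The sketch below assumes the config keeps `kubernetesVersion:` on a line of its own, as in the lab file; `config_k8s_version` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# config_k8s_version FILE  (hypothetical helper)
# Prints the value of the first "kubernetesVersion:" key found in FILE.
config_k8s_version() {
  awk '$1 == "kubernetesVersion:" { print $2; exit }' "$1"
}

# Assumed usage, comparing against the installed kubeadm binary:
#   want=$(kubeadm version -o short)            # prints e.g. v1.21.1
#   have=$(config_k8s_version kubeadm-crio.yaml)
#   [ "v${have#v}" = "$want" ] || echo "config says $have, kubeadm is $want"
```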

  • still stuck with the same issue when trying to go HA

    tried to reinstall from scratch

    and came back to my own post. however, after doing what i described there:

    i removed containers-common and deleted /etc/crio* /etc/containers and /etc/cni (rm -rf)

    i am unable to install crio....

    installing crio reinstalls containers-common as it should; however, for some reason this does not create all the files listed in the package.

    $ dpkg -L containers-common
    /.
    /etc
    /etc/containers
    /etc/containers/policy.json
    /etc/containers/registries.conf
    /etc/containers/registries.conf.d
    /etc/containers/registries.conf.d/000-shortnames.conf
    /etc/containers/registries.d
    /etc/containers/registries.d/default.yaml
    /etc/containers/storage.conf
    ..

    $ find /etc/containers/
    /etc/containers/
    /etc/containers/registries.d
    /etc/containers/registries.conf.d

    and crio fails with the runc error. after setting the runtime path as above in
    /etc/crio/crio.conf.d/10-runc.conf:
    [crio.runtime]
    default_runtime = "runc"
    [crio.runtime.runtimes]
    [crio.runtime.runtimes.runc]
    runtime_path="/usr/lib/cri-o-runc/sbin/runc"

    it complains about the missing policy.json

    so it's getting late and will try again tomorrow.

    totally unclear why containers-common is not installing all required files.

    i can rebootstrap the entire box, as my other prepared machines still have the correct content in /etc/containers, but i want to know what is preventing me from simply installing crio at this stage.
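The gap between the package listing and what is on disk can be listed mechanically. The sketch below uses a hypothetical helper, `missing_paths`, that reads paths on stdin. One possible cause, stated as an assumption rather than a diagnosis: dpkg treats packaged files under /etc as conffiles and by default does not restore a conffile that was deleted by hand, which would match the symptom described here.

```shell
#!/usr/bin/env bash
# missing_paths  (hypothetical helper)
# Reads one path per line on stdin and prints those that do not exist.
missing_paths() {
  local path
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    [ -e "$path" ] || printf '%s\n' "$path"
  done
}

# Assumed usage:
#   dpkg -L containers-common | missing_paths
#   dpkg-query -W -f='${Conffiles}\n' containers-common   # which files dpkg tracks as conffiles
# If the missing files are conffiles, dpkg's --force-confmiss option asks it to
# reinstall them (untested in this thread):
#   sudo apt-get install --reinstall -o Dpkg::Options::="--force-confmiss" containers-common
```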

  • one thing i found is that the cri-o repo implicitly installs 1.21.4, whilst i'm installing kubernetes 1.21.1 according to the docs.

    VER=1.21
    echo "deb http://download.opensuse.org/repositories/devel:/kubic:/libcontainers:/stable:/cri-o:/$VER/$OS/ /" | tee -a /etc/apt/sources.list.d/cri-0.list

    aligning the kubernetes packages to 1.21.4 makes it work.
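A small check for this kind of drift is sketched below. It compares the upstream part of two Debian package versions; the helper names are hypothetical. Whether the patch level (as opposed to the 1.21 minor) really has to match is not established by this thread, so treat a mismatch as a hint, not a verdict.

```shell
#!/usr/bin/env bash
# upstream_version VERSION  (hypothetical helper)
# Strips an epoch and Debian suffixes such as "~0" or "-00".
upstream_version() {
  printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/[~-].*$//'
}

# versions_aligned A B  (hypothetical helper)
# Succeeds if both package versions share the same upstream version.
versions_aligned() {
  [ "$(upstream_version "$1")" = "$(upstream_version "$2")" ]
}

# Assumed usage on the node:
#   crio_ver=$(dpkg-query -W -f='${Version}' cri-o)
#   kube_ver=$(dpkg-query -W -f='${Version}' kubeadm)
#   versions_aligned "$crio_ver" "$kube_ver" || echo "cri-o $crio_ver vs kubeadm $kube_ver"
```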
