Cilium problem: cni plugin not initialized

PetroKazmirchuk
PetroKazmirchuk Posts: 15
edited March 2024 in LFS258 Class Forum

I'm at the end of Lab exercise 3.2 "Grow the cluster" and my worker node remains permanently NotReady.
My setup:
VirtualBox with network in promiscuous mode
VM with the control node: openSUSE Leap 15.5, hostname: opensuse
VM with the worker node: openSUSE MicroOS 20240229, hostname: microos
kubectl get nodes

NAME       STATUS     ROLES           AGE     VERSION
microos    NotReady   <none>          6d22h   v1.27.11
opensuse   Ready      control-plane   9d      v1.27.11

I've seen similar complaints in this forum, but they were caused by IP conflicts with Cilium's 192.168.0.0/16, which is not my case AFAICS. My /etc/hosts:

10.0.2.15 opensuse
10.0.2.4 microos
10.0.2.15 k8scp
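
To rule out that kind of overlap, the node addresses can be checked against the range directly. This is a minimal sketch in POSIX shell, assuming IPv4 addresses and the 192.168.0.0/16 range mentioned above; the helper names `ip_to_int` and `ip_in_cidr` are made up for this illustration and are not part of any tool.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak to the caller.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit status 0) if IPv4 address $1 lies inside CIDR $2,
# e.g. ip_in_cidr 10.0.2.15 192.168.0.0/16
ip_in_cidr() (
  prefix_len=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - $prefix_len)) & 0xFFFFFFFF ))
  addr=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  [ $(( $addr & $mask )) -eq $(( $net & $mask )) ]
)

# Node addresses taken from the /etc/hosts entries above.
for ip in 10.0.2.15 10.0.2.4; do
  if ip_in_cidr "$ip" 192.168.0.0/16; then
    echo "$ip overlaps 192.168.0.0/16"
  else
    echo "$ip is outside 192.168.0.0/16"
  fi
done
```

The check only compares the masked network bits, so it says nothing about routing or NAT inside VirtualBox; it just confirms whether an address falls in the range.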

Cilium is working fine on the control node:
crictl ps

CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
0945fc48055a8       ead0a4a53df89       16 hours ago        Running             coredns                   2                   7ea706d95fe31       coredns-5d78c9869d-gpc9r
3606309809790       ead0a4a53df89       16 hours ago        Running             coredns                   2                   e3ceacba89bb8       coredns-5d78c9869d-ph479
4ae0666756199       33a5be5e9ebc0       16 hours ago        Running             cilium-agent              2                   8c3fe5b19db2a       cilium-67wkh
a5e3701e6e70b       c961e5e7cae7b       16 hours ago        Running             cilium-operator           2                   d069a8f4fbd28       cilium-operator-788c7d7585-nw9x7
a4e941333b15f       fbe39e5d66b6a       16 hours ago        Running             kube-proxy                2                   896e8ef472115       kube-proxy-shcq8
e965fbad9f5fa       6468fa8f98696       16 hours ago        Running             kube-scheduler            3                   f247867c4d25a       kube-scheduler-opensuse
71caba1690736       6f6e73fa8162b       16 hours ago        Running             kube-apiserver            3                   5f7116c4e5188       kube-apiserver-opensuse
068bd43d6476e       c6b5118178229       16 hours ago        Running             kube-controller-manager   3                   43b321f99b050       kube-controller-manager-opensuse
f30130109194b       a0eed15eed449       16 hours ago        Running             etcd                      3                   fb63047307af7       etcd-opensuse

But on the worker node it is in an infinite restart loop:
crictl ps -a

CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID              POD
9add745e3a6eb       c961e5e7cae7b       16 seconds ago      Running             cilium-operator           41                  ab11da22b98ef       cilium-operator-788c7d7585-z452c
a9f521243e050       33a5be5e9ebc0       2 minutes ago       Exited              install-cni-binaries      0                   6d1137f51732e       cilium-9sjjt
28abb9123d681       33a5be5e9ebc0       2 minutes ago       Exited              clean-cilium-state        0                   6d1137f51732e       cilium-9sjjt
8b2fecb2864c9       33a5be5e9ebc0       2 minutes ago       Exited              mount-bpf-fs              0                   6d1137f51732e       cilium-9sjjt
47ae718fd829f       33a5be5e9ebc0       2 minutes ago       Exited              apply-sysctl-overwrites   0                   6d1137f51732e       cilium-9sjjt
9dd78b138ad37       33a5be5e9ebc0       2 minutes ago       Exited              mount-cgroup              0                   6d1137f51732e       cilium-9sjjt
ae5c70717123e       33a5be5e9ebc0       2 minutes ago       Exited              config                    42                  6d1137f51732e       cilium-9sjjt
0dc297dd04fcc       fbe39e5d66b6a       2 minutes ago       Exited              kube-proxy                42                  609c20e1296ad       kube-proxy-pm7g9
e7ab04008d4a5       33a5be5e9ebc0       3 minutes ago       Exited              cilium-agent              47                  eb5b3b5cc0afe       cilium-9sjjt
b7e4a8902c7cf       c961e5e7cae7b       4 minutes ago       Exited              cilium-operator           40                  e55810f0185e4       cilium-operator-788c7d7585-z452c
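
The ATTEMPT column is the quickest way to spot which containers are looping. A small filter along these lines can pull them out of the listing. It is only a sketch: the function name `restart_loops` is hypothetical, and it assumes the column layout shown above (POD as the last column, ATTEMPT third from the right), which may differ in other crictl versions.

```shell
#!/bin/sh
# Read `crictl ps -a` output on stdin and print "POD NAME STATE ATTEMPT"
# for every container whose ATTEMPT count is greater than the threshold ($1, default 5).
# Columns are addressed from the right because CREATED ("2 minutes ago")
# spans a variable number of words.
restart_loops() {
  awk -v min="${1:-5}" '
    # The header row is skipped because its third-from-last field is not numeric.
    NF >= 6 && $(NF-2) ~ /^[0-9]+$/ && $(NF-2) + 0 > min + 0 {
      print $NF, $(NF-3), $(NF-4), $(NF-2)
    }'
}

# Intended use on a node:
#   crictl ps -a | restart_loops 10
```

With the listing above and a threshold of 10, this would leave cilium-agent, cilium-operator, config and kube-proxy, and drop the init containers that exited after a single run.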

Comments

  • PetroKazmirchuk
    PetroKazmirchuk Posts: 15
    edited March 2024

    on the worker node:
    journalctl -u kubelet -f

    Mar 09 14:29:33 microos kubelet[6552]: E0309 14:29:33.566848    6552 kubelet.go:2760] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
    

    Using crictl logs I've checked the logs of cilium-operator and they look fine until some point when it receives a "terminate" signal:

    level=info msg="Establishing connection to apiserver" host="https://10.96.0.1:443" subsys=k8s-client
    level=info msg="Connected to apiserver" subsys=k8s-client
    ...
    level=info msg="attempting to acquire leader lease kube-system/cilium-operator-resource-lock..." subsys=klog
    level=info msg="Leader re-election complete" newLeader=opensuse-lxMlRHFwMR operatorID=microos-HVDKTCgqWG subsys=cilium-operator-generic
    level=info msg="Start hook executed" duration=6.257525ms function="*api.server.Start" subsys=hive
    level=info msg="Signal received" signal=terminated subsys=hive
    

    from cilium-agent:

    level=info msg="Compiled new BPF template" ...
    level=info msg="Rewrote endpoint BPF program" ...
    level=info msg="Serving cilium health API at unix:///var/run/cilium/health.sock" subsys=health-server
    level=info msg="Signal received" signal=terminated subsys=hive
    
  • Problem solved.
    For anyone coming here later from search: I had to reboot the worker node after kubeadm join, probably so that the new sysctl settings from Cilium would take effect.

  • chrispokorni
    chrispokorni Posts: 2,384

    Hi @PetroKazmirchuk,

    Glad you had it figured out. However, keep in mind that the lab material was written and tested on Ubuntu 20.04 LTS, to be in sync with the OS requirements of the CKA certification exam.
    In addition, "k8scp" was intended to be an alias to the control plane node, not the actual hostname.

    Regards,
    -Chris

  • At work I need openSUSE, so my choice is deliberate, thanks.
    And k8scp is indeed an alias.
    Unfortunately, I've hit a new problem right away:
    "Probe failed" for cilium-operator
    Get \"http://127.0.0.1:9234/healthz\": dial tcp 127.0.0.1:9234: connect: connection refused"

    Looking into the logs of cilium-operator, I can't see any mention of a health monitoring endpoint (unlike cilium-agent, which does say "Serving cilium health API at unix:///var/run/cilium/health.sock"; I hope that's OK).

  • The kubelet logs show a loop of recreating cilium-agent, cilium-operator and kube-proxy, with no errors in the respective container logs. Somebody is sending them the "terminate" signal.
    How can I troubleshoot this further?
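
One thing that may be worth ruling out when every pod on a node, kube-proxy included, is terminated and recreated without errors of its own: a cgroup driver mismatch between the kubelet and the container runtime on a cgroup v2 host. Nothing in this thread confirms that this is what happens on MicroOS, so treat the following as a hedged check rather than a diagnosis. It assumes containerd is the runtime and that its config lives at /etc/containerd/config.toml; the function name `systemd_cgroup_setting` is made up for this sketch.

```shell
#!/bin/sh
# Report whether a containerd config file enables the systemd cgroup driver for runc.
# $1: path to the config file (assumed to be /etc/containerd/config.toml on the node).
systemd_cgroup_setting() {
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1" 2>/dev/null; then
    echo "SystemdCgroup=true"
  else
    echo "SystemdCgroup not enabled"
  fi
}

# Intended use on the worker node:
#   stat -fc %T /sys/fs/cgroup      # "cgroup2fs" means the unified cgroup v2 hierarchy
#   systemd_cgroup_setting /etc/containerd/config.toml
```

If the node reports cgroup2fs and the setting is not enabled, aligning the runtime with the kubelet's cgroup driver would be the next thing to try; if both look consistent, this hypothesis can be dropped.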

  • Seems like MicroOS is too exotic for Cilium :( I've created a new worker node using the same openSUSE Leap as the cp node, and Cilium started fine there.
