Welcome to the Linux Foundation Forum!

Help with lab - Upgrade the Cluster

leonardo2021
leonardo2021 Posts: 4
edited August 2021 in LFS258 Class Forum

I'm having problems with Lab 4.1, Upgrade the Cluster. After I finished the control plane (CP) node upgrade, the calico-kube-controllers pod does not start.

NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
kube-system   calico-kube-controllers-5f6cfd688c-pgm6x   0/1     CrashLoopBackOff   5          4m50s
kube-system   calico-node-bbtns                          1/1     Running            0          7m59s
kube-system   calico-node-lhxc4                          1/1     Running            0          28m
kube-system   coredns-558bd4d5db-g2l9k                   0/1     Running            0          92s
kube-system   coredns-558bd4d5db-z84b5                   0/1     Running            0          92s
kube-system   coredns-74ff55c5b-d2v8d                    0/1     Running            0          4m50s
kube-system   etcd-cp                                    1/1     Running            1          64s
kube-system   kube-apiserver-cp                          1/1     Running            1          63s
kube-system   kube-controller-manager-cp                 1/1     Running            0          63s
kube-system   kube-proxy-7x9gp                           1/1     Running            0          14s
kube-system   kube-proxy-95lcf                           1/1     Running            0          38s
kube-system   kube-scheduler-cp                          1/1     Running            0          64s
kube-system   upgrade-health-check-8bws2                 0/1     Completed          0          38s

I just followed the instructions and did the same with the worker node. When I drain the worker node, this happens:

error when evicting pods/"calico-kube-controllers-5f6cfd688c-pgm6x" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
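For context, the drain step from the lab looks roughly like this (the node name `worker` is a placeholder; use the name shown by `kubectl get nodes`). The eviction error above means the calico-kube-controllers pod is covered by a PodDisruptionBudget, and since the pod is not Ready, evicting it would drop availability below the budget's minimum, so kubectl keeps retrying:

```shell
# List nodes to find the worker's actual name
kubectl get nodes

# Drain the worker before upgrading it; --ignore-daemonsets is needed
# because calico-node and kube-proxy run as DaemonSets.
# "worker" below is a placeholder node name for this sketch.
kubectl drain worker --ignore-daemonsets --delete-emptydir-data

# See which PodDisruptionBudgets exist and whether any allows 0 disruptions
kubectl get pdb --all-namespaces
```

A drain can only succeed once the budget has at least one disruption to spare, which is why the broken calico-kube-controllers pod blocks it.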

I checked the pods, and calico-kube-controllers does not start.

students@cp:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5f6cfd688c-pgm6x   0/1     Running   10         10m
kube-system   calico-node-bbtns                          1/1     Running   0          13m
kube-system   calico-node-lhxc4                          1/1     Running   0          34m
kube-system   coredns-558bd4d5db-csnp8                   1/1     Running   0          3m21s
kube-system   coredns-558bd4d5db-x2g5s                   1/1     Running   0          3m21s
kube-system   etcd-cp                                    1/1     Running   1          6m32s
kube-system   kube-apiserver-cp                          1/1     Running   1          6m31s
kube-system   kube-controller-manager-cp                 1/1     Running   0          6m31s
kube-system   kube-proxy-7x9gp                           1/1     Running   0          5m42s
kube-system   kube-proxy-95lcf                           1/1     Running   0          6m6s
kube-system   kube-scheduler-cp                          1/1     Running   0          6m32s

When I inspect the logs from calico-kube-controllers, I get this:

main.go 118: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
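The `10.96.0.1:443` address in that error is the in-cluster `kubernetes` Service (the API server's ClusterIP), so the pod is timing out reaching the API server over the pod network. A few commands that help narrow this down (pod name taken from the output above; it will differ on your cluster):

```shell
# Read the controller's logs and events
kubectl -n kube-system logs calico-kube-controllers-5f6cfd688c-pgm6x
kubectl -n kube-system describe pod calico-kube-controllers-5f6cfd688c-pgm6x

# Confirm the kubernetes Service and its endpoints are in place;
# 10.96.0.1 should be the CLUSTER-IP of this Service
kubectl get svc kubernetes
kubectl get endpoints kubernetes
```

If the Service and endpoints look fine, the usual suspect is a stale pod that started before the CNI was fully back up after the upgrade.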

I think Calico could not connect (possibly a CoreDNS issue), so I deleted those pods. After that, all pods were running fine, but I don't know why this happens.
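Deleting the pods is safe here because both calico-kube-controllers and CoreDNS are managed by Deployments, which immediately recreate them; the fresh pods then start against the upgraded control plane. A sketch of that fix (pod name from the output above; yours will differ):

```shell
# Delete the stuck controller pod; its Deployment recreates it
kubectl -n kube-system delete pod calico-kube-controllers-5f6cfd688c-pgm6x

# Optionally restart CoreDNS the same way, via its label selector
kubectl -n kube-system delete pod -l k8s-app=kube-dns

# Watch the replacement pods come up
kubectl get pods -n kube-system -w
```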

I've tried this lab twice and it happened both times, but on the second attempt I managed to work around it and finish the upgrade.

Comments

  • I had this issue too: calico-kube-controllers was stuck in a crash loop. Deleting the pod fixed it, and I was then able to drain the worker.
