Problem with calico-kube-controllers (Lab 4.1)
Hi.
After upgrading the cp node successfully, I proceeded to upgrade the worker node and got this error:
jose@k8scp:~$ kubectl drain k8swrk --ignore-daemonsets
node/k8swrk already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-j8ntn, kube-system/kube-proxy-2gl5s
evicting pod kube-system/calico-kube-controllers-6b9fbfff44-4lmlh
error when evicting pods/"calico-kube-controllers-6b9fbfff44-4lmlh" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod kube-system/calico-kube-controllers-6b9fbfff44-4lmlh
error when evicting pods/"calico-kube-controllers-6b9fbfff44-4lmlh" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
evicting pod kube-system/calico-kube-controllers-6b9fbfff44-4lmlh
error when evicting pods/"calico-kube-controllers-6b9fbfff44-4lmlh" -n "kube-system" (will retry after 5s): Cannot evict pod as it would violate the pod's disruption budget.
^C
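The retry loop happens because each eviction would violate a PodDisruptionBudget. The budget guarding the controller can be listed and inspected like this (the PDB name below is an assumption; take the real one from the first command's output):

kubectl -n kube-system get pdb
kubectl -n kube-system describe pdb calico-kube-controllers   # name is a guess; check 'get pdb' first

While the single controller replica is unhealthy, the budget allows zero disruptions, which is why drain retries indefinitely.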
Until that point the lab was going well, and the upgrade process on the cp node went fine:
jose@k8scp:~$ kubectl get node
NAME     STATUS                     ROLES                  AGE   VERSION
k8scp    Ready                      control-plane,master   39h   v1.22.1
k8swrk   Ready,SchedulingDisabled   <none>                 39h   v1.21.1
Then I uncordoned k8swrk and:
jose@k8scp:~$ kubectl get nodes
NAME     STATUS   ROLES                  AGE   VERSION
k8scp    Ready    control-plane,master   42h   v1.22.1
k8swrk   Ready    <none>                 41h   v1.22.1
I ignored the issue because everything seemed to go fine and I didn't notice any problem with my installation. But after continuing with Lab 4.2, the output of some commands worried me, for example:
jose@k8scp:~$ kubectl -n kube-system get pods -o wide
NAME                                       READY   STATUS             RESTARTS         AGE     IP                NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-6b9fbfff44-4lmlh   0/1     CrashLoopBackOff   58 (2m52s ago)   3h13m   192.168.164.138   k8swrk   <none>           <none>
calico-node-j8ntn                          1/1     Running            5 (82m ago)      41h     192.168.122.3     k8swrk   <none>           <none>
calico-node-tnffg                          1/1     Running            5                41h     192.168.122.2     k8scp    <none>           <none>
coredns-78fcd69978-hz2kl                   1/1     Running            2 (82m ago)      173m    192.168.74.146    k8scp    <none>           <none>
coredns-78fcd69978-mczhs                   1/1     Running            2 (82m ago)      173m    192.168.74.147    k8scp    <none>           <none>
<omitted>
As can be seen, the calico-kube-controllers pod is in CrashLoopBackOff status, and I suspect that is not a good sign.
What is wrong here?
I tried kubectl drain again but with the same results.
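For completeness, in case anybody wants the debugging output, these are the usual commands for seeing why a pod keeps restarting (the pod name is taken from the listing above):

kubectl -n kube-system describe pod calico-kube-controllers-6b9fbfff44-4lmlh
kubectl -n kube-system logs calico-kube-controllers-6b9fbfff44-4lmlh --previous   # log of the last crashed run

describe shows the restart events and probe failures, while logs --previous prints the output of the last crashed container.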
Comments
-
I'm doing Lab 4.1 again from a snapshot of my VMs and, after upgrading the cp node, at step 15 I noticed that the problem is there again. I issued the command
kubectl uncordon k8scp
but that didn't help. I append the output of all this for better debugging:

jose@k8scp:~$ kubectl get node
NAME     STATUS                     ROLES                  AGE   VERSION
k8scp    Ready,SchedulingDisabled   control-plane,master   46h   v1.22.1
k8swrk   Ready                      <none>                 46h   v1.21.1

jose@k8scp:~$ kubectl -n kube-system get pods -o wide
NAME                                       READY   STATUS             RESTARTS        AGE     IP                NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-6b9fbfff44-cwk6z   0/1     CrashLoopBackOff   6 (2m53s ago)   10m     192.168.164.131   k8swrk   <none>           <none>
calico-node-j8ntn                          1/1     Running            2 (23h ago)     46h     192.168.122.3     k8swrk   <none>           <none>
calico-node-tnffg                          1/1     Running            2 (23h ago)     46h     192.168.122.2     k8scp    <none>           <none>
coredns-558bd4d5db-d22ht                   0/1     Running            0               10m     192.168.164.130   k8swrk   <none>           <none>
coredns-78fcd69978-87kzd                   0/1     Running            0               5m30s   192.168.164.134   k8swrk   <none>           <none>
coredns-78fcd69978-sbzck                   0/1     Running            0               5m30s   192.168.164.135   k8swrk   <none>           <none>
etcd-k8scp                                 1/1     Running            0               7m6s    192.168.122.2     k8scp    <none>           <none>
kube-apiserver-k8scp                       1/1     Running            0               6m22s   192.168.122.2     k8scp    <none>           <none>
kube-controller-manager-k8scp              1/1     Running            0               5m59s   192.168.122.2     k8scp    <none>           <none>
kube-proxy-bxbqz                           1/1     Running            0               5m24s   192.168.122.3     k8swrk   <none>           <none>
kube-proxy-jmt4k                           1/1     Running            0               4m57s   192.168.122.2     k8scp    <none>           <none>
kube-scheduler-k8scp                       1/1     Running            0               5m45s   192.168.122.2     k8scp    <none>           <none>

jose@k8scp:~$ kubectl uncordon k8scp
node/k8scp uncordoned

jose@k8scp:~$ kubectl get node
NAME     STATUS   ROLES                  AGE   VERSION
k8scp    Ready    control-plane,master   46h   v1.22.1
k8swrk   Ready    <none>                 46h   v1.21.1

jose@k8scp:~$ kubectl -n kube-system get pods -o wide
NAME                                       READY   STATUS    RESTARTS        AGE     IP                NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-6b9fbfff44-cwk6z   0/1     Running   7 (5m14s ago)   12m     192.168.164.131   k8swrk   <none>           <none>
calico-node-j8ntn                          1/1     Running   2 (24h ago)     46h     192.168.122.3     k8swrk   <none>           <none>
calico-node-tnffg                          1/1     Running   2 (24h ago)     46h     192.168.122.2     k8scp    <none>           <none>
coredns-558bd4d5db-d22ht                   0/1     Running   0               12m     192.168.164.130   k8swrk   <none>           <none>
coredns-78fcd69978-87kzd                   0/1     Running   0               7m51s   192.168.164.134   k8swrk   <none>           <none>
coredns-78fcd69978-sbzck                   0/1     Running   0               7m51s   192.168.164.135   k8swrk   <none>           <none>
etcd-k8scp                                 1/1     Running   0               9m27s   192.168.122.2     k8scp    <none>           <none>
kube-apiserver-k8scp                       1/1     Running   0               8m43s   192.168.122.2     k8scp    <none>           <none>
kube-controller-manager-k8scp              1/1     Running   0               8m20s   192.168.122.2     k8scp    <none>           <none>
kube-proxy-bxbqz                           1/1     Running   0               7m45s   192.168.122.3     k8swrk   <none>           <none>
kube-proxy-jmt4k                           1/1     Running   0               7m18s   192.168.122.2     k8scp    <none>           <none>
kube-scheduler-k8scp                       1/1     Running   0               8m6s    192.168.122.2     k8scp    <none>           <none>

jose@k8scp:~$ kubectl -n kube-system get pods -o wide
NAME                                       READY   STATUS             RESTARTS      AGE     IP                NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-6b9fbfff44-cwk6z   0/1     CrashLoopBackOff   7 (8s ago)    12m     192.168.164.131   k8swrk   <none>           <none>
calico-node-j8ntn                          1/1     Running            2 (24h ago)   46h     192.168.122.3     k8swrk   <none>           <none>
calico-node-tnffg                          1/1     Running            2 (24h ago)   46h     192.168.122.2     k8scp    <none>           <none>
coredns-558bd4d5db-d22ht                   0/1     Running            0             12m     192.168.164.130   k8swrk   <none>           <none>
coredns-78fcd69978-87kzd                   0/1     Running            0             8m3s    192.168.164.134   k8swrk   <none>           <none>
coredns-78fcd69978-sbzck                   0/1     Running            0             8m3s    192.168.164.135   k8swrk   <none>           <none>
etcd-k8scp                                 1/1     Running            0             9m39s   192.168.122.2     k8scp    <none>           <none>
kube-apiserver-k8scp                       1/1     Running            0             8m55s   192.168.122.2     k8scp    <none>           <none>
kube-controller-manager-k8scp              1/1     Running            0             8m32s   192.168.122.2     k8scp    <none>           <none>
kube-proxy-bxbqz                           1/1     Running            0             7m57s   192.168.122.3     k8swrk   <none>           <none>
kube-proxy-jmt4k                           1/1     Running            0             7m30s   192.168.122.2     k8scp    <none>           <none>
kube-scheduler-k8scp                       1/1     Running            0             8m18s   192.168.122.2     k8scp    <none>           <none>
Only a few seconds passed between the last command and the previous one. After
kubectl uncordon k8scp
the first kubectl -n kube-system get pods -o wide showed the calico-kube-controllers status as "Running", but after a few seconds it showed "CrashLoopBackOff" again.
Could it be necessary to upgrade Calico too, as step 9 seems to suggest? If that is the case, I don't know how to do it, nor which version would run fine with the upgraded components.
-
Well, I followed the instructions described here, which redirected me here, for upgrading Calico installations through the calico.yaml manifest, and now I think the problem is gone. However, the Calico objects now belong to a new namespace (calico-system) instead of the original one (kube-system):
jose@k8scp:~$ kubectl -n calico-system get pods -o wide
NAME                                       READY   STATUS    RESTARTS   AGE    IP               NODE     NOMINATED NODE   READINESS GATES
calico-kube-controllers-58494599f9-pr7kn   1/1     Running   0          106s   192.168.74.138   k8scp    <none>           <none>
calico-node-8hfkw                          1/1     Running   0          47s    192.168.122.2    k8scp    <none>           <none>
calico-node-drjf6                          1/1     Running   0          35s    192.168.122.3    k8swrk   <none>           <none>
calico-typha-66698b6b8b-whnbt              1/1     Running   0          49s    192.168.122.3    k8swrk   <none>           <none>
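To confirm which Calico version ended up running after the upgrade (assuming the operator-style install that created the calico-system namespace; the jsonpath query is just one way to read the image tag from the daemonset):

kubectl -n calico-system get daemonset calico-node -o jsonpath='{.spec.template.spec.containers[0].image}'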
I continued with the worker node upgrade; everything was OK and the previous errors were gone:
jose@k8scp:~$ kubectl drain k8swrk --ignore-daemonsets
node/k8swrk cordoned
WARNING: ignoring DaemonSet-managed Pods: calico-system/calico-node-drjf6, kube-system/kube-proxy-bxbqz
evicting pod kube-system/coredns-78fcd69978-sbzck
evicting pod kube-system/coredns-78fcd69978-87kzd
evicting pod calico-system/calico-typha-66698b6b8b-whnbt
evicting pod kube-system/coredns-558bd4d5db-d22ht
pod/calico-typha-66698b6b8b-whnbt evicted
pod/coredns-78fcd69978-87kzd evicted
pod/coredns-78fcd69978-sbzck evicted
pod/coredns-558bd4d5db-d22ht evicted
node/k8swrk evicted
If this new configuration can cause problems due to incompatibilities with the next labs, I would appreciate it if somebody warned me.
Otherwise, I'll close this thread.
-
Hi @jmarinho,
Your issues are caused by overlapping IP addresses between the node/VM IPs managed by the hypervisor and the pod IPs managed by Calico. As long as there is such overlap, your cluster will not operate successfully.
I would recommend rebuilding your cluster and ensuring that the VM IP addresses managed by the hypervisor do not overlap the default 192.168.0.0/16 pod network managed by Calico. You could try assigning your VMs IP addresses from the 10.200.0.0/16 network to prevent any such IP address overlap.
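A quick way to confirm such an overlap (assuming a kubeadm-built cluster where --pod-network-cidr was passed at init; otherwise check your Calico IPPool) is to compare the node addresses against the configured pod subnet:

kubectl get nodes -o wide                                        # INTERNAL-IP column shows the VM addresses
kubectl -n kube-system get cm kubeadm-config -o yaml | grep -i podsubnet

If any INTERNAL-IP falls inside the pod subnet, the cluster will misbehave exactly as described above.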
Regards,
-Chris
-
Hi @chrispokorni,
Sorry for not answering sooner, but I did not see the message until today.
Thanks for your advice. You're right, and that was the problem. I did not pay attention to the subnet mask, a very silly mistake.
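For anyone double-checking the same thing, the CIDR Calico actually uses can be read from its IPPool object (default-ipv4-ippool is the name the stock manifest creates; yours may differ):

kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o yaml | grep -i cidr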
I thought that upgrading Calico as I mentioned had solved the issue, and for a while it seemed to. I did not have any problems after that but, before I saw your answer, I was having problems installing Linkerd in Lab 11.1, which was probably related to this.
Since I finally had to rebuild my cluster, I'm redoing the labs and, when I get to Lab 11, I will see if that was the problem.

Regards,
Jose
-
I am glad to have found this thread. I was precisely at the stage of upgrading the worker node. I have spent about six days on it and, while troubleshooting, learned a lot, but not what I needed.
I hope that @Chris can comment.
AWS master and worker nodes:
ubuntu@ip-172-31-25-66:~$ ku -n kube-system get po -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-685b65ddf9-rlcrv 0/1 CrashLoopBackOff 23 (84s ago) 98m 172.31.104.240 ip-172-31-26-1
calico-node-gm8sz 1/1 Running 0 3h43m 172.31.26.1 ip-172-31-26-1
calico-node-xnqnx 1/1 Running 1 (3h17m ago) 3h43m 172.31.25.66 ip-172-31-25-66
coredns-64897985d-fz4nj 1/1 Running 6 (3h17m ago) 7d23h 172.31.52.227 ip-172-31-25-66
coredns-64897985d-shblb 1/1 Running 6 (3h17m ago) 7d23h 172.31.52.228 ip-172-31-25-66
etcd-ip-172-31-25-66 1/1 Running 7 (3h17m ago) 8d 172.31.25.66 ip-172-31-25-66
kube-apiserver-ip-172-31-25-66 1/1 Running 10 (3h17m ago) 8d 172.31.25.66 ip-172-31-25-66
kube-controller-manager-ip-172-31-25-66 1/1 Running 6 (3h17m ago) 8d 172.31.25.66 ip-172-31-25-66
kube-proxy-x9xrc 1/1 Running 6 (3h17m ago) 8d 172.31.25.66 ip-172-31-25-66
kube-proxy-zcq2t 1/1 Running 4 (4h41m ago) 8d 172.31.26.1 ip-172-31-26-1
kube-scheduler-ip-172-31-25-66 1/1 Running 6 (3h17m ago) 8d 172.31.25.66 ip-172-31-25-66

I am getting the following errors:
DESCRIBE
ku -n kube-system describe po calico-kube-controllers-685b65ddf9-rlcrv
Warning Unhealthy 29m (x10 over 30m) kubelet Readiness probe failed: Failed to read status file /status/status.json: unexpected end of JSON input
Normal Pulled 29m (x4 over 30m) kubelet Container image "docker.io/calico/kube-controllers:v3.23.1" already present on machine
Warning BackOff 40s (x136 over 30m) kubelet Back-off restarting failed container
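That readiness failure usually means the container dies before it writes its status file, so the container log is the next place to look (pod name from the listing above; ku is the kubectl alias used in this post, and --previous shows the last crashed run):

ku -n kube-system logs calico-kube-controllers-685b65ddf9-rlcrv --previous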
LOG KUBE-PROXY
ku -n kube-system logs kube-proxy-x9xrc
E0703 17:58:39.111615 1 proxier.go:1600] "can't open port, skipping it" err="listen tcp4 :31107: bind: address already in use" port={Description:nodePort for default/nginx IP: IPFamily:4 Port:31107 Protocol:TCP}
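That bind error means NodePort 31107 is already held on the host, and the log names default/nginx as the Service claiming it. One way to locate that Service (the grep is just a convenience):

ku get svc -A | grep 31107

If the Service is a leftover from an earlier exercise, deleting and recreating it frees the port.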
DESCRIBE
ku -n kube-system describe po kube-proxy-x9xrc
kube-proxy:
Container ID: docker://fb063dc9345ee6e122f00c00265e7c41e5f330a240db855a0c580b71823207e7
Image: k8s.gcr.io/kube-proxy:v1.23.1
Image ID: docker-pullable://k8s.gcr.io/kube-proxy@sha256:e40f3a28721588affcf187f3f246d1e078157dabe274003eaa2957a83f7170c8
Port:
Host Port:
ku -n kube-system describe po kube-proxy-zcq2t
Containers:
kube-proxy:
Container ID: docker://a4163a3b6548904078d592ade2f948d0a96bb566863cbddbd153a5fa18fd0300
Image: k8s.gcr.io/kube-proxy:v1.23.1
Image ID: docker-pullable://k8s.gcr.io/kube-proxy@sha256:e40f3a28721588affcf187f3f246d1e078157dabe274003eaa2957a83f7170c8
Port:
Host Port:

ku -n kube-system logs kube-proxy-zcq2t
E0703 16:35:32.334879 1 proxier.go:1600] "can't open port, skipping it" err="listen tcp4 :31107: bind: address already in use" port={Description:nodePort for default/nginx IP: IPFamily:4 Port:31107 Protocol:TCP}

ku -n kube-system logs calico-node-gm8sz
2022-07-03 20:22:18.097 [INFO][71] monitor-addresses/autodetection_methods.go 103: Using autodetected IPv4 address on interface eth0: 172.31.26.1/20
2022-07-03 20:22:22.413 [INFO][68] felix/summary.go 100: Summarising 12 dataplane reconciliation loops over 1m2.5s: avg=4ms longest=8ms (resync-nat-v4)

ku -n kube-system logs calico-node-xnqnx
2022-07-03 20:24:44.924 [INFO][71] monitor-addresses/autodetection_methods.go 103: Using autodetected IPv4 address on interface eth0: 172.31.25.66/20
2022-07-03 20:24:47.346 [INFO][66] felix/summary.go 100: Summarising 11 dataplane reconciliation loops over 1m3.4s: avg=4ms longest=12ms ()

error: unable to upgrade connection: container not found ("calico-kube-controllers")
I could use some insight!