Lab 4.1 step 16: Cluster Upgrade is successful but Control Plane version is still at 1.28.1
Help! My cluster upgrade and kubelet upgrade completed successfully on the cp, but doing a kubectl get node shows the cp is still at the old version level.
Output shown below. What have I missed?
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.29.1". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
billy@cp:~$ kubectl get node
NAME              STATUS                     ROLES           AGE   VERSION
ip-172-31-32-81   Ready                      <none>          16d   v1.28.1
ip-172-31-33-91   Ready,SchedulingDisabled   control-plane   21d   v1.28.1
billy@cp:~$ sudo apt-mark unhold kubelet kubectl
Canceled hold on kubelet.
Canceled hold on kubectl.
billy@cp:~$ sudo apt-get install -y kubelet=1.29.1-1.1 kubectl=1.29.1-1.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
kubectl kubelet
2 upgraded, 0 newly installed, 0 to remove and 25 not upgraded.
Need to get 30.3 MB of archives.
After this operation, 889 kB of additional disk space will be used.
Get:1 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.29/deb kubectl 1.29.1-1.1 [10.5 MB]
Get:2 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.29/deb kubelet 1.29.1-1.1 [19.8 MB]
Fetched 30.3 MB in 1s (48.7 MB/s)
(Reading database ... 90923 files and directories currently installed.)
Preparing to unpack .../kubectl_1.29.1-1.1_amd64.deb ...
Unpacking kubectl (1.29.1-1.1) over (1.28.1-1.1) ...
Preparing to unpack .../kubelet_1.29.1-1.1_amd64.deb ...
Unpacking kubelet (1.29.1-1.1) over (1.28.1-1.1) ...
Setting up kubectl (1.29.1-1.1) ...
Setting up kubelet (1.29.1-1.1) ...
billy@cp:~$ sudo apt-mark hold kubelet kubectl
kubelet set on hold.
kubectl set on hold.
billy@cp:~$ sudo systemctl daemon-reload
billy@cp:~$ sudo systemctl restart kubelet
billy@cp:~$ kubectl get node
NAME              STATUS                     ROLES           AGE   VERSION
ip-172-31-32-81   Ready                      <none>          16d   v1.28.1
ip-172-31-33-91   Ready,SchedulingDisabled   control-plane   21d   v1.28.1
Best Answers
In the end, a reboot of the cp host has resolved the issues. The cilium pods started successfully after the reboot, and I have been able to drain the cp node and upgrade it.
Thanks Chris. I resolved the issues by rebooting the node. I did post a message about that yesterday, but it seems to have got lost somewhere. Thanks for your help!
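For anyone hitting the same symptom, one quick way to see which nodes still report an old kubelet version is to filter the kubectl get node output. The following is only a minimal sketch and not part of the lab guide: the function name nodes_not_at_version is made up for illustration, and it assumes the VERSION column is the last field on each line.

```shell
# Hypothetical helper: print the names of nodes whose VERSION column
# (assumed to be the last field) differs from the target version.
# Reads `kubectl get node` output on stdin and skips the header line.
nodes_not_at_version() {
  target="$1"
  awk -v target="$target" 'NR > 1 && NF > 0 && $NF != target { print $1 }'
}

# Usage on a live cluster (not run here):
#   kubectl get node | nodes_not_at_version v1.29.1
```

If the cp node is still listed after the package upgrade and a kubelet restart, comparing against kubelet --version and sudo systemctl status kubelet on that node may show whether the new binary is actually the one running.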
Answers
Hi @trevski,
It appears the apt packages list is not set with the correct URL. Please revisit steps 2 and 3 to add the correct version repository definition and gpg key, and then validate that the /etc/apt/sources.list.d/kubernetes.list file's entry coincides with the one from the lab guide. Validate this before the apt-get update in step 4, and after.
Regards,
-Chris
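The repository check suggested above could be scripted roughly as follows. This is a sketch under assumptions: the function name repo_points_at is hypothetical, and it assumes the repository URL in the file follows the core:/stable:/vX.Y/deb layout visible in the apt-get output earlier in the thread.

```shell
# Hypothetical helper: succeed if the given apt sources file has a
# non-commented entry for the requested Kubernetes minor release.
# Assumes the URL contains "core:/stable:/<minor>/deb".
repo_points_at() {
  file="$1"
  minor="$2"
  # Drop comment lines, then look for the literal repository path.
  grep -v '^[[:space:]]*#' "$file" | grep -qF "core:/stable:/${minor}/deb"
}

# Usage on the cp node (not run here):
#   repo_points_at /etc/apt/sources.list.d/kubernetes.list v1.29 && echo "repo OK"
#   apt-cache policy kubelet   # lists candidate versions after apt-get update
```

The helper only checks the URL in the file. It says nothing about the gpg key, which still needs to be compared against the lab guide by hand.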
Hi @chrispokorni ,
Thanks for your suggestion. I've attempted to redo all the steps, but now I hit problems, I think because the cp is already cordoned from before. The drain step just sits there forever:
kubectl drain ip-172-31-33-91 --ignore-daemonsets
node/ip-172-31-33-91 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/cilium-fxlmh, kube-system/kube-proxy-2jzv9
evicting pod kube-system/cilium-operator-788c4f69bc-vrgq7
To get out of this I Ctrl-C, however now the plan and the upgrade fail:
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
[ERROR ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [ip-172-31-33-91]
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
How can I reset the node or clear this up?
Referring to your original point, I checked what I did the first time and there was nothing wrong with the packages list or the gpg key, unless I have an outdated version of the instructions or something...?
Update: the drain is waiting for the cilium-operator pod to be deleted, but it appears to be stuck in Terminating status. I've tried force deleting it, but the cluster then spawns another pod to replace it, which gets stuck in Pending status. Aaaargh!
Hi @trevski,
Since the control plane node is already cordoned/drained, and you can no longer proceed as instructed... try to uncordon the node. The uncordon command can be found in step 17. Validate the node is in Ready state, then re-attempt the control plane upgrade process.
If the error persists you may try the suggested flag to ignore the preflight error.
"unless I have an outdated version of the instructions or something"
The course resources are at the latest version release. I based my observation on a comparison of the output provided earlier with the commands from the lab guide.
Regards,
-Chris
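The uncordon-and-verify sequence described in this reply can be sketched as below. This is an illustration only, not the lab guide's exact steps: the function name uncordon_and_check and the KUBECTL override variable are assumptions introduced here so the commands can be previewed without a cluster.

```shell
# Sketch of the uncordon-and-verify sequence. KUBECTL can be overridden
# (for example KUBECTL=echo) to preview the commands without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

uncordon_and_check() {
  node="$1"
  # Make the node schedulable again; stop if that fails.
  "$KUBECTL" uncordon "$node" || return 1
  # Show the node so its STATUS column can be checked for Ready.
  "$KUBECTL" get node "$node"
}

# Usage (not run here):
#   uncordon_and_check ip-172-31-33-91
```

Once the node shows Ready, the kubeadm upgrade plan step can be re-attempted. If the preflight check still fails, the flag from the error message would presumably take the check name shown in brackets, ControlPlaneNodesReady, but that bypasses a safety check and is best treated as a last resort.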
Thanks Chris. My cp node is still in NotReady status, I think because the cilium-operator pod is stuck at Pending:
NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE     IP       NODE
kube-system   cilium-operator-788c4f69bc-jn9wz   0/1     Pending   0          6m22s   <none>   ip-172-31-33-91
I've tried connecting to the pod but no luck, also tried force deleting it, in which case I just get another stuck at Pending. I'll keep trying to figure it out, any suggestions welcome.
From describing the pod I can see that it has been scheduled:
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  17m   default-scheduler  Successfully assigned kube-system/cilium-operator-788c4f69bc-jn9wz to ip-172-31-33-91
Here is the complete describe output:
Name:                 cilium-operator-788c4f69bc-jn9wz
Namespace:            kube-system
Priority:             2000000000
Priority Class Name:  system-cluster-critical
Service Account:      cilium-operator
Node:                 ip-172-31-33-91/
Labels:               app.kubernetes.io/name=cilium-operator
                      app.kubernetes.io/part-of=cilium
                      io.cilium/app=operator
                      name=cilium-operator
                      pod-template-hash=788c4f69bc
Annotations:
Status:               Pending
IP:
IPs:
Controlled By:        ReplicaSet/cilium-operator-788c4f69bc
Containers:
  cilium-operator:
    Image:      quay.io/cilium/operator-generic:v1.14.1@sha256:e061de0a930534c7e3f8feda8330976367971238ccafff42659f104effd4b5f7
    Port:
    Host Port:
    Command:
      cilium-operator-generic
    Args:
      --config-dir=/tmp/cilium/config-map
      --debug=$(CILIUM_DEBUG)
    Liveness:   http-get http://127.0.0.1:9234/healthz delay=60s timeout=3s period=10s #success=1 #failure=3
    Readiness:  http-get http://127.0.0.1:9234/healthz delay=0s timeout=3s period=5s #success=1 #failure=5
    Environment:
      K8S_NODE_NAME:          (v1:spec.nodeName)
      CILIUM_K8S_NAMESPACE:  kube-system (v1:metadata.namespace)
      CILIUM_DEBUG:          Optional: true
    Mounts:
      /tmp/cilium/config-map from cilium-config-path (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x8m2s (ro)
Conditions:
  Type           Status
  PodScheduled   True
Volumes:
  cilium-config-path:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      cilium-config
    Optional:  false
  kube-api-access-x8m2s:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              kubernetes.io/os=linux
Tolerations:                 op=Exists
Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  17m   default-scheduler  Successfully assigned kube-system/cilium-operator-788c4f69bc-jn9wz to ip-172-31-33-91
Hi @trevski,
What are the outputs of the following commands?
kubectl get nodes -o wide
kubectl get pods -A -o wide
Regards,
-Chris
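When reading through the kubectl get pods -A -o wide output requested above, it can help to narrow it down to pods that are not healthy. A minimal sketch follows, assuming the -A column layout in which STATUS is the fourth field; the function name pods_not_running is hypothetical.

```shell
# Hypothetical helper: list pods whose STATUS is neither Running nor
# Completed. Reads `kubectl get pods -A` output on stdin, where the
# columns are assumed to be NAMESPACE NAME READY STATUS ...
pods_not_running() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" { print $1 "/" $2, $4 }'
}

# Usage on a live cluster (not run here):
#   kubectl get pods -A -o wide | pods_not_running
```

In this thread that filter would surface the cilium-operator pod stuck in Pending, which is the pod holding up the drain.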