Lab 4.1 step 16: Cluster Upgrade is successful but Control Plane version is still at 1.28.1
Help! My cluster upgrade and kubelet upgrade completed successfully on the cp, but doing a kubectl get node shows the cp is still at the old version level.
Output is shown below. What have I missed?
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.29.1". Enjoy!
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
billy@cp:~$ kubectl get node
NAME              STATUS                     ROLES           AGE   VERSION
ip-172-31-32-81   Ready                      <none>          16d   v1.28.1
ip-172-31-33-91   Ready,SchedulingDisabled   control-plane   21d   v1.28.1
billy@cp:~$ sudo apt-mark unhold kubelet kubectl
Canceled hold on kubelet.
Canceled hold on kubectl.
billy@cp:~$ sudo apt-get install -y kubelet=1.29.1-1.1 kubectl=1.29.1-1.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
kubectl kubelet
2 upgraded, 0 newly installed, 0 to remove and 25 not upgraded.
Need to get 30.3 MB of archives.
After this operation, 889 kB of additional disk space will be used.
Get:1 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.29/deb kubectl 1.29.1-1.1 [10.5 MB]
Get:2 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.29/deb kubelet 1.29.1-1.1 [19.8 MB]
Fetched 30.3 MB in 1s (48.7 MB/s)
(Reading database ... 90923 files and directories currently installed.)
Preparing to unpack .../kubectl_1.29.1-1.1_amd64.deb ...
Unpacking kubectl (1.29.1-1.1) over (1.28.1-1.1) ...
Preparing to unpack .../kubelet_1.29.1-1.1_amd64.deb ...
Unpacking kubelet (1.29.1-1.1) over (1.28.1-1.1) ...
Setting up kubectl (1.29.1-1.1) ...
Setting up kubelet (1.29.1-1.1) ...
billy@cp:~$ sudo apt-mark hold kubelet kubectl
kubelet set on hold.
kubectl set on hold.
billy@cp:~$ sudo systemctl daemon-reload
billy@cp:~$ sudo systemctl restart kubelet
billy@cp:~$ kubectl get node
NAME              STATUS                     ROLES           AGE   VERSION
ip-172-31-32-81   Ready                      <none>          16d   v1.28.1
ip-172-31-33-91   Ready,SchedulingDisabled   control-plane   21d   v1.28.1
Best Answers
In the end, a reboot of the cp host has resolved the issues. The cilium pods started successfully after the reboot, and I have been able to drain the cp node and upgrade it.
Thanks Chris. I resolved the issues by rebooting the node. I did post a message about that yesterday, but it seems to have got lost somewhere. Thanks for your help!
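For anyone who hits the same symptom: the VERSION column of kubectl get node shows the kubelet version that each node reports to the API server, so one way to narrow things down is to compare the kubelet binary installed on the host with what the node reports. The following is only a rough sketch under assumptions: extract_version is a hypothetical helper name, and the default node name is simply the control plane node from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull the first "vX.Y.Z" token out of text such as
# "Kubernetes v1.29.1" (the format printed by `kubelet --version`).
extract_version() {
  grep -Eo 'v[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Assumption: node name taken from this thread; override with NODE=...
NODE="${NODE:-ip-172-31-33-91}"

if command -v kubelet >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1; then
  # Version of the kubelet binary installed on this host.
  installed="$(kubelet --version 2>/dev/null | extract_version || true)"
  # Version the node object currently reports through the API server.
  reported="$(kubectl get node "$NODE" \
    -o jsonpath='{.status.nodeInfo.kubeletVersion}' 2>/dev/null || true)"
  echo "installed binary: ${installed:-unknown}"
  echo "reported by node: ${reported:-unknown}"
  if [ "$installed" != "$reported" ]; then
    echo "mismatch: the node has not reported the new kubelet version yet"
  fi
else
  echo "kubelet or kubectl not found on this host; skipping comparison"
fi
```

If the two values differ after the package upgrade and a kubelet restart, checking the kubelet service status and logs on that host is a reasonable next step before resorting to a reboot.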
Answers
Hi @trevski,
It appears the apt packages list is not set with the correct URL. Please revisit steps 2 and 3 to add the correct version repository definition and gpg key, and then validate that the /etc/apt/sources.list.d/kubernetes.list file's entry coincides with the one from the lab guide. Validate this before the apt-get update in step 4, and after.
Regards,
-Chris
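A quick way to run the check described above is to test whether the apt source file references the target minor release. This is a minimal sketch, not a lab step: repo_has_minor is a hypothetical helper name, and v1.29 is assumed as the target because that is the version being upgraded to in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a "deb" line in the given apt source file
# references the expected Kubernetes minor release (for example v1.29).
repo_has_minor() {
  local list_file="$1" minor="$2"
  grep -E '^deb ' "$list_file" | grep -Fq "/${minor}/deb"
}

# Assumptions: file path from the post above, target minor from this thread.
LIST_FILE="${LIST_FILE:-/etc/apt/sources.list.d/kubernetes.list}"
TARGET_MINOR="${TARGET_MINOR:-v1.29}"

if [ -r "$LIST_FILE" ] && repo_has_minor "$LIST_FILE" "$TARGET_MINOR"; then
  echo "repo OK: $LIST_FILE references $TARGET_MINOR"
else
  echo "repo mismatch: $LIST_FILE does not reference $TARGET_MINOR"
fi
```

Running it once before and once after sudo apt-get update covers both checks; apt-cache policy kubelet then shows which candidate version apt actually sees.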
Hi @chrispokorni ,
Thanks for your suggestion. I've attempted to redo all the steps, but now I hit problems, I think because the cp is already cordoned from before. The drain step just sits there forever:
kubectl drain ip-172-31-33-91 --ignore-daemonsets
node/ip-172-31-33-91 already cordoned
Warning: ignoring DaemonSet-managed Pods: kube-system/cilium-fxlmh, kube-system/kube-proxy-2jzv9
evicting pod kube-system/cilium-operator-788c4f69bc-vrgq7
To get out of this I Ctrl-C, however now the plan and the upgrade fail:
[upgrade/health] FATAL: [preflight] Some fatal errors occurred:
[ERROR ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [ip-172-31-33-91]
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
How can I reset the node or clear this up?
Referring to your original point, I checked what I did the first time and there was nothing wrong with the packages list or the gpg key, unless I have an outdated version of the instructions or something...?
Update: the drain is waiting for the cilium-operator pod to be deleted, but it appears to be stuck in Terminating status. I've tried force deleting it, but the cluster then spawns another pod to replace it, which gets stuck in Pending status. Aaaargh!
Hi @trevski,
Since the control plane node is already cordoned/drained, and you can no longer proceed as instructed... try to uncordon the node. The uncordon command can be found in step 17. Validate the node is in Ready state, then re-attempt the control plane upgrade process.
If the error persists you may try the suggested flag to ignore the preflight error.
"unless I have an outdated version of the instructions or something"
The course resources are at the latest version release, and I based my observation on a comparison of the output provided earlier with the commands from the lab guide.
Regards,
-Chris
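The uncordon-then-validate sequence suggested above could look roughly like the sketch below. It is an illustration only: not_ready_nodes is a hypothetical helper name, and the default node name is the control plane node from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin
# and print the names of nodes whose STATUS does not start with "Ready".
not_ready_nodes() {
  awk 'NF && $2 !~ /^Ready/ { print $1 }'
}

# Assumption: control plane node name from this thread; override with NODE=...
NODE="${NODE:-ip-172-31-33-91}"

if command -v kubectl >/dev/null 2>&1; then
  # Make the node schedulable again (the uncordon command from step 17).
  kubectl uncordon "$NODE" || true
  if nodes="$(kubectl get nodes --no-headers 2>/dev/null)"; then
    stuck="$(printf '%s\n' "$nodes" | not_ready_nodes)"
    if [ -z "$stuck" ]; then
      echo "all nodes report Ready; re-attempt the control plane upgrade"
    else
      echo "still not Ready: $stuck"
    fi
  else
    echo "could not query the cluster"
  fi
else
  echo "kubectl not found; skipping"
fi
```

A node shown as Ready,SchedulingDisabled counts as Ready here; only the NotReady state is what trips the ControlPlaneNodesReady preflight check quoted earlier.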
Thanks Chris. My cp node is still in NotReady status, I think because the cilium-operator pod is stuck at Pending:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system cilium-operator-788c4f69bc-jn9wz 0/1 Pending 0 6m22s ip-172-31-33-91
I've tried connecting to the pod but no luck, also tried force deleting it, in which case I just get another stuck at Pending. I'll keep trying to figure it out, any suggestions welcome.
From describing the pod I can see that it has been scheduled:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned kube-system/cilium-operator-788c4f69bc-jn9wz to ip-172-31-33-91
Here is the complete describe output:
Name: cilium-operator-788c4f69bc-jn9wz
Namespace: kube-system
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: cilium-operator
Node: ip-172-31-33-91/
Labels: app.kubernetes.io/name=cilium-operator
app.kubernetes.io/part-of=cilium
io.cilium/app=operator
name=cilium-operator
pod-template-hash=788c4f69bc
Annotations:
Status: Pending
IP:
IPs:
Controlled By: ReplicaSet/cilium-operator-788c4f69bc
Containers:
cilium-operator:
Image: quay.io/cilium/operator-generic:v1.14.1@sha256:e061de0a930534c7e3f8feda8330976367971238ccafff42659f104effd4b5f7
Port:
Host Port:
Command:
cilium-operator-generic
Args:
--config-dir=/tmp/cilium/config-map
--debug=$(CILIUM_DEBUG)
Liveness: http-get http://127.0.0.1:9234/healthz delay=60s timeout=3s period=10s #success=1 #failure=3
Readiness: http-get http://127.0.0.1:9234/healthz delay=0s timeout=3s period=5s #success=1 #failure=5
Environment:
K8S_NODE_NAME: (v1:spec.nodeName)
CILIUM_K8S_NAMESPACE: kube-system (v1:metadata.namespace)
CILIUM_DEBUG: Optional: true
Mounts:
/tmp/cilium/config-map from cilium-config-path (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x8m2s (ro)
Conditions:
Type Status
PodScheduled True
Volumes:
cilium-config-path:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: cilium-config
Optional: false
kube-api-access-x8m2s:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional:
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: kubernetes.io/os=linux
Tolerations: op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned kube-system/cilium-operator-788c4f69bc-jn9wz to ip-172-31-33-91
Hi @trevski,
What are the outputs of the following commands?
kubectl get nodes -o wide
kubectl get pods -A -o wide
Regards,
-Chris