Lab 4.1 step 16: Cluster Upgrade is successful but Control Plane version is still at 1.28.1


Help! My cluster upgrade and kubelet upgrade completed successfully on the cp, but kubectl get node still shows the cp at the old version.

Output shown below. What have I missed?

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.29.1". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
billy@cp:~$ kubectl get node
NAME              STATUS                     ROLES           AGE   VERSION
ip-172-31-32-81   Ready                      <none>          16d   v1.28.1
ip-172-31-33-91   Ready,SchedulingDisabled   control-plane   21d   v1.28.1
billy@cp:~$ sudo apt-mark unhold kubelet kubectl
Canceled hold on kubelet.
Canceled hold on kubectl.
billy@cp:~$ sudo apt-get install -y kubelet=1.29.1-1.1 kubectl=1.29.1-1.1
Reading package lists... Done
Building dependency tree
Reading state information... Done
The following packages will be upgraded:
kubectl kubelet
2 upgraded, 0 newly installed, 0 to remove and 25 not upgraded.
Need to get 30.3 MB of archives.
After this operation, 889 kB of additional disk space will be used.
Get:1 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.29/deb kubectl 1.29.1-1.1 [10.5 MB]
Get:2 https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.29/deb kubelet 1.29.1-1.1 [19.8 MB]
Fetched 30.3 MB in 1s (48.7 MB/s)
(Reading database ... 90923 files and directories currently installed.)
Preparing to unpack .../kubectl_1.29.1-1.1_amd64.deb ...
Unpacking kubectl (1.29.1-1.1) over (1.28.1-1.1) ...
Preparing to unpack .../kubelet_1.29.1-1.1_amd64.deb ...
Unpacking kubelet (1.29.1-1.1) over (1.28.1-1.1) ...
Setting up kubectl (1.29.1-1.1) ...
Setting up kubelet (1.29.1-1.1) ...
billy@cp:~$ sudo apt-mark hold kubelet kubectl
kubelet set on hold.
kubectl set on hold.
billy@cp:~$ sudo systemctl daemon-reload
billy@cp:~$ sudo systemctl restart kubelet
billy@cp:~$ kubectl get node
NAME              STATUS                     ROLES           AGE   VERSION
ip-172-31-32-81   Ready                      <none>          16d   v1.28.1
ip-172-31-33-91   Ready,SchedulingDisabled   control-plane   21d   v1.28.1
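
For anyone hitting the same symptom: the VERSION column of kubectl get node reports what each node's kubelet tells the API server, not what apt has installed. A minimal sanity check (a sketch, using the node names from the output above) is to confirm which kubelet binary the node is actually running after the restart:

# version of the kubelet binary installed on this host
kubelet --version
# confirm the kubelet service restarted cleanly and picked up the new binary
sudo systemctl status kubelet --no-pager | head -n 5
# the VERSION column reflects what each node's kubelet reports to the API server
kubectl get nodes -o wide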

Best Answers

  • trevski
    trevski Posts: 6
    Answer ✓

    In the end, a reboot of the cp host resolved the issues. The cilium pods started successfully after the reboot, and I have been able to drain the cp node and upgrade it.
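
    For anyone following along, the remaining steps after the reboot look roughly like this (a minimal sketch; the node name is taken from the outputs above, and it assumes the v1.29.1 kubelet package was already installed as shown in the original post):

    # drain the control plane node before finishing its upgrade
    kubectl drain ip-172-31-33-91 --ignore-daemonsets
    # restart the already-upgraded kubelet so the node reports v1.29.1
    sudo systemctl daemon-reload
    sudo systemctl restart kubelet
    # return the node to service and confirm the reported version
    kubectl uncordon ip-172-31-33-91
    kubectl get nodes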

  • trevski
    trevski Posts: 6
    Answer ✓

    Thanks Chris. I resolved the issues by rebooting the node. I did post a message about that yesterday, but it seems to have got lost somewhere. Thanks for your help!

Answers

  • chrispokorni
    chrispokorni Posts: 2,184

    Hi @trevski,

    It appears the apt package list is not set with the correct URL. Please revisit steps 2 and 3 to add the correct version repository definition and gpg key, then validate that the entry in /etc/apt/sources.list.d/kubernetes.list coincides with the one from the lab guide. Validate this both before and after the apt-get update in step 4.
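
    For reference, the upstream repository definition for the v1.29 packages looks roughly like the following (a sketch based on the pkgs.k8s.io instructions; the exact key location and file names used by the lab guide may differ):

    # ensure the keyrings directory exists, then download the v1.29 signing key
    sudo mkdir -p /etc/apt/keyrings
    curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | \
      sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
    # point apt at the v1.29 repository
    echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | \
      sudo tee /etc/apt/sources.list.d/kubernetes.list
    # verify the entry, then refresh the package lists
    cat /etc/apt/sources.list.d/kubernetes.list
    sudo apt-get update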

    Regards,
    -Chris

  • trevski
    trevski Posts: 6

    Hi @chrispokorni ,

    Thanks for your suggestion. I've attempted to redo all the steps, but now I hit problems, I think because the cp is already cordoned from before. That step just sits there forever:

    kubectl drain ip-172-31-33-91 --ignore-daemonsets
    node/ip-172-31-33-91 already cordoned
    Warning: ignoring DaemonSet-managed Pods: kube-system/cilium-fxlmh, kube-system/kube-proxy-2jzv9
    evicting pod kube-system/cilium-operator-788c4f69bc-vrgq7

    To get out of this I pressed Ctrl-C; however, now both the upgrade plan and the upgrade itself fail:

    [upgrade/health] FATAL: [preflight] Some fatal errors occurred:
    [ERROR ControlPlaneNodesReady]: there are NotReady control-planes in the cluster: [ip-172-31-33-91]
    [preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
    To see the stack trace of this error execute with --v=5 or higher

    How can I reset the node or clear this up?

    Referring to your original point, I checked what I did the first time and there was nothing wrong with the package list or the gpg key, unless I have an outdated version of the instructions or something...?

  • trevski
    trevski Posts: 6
    edited April 17

    Update: the drain is waiting for the cilium-operator pod to be deleted, but it appears to be stuck in Terminating status. I've tried force deleting it, but the cluster then spawns another pod to replace it, which gets stuck in Pending status. Aaaargh!
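
    (For reference, the force delete mentioned here is roughly the following; the pod name is the one from the drain output earlier, and note that a force delete only removes the API object, so the ReplicaSet immediately creates a replacement:)

    # remove the stuck pod object without waiting for the kubelet to confirm
    kubectl -n kube-system delete pod cilium-operator-788c4f69bc-vrgq7 --grace-period=0 --force
    # the ReplicaSet recreates the pod; watch the replacement's status
    kubectl -n kube-system get pods -l io.cilium/app=operator -o wide -w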

  • chrispokorni
    chrispokorni Posts: 2,184

    Hi @trevski,

    Since the control plane node is already cordoned/drained and you can no longer proceed as instructed, try to uncordon the node. The uncordon command can be found in step 17. Validate the node is in Ready state, then re-attempt the control plane upgrade process.

    If the error persists you may try the suggested flag to ignore the preflight error.
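
    In concrete terms that would be something like the following (a sketch using the node name and preflight check name from the earlier output):

    # make the cordoned control plane node schedulable again
    kubectl uncordon ip-172-31-33-91
    kubectl get nodes
    # only if the check still blocks the upgrade once the node is Ready
    sudo kubeadm upgrade plan --ignore-preflight-errors=ControlPlaneNodesReady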

    unless I have an outdated version of the instructions or something

    The course resources are at the latest release, and I based my observation on comparing the output provided earlier with the commands from the lab guide.

    Regards,
    -Chris

  • trevski
    trevski Posts: 6
    edited April 22

    Thanks Chris. My cp node is still in NotReady status, I think because the cilium-operator pod is stuck at Pending:
    NAMESPACE     NAME                               READY   STATUS    RESTARTS   AGE     IP       NODE
    kube-system   cilium-operator-788c4f69bc-jn9wz   0/1     Pending   0          6m22s   <none>   ip-172-31-33-91
    I've tried connecting to the pod but had no luck, and also tried force deleting it, in which case I just get another one stuck at Pending. I'll keep trying to figure it out; any suggestions welcome.
    From describing the pod I can see that it has been scheduled:
    Events:
      Type    Reason     Age   From               Message
      ----    ------     ---   ----               -------
      Normal  Scheduled  17m   default-scheduler  Successfully assigned kube-system/cilium-operator-788c4f69bc-jn9wz to ip-172-31-33-91

    Here is the complete describe output:
    Name:                 cilium-operator-788c4f69bc-jn9wz
    Namespace:            kube-system
    Priority:             2000000000
    Priority Class Name:  system-cluster-critical
    Service Account:      cilium-operator
    Node:                 ip-172-31-33-91/
    Labels:               app.kubernetes.io/name=cilium-operator
                          app.kubernetes.io/part-of=cilium
                          io.cilium/app=operator
                          name=cilium-operator
                          pod-template-hash=788c4f69bc
    Annotations:
    Status:               Pending
    IP:
    IPs:
    Controlled By:        ReplicaSet/cilium-operator-788c4f69bc
    Containers:
      cilium-operator:
        Image:      quay.io/cilium/operator-generic:v1.14.1@sha256:e061de0a930534c7e3f8feda8330976367971238ccafff42659f104effd4b5f7
        Port:
        Host Port:
        Command:
          cilium-operator-generic
        Args:
          --config-dir=/tmp/cilium/config-map
          --debug=$(CILIUM_DEBUG)
        Liveness:   http-get http://127.0.0.1:9234/healthz delay=60s timeout=3s period=10s #success=1 #failure=3
        Readiness:  http-get http://127.0.0.1:9234/healthz delay=0s timeout=3s period=5s #success=1 #failure=5
        Environment:
          K8S_NODE_NAME:         (v1:spec.nodeName)
          CILIUM_K8S_NAMESPACE:  kube-system (v1:metadata.namespace)
          CILIUM_DEBUG:          Optional: true
        Mounts:
          /tmp/cilium/config-map from cilium-config-path (ro)
          /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-x8m2s (ro)
    Conditions:
      Type           Status
      PodScheduled   True
    Volumes:
      cilium-config-path:
        Type:      ConfigMap (a volume populated by a ConfigMap)
        Name:      cilium-config
        Optional:  false
      kube-api-access-x8m2s:
        Type:                    Projected (a volume that contains injected data from multiple sources)
        TokenExpirationSeconds:  3607
        ConfigMapName:           kube-root-ca.crt
        ConfigMapOptional:
        DownwardAPI:             true
    QoS Class:       BestEffort
    Node-Selectors:  kubernetes.io/os=linux
    Tolerations:     op=Exists
    Events:
      Type    Reason     Age   From               Message
      ----    ------     ---   ----               -------
      Normal  Scheduled  17m   default-scheduler  Successfully assigned kube-system/cilium-operator-788c4f69bc-jn9wz to ip-172-31-33-91
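
    When a pod shows Scheduled but stays Pending with no further events, the next place to look is usually the kubelet and container runtime on the assigned node (a generic sketch, assuming containerd as the runtime; not part of the lab guide):

    # on ip-172-31-33-91: check the kubelet and runtime services for errors
    sudo systemctl status kubelet containerd --no-pager
    sudo journalctl -u kubelet --since "30 min ago" --no-pager | tail -n 50
    # recent cluster events and the node's conditions
    kubectl get events -n kube-system --sort-by=.lastTimestamp | tail -n 20
    kubectl describe node ip-172-31-33-91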

  • chrispokorni
    chrispokorni Posts: 2,184

    Hi @trevski,

    What are the outputs of the following commands?

    kubectl get nodes -o wide
    kubectl get pods -A -o wide

    Regards,
    -Chris
