Welcome to the Linux Foundation Forum!

Lab 2.2 - Unable To Start Control Plane Node

Hello everyone,

I am currently facing an issue during Exercise 2.2.

My setup is the following:

Using an AWS instance (t2.large) with the following specs:

2 CPU
8G memory
20G disk space

After starting up and connecting, I did the following:

checked the firewall status (disabled)
disabled swap
checked SELinux (disabled)
disabled AppArmor with the following commands:

sudo systemctl stop apparmor
sudo systemctl disable apparmor
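
For completeness, the other prep steps can be sketched like this (a minimal sketch assuming a stock Ubuntu AMI; ufw and a space-delimited swap entry in /etc/fstab are assumptions about the instance):

```shell
# Check that the firewall is inactive (Ubuntu's default frontend is ufw)
sudo ufw status

# Turn swap off for the running system, then comment out any swap
# entries in /etc/fstab so it stays off across reboots
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
```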

When running the provided k8scp.sh shell script, I get the success message:

Your Kubernetes control-plane has initialized successfully!

But the kubectl command at the end of the script shows the following output:

The connection to the server 172.31.39.164:6443 was refused - did you specify the right host or port?

After some time, I can run kubectl commands, but they show the CP node as NotReady.

The describe command for this node returns:

Name:               ip-172-31-39-164
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=ip-172-31-39-164
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Thu, 25 Aug 2022 09:19:42 +0000
Taints:             node-role.kubernetes.io/control-plane:NoSchedule
                    node-role.kubernetes.io/master:NoSchedule
                    node.kubernetes.io/not-ready:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  ip-172-31-39-164
  AcquireTime:     <unset>
  RenewTime:       Thu, 25 Aug 2022 09:21:18 +0000
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  MemoryPressure   False   Thu, 25 Aug 2022 09:20:57 +0000   Thu, 25 Aug 2022 09:19:38 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Thu, 25 Aug 2022 09:20:57 +0000   Thu, 25 Aug 2022 09:19:38 +0000   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Thu, 25 Aug 2022 09:20:57 +0000   Thu, 25 Aug 2022 09:19:38 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            False   Thu, 25 Aug 2022 09:20:57 +0000   Thu, 25 Aug 2022 09:19:38 +0000   KubeletNotReady              container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Addresses:
  InternalIP:  172.31.39.164
  Hostname:    ip-172-31-39-164
Capacity:
  cpu:                2
  ephemeral-storage:  20134592Ki
  hugepages-2Mi:      0
  memory:             8137712Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  18556039957
  hugepages-2Mi:      0
  memory:             8035312Ki
  pods:               110
System Info:
  Machine ID:                 18380e0a74d14c1db72eeaba35b3daa2
  System UUID:                ec2c0143-a6ec-7352-60c1-21888f960243
  Boot ID:                    50f8ff11-1232-4069-bcee-9df6ba3da059
  Kernel Version:             5.15.0-1017-aws
  OS Image:                   Ubuntu 22.04.1 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.6.7
  Kubelet Version:            v1.24.1
  Kube-Proxy Version:         v1.24.1
Non-terminated Pods:          (4 in total)
  Namespace                   Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------                   ----                                        ------------  ----------  ---------------  -------------  ---
  kube-system                 etcd-ip-172-31-39-164                       100m (5%)     0 (0%)      100Mi (1%)       0 (0%)         24s
  kube-system                 kube-apiserver-ip-172-31-39-164             250m (12%)    0 (0%)      0 (0%)           0 (0%)         24s
  kube-system                 kube-controller-manager-ip-172-31-39-164    200m (10%)    0 (0%)      0 (0%)           0 (0%)         18s
  kube-system                 kube-scheduler-ip-172-31-39-164             100m (5%)     0 (0%)      0 (0%)           0 (0%)         17s
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                650m (32%)  0 (0%)
  memory             100Mi (1%)  0 (0%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                  From     Message
  ----     ------                   ----                 ----     -------
  Warning  InvalidDiskCapacity      107s                 kubelet  invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  107s (x3 over 107s)  kubelet  Node ip-172-31-39-164 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    107s (x3 over 107s)  kubelet  Node ip-172-31-39-164 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     107s (x2 over 107s)  kubelet  Node ip-172-31-39-164 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  107s                 kubelet  Updated Node Allocatable limit across pods
  Normal   Starting                 107s                 kubelet  Starting kubelet.
  Normal   NodeAllocatableEnforced  97s                  kubelet  Updated Node Allocatable limit across pods
  Normal   Starting                 97s                  kubelet  Starting kubelet.
  Warning  InvalidDiskCapacity      97s                  kubelet  invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  97s                  kubelet  Node ip-172-31-39-164 status is now: NodeHasSufficientMemory
  Normal   NodeHasSufficientPID     97s                  kubelet  Node ip-172-31-39-164 status is now: NodeHasSufficientPID
  Normal   NodeHasNoDiskPressure    97s                  kubelet  Node ip-172-31-39-164 status is now: NodeHasNoDiskPressure
  Normal   Starting                 33s                  kubelet  Starting kubelet.
  Warning  InvalidDiskCapacity      33s                  kubelet  invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  32s (x8 over 33s)    kubelet  Node ip-172-31-39-164 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    32s (x7 over 33s)    kubelet  Node ip-172-31-39-164 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     32s (x7 over 33s)    kubelet  Node ip-172-31-39-164 status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  32s                  kubelet  Updated Node Allocatable limit across pods

After some time, the node seems to go down, and any kubectl command returns this error message:

The connection to the server 172.31.39.164:6443 was refused - did you specify the right host or port?

I have the feeling that there is some networking issue, but I can't figure out what exactly. I have tried the steps several times, each time with a fresh AWS instance.
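
For anyone hitting the same thing, these checks on the node itself narrow it down (a diagnostic sketch; the socket path is containerd's default as set up by kubeadm):

```shell
# Is the kubelet itself healthy, and what is it logging?
sudo systemctl status kubelet
sudo journalctl -u kubelet --no-pager | tail -n 50

# "cni plugin not initialized" normally means this directory is still
# empty because no network plugin (e.g. Calico) has been installed yet
ls -l /etc/cni/net.d/

# List all control-plane containers, including exited ones
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a
```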

Can anyone please help me with this issue?

Many thanks in advance

Best Answer

  • amayorga (edited August 2022) Answer ✓

    Hi @chrispokorni, thanks for the help and tips. After reading other threads in the forum, I tried Ubuntu 20.04 LTS instead of 22.04. The default EC2 image does not come with containerd installed, but that was easy to solve :)

    Now everything seems to be working fine:

     kubectl get node
    NAME               STATUS   ROLES           AGE   VERSION
    ip-172-31-41-155   Ready    <none>          11m   v1.24.1
    ip-172-31-47-37    Ready    control-plane   23h   v1.24.1
    
    kubectl get pod -n kube-system
    NAME                                       READY   STATUS              RESTARTS      AGE
    calico-kube-controllers-5b97f5d8cf-sfwfb   1/1     Running             1 (46m ago)   24h
    calico-node-5h77g                          0/1     Init:0/3            0             24m
    calico-node-9vz4r                          1/1     Running             1 (46m ago)   24h
    coredns-6d4b75cb6d-b5tf6                   1/1     Running             1 (46m ago)   24h
    coredns-6d4b75cb6d-wknrz                   1/1     Running             1 (46m ago)   24h
    etcd-ip-172-31-47-37                       1/1     Running             1 (46m ago)   24h
    kube-apiserver-ip-172-31-47-37             1/1     Running             1 (46m ago)   24h
    kube-controller-manager-ip-172-31-47-37    1/1     Running             1 (46m ago)   24h
    kube-proxy-8wpqj                           1/1     Running             1 (46m ago)   24h
    kube-proxy-dk9p6                           0/1     ContainerCreating   0             24m
    kube-scheduler-ip-172-31-47-37             1/1     Running             1 (46m ago)   24h
    

    BR
    Alberto

Answers

  • j0hns0n

    When checking the pods in the kube-system namespace, I can see that some of them are caught in a restart loop.

    NAME                                       READY   STATUS             RESTARTS      AGE
    coredns-6d4b75cb6d-5mv6l                   0/1     Pending            0             51s
    coredns-6d4b75cb6d-ht77w                   0/1     Pending            0             51s
    etcd-ip-172-31-39-164                      1/1     Running            2 (94s ago)   85s
    kube-apiserver-ip-172-31-39-164            1/1     Running            1 (94s ago)   85s
    kube-controller-manager-ip-172-31-39-164   1/1     Running            2 (94s ago)   79s
    kube-proxy-292zd                           1/1     Running            1 (50s ago)   52s
    kube-scheduler-ip-172-31-39-164            0/1     CrashLoopBackOff   2 (5s ago)    78s
    

    Looking closer at the kube-scheduler pod, I can see the following:

    Annotations:          kubernetes.io/config.hash: 641b4e44950584cb2848b582a6bae80f
                          kubernetes.io/config.mirror: 641b4e44950584cb2848b582a6bae80f
                          kubernetes.io/config.seen: 2022-08-25T09:20:49.832469811Z
                          kubernetes.io/config.source: file
                          seccomp.security.alpha.kubernetes.io/pod: runtime/default
    Status:               Running
    IP:                   172.31.39.164
    IPs:
      IP:           172.31.39.164
    Controlled By:  Node/ip-172-31-39-164
    Containers:
      kube-scheduler:
        Container ID:  containerd://be09d0a5460bd2cc62849d9a66f4ea2e771471ca6bba0eebf5b18a576dd328d8
        Image:         k8s.gcr.io/kube-scheduler:v1.24.4
        Image ID:      k8s.gcr.io/kube-scheduler@sha256:378509dd1111937ca2791cf4c4814bc0647714e2ab2f4fc15396707ad1a987a2
        Port:          <none>
        Host Port:     <none>
        Command:
          kube-scheduler
          --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
          --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
          --bind-address=127.0.0.1
          --kubeconfig=/etc/kubernetes/scheduler.conf
          --leader-elect=true
        State:          Running
          Started:      Thu, 25 Aug 2022 09:22:43 +0000
        Last State:     Terminated
          Reason:       Completed
          Exit Code:    0
          Started:      Thu, 25 Aug 2022 09:20:51 +0000
          Finished:     Thu, 25 Aug 2022 09:22:18 +0000
        Ready:          False
        Restart Count:  3
        Requests:
          cpu:        100m
        Liveness:     http-get https://127.0.0.1:10259/healthz delay=10s timeout=15s period=10s #success=1 #failure=8
        Startup:      http-get https://127.0.0.1:10259/healthz delay=10s timeout=15s period=10s #success=1 #failure=24
        Environment:  <none>
        Mounts:
          /etc/kubernetes/scheduler.conf from kubeconfig (ro)
    Conditions:
      Type              Status
      Initialized       True 
      Ready             False 
      ContainersReady   False 
      PodScheduled      True 
    Volumes:
      kubeconfig:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/kubernetes/scheduler.conf
    HostPathType:  FileOrCreate
    QoS Class:         Burstable
    Node-Selectors:    <none>
    Tolerations:       :NoExecute op=Exists
    Events:
      Type     Reason          Age                 From     Message
      ----     ------          ----                ----     -------
      Normal   SandboxChanged  31s (x2 over 119s)  kubelet  Pod sandbox changed, it will be killed and re-created.
      Normal   Killing         31s                 kubelet  Stopping container kube-scheduler
      Warning  BackOff         22s (x5 over 31s)   kubelet  Back-off restarting failed container
      Normal   Pulled          6s (x2 over 118s)   kubelet  Container image "k8s.gcr.io/kube-scheduler:v1.24.4" already present on machine
      Normal   Created         6s (x2 over 118s)   kubelet  Created container kube-scheduler
      Normal   Started         6s (x2 over 117s)   kubelet  Started container kube-scheduler
    
    
  • j0hns0n

    The logs for this pod look like this:

    I0825 09:23:21.869581       1 serving.go:348] Generated self-signed cert in-memory
    I0825 09:23:22.199342       1 server.go:147] "Starting Kubernetes Scheduler" version="v1.24.4"
    I0825 09:23:22.199377       1 server.go:149] "Golang settings" GOGC="" GOMAXPROCS="" GOTRACEBACK=""
    I0825 09:23:22.203198       1 secure_serving.go:210] Serving securely on 127.0.0.1:10259
    I0825 09:23:22.203278       1 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
    I0825 09:23:22.203296       1 shared_informer.go:255] Waiting for caches to sync for RequestHeaderAuthRequestController
    I0825 09:23:22.203323       1 tlsconfig.go:240] "Starting DynamicServingCertificateController"
    I0825 09:23:22.211009       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
    I0825 09:23:22.211197       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
    I0825 09:23:22.211296       1 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
    I0825 09:23:22.211417       1 shared_informer.go:255] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
    I0825 09:23:22.304407       1 shared_informer.go:262] Caches are synced for RequestHeaderAuthRequestController
    I0825 09:23:22.304694       1 leaderelection.go:248] attempting to acquire leader lease kube-system/kube-scheduler...
    I0825 09:23:22.312381       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
    I0825 09:23:22.312443       1 shared_informer.go:262] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
    I0825 09:23:22.313870       1 leaderelection.go:258] successfully acquired lease kube-system/kube-scheduler
    
  • amayorga

    Hello everyone.
    I'm facing a similar issue during this exercise.
    Same AWS instance configuration, firewall settings, and rules before running k8scp.sh.

    Your Kubernetes control-plane has initialized successfully!

    For the first few minutes after boot I can run kubectl commands, but after a while I can't.

    Any clue on this?

    Thanks

    kubectl get pods -n kube-system
    NAME                                       READY   STATUS             RESTARTS       AGE
    coredns-6d4b75cb6d-2wrn2                   0/1     Pending            0              23m
    coredns-6d4b75cb6d-vg5f7                   0/1     Pending            0              23m
    etcd-ip-172-31-34-203                      1/1     Running            9 (87s ago)    24m
    kube-apiserver-ip-172-31-34-203            1/1     Running            8 (87s ago)    24m
    kube-controller-manager-ip-172-31-34-203   0/1     CrashLoopBackOff   10 (9s ago)    24m
    kube-proxy-fhl7c                           1/1     Running            11 (68s ago)   23m
    kube-scheduler-ip-172-31-34-203            1/1     Running            10 (64s ago)   24m
    
    NAME                                       READY   STATUS             RESTARTS        AGE
    coredns-6d4b75cb6d-2wrn2                   0/1     Pending            0               25m
    coredns-6d4b75cb6d-vg5f7                   0/1     Pending            0               25m
    etcd-ip-172-31-34-203                      1/1     Running            11 (65s ago)    26m
    kube-apiserver-ip-172-31-34-203            1/1     Running            8 (3m34s ago)   26m
    kube-controller-manager-ip-172-31-34-203   0/1     CrashLoopBackOff   11 (85s ago)    26m
    kube-proxy-fhl7c                           1/1     Running            12 (98s ago)    25m
    kube-scheduler-ip-172-31-34-203            1/1     Running            11 (85s ago)    27m
    
  • amayorga

    My control-plane node information

  • chrispokorni

    Hello @j0hns0n and @amayorga,

    Prior to provisioning the EC2 instances and any SGs needed for the lab environment, did you happen to watch the demo video from the introductory chapter of the course? It may provide tips for configuring the networking required by the EC2 instances to support the Kubernetes installation.

    From all pod listings it seems that the pod network plugin (Calico) is not running. It may not have been installed, or it may have failed to start due to possible provisioning or networking issues.

    Regards,
    -Chris
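
    A quick way to confirm this suspicion is to check for the Calico pods directly (a sketch; the manifest URL is an assumption and may differ from the version pinned in your copy of k8scp.sh):

```shell
# If this returns no pods, the network plugin was never applied
kubectl get pods -n kube-system -l k8s-app=calico-node

# Re-apply the Calico manifest; this URL is an assumption - use the
# one referenced in the lab's k8scp.sh script
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
```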

  • j0hns0n

    Hello @chrispokorni ,

    many thanks for your reply. I watched the videos three times and read the introductory instructions several times.

    I tried adjusting the script so that Calico is initialized afterwards. In this case, the node reaches a Ready state, but unfortunately it goes down again after several minutes. I could see that the kube-controller-manager pod had an error, which seems to bring down the whole node.

    When describing the kube-controller-manager pod, I get the following output:

    Name:                 kube-controller-manager-ip-172-31-15-79
    Namespace:            kube-system
    Priority:             2000001000
    Priority Class Name:  system-node-critical
    Node:                 ip-172-31-15-79/172.31.15.79
    Start Time:           Mon, 29 Aug 2022 19:22:10 +0000
    Labels:               component=kube-controller-manager
                          tier=control-plane
    Annotations:          kubernetes.io/config.hash: 779a2592f7699f3e79c55431781e2f49
                          kubernetes.io/config.mirror: 779a2592f7699f3e79c55431781e2f49
                          kubernetes.io/config.seen: 2022-08-29T19:21:32.165202650Z
                          kubernetes.io/config.source: file
                          seccomp.security.alpha.kubernetes.io/pod: runtime/default
    Status:               Running
    IP:                   172.31.15.79
    IPs:
      IP:           172.31.15.79
    Controlled By:  Node/ip-172-31-15-79
    Containers:
      kube-controller-manager:
        Container ID:  containerd://bef1b64a79c090852db4331f0d7f92fa15347ed5b5a72e4f97920678c948aeb2
        Image:         k8s.gcr.io/kube-controller-manager:v1.24.4
        Image ID:      k8s.gcr.io/kube-controller-manager@sha256:f9400b11d780871e4e87cac8a8d4f8fc6bb83d7793b58981020b43be55f71cb9
        Port:          <none>
        Host Port:     <none>
        Command:
          kube-controller-manager
          --allocate-node-cidrs=true
          --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
          --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
          --bind-address=127.0.0.1
          --client-ca-file=/etc/kubernetes/pki/ca.crt
          --cluster-cidr=192.168.0.0/16
          --cluster-name=kubernetes
          --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
          --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
          --controllers=*,bootstrapsigner,tokencleaner
          --kubeconfig=/etc/kubernetes/controller-manager.conf
          --leader-elect=true
          --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
          --root-ca-file=/etc/kubernetes/pki/ca.crt
          --service-account-private-key-file=/etc/kubernetes/pki/sa.key
          --service-cluster-ip-range=10.96.0.0/12
          --use-service-account-credentials=true
        State:          Waiting
          Reason:       CrashLoopBackOff
        Last State:     Terminated
          Reason:       Error
          Exit Code:    2
          Started:      Mon, 29 Aug 2022 19:30:14 +0000
          Finished:     Mon, 29 Aug 2022 19:30:21 +0000
        Ready:          False
        Restart Count:  10
        Requests:
          cpu:        200m
        Liveness:     http-get https://127.0.0.1:10257/healthz delay=10s timeout=15s period=10s #success=1 #failure=8
        Startup:      http-get https://127.0.0.1:10257/healthz delay=10s timeout=15s period=10s #success=1 #failure=24
        Environment:  <none>
        Mounts:
          /etc/ca-certificates from etc-ca-certificates (ro)
          /etc/kubernetes/controller-manager.conf from kubeconfig (ro)
          /etc/kubernetes/pki from k8s-certs (ro)
          /etc/pki from etc-pki (ro)
          /etc/ssl/certs from ca-certs (ro)
          /usr/libexec/kubernetes/kubelet-plugins/volume/exec from flexvolume-dir (rw)
          /usr/local/share/ca-certificates from usr-local-share-ca-certificates (ro)
          /usr/share/ca-certificates from usr-share-ca-certificates (ro)
    Conditions:
      Type              Status
      Initialized       True 
      Ready             False 
      ContainersReady   False 
      PodScheduled      True 
    Volumes:
      ca-certs:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/ssl/certs
        HostPathType:  DirectoryOrCreate
      etc-ca-certificates:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/ca-certificates
        HostPathType:  DirectoryOrCreate
      etc-pki:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/pki
        HostPathType:  DirectoryOrCreate
      flexvolume-dir:
        Type:          HostPath (bare host directory volume)
        Path:          /usr/libexec/kubernetes/kubelet-plugins/volume/exec
        HostPathType:  DirectoryOrCreate
      k8s-certs:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/kubernetes/pki
        HostPathType:  DirectoryOrCreate
      kubeconfig:
        Type:          HostPath (bare host directory volume)
        Path:          /etc/kubernetes/controller-manager.conf
        HostPathType:  FileOrCreate
      usr-local-share-ca-certificates:
        Type:          HostPath (bare host directory volume)
        Path:          /usr/local/share/ca-certificates
        HostPathType:  DirectoryOrCreate
      usr-share-ca-certificates:
        Type:          HostPath (bare host directory volume)
        Path:          /usr/share/ca-certificates
        HostPathType:  DirectoryOrCreate
    QoS Class:         Burstable
    Node-Selectors:    <none>
    Tolerations:       :NoExecute op=Exists
    Events:
      Type     Reason          Age                    From     Message
      ----     ------          ----                   ----     -------
      Normal   Killing         9m28s                  kubelet  Stopping container kube-controller-manager
      Warning  Unhealthy       9m23s                  kubelet  Startup probe failed: Get "https://127.0.0.1:10257/healthz": dial tcp 127.0.0.1:10257: connect: connection refused
      Normal   SandboxChanged  9m8s                   kubelet  Pod sandbox changed, it will be killed and re-created.
      Warning  BackOff         7m12s (x3 over 7m15s)  kubelet  Back-off restarting failed container
      Normal   Created         6m59s (x2 over 9m8s)   kubelet  Created container kube-controller-manager
      Normal   Started         6m59s (x2 over 9m8s)   kubelet  Started container kube-controller-manager
      Normal   Pulled          6m59s (x2 over 9m8s)   kubelet  Container image "k8s.gcr.io/kube-controller-manager:v1.24.4" already present on machine
      Normal   Killing         76s (x2 over 2m57s)    kubelet  Stopping container kube-controller-manager
      Normal   SandboxChanged  75s (x3 over 4m24s)    kubelet  Pod sandbox changed, it will be killed and re-created.
      Warning  BackOff         58s (x11 over 2m57s)   kubelet  Back-off restarting failed container
      Normal   Pulled          46s (x3 over 4m23s)    kubelet  Container image "k8s.gcr.io/kube-controller-manager:v1.24.4" already present on machine
      Normal   Created         46s (x3 over 4m23s)    kubelet  Created container kube-controller-manager
      Normal   Started         46s (x3 over 4m23s)    kubelet  Started container kube-controller-manager
    
    

    For some reason the startup probe seems to fail, which indicates a possible network issue. But I followed all the steps in the instructions. Do you have any idea?
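
    The probe failure can be reproduced by hand from the control-plane node (a sketch; port 10257 matches the controller-manager's liveness/startup probes shown in the describe output):

```shell
# Hit the same endpoint the kubelet's startup probe uses; -k is needed
# because the controller-manager serves a self-signed certificate
curl -k https://127.0.0.1:10257/healthz

# "connection refused" means the process exited before the probe ran,
# so pull the logs of the most recent controller-manager container
sudo crictl logs $(sudo crictl ps -a --name kube-controller-manager -q | head -n 1)
```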

    Many thanks in advance

  • chrispokorni (edited August 2022)

    Hi @j0hns0n,

    Did you experience the same behavior on Kubernetes v1.24.1, as presented by the lab guide?

    A similar behavior was observed some years back, prior to a new version release. Since 1.24.4 is currently the last release prior to 1.25.0, I suspect some unexpected changes in the code are causing this behavior.

    By delaying the Calico start, did you eventually see all Calico pods in a Running state?

    Can you provide a screenshot of the SG configuration, and the output of:

    kubectl get pods --all-namespaces -o wide
    OR
    kubectl get po -A -owide

    Just to rule out any possible node and pod networking issues.

    Regards,
    -Chris

  • chrispokorni

    Hi @amayorga,

    I am glad it all works now, although a bit surprised that containerd did not get installed by the k8scp.sh and k8sWorker.sh scripts.

    If you look at the k8scp.sh and k8sWorker.sh script files, can you find the containerd configuration and installation commands in each file? If they did not install containerd on either of the nodes, can you provide the content of the cp.out and worker.out files? I'd be curious to see if any errors were generated and recorded.

    Regards,
    -Chris

  • amayorga

    Hi @chrispokorni.
    I've checked the k8scp.sh script, and the containerd installation section is present:

    # Install the containerd software
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    sudo apt update
    sudo apt install containerd.io -y
    

    But it did not work for me :'(

    Sorry, but I don't have the output from the run in which the containerd installation failed.
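
    If it happens again, a few quick checks confirm whether containerd actually made it onto the node (a sketch; the socket path is the package default that kubeadm expects):

```shell
# Verify the binary and the service
containerd --version
sudo systemctl is-active containerd

# kubeadm preflight expects this CRI socket to exist
ls -l /run/containerd/containerd.sock
```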

    BR
    Alberto

  • j0hns0n

    Hello @chrispokorni & @amayorga ,

    Using Ubuntu 20.04 LTS did the trick :) I don't even have problems with containerd. After running k8scp.sh, my control plane is up and running! :smiley:

    ubuntu@ip-172-31-1-47:~$ kubectl get node
    NAME             STATUS   ROLES           AGE     VERSION
    ip-172-31-1-47   Ready    control-plane   3m51s   v1.24.1
    ubuntu@ip-172-31-1-47:~$ kubectl get po -n kube-system
    NAME                                       READY   STATUS    RESTARTS   AGE
    calico-kube-controllers-6799f5f4b4-w45vp   1/1     Running   0          3m36s
    calico-node-ws5dl                          1/1     Running   0          3m36s
    coredns-6d4b75cb6d-64n2n                   1/1     Running   0          3m36s
    coredns-6d4b75cb6d-w5nhv                   1/1     Running   0          3m36s
    etcd-ip-172-31-1-47                        1/1     Running   0          3m50s
    kube-apiserver-ip-172-31-1-47              1/1     Running   0          3m50s
    kube-controller-manager-ip-172-31-1-47     1/1     Running   0          3m50s
    kube-proxy-xx6tr                           1/1     Running   0          3m36s
    kube-scheduler-ip-172-31-1-47              1/1     Running   0          3m52s
    

    Thanks to both of you ;)
