
Lab 1c Step 5 - DaemonSets on my system lists 0

I ran the YAML provided in the lab and got "daemonset.apps/fluentd-ds created". However, when I run "kubectl get ds", I get this:
NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
fluentd-ds   0         0         0       0            0           <none>          15m

Did I miss something here?

Comments

  • ChristianLacsina964
    edited May 2022

    Can you post the results of the following:

    kubectl describe ds fluentd-ds
    kubectl get pods

    I recall diagnosing a couple of other issues with this DaemonSet recently; in the meantime I will look into what else could be going on.

  • ChristianLacsina964
    edited May 2022

    Kubernetes 1.24 (specifically kubeadm 1.24, which came out last Tuesday) introduced an additional control plane node taint, which the current LFS242 docs are not equipped to handle.

    There are a couple of ways to handle this:

    • Remove the new taint with: kubectl taint node --all node-role.kubernetes.io/control-plane-
    • Add another entry under the tolerations key in the DaemonSet's pod template (see the sketch after these examples for applying and verifying the change). Basically going from this:
        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          terminationGracePeriodSeconds: 30
    

    to

        spec:
          tolerations:
          - key: node-role.kubernetes.io/master
            effect: NoSchedule
          - key: node-role.kubernetes.io/control-plane
            effect: NoSchedule        
          terminationGracePeriodSeconds: 30
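
    If it helps, here is a rough way to check the node's taints and then re-apply and verify the edited DaemonSet (just a sketch; the manifest filename fluentd-ds.yaml is an assumption, so substitute whatever your lab file is called):

        # Show the taints currently on the node(s); on a kubeadm 1.24 cluster you should
        # see node-role.kubernetes.io/control-plane:NoSchedule unless it was removed
        kubectl describe nodes | grep -A 2 Taints

        # Re-apply the edited DaemonSet manifest (filename assumed) and confirm
        # DESIRED/CURRENT move from 0 to the number of nodes
        kubectl apply -f fluentd-ds.yaml
        kubectl get ds fluentd-ds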
    
  • joshl

    kubectl describe ds fluentd-ds

    Name: fluentd-ds
    Selector: k8s-app=fluentd-logging
    Node-Selector:
    Labels: k8s-app=fluentd-logging
    version=v1
    Annotations: deprecated.daemonset.template.generation: 1
    Desired Number of Nodes Scheduled: 0
    Current Number of Nodes Scheduled: 0
    Number of Nodes Scheduled with Up-to-date Pods: 0
    Number of Nodes Scheduled with Available Pods: 0
    Number of Nodes Misscheduled: 0
    Pods Status: 0 Running / 0 Waiting / 0 Succeeded / 0 Failed
    Pod Template:
    Labels: k8s-app=fluentd-logging
    version=v1
    Containers:
    fluentd-ds:
    Image: fluent/fluentd:latest
    Port:
    Host Port:
    Limits:
    memory: 200Mi
    Environment:
    FLUENTD_CONF: fluentd-kube.conf
    Mounts:
    /fluentd/etc from fluentd-conf (rw)
    Volumes:
    fluentd-conf:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: fluentd-config
    Optional: false
    Events:

  • joshl
    edited May 2022

    $ kubectl get ds

    NAME         DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    fluentd-ds   1         1         0       1            0           <none>          36m

    $ kubectl describe ds fluentd-ds

    Name: fluentd-ds
    Selector: k8s-app=fluentd-logging
    Node-Selector:
    Labels: k8s-app=fluentd-logging
    version=v1
    Annotations: deprecated.daemonset.template.generation: 2
    Desired Number of Nodes Scheduled: 1
    Current Number of Nodes Scheduled: 1
    Number of Nodes Scheduled with Up-to-date Pods: 1
    Number of Nodes Scheduled with Available Pods: 0
    Number of Nodes Misscheduled: 0
    Pods Status: 0 Running / 1 Waiting / 0 Succeeded / 0 Failed
    Pod Template:
    Labels: k8s-app=fluentd-logging
    version=v1
    Containers:
    fluentd-ds:
    Image: fluent/fluentd:latest
    Port:
    Host Port:
    Limits:
    memory: 200Mi
    Environment:
    FLUENTD_CONF: fluentd-kube.conf
    Mounts:
    /fluentd/etc from fluentd-conf (rw)
    Volumes:
    fluentd-conf:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: fluentd-config
    Optional: false
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Normal SuccessfulCreate 4m34s daemonset-controller Created pod: fluentd-ds-d7dzr
    Normal SuccessfulDelete 3m12s daemonset-controller Deleted pod: fluentd-ds-d7dzr
    Normal SuccessfulCreate 3m12s daemonset-controller Created pod: fluentd-ds-tzbqw

  • joshl

    This is the pod info: (Note: I have redacted the Node information)

    $ kubectl describe pods
    Name: fluentd-ds-tzbqw
    Namespace: default
    Priority: 0
    Node: ip-##-###-##-###/##.###.##.###
    Start Time: Tue, 10 May 2022 22:24:56 +0000
    Labels: controller-revision-hash=85777dbb94
    k8s-app=fluentd-logging
    pod-template-generation=2
    version=v1
    Annotations:
    Status: Pending
    IP:
    IPs:
    Controlled By: DaemonSet/fluentd-ds
    Containers:
    fluentd-ds:
    Container ID:
    Image: fluent/fluentd:latest
    Image ID:
    Port:
    Host Port:
    State: Waiting
    Reason: ContainerCreating
    Ready: False
    Restart Count: 0
    Limits:
    memory: 200Mi
    Requests:
    memory: 200Mi
    Environment:
    FLUENTD_CONF: fluentd-kube.conf
    Mounts:
    /fluentd/etc from fluentd-conf (rw)
    /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tl27j (ro)
    Conditions:
    Type Status
    Initialized True
    Ready False
    ContainersReady False
    PodScheduled True
    Volumes:
    fluentd-conf:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: fluentd-config
    Optional: false
    kube-api-access-tl27j:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI: true
    QoS Class: Burstable
    Node-Selectors:
    Tolerations: node-role.kubernetes.io/control-plane:NoSchedule
    node-role.kubernetes.io/master:NoSchedule
    node.kubernetes.io/disk-pressure:NoSchedule op=Exists
    node.kubernetes.io/memory-pressure:NoSchedule op=Exists
    node.kubernetes.io/not-ready:NoExecute op=Exists
    node.kubernetes.io/pid-pressure:NoSchedule op=Exists
    node.kubernetes.io/unreachable:NoExecute op=Exists
    node.kubernetes.io/unschedulable:NoSchedule op=Exists
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Normal Scheduled 10m default-scheduler Successfully assigned default/fluentd-ds-tzbqw to ip-##-###-##-###
    Warning FailedMount 10m kubelet MountVolume.SetUp failed for volume "fluentd-conf" : failed to sync configmap cache: timed out waiting for the condition
    Warning FailedCreatePodSandBox 10m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3ac30f1bf7ed618c3f76015d80806cb5bf5cc7714744fed696b2654f2005796f": failed to find network info for sandbox "3ac30f1bf7ed618c3f76015d80806cb5bf5cc7714744fed696b2654f2005796f"
    Warning FailedCreatePodSandBox 9m49s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f741a2b2f91f0d7e9e2b9c306c7a932d320ae6d69ba31fa70d55a973657292b2": failed to find network info for sandbox "f741a2b2f91f0d7e9e2b9c306c7a932d320ae6d69ba31fa70d55a973657292b2"
    Warning FailedCreatePodSandBox 9m36s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e768452f57755756b054aa2e700488e0ace12d3aca567ea40a7967acc13f183f": failed to find network info for sandbox "e768452f57755756b054aa2e700488e0ace12d3aca567ea40a7967acc13f183f"
    Warning FailedCreatePodSandBox 9m24s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5ee1c66e96f93eaa0a01be450ce49d6b8c3207f599d376ec4812b788f3203c1c": failed to find network info for sandbox "5ee1c66e96f93eaa0a01be450ce49d6b8c3207f599d376ec4812b788f3203c1c"
    Warning FailedCreatePodSandBox 9m10s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "194b530ee6e70e08d0f44d6ac349e78044fba6b5837bdcf6377878bc6d98d148": failed to find network info for sandbox "194b530ee6e70e08d0f44d6ac349e78044fba6b5837bdcf6377878bc6d98d148"
    Warning FailedCreatePodSandBox 8m56s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3d0532a468ca557bf72b4d3f6d0177539b077f3d7d637ef84e099cda0e379dee": failed to find network info for sandbox "3d0532a468ca557bf72b4d3f6d0177539b077f3d7d637ef84e099cda0e379dee"
    Warning FailedCreatePodSandBox 8m41s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "47948cc827e4f2a77c1cdfc36a525277595494a3dff8e73fc1a74ce58f4794ab": failed to find network info for sandbox "47948cc827e4f2a77c1cdfc36a525277595494a3dff8e73fc1a74ce58f4794ab"
    Warning FailedCreatePodSandBox 8m26s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "d92b4e4c7ae7b8e40b1c7c2ffbf6f0c4ea455654b08bcaeea20597a8bce95c2f": failed to find network info for sandbox "d92b4e4c7ae7b8e40b1c7c2ffbf6f0c4ea455654b08bcaeea20597a8bce95c2f"
    Warning FailedCreatePodSandBox 8m15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8cb48d3687dcb16e73f1b0b0b29cbe5728c3c90d42a85705485761c2cc9bbfc3": failed to find network info for sandbox "8cb48d3687dcb16e73f1b0b0b29cbe5728c3c90d42a85705485761c2cc9bbfc3"
    Warning FailedCreatePodSandBox 4m40s (x16 over 8m3s) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "769e461876b24b0a83930a286cc4a9313916d85c2872a65af200c82eb7cd9857": failed to find network info for sandbox "769e461876b24b0a83930a286cc4a9313916d85c2872a65af200c82eb7cd9857"

  • ChristianLacsina964
    edited May 2022

    OK, can you run kubectl get nodes, kubectl describe node <name of the node>, and kubectl get pods -n kube-system?

  • joshl

    $ kubectl get nodes
    NAME               STATUS   ROLES           AGE    VERSION
    ip-##-###-##-###   Ready    control-plane   125m   v1.24.0

    $ kubectl describe node ip-##-###-##-###
    Name: ip-##-###-##-###
    Roles: control-plane
    Labels: beta.kubernetes.io/arch=amd64
    beta.kubernetes.io/os=linux
    kubernetes.io/arch=amd64
    kubernetes.io/hostname=ip-##-###-##-###
    kubernetes.io/os=linux
    node-role.kubernetes.io/control-plane=
    node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
    node.alpha.kubernetes.io/ttl: 0
    volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp: Tue, 10 May 2022 20:49:38 +0000
    Taints:
    Unschedulable: false
    Lease:
    HolderIdentity: ip-##-###-##-###
    AcquireTime:
    RenewTime: Tue, 10 May 2022 22:55:26 +0000
    Conditions:
    Type Status LastHeartbeatTime LastTransitionTime Reason Message
    ---- ------ ----------------- ------------------ ------ -------
    NetworkUnavailable False Tue, 10 May 2022 21:14:11 +0000 Tue, 10 May 2022 21:14:11 +0000 WeaveIsUp Weave pod has set this
    MemoryPressure False Tue, 10 May 2022 22:51:21 +0000 Tue, 10 May 2022 20:49:35 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
    DiskPressure False Tue, 10 May 2022 22:51:21 +0000 Tue, 10 May 2022 20:49:35 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
    PIDPressure False Tue, 10 May 2022 22:51:21 +0000 Tue, 10 May 2022 20:49:35 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
    Ready True Tue, 10 May 2022 22:51:21 +0000 Tue, 10 May 2022 21:14:15 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
    Addresses:
    InternalIP: ##.###.##.###
    Hostname: ip-##-###-##-###
    Capacity:
    cpu: 2
    ephemeral-storage: 30428560Ki
    hugepages-2Mi: 0
    memory: 8139472Ki
    pods: 110
    Allocatable:
    cpu: 2
    ephemeral-storage: 28042960850
    hugepages-2Mi: 0
    memory: 8037072Ki
    pods: 110
    System Info:
    Machine ID: 5a1fb6aff7f74c4c824ceecb54c86725
    System UUID: ec2423c5-ec39-cf74-f3ce-77cadc5e1cc6
    Boot ID: eda8788e-0a3c-4dd7-ba06-948853e31fee
    Kernel Version: 5.13.0-1022-aws
    OS Image: Ubuntu 20.04.4 LTS
    Operating System: linux
    Architecture: amd64
    Container Runtime Version: containerd://1.6.4
    Kubelet Version: v1.24.0
    Kube-Proxy Version: v1.24.0
    Non-terminated Pods: (9 in total)
    Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
    --------- ---- ------------ ---------- --------------- ------------- ---
    default fluentd-ds-tzbqw 0 (0%) 0 (0%) 200Mi (2%) 200Mi (2%) 30m
    kube-system coredns-6d4b75cb6d-ng4t6 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 125m
    kube-system coredns-6d4b75cb6d-rvnsh 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 125m
    kube-system etcd-ip-##-###-##-### 100m (5%) 0 (0%) 100Mi (1%) 0 (0%) 125m
    kube-system kube-apiserver-ip-##-###-##-### 250m (12%) 0 (0%) 0 (0%) 0 (0%) 125m
    kube-system kube-controller-manager-ip-##-###-##-### 200m (10%) 0 (0%) 0 (0%) 0 (0%) 125m
    kube-system kube-proxy-fjrl9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 125m
    kube-system kube-scheduler-ip-##-###-##-### 100m (5%) 0 (0%) 0 (0%) 0 (0%) 125m
    kube-system weave-net-fpmpt 100m (5%) 0 (0%) 200Mi (2%) 0 (0%) 101m
    Allocated resources:
    (Total limits may be over 100 percent, i.e., overcommitted.)
    Resource Requests Limits
    -------- -------- ------
    cpu 950m (47%) 0 (0%)
    memory 640Mi (8%) 540Mi (6%)
    ephemeral-storage 0 (0%) 0 (0%)
    hugepages-2Mi 0 (0%) 0 (0%)
    Events:

    $ kubectl get pods -n kube-system
    NAME                                        READY   STATUS              RESTARTS       AGE
    coredns-6d4b75cb6d-ng4t6                    0/1     ContainerCreating   0              125m
    coredns-6d4b75cb6d-rvnsh                    0/1     ContainerCreating   0              125m
    etcd-ip-##-###-##-###                       1/1     Running             0              125m
    kube-apiserver-##-###-##-###                1/1     Running             0              125m
    kube-controller-manager-ip-##-###-##-###    1/1     Running             0              125m
    kube-proxy-fjrl9                            1/1     Running             0              125m
    kube-scheduler-ip-##-###-##-###             1/1     Running             0              125m
    weave-net-fpmpt                             2/2     Running             1 (101m ago)   101m

  • Those coredns pods are having some trouble. Can you describe one of them with kubectl describe pod coredns-...

  • joshl

    I get Error from server (NotFound): pods "coredns-6d4b75cb6d-rvnsh" not found from both commands.

  • Right, you need to add -n kube-system to look those pods up (they exist in the kube-system namespace).

  • joshl

    $ kubectl -n kube-system describe pod coredns-6d4b75cb6d-rvnsh
    Name: coredns-6d4b75cb6d-rvnsh
    Namespace: kube-system
    Priority: 2000000000
    Priority Class Name: system-cluster-critical
    Node: ip-##-###-##-###/##.###.##.###
    Start Time: Tue, 10 May 2022 21:14:15 +0000
    Labels: k8s-app=kube-dns
    pod-template-hash=6d4b75cb6d
    Annotations:
    Status: Pending
    IP:
    IPs:
    Controlled By: ReplicaSet/coredns-6d4b75cb6d
    Containers:
    coredns:
    Container ID:
    Image: k8s.gcr.io/coredns/coredns:v1.8.6
    Image ID:
    Ports: 53/UDP, 53/TCP, 9153/TCP
    Host Ports: 0/UDP, 0/TCP, 0/TCP
    Args:
    -conf
    /etc/coredns/Corefile
    State: Waiting
    Reason: ContainerCreating
    Ready: False
    Restart Count: 0
    Limits:
    memory: 170Mi
    Requests:
    cpu: 100m
    memory: 70Mi
    Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness: http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:
    Mounts:
    /etc/coredns from config-volume (ro)
    /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-srmb5 (ro)
    Conditions:
    Type Status
    Initialized True
    Ready False
    ContainersReady False
    PodScheduled True
    Volumes:
    config-volume:
    Type: ConfigMap (a volume populated by a ConfigMap)
    Name: coredns
    Optional: false
    kube-api-access-srmb5:
    Type: Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds: 3607
    ConfigMapName: kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI: true
    QoS Class: Burstable
    Node-Selectors: kubernetes.io/os=linux
    Tolerations: CriticalAddonsOnly op=Exists
    node-role.kubernetes.io/control-plane:NoSchedule
    node-role.kubernetes.io/master:NoSchedule
    node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
    node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Warning FailedCreatePodSandBox 29s (x559 over 123m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5b25c77911d0171de59c384205e0f655dee36a9b312b35beeb727391902074d8": failed to find network info for sandbox "5b25c77911d0171de59c384205e0f655dee36a9b312b35beeb727391902074d8"

  • At this point we're a bit far into a rabbit hole with containerd and coredns - I think switching back over to Docker might be a quicker path. If you can, start anew with a different machine.

    1. Reset your cluster using sudo kubeadm reset and sudo rm -drf /etc/cni/net.d/
    2. Set up docker to use systemd as its cgroup manager:
    cat <<EOF | sudo tee /etc/docker/daemon.json
    {
      "exec-opts": ["native.cgroupdriver=systemd"]
    }
    EOF
    
    sudo systemctl restart docker
    
    3. Install cri-dockerd (replaces dockershim, which was removed in v1.24)
    VER=$(curl -s https://api.github.com/repos/Mirantis/cri-dockerd/releases/latest|grep tag_name | cut -d '"' -f 4)
    
    wget https://github.com/Mirantis/cri-dockerd/releases/download/${VER}/cri-dockerd-${VER}-linux-amd64.tar.gz
    
    tar xvf cri-dockerd-${VER}-linux-amd64.tar.gz
    
    sudo mv cri-dockerd /usr/local/bin/
    
    sudo wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/50c048cb54e52cd9058f044671e309e9fbda82e4/packaging/systemd/cri-docker.service
    
    sudo wget https://raw.githubusercontent.com/Mirantis/cri-dockerd/50c048cb54e52cd9058f044671e309e9fbda82e4/packaging/systemd/cri-docker.socket
    
    sudo mv cri-docker.socket cri-docker.service /etc/systemd/system/
    
    sudo sed -i -e 's,/usr/bin/cri-dockerd,/usr/local/bin/cri-dockerd,' /etc/systemd/system/cri-docker.service
    
    sudo mkdir -p /etc/systemd/system/cri-docker.service.d/
    
    cat <<EOF | sudo tee /etc/systemd/system/cri-docker.service.d/cni.conf
    [Service]
    ExecStart=
    ExecStart=/usr/local/bin/cri-dockerd --container-runtime-endpoint fd:// --network-plugin=cni --cni-bin-dir=/opt/cni/bin --cni-cache-dir=/var/lib/cni/cache --cni-conf-dir=/etc/cni/net.d
    EOF
    
    sudo systemctl daemon-reload
    
    sudo systemctl enable cri-docker.service
    
    sudo systemctl enable --now cri-docker.socket
    
    4. Re-initialize the cluster with sudo kubeadm init --cri-socket=unix:///run/cri-dockerd.sock (a couple of quick sanity checks follow below)
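
    Before actually running the init in step 4, it may help to sanity-check steps 2 and 3 first (just a sketch; the unit and socket names match what was installed above):

    # Docker should now report systemd as its cgroup driver
    docker info | grep -i 'cgroup driver'

    # The cri-dockerd socket installed above should be active and listening
    systemctl is-active cri-docker.socket
    ls -l /run/cri-dockerd.sock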

    Apologies I could not help you get this current cluster working - the v1.24 update introduced quite a lot of changes that affected this particular lab.

  • joshl

    Thanks for the detailed steps. What steps should I take after these?

  • Complete the rest of step 3 (summarized below):

    mkdir -p $HOME/.kube
    
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    
    kubectl taint nodes --all node-role.kubernetes.io/master- node-role.kubernetes.io/control-plane- 
    
    kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
    

    Then post the results of kubectl get nodes, kubectl describe node <your node>, and kubectl get pods -n kube-system.
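
    While the Weave pods come up, something like this is handy for watching everything in kube-system transition to Running (-w streams updates; Ctrl-C to stop):

    kubectl get pods -n kube-system -w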

  • joshl

    The /etc/kubernetes/admin.conf file does not exist. What shall I do next?

  • It should exist after running sudo kubeadm init --cri-socket=unix:///run/cri-dockerd.sock. Can you post the output of that command?
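
    For reference, one way to re-run the init and confirm the kubeconfig gets created (a sketch; the tee filename is only a suggestion for keeping the output around, and if a previous attempt partially ran, a sudo kubeadm reset may be needed first):

    sudo kubeadm init --cri-socket=unix:///run/cri-dockerd.sock | tee kubeadm-init.out
    ls -l /etc/kubernetes/admin.conf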

  • joshl

    Your last command did the trick. This looks much better. Thank you very much for the help. If you don't mind, could you please recap/summarize what happened and how you resolved it? To be honest, I am a bit lost and trying to figure out whether I messed up a step or whether this is a Kubernetes/Docker thing.

    ubuntu@ip-##-###-##-###:~$ kubectl describe node ip-##-###-##-###
    Name: ip-##-###-##-###
    Roles: control-plane
    Labels: beta.kubernetes.io/arch=amd64
    beta.kubernetes.io/os=linux
    kubernetes.io/arch=amd64
    kubernetes.io/hostname=ip-##-###-##-###
    kubernetes.io/os=linux
    node-role.kubernetes.io/control-plane=
    node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/cri-dockerd.sock
    node.alpha.kubernetes.io/ttl: 0
    volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp: Wed, 11 May 2022 17:32:14 +0000
    Taints:
    Unschedulable: false
    Lease:
    HolderIdentity: ip-##-###-##-###
    AcquireTime:
    RenewTime: Wed, 11 May 2022 17:36:22 +0000
    Conditions:
    Type Status LastHeartbeatTime LastTransitionTime Reason Message
    ---- ------ ----------------- ------------------ ------ -------
    NetworkUnavailable False Wed, 11 May 2022 17:35:55 +0000 Wed, 11 May 2022 17:35:55 +0000 WeaveIsUp Weave pod has set this
    MemoryPressure False Wed, 11 May 2022 17:36:22 +0000 Wed, 11 May 2022 17:32:12 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
    DiskPressure False Wed, 11 May 2022 17:36:22 +0000 Wed, 11 May 2022 17:32:12 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
    PIDPressure False Wed, 11 May 2022 17:36:22 +0000 Wed, 11 May 2022 17:32:12 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
    Ready True Wed, 11 May 2022 17:36:22 +0000 Wed, 11 May 2022 17:36:02 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
    Addresses:
    InternalIP: ##.###.##.###
    Hostname: ip-##-###-##-###
    Capacity:
    cpu: 2
    ephemeral-storage: 30428560Ki
    hugepages-2Mi: 0
    memory: 8139472Ki
    pods: 110
    Allocatable:
    cpu: 2
    ephemeral-storage: 28042960850
    hugepages-2Mi: 0
    memory: 8037072Ki
    pods: 110
    System Info:
    Machine ID: 5a1fb6aff7f74c4c824ceecb54c86725
    System UUID: ec2423c5-ec39-cf74-f3ce-77cadc5e1cc6
    Boot ID: eda8788e-0a3c-4dd7-ba06-948853e31fee
    Kernel Version: 5.13.0-1022-aws
    OS Image: Ubuntu 20.04.4 LTS
    Operating System: linux
    Architecture: amd64
    Container Runtime Version: docker://20.10.15
    Kubelet Version: v1.24.0
    Kube-Proxy Version: v1.24.0
    Non-terminated Pods: (8 in total)
    Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
    --------- ---- ------------ ---------- --------------- ------------- ---
    kube-system coredns-6d4b75cb6d-2r5nk 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 4m5s
    kube-system coredns-6d4b75cb6d-c7cx2 100m (5%) 0 (0%) 70Mi (0%) 170Mi (2%) 4m4s
    kube-system etcd-ip-##-###-##-### 100m (5%) 0 (0%) 100Mi (1%) 0 (0%) 4m12s
    kube-system kube-apiserver-ip-##-###-##-### 250m (12%) 0 (0%) 0 (0%) 0 (0%) 4m10s
    kube-system kube-controller-manager-ip-##-###-##-### 200m (10%) 0 (0%) 0 (0%) 0 (0%) 4m10s
    kube-system kube-proxy-fm2vr 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m5s
    kube-system kube-scheduler-ip-##-###-##-### 100m (5%) 0 (0%) 0 (0%) 0 (0%) 4m12s
    kube-system weave-net-gfffz 100m (5%) 0 (0%) 200Mi (2%) 0 (0%) 41s
    Allocated resources:
    (Total limits may be over 100 percent, i.e., overcommitted.)
    Resource Requests Limits
    -------- -------- ------
    cpu 950m (47%) 0 (0%)
    memory 440Mi (5%) 340Mi (4%)
    ephemeral-storage 0 (0%) 0 (0%)
    hugepages-2Mi 0 (0%) 0 (0%)
    Events:
    Type Reason Age From Message
    ---- ------ ---- ---- -------
    Normal Starting 4m3s kube-proxy
    Normal NodeHasSufficientMemory 4m22s (x5 over 4m22s) kubelet Node ip-##-###-##-### status is now: NodeHasSufficientMemory
    Normal NodeHasNoDiskPressure 4m22s (x4 over 4m22s) kubelet Node ip-##-###-##-### status is now: NodeHasNoDiskPressure
    Normal NodeHasSufficientPID 4m22s (x4 over 4m22s) kubelet Node ip-##-###-##-### status is now: NodeHasSufficientPID
    Normal NodeHasSufficientMemory 4m11s kubelet Node ip-##-###-##-### status is now: NodeHasSufficientMemory
    Warning InvalidDiskCapacity 4m11s kubelet invalid capacity 0 on image filesystem
    Normal NodeHasNoDiskPressure 4m11s kubelet Node ip-##-###-##-### status is now: NodeHasNoDiskPressure
    Normal NodeHasSufficientPID 4m11s kubelet Node ip-##-###-##-### status is now: NodeHasSufficientPID
    Normal NodeAllocatableEnforced 4m11s kubelet Updated Node Allocatable limit across pods
    Normal Starting 4m11s kubelet Starting kubelet.
    Normal RegisteredNode 4m6s node-controller Node ip-##-###-##-### event: Registered Node ip-##-###-##-### in Controller
    Normal NodeReady 26s kubelet Node ip-##-###-##-### status is now: NodeReady

    ubuntu@ip-##-###-##-###:~$ kubectl get pods -n kube-system
    NAME                                        READY   STATUS    RESTARTS   AGE
    coredns-6d4b75cb6d-2r5nk                    1/1     Running   0          4m24s
    coredns-6d4b75cb6d-c7cx2                    1/1     Running   0          4m23s
    etcd-ip-##-###-##-###                       1/1     Running   1          4m31s
    kube-apiserver-ip-##-###-##-###             1/1     Running   1          4m29s
    kube-controller-manager-ip-##-###-##-###    1/1     Running   1          4m29s
    kube-proxy-fm2vr                            1/1     Running   0          4m24s
    kube-scheduler-ip-##-###-##-###             1/1     Running   1          4m31s
    weave-net-gfffz                             2/2     Running   0          60s

    ubuntu@ip-##-###-##-###:~$ kubectl cluster-info
    Kubernetes control plane is running at https://##.###.##.###:6443
    CoreDNS is running at https://##.###.##.###:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

    To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

  • ChristianLacsina964
    edited May 2022

    Sure, what you faced was a Kubernetes/Docker issue. You did not miss any steps. I will try to be brief:

    • Kubernetes 1.24 removed the in-code interface (known as Dockershim) that allowed Kubernetes to use Docker, which the lab relied on
    • Your original fix, I believe, had Kubernetes use containerd, which gets installed alongside Docker when you use the get.docker.com script
    • Part of setting up a container runtime to work with Kubernetes involves telling the container runtime to implement a networking strategy (usually CNI) - I do not think that was accounted for during the fix and this caused the network to not initialize properly
    • To fix this, I had you install cri-dockerd, which is the same Dockershim code (as far as I know) but now hosted and maintained outside of the Kubernetes codebase (https://github.com/Mirantis/cri-dockerd). This allows Kubernetes to properly interact with Docker just like we originally wrote the lab to, and we also had it set up to use CNI (its service has --network-plugin=cni for this)
    • After installing cri-dockerd I had you re-init the cluster with Docker as the underlying container runtime through the cri-dockerd socket

    To summarize: it seems containerd was not properly configured to support the Container Network Interface (CNI) plugin, which caused coredns to fail. Reinitializing the cluster to use Docker, with CNI support properly enabled through cri-dockerd, was the fix.
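
    If you ever want to double-check which runtime socket a kubeadm node was set up with (the same annotation that shows up in the kubectl describe node output above), a quick sketch:

    # Shows unix:///run/cri-dockerd.sock on the rebuilt node,
    # versus unix:///var/run/containerd/containerd.sock before
    kubectl describe node | grep cri-socket

    # The node's container runtime also shows up in the wide node listing
    kubectl get nodes -o wide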

  • joshl

    Thank you very much again for the clarification. Greatly appreciate the help and quick responses.

  • joshl

    One more thing: will the lab manual be updated? If I redo the lab some time in the future, I want to make sure I perform the correct steps, because I am sure I will forget what happened =)
    Once it is updated, I would like to get the latest copy.

  • There is a bit of a process I need to follow to get an update out there; I'll see if I can get that started.

    In the meantime, I'll pin this post or use it as a reference. Thanks for sticking with it! If you have any more issues or questions, please continue to post on this forum.
