Lab 4.5 stress.yaml on worker node not working

Hello,
I've been stuck on this one for days, because the stress.yaml Deployment is not working on the worker node. Without the nodeSelector edit it runs correctly on my master node, but if I put the worker node name as the selector it can be deployed, yet the pod remains in Pending.
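For reference, this is roughly the change I made in stress.yaml (only the relevant fragment is shown; "worker-node" is simply the node name reported by kubectl get nodes):

```yaml
# Sketch of the Deployment's pod template fragment in stress.yaml -
# the rest of the manifest is unchanged from the lab guide.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-node
```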

This is probably due to the fact that my worker node remains in the "NotReady" state, even though from the worker node I can run all the commands against the kube-apiserver without problems. Both machines are configured with 8 GB of RAM and 20 GB of disk space.
I disabled the firewall, the IPs are static, promiscuous mode is enabled, and I set a single bridged adapter with "Allow All", but I still wasn't able to overcome this error.
The logs on the worker node show that the node was registered correctly.

Thank you

Comments

  • Posts: 2,441
    edited August 2024

    Hi @alessandro.cinelli,

    A cluster node in the "NotReady" state, as you noted, implies a cluster that has not been fully bootstrapped and/or configured, or that certain readiness conditions are no longer met as a result of unfavorable cluster events. Being able to run certain commands from the worker node against the API server is not an indication of either node's readiness for scheduling purposes.

    To get a better picture of your cluster, please provide the outputs (as code formatted text, not as screenshots) of the following commands:

    1. kubectl get nodes -o wide
    2. kubectl get pods -A -o wide
    3. kubectl describe node cp-node-name
    4. kubectl describe node worker-node-name

    Regards,
    -Chris

  • Hello Chris,
    thank you for your time.


    As you can see here, most of the pods are stuck on both the master and the worker node.


    Thank you

  • Posts: 2,441

    Hi @alessandro.cinelli,

    Let's clean up your cluster by removing the basicpod Pod. Also remove the nginx, try1 and stressmeout Deployments. After their respective Pods are completely removed and their resources released, please provide the outputs of the 4 commands I requested earlier (as code formatted text for better readability, NOT screenshots).
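    Something along these lines should do it (assuming the default namespace and these exact resource names):

    ```bash
    kubectl delete pod basicpod
    kubectl delete deployment nginx try1 stressmeout
    ```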

    In addition, provide the output of
    kubectl get svc,ap -A

    Regards,
    -Chris

  • Hello @chrispokorni ,
    I had to force delete all the pods because they got stuck in the "Terminating" state for some reason.
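    For the record, this is roughly what I ran for each stuck pod (the pod name is just a placeholder):

    ```bash
    kubectl delete pod <pod-name> --grace-period=0 --force
    ```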

    kubectl get nodes -o wide

    NAME          STATUS     ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
    master-node   Ready      control-plane   12d   v1.30.1   192.168.1.247   <none>        Ubuntu 20.04.6 LTS   5.4.0-192-generic   containerd://1.7.19
    worker-node   NotReady   <none>          12d   v1.30.1   192.168.1.246   <none>        Ubuntu 20.04.6 LTS   5.4.0-192-generic   containerd://1.7.19

    kubectl get pods -A -o wide

    NAMESPACE     NAME                                  READY   STATUS        RESTARTS        AGE     IP              NODE          NOMINATED NODE   READINESS GATES
    default       registry-7f7bcbd5bb-kjphz             0/1     Pending       0               6m20s   <none>          <none>        <none>           <none>
    default       registry-7f7bcbd5bb-qlx6g             0/1     Error         0               17h     <none>          master-node   <none>           <none>
    default       registry-7f7bcbd5bb-x9xg5             1/1     Terminating   0               3d20h   10.0.1.13       worker-node   <none>           <none>
    kube-system   cilium-89jvg                          1/1     Running       145 (10d ago)   12d     192.168.1.246   worker-node   <none>           <none>
    kube-system   cilium-envoy-mccfq                    1/1     Running       11 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   cilium-envoy-rk8jj                    1/1     Running       3 (5d21h ago)   12d     192.168.1.246   worker-node   <none>           <none>
    kube-system   cilium-operator-7ddc48bb97-4b69m      1/1     Running       45 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   cilium-pv27t                          1/1     Running       12 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   coredns-7db6d8ff4d-jrn5s              1/1     Running       1 (7m3s ago)    17h     10.0.0.238      master-node   <none>           <none>
    kube-system   coredns-7db6d8ff4d-pmp9m              1/1     Running       1 (7m3s ago)    17h     10.0.0.90       master-node   <none>           <none>
    kube-system   etcd-master-node                      1/1     Running       17 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-apiserver-master-node            1/1     Running       16 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-controller-manager-master-node   1/1     Running       42 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-proxy-fqxsf                      1/1     Running       11 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-proxy-pjrjm                      1/1     Running       3 (5d21h ago)   12d     192.168.1.246   worker-node   <none>           <none>
    kube-system   kube-scheduler-master-node            1/1     Running       45 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
  • kubectl describe node cp-node-name

    Name:               master-node
    Roles:              control-plane
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=master-node
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/control-plane=
                        node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Fri, 09 Aug 2024 17:07:12 +0000
    Taints:             node.kubernetes.io/disk-pressure:NoSchedule
    Unschedulable:      false
    Lease:
      HolderIdentity:  master-node
      AcquireTime:     <unset>
      RenewTime:       Thu, 22 Aug 2024 13:55:45 +0000
    Conditions:
      Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
      ----                 ------  -----------------                 ------------------                ------                       -------
      NetworkUnavailable   False   Fri, 09 Aug 2024 17:08:38 +0000   Fri, 09 Aug 2024 17:08:38 +0000   CiliumIsUp                   Cilium is running on this node
      MemoryPressure       False   Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:45:53 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
      DiskPressure         True    Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:48:11 +0000   KubeletHasDiskPressure       kubelet has disk pressure
      PIDPressure          False   Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:45:53 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
      Ready                True    Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:46:04 +0000   KubeletReady                 kubelet is posting ready status
    Addresses:
      InternalIP:  192.168.1.247
      Hostname:    master-node
    Capacity:
      cpu:                2
      ephemeral-storage:  10218772Ki
      hugepages-2Mi:      0
      memory:             8136660Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9417620260
      hugepages-2Mi:      0
      memory:             8034260Ki
      pods:               110
    System Info:
      Machine ID:                 36c8c7fa32cb4042b079d8b23e47e39b
      System UUID:                8a163bd9-1515-0f4b-b635-f21ee64703ac
      Boot ID:                    3ef750e2-a915-4953-9a2c-15784d2a6cc8
      Kernel Version:             5.4.0-192-generic
      OS Image:                   Ubuntu 20.04.6 LTS
      Operating System:           linux
      Architecture:               amd64
      Container Runtime Version:  containerd://1.7.19
      Kubelet Version:            v1.30.1
      Kube-Proxy Version:         v1.30.1
    PodCIDR:      10.0.0.0/24
    PodCIDRs:     10.0.0.0/24
    Non-terminated Pods:          (10 in total)
      Namespace    Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
      ---------    ----                                  ------------  ----------  ---------------  -------------  ---
      kube-system  cilium-envoy-mccfq                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  cilium-operator-7ddc48bb97-4b69m      0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  cilium-pv27t                          100m (5%)     0 (0%)      10Mi (0%)        0 (0%)         12d
      kube-system  coredns-7db6d8ff4d-jrn5s              100m (5%)     0 (0%)      70Mi (0%)        170Mi (2%)     17h
      kube-system  coredns-7db6d8ff4d-pmp9m              100m (5%)     0 (0%)      70Mi (0%)        170Mi (2%)     17h
      kube-system  etcd-master-node                      100m (5%)     0 (0%)      100Mi (1%)       0 (0%)         12d
      kube-system  kube-apiserver-master-node            250m (12%)    0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-controller-manager-master-node   200m (10%)    0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-proxy-fqxsf                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-scheduler-master-node            100m (5%)     0 (0%)      0 (0%)           0 (0%)         12d
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests    Limits
      --------           --------    ------
      cpu                950m (47%)  0 (0%)
      memory             250Mi (3%)  340Mi (4%)
      ephemeral-storage  0 (0%)      0 (0%)
      hugepages-2Mi      0 (0%)      0 (0%)
    Events:
      Type     Reason                   Age                    From             Message
      ----     ------                   ----                   ----             -------
      Normal   Starting                 7m49s                  kube-proxy
      Normal   NodeHasSufficientMemory  10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasSufficientMemory
      Normal   NodeHasNoDiskPressure    10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasNoDiskPressure
      Normal   NodeHasSufficientPID     10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasSufficientPID
      Warning  FreeDiskSpaceFailed      9m59s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 462125465 bytes, but only found 0 bytes eligible to free.
      Warning  ImageGCFailed            9m59s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 462125465 bytes, but only found 0 bytes eligible to free.
      Normal   NodeReady                9m49s (x35 over 22h)   kubelet          Node master-node status is now: NodeReady
      Warning  EvictionThresholdMet     9m41s                  kubelet          Attempting to reclaim ephemeral-storage
      Normal   Starting                 7m58s                  kubelet          Starting kubelet.
      Warning  InvalidDiskCapacity      7m58s                  kubelet          invalid capacity 0 on image filesystem
      Normal   NodeHasSufficientMemory  7m58s (x8 over 7m58s)  kubelet          Node master-node status is now: NodeHasSufficientMemory
      Normal   NodeHasNoDiskPressure    7m58s (x7 over 7m58s)  kubelet          Node master-node status is now: NodeHasNoDiskPressure
      Normal   NodeHasSufficientPID     7m58s (x7 over 7m58s)  kubelet          Node master-node status is now: NodeHasSufficientPID
      Normal   NodeAllocatableEnforced  7m58s                  kubelet          Updated Node Allocatable limit across pods
      Normal   RegisteredNode           7m17s                  node-controller  Node master-node event: Registered Node master-node in Controller
      Warning  FreeDiskSpaceFailed      2m56s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 614336921 bytes, but only found 0 bytes eligible to free.
  • kubectl describe node worker-node-name

    Name:               worker-node
    Roles:              <none>
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=worker-node
                        kubernetes.io/os=linux
    Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Fri, 09 Aug 2024 17:53:28 +0000
    Taints:             node.kubernetes.io/unreachable:NoExecute
                        node.cilium.io/agent-not-ready:NoSchedule
                        node.kubernetes.io/unreachable:NoSchedule
    Unschedulable:      false
    Lease:
      HolderIdentity:  worker-node
      AcquireTime:     <unset>
      RenewTime:       Sun, 18 Aug 2024 20:02:33 +0000
    Conditions:
      Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason              Message
      ----                 ------   -----------------                 ------------------                ------              -------
      NetworkUnavailable   False    Fri, 09 Aug 2024 17:54:51 +0000   Fri, 09 Aug 2024 17:54:51 +0000   CiliumIsUp          Cilium is running on this node
      MemoryPressure       Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
      DiskPressure         Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
      PIDPressure          Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
      Ready                Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
    Addresses:
      InternalIP:  192.168.1.246
      Hostname:    worker-node
    Capacity:
      cpu:                2
      ephemeral-storage:  10206484Ki
      hugepages-2Mi:      0
      memory:             4014036Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9406295639
      hugepages-2Mi:      0
      memory:             3911636Ki
      pods:               110
    System Info:
      Machine ID:                 082e100535c54c5986ddff0a8176ab60
      System UUID:                ffedeeca-323b-884d-a0fe-9218f3961f9a
      Boot ID:                    0ef64691-265d-4e12-bbbc-46a80c288f22
      Kernel Version:             5.4.0-192-generic
      OS Image:                   Ubuntu 20.04.6 LTS
      Operating System:           linux
      Architecture:               amd64
      Container Runtime Version:  containerd://1.7.19
      Kubelet Version:            v1.30.1
      Kube-Proxy Version:         v1.30.1
    PodCIDR:      10.0.1.0/24
    PodCIDRs:     10.0.1.0/24
    Non-terminated Pods:          (4 in total)
      Namespace    Name                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
      ---------    ----                       ------------  ----------  ---------------  -------------  ---
      default      registry-7f7bcbd5bb-x9xg5  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d20h
      kube-system  cilium-89jvg               100m (5%)     0 (0%)      10Mi (0%)        0 (0%)         12d
      kube-system  cilium-envoy-rk8jj         0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-proxy-pjrjm           0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests   Limits
      --------           --------   ------
      cpu                100m (5%)  0 (0%)
      memory             10Mi (0%)  0 (0%)
      ephemeral-storage  0 (0%)     0 (0%)
      hugepages-2Mi      0 (0%)     0 (0%)
    Events:
      Type    Reason          Age   From             Message
      ----    ------          ----  ----             -------
      Normal  RegisteredNode  8m3s  node-controller  Node worker-node event: Registered Node worker-node in Controller

    kubectl get svc,ap -A
    error: the server doesn't have a resource type "ap" (probably a typo?)

  • Posts: 2,441

    Hi @alessandro.cinelli,

    Yes, it was a typo on my part... it was meant to be svc,ep for Services and Endpoints :)
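    That is, the intended command was:

    ```bash
    kubectl get svc,ep -A
    ```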

    It is unclear to me why the registry Deployment shows 3 replicas, the same as the nginx Deployment we removed earlier. Perhaps a describe of those Pods reveals why they are not in Running state; then try removing the registry Deployment as well (it may require a force deletion of some of its Pods).
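    For example, something along these lines (the pod name below is one of yours from the earlier output):

    ```bash
    kubectl describe pod registry-7f7bcbd5bb-kjphz
    kubectl delete deployment registry
    ```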

    It is also odd that Kubelet is not able to complete its Garbage collection cycle to reclaim disk space. What do the Kubelet logs show on the control plane node?
    journalctl -u kubelet | less

    What images are stored on the control plane node?
    sudo podman images
    sudo crictl images

    How is disk allocated to the VMs (pre-allocated full size)?

    Regards,
    -Chris

  • Hi @chrispokorni
    journalctl -u kubelet | less
    I took the last lines
    d" interval="800ms" Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.611796 684 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"fca7dd05dc29cb147dc0e5115690f4a297601411d4d8fd37919c7e9a19b4b212\": not found" podSandboxID="fca7dd05dc29cb147dc0e5115690f4a297601411d4d8fd37919c7e9a19b4b212" Aug 10 09:06:06 master-node kubelet[684]: I0810 09:06:06.613663 684 kubelet_node_status.go:73] "Attempting to register node" node="master-node" Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.613868 684 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://192.168.1.247:6443/api/v1/nodes\": dial tcp 192.168.1.247:6443: connect: connection refused" node="master-node" Aug 10 09:06:06 master-node kubelet[684]: W0810 09:06:06.681079 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Service: Get "https://192.168.1.247:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.681147 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.1.247:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:06 master-node kubelet[684]: W0810 09:06:06.994227 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.994318 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: W0810 09:06:07.191887 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.CSIDriver: Get "https://192.168.1.247:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.199179 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://192.168.1.247:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.287256 684 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.1.247:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-node?timeout=10s\": dial tcp 192.168.1.247:6443: connect: connection refused" interval="1.6s" Aug 10 09:06:07 master-node kubelet[684]: W0810 09:06:07.292373 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.RuntimeClass: Get "https://192.168.1.247:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.292469 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to 
watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://192.168.1.247:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.419159 684 kubelet_node_status.go:73] "Attempting to register node" node="master-node" Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.419953 684 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://192.168.1.247:6443/api/v1/nodes\": dial tcp 192.168.1.247:6443: connect: connection refused" node="master-node" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.587543 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-controller-manager-master-node" containerName="kube-controller-manager" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.588138 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-apiserver-master-node" containerName="kube-apiserver" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.602022 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/etcd-master-node" containerName="etcd" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.619495 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-scheduler-master-node" containerName="kube-scheduler" Aug 10 09:06:08 master-node kubelet[684]: W0810 09:06:08.636848 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:08 master-node kubelet[684]: E0810 09:06:08.636933 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name=master-node&amp;limit=500&amp;resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
    Aug 10 09:06:08 master-node kubelet[684]: E0810 09:06:08.888865 684 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.1.247:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-node?timeout=10s\": dial tcp 192.168.1.247:6443:

  • Posts: 17
    edited August 2024

    sudo podman images

    WARN[0000] Using cgroups-v1 which is deprecated in favor of cgroups-v2 with Podman v5 and will be removed in a future version. Set environment variable `PODMAN_IGNORE_CGROUPSV1_WARNING` to hide this warning.
    REPOSITORY                   TAG     IMAGE ID      CREATED      SIZE
    10.97.40.62:5000/simpleapp   latest  ad2f4faa05bd  4 days ago   1.04 GB
    localhost/simpleapp          latest  ad2f4faa05bd  4 days ago   1.04 GB
    docker.io/library/python     3       0218518c77be  2 weeks ago  1.04 GB
    10.97.40.62:5000/tagtest     latest  324bc02ae123  4 weeks ago  8.08 MB

    sudo crictl images

    IMAGE                                     TAG       IMAGE ID       SIZE
    quay.io/cilium/cilium-envoy               <none>    b9d596d6e2d4f  62.1MB
    quay.io/cilium/cilium                     <none>    1e01581279341  223MB
    quay.io/cilium/operator-generic           <none>    e7e6117055af8  31.1MB
    registry.k8s.io/coredns/coredns           v1.11.1   cbb01a7bd410d  18.2MB
    registry.k8s.io/etcd                      3.5.12-0  3861cfcd7c04c  57.2MB
    registry.k8s.io/kube-apiserver            v1.30.1   91be940803172  32.8MB
    registry.k8s.io/kube-controller-manager   v1.30.1   25a1387cdab82  31.1MB
    registry.k8s.io/kube-proxy                v1.30.1   747097150317f  29MB
    registry.k8s.io/kube-scheduler            v1.30.1   a52dc94f0a912  19.3MB
    registry.k8s.io/pause                     3.8       4873874c08efc  311kB

    The disk is supposed to be dynamically allocated actually.

    Thank you again

  • Posts: 2,441

    Hi @alessandro.cinelli,

    The disk is supposed to be dynamically allocated actually.

    This is your problem. The kubelet only sees the disk space that is currently allocated, not the maximum size the virtual disk can grow to.
    Please fully allocate the disk for both VMs to prevent kubelet panics.
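    Once the disks have been re-provisioned, you can verify what the node actually reports with something like:

    ```bash
    df -h /var/lib/kubelet                                      # filesystem size the kubelet works with
    kubectl describe node master-node | grep -i diskpressure    # the condition should go back to False
    ```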

    Regards,
    -Chris

  • Hi @chrispokorni,
    as you suggested, I changed the disk to a pre-allocated one, but I am still getting the same issue. Is there something else I'm supposed to do?

  • Posts: 2,441

    Hi @alessandro.cinelli,

    Have the VMs been restarted? What are the ephemeral-storage values under Capacity and Allocatable respectively when describing the 2 nodes?
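    You can pull just those values out with something like:

    ```bash
    kubectl describe node master-node | grep -A 5 -E "^Capacity|^Allocatable"
    kubectl describe node worker-node | grep -A 5 -E "^Capacity|^Allocatable"
    ```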

    Regards,
    -Chris

  • Hi @chrispokorni ,
    yes, the VMs have been restarted.

    Worker node:

    Capacity:
      cpu:                2
      ephemeral-storage:  10206484Ki
      hugepages-2Mi:      0
      memory:             4014036Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9406295639
      hugepages-2Mi:      0
      memory:             3911636Ki
      pods:               110

    Master Node

    Capacity:
      cpu:                2
      ephemeral-storage:  10218772Ki
      hugepages-2Mi:      0
      memory:             8136664Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9417620260
      hugepages-2Mi:      0
      memory:             8034264Ki
      pods:               110

    But if I look at the latest events on the master node I can see this, and probably I shouldn't:

    Warning  InvalidDiskCapacity      5m32s                  kubelet          invalid capacity 0 on image filesystem
    Normal   NodeHasSufficientMemory  5m32s (x8 over 5m32s)  kubelet          Node master-node status is now: NodeHasSufficientMemory
    Normal   NodeHasNoDiskPressure    5m32s (x7 over 5m32s)  kubelet          Node master-node status is now: NodeHasNoDiskPressure
    Normal   NodeHasSufficientPID     5m32s (x7 over 5m32s)  kubelet          Node master-node status is now: NodeHasSufficientPID
    Normal   NodeAllocatableEnforced  5m32s                  kubelet          Updated Node Allocatable limit across pods
    Normal   RegisteredNode           2m38s                  node-controller  Node master-node event: Registered Node master-node in Controller
    Warning  FreeDiskSpaceFailed      30s                    kubelet          Failed to garbage collect required amount of images. Attempted to free 722049433 bytes, but only found 0 bytes eligible to free.
  • Hi @alessandro.cinelli,

    What is the size of the vdisk on each VM?
    According to your output, the vdisks seem to be about 10 GB each. The lab guide recommendation is 20+ GB per VM. For earlier Kubernetes releases, 10 GB per VM used to be just enough to run the lab exercises - Kubernetes required less disk space, and the container images were somewhat smaller.
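    From inside each VM you can compare the virtual disk size with the filesystem size with something like:

    ```bash
    lsblk             # size of the virtual disk and its partitions
    df -h --total     # size of the mounted filesystems
    ```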

    Regards,
    -Chris

  • Hi @chrispokorni
    It's actually 20 GB for both the master and the worker node, but it still says only 10 GB of space is available.

  • Hi @alessandro.cinelli,

    The key here is what the kubelet node agent sees. If it sees 10 GB, it only works with 10 GB. It seems to be unaware of the additional 10 GB of storage.

    I'd be curious what the regular (non-root) user sees on the guest OS with df -h --total.

    If the vdisks were extended after the OS installation, then this behavior would be expected, as the file system would be unaware of the additional vdisk space as well, requiring a file system resize to expand to the available vdisk size.
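    On Ubuntu that usually means growing the partition and then the filesystem, roughly along these lines (the device and partition names are only examples - check yours with lsblk first, and the steps differ if the root filesystem is on LVM):

    ```bash
    sudo growpart /dev/sda 1     # grow partition 1 of /dev/sda to fill the disk (growpart comes from the cloud-guest-utils package)
    sudo resize2fs /dev/sda1     # grow the ext4 filesystem to fill the resized partition
    ```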

    Regards,
    -Chris

  • Hello @chrispokorni ,
    Thank you for helping me out again. I ended up redoing everything from scratch, since there was something wrong with my worker VM and its virtual drive, but I finally managed to solve the issue and can move forward with the remaining steps of the lab.

    Thank you
