Lab 4.5 stress.yaml on worker node not working

Hello,
I've been stuck on this one for days, because the stress.yaml Deployment is not working on the worker node. Without the nodeSelector edit it runs correctly on my master node, but if I put the worker node name as the selector it can be deployed, yet the pod remains in Pending.
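For reference, this is roughly the change I made in stress.yaml (only the relevant fragment is shown; "worker-node" is simply the node name reported by kubectl get nodes):

```yaml
# Sketch of the Deployment's pod template fragment in stress.yaml -
# the rest of the manifest is unchanged from the lab guide.
spec:
  template:
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-node
```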

This is probably due to the fact that my worker node remains in the "NotReady" state, even though from the worker node I can run all the commands against the kube-apiserver without problems. Both machines are configured with 8 GB of RAM and 20 GB of disk space.
I disabled the firewall, the IPs are static, promiscuous mode is enabled, and I set a single bridged adapter with "Allow All", but I still wasn't able to overcome this error.
The logs on the worker node show that the node was registered correctly.

Thank you

Comments

  • Posts: 2,441
    edited August 2024

    Hi @alessandro.cinelli,

    A cluster node in the "NotReady" state, as you noted, implies a cluster that has not been fully bootstrapped and/or configured, or that certain readiness conditions are no longer met as a result of unfavorable cluster events. Being able to run certain commands from the worker node against the API server is not an indication of either node's readiness for scheduling purposes.

    To get a better picture of your cluster, please provide the outputs (as code formatted text, not as screenshots) of the following commands:

    1. kubectl get nodes -o wide
    2. kubectl get pods -A -o wide
    3. kubectl describe node cp-node-name
    4. kubectl describe node worker-node-name

    Regards,
    -Chris

  • Hello Chris,
    thank you for your time.


    As you can see here, most of the pods are stuck on both the master and the worker node.


    Thank you

  • Posts: 2,441

    Hi @alessandro.cinelli,

    Let's clean up your cluster by removing the basicpod Pod. Also remove the nginx, try1 and stressmeout Deployments. After their respective Pods are completely removed and their resources released, please provide the outputs of the 4 commands I requested earlier (as code formatted text for better readability, NOT screenshots).
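    Something along these lines should do it (assuming the default namespace and these exact resource names):

    ```bash
    kubectl delete pod basicpod
    kubectl delete deployment nginx try1 stressmeout
    ```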

    In addition, provide the output of
    kubectl get svc,ap -A

    Regards,
    -Chris

  • Hello @chrispokorni ,
    I had to force delete all the pods because they got stuck in the "Terminating" state for some reason.
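    For the record, this is roughly what I ran for each stuck pod (the pod name is just a placeholder):

    ```bash
    kubectl delete pod <pod-name> --grace-period=0 --force
    ```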

    kubectl get nodes -o wide

    NAME          STATUS     ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
    master-node   Ready      control-plane   12d   v1.30.1   192.168.1.247   <none>        Ubuntu 20.04.6 LTS   5.4.0-192-generic   containerd://1.7.19
    worker-node   NotReady   <none>          12d   v1.30.1   192.168.1.246   <none>        Ubuntu 20.04.6 LTS   5.4.0-192-generic   containerd://1.7.19

    kubectl get pods -A -o wide

    NAMESPACE     NAME                                  READY   STATUS        RESTARTS        AGE     IP              NODE          NOMINATED NODE   READINESS GATES
    default       registry-7f7bcbd5bb-kjphz             0/1     Pending       0               6m20s   <none>          <none>        <none>           <none>
    default       registry-7f7bcbd5bb-qlx6g             0/1     Error         0               17h     <none>          master-node   <none>           <none>
    default       registry-7f7bcbd5bb-x9xg5             1/1     Terminating   0               3d20h   10.0.1.13       worker-node   <none>           <none>
    kube-system   cilium-89jvg                          1/1     Running       145 (10d ago)   12d     192.168.1.246   worker-node   <none>           <none>
    kube-system   cilium-envoy-mccfq                    1/1     Running       11 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   cilium-envoy-rk8jj                    1/1     Running       3 (5d21h ago)   12d     192.168.1.246   worker-node   <none>           <none>
    kube-system   cilium-operator-7ddc48bb97-4b69m      1/1     Running       45 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   cilium-pv27t                          1/1     Running       12 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   coredns-7db6d8ff4d-jrn5s              1/1     Running       1 (7m3s ago)    17h     10.0.0.238      master-node   <none>           <none>
    kube-system   coredns-7db6d8ff4d-pmp9m              1/1     Running       1 (7m3s ago)    17h     10.0.0.90       master-node   <none>           <none>
    kube-system   etcd-master-node                      1/1     Running       17 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-apiserver-master-node            1/1     Running       16 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-controller-manager-master-node   1/1     Running       42 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-proxy-fqxsf                      1/1     Running       11 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
    kube-system   kube-proxy-pjrjm                      1/1     Running       3 (5d21h ago)   12d     192.168.1.246   worker-node   <none>           <none>
    kube-system   kube-scheduler-master-node            1/1     Running       45 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
  • kubectl describe node cp-node-name

    Name:               master-node
    Roles:              control-plane
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=master-node
                        kubernetes.io/os=linux
                        node-role.kubernetes.io/control-plane=
                        node.kubernetes.io/exclude-from-external-load-balancers=
    Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Fri, 09 Aug 2024 17:07:12 +0000
    Taints:             node.kubernetes.io/disk-pressure:NoSchedule
    Unschedulable:      false
    Lease:
      HolderIdentity:  master-node
      AcquireTime:     <unset>
      RenewTime:       Thu, 22 Aug 2024 13:55:45 +0000
    Conditions:
      Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
      ----                 ------  -----------------                 ------------------                ------                       -------
      NetworkUnavailable   False   Fri, 09 Aug 2024 17:08:38 +0000   Fri, 09 Aug 2024 17:08:38 +0000   CiliumIsUp                   Cilium is running on this node
      MemoryPressure       False   Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:45:53 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
      DiskPressure         True    Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:48:11 +0000   KubeletHasDiskPressure       kubelet has disk pressure
      PIDPressure          False   Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:45:53 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
      Ready                True    Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:46:04 +0000   KubeletReady                 kubelet is posting ready status
    Addresses:
      InternalIP:  192.168.1.247
      Hostname:    master-node
    Capacity:
      cpu:                2
      ephemeral-storage:  10218772Ki
      hugepages-2Mi:      0
      memory:             8136660Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9417620260
      hugepages-2Mi:      0
      memory:             8034260Ki
      pods:               110
    System Info:
      Machine ID:                 36c8c7fa32cb4042b079d8b23e47e39b
      System UUID:                8a163bd9-1515-0f4b-b635-f21ee64703ac
      Boot ID:                    3ef750e2-a915-4953-9a2c-15784d2a6cc8
      Kernel Version:             5.4.0-192-generic
      OS Image:                   Ubuntu 20.04.6 LTS
      Operating System:           linux
      Architecture:               amd64
      Container Runtime Version:  containerd://1.7.19
      Kubelet Version:            v1.30.1
      Kube-Proxy Version:         v1.30.1
    PodCIDR:      10.0.0.0/24
    PodCIDRs:     10.0.0.0/24
    Non-terminated Pods:          (10 in total)
      Namespace    Name                                  CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
      ---------    ----                                  ------------  ----------  ---------------  -------------  ---
      kube-system  cilium-envoy-mccfq                    0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  cilium-operator-7ddc48bb97-4b69m      0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  cilium-pv27t                          100m (5%)     0 (0%)      10Mi (0%)        0 (0%)         12d
      kube-system  coredns-7db6d8ff4d-jrn5s              100m (5%)     0 (0%)      70Mi (0%)        170Mi (2%)     17h
      kube-system  coredns-7db6d8ff4d-pmp9m              100m (5%)     0 (0%)      70Mi (0%)        170Mi (2%)     17h
      kube-system  etcd-master-node                      100m (5%)     0 (0%)      100Mi (1%)       0 (0%)         12d
      kube-system  kube-apiserver-master-node            250m (12%)    0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-controller-manager-master-node   200m (10%)    0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-proxy-fqxsf                      0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-scheduler-master-node            100m (5%)     0 (0%)      0 (0%)           0 (0%)         12d
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests    Limits
      --------           --------    ------
      cpu                950m (47%)  0 (0%)
      memory             250Mi (3%)  340Mi (4%)
      ephemeral-storage  0 (0%)      0 (0%)
      hugepages-2Mi      0 (0%)      0 (0%)
    Events:
      Type     Reason                   Age                    From             Message
      ----     ------                   ----                   ----             -------
      Normal   Starting                 7m49s                  kube-proxy
      Normal   NodeHasSufficientMemory  10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasSufficientMemory
      Normal   NodeHasNoDiskPressure    10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasNoDiskPressure
      Normal   NodeHasSufficientPID     10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasSufficientPID
      Warning  FreeDiskSpaceFailed      9m59s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 462125465 bytes, but only found 0 bytes eligible to free.
      Warning  ImageGCFailed            9m59s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 462125465 bytes, but only found 0 bytes eligible to free.
      Normal   NodeReady                9m49s (x35 over 22h)   kubelet          Node master-node status is now: NodeReady
      Warning  EvictionThresholdMet     9m41s                  kubelet          Attempting to reclaim ephemeral-storage
      Normal   Starting                 7m58s                  kubelet          Starting kubelet.
      Warning  InvalidDiskCapacity      7m58s                  kubelet          invalid capacity 0 on image filesystem
      Normal   NodeHasSufficientMemory  7m58s (x8 over 7m58s)  kubelet          Node master-node status is now: NodeHasSufficientMemory
      Normal   NodeHasNoDiskPressure    7m58s (x7 over 7m58s)  kubelet          Node master-node status is now: NodeHasNoDiskPressure
      Normal   NodeHasSufficientPID     7m58s (x7 over 7m58s)  kubelet          Node master-node status is now: NodeHasSufficientPID
      Normal   NodeAllocatableEnforced  7m58s                  kubelet          Updated Node Allocatable limit across pods
      Normal   RegisteredNode           7m17s                  node-controller  Node master-node event: Registered Node master-node in Controller
      Warning  FreeDiskSpaceFailed      2m56s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 614336921 bytes, but only found 0 bytes eligible to free.
  • kubectl describe node worker-node-name

    Name:               worker-node
    Roles:              <none>
    Labels:             beta.kubernetes.io/arch=amd64
                        beta.kubernetes.io/os=linux
                        kubernetes.io/arch=amd64
                        kubernetes.io/hostname=worker-node
                        kubernetes.io/os=linux
    Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                        node.alpha.kubernetes.io/ttl: 0
                        volumes.kubernetes.io/controller-managed-attach-detach: true
    CreationTimestamp:  Fri, 09 Aug 2024 17:53:28 +0000
    Taints:             node.kubernetes.io/unreachable:NoExecute
                        node.cilium.io/agent-not-ready:NoSchedule
                        node.kubernetes.io/unreachable:NoSchedule
    Unschedulable:      false
    Lease:
      HolderIdentity:  worker-node
      AcquireTime:     <unset>
      RenewTime:       Sun, 18 Aug 2024 20:02:33 +0000
    Conditions:
      Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason              Message
      ----                 ------   -----------------                 ------------------                ------              -------
      NetworkUnavailable   False    Fri, 09 Aug 2024 17:54:51 +0000   Fri, 09 Aug 2024 17:54:51 +0000   CiliumIsUp          Cilium is running on this node
      MemoryPressure       Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
      DiskPressure         Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
      PIDPressure          Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
      Ready                Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
    Addresses:
      InternalIP:  192.168.1.246
      Hostname:    worker-node
    Capacity:
      cpu:                2
      ephemeral-storage:  10206484Ki
      hugepages-2Mi:      0
      memory:             4014036Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9406295639
      hugepages-2Mi:      0
      memory:             3911636Ki
      pods:               110
    System Info:
      Machine ID:                 082e100535c54c5986ddff0a8176ab60
      System UUID:                ffedeeca-323b-884d-a0fe-9218f3961f9a
      Boot ID:                    0ef64691-265d-4e12-bbbc-46a80c288f22
      Kernel Version:             5.4.0-192-generic
      OS Image:                   Ubuntu 20.04.6 LTS
      Operating System:           linux
      Architecture:               amd64
      Container Runtime Version:  containerd://1.7.19
      Kubelet Version:            v1.30.1
      Kube-Proxy Version:         v1.30.1
    PodCIDR:      10.0.1.0/24
    PodCIDRs:     10.0.1.0/24
    Non-terminated Pods:          (4 in total)
      Namespace    Name                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
      ---------    ----                       ------------  ----------  ---------------  -------------  ---
      default      registry-7f7bcbd5bb-x9xg5  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d20h
      kube-system  cilium-89jvg               100m (5%)     0 (0%)      10Mi (0%)        0 (0%)         12d
      kube-system  cilium-envoy-rk8jj         0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
      kube-system  kube-proxy-pjrjm           0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
    Allocated resources:
      (Total limits may be over 100 percent, i.e., overcommitted.)
      Resource           Requests   Limits
      --------           --------   ------
      cpu                100m (5%)  0 (0%)
      memory             10Mi (0%)  0 (0%)
      ephemeral-storage  0 (0%)     0 (0%)
      hugepages-2Mi      0 (0%)     0 (0%)
    Events:
      Type    Reason          Age   From             Message
      ----    ------          ----  ----             -------
      Normal  RegisteredNode  8m3s  node-controller  Node worker-node event: Registered Node worker-node in Controller

    kubectl get svc,ap -A
    error: the server doesn't have a resource type "ap" (probably a typo?)

  • Posts: 2,441

    Hi @alessandro.cinelli,

    Yes, it was a typo on my part... it was meant to be svc,ep for Services and Endpoints :)
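    That is, the intended command was:

    ```bash
    kubectl get svc,ep -A
    ```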

    It is unclear to me why the registry Deployment shows 3 replicas, the same as the nginx Deployment we removed earlier. Perhaps a describe of those Pods reveals why they are not in Running state; then try removing the registry Deployment as well (it may require a force deletion of some of its Pods).
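    For example, something along these lines (the pod name below is one of yours from the earlier output):

    ```bash
    kubectl describe pod registry-7f7bcbd5bb-kjphz
    kubectl delete deployment registry
    ```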

    It is also odd that Kubelet is not able to complete its Garbage collection cycle to reclaim disk space. What do the Kubelet logs show on the control plane node?
    journalctl -u kubelet | less

    What images are stored on the control plane node?
    sudo podman images
    sudo crictl images

    How is disk allocated to the VMs (pre-allocated full size)?

    Regards,
    -Chris

  • Hi @chrispokorni
    journalctl -u kubelet | less
    I took the last lines
    d" interval="800ms" Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.611796 684 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"fca7dd05dc29cb147dc0e5115690f4a297601411d4d8fd37919c7e9a19b4b212\": not found" podSandboxID="fca7dd05dc29cb147dc0e5115690f4a297601411d4d8fd37919c7e9a19b4b212" Aug 10 09:06:06 master-node kubelet[684]: I0810 09:06:06.613663 684 kubelet_node_status.go:73] "Attempting to register node" node="master-node" Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.613868 684 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://192.168.1.247:6443/api/v1/nodes\": dial tcp 192.168.1.247:6443: connect: connection refused" node="master-node" Aug 10 09:06:06 master-node kubelet[684]: W0810 09:06:06.681079 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Service: Get "https://192.168.1.247:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.681147 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.1.247:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:06 master-node kubelet[684]: W0810 09:06:06.994227 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.994318 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: W0810 09:06:07.191887 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.CSIDriver: Get "https://192.168.1.247:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.199179 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://192.168.1.247:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.287256 684 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.1.247:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-node?timeout=10s\": dial tcp 192.168.1.247:6443: connect: connection refused" interval="1.6s" Aug 10 09:06:07 master-node kubelet[684]: W0810 09:06:07.292373 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.RuntimeClass: Get "https://192.168.1.247:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.292469 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to 
watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://192.168.1.247:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.419159 684 kubelet_node_status.go:73] "Attempting to register node" node="master-node" Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.419953 684 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://192.168.1.247:6443/api/v1/nodes\": dial tcp 192.168.1.247:6443: connect: connection refused" node="master-node" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.587543 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-controller-manager-master-node" containerName="kube-controller-manager" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.588138 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-apiserver-master-node" containerName="kube-apiserver" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.602022 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/etcd-master-node" containerName="etcd" Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.619495 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-scheduler-master-node" containerName="kube-scheduler" Aug 10 09:06:08 master-node kubelet[684]: W0810 09:06:08.636848 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused Aug 10 09:06:08 master-node kubelet[684]: E0810 09:06:08.636933 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name=master-node&amp;limit=500&amp;resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
    Aug 10 09:06:08 master-node kubelet[684]: E0810 09:06:08.888865 684 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.1.247:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-node?timeout=10s\": dial tcp 192.168.1.247:6443:

  • Posts: 17
    edited August 2024

    sudo podman images

    WARN[0000] Using cgroups-v1 which is deprecated in favor of cgroups-v2 with Podman v5 and will be removed in a future version. Set environment variable `PODMAN_IGNORE_CGROUPSV1_WARNING` to hide this warning.
    REPOSITORY                   TAG     IMAGE ID      CREATED      SIZE
    10.97.40.62:5000/simpleapp   latest  ad2f4faa05bd  4 days ago   1.04 GB
    localhost/simpleapp          latest  ad2f4faa05bd  4 days ago   1.04 GB
    docker.io/library/python     3       0218518c77be  2 weeks ago  1.04 GB
    10.97.40.62:5000/tagtest     latest  324bc02ae123  4 weeks ago  8.08 MB

    sudo crictl images

    IMAGE                                     TAG       IMAGE ID       SIZE
    quay.io/cilium/cilium-envoy               <none>    b9d596d6e2d4f  62.1MB
    quay.io/cilium/cilium                     <none>    1e01581279341  223MB
    quay.io/cilium/operator-generic           <none>    e7e6117055af8  31.1MB
    registry.k8s.io/coredns/coredns           v1.11.1   cbb01a7bd410d  18.2MB
    registry.k8s.io/etcd                      3.5.12-0  3861cfcd7c04c  57.2MB
    registry.k8s.io/kube-apiserver            v1.30.1   91be940803172  32.8MB
    registry.k8s.io/kube-controller-manager   v1.30.1   25a1387cdab82  31.1MB
    registry.k8s.io/kube-proxy                v1.30.1   747097150317f  29MB
    registry.k8s.io/kube-scheduler            v1.30.1   a52dc94f0a912  19.3MB
    registry.k8s.io/pause                     3.8       4873874c08efc  311kB

    The disk is supposed to be dynamically allocated actually.

    Thank you again

  • Posts: 2,441

    Hi @alessandro.cinelli,

    The disk is supposed to be dynamically allocated actually.

    This is your problem. The kubelet only sees the disk space that is currently allocated, not the maximum size the virtual disk can grow to.
    Please fully allocate the disk for both VMs to prevent kubelet panics.
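    Once the disks have been re-provisioned, you can verify what the node actually reports with something like:

    ```bash
    df -h /var/lib/kubelet                                      # filesystem size the kubelet works with
    kubectl describe node master-node | grep -i diskpressure    # the condition should go back to False
    ```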

    Regards,
    -Chris

  • Hi @chrispokorni,
    as you suggested, I changed the disk to a pre-allocated one, but I am still getting the same issue. Is there something else I'm supposed to do?

  • Posts: 2,441

    Hi @alessandro.cinelli,

    Have the VMs been restarted? What are the ephemeral-storage values under Capacity and Allocatable respectively when describing the 2 nodes?
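    You can pull just those values out with something like:

    ```bash
    kubectl describe node master-node | grep -A 5 -E "^Capacity|^Allocatable"
    kubectl describe node worker-node | grep -A 5 -E "^Capacity|^Allocatable"
    ```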

    Regards,
    -Chris

  • Hi @chrispokorni ,
    yes, the VMs have been restarted.

    Worker node:

    Capacity:
      cpu:                2
      ephemeral-storage:  10206484Ki
      hugepages-2Mi:      0
      memory:             4014036Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9406295639
      hugepages-2Mi:      0
      memory:             3911636Ki
      pods:               110

    Master Node

    Capacity:
      cpu:                2
      ephemeral-storage:  10218772Ki
      hugepages-2Mi:      0
      memory:             8136664Ki
      pods:               110
    Allocatable:
      cpu:                2
      ephemeral-storage:  9417620260
      hugepages-2Mi:      0
      memory:             8034264Ki
      pods:               110

    But if I look at the latest events on the master node I can see this, and probably I shouldn't:

    Warning  InvalidDiskCapacity      5m32s                  kubelet          invalid capacity 0 on image filesystem
    Normal   NodeHasSufficientMemory  5m32s (x8 over 5m32s)  kubelet          Node master-node status is now: NodeHasSufficientMemory
    Normal   NodeHasNoDiskPressure    5m32s (x7 over 5m32s)  kubelet          Node master-node status is now: NodeHasNoDiskPressure
    Normal   NodeHasSufficientPID     5m32s (x7 over 5m32s)  kubelet          Node master-node status is now: NodeHasSufficientPID
    Normal   NodeAllocatableEnforced  5m32s                  kubelet          Updated Node Allocatable limit across pods
    Normal   RegisteredNode           2m38s                  node-controller  Node master-node event: Registered Node master-node in Controller
    Warning  FreeDiskSpaceFailed      30s                    kubelet          Failed to garbage collect required amount of images. Attempted to free 722049433 bytes, but only found 0 bytes eligible to free.
  • Hi @alessandro.cinelli,

    What is the size of the vdisk on each VM?
    According to your output, the vdisks seem to be about 10 GB each. The lab guide recommendation is 20+ GB per VM. For earlier Kubernetes releases, 10 GB per VM used to be just enough to run the lab exercises - Kubernetes required less disk space, and the container images were somewhat smaller.
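    From inside each VM you can compare the virtual disk size with the filesystem size with something like:

    ```bash
    lsblk             # size of the virtual disk and its partitions
    df -h --total     # size of the mounted filesystems
    ```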

    Regards,
    -Chris

  • Hi @chrispokorni
    It's actually 20 GB for both the master and the worker node, but it still says only 10 GB of space is available.

  • Hi @alessandro.cinelli,

    The key here is what the kubelet node agent sees. If it sees 10 GB, it only works with 10 GB. It seems to be unaware of the additional 10 GB of storage.

    I'd be curious what the regular (non-root) user sees on the guest OS with df -h --total.

    If the vdisks were extended after the OS installation, then this behavior would be expected, as the file system would be unaware of the additional vdisk space as well, requiring a file system resize to expand to the available vdisk size.
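    On Ubuntu that usually means growing the partition and then the filesystem, roughly along these lines (the device and partition names are only examples - check yours with lsblk first, and the steps differ if the root filesystem is on LVM):

    ```bash
    sudo growpart /dev/sda 1     # grow partition 1 of /dev/sda to fill the disk (growpart comes from the cloud-guest-utils package)
    sudo resize2fs /dev/sda1     # grow the ext4 filesystem to fill the resized partition
    ```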

    Regards,
    -Chris

  • Hello @chrispokorni ,
    Thank you for helping me out again. I ended up redoing everything from scratch, since there was something wrong with my worker VM and its virtual drive, but I finally managed to solve the issue and can move forward with the remaining steps of the lab.

    Thank you
