Lab 4.5 stress.yaml on worker node not working
Hello,
I've been stuck on this one for days because stress.yaml is not working on the worker node. Without the node selector edit, it runs correctly on my master node, but if I put the worker node's name as the selector, the Deployment is created but the Pod remains in Pending.
This is probably because my worker node remains in the "NotReady" state, even though from the worker node I can run all the commands against the kube-apiserver without problems. Both machines are configured with 8 GB of RAM and 20 GB of disk space.
I disabled the firewall, the IPs are static, promiscuous mode is enabled, and I set a single bridged adapter with "Allow All", but I still wasn't able to get past this error.
If I view the logs on the worker node, I see that the node was registered correctly.
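For reference, the selector edit is roughly this (a sketch only - the lab's actual stress.yaml has different resource settings, and the image name here is just a placeholder):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: stressmeout
spec:
  replicas: 1
  selector:
    matchLabels:
      app: stressmeout
  template:
    metadata:
      labels:
        app: stressmeout
    spec:
      nodeSelector:
        kubernetes.io/hostname: worker-node   # pin the Pod to the worker
      containers:
      - name: stressmeout
        image: vish/stress                    # placeholder image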
Thank you
Comments
A cluster node in the "NotReady" state, as you noted, implies a cluster that has not been fully bootstrapped and/or configured, or readiness conditions that are no longer met as a result of unfavorable cluster events. Being able to run certain commands from the worker node against the API server is not an indication of either node's readiness for scheduling purposes.
To get a better picture of your cluster, please provide the outputs (as code formatted text, not as screenshots) of the following commands:
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl describe node cp-node-name
kubectl describe node worker-node-name
Regards,
-Chris
Hello Chris,
thank you for your time.
As you may see here, most of the pods are stuck on both the master and worker nodes.
Thank you
Let's clean up your cluster by removing the basicpod Pod. Also remove the nginx, try1 and stressmeout Deployments.
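Something along these lines should do it (a sketch, assuming everything was created in the default namespace):

kubectl delete pod basicpod
kubectl delete deployment nginx try1 stressmeout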
After their respective Pods are completely removed and resources released, please provide another output as I requested earlier from the 4 commands (as code formatted text for better readability, NOT screenshots).
In addition, provide the output of:
kubectl get svc,ap -A
Regards,
-Chris
Hello @chrispokorni ,
I had to force delete all the pods because they got stuck in the "Terminating" state for some reason.

kubectl get nodes -o wide
NAME          STATUS     ROLES           AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
master-node   Ready      control-plane   12d   v1.30.1   192.168.1.247   <none>        Ubuntu 20.04.6 LTS   5.4.0-192-generic   containerd://1.7.19
worker-node   NotReady   <none>          12d   v1.30.1   192.168.1.246   <none>        Ubuntu 20.04.6 LTS   5.4.0-192-generic   containerd://1.7.19
kubectl get pods -A -o wide
NAMESPACE     NAME                                  READY   STATUS        RESTARTS        AGE     IP              NODE          NOMINATED NODE   READINESS GATES
default       registry-7f7bcbd5bb-kjphz             0/1     Pending       0               6m20s   <none>          <none>        <none>           <none>
default       registry-7f7bcbd5bb-qlx6g             0/1     Error         0               17h     <none>          master-node   <none>           <none>
default       registry-7f7bcbd5bb-x9xg5             1/1     Terminating   0               3d20h   10.0.1.13       worker-node   <none>           <none>
kube-system   cilium-89jvg                          1/1     Running       145 (10d ago)   12d     192.168.1.246   worker-node   <none>           <none>
kube-system   cilium-envoy-mccfq                    1/1     Running       11 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
kube-system   cilium-envoy-rk8jj                    1/1     Running       3 (5d21h ago)   12d     192.168.1.246   worker-node   <none>           <none>
kube-system   cilium-operator-7ddc48bb97-4b69m      1/1     Running       45 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
kube-system   cilium-pv27t                          1/1     Running       12 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
kube-system   coredns-7db6d8ff4d-jrn5s              1/1     Running       1 (7m3s ago)    17h     10.0.0.238      master-node   <none>           <none>
kube-system   coredns-7db6d8ff4d-pmp9m              1/1     Running       1 (7m3s ago)    17h     10.0.0.90       master-node   <none>           <none>
kube-system   etcd-master-node                      1/1     Running       17 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
kube-system   kube-apiserver-master-node            1/1     Running       16 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
kube-system   kube-controller-manager-master-node   1/1     Running       42 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
kube-system   kube-proxy-fqxsf                      1/1     Running       11 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
kube-system   kube-proxy-pjrjm                      1/1     Running       3 (5d21h ago)   12d     192.168.1.246   worker-node   <none>           <none>
kube-system   kube-scheduler-master-node            1/1     Running       45 (7m3s ago)   12d     192.168.1.247   master-node   <none>           <none>
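For reference, the force deletion mentioned above was along these lines (with each stuck Pod's name in place of the placeholder):

kubectl delete pod <pod-name> --grace-period=0 --force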
kubectl describe node master-node
Name:               master-node
Roles:              control-plane
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=master-node
                    kubernetes.io/os=linux
                    node-role.kubernetes.io/control-plane=
                    node.kubernetes.io/exclude-from-external-load-balancers=
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 09 Aug 2024 17:07:12 +0000
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  master-node
  AcquireTime:     <unset>
  RenewTime:       Thu, 22 Aug 2024 13:55:45 +0000
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Fri, 09 Aug 2024 17:08:38 +0000   Fri, 09 Aug 2024 17:08:38 +0000   CiliumIsUp                   Cilium is running on this node
  MemoryPressure       False   Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:45:53 +0000   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         True    Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:48:11 +0000   KubeletHasDiskPressure       kubelet has disk pressure
  PIDPressure          False   Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:45:53 +0000   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready                True    Thu, 22 Aug 2024 13:55:35 +0000   Thu, 22 Aug 2024 13:46:04 +0000   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  192.168.1.247
  Hostname:    master-node
Capacity:
  cpu:                2
  ephemeral-storage:  10218772Ki
  hugepages-2Mi:      0
  memory:             8136660Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  9417620260
  hugepages-2Mi:      0
  memory:             8034260Ki
  pods:               110
System Info:
  Machine ID:                 36c8c7fa32cb4042b079d8b23e47e39b
  System UUID:                8a163bd9-1515-0f4b-b635-f21ee64703ac
  Boot ID:                    3ef750e2-a915-4953-9a2c-15784d2a6cc8
  Kernel Version:             5.4.0-192-generic
  OS Image:                   Ubuntu 20.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.19
  Kubelet Version:            v1.30.1
  Kube-Proxy Version:         v1.30.1
PodCIDR:                      10.0.0.0/24
PodCIDRs:                     10.0.0.0/24
Non-terminated Pods:          (10 in total)
  Namespace    Name                                 CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------    ----                                 ------------  ----------  ---------------  -------------  ---
  kube-system  cilium-envoy-mccfq                   0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  cilium-operator-7ddc48bb97-4b69m     0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  cilium-pv27t                         100m (5%)     0 (0%)      10Mi (0%)        0 (0%)         12d
  kube-system  coredns-7db6d8ff4d-jrn5s             100m (5%)     0 (0%)      70Mi (0%)        170Mi (2%)     17h
  kube-system  coredns-7db6d8ff4d-pmp9m             100m (5%)     0 (0%)      70Mi (0%)        170Mi (2%)     17h
  kube-system  etcd-master-node                     100m (5%)     0 (0%)      100Mi (1%)       0 (0%)         12d
  kube-system  kube-apiserver-master-node           250m (12%)    0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-controller-manager-master-node  200m (10%)    0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-proxy-fqxsf                     0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-scheduler-master-node           100m (5%)     0 (0%)      0 (0%)           0 (0%)         12d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  --------           --------    ------
  cpu                950m (47%)  0 (0%)
  memory             250Mi (3%)  340Mi (4%)
  ephemeral-storage  0 (0%)      0 (0%)
  hugepages-2Mi      0 (0%)      0 (0%)
Events:
  Type     Reason                   Age                    From             Message
  ----     ------                   ----                   ----             -------
  Normal   Starting                 7m49s                  kube-proxy
  Normal   NodeHasSufficientMemory  10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     10m (x23 over 22h)     kubelet          Node master-node status is now: NodeHasSufficientPID
  Warning  FreeDiskSpaceFailed      9m59s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 462125465 bytes, but only found 0 bytes eligible to free.
  Warning  ImageGCFailed            9m59s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 462125465 bytes, but only found 0 bytes eligible to free.
  Normal   NodeReady                9m49s (x35 over 22h)   kubelet          Node master-node status is now: NodeReady
  Warning  EvictionThresholdMet     9m41s                  kubelet          Attempting to reclaim ephemeral-storage
  Normal   Starting                 7m58s                  kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity      7m58s                  kubelet          invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  7m58s (x8 over 7m58s)  kubelet          Node master-node status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    7m58s (x7 over 7m58s)  kubelet          Node master-node status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     7m58s (x7 over 7m58s)  kubelet          Node master-node status is now: NodeHasSufficientPID
  Normal   NodeAllocatableEnforced  7m58s                  kubelet          Updated Node Allocatable limit across pods
  Normal   RegisteredNode           7m17s                  node-controller  Node master-node event: Registered Node master-node in Controller
  Warning  FreeDiskSpaceFailed      2m56s                  kubelet          Failed to garbage collect required amount of images. Attempted to free 614336921 bytes, but only found 0 bytes eligible to free.
kubectl describe node worker-node
Name:               worker-node
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=worker-node
                    kubernetes.io/os=linux
Annotations:        kubeadm.alpha.kubernetes.io/cri-socket: unix:///var/run/containerd/containerd.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Fri, 09 Aug 2024 17:53:28 +0000
Taints:             node.kubernetes.io/unreachable:NoExecute
                    node.cilium.io/agent-not-ready:NoSchedule
                    node.kubernetes.io/unreachable:NoSchedule
Unschedulable:      false
Lease:
  HolderIdentity:  worker-node
  AcquireTime:     <unset>
  RenewTime:       Sun, 18 Aug 2024 20:02:33 +0000
Conditions:
  Type                 Status   LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------   -----------------                 ------------------                ------              -------
  NetworkUnavailable   False    Fri, 09 Aug 2024 17:54:51 +0000   Fri, 09 Aug 2024 17:54:51 +0000   CiliumIsUp          Cilium is running on this node
  MemoryPressure       Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown  Sun, 18 Aug 2024 19:59:12 +0000   Sun, 18 Aug 2024 20:06:59 +0000   NodeStatusUnknown   Kubelet stopped posting node status.
Addresses:
  InternalIP:  192.168.1.246
  Hostname:    worker-node
Capacity:
  cpu:                2
  ephemeral-storage:  10206484Ki
  hugepages-2Mi:      0
  memory:             4014036Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  9406295639
  hugepages-2Mi:      0
  memory:             3911636Ki
  pods:               110
System Info:
  Machine ID:                 082e100535c54c5986ddff0a8176ab60
  System UUID:                ffedeeca-323b-884d-a0fe-9218f3961f9a
  Boot ID:                    0ef64691-265d-4e12-bbbc-46a80c288f22
  Kernel Version:             5.4.0-192-generic
  OS Image:                   Ubuntu 20.04.6 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  containerd://1.7.19
  Kubelet Version:            v1.30.1
  Kube-Proxy Version:         v1.30.1
PodCIDR:                      10.0.1.0/24
PodCIDRs:                     10.0.1.0/24
Non-terminated Pods:          (4 in total)
  Namespace    Name                       CPU Requests  CPU Limits  Memory Requests  Memory Limits  Age
  ---------    ----                       ------------  ----------  ---------------  -------------  ---
  default      registry-7f7bcbd5bb-x9xg5  0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d20h
  kube-system  cilium-89jvg               100m (5%)     0 (0%)      10Mi (0%)        0 (0%)         12d
  kube-system  cilium-envoy-rk8jj         0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  kube-system  kube-proxy-pjrjm           0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests   Limits
  --------           --------   ------
  cpu                100m (5%)  0 (0%)
  memory             10Mi (0%)  0 (0%)
  ephemeral-storage  0 (0%)     0 (0%)
  hugepages-2Mi      0 (0%)     0 (0%)
Events:
  Type    Reason          Age   From             Message
  ----    ------          ----  ----             -------
  Normal  RegisteredNode  8m3s  node-controller  Node worker-node event: Registered Node worker-node in Controller
kubectl get svc,ap -A
error: the server doesn't have a resource type "ap"

(probably was a typo?)
Yes, it was a typo on my part... it was meant to be svc,ep for Services and Endpoints.

It is unclear to me why the registry Deployment shows 3 replicas, the same as the nginx Deployment we removed earlier. Perhaps a describe of those Pods reveals why they are not in Running state (see the sketch after the commands below); then try removing the registry Deployment as well (it may require a force deletion of some of its Pods).

It is also odd that the kubelet is not able to complete its garbage collection cycle to reclaim disk space. What do the kubelet logs show on the control plane node?
journalctl -u kubelet | less
What images are stored on the control plane node?
sudo podman images
sudo crictl images
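For the Pod describe suggested above, the Events section at the bottom of the output usually shows why the scheduler keeps a Pod in Pending; for example (using one of the Pod names from your earlier output):

kubectl -n default describe pod registry-7f7bcbd5bb-kjphz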
How is disk allocated to the VMs (pre-allocated full size)?
Regards,
-Chris
Hi @chrispokorni
journalctl -u kubelet | less
I took the last lines:

d" interval="800ms"
Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.611796 684 remote_runtime.go:222] "StopPodSandbox from runtime service failed" err="rpc error: code = NotFound desc = an error occurred when try to find sandbox \"fca7dd05dc29cb147dc0e5115690f4a297601411d4d8fd37919c7e9a19b4b212\": not found" podSandboxID="fca7dd05dc29cb147dc0e5115690f4a297601411d4d8fd37919c7e9a19b4b212"
Aug 10 09:06:06 master-node kubelet[684]: I0810 09:06:06.613663 684 kubelet_node_status.go:73] "Attempting to register node" node="master-node"
Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.613868 684 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://192.168.1.247:6443/api/v1/nodes\": dial tcp 192.168.1.247:6443: connect: connection refused" node="master-node"
Aug 10 09:06:06 master-node kubelet[684]: W0810 09:06:06.681079 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Service: Get "https://192.168.1.247:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.681147 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Service: failed to list *v1.Service: Get "https://192.168.1.247:6443/api/v1/services?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:06 master-node kubelet[684]: W0810 09:06:06.994227 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:06 master-node kubelet[684]: E0810 09:06:06.994318 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:07 master-node kubelet[684]: W0810 09:06:07.191887 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.CSIDriver: Get "https://192.168.1.247:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.199179 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://192.168.1.247:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.287256 684 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.1.247:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-node?timeout=10s\": dial tcp 192.168.1.247:6443: connect: connection refused" interval="1.6s"
Aug 10 09:06:07 master-node kubelet[684]: W0810 09:06:07.292373 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.RuntimeClass: Get "https://192.168.1.247:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.292469 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.RuntimeClass: failed to list *v1.RuntimeClass: Get "https://192.168.1.247:6443/apis/node.k8s.io/v1/runtimeclasses?limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.419159 684 kubelet_node_status.go:73] "Attempting to register node" node="master-node"
Aug 10 09:06:07 master-node kubelet[684]: E0810 09:06:07.419953 684 kubelet_node_status.go:96] "Unable to register node with API server" err="Post \"https://192.168.1.247:6443/api/v1/nodes\": dial tcp 192.168.1.247:6443: connect: connection refused" node="master-node"
Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.587543 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-controller-manager-master-node" containerName="kube-controller-manager"
Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.588138 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-apiserver-master-node" containerName="kube-apiserver"
Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.602022 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/etcd-master-node" containerName="etcd"
Aug 10 09:06:07 master-node kubelet[684]: I0810 09:06:07.619495 684 kuberuntime_container_linux.go:167] "No swap cgroup controller present" swapBehavior="" pod="kube-system/kube-scheduler-master-node" containerName="kube-scheduler"
Aug 10 09:06:08 master-node kubelet[684]: W0810 09:06:08.636848 684 reflector.go:547] k8s.io/client-go/informers/factory.go:160: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:08 master-node kubelet[684]: E0810 09:06:08.636933 684 reflector.go:150] k8s.io/client-go/informers/factory.go:160: Failed to watch *v1.Node: failed to list *v1.Node: Get "https://192.168.1.247:6443/api/v1/nodes?fieldSelector=metadata.name%3Dmaster-node&limit=500&resourceVersion=0": dial tcp 192.168.1.247:6443: connect: connection refused
Aug 10 09:06:08 master-node kubelet[684]: E0810 09:06:08.888865 684 controller.go:145] "Failed to ensure lease exists, will retry" err="Get \"https://192.168.1.247:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/master-node?timeout=10s\": dial tcp 192.168.1.247:6443:
sudo podman images
WARN[0000] Using cgroups-v1 which is deprecated in favor of cgroups-v2 with Podman v5 and will be removed in a future version. Set environment variable `PODMAN_IGNORE_CGROUPSV1_WARNING` to hide this warning.
REPOSITORY                   TAG     IMAGE ID      CREATED      SIZE
10.97.40.62:5000/simpleapp   latest  ad2f4faa05bd  4 days ago   1.04 GB
localhost/simpleapp          latest  ad2f4faa05bd  4 days ago   1.04 GB
docker.io/library/python     3       0218518c77be  2 weeks ago  1.04 GB
10.97.40.62:5000/tagtest     latest  324bc02ae123  4 weeks ago  8.08 MB
sudo crictl images
IMAGE                                    TAG       IMAGE ID       SIZE
quay.io/cilium/cilium-envoy              <none>    b9d596d6e2d4f  62.1MB
quay.io/cilium/cilium                    <none>    1e01581279341  223MB
quay.io/cilium/operator-generic          <none>    e7e6117055af8  31.1MB
registry.k8s.io/coredns/coredns          v1.11.1   cbb01a7bd410d  18.2MB
registry.k8s.io/etcd                     3.5.12-0  3861cfcd7c04c  57.2MB
registry.k8s.io/kube-apiserver           v1.30.1   91be940803172  32.8MB
registry.k8s.io/kube-controller-manager  v1.30.1   25a1387cdab82  31.1MB
registry.k8s.io/kube-proxy               v1.30.1   747097150317f  29MB
registry.k8s.io/kube-scheduler           v1.30.1   a52dc94f0a912  19.3MB
registry.k8s.io/pause                    3.8       4873874c08efc  311kB
The disk is supposed to be dynamically allocated actually.
Thank you again
The disk is supposed to be dynamically allocated actually.
This is your problem. The kubelet only sees the disk space that is actually allocated, not what the vdisk could eventually grow to receive.
Please fully allocate the disk for both VMs to prevent kubelet panics.

Regards,
-Chris
Hi @chrispokorni,
as you suggested, I changed the disk to a pre-allocated one, but I'm still getting the same issue. Is there something else I'm supposed to do?
Have the VMs been restarted? What are the ephemeral-storage values under Capacity and Allocatable respectively when describing the 2 nodes?
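For example, you can pull just those sections with something like this (the -A window is approximate):

kubectl describe node worker-node | grep -A 5 -E 'Capacity|Allocatable'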
Regards,
-Chris
Hi @chrispokorni ,
yes, the VMs have been restarted.

Worker node:
Capacity:
  cpu:                2
  ephemeral-storage:  10206484Ki
  hugepages-2Mi:      0
  memory:             4014036Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  9406295639
  hugepages-2Mi:      0
  memory:             3911636Ki
  pods:               110
Master node:

Capacity:
  cpu:                2
  ephemeral-storage:  10218772Ki
  hugepages-2Mi:      0
  memory:             8136664Ki
  pods:               110
Allocatable:
  cpu:                2
  ephemeral-storage:  9417620260
  hugepages-2Mi:      0
  memory:             8034264Ki
  pods:               110
But if I look at the latest events on the master node, I can see this, and probably I shouldn't:
Warning  InvalidDiskCapacity      5m32s                  kubelet          invalid capacity 0 on image filesystem
Normal   NodeHasSufficientMemory  5m32s (x8 over 5m32s)  kubelet          Node master-node status is now: NodeHasSufficientMemory
Normal   NodeHasNoDiskPressure    5m32s (x7 over 5m32s)  kubelet          Node master-node status is now: NodeHasNoDiskPressure
Normal   NodeHasSufficientPID     5m32s (x7 over 5m32s)  kubelet          Node master-node status is now: NodeHasSufficientPID
Normal   NodeAllocatableEnforced  5m32s                  kubelet          Updated Node Allocatable limit across pods
Normal   RegisteredNode           2m38s                  node-controller  Node master-node event: Registered Node master-node in Controller
Warning  FreeDiskSpaceFailed      30s                    kubelet          Failed to garbage collect required amount of images. Attempted to free 722049433 bytes, but only found 0 bytes eligible to free.
What is the size of the vdisk on each VM?
According to your output, the vdisks seem to be about 10 GB each. The lab guide recommendation is 20+ GB per VM. For earlier Kubernetes releases, 10 GB per VM used to be just enough to run the lab exercises - Kubernetes required less disk space, and the container images were somewhat smaller in size.

Regards,
-Chris
Hi @chrispokorni
It's actually 20 GB for both the master and the worker node, but it still reports only 10 GB of space available.
The key here is what the kubelet node agent sees. If it sees 10 GB, it only works with 10 GB. It seems to be unaware of the additional 10 GB of storage.
I'd be curious what the regular (non-root) user sees on the guest OS with:
df -h --total
If the vdisks were extended after the OS installation, then this behavior would be expected, as the file system would be unaware of the additional vdisk space as well, requiring a file system resize to expand to the available vdisk size.
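On a typical Ubuntu guest that resize would look roughly like this (a sketch only - the device names and ext4 filesystem are assumptions, LVM-based installs use lvextend instead, and growpart comes from the cloud-guest-utils package):

lsblk                      # confirm which partition holds the root filesystem
sudo growpart /dev/sda 2   # grow partition 2 to fill the enlarged vdisk
sudo resize2fs /dev/sda2   # grow the ext4 filesystem to fill the partition
df -h --total              # verify the new size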
Regards,
-Chris
Hello @chrispokorni ,
Thank you for helping me out again. I ended up redoing everything from scratch, since there was something wrong with my worker VM and its virtual drive, but I finally managed to solve the issue and am moving forward with the remaining steps of the lab.

Thank you