Unable to go on with LFS258 Lab 3.1
Hi all,
I'm having big problems in lab 3.1 while setting up the cluster.
After a full reset:
sudo kubeadm reset
sudo kubeadm init
the output is:
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.193.156:6443 --token
So I ran:
sudo su
root@luigispi:/home/luigi# export KUBECONFIG=/etc/kubernetes/admin.conf
and then continued with step 22:
luigi@luigispi:~$ find $HOME -name cilium-cni.yaml
/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml
So I applied it:
luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml
error: error validating "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": error validating data: failed to download openapi: Get "https://k8scp:6443/openapi/v2?timeout=32s": dial tcp 192.168.193.156:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
So I skipped validation for now:
luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml --validate=false
error when retrieving current configuration of:
Resource: "/v1, Resource=serviceaccounts", GroupVersionKind: "/v1, Kind=ServiceAccount"
Name: "cilium", Namespace: "kube-system"
from server for: "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": Get "https://k8scp:6443/api/v1/namespaces/kube-system/serviceaccounts/cilium": dial tcp 192.168.193.156:6443: connect: connection refused
...
and there is no way to go on!
I tried to reset,
to set everything up again,
to check whether the apiserver is up,
whether the IPs are OK...
There is no firewall in the middle...
Any idea how to solve it?
sudo crictl ps -a | grep kube-apiserver
WARN[0000] Config "/etc/crictl.yaml" does not exist, trying next: "/usr/bin/crictl.yaml"
WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
WARN[0000] Image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
fc8e8e0f14e1b edd0d4592f909 9 seconds ago Running kube-apiserver 3 91ee85dcd862a kube-apiserver-luigispi kube-system
61f63e5faf3db edd0d4592f909 About a minute ago Exited kube-apiserver 2 dba265bd82779 kube-apiserver-luigispi kube-system
Comments
-
I highly recommend respecting the commands' syntax as closely to the lab guide as possible.
After a full control plane reset execute the new init with all the options and flags as presented in the lab guide. Re-setting the kubeconfig is also essential after a new init.
A concerning aspect of your setup is the IP address of the VM (possibly the control plane node) 192.168.x.y. Overlapping VM IP addresses with the Pod subnet will eventually cause routing issues within your cluster. There are two possible solutions:
1. Reconfigure the hypervisor DHCP server to use a different IP range for VMs (for example 10.200.0.0/16) and validate, or edit if necessary, both kubeadm-config.yaml and cilium-cni.yaml to have the pod subnet set to 192.168.0.0/16
2. Leave the hypervisor DHCP as is on the 192.168.x.y/z range and validate, or edit if necessary, both kubeadm-config.yaml and cilium-cni.yaml to have the pod subnet set to 10.200.0.0/16
Regards,
-Chris
-
Hi @chrispokorni,
thank you for answering.
adjust subnets
After applying the following reset procedure, I applied the subnet you suggested (10.200.0.0/16) in both cilium-cni.yaml and kubeadm-config.yaml,
but the result did not change.
new kubeadm command
So, inspired by a similar thread, I ran the reset procedure again and used the command:
kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs
With that I was able to complete the cluster initialization and the lab.
I'm attaching below the output collected during the troubleshooting; hopefully it will be useful for somebody else.
If you (or somebody else) want to say a few words about the reset procedure, it would be much appreciated.
Thank you for the support.
See you at the next problem.
Reset procedure
sudo kubeadm reset --force
sudo rm -rf /etc/cni/net.d
sudo rm -rf /var/lib/etcd
sudo rm -rf /var/lib/kubelet/*
sudo rm -rf /etc/kubernetes/*
rm -rf $HOME/.kube
sudo rm -rf /etc/cni/net.d
sudo systemctl stop kubelet
sudo systemctl stop containerd
sudo iptables -F
sudo ipvsadm --clear
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
sudo apt-get autoremove
sudo ctr containers list | awk '{print $1}' | xargs -r sudo ctr containers delete
sudo ctr snapshots list | awk '{print $1}' | xargs -r sudo ctr snapshots remove
sudo iptables -F && sudo iptables -X
sudo iptables -t nat -F && sudo iptables -t nat -X
sudo iptables -t raw -F && sudo iptables -t raw -X
sudo iptables -t mangle -F && sudo iptables -t mangle -X
sudo rm -rf /var/lib/docker /etc/docker /var/run/docker.sock
sudo rm -rf /var/lib/containerd
sudo rm -rf /run/containerd/containerd.sock
sudo rm -rf /run/docker.sock
rm -rf ~/.kube
rm -rf ~/.docker
Output from adjusting the subnets
kubeadm-config.yaml
replace "192.168.0.0/16" with "10.200.0.0/16" in /root/kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: 1.32.1 # <-- Use the word stable for newest version
controlPlaneEndpoint: "k8scp:6443" #<-- Use the alias we put in /etc/hosts not >
networking:
  podSubnet: 10.200.0.0/16 #192.168.0.0/16
cilium-cni.yaml
replace "192.168.0.0/16" with "10.200.0.0/16" in /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml
hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
ipam: "cluster-pool"
ipam-cilium-node-update-rate: "15s"
cluster-pool-ipv4-cidr: "10.200.0.0/16" #instead of "192.168.0.0/16"
cluster-pool-ipv4-mask-size: "24"
egress-gateway-reconciliation-trigger-interval: "1s"
enable-vtep: "false"
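For anyone repeating this check, the overlap between a node address and the pod subnet can be tested with a few lines of bash. This is only a sketch: the helper names ip_to_int and ip_in_cidr are made up for illustration, and the node address is the one printed in the kubeadm join line earlier in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the lab guide.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Return 0 (true) if IPv4 address $1 lies inside CIDR block $2.
ip_in_cidr() {
  local net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  local ip_n net_n
  ip_n=$(ip_to_int "$1")
  net_n=$(ip_to_int "$net")
  (( (ip_n & mask) == (net_n & mask) ))
}

# Node address from the join line, checked against both pod subnets discussed.
node_ip=192.168.193.156
for pod_cidr in 192.168.0.0/16 10.200.0.0/16; do
  if ip_in_cidr "$node_ip" "$pod_cidr"; then
    echo "$node_ip overlaps $pod_cidr"
  else
    echo "$node_ip is outside $pod_cidr"
  fi
done
```

With the values from this thread it reports that 192.168.193.156 falls inside 192.168.0.0/16 but outside 10.200.0.0/16.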
-
You seem to be using conflicting configuration details in your commands.
What course are you enrolled in? Is it LFS258 Kubernetes Fundamentals or LFD259 Kubernetes for Developers?
Also, what hypervisor are you using? How many network interfaces are set per VM (and what types)? Is all inbound traffic allowed to the VMs by the hypervisor firewall (promiscuous mode enabled)?
Regards,
-Chris
-
Good morning @chrispokorni,
I'm currently enrolled in LFS258 Kubernetes Fundamentals.
I'm running everything on a Raspberry Pi 5 (16 GB RAM + 512 GB SSD) with Ubuntu on it.
Thank you for the support
-
The content has not been tested on Raspberry Pi, so I cannot comment on any issues or limitations that may be related to the hardware. The content was tested on cloud VMs - Google Cloud GCE, AWS EC2, Azure VM, DigitalOcean Droplet; and local VMs provisioned through VirtualBox, QEMU/KVM, VMware Workstation/Player. Each system (server, VM, etc...) that acts as a Kubernetes node needs 2 CPUs, 8 GB RAM, and 20+ GB disk. While you may be able to work with less RAM (4 or 6 GB) this may slow down your cluster.
However, I highly recommend following the installation and configuration instructions from the LFS258 lab guide. From your notes it seems you are mixing in external content, from other sources, leading to an inconsistent environment.
More specifically:
- After setting the desired pod subnet (10.200.0.0/16) in the kubeadm-config.yaml manifest, use the init command as it is shown in the lab guide, step 20. The way you customized the command defeats the purpose of the earlier prep step, and subsequent configuration.
- Make sure that k8scp is only configured as an alias for the control plane node, not the actual hostname.
- Edit the cilium-cni.yaml manifest to use the same pod CIDR (10.200.0.0/16) set earlier for the init phase.
- Mixing in steps from other threads with the instructions from the lab guide may introduce conflicting configuration in your setup, much more difficult to troubleshoot.
Regards,
-Chris
-
Hi @chrispokorni,
Summary of today's troubleshooting while trying to grow the cluster (Lab 3.2):
When I powered the machine on again I found:
- kubelet down
- api-server down
- etcd down
After some troubleshooting, I found that swap was on.
So I ran swapoff -a again, and now everything (etcd/api-server/kubelet) is up and ready to work.
I have been reading up on all the commands used for that troubleshooting, and will continue with the lab tomorrow.
Some output from the checks follows:
hostname
root@luigispi:~# hostname
luigispi
Node up
luigi@luigispi:~$ sudo kubectl get nodes
NAME    STATUS   ROLES           AGE   VERSION
k8scp   Ready    control-plane   26h   v1.32.1
checking yaml files
cilium-cni.yaml
[...]
ipam: "cluster-pool"
ipam-cilium-node-update-rate: "15s"
cluster-pool-ipv4-cidr: "10.200.0.0/16" #instead of "192.168.0.0/16"
[...]
kubeadm-config.yaml
root@luigispi:~# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
kubernetesVersion: 1.32.1
controlPlaneEndpoint: "k8scp:6443"
networking:
  podSubnet: 10.200.0.0/16 # instead of 192.168.0.0/16
Conclusions
This troubleshooting is definitely interesting and will improve my retention and understanding,
but I'm not sure I will continue with the single-node (same physical machine) cluster,
since it takes a lot of time to troubleshoot.
What do you suggest?
Is it better to follow the labs and move to GCE?
Thank you for the support.
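A small check for the swap issue described above, as a sketch. It assumes a Linux host where /proc/swaps lists the active swap areas under a one-line header; the function names active_swap and swap_is_off are hypothetical. Note that swapoff -a only lasts until the next reboot, so a swap entry left in /etc/fstab would bring swap back at power-on, which may be what happened here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the lab guide.

# Print the active swap areas listed in a /proc/swaps-style file.
# The first line of that file is a column header and is skipped.
active_swap() {
  local swaps_file=${1:-/proc/swaps}
  awk 'NR > 1 { print $1 }' "$swaps_file"
}

# Return 0 if no swap area is active, non-zero otherwise.
swap_is_off() {
  [ -z "$(active_swap "$1")" ]
}

# Report the current state before starting (or restarting) the kubelet.
if swap_is_off; then
  echo "swap is off"
else
  echo "swap is still active on: $(active_swap | tr '\n' ' ')"
fi
```

Running it right after boot, before checking the kubelet, shows whether swap came back on its own.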
-
The swapoff -a command can be found in step 7 of the lab guide. Please follow the exercises carefully as they are presented in the guide.
On a different note, I still see k8scp used as a node name, instead of an alias. Running kubectl with sudo (or as root) is not recommended, and it is not how it is presented in the lab guide.
What are the outputs of:
kubectl get nodes -o wide
kubectl get pods -A -o wide
Regards,
-Chris
-
Hi @chrispokorni,
here are the results of today's troubleshooting, with the help of ChatGPT (probably going to move to Claude soon).
When I powered the machine on, etcd was down again,
so I set up a three-terminal view (output below):
- terminal 1: kubectl get pods -A -o wide
- terminal 2: crictl events
- terminal 3: sudo lsof -i :6443
I saw the etcd container go up and down,
SYN_SENT on port 6443, and a timeout during dial tcp 192.168.105.156:6443.
So I started a post-mortem analysis: etcd exited, containerd collected the garbage, and journalctl reported CrashLoopBackOff.
Then, digging into the etcd manifest, I found:
[...]
- --initial-advertise-peer-urls=https://192.168.105.156:2380
- --initial-cluster=k8scp=https://192.168.105.156:2380
- --key-file=/etc/kubernetes/pki/etcd/server.key
- --listen-client-urls=https://127.0.0.1:2379,https://192.168.105.156:2379
- --listen-metrics-urls=http://127.0.0.1:2381
- --listen-peer-urls=https://192.168.105.156:2380
[...]
but ip a | grep inet returned:
[...]
inet 192.168.237.156/24 brd 192.168.237.255 scope global dynamic noprefixroute wlan0
[...]
Hence, it seems I have problems with the IP addressing of the whole setup...
Moreover, I started the cluster (see above) with:
kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs
I'm not sure about that,
but my opinion is that etcd is looking for 192.168.105.156, which no longer answers because the machine has a new IP.
Am I right?
Am I missing something else?
In the attached file 'prompt-out-commands.txt' you can find the full output of the following commands:
- for the 3 terminals view:
kubectl get pods -A -o wide
sudo crictl events
sudo lsof -i :6443
- for the post mortem analysis:
sudo crictl ps -a | grep etcd
sudo crictl logs 45d40286716af
sudo journalctl -u kubelet -xe | grep etcd
sudo cat /etc/kubernetes/manifests/etcd.yaml
ip a | grep inet
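The mismatch described in this post (etcd manifest pinned to 192.168.105.156 while wlan0 now holds 192.168.237.156) can be checked mechanically. The following is a sketch only: the function names are hypothetical, and the two sample lines are copied from the output quoted above rather than read from a live system.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the lab guide.

# Extract the IPv4 address of --initial-advertise-peer-urls from
# etcd manifest text read on stdin.
etcd_advertised_ip() {
  sed -n 's|.*--initial-advertise-peer-urls=https://\([0-9.]*\):.*|\1|p' | head -n 1
}

# List the IPv4 addresses found in `ip a`-style text read on stdin.
interface_ips() {
  awk '$1 == "inet" { sub(/\/.*/, "", $2); print $2 }'
}

# Return 0 if address $1 appears in the newline-separated list $2.
ip_is_assigned() {
  grep -qxF -- "$1" <<< "$2"
}

# Sample lines taken from the thread. On a live node one would instead pipe in
#   sudo cat /etc/kubernetes/manifests/etcd.yaml   and   ip a
manifest_line='    - --initial-advertise-peer-urls=https://192.168.105.156:2380'
ip_line='    inet 192.168.237.156/24 brd 192.168.237.255 scope global dynamic noprefixroute wlan0'

advertised=$(etcd_advertised_ip <<< "$manifest_line")
current=$(interface_ips <<< "$ip_line")

if ip_is_assigned "$advertised" "$current"; then
  echo "etcd address $advertised is assigned to this host"
else
  echo "etcd expects $advertised, host has: $current"
fi
```

For the sample lines it reports that etcd expects 192.168.105.156 while the host has 192.168.237.156, which is consistent with the guess that the address changed after the cluster was initialized.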
-
Consistency in the control plane endpoint is key in a Kubernetes cluster. A static IP address assigned to the instances acting as Kubernetes nodes and/or a correctly applied init command should fix some of the issues encountered.
So far the init commands applied seem to be consistently incorrect - the IP address is not the one recommended to you earlier, the node name is not the one from the lab guide (but the alias that should be just that - an alias), and there's the absence of the kubeadm-config.yaml manifest. This consistently yields improperly configured Kubernetes control plane components.
I would recommend starting from scratch, following the lab guide and applying the earlier suggestions and recommendations that were provided to you in this thread.
The purpose of lab 3 is to help the learner effortlessly initialize a working Kubernetes cluster. For this course, please avoid custom approaches to cluster initialization, as their troubleshooting is out of scope.
Regards,
-Chris
-
Hi @chrispokorni,
Sorry for the late answer...
Thank you for the support,
I definitely got your point about reapplying everything...
Probably the static IP and the correct init procedure would be enough to start the cluster.
But I have just decided to postpone the single-node implementation.
The troubleshooting and the "side" understanding are really interesting,
but they would need tons more time,
and I'm currently running a bit short of it.
To close the post,
I also found online a guide about a single-node RPi implementation (link here),
hoping it will be useful for somebody.
See you at the next problem!
Take care...
-
Keep in mind that the course was designed for a multi-node Kubernetes cluster to showcase the complexity of the networking, and the implementation of load-balancing and high-availability for application and control plane fault tolerance in a distributed multi-system environment.
Regards,
-Chris
-
Hi @chrispokorni,
All good here.
The VMs are up and nginx is running.
I was able to complete all the labs from chapter 3.
Just one remark: I was not able to access the control plane's nginx from outside (point 6 of Lab 3.5).
Instead, I was able to curl from localhost and from the worker node (using the clusterIP).
From what I found on the web, it seems that the GCP firewall is blocking the request,
so I tried a port-forward, but no luck.
Any suggestions?
curl from control-plane
lc@control-plane-1-lfs258-n2:~$ curl localhost:32341
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p>
<p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
</html>
curl from worker node
lc@wn1-lfs258:~$ curl 10.198.0.2:32341
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
html { color-scheme: light dark; }
body { width: 35em; margin: 0 auto; font-family: Tahoma, Verdana, Arial, sans-serif; }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and working. Further configuration is required.</p>
<p>For online documentation and support please refer to <a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at <a href="http://nginx.com/">nginx.com</a>.</p>
<p><em>Thank you for using nginx.</em></p>
</body>
port forwarding
kubectl port-forward service/nginx 32341:80
Forwarding from 127.0.0.1:32341 -> 80
Forwarding from [::1]:32341 -> 80
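To tell a blocked port from a missing listener when testing from outside, a TCP probe along these lines may help. It is a sketch under assumptions: it relies on bash's /dev/tcp redirection and the coreutils timeout command being available, and the function names probe_tcp and classify_rc are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of the lab guide.

# Map the exit status of the connection attempt to a short label.
classify_rc() {
  case "$1" in
    0)   echo "open" ;;                      # TCP handshake completed
    124) echo "timeout" ;;                   # timeout(1) gave up waiting
    *)   echo "refused-or-unreachable" ;;    # connect failed immediately
  esac
}

# Try a TCP connection to host $1, port $2, giving up after $3 seconds (default 3).
probe_tcp() {
  local host=$1 port=$2 secs=${3:-3} rc=0
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null || rc=$?
  classify_rc "$rc"
}

# Example, run from a machine outside the VPC (replace the placeholder address):
#   probe_tcp <control-plane-external-ip> 32341
```

A "timeout" result often, though not always, points to packets being dropped on the way (for example by a firewall rule), while an immediate failure suggests the host answered but nothing accepted the connection. Separately, the port-forward output above shows it listening on 127.0.0.1 and [::1] only, so on its own it would not be reachable from another machine; kubectl port-forward accepts an --address flag to change the listen address, though the VPC firewall would still need to allow the port.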
-
For a simple VPC firewall configuration I recommend watching the demo video for GCE from the "Course Introduction" chapter.
Regards,
-Chris