Unable to go on with LFS258 Lab 3.1
Hi all,
I'm having big problems in Lab 3.1 while setting up the cluster.
after a full reset:
- sudo kubeadm reset
- sudo kubeadm init
the output is:
- Your Kubernetes control-plane has initialized successfully!
- To start using your cluster, you need to run the following as a regular user:
- mkdir -p $HOME/.kube
- sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
- sudo chown $(id -u):$(id -g) $HOME/.kube/config
- Alternatively, if you are the root user, you can run:
- export KUBECONFIG=/etc/kubernetes/admin.conf
- You should now deploy a pod network to the cluster.
- Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
- https://kubernetes.io/docs/concepts/cluster-administration/addons/
- Then you can join any number of worker nodes by running the following on each as root:
- kubeadm join 192.168.193.156:6443 --token
So I ran:
- sudo su
- root@luigispi:/home/luigi# export KUBECONFIG=/etc/kubernetes/admin.conf
and then continued with step 22:
- luigi@luigispi:~$ find $HOME -name cilium-cni.yaml
- /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml
So I applied it:
- luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml
- error: error validating "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": error validating data: failed to download openapi: Get "https://k8scp:6443/openapi/v2?timeout=32s": dial tcp 192.168.193.156:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false
Hence I skipped validation for now:
- luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml --validate=false
- error when retrieving current configuration of:
- Resource: "/v1, Resource=serviceaccounts", GroupVersionKind: "/v1, Kind=ServiceAccount"
- Name: "cilium", Namespace: "kube-system"
- from server for: "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": Get "https://k8scp:6443/api/v1/namespaces/kube-system/serviceaccounts/cilium": dial tcp 192.168.193.156:6443: connect: connection refused
- ...
And there is no way to go on!
I have tried resetting, setting everything up again, checking whether the API server is up and whether the IPs are OK... there is no firewall in the middle.
Any idea how to solve this? Here is what crictl shows for the API server:
- sudo crictl ps -a | grep kube-apiserver
- WARN[0000] Config "/etc/crictl.yaml" does not exist, trying next: "/usr/bin/crictl.yaml"
- WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
- WARN[0000] Image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
- fc8e8e0f14e1b edd0d4592f909 9 seconds ago Running kube-apiserver 3 91ee85dcd862a kube-apiserver-luigispi kube-system
- 61f63e5faf3db edd0d4592f909 About a minute ago Exited kube-apiserver 2 dba265bd82779 kube-apiserver-luigispi kube-system
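A side note on the crictl warnings above: they are unrelated to the API server problem and can be silenced by pointing crictl at the runtime socket explicitly. A minimal sketch, assuming containerd is the container runtime in use:
- cat <<EOF | sudo tee /etc/crictl.yaml
- runtime-endpoint: unix:///run/containerd/containerd.sock
- image-endpoint: unix:///run/containerd/containerd.sock
- EOF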
Comments
I highly recommend following the commands' syntax from the lab guide as closely as possible.
After a full control plane reset, execute the new init with all the options and flags as presented in the lab guide. Re-setting the kubeconfig is also essential after a new init.
A concerning aspect of your setup is the IP address of the VM (possibly the control plane node), 192.168.x.y. Overlapping VM IP addresses with the Pod subnet will eventually cause routing issues within your cluster. There are two possible solutions:
1. Reconfigure the hypervisor DHCP server to use a different IP range for VMs (for example 10.200.0.0/16) and validate, or edit if necessary, both the kubeadm-config.yaml and cilium-cni.yaml to have the pod subnet set to 192.168.0.0/16.
2. Leave the hypervisor DHCP as is on the 192.168.x.y/z range and validate, or edit if necessary, both the kubeadm-config.yaml and cilium-cni.yaml to have the pod subnet set to 10.200.0.0/16.
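Either way, run the init through the kubeadm-config.yaml manifest rather than with ad-hoc flags. A minimal sketch, assuming the config file sits in the current directory (the exact command and flags are the ones shown in the lab guide):
- sudo kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out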
Regards,
-Chris
hi @chrispokorni ,
Thank you for answering.
Adjust subnets
After applying the reset procedure below, I applied the subnet you suggested (10.200.0.0/16) to both cilium-cni.yaml and kubeadm-config.yaml, but there was no change in the result.
New kubeadm command
Hence, inspired by a similar thread, I ran the reset procedure again and used the command:
- kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs
With that I was able to complete the cluster initialization and the lab.
I'm attaching below the output collected during the troubleshooting; hopefully it will be useful for somebody else.
If you (or somebody else) want to spend a few words on the reset procedure, it would be really appreciated.
Thank you for the support.
See you at the next problem.
Reset procedure
- sudo kubeadm reset --force
- sudo rm -rf /etc/cni/net.d
- sudo rm -rf /var/lib/etcd
- sudo rm -rf /var/lib/kubelet/*
- sudo rm -rf /etc/kubernetes/*
- rm -rf $HOME/.kube
- sudo rm -rf /etc/cni/net.d
- sudo systemctl stop kubelet
- sudo systemctl stop containerd
- sudo iptables -F
- sudo ipvsadm --clear
- sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
- sudo apt-get autoremove
- sudo ctr containers list | awk '{print $1}' | xargs -r sudo ctr containers delete
- sudo ctr snapshots list | awk '{print $1}' | xargs -r sudo ctr snapshots remove
- sudo iptables -F && sudo iptables -X
- sudo iptables -t nat -F && sudo iptables -t nat -X
- sudo iptables -t raw -F && sudo iptables -t raw -X
- sudo iptables -t mangle -F && sudo iptables -t mangle -X
- sudo rm -rf /var/lib/docker /etc/docker /var/run/docker.sock
- sudo rm -rf /var/lib/containerd
- sudo rm -rf /run/containerd/containerd.sock
- sudo rm -rf /run/docker.sock
- rm -rf ~/.kube
- rm -rf ~/.docker
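One caveat on this procedure (my note, not lab-guide content): since it purges the kube packages, they have to be reinstalled and the runtime restarted before the next init, along the lines of:
- sudo apt-get update
- sudo apt-get install -y kubeadm kubelet kubectl   # pin the versions per the lab guide if needed
- sudo systemctl start containerd kubelet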
Prompts from adjusting the subnets
kubeadm-config.yaml
Substituted "10.200.0.0/16" for "192.168.0.0/16" in /root/kubeadm-config.yaml:
- apiVersion: kubeadm.k8s.io/v1beta4
- kind: ClusterConfiguration
- kubernetesVersion: 1.32.1 # <-- Use the word stable for newest version
- controlPlaneEndpoint: "k8scp:6443" #<-- Use the alias we put in /etc/hosts not >
- networking:
- podSubnet: 10.200.0.0/16 #192.168.0.0/16
cilium-cni.yaml
substitute "10.200.0.0/16" instead of "192.168.0.0/16"
in /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml- hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
- hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
- ipam: "cluster-pool"
- ipam-cilium-node-update-rate: "15s"
- cluster-pool-ipv4-cidr: "10.200.0.0/16" #instead of "192.168.0.0/16"
- cluster-pool-ipv4-mask-size: "24"
- egress-gateway-reconciliation-trigger-interval: "1s"
- enable-vtep: "false"
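A quick sanity check after re-running the init and applying the edited manifest (the pod label is an assumption based on the upstream Cilium manifests):
- kubectl -n kube-system get pods -l k8s-app=cilium
- kubectl get nodes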
You seem to be using conflicting configuration details in your commands.
What course are you enrolled in? Is it LFS258 Kubernetes Fundamentals or LFD259 Kubernetes for Developers?
Also, what hypervisor are you using? How many network interfaces are set per VM (and what types)? Is all inbound traffic allowed to the VMs by the hypervisor firewall (promiscuous mode enabled)?
Regards,
-Chris
Good morning @chrispokorni ,
I'm actually enrolled in LFS258 Kubernetes Fundamentals.
I'm running everything on a Raspberry Pi 5 (16 GB RAM + 512 GB SSD) with Ubuntu on it.
Thank you for the support.
The content has not been tested on the Raspberry Pi, so I cannot comment on any issues or limitations that may be related to the hardware. The content was tested on cloud VMs - Google Cloud GCE, AWS EC2, Azure VM, DigitalOcean Droplet - and on local VMs provisioned through VirtualBox, QEMU/KVM, VMware Workstation/Player. Each system (server, VM, etc.) that acts as a Kubernetes node needs 2 CPUs, 8 GB RAM, and 20+ GB disk. While you may be able to work with less RAM (4 or 6 GB), this may slow down your cluster.
However, I highly recommend following the installation and configuration instructions from the LFS258 lab guide. From your notes it seems you are mixing in external content, from other sources, leading to an inconsistent environment.
More specifically:
- After setting the desired pod subnet (10.200.0.0/16) in the kubeadm-config.yaml manifest, use the init command as it is shown in the lab guide, step 20. The way you customized the command defeats the purpose of the earlier prep step and the subsequent configuration.
- Make sure that k8scp is only configured as an alias for the control plane node, not the actual hostname (see the sketch after this list).
- Edit the cilium-cni.yaml manifest to use the same pod CIDR (10.200.0.0/16) set earlier for the init phase.
- Mixing in steps from other threads with the instructions from the lab guide may introduce conflicting configuration in your setup, which is much more difficult to troubleshoot.
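A minimal sketch of the alias idea (the IP is a placeholder for the control plane node's actual, ideally static, address):
- hostname                                            # should still print the machine's real hostname, e.g. luigispi
- grep k8scp /etc/hosts                               # k8scp should only appear here, as an alias
- echo "192.168.x.y k8scp" | sudo tee -a /etc/hosts   # add the alias if missing (placeholder IP)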
Regards,
-Chris
Hi @chrispokorni ,
Summary of today's troubleshooting, with the intent of growing the cluster (Lab 3.2):
At the next power-on of the machine I found:
- kubelet down
- API server down
- etcd down
After some troubleshooting, I found that swap was on. So I ran swapoff -a again, and now everything (etcd/API server/kubelet) is up and ready to work.
I have been learning from and reading up on all the commands used for that troubleshooting.
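A follow-up note (an assumption on my side, not lab-guide content): swapoff -a does not survive a reboot, so the swap entry also has to be disabled persistently, for example:
- sudo swapoff -a
- sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab   # comment out any swap line; review the file before and after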
I will continue tomorrow with the lab. Below is some output from the checks:
hostname
- root@luigispi:~# hostname
- luigispi
Node up
- luigi@luigispi:~$ sudo kubectl get nodes
- NAME STATUS ROLES AGE VERSION
- k8scp Ready control-plane 26h v1.32.1
checking yaml files
cilium-cni.yaml
- [...]
- ipam: "cluster-pool"
- ipam-cilium-node-update-rate: "15s"
- cluster-pool-ipv4-cidr: "10.200.0.0/16" #instead of "192.168.0.0/16"
- [...]
kubeadm-config.yaml
- root@luigispi:~# cat kubeadm-config.yaml
- apiVersion: kubeadm.k8s.io/v1beta4
- kind: ClusterConfiguration
- kubernetesVersion: 1.32.1
- controlPlaneEndpoint: "k8scp:6443"
- networking:
- podSubnet: 10.200.0.0/16 # instead of 192.168.0.0/16
Conclusions
This troubleshooting is definitely interesting and will improve knowledge retention and understanding, but I'm not sure I will continue on the single-node (same physical machine) cluster; it takes a lot of time to troubleshoot.
What do you suggest? Is it better to follow the labs and move to GCE?
Thank you for the support.
The swapoff -a can be found in step 7 of the lab guide. Please follow carefully the exercises as they are presented in the guide.
On a different note, I still see k8scp used as a node name, instead of an alias. Running kubectl with sudo (or as root) is not recommended, and it is not how it is presented in the lab guide.
What are the outputs of:
- kubectl get nodes -o wide
- kubectl get pods -A -o wide
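(For reference, the non-root kubeconfig setup, exactly as printed by kubeadm init earlier in this thread, is:)
- mkdir -p $HOME/.kube
- sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
- sudo chown $(id -u):$(id -g) $HOME/.kube/config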
Regards,
-Chris
Hi @chrispokorni ,
Here are the results of today's troubleshooting, done with the help of ChatGPT (probably going to move to Claude soon).
At power-on of the machine, etcd was down again, so I set up a three-terminal view (output below):
- terminal 1: kubectl get pods -A -o wide
- terminal 2: crictl events
- terminal 3: sudo lsof -i :6443
I saw the etcd container go up and down, SYN_SENT on port 6443, and a timeout during dial tcp 192.168.105.156:6443.
So I started a post-mortem analysis: etcd had exited, containerd had collected the garbage, and journalctl reported a CrashLoopBackOff.
So, diving into the etcd manifest, I found:
- [...]
- - --initial-advertise-peer-urls=https://192.168.105.156:2380
- - --initial-cluster=k8scp=https://192.168.105.156:2380
- - --key-file=/etc/kubernetes/pki/etcd/server.key
- - --listen-client-urls=https://127.0.0.1:2379,https://192.168.105.156:2379
- - --listen-metrics-urls=http://127.0.0.1:2381
- - --listen-peer-urls=https://192.168.105.156:2380
- [...]
But ip a | grep inet returned:
- [...]
- inet 192.168.237.156/24 brd 192.168.237.255 scope global dynamic noprefixroute wlan0
- [...]
Hence, it seems I have problems with the IP addressing of the whole thing...
Moreover, I started the cluster (see above) with:
- kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs
I'm not sure about that, but my opinion is that etcd is looking for 192.168.105.156, which no longer answers because of the new IP.
Am I right? Am I missing something else?
In the attached file 'prompt-out-commands.txt' you can find all the output of the following commands:
- for the 3 terminals view:
- kubectl get pods -A -o wide
- sudo crictl events
- sudo lsof -i :6443
- for the post mortem analysis:
- sudo crictl ps -a | grep etcd
- sudo crictl logs 45d40286716af
- sudo journalctl -u kubelet -xe | grep etcd
- sudo cat /etc/kubernetes/manifests/etcd.yaml
- ip a | grep inet
Consistency in the control plane endpoint is key in a Kubernetes cluster. A static IP address assigned to the instances acting as Kubernetes nodes, and/or a correctly applied init command, should fix some of the issues encountered.
So far the init commands applied seem to be consistently incorrect - the IP address is not the one recommended to you earlier, the node name is not the one from the lab guide (but the alias that should be just that - an alias), and there's the absence of the kubeadm-config.yaml manifest. This consistently yields improperly configured Kubernetes control plane components.
I would recommend starting from scratch, following the lab guide and applying the earlier suggestions and recommendations that were provided to you in this thread.
The purpose of lab 3 is to help the learner effortlessly initialize a working Kubernetes cluster. For this course, please avoid custom approaches to cluster initialization, as their troubleshooting is out of scope.
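One hedged example of pinning the node address (not from the lab guide; on Ubuntu this is typically done with netplan, and every value below is a placeholder - a DHCP reservation on the router works just as well):
- sudo nano /etc/netplan/50-cloud-init.yaml   # the actual file name varies per system
- network:
-   version: 2
-   wifis:
-     wlan0:
-       dhcp4: false
-       addresses: [192.168.237.156/24]
-       routes:
-         - to: default
-           via: 192.168.237.1
-       nameservers:
-         addresses: [192.168.237.1]
-       access-points:
-         "<your-ssid>":
-           password: "<your-password>"
- sudo netplan apply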
Regards,
-Chris
Hi @chrispokorni ,
Sorry for the late answer...
Thank you for the support,
I definitely got your point about re-applying everything...
Probably the static IP and the correct init procedure would be enough to start the cluster. But I have just decided to postpone the single-node implementation.
The troubleshooting and the "side" understanding are really interesting, but it will need tons more time, and I'm actually running a bit out of it.
To close the post, I also found online a guide about a single-node RPi implementation, hoping it will be useful for somebody: link here.
See you at the next problem!
Take care...
Keep in mind that the course was designed for a multi-node Kubernetes cluster to showcase the complexity of the networking, and the implementation of load-balancing and high-availability for application and control plane fault tolerance in a distributed multi-system environment.
Regards,
-Chris
hi @chrispokorni ,
All good here.
The VMs are up and nginx is running.
I was able to complete all the labs from Chapter 3.
Just one remark: I was not able to access the control plane's nginx from outside (point 6 of Lab 3.5). Instead, I was able to curl it from localhost and from the worker node (using the cluster IP).
Having a look on the web, it seems that the GCP firewall is blocking the request, so I tried a port-forward, but no luck.
Any suggestions?
curl from control-plane
- lc@control-plane-1-lfs258-n2:~$ curl localhost:32341
- <!DOCTYPE html>
- <html>
- <head>
- <title>Welcome to nginx!</title>
- <style>
- html { color-scheme: light dark; }
- body { width: 35em; margin: 0 auto;
- font-family: Tahoma, Verdana, Arial, sans-serif; }
- </style>
- </head>
- <body>
- <h1>Welcome to nginx!</h1>
- <p>If you see this page, the nginx web server is successfully installed and
- working. Further configuration is required.</p>
- <p>For online documentation and support please refer to
- <a href="http://nginx.org/">nginx.org</a>.<br/>
- Commercial support is available at
- <a href="http://nginx.com/">nginx.com</a>.</p>
- <p><em>Thank you for using nginx.</em></p>
- </body>
- </html>
curl from worker node
- lc@wn1-lfs258:~$ curl 10.198.0.2:32341
- <!DOCTYPE html>
- <html>
- <head>
- <title>Welcome to nginx!</title>
- <style>
- html { color-scheme: light dark; }
- body { width: 35em; margin: 0 auto;
- font-family: Tahoma, Verdana, Arial, sans-serif; }
- </style>
- </head>
- <body>
- <h1>Welcome to nginx!</h1>
- <p>If you see this page, the nginx web server is successfully installed and
- working. Further configuration is required.</p>
- <p>For online documentation and support please refer to
- <a href="http://nginx.org/">nginx.org</a>.<br/>
- Commercial support is available at
- <a href="http://nginx.com/">nginx.com</a>.</p>
- <p><em>Thank you for using nginx.</em></p>
- </body>
port forwarding
- kubectl port-forward service/nginx 32341:80
- Forwarding from 127.0.0.1:32341 -> 80
- Forwarding from [::1]:32341 -> 80
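A note on the port-forward attempt (my assumption about why it did not help): kubectl port-forward binds to 127.0.0.1 by default, so it is only reachable from the machine running it unless an address is given, and even then the VPC firewall must allow the port:
- kubectl port-forward --address 0.0.0.0 service/nginx 32341:80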
For a simple VPC firewall configuration I recommend watching the demo video for GCE from the "Course Introduction" chapter.
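A minimal sketch of such a rule with the gcloud CLI (the rule name, network, and source range are placeholders; the demo video uses the console instead):
- gcloud compute firewall-rules create allow-nodeports --network=default --allow=tcp:30000-32767 --source-ranges=0.0.0.0/0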
Regards,
-Chris