
Unable to go on with LFS258 Lab 3.1

Posts: 7
edited July 16 in LFS258 Class Forum

hi all,
I'm having big problems with Lab 3.1 while setting up the cluster.
After a full reset:

  1. sudo kubeadm reset
  2. sudo kubeadm init

the output is:

  Your Kubernetes control-plane has initialized successfully!
  To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
  Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
  You should now deploy a pod network to the cluster.
  Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
  Then you can join any number of worker nodes by running the following on each as root:
  kubeadm join 192.168.193.156:6443 --token

so I ran:

  sudo su
  root@luigispi:/home/luigi# export KUBECONFIG=/etc/kubernetes/admin.conf

and then continued with step 22:

  luigi@luigispi:~$ find $HOME -name cilium-cni.yaml
  /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

so I applied it:

  luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml
  error: error validating "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": error validating data: failed to download openapi: Get "https://k8scp:6443/openapi/v2?timeout=32s": dial tcp 192.168.193.156:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false

hence I skipped validation for now:

  luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml --validate=false
  error when retrieving current configuration of:
  Resource: "/v1, Resource=serviceaccounts", GroupVersionKind: "/v1, Kind=ServiceAccount"
  Name: "cilium", Namespace: "kube-system"
  from server for: "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": Get "https://k8scp:6443/api/v1/namespaces/kube-system/serviceaccounts/cilium": dial tcp 192.168.193.156:6443: connect: connection refused

  ...

and there is no way to go on!

I tried to reset,
to set everything up again,
to check whether the apiserver is up,
whether the IPs are OK...
There is no firewall in the middle...

Any idea how to solve it?

  sudo crictl ps -a | grep kube-apiserver
  WARN[0000] Config "/etc/crictl.yaml" does not exist, trying next: "/usr/bin/crictl.yaml"
  WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
  WARN[0000] Image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
  fc8e8e0f14e1b edd0d4592f909 9 seconds ago Running kube-apiserver 3 91ee85dcd862a kube-apiserver-luigispi kube-system
  61f63e5faf3db edd0d4592f909 About a minute ago Exited kube-apiserver 2 dba265bd82779 kube-apiserver-luigispi kube-system
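
(Side note: the crictl endpoint warnings above can be silenced by pointing crictl at the containerd socket; a minimal /etc/crictl.yaml, assuming containerd is the runtime, could look like this:)

  # /etc/crictl.yaml - tell crictl which CRI socket to use
  runtime-endpoint: unix:///run/containerd/containerd.sock
  image-endpoint: unix:///run/containerd/containerd.sock
  timeout: 10
  debug: false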

Comments

  • Posts: 2,483

    Hi @luigicucciolillo,

    I highly recommend following the commands' syntax as closely as possible to the lab guide.
    After a full control plane reset, execute the new init with all the options and flags as presented in the lab guide. Re-setting the kubeconfig is also essential after a new init.

    A concerning aspect of your setup is the IP address of the VM (possibly the control plane node), 192.168.x.y. Overlapping the VM IP addresses with the Pod subnet will eventually cause routing issues within your cluster. There are two possible solutions (a quick check is sketched right after this list):
    1. Reconfigure the hypervisor DHCP server to use a different IP range for VMs (for example 10.200.0.0/16) and validate, or edit if necessary, both the kubeadm-config.yaml and cilium-cni.yaml to have the pod subnet set to 192.168.0.0/16
    2. Leave the hypervisor DHCP as is on the 192.168.x.y/z range and validate, or edit if necessary, both the kubeadm-config.yaml and cilium-cni.yaml to have the pod subnet set to 10.200.0.0/16
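
    A quick way to double-check that the two manifests agree on the pod subnet (the file paths below are placeholders; adjust them to wherever your copies live):

    grep -n podSubnet /root/kubeadm-config.yaml
    grep -n cluster-pool-ipv4-cidr /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml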

    Regards,
    -Chris

  • hi @chrispokorni,
    thank you for answering.

    Adjusting the subnets

    After applying the reset procedure below, I applied the subnet you suggested (10.200.0.0/16) to both cilium-cni.yaml and kubeadm-config.yaml,
    but there was no change in the result :( .

    New kubeadm command

    Hence, inspired by a similar thread, I ran the reset procedure again and used the command:

    kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs

    With that I was able to complete the cluster initialization and the lab :) .

    I'm attaching below the output collected during the troubleshooting; hopefully it will be useful to somebody else.

    If you (or somebody else) want to spend a few words on the reset procedure, it would be really appreciated.

    Thank you for the support.

    See you at the next problem.

    Reset procedure

    1. sudo kubeadm reset --force
    2. sudo rm -rf /etc/cni/net.d
    3. sudo rm -rf /var/lib/etcd
    4. sudo rm -rf /var/lib/kubelet/*
    5. sudo rm -rf /etc/kubernetes/*
    6. rm -rf $HOME/.kube
    7. sudo rm -rf /etc/cni/net.d
    8. sudo systemctl stop kubelet
    9. sudo systemctl stop containerd
    10. sudo iptables -F
    11. sudo ipvsadm --clear
    12. sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
    13. sudo apt-get autoremove
    14. sudo ctr containers list | awk '{print $1}' | xargs -r sudo ctr containers delete
    15. sudo ctr snapshots list | awk '{print $1}' | xargs -r sudo ctr snapshots remove
    16. sudo iptables -F && sudo iptables -X
    17. sudo iptables -t nat -F && sudo iptables -t nat -X
    18. sudo iptables -t raw -F && sudo iptables -t raw -X
    19. sudo iptables -t mangle -F && sudo iptables -t mangle -X
    20. sudo rm -rf /var/lib/docker /etc/docker /var/run/docker.sock
    21. sudo rm -rf /var/lib/containerd
    22. sudo rm -rf /run/containerd/containerd.sock
    23. sudo rm -rf /run/docker.sock
    24. rm -rf ~/.kube
    25. rm -rf ~/.docker
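
    For anybody reusing this: a quick sanity check before reinstalling the packages and re-running init might be something like the following (assuming containerd as the runtime):

    sudo ss -lntp | grep 6443                      # nothing should still be listening on the API server port
    ls /etc/kubernetes /var/lib/etcd 2>/dev/null   # old state directories should be gone or empty
    systemctl status containerd --no-pager         # the container runtime should be running again before kubeadm init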

    Output from adjusting the subnets

    kubeadm-config.yaml

    substitute "10.200.0.0/16" instead of "192.168.0.0/16"
    in /root/kubeadm-config.yaml

    apiVersion: kubeadm.k8s.io/v1beta4
    kind: ClusterConfiguration
    kubernetesVersion: 1.32.1 # <-- Use the word stable for newest version
    controlPlaneEndpoint: "k8scp:6443" # <-- Use the alias we put in /etc/hosts not >
    networking:
      podSubnet: 10.200.0.0/16 #192.168.0.0/16

    cilium-cni.yaml

    substitute "10.200.0.0/16" instead of "192.168.0.0/16"
    in /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

    hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
    hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
    ipam: "cluster-pool"
    ipam-cilium-node-update-rate: "15s"
    cluster-pool-ipv4-cidr: "10.200.0.0/16" # instead of "192.168.0.0/16"
    cluster-pool-ipv4-mask-size: "24"
    egress-gateway-reconciliation-trigger-interval: "1s"
    enable-vtep: "false"
  • Posts: 2,483
    edited July 17

    Hi @luigicucciolillo,

    You seem to be using conflicting configuration details in your commands.

    What course are you enrolled in? Is it LFS258 Kubernetes Fundamentals or LFD259 Kubernetes for Developers?

    Also, what hypervisor are you using? How many network interfaces are set per VM (and what types)? Is all inbound traffic allowed to the VMs by the hypervisor firewall (promiscuous mode enabled)?

    Regards,
    -Chris

  • Posts: 7
    edited July 18

    Good morning @chrispokorni ,

    I'm currently enrolled in LFS258 Kubernetes Fundamentals.

    I'm running everything on a Raspberry Pi 5 (16 GB RAM + 512 GB SSD) with Ubuntu on it.

    Thank you for the support

  • Posts: 2,483

    Hi @luigicucciolillo,

    The content has not been tested on Raspberry Pi, so I cannot comment on any issues or limitations that may be related to the hardware. The content was tested on cloud VMs - Google Cloud GCE, AWS EC2, Azure VM, DigitalOcean Droplet; and local VMs provisioned through VirtualBox, QEMU/KVM, VMware Workstation/Player. Each system (server, VM, etc...) that acts as a Kubernetes node needs 2 CPUs, 8 GB RAM, and 20+ GB disk. While you may be able to work with less RAM (4 or 6 GB), this may slow down your cluster.
    However, I highly recommend following the installation and configuration instructions from the LFS258 lab guide. From your notes it seems you are mixing in external content, from other sources, leading to an inconsistent environment.
    More specifically:

    • After setting the desired pod subnet (10.200.0.0/16) in the kubeadm-config.yaml manifest, use the init command as it is shown in the lab guide, step 20 (a rough sketch follows this list). The way you customized the command defeats the purpose of the earlier prep step and the subsequent configuration.
    • Make sure that k8scp is only configured as an alias for the control plane node, not the actual hostname.
    • Edit the cilium-cni.yaml manifest to use the same pod CIDR (10.200.0.0/16) set earlier for the init phase.
    • Mixing in steps from other threads with the instructions from the lab guide may introduce conflicting configuration in your setup, much more difficult to troubleshoot.
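
    Roughly, the flow from the lab guide looks like this (the IP below is just a placeholder for your control plane node's address; check the lab guide for the exact commands):

    # /etc/hosts - k8scp is only an alias for the control plane IP, not the hostname
    203.0.113.10 k8scp

    # init driven by the kubeadm-config.yaml manifest, not by ad-hoc flags
    sudo kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out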

    Regards,
    -Chris

  • Hi @chrispokorni ,
    Here is a summary of today's troubleshooting while trying to grow the cluster (Lab 3.2):
    when powering the machine back on I found:

    • kubelet down
    • API server down
    • etcd down

    After some troubleshooting, I found that swap was on,
    so I ran swapoff -a again; now everything (etcd/API server/kubelet) is up and ready to work.
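
    Note for future me: swapoff -a only lasts until the next reboot, so to keep swap off permanently something like this seems to be needed (assuming swap is configured in /etc/fstab; other images may use a systemd swap unit instead):

    sudo swapoff -a                               # turn swap off right now
    sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab    # comment out swap entries so it stays off after reboot
    swapon --show                                 # should print nothing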

    I've been learning and reading about all the commands used for that troubleshooting.
    I will continue with the lab tomorrow.


    Below are some outputs from my checks:

    hostname

    root@luigispi:~# hostname
    luigispi

    Node up

    luigi@luigispi:~$ sudo kubectl get nodes
    NAME    STATUS   ROLES           AGE   VERSION
    k8scp   Ready    control-plane   26h   v1.32.1

    Checking the YAML files

    cilium-cni.yaml

    [...]
    ipam: "cluster-pool"
    ipam-cilium-node-update-rate: "15s"
    cluster-pool-ipv4-cidr: "10.200.0.0/16" # instead of "192.168.0.0/16"
    [...]

    kubeadm-config.yaml

    root@luigispi:~# cat kubeadm-config.yaml
    apiVersion: kubeadm.k8s.io/v1beta4
    kind: ClusterConfiguration
    kubernetesVersion: 1.32.1
    controlPlaneEndpoint: "k8scp:6443"
    networking:
      podSubnet: 10.200.0.0/16 # instead of 192.168.0.0/16

    Conclusions

    This troubleshooting is definitely interesting and improves retention and understanding,
    but I'm not sure I will continue on the single-node (same physical machine) cluster;
    it takes a lot of time to troubleshoot.

    What do you suggest?
    Is it better to follow the labs and move to GCE?

    Thank you for the support.

  • Posts: 2,483

    Hi @luigicucciolillo,

    The swapoff -a command can be found in step 7 of the lab guide. Please carefully follow the exercises as they are presented in the guide.

    On a different note, I still see k8scp used as a node name, instead of an alias. Running kubectl with sudo (or as root) is not recommended, and it is not how it is presented in the lab guide.
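
    For reference, the regular-user setup (shown in your kubeadm init output earlier in this thread) is:

    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config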

    What are the outputs of:

    1. kubectl get nodes -o wide
    2. kubectl get pods -A -o wide

    Regards,
    -Chris

  • Hi @chrispokorni ,
    here are the results of today's troubleshooting, done with the help of ChatGPT (probably going to move to Claude soon).

    When powering on the machine, etcd was down again :s

    so I set up a three-terminal view (output below):

    • terminal 1: kubectl get pods -A -o wide
    • terminal 2: crictl events
    • terminal 3: sudo lsof -i :6443

    I saw the etcd container go up and down,
    SYN_SENT on port 6443, and a timeout during dial tcp 192.168.105.156:6443.

    So I started a post-mortem analysis: etcd exited, containerd collected the garbage ( :o ), and CrashLoopBackOff was reported by journalctl.

    Diving into the etcd manifest, I found:

    [...]
    - --initial-advertise-peer-urls=https://192.168.105.156:2380
    - --initial-cluster=k8scp=https://192.168.105.156:2380
    - --key-file=/etc/kubernetes/pki/etcd/server.key
    - --listen-client-urls=https://127.0.0.1:2379,https://192.168.105.156:2379
    - --listen-metrics-urls=http://127.0.0.1:2381
    - --listen-peer-urls=https://192.168.105.156:2380
    [...]

    but, ip a | grep inet returned:

    [...]
    inet 192.168.237.156/24 brd 192.168.237.255 scope global dynamic noprefixroute wlan0
    [...]

    Hence, it seems I have problems with the IP addressing and all that jazz...

    Moreover, I started the cluster (see above) with:

    kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs

    I'm not sure about that,
    but my opinion is that etcd is looking for 192.168.105.156, which does not answer because the node got a new IP.
    Am I right?
    Am I missing something else?
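
    For reference, a compact way to see the mismatch at a glance (same files and commands as listed below, just filtered):

    # current address on the Wi-Fi interface
    ip -4 addr show wlan0 | grep inet
    # address baked into the static etcd manifest at init time
    sudo grep -E 'advertise|listen' /etc/kubernetes/manifests/etcd.yaml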


    In the attached file 'prompt-out-commands.txt' you can find all the outputs of the following commands:

    • for the three-terminal view:
    1. kubectl get pods -A -o wide
    2. sudo crictl events
    3. sudo lsof -i :6443
    • for the post-mortem analysis:
    1. sudo crictl ps -a | grep etcd
    2. sudo crictl logs 45d40286716af
    3. sudo journalctl -u kubelet -xe | grep etcd
    4. sudo cat /etc/kubernetes/manifests/etcd.yaml
    5. ip a | grep inet
  • Posts: 2,483

    Hi @luigicucciolillo,

    Consistency in the control plane endpoint is key in a Kubernetes cluster. A static IP address assigned to the instances acting as Kubernetes nodes and/or a correctly applied init command should fix some of the issues encountered.
    So far the init commands applied seem to be consistently incorrect - the IP address is not the one recommended to you earlier, the node name is not the one from the lab guide (but the alias that should be just that - an alias), and there's the absence of the kubeadm-config.yaml manifest. This consistently yields improperly configured Kubernetes control plane components.
    I would recommend starting from scratch, following the lab guide and applying the earlier suggestions and recommendations that were provided to you in this thread.
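
    For example, on Ubuntu the node's address can be pinned with a netplan snippet along these lines (interface name, addresses and gateway below are placeholders; a Wi-Fi interface such as wlan0 would additionally need an access-points block, and a DHCP reservation on your router achieves the same result):

    # /etc/netplan/01-static.yaml (placeholder values - adjust to your network)
    network:
      version: 2
      ethernets:
        eth0:
          dhcp4: false
          addresses: [192.168.237.156/24]
          routes:
            - to: default
              via: 192.168.237.1
          nameservers:
            addresses: [1.1.1.1, 8.8.8.8]

    sudo netplan apply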

    The purpose of lab 3 is to help the learner effortlessly initialize a working Kubernetes cluster. For this course, please avoid custom approaches to cluster initialization, as their troubleshooting is out of scope.

    Regards,
    -Chris

  • Hi @chrispokorni,
    Sorry for the late answer...
    Thank you for the support.
    I definitely get your point about reapplying everything...
    Probably the static IP and the correct init procedure would be enough to start the cluster.

    But I have just decided to postpone the single-node implementation.
    The troubleshooting and the "side" understanding are really interesting,
    but it would need tons more time,
    and I'm actually running a bit short of it.

    To close the post,
    I also found online a guide about a single-node RPi implementation,
    hoping it will be useful for somebody: link here.

    See you at the next problem!
    Take care...

  • Posts: 2,483

    Hi @luigicucciolillo,

    Keep in mind that the course was designed for a multi-node Kubernetes cluster to showcase the complexity of the networking, and the implementation of load-balancing and high-availability for application and control plane fault tolerance in a distributed multi-system environment.

    Regards,
    -Chris

  • hi @chrispokorni,
    All good here.
    VMs are up and nginx is running a lot :)
    I was able to complete all the labs from chapter 3.
    Just one remark: I was not able to access the control plane's nginx from outside (point 6 of Lab 3.5).

    Instead, I was able to curl it from localhost and from the worker node (using the ClusterIP).

    Having a look on the web, it seems that the GCP firewall is blocking the request,

    so I tried a port-forward, but no luck.

    Any suggestions?

    curl from control-plane

    lc@control-plane-1-lfs258-n2:~$ curl localhost:32341
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>

    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>

    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>

    curl from worker node

    lc@wn1-lfs258:~$ curl 10.198.0.2:32341
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>

    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>

    <p><em>Thank you for using nginx.</em></p>
    </body>
    port forwarding

    kubectl port-forward service/nginx 32341:80
    Forwarding from 127.0.0.1:32341 -> 80
    Forwarding from [::1]:32341 -> 80
  • Posts: 2,483

    Hi @luigicucciolillo,

    For a simple VPC firewall configuration I recommend watching the demo video for GCE from the "Course Introduction" chapter.
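
    In short, the NodePort range has to be reachable through the VPC firewall. A rule along these lines (rule name, network and source range are placeholders) opens it; also note that kubectl port-forward binds to 127.0.0.1 by default, so it only helps for local access unless run with --address 0.0.0.0:

    gcloud compute firewall-rules create allow-nodeports \
      --network=default --direction=INGRESS \
      --allow=tcp:30000-32767 --source-ranges=0.0.0.0/0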

    Regards,
    -Chris
