Unable to go on with LFS258 Lab 3.1

luigicucciolillo (edited July 16 in LFS258 Class Forum)

Hi all,
I'm having big problems with Lab 3.1 while I'm setting up the cluster.
After a full reset:

sudo kubeadm reset
sudo kubeadm init

the output is:

Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.193.156:6443 --token 

So I ran:

sudo su
root@luigispi:/home/luigi# export KUBECONFIG=/etc/kubernetes/admin.conf

and then continued with step 22:

luigi@luigispi:~$ find $HOME -name cilium-cni.yaml
/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

So I applied it:

luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml
error: error validating "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": error validating data: failed to download openapi: Get "https://k8scp:6443/openapi/v2?timeout=32s": dial tcp 192.168.193.156:6443: connect: connection refused; if you choose to ignore these errors, turn validation off with --validate=false

Hence I skipped validation for now:

luigi@luigispi:~$ sudo kubectl apply -f /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml --validate=false
error when retrieving current configuration of:
Resource: "/v1, Resource=serviceaccounts", GroupVersionKind: "/v1, Kind=ServiceAccount"
Name: "cilium", Namespace: "kube-system"
from server for: "/home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml": Get "https://k8scp:6443/api/v1/namespaces/kube-system/serviceaccounts/cilium": dial tcp 192.168.193.156:6443: connect: connection refused

...

and there is no way to go on!

I tried to reset, to set everything up again, to check whether the apiserver is up, whether the IPs are OK... there is no firewall in the middle...

Any idea how to solve it?

For reference, checking the apiserver containers:

sudo crictl ps -a | grep kube-apiserver
WARN[0000] Config "/etc/crictl.yaml" does not exist, trying next: "/usr/bin/crictl.yaml" 
WARN[0000] runtime connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
WARN[0000] Image connect using default endpoints: [unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead. 
fc8e8e0f14e1b       edd0d4592f909       9 seconds ago        Running             kube-apiserver            3                   91ee85dcd862a       kube-apiserver-luigispi            kube-system
61f63e5faf3db       edd0d4592f909       About a minute ago   Exited              kube-apiserver            2                   dba265bd82779       kube-apiserver-luigispi            kube-system
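
In case it is useful, here is roughly the rest of what I checked on the node (just a sketch of my troubleshooting; the k8scp alias comes from /etc/hosts):

# does the control plane endpoint alias resolve to the node's current IP?
getent hosts k8scp
# is anything listening on the API server port?
sudo ss -tlnp | grep 6443
# is the kubelet running, and what is it logging?
sudo systemctl status kubelet
sudo journalctl -u kubelet --no-pager | tail -n 50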

Comments

  • chrispokorni

    Hi @luigicucciolillo,

    I highly recommend following the command syntax as closely as possible to the lab guide.
    After a full control plane reset, run the new init with all the options and flags presented in the lab guide. Resetting the kubeconfig is also essential after a new init.

    A concerning aspect of your setup is the IP address of the VM (possibly the control plane node), 192.168.x.y. Overlapping VM IP addresses with the Pod subnet will eventually cause routing issues within your cluster. There are two possible solutions (see the quick check after this list):
    1. Reconfigure the hypervisor DHCP server to use a different IP range for the VMs (for example 10.200.0.0/16) and validate, or edit if necessary, both kubeadm-config.yaml and cilium-cni.yaml so the pod subnet stays at 192.168.0.0/16.
    2. Leave the hypervisor DHCP as is on the 192.168.x.y/z range and validate, or edit if necessary, both kubeadm-config.yaml and cilium-cni.yaml so the pod subnet is set to 10.200.0.0/16.
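
    For example, a quick way to compare the node's address with the pod subnet configured in the two manifests (the paths are the ones from this thread and may differ on your system):

    ip -4 addr show | grep inet
    sudo grep podSubnet /root/kubeadm-config.yaml
    grep cluster-pool-ipv4-cidr /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml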

    Regards,
    -Chris

  • hi @chrispokorni ,
    thank you for answering.

    adjust subnets

    After applying the reset procedure below, I set the subnet you suggested (10.200.0.0/16) in both cilium-cni.yaml and kubeadm-config.yaml, but the result did not change :( .

    new kubeadm command

    Hence, inspired by a similar thread, I ran the reset procedure again and used the command:

    kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs
    

    With that I was able to complete the cluster initialization and the lab :) .

    I'm attaching below the output collected during the troubleshooting; hopefully it will be useful for somebody else.

    If you (or somebody else) want to spend a couple of words on the reset procedure, it would be really appreciated.

    thank you for the support.

    see you next problem.

    Reset procedure

    sudo kubeadm reset --force
    sudo rm -rf /etc/cni/net.d
    sudo rm -rf /var/lib/etcd
    sudo rm -rf /var/lib/kubelet/*
    sudo rm -rf /etc/kubernetes/*
    rm -rf $HOME/.kube
    sudo rm -rf /etc/cni/net.d
    sudo systemctl stop kubelet
    sudo systemctl stop containerd
    sudo iptables -F
    sudo ipvsadm --clear
    sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
    sudo apt-get autoremove
    sudo ctr containers list | awk '{print $1}' | xargs -r sudo ctr containers delete
    sudo ctr snapshots list | awk '{print $1}' | xargs -r sudo ctr snapshots remove
    sudo iptables -F && sudo iptables -X
    sudo iptables -t nat -F && sudo iptables -t nat -X
    sudo iptables -t raw -F && sudo iptables -t raw -X
    sudo iptables -t mangle -F && sudo iptables -t mangle -X
    sudo rm -rf /var/lib/docker /etc/docker /var/run/docker.sock
    sudo rm -rf /var/lib/containerd
    sudo rm -rf /run/containerd/containerd.sock
    sudo rm -rf /run/docker.sock
    rm -rf ~/.kube
    rm -rf ~/.docker
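
    For comparison, a much lighter reset would probably have been enough. This is just a sketch based on what kubeadm reset itself asks you to clean up manually, without purging the packages:

    sudo kubeadm reset --force
    sudo rm -rf /etc/cni/net.d
    rm -rf $HOME/.kube
    sudo iptables -F && sudo iptables -t nat -F && sudo iptables -t mangle -F && sudo iptables -X
    sudo systemctl restart containerd kubelet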
    

    Output from adjusting the subnets

    kubeadm-config.yaml

    substitute "10.200.0.0/16" instead of "192.168.0.0/16"
    in /root/kubeadm-config.yaml

    apiVersion: kubeadm.k8s.io/v1beta4
    kind: ClusterConfiguration
    kubernetesVersion: 1.32.1 # <-- Use the word stable for newest version
    controlPlaneEndpoint: "k8scp:6443" #<-- Use the alias we put in /etc/hosts not >
    networking:
      podSubnet: 10.200.0.0/16 #192.168.0.0/16
    

    cilium-cni.yaml

    substitute "10.200.0.0/16" instead of "192.168.0.0/16"
    in /home/luigi/LFS258/SOLUTIONS/s_03/cilium-cni.yaml

      hubble-tls-key-file: /var/lib/cilium/tls/hubble/server.key
      hubble-tls-client-ca-files: /var/lib/cilium/tls/hubble/client-ca.crt
      ipam: "cluster-pool"
      ipam-cilium-node-update-rate: "15s"
      cluster-pool-ipv4-cidr: "10.200.0.0/16" #instead of "192.168.0.0/16"  
      cluster-pool-ipv4-mask-size: "24"
      egress-gateway-reconciliation-trigger-interval: "1s"
      enable-vtep: "false"
    
  • chrispokorni (edited July 17)

    Hi @luigicucciolillo,

    You seem to be using conflicting configuration details in your commands.

    What course are you enrolled in? Is it LFS258 Kubernetes Fundamentals or LFD259 Kubernetes for Developers?

    Also, what hypervisor are you using? How many network interfaces are set per VM (and what types)? Is all inbound traffic allowed to the VMs by the hypervisor firewall (promiscuous mode enabled)?

    Regards,
    -Chris

  • luigicucciolillo (edited July 18)

    Good morning @chrispokorni ,

    I am currently enrolled in LFS258 Kubernetes Fundamentals.

    I'm running everything on a Raspberry Pi 5 (16 GB RAM + 512 GB SSD) with Ubuntu on it.

    Thank you for the support

  • chrispokorni

    Hi @luigicucciolillo,

    The content has not been tested on a Raspberry Pi, so I cannot comment on any issues or limitations that may be related to the hardware. The content was tested on cloud VMs - Google Cloud GCE, AWS EC2, Azure VM, DigitalOcean Droplet; and local VMs provisioned through VirtualBox, QEMU/KVM, VMware Workstation/Player. Each system (server, VM, etc...) that acts as a Kubernetes node needs 2 CPUs, 8 GB RAM, and 20+ GB of disk. While you may be able to work with less RAM (4 or 6 GB), this may slow down your cluster.
    However, I highly recommend following the installation and configuration instructions from the LFS258 lab guide. From your notes it seems you are mixing in external content from other sources, leading to an inconsistent environment.
    More specifically:

    • After setting the desired pod subnet (10.200.0.0/16) in the kubeadm-config.yaml manifest, use the init command as it is shown in the lab guide, step 20 (see the sketch after this list). The way you customized the command defeats the purpose of the earlier prep step and of the subsequent configuration.
    • Make sure that k8scp is only configured as an alias for the control plane node, not the actual hostname.
    • Edit the cilium-cni.yaml manifest to use the same pod CIDR (10.200.0.0/16) set earlier for the init phase.
    • Mixing in steps from other threads with the instructions from the lab guide may introduce conflicting configuration into your setup, which is much more difficult to troubleshoot.
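
    As an illustration only, the step 20 init looks roughly like this (check the exact flags against your version of the lab guide; kubeadm-config.yaml is the manifest you prepared with the 10.200.0.0/16 pod subnet):

    sudo kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out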

    Regards,
    -Chris

  • Hi @chrispokorni ,
    A summary of today's troubleshooting while trying to grow the cluster (Lab 3.2):
    when powering the machine back on, I found:

    • kubelet down
    • API server down
    • etcd down

    After some troubleshooting, I found that swap was on.
    So I ran swapoff -a again, and now everything (etcd/API server/kubelet) is up and ready to work. (See the note just below on keeping swap off across reboots.)
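
    Note to self on keeping swap disabled: swapoff -a only lasts until the next reboot, so the swap entry probably also needs to be disabled persistently. A sketch, assuming the swap line lives in /etc/fstab on this Ubuntu image:

    sudo swapoff -a
    # comment out any swap entry so it does not come back at boot
    sudo sed -i '/ swap / s/^/#/' /etc/fstab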

    I have been learning from and reading about all the commands used during that troubleshooting.
    I will continue with the lab tomorrow.


    Some of the outputs from my checks follow:

    hostname

    root@luigispi:~# hostname
    luigispi
    

    Node up

    luigi@luigispi:~$ sudo kubectl get nodes
    NAME    STATUS   ROLES           AGE   VERSION
    k8scp   Ready    control-plane   26h   v1.32.1
    

    checking yaml files

    cilium-cni.yaml

    [...]
      ipam: "cluster-pool"
      ipam-cilium-node-update-rate: "15s"
      cluster-pool-ipv4-cidr: "10.200.0.0/16" #instead of "192.168.0.0/16"
    [...]
    

    kubeadm-config.yaml

    root@luigispi:~# cat kubeadm-config.yaml 
    apiVersion: kubeadm.k8s.io/v1beta4
    kind: ClusterConfiguration
    kubernetesVersion: 1.32.1
    controlPlaneEndpoint: "k8scp:6443"
    networking:
      podSubnet: 10.200.0.0/16 # instead of 192.168.0.0/16
    

    Conclusions

    This troubleshooting is definitely interesting and improves knowledge retention and understanding,
    but I'm not sure I will continue with the single-node (same physical machine) cluster;
    it takes a lot of time to troubleshoot.


    What do you suggest?
    Is it better to follow the labs and move to GCE?

    Thank you for the support.

  • chrispokorni

    Hi @luigicucciolillo,

    The swapoff -a can be found in step 7 of the lab guide. Please carefully follow the exercises as they are presented in the guide.

    On a different note, I still see k8scp used as a node name, instead of an alias. Running kubectl with sudo (or as root) is not recommended, and it is not how it is presented in the lab guide.

    What are the outputs of:

    kubectl get nodes -o wide
    kubectl get pods -A -o wide
    

    Regards,
    -Chris

  • Hi @chrispokorni ,
    here are the results of today's troubleshooting, done with the help of ChatGPT (I will probably move to Claude soon).

    When powering on the machine, etcd was down again :s

    So I set up a three-terminal view (outputs attached below):

    • terminal 1: kubectl get pods -A -o wide
    • terminal 2: crictl events
    • terminal 3: sudo lsof -i :6443

    I saw the etcd container going up and down,
    SYN_SENT on port 6443, and a timeout during dial tcp 192.168.105.156:6443.

    So I started a post-mortem analysis: etcd had exited, containerd garbage-collected the container ( :o ), and journalctl reported a CrashLoopBackOff.

    Then, diving into the etcd manifest, I found:

    [...]
        - --initial-advertise-peer-urls=https://192.168.105.156:2380
        - --initial-cluster=k8scp=https://192.168.105.156:2380
        - --key-file=/etc/kubernetes/pki/etcd/server.key
        - --listen-client-urls=https://127.0.0.1:2379,https://192.168.105.156:2379
        - --listen-metrics-urls=http://127.0.0.1:2381
        - --listen-peer-urls=https://192.168.105.156:2380
    [...]
    

    but, ip a | grep inet returned:

    [...]
        inet 192.168.237.156/24 brd 192.168.237.255 scope global dynamic noprefixroute wlan0
    [...]
    

    Hence it seems I have problems with the IP addressing of the whole thing...
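
    A rough sketch of checks that should confirm this, comparing the addresses baked into the static manifests and the API server certificate with the live one:

    # addresses hard-coded in the static pod manifests
    sudo grep -rn 'https://192.168' /etc/kubernetes/manifests/
    # current address on the node
    ip -4 addr show wlan0 | grep inet
    # the API server certificate also embeds the address that was used at init time
    sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'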

    Moreover, I started the cluster (see above) with:

    kubeadm init --pod-network-cidr 192.168.0.0/16 --node-name k8scp --upload-certs
    

    I'm not sure about that,
    but my opinion is that etcd is looking for 192.168.105.156, which no longer answers because the node got a new IP.
    Am I right?
    Am I missing something else?


    In the attached file 'prompt-out-commands.txt' you can find all the output of the following commands:

    • for the three-terminal view:
    kubectl get pods -A -o wide
    sudo crictl events
    sudo lsof -i :6443
    
    • for the post-mortem analysis:
    sudo crictl ps -a | grep etcd
    sudo crictl logs 45d40286716af
    sudo journalctl -u kubelet -xe | grep etcd
    sudo cat /etc/kubernetes/manifests/etcd.yaml
    ip a | grep inet
    
  • chrispokorni

    Hi @luigicucciolillo,

    Consistency in the control plane endpoint is key in a Kubernetes cluster. A static IP address assigned to the instances acting as Kubernetes nodes (see the sketch below) and/or a correctly applied init command should fix some of the issues encountered.
    So far the init commands applied seem to be consistently incorrect - the IP address is not the one recommended to you earlier, the node name is not the one from the lab guide (but the alias, which should be just that - an alias), and the kubeadm-config.yaml manifest is absent. This consistently yields improperly configured Kubernetes control plane components.
    I would recommend starting from scratch, following the lab guide and applying the earlier suggestions and recommendations provided to you in this thread.
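
    Purely as an illustration of a static address (the interface name, addresses, and gateway below are placeholders, and a Wi-Fi interface would need a wifis: section with the access point credentials instead of ethernets:), a netplan configuration on Ubuntu could look roughly like this:

    # /etc/netplan/01-static.yaml - apply with: sudo netplan apply
    network:
      version: 2
      ethernets:
        eth0:
          dhcp4: false
          addresses: [192.168.237.156/24]
          routes:
            - to: default
              via: 192.168.237.1
          nameservers:
            addresses: [1.1.1.1, 8.8.8.8]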

    The purpose of lab 3 is to help the learner effortlessly initialize a working Kubernetes cluster. For this course, please avoid custom approaches to cluster initialization, as their troubleshooting is out of scope.

    Regards,
    -Chris

  • Hi @chrispokorni,
    Sorry for the late answer...
    Thank you for the support;
    I definitely got your point about redoing everything.
    A static IP and the correct init procedure would probably be enough to start the cluster.

    But I have just decided to postpone the single-node implementation.
    The troubleshooting and the "side" understanding are really interesting,
    but they will need a lot more time,
    and I'm currently running a bit short of it.

    To close the post,
    I also found online a guide about a single-node Raspberry Pi implementation;
    hoping it will be useful for somebody: link here.

    See you next problem!
    take care...

  • chrispokorni

    Hi @luigicucciolillo,

    Keep in mind that the course was designed for a multi-node Kubernetes cluster to showcase the complexity of the networking, and the implementation of load-balancing and high-availability for application and control plane fault tolerance in a distributed multi-system environment.

    Regards,
    -Chris

  • Hi @chrispokorni,
    All good here.
    The VMs are up and nginx is running happily :)
    I have been able to complete all the labs from chapter 3.
    Just one remark: I was not able to access the control plane's nginx from outside (point 6 of Lab 3.5).

    Instead, I was able to curl it from localhost and from the worker node (using the control plane node's internal IP and the NodePort).

    Having a look on the web, it seems that the GCP firewall is blocking the request,

    so I tried a port-forward, but no luck.

    Any suggestions?

    curl from control-plane

    lc@control-plane-1-lfs258-n2:~$ curl localhost:32341
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    </html>
    

    curl from worker node

    lc@wn1-lfs258:~$ curl 10.198.0.2:32341
    <!DOCTYPE html>
    <html>
    <head>
    <title>Welcome to nginx!</title>
    <style>
    html { color-scheme: light dark; }
    body { width: 35em; margin: 0 auto;
    font-family: Tahoma, Verdana, Arial, sans-serif; }
    </style>
    </head>
    <body>
    <h1>Welcome to nginx!</h1>
    <p>If you see this page, the nginx web server is successfully installed and
    working. Further configuration is required.</p>
    
    <p>For online documentation and support please refer to
    <a href="http://nginx.org/">nginx.org</a>.<br/>
    Commercial support is available at
    <a href="http://nginx.com/">nginx.com</a>.</p>
    
    <p><em>Thank you for using nginx.</em></p>
    </body>
    

    port forwarding

    kubectl port-forward service/nginx 32341:80
    Forwarding from 127.0.0.1:32341 -> 80
    Forwarding from [::1]:32341 -> 80
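
    (A side note on the port-forward attempt: kubectl port-forward binds to 127.0.0.1 by default, so it is only reachable from the node itself. If I understand the flag correctly, binding on all interfaces would look like the line below, although traffic from outside would still have to be allowed by the VPC firewall.)

    kubectl port-forward --address 0.0.0.0 service/nginx 32341:80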
    
  • chrispokorni

    Hi @luigicucciolillo,

    For a simple VPC firewall configuration I recommend watching the demo video for GCE from the "Course Introduction" chapter.
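
    As a rough illustration only (the rule name and source range below are placeholders; the video shows the exact setup), a VPC rule opening the NodePort range can be created with something like:

    gcloud compute firewall-rules create allow-nodeports --network=default --allow=tcp:30000-32767 --source-ranges=0.0.0.0/0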

    Regards,
    -Chris
