
Unable to setup cluster on Lab 3.1

I'm following the instructions in the book, but running kubeadm init --config=kubeadm-config.yaml --upload-certs --v=5 | tee kubeadm-init.out fails because the API server never becomes available.

On running systemctl status kubelet the output is

and on running journalctl -xeu kubelet the output is

I've tried debugging with the help of other forums online, but to no avail. If I check the logs of the containers, the output is

Please help!


Comments

  • serewicz
    serewicz Posts: 1,000

    Hello,

    I see a lot of errors saying "pranaynada2c.mylabserver.com not found". If you edited /etc/hosts and the kubeadm-config.yaml files properly, and only used k8smaster as the server name, kubelet shouldn't be asking for the actual host name.

    1. I would first make sure you are not changing the hostname, just adding an alias for your primary IP to /etc/hosts.
    2. Double-check the name and other values in kubeadm-config.yaml.
    3. Ensure you are using a fresh instance, not one that failed, was edited, and tried again. Not everything is cleaned out by kubeadm reset.
    4. Test that the alias works using ping prior to running kubeadm init on the fresh system. Double-check you are using your primary interface (see the example checks below).
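
    For illustration, a minimal set of checks, assuming a hypothetical primary IP of 10.128.0.3 on interface ens4 (substitute your own IP and interface), might look like:

    # confirm the primary interface and its address
    ip route show default
    ip -4 addr show ens4
    # /etc/hosts should then contain an alias line such as (example IP):
    # 10.128.0.3   k8smaster
    # verify the alias resolves and answers before running kubeadm init
    ping -c 3 k8smaster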

    Regards,

  • mlgajjar
    mlgajjar Posts: 4

    I have a similar kind of issue.
    root@master:~/LFS258/SOLUTIONS/s_03# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    W0502 18:10:58.821199 3699 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
    [init] Using Kubernetes version: v1.18.1
    [preflight] Running pre-flight checks
    [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    [preflight] Pulling images required for setting up a Kubernetes cluster
    [preflight] This might take a minute or two, depending on the speed of your internet connection
    [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
    [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
    [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
    [kubelet-start] Starting the kubelet
    [certs] Using certificateDir folder "/etc/kubernetes/pki"
    [certs] Generating "ca" certificate and key
    [certs] Generating "apiserver" certificate and key
    [certs] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local k8smaster] and IPs [10.96.0.1 10.0.0.2]
    [certs] Generating "apiserver-kubelet-client" certificate and key
    [certs] Generating "front-proxy-ca" certificate and key
    [certs] Generating "front-proxy-client" certificate and key
    [certs] Generating "etcd/ca" certificate and key
    [certs] Generating "etcd/server" certificate and key
    [certs] etcd/server serving cert is signed for DNS names [master localhost] and IPs [10.0.0.2 127.0.0.1 ::1]
    [certs] Generating "etcd/peer" certificate and key
    [certs] etcd/peer serving cert is signed for DNS names [master localhost] and IPs [10.0.0.2 127.0.0.1 ::1]
    [certs] Generating "etcd/healthcheck-client" certificate and key
    [certs] Generating "apiserver-etcd-client" certificate and key
    [certs] Generating "sa" key and public key
    [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
    [kubeconfig] Writing "admin.conf" kubeconfig file
    [kubeconfig] Writing "kubelet.conf" kubeconfig file
    [kubeconfig] Writing "controller-manager.conf" kubeconfig file
    [kubeconfig] Writing "scheduler.conf" kubeconfig file
    [control-plane] Using manifest folder "/etc/kubernetes/manifests"
    [control-plane] Creating static Pod manifest for "kube-apiserver"
    [control-plane] Creating static Pod manifest for "kube-controller-manager"
    W0502 18:11:08.853546 3699 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
    [control-plane] Creating static Pod manifest for "kube-scheduler"
    W0502 18:11:08.856794 3699 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
    [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

        Unfortunately, an error has occurred:
                error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
    

    To see the stack trace of this error execute with --v=5 or higher
    timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
    
        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'
    
        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.
    
        Here is one example how you may list all Kubernetes containers running in docker:
                - 'docker ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'docker logs CONTAINERID'
    
  • mlgajjar
    mlgajjar Posts: 4

    Other information from the environment: I am using a GSK node.
    root@master:~/LFS258/SOLUTIONS/s_03# cat kubeadm-config.yaml
    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    kubernetesVersion: 1.18.1
    controlPlaneEndpoint: "k8smaster:6443"
    networking:
      podSubnet: 192.168.0.0/16
    root@master:~/LFS258/SOLUTIONS/s_03# nslookup k8master
    Server: 127.0.0.53
    Address: 127.0.0.53#53

    Non-authoritative answer:
    Name: k8master
    Address: 10.0.0.2

    root@master:~/LFS258/SOLUTIONS/s_03# telnet k8master 6443
    Trying 10.0.0.2...
    Connected to k8master.
    Escape character is '^]'.
    ^CConnection closed by foreign host.
    root@master:~/LFS258/SOLUTIONS/s_03# systemctl status kubelet
    ● kubelet.service - kubelet: The Kubernetes Node Agent
    Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/kubelet.service.d
    └─10-kubeadm.conf
    Active: active (running) since Sat 2020-05-02 18:24:59 UTC; 13s ago
    Docs: https://kubernetes.io/docs/home/
    Main PID: 19838 (kubelet)
    Tasks: 16 (limit: 4915)
    CGroup: /system.slice/kubelet.service
    └─19838 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config

    May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.523688 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.623972 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.724250 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.824483 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.924786 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.025039 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.125323 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.225569 19838 kubelet.go:2267] node "master" not found
    May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.306152 19838 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: Get https://k8smaster:6
    May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.325826 19838 kubelet.go:2267] node "master" not found

  • mlgajjar
    mlgajjar Posts: 4

    root@master:~/LFS258/SOLUTIONS/s_03# docker ps -a | grep kube | grep -v pause
    1958f65e62ab d1ccdd18e6ed "kube-controller-man…" 14 minutes ago Up 14 minutes k8s_kube-controller-manager_kube-controller-manager-master_kube-system_a2e7dbae641996802ce46175f4f5c5dc_0
    08f86fdddaf7 6c9320041a7b "kube-scheduler --au…" 14 minutes ago Up 14 minutes k8s_kube-scheduler_kube-scheduler-master_kube-system_363a5bee1d59c51a98e345162db75755_0
    7b9ae64e82b3 a595af0107f9 "kube-apiserver --ad…" 14 minutes ago Up 14 minutes k8s_kube-apiserver_kube-apiserver-master_kube-system_e26867be8b93ee68c10a8808e67e6488_0
    6654e9244a6f 303ce5db0e90 "etcd --advertise-cl…" 14 minutes ago Up 14 minutes k8s_etcd_etcd-master_kube-system_66dbf808af4751d2cf0d4dad30261e40_0
    root@master:~/LFS258/SOLUTIONS/s_03# journalctl -xeu kubelet
    May 02 18:25:44 master kubelet[20628]: I0502 18:25:44.047630 20628 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
    May 02 18:25:44 master kubelet[20628]: I0502 18:25:44.071348 20628 kubelet_node_status.go:70] Attempting to register node master
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.095585 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.194055 20628 controller.go:136] failed to ensure node lease exists, will retry in 3.2s, error: Get https://k8smaster
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.195805 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.247486 20628 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiti
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.296035 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.396259 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.447478 20628 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:526: Failed to list *v1.Node: Get https://k8
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.496920 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.597158 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.647446 20628 kubelet_node_status.go:92] Unable to register node "master" with API server: Post https://k8smaster:644
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.697391 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.797591 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.847352 20628 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: Get https://
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.897815 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.998056 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.098263 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.198443 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.298636 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.398840 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.447164 20628 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Get
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.499052 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.599301 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.699542 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.799738 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.899939 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.915224 20628 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiti
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.000127 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.100367 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.200623 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: I0502 18:25:46.247711 20628 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.250685 20628 event.go:269] Unable to write event: 'Post https://k8smaster:6443/api/v1/namespaces/default/events: dia
    May 02 18:25:46 master kubelet[20628]: I0502 18:25:46.272767 20628 kubelet_node_status.go:70] Attempting to register node master
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.274607 20628 kubelet_node_status.go:92] Unable to register node "master" with API server: Post https://k8smaster:644
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.300868 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.401202 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.501406 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.601625 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.701864 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.802124 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.902328 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.914745 20628 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiti
    May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.002541 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.102852 20628 kubelet.go:2267] node "master" not found
    May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.152316 20628 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to list *v1.Service: Get https:/
    May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.203059 20628 kubelet.go:2267] node "master" not found

  • mlgajjar
    mlgajjar Posts: 4

    I have tried this multiple times, but it is failing at the same point: step 14 in Lab 3.1. I am using Google Cloud. Let me know if there is any known solution out there, or if any steps are missing in Lab 3.1.

  • serewicz
    serewicz Posts: 1,000
    edited May 2020

    Hello,

    I note that most errors say "node "master" not found". If you have edited the kubeadm-config.yaml file it should be requesting the alias k8smaster instead. From the prompt it seems that the node name is "master". Is this the original node name, or did you change the prompt?

    Also, you mention you are using a GSK environment. I am not familiar with it; we suggest GCE. How many network interfaces do you have attached to your instances? Try it with a single interface and see if the node can be found.
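
    As a quick check (just a sketch, not specific to any one cloud), you can list the interfaces and confirm which one carries the default route:

    # list all interfaces and their addresses
    ip -br addr show
    # the interface used by the default route is the primary one
    ip route show default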

    Regards,

  • serewicz
    serewicz Posts: 1,000

    Hello again,
    I just ran the scripts using GCE nodes and a single interface. It worked as expected and as found in the book. This looks to be something with the network configuration you are using for your nodes. This is where the output of my kubeadm init and yours diverge, near the end; the last common line starts with [wait-control-plane]:

    ....
    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [apiclient] All control plane components are healthy after 18.003142 seconds
    [upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
    [kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
    [upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
    [upload-certs] Using certificate key:
    686d3e6e9eed5baa0229498667e145ef504023324f6a293385514176b13a8b71
    [mark-control-plane] Marking the node test-hkbb as control-plane by adding the label "node-role.kubernetes.io/master=''"
    [mark-control-plane] Marking the node test-hkbb as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
    [bootstrap-token] Using token: r4jk9i.er3823rtldnlcasf
    [bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
    [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
    [bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
    [bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
    [bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
    [bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
    [kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
    [addons] Applied essential addon: CoreDNS
    [addons] Applied essential addon: kube-proxy

    Your Kubernetes control-plane has initialized successfully!
    ....

    As such, the communication is not finding the master where expected. Was your hostname or IP something else when the node was installed, and did you change it? Do you have multiple interfaces, with the IP you used not on the primary interface? Do you have a typo in your /etc/hosts file? I notice you show the nslookup output, but not the file. Did you edit /etc/hosts?

    This appears to be an issue with the network configuration and/or DNS environment, not Kubernetes, not the exercise.
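
    One way to see exactly how the name resolves (a sketch; adjust the alias to whatever you put in the file) is to compare the hosts file entry with what the resolver actually returns:

    grep k8smaster /etc/hosts
    getent hosts k8smaster    # consults /etc/hosts first (default nsswitch order), unlike nslookup, which queries DNS directly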

    Regards,

  • chrispokorni
    chrispokorni Posts: 2,372

    Hi @mlgajjar,

    You mentioned a GSK environment, and I am not sure what it is. If I assume correctly that it is a managed Kubernetes service, then by installing Kubernetes again, as instructed in lab exercise 3, you would end up with an already preconfigured Kubernetes cluster with an additional cluster installed on top of it. This is similar to tossing a second engine into the back seat of a running vehicle and expecting that second engine to work.

    Regards,
    -Chris

  • @serewicz said:
    Did you edit /etc/hosts?

    FYI, I ran into this exact issue, and a check of the quoted file revealed the inclusion of the IP subnet mask (/32). Once it was removed, everything worked perfectly. I recommend a check of the config files, as per @serewicz's recommendation.
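
    For anyone hitting the same thing, the difference is just the CIDR suffix on the alias line (the IP below is only an example):

    # wrong - subnet mask included:
    10.128.0.3/32   k8smaster
    # correct - bare IP only:
    10.128.0.3      k8smaster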

  • oisin
    oisin Posts: 7

    Hi,
    I have encountered a similar issue at lab 3.1.14: kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out

    This results in an error log like the following:

    [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
    [kubelet-check] Initial timeout of 40s passed.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    [kubelet-check] It seems like the kubelet isn't running or healthy.
    [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
    

    If I try to manually run curl -sSL http://localhost:10248/healthz in the terminal, the command runs without error.

    systemctl status kubelet shows it as active but with a consistent error message of kubelet.go:2267] node "ip-172-31-60-156" not found

    I have configured my /etc/hosts with the following lines

    172.31.63.255 k8smaster
    127.0.0.1 localhost
    

    This IP is taken from the output of an ip addr show command, which lists an ens5 interface as opposed to the ens4 used in the lab example:

    2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
        link/ether 16:b5:4d:43:c0:c1 brd ff:ff:ff:ff:ff:ff
        inet 172.31.60.156/20 brd 172.31.63.255 scope global dynamic ens5
           valid_lft 2428sec preferred_lft 2428sec
        inet6 fe80::14b5:4dff:fe43:c0c1/64 scope link 
           valid_lft forever preferred_lft forever
    

    I notice that the IP range shown here also appears in the kubelet error logs.
    I am using an AWS EC2 instance which is running Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - ami-00ddb0e5626798373 (64-bit x86).

    Appreciate any advice in troubleshooting this further.
    Thanks

  • serewicz
    serewicz Posts: 1,000

    Hello,

    Is the kubeadm-config.yaml file in your current directory? Did you edit /etc/hosts and add your alias properly?

    Perhaps you can share your kubeadm-config.yaml as code or an image so we can notice indentation issues.
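
    For reference, a correctly indented kubeadm-config.yaml (using the same values already posted earlier in this thread) would look roughly like this, with podSubnet nested two spaces under networking:

    apiVersion: kubeadm.k8s.io/v1beta2
    kind: ClusterConfiguration
    kubernetesVersion: 1.18.1
    controlPlaneEndpoint: "k8smaster:6443"
    networking:
      podSubnet: 192.168.0.0/16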


    Regards,

  • oisin
    oisin Posts: 7

    Thanks for the reply Tim, here is what I'm working with:

  • serewicz
    serewicz Posts: 1,000

    Hello,

    It looks like you edited the IP range to the 172.31 range. Leave it as it was, so that it matches what Calico uses and what is in the example YAML file from the course tarball. If you are already using the 192.168 network, then edit both the calico.yaml and the kubeadm-config.yaml so that they match each other, but do not overlap an IP range in use elsewhere in your environment.
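
    In other words, the two values that have to agree (only if you change them at all) are the pod subnet in kubeadm-config.yaml and the Calico pool CIDR; roughly:

    # kubeadm-config.yaml
    networking:
      podSubnet: 192.168.0.0/16

    # calico.yaml - CALICO_IPV4POOL_CIDR is commented out by default, and the
    # manifest's default pool is 192.168.0.0/16; only uncomment and change it
    # if that range is already in use in your environment
    # - name: CALICO_IPV4POOL_CIDR
    #   value: "192.168.0.0/16"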

    Regards,

  • oisin
    oisin Posts: 7

    Ah yes, I actually changed it in both locations previously while troubleshooting.
    I was encountering the same errors previously with the default 192.168.0.0/16 range that comes with wget https://docs.projectcalico.org/manifests/calico.yaml.

    Two discrepancies I noticed with the calico.yaml file are that the relevant lines are commented out by default and have different indentation from that shown in the PDF for lab 3.1.

    What I had tested previously was uncommenting the relevant section with the following indentation

    I have tested again just now to be sure. Here I am testing again with the 192.168.0.0/16 range

    I reset kubeadm with kubeadm reset and re-ran kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out to validate. I again got the same error

    and the same logs

  • serewicz
    serewicz Posts: 1,000

    When you say edit, there is no need to edit the calico.yaml file or uncomment anything. You'd only need to edit it if you were already using 192.168 elsewhere. If you follow the lab exactly as written, what happens?

  • oisin
    oisin Posts: 7
    edited January 2021

    Hi Tim,
    I reset kubeadm, downloaded a fresh version of the calico.yaml file, and tried again without editing the file, but got the same results as previously. All other files are the same as prior to your last message, and I am running docker.io.


  • serewicz
    serewicz Posts: 1,000

    Your errors list the hostname instead of the alias that they should be listing. You should have an alias that sets k8smaster to your IP of 172.31.60.156. If you created the alias properly in /etc/hosts, then are you sure your kubeadm-config.yaml file is in your current working directory?

    It appears the file is not being read. When you run kubeadm init and pass the filename, the lab syntax expects the file to be in the current directory. When you type ls, do you see it, and is it readable by student (or your non-root user)?
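
    A quick way to confirm both points, from the directory where you run kubeadm init:

    pwd                           # the directory you are actually in
    ls -l kubeadm-config.yaml     # file present, with read permission for your user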


    Regards,

  • oisin
    oisin Posts: 7
    edited January 2021

    File permissions had been -rw-r--r-- but I brute forced it with a chmod 777 for troubleshooting purposes.
    The output below shows the kubeadm-config.yaml file in my working directory and with full rwx permissions for all users.

    In spite of that, the same kubelet error of node "ip-172-31-60-156" not found was thrown.

    One interesting difference is that on this occasion the kubeadm init command still failed, but it did not include the error that was thrown on previous attempts: [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

  • oisin
    oisin Posts: 7

    For completeness' sake I ran through a fresh install. Again Ubuntu 18 on AWS EC2 as before.
    ip addr show gives the same IP (172.31.63.255)

    I pulled the calico.yaml file and did not edit it.
    Edited the /etc/hosts file and, in the current directory, created kubeadm-config.yaml.

    Ran kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out and first attempt failed with error The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

    Ran chmod 777 kubeadm-config.yaml and kubeadm reset before again running kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out. This time it failed again with the same error as the immediately previous attempt.

    Hopefully this fresh attempt helps in the troubleshooting process. Please let me know your thoughts and happy to troubleshoot further.

  • chrispokorni
    chrispokorni Posts: 2,372

    Hi @oisin,

    From your detailed outputs it seems that in both scenarios you are misconfiguring the k8smaster alias in the /etc/hosts file. The first step in resolving your issue is to pay close attention to the solution suggested by @serewicz above. Since you are on AWS, thanks to the default EC2 hostname naming convention, the IP of your master node can be extracted from the master node's hostname itself, and that is the IP address that needs to be added to the /etc/hosts file.

    Also, make sure that your AWS VPC has a firewall rule (SG) to allow all ingress traffic - from all sources, to all destination ports, all protocols.
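
    Using the hostname that already appears in the logs above as an example, that works out to something like:

    hostname                      # e.g. ip-172-31-60-156 - the private IP is encoded in the name
    # so the /etc/hosts alias line would be:
    # 172.31.60.156   k8smaster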

    Regards,
    -Chris

  • oisin
    oisin Posts: 7
    edited January 2021

    Thanks Chris and Tim for the troubleshooting help. Much appreciated!

    I have gotten it working and have realized my oversight. I will outline it here in case it helps any future students.
    In the documentation for this lab, the output of ip addr show includes

    inet 10.128.0.3/32 brd 10.128.0.3 scope global ens4
    

    The /etc/hosts file is then configured with a line reading

    10.128.0.3 k8smaster 
    

    My presumption was that the brd (broadcast) address should be used, but it is actually the inet address.
    In my previous messages, it was failing because of this.
    In the case of my first message on this thread, I listed the output of ip addr show as including

    inet 172.31.60.156/20 brd 172.31.63.255 scope global dynamic ens5
    

    If I had added 172.31.60.156 k8smaster as opposed to 172.31.63.255 k8smaster it would have worked.
    Chris's suggestion to get the IP directly from the EC2 hostname is also a good solution.
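
    For anyone else reading, the address to use is the one after inet, not brd; it can be pulled out directly, for example (ens5 is just the interface name from my output above):

    ip -4 addr show ens5 | awk '/inet / {print $2}' | cut -d/ -f1
    # or, more simply
    hostname -I | awk '{print $1}'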

    Presuming that it is always the inet address that we should be using, would it be possible to update the documentation to avoid confusion and specifically advise people to use the inet address? I presume that many would instinctively know which address to use, but it could help to limit the scope of ambiguity.

    Thanks again for the help.

  • It seems the problem has been solved. I just want to offer another solution to the 'kubeadm init' problem when you see:

    And when I did a 'journalctl -u kubelet.service' I saw:
    Jan 04 20:31:57 master kubelet[23220]: E0104 20:31:57.126622 23220 kubelet.go:2267] node "master" not found

    Solution:
    In my case I just had a typo in the /etc/hosts.
    10.2.0.4 k8master instead of
    10.2.0.4 k8smaster

  • Hello everyone,
    I have the same problem. My environment is on VirtualBox. I installed two Ubuntu 18.04 servers with 4 CPUs and 6 GB RAM each. I followed all the instructions, but when I run 'kubeadm --v=5 init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out' I get the following error (see below). I checked and tried everything you suggested to the other students, but without success. Below you can find screenshots of files, services, and journalctl. Can you please help?








  • Hi @asmoljo,

    Thank you for providing such detailed outputs. It seems that the first error is at a timeout, when either port 6443 is not accessible or k8smaster cannot be resolved. Can you verify that port 6443 of the VM is open, and what application is listening on it? Also, can you verify that k8smaster can be resolved to the expected IP address?

    What is your host OS and what is the size of your host machine? Are there any firewalls active on your host?
    What is your guest OS full release? Are there any firewalls active on your guest?
    Aside from the CPU and memory mentioned above, how much disk space did you configure for each VM? Is it dynamic?
    What type of networking did you configure on the virtualbox VM?
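
    A couple of commands that help answer the port and resolution questions, run on the control plane VM:

    ss -tlnp | grep 6443                     # is anything listening on 6443, and which process
    getent hosts k8smaster                   # does the alias resolve, and to which IP
    curl -k https://k8smaster:6443/version   # any HTTP response at all means the port is reachable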

    Regards,
    -Chris

  • @chrispokorni said:
    Hi @asmoljo,

    Thank you for providing such detailed outputs. It seems that the first error is at a timeout, when either port 6443 is not accessible or k8smaster cannot be resolved. Can you verify that port 6443 of the VM is open, and what application is listening on it? Also, can you verify that k8smaster can be resolved to the expected IP address?

    What is your host OS and what is the size of your host machine? Are there any firewalls active on your host?
    What is your guest OS full release? Are there any firewalls active on your guest?
    Aside from the CPU and memory mentioned above, how much disk space did you configure for each VM? Is it dynamic?
    What type of networking did you configure on the virtualbox VM?

    Regards,
    -Chris

    Hi Chris,
    I'm sure the port is open; in fact, ufw is completely stopped. k8smaster is resolved to the expected IP address. Port 6443 is not used by any other application, if that's what you mean.

    Host: Windows 10 Pro, AMD Ryzen 7 2700X, 8 cores, 32 GB RAM. Windows on SSD, VMs on HDD.
    Firewall is active on my host for private and public networks.
    Guest full release: Operating System: Ubuntu 18.04.5 LTS, Kernel: Linux 4.15.0-135-generic, Architecture: x86-64.
    Guest firewalls are disabled.
    Guest disk space is dynamic and for both VMs is:
    udev 2.9G 0 2.9G 0% /dev
    tmpfs 597M 1.1M 595M 1% /run
    /dev/mapper/ubuntu--vg-ubuntu--lv 19G 7.2G 11G 41% /
    tmpfs 3.0G 0 3.0G 0% /dev/shm
    tmpfs 5.0M 0 5.0M 0% /run/lock
    tmpfs 3.0G 0 3.0G 0% /sys/fs/cgroup
    /dev/sda2 976M 78M 832M 9% /boot
    tmpfs 597M 0 597M 0% /run/user/1000

    Network type for VM is Bridged adapter

    regards,
    Antonio

  • Hi,
    Now I tried with the Windows firewall turned off, and everything went smoothly. Very strange. I recently tried to install k8s with kubespray and RKE, and everything went smoothly. I created an inbound rule for VirtualBox and am continuing with the course. Thanks for your help.

  • Hi,
    Now I'm stuck on the next step. Please look below.

    asmoljo@lfc1master1:~$ kubectl apply -f calico.yaml
    The connection to the server k8smaster:6443 was refused - did you specify the right host or port?

    I noticed that something is wrong with the control plane pods.

    asmoljo@lfc1master1:~$ sudo docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    daf87d762c09 ce0df89806bb "kube-apiserver --ad…" 4 minutes ago Exited (255) 4 minutes ago k8s_kube-apiserver_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_42
    a310a9cffe85 538929063f23 "kube-controller-man…" About an hour ago Up About an hour k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_11
    7bd01ec286af 538929063f23 "kube-controller-man…" About an hour ago Exited (255) About an hour ago k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_10
    5d2115572946 49eb8a235d05 "kube-scheduler --au…" 3 hours ago Up 3 hours k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_1
    a79e30ebec7c 49eb8a235d05 "kube-scheduler --au…" 3 hours ago Exited (255) 3 hours ago k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
    f174fb6df5c1 0369cf4303ff "etcd --advertise-cl…" 3 hours ago Up 3 hours k8s_etcd_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
    26b49f7b5f1f k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_0
    afcc9028738b k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
    342a2102c126 k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
    4fedc2b653af k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_0

    asmoljo@lfc1master1:~$ sudo docker version
    Client:
    Version: 19.03.6
    API version: 1.40
    Go version: go1.12.17
    Git commit: 369ce74a3c
    Built: Fri Dec 18 12:21:44 2020
    OS/Arch: linux/amd64
    Experimental: false

    Server:
    Engine:
    Version: 19.03.6
    API version: 1.40 (minimum version 1.12)
    Go version: go1.12.17
    Git commit: 369ce74a3c
    Built: Thu Dec 10 13:23:49 2020
    OS/Arch: linux/amd64
    Experimental: false
    containerd:
    Version: 1.3.3-0ubuntu1~18.04.4
    GitCommit:
    runc:
    Version: spec: 1.0.1-dev
    GitCommit:
    docker-init:
    Version: 0.18.0
    GitCommit:

  • Hi @asmoljo,

    Going back to my prior comment: "resources".

    You assigned 4 CPUs to each VM and you have 2 VMs, which adds up to 8 CPUs for the VMs alone. Your host seems to have 8 cores altogether, so they are overcommitted, considering that your Windows host OS needs CPU to run and the hypervisor needs resources as well.

    Dynamic VM disk management may also be an issue with VBox. Your cluster does not know that additional disk space can and will be allocated. Instead, it only sees the current usage as a fraction of the currently allocated disk space, which looks high. This is where your cluster panics.

    So the overcommitted CPU together with dynamic disk management is a recipe for cluster failure.

    I would recommend revisiting the VM sizing guide in the Overview section of Lab 3.1 and fitting that into the physical resources available on your host machine, ensuring all running components have a fair amount of resources to operate.
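
    To sanity-check the sizing from inside each VM, a few quick commands:

    nproc       # CPUs visible to the guest
    free -h     # memory available to the guest
    df -h /     # root filesystem size and current usage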

    Regards,
    -Chris

  • Hi Chris,

    I recreated the VMs so they have fixed-size disks and 2 CPUs each, but I still have the same problem with the control plane containers.
    It seems like the containers are constantly restarting.

    asmoljo@lfc1master1:~$ sudo docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    a35b0a749b14 538929063f23 "kube-controller-man…" 9 seconds ago Up 9 seconds k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_4
    0177c47e5607 ce0df89806bb "kube-apiserver --ad…" About a minute ago Exited (255) 39 seconds ago k8s_kube-apiserver_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_6
    b987dfbe5393 49eb8a235d05 "kube-scheduler --au…" About a minute ago Up About a minute k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_1
    7c6cc53ad60d 538929063f23 "kube-controller-man…" About a minute ago Exited (255) 29 seconds ago k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_3
    b58c6f7b060a 49eb8a235d05 "kube-scheduler --au…" 3 minutes ago Exited (255) About a minute ago k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
    3f10dc09d4c7 0369cf4303ff "etcd --advertise-cl…" 3 minutes ago Up 3 minutes k8s_etcd_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
    20d9211b0e6a k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
    de77e4ef7180 k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_0
    aa3ae9c1f522 k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_0
    73f054ec8d53 k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0

    Six minutes later:
    asmoljo@lfc1master1:~$ sudo docker ps -a
    CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
    1cd8da1e0dab ce0df89806bb "kube-apiserver --ad…" 30 seconds ago Exited (255) 5 seconds ago k8s_kube-apiserver_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_9
    1b0341052fd4 538929063f23 "kube-controller-man…" About a minute ago Exited (255) 4 seconds ago k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_6
    b987dfbe5393 49eb8a235d05 "kube-scheduler --au…" 6 minutes ago Up 6 minutes k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_1
    b58c6f7b060a 49eb8a235d05 "kube-scheduler --au…" 9 minutes ago Exited (255) 6 minutes ago k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
    3f10dc09d4c7 0369cf4303ff "etcd --advertise-cl…" 9 minutes ago Up 9 minutes k8s_etcd_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
    20d9211b0e6a k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
    de77e4ef7180 k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_0
    aa3ae9c1f522 k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_0
    73f054ec8d53 k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
    asmoljo@lfc1master1:~$

  • Hi @asmoljo,

    What are the docker logs showing for the control plane containers?

    What are the outputs of the top and df -h commands run on your control plane node?

    Regards,
    -Chris
