
kubeadm join ISSUE: [discovery] Failed to request cluster info, will try again:

dmccuk
edited October 2019 in LFS258 Class Forum

Hi,

I've been trying to add a worker node to the cluster. I've followed the doc, but I'm hitting the issue below and can't find a way past it; I've obviously missed something. Here are the command and the error:

PART1

root@ip-172-31-18-206:~# kubeadm join --token od1wg1.a9wd79hstxz3ll4z 172.31.19.37:6443 --discovery-token-ca-cert-hash sha256:4aed0a78c329495d91e031a336668ccaf07528c84b7120f230f2f161a98e7693 --v=2
I1018 15:46:36.761858   25485 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I1018 15:46:36.761930   25485 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I1018 15:46:36.762004   25485 preflight.go:90] [preflight] Running general checks
I1018 15:46:36.762037   25485 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I1018 15:46:36.762083   25485 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I1018 15:46:36.762124   25485 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I1018 15:46:36.762140   25485 checks.go:105] validating the container runtime
I1018 15:46:36.806159   25485 checks.go:131] validating if the service is enabled and active
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
I1018 15:46:36.858635   25485 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I1018 15:46:36.858693   25485 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I1018 15:46:36.858729   25485 checks.go:653] validating whether swap is enabled or not
I1018 15:46:36.858762   25485 checks.go:382] validating the presence of executable ip
I1018 15:46:36.858793   25485 checks.go:382] validating the presence of executable iptables
I1018 15:46:36.858813   25485 checks.go:382] validating the presence of executable mount
I1018 15:46:36.858834   25485 checks.go:382] validating the presence of executable nsenter
I1018 15:46:36.858851   25485 checks.go:382] validating the presence of executable ebtables
I1018 15:46:36.858870   25485 checks.go:382] validating the presence of executable ethtool
I1018 15:46:36.858891   25485 checks.go:382] validating the presence of executable socat
I1018 15:46:36.858910   25485 checks.go:382] validating the presence of executable tc
I1018 15:46:36.858927   25485 checks.go:382] validating the presence of executable touch
I1018 15:46:36.858950   25485 checks.go:524] running all checks
I1018 15:46:36.873553   25485 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I1018 15:46:36.882411   25485 checks.go:622] validating kubelet version
I1018 15:46:36.937337   25485 checks.go:131] validating if the service is enabled and active
I1018 15:46:36.943627   25485 checks.go:209] validating availability of port 10250
I1018 15:46:36.943778   25485 checks.go:292] validating the existence of file /etc/kubernetes/pki/ca.crt
I1018 15:46:36.943797   25485 checks.go:439] validating if the connectivity type is via proxy or direct
I1018 15:46:36.943826   25485 join.go:427] [preflight] Discovering cluster-info
I1018 15:46:36.944224   25485 token.go:200] [discovery] Trying to connect to API Server "172.31.19.37:6443"
I1018 15:46:36.944877   25485 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://172.31.19.37:6443"
I1018 15:47:06.945803   25485 token.go:83] [discovery] Failed to request cluster info, will try again: [Get https://172.31.19.37:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 172.31.19.37:6443: i/o timeout]
I1018 15:47:41.946445   25485 token.go:83] [discovery] Failed to request cluster info, will try again: [Get https://172.31.19.37:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 172.31.19.37:6443: i/o timeout]
^C
root@ip-172-31-18-206:~#
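Before debugging the network, it is cheap to rule out a mangled join command, since a long hash is easy to break when copying a command across lines. A minimal bash sketch; `check_join_args` is a hypothetical helper, and the patterns assume the documented formats: a bootstrap token of 6 and 16 lowercase alphanumerics joined by a dot, and `sha256:` followed by 64 hex digits.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sanity-check kubeadm join arguments before running the join.
# Returns 0 if both look well-formed, 1 otherwise (reasons go to stderr).
check_join_args() {
  local token=$1 hash=$2 status=0
  local token_re='^[a-z0-9]{6}\.[a-z0-9]{16}$'   # bootstrap token: xxxxxx.yyyyyyyyyyyyyyyy
  local hash_re='^sha256:[0-9a-f]{64}$'          # CA public key pin
  if [[ ! $token =~ $token_re ]]; then
    echo "malformed token: $token" >&2
    status=1
  fi
  if [[ ! $hash =~ $hash_re ]]; then
    echo "malformed CA cert hash: $hash" >&2
    status=1
  fi
  return "$status"
}
```

For example, `check_join_args "$TOKEN" "$HASH" && kubeadm join ...`, with `TOKEN` and `HASH` set to the values printed by `kubeadm init`.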

I can telnet from the worker to port 22 on the master, but a connection to port 6443 just hangs:

root@ip-172-31-18-206:~# telnet 172.31.19.37 22
Trying 172.31.19.37...
Connected to 172.31.19.37.
Escape character is '^]'.
SSH-2.0-OpenSSH_7.2p2 Ubuntu-4ubuntu2.8
^]
telnet> quit
Connection closed.
root@ip-172-31-18-206:~#
root@ip-172-31-18-206:~# telnet 172.31.19.37 6443
Trying 172.31.19.37...
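The hanging telnet is the useful clue. A connection that hangs with no reply usually means packets are being silently dropped somewhere on the path (for example by a cloud firewall), whereas "Connection refused" means the host answered but nothing is listening on the port. A minimal bash sketch of the same probe with a timeout; `check_port` is a hypothetical name, and it assumes coreutils `timeout` and a bash built with `/dev/tcp` support (typical on Ubuntu).

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify HOST:PORT as open, closed (e.g. refused) or timeout.
# Uses bash's /dev/tcp pseudo-device, so neither telnet nor nc is required.
check_port() {
  local host=$1 port=$2 secs=${3:-5} rc
  # Open a TCP connection on fd 3 in a child bash; kill it after $secs seconds.
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null && rc=0 || rc=$?
  case $rc in
    0)   echo "open" ;;
    124) echo "timeout" ;;  # no answer at all: packets are probably being dropped
    *)   echo "closed" ;;   # e.g. connection refused: host reachable, nothing listening
  esac
}
```

Run on the worker as `check_port 172.31.19.37 6443`; given the hanging telnet above, it would most likely report `timeout` in this setup.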

PART2 with more details follows once it gets approved.

Answers

  • chrispokorni

    Hi @dmccuk,

    Similar threads have been posted recently in which a second node fails to join the cluster.
    Your output shows that the failure is a timeout when connecting to port 6443 on the master node.
    Port 22 is irrelevant here: a working SSH connection says nothing about the other ports. Kubernetes uses many individual ports and port ranges, and the API server's port 6443 is one of them.

    Carefully read the special instructions at the beginning of Lab exercise 3.1. They are critical for setting up your infrastructure's networking (firewall rules) for inter-node communication.

    Regards,
    -Chris
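For reference, the kubeadm "Check required ports" table of that period lists these inbound TCP ports: 6443 (API server), 2379-2380 (etcd), 10250 (kubelet), 10251 (scheduler) and 10252 (controller manager) on the control plane, and 10250 plus the NodePort range 30000-32767 on workers. Calico adds BGP on TCP 179 between nodes and, with its default IP-in-IP encapsulation, IP protocol 4, which needs a protocol rule rather than a port rule. The sketch below encodes that list in a hypothetical `kubeadm_ports` helper and probes the single ports with `nc -z -w 3` (assuming the OpenBSD netcat that Ubuntu ships); ranges are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical helper: inbound TCP ports per node role, per the kubeadm
# "required ports" table of that era, plus Calico's BGP port (179).
kubeadm_ports() {
  case $1 in
    control-plane) echo "6443 2379-2380 10250 10251 10252 179" ;;
    worker)        echo "10250 30000-32767 179" ;;
    *) echo "usage: kubeadm_ports control-plane|worker" >&2; return 1 ;;
  esac
}

# Probe each single port of ROLE on HOST with a 3-second connect timeout.
probe_ports() {
  local host=$1 role=$2 p
  for p in $(kubeadm_ports "$role"); do
    [[ $p == *-* ]] && continue   # skip ranges such as 30000-32767
    if nc -z -w 3 "$host" "$p" 2>/dev/null; then
      echo "$p open"
    else
      echo "$p no connection"
    fi
  done
}
```

For example, `probe_ports 172.31.19.37 control-plane` from the worker. Only 6443 (and 179 for Calico) strictly has to be reachable from a worker; etcd, the scheduler and the controller manager are used by the control plane itself.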

  • dmccuk

    Hi Chris,

    Thanks for your message. I've worked out what I had missed, and I'll write it up here so others can benefit:

    1) In AWS, create a new security group and open up all the ports.
    2) Select one of your Kubernetes instances, then Actions --> Networking --> Change Security Groups.
    3) Tick the new Kubernetes security group to add it to the instance.
    4) Repeat for all the other Kubernetes instances.
    5) Retry the failing kubeadm join command.
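Steps 1 to 4 can also be scripted with the AWS CLI. This is a minimal sketch, assuming a configured CLI; the group name `kubernetes` and all IDs are placeholders, and `run`, `create_group`, `allow_intra_group` and `attach_group` are hypothetical helpers. As a slightly tighter variant of step 1, the rule allows all protocols (`IpProtocol=-1`) only from instances in the same group, which also covers Calico's IP-in-IP traffic. Note that `modify-instance-attribute --groups` replaces the instance's entire group list, so the existing groups are passed along. With `DRY_RUN=1` the commands are printed instead of run.

```shell
#!/usr/bin/env bash
# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ ${DRY_RUN:-0} == 1 ]]; then echo "$*"; else "$@"; fi
}

# Step 1a: create the group in VPC_ID; prints the new group ID when run for real.
create_group() {
  run aws ec2 create-security-group --group-name kubernetes \
    --description "Kubernetes inter-node traffic" --vpc-id "$1" \
    --query GroupId --output text
}

# Step 1b: self-referencing rule, any protocol from members of the same group.
allow_intra_group() {
  local sg_id=$1
  run aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
    --ip-permissions "IpProtocol=-1,UserIdGroupPairs=[{GroupId=$sg_id}]"
}

# Steps 2-4: attach SG_ID to INSTANCE_ID while keeping its current groups.
attach_group() {
  local instance_id=$1 sg_id=$2 current
  if [[ ${DRY_RUN:-0} == 1 ]]; then
    current="sg-existing"   # placeholder for the dry run
  else
    current=$(aws ec2 describe-instances --instance-ids "$instance_id" \
      --query 'Reservations[0].Instances[0].SecurityGroups[].GroupId' --output text)
  fi
  # $current is unquoted on purpose: each tab-separated ID becomes its own argument.
  run aws ec2 modify-instance-attribute --instance-id "$instance_id" \
    --groups $current "$sg_id"
}
```

For example: `sg=$(create_group vpc-0123456789abcdef0)`, then `allow_intra_group "$sg"`, then `attach_group i-0123456789abcdef0 "$sg"` for each node.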

    I hope that helps.

    Dennis

  • PART2:

    The firewall (ufw) is off on both the worker and the master:

    WORKER:

    root@ip-172-31-18-206:~# sudo ufw status
    Status: inactive
    root@ip-172-31-18-206:~#
    root@ip-172-31-18-206:~# service ufw status
    ● ufw.service - Uncomplicated firewall
       Loaded: loaded (/lib/systemd/system/ufw.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Fri 2019-10-18 16:00:34 UTC; 1min 13s ago
      Process: 26404 ExecStop=/lib/ufw/ufw-init stop (code=exited, status=0/SUCCESS)
     Main PID: 396 (code=exited, status=0/SUCCESS)
    
    Oct 18 14:51:35 ubuntu systemd[1]: Started Uncomplicated firewall.
    Oct 18 16:00:34 ip-172-31-18-206 systemd[1]: Stopping Uncomplicated firewall...
    Oct 18 16:00:34 ip-172-31-18-206 ufw-init[26404]: Skip stopping firewall: ufw (not enabled)
    Oct 18 16:00:34 ip-172-31-18-206 systemd[1]: Stopped Uncomplicated firewall.
    Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
    root@ip-172-31-18-206:~#
    

    MASTER:

    ubuntu@ip-172-31-19-37:~$ sudo ufw status
    Status: inactive
    ubuntu@ip-172-31-19-37:~$ sudo service ufw status
    ● ufw.service - Uncomplicated firewall
       Loaded: loaded (/lib/systemd/system/ufw.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Fri 2019-10-18 16:02:49 UTC; 14s ago
      Process: 6637 ExecStop=/lib/ufw/ufw-init stop (code=exited, status=0/SUCCESS)
     Main PID: 379 (code=exited, status=0/SUCCESS)
    
    Oct 18 14:51:23 ubuntu systemd[1]: Started Uncomplicated firewall.
    Oct 18 16:02:49 ip-172-31-19-37 systemd[1]: Stopping Uncomplicated firewall...
    Oct 18 16:02:49 ip-172-31-19-37 ufw-init[6637]: Skip stopping firewall: ufw (not enabled)
    Oct 18 16:02:49 ip-172-31-19-37 systemd[1]: Stopped Uncomplicated firewall.
    Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.
    ubuntu@ip-172-31-19-37:~$
    

    Here are the pods in all namespaces on the master:

    ubuntu@ip-172-31-19-37:~$ kubectl get pods --all-namespaces
    NAMESPACE     NAME                                      READY   STATUS    RESTARTS   AGE
    kube-system   calico-node-9zmmr                         2/2     Running   0          63m
    kube-system   coredns-fb8b8dccf-mbg2w                   1/1     Running   0          65m
    kube-system   coredns-fb8b8dccf-nbm88                   1/1     Running   0          65m
    kube-system   etcd-ip-172-31-19-37                      1/1     Running   0          64m
    kube-system   kube-apiserver-ip-172-31-19-37            1/1     Running   0          64m
    kube-system   kube-controller-manager-ip-172-31-19-37   1/1     Running   0          64m
    kube-system   kube-proxy-tztvb                          1/1     Running   0          65m
    kube-system   kube-scheduler-ip-172-31-19-37            1/1     Running   0          64m
    

    I've gone through this guide, and the steps I'm taking are identical:
    https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network

    The token and the CA certificate hash (computed with openssl) that I'm using in my join command:

    ubuntu@ip-172-31-19-37:~$ kubeadm token list
    TOKEN                     TTL       EXPIRES                USAGES                   DESCRIPTION                                                EXTRA GROUPS
    od1wg1.a9wd79hstxz3ll4z   22h       2019-10-19T14:58:11Z   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token
    ubuntu@ip-172-31-19-37:~$
    ubuntu@ip-172-31-19-37:~$ openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
    4aed0a78c329495d91e031a336668ccaf07528c84b7120f230f2f161a98e7693
    ubuntu@ip-172-31-19-37:~$
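The token and hash match the join command, so they are not the problem here. For future reference: bootstrap tokens from `kubeadm init` expire after 24 hours by default, and `kubeadm token create --print-join-command`, run on the master, prints a fresh join command. To check a token in a script, a small hypothetical helper can look it up in `kubeadm token list` output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if TOKEN appears in the first column of
# `kubeadm token list` output read from stdin (the header row is skipped).
token_listed() {
  awk -v t="$1" 'NR > 1 && $1 == t { found = 1 } END { exit !found }'
}
```

For example: `kubeadm token list | token_listed od1wg1.a9wd79hstxz3ll4z && echo "token is valid"`.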
    

    netstat output from the master:

    ubuntu@ip-172-31-19-37:~$ netstat -tnlp
    (Not all processes could be identified, non-owned process info
     will not be shown, you would have to be root to see it all.)
    Active Internet connections (only servers)
    Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
    tcp        0      0 127.0.0.1:9099          0.0.0.0:*               LISTEN      -
    tcp        0      0 172.31.19.37:2379       0.0.0.0:*               LISTEN      -
    tcp        0      0 127.0.0.1:2379          0.0.0.0:*               LISTEN      -
    tcp        0      0 172.31.19.37:2380       0.0.0.0:*               LISTEN      -
    tcp        0      0 127.0.0.1:10257         0.0.0.0:*               LISTEN      -
    tcp        0      0 127.0.0.1:43122         0.0.0.0:*               LISTEN      -
    tcp        0      0 127.0.0.1:10259         0.0.0.0:*               LISTEN      -
    tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -
    tcp        0      0 127.0.0.1:42623         0.0.0.0:*               LISTEN      -
    tcp        0      0 127.0.0.1:10248         0.0.0.0:*               LISTEN      -
    tcp        0      0 127.0.0.1:10249         0.0.0.0:*               LISTEN      -
    tcp6       0      0 :::10250                :::*                    LISTEN      -
    tcp6       0      0 :::10251                :::*                    LISTEN      -
    tcp6       0      0 :::6443                 :::*                    LISTEN      -
    tcp6       0      0 :::10252                :::*                    LISTEN      -
    tcp6       0      0 :::10256                :::*                    LISTEN      -
    tcp6       0      0 :::22                   :::*                    LISTEN      -
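The netstat output shows the API server bound to `:::6443`, i.e. all interfaces; on Linux an IPv6 wildcard socket also accepts IPv4 connections unless `net.ipv6.bindv6only` is set. So the server is listening, and the i/o timeout from the worker points at something between the two hosts, which in this thread turned out to be the AWS security group. A hypothetical helper that checks for a wildcard listener in `netstat -tnl` style output:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if PORT has a LISTEN socket on a wildcard address
# (0.0.0.0 or ::) in `netstat -tnl` output read from stdin.
# Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
listening_on_all() {
  awk -v p="$1" '
    BEGIN { v4 = "0.0.0.0:" p; v6 = ":::" p }
    $6 == "LISTEN" && ($4 == v4 || $4 == v6) { found = 1 }
    END { exit !found }'
}
```

For example: `netstat -tnl | listening_on_all 6443 && echo "API server listens on all interfaces"`.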
    

    I'm stuck! If anyone can help, I would really appreciate it!
