Unable to set up cluster in Lab 3.1
While following the instructions in the book, running kubeadm init --config=kubeadm-config.yaml --upload-certs --v=5 | tee kubeadm-init.out
fails with the API server not becoming available.
On running systemctl status kubelet
the output is
and on running journalctl -xeu kubelet
the output is
I've tried debugging with the help of other online forums, but to no avail. If I try to look at the logs for the containers, the output is
Please help!
Comments
-
Hello,
I see a lot of errors saying "pranaynada2c.mylabserver.com not found". If you edited /etc/hosts and the kubeadm-config.yaml files properly, and only used k8smaster as the server name, kubelet shouldn't be asking for the actual host name.
- I would first make sure you are not changing the hostname, just adding an alias for your primary IP to /etc/hosts.
- Double check the name and other values in kubeadm-config.yaml.
- Ensure you are using a fresh instance, not one that failed, was edited, and was tried again. Not everything is cleaned out by kubeadm reset.
- Test that the alias works using ping prior to running kubeadm init on the fresh system. Double check you are using your primary interface (see the example below).
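For example, using the values from the lab's GCE example (a rough sketch; substitute your own primary-interface IP and interface name):

# /etc/hosts -- keep the hostname untouched, only add the alias line
10.128.0.3 k8smaster        # example primary-interface IP from the lab

# quick pre-flight check before kubeadm init
ping -c 3 k8smaster         # alias resolves and answers
ip addr show ens4           # confirm 10.128.0.3 is the inet address on the primary interface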
Regards,
0 -
I have a similar kind of issue.
root@master:~/LFS258/SOLUTIONS/s_03# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
W0502 18:10:58.821199 3699 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[init] Using Kubernetes version: v1.18.1
[preflight] Running pre-flight checks
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local k8smaster] and IPs [10.96.0.1 10.0.0.2]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [master localhost] and IPs [10.0.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [master localhost] and IPs [10.0.0.2 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
W0502 18:11:08.853546 3699 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[control-plane] Creating static Pod manifest for "kube-scheduler"
W0502 18:11:08.856794 3699 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime. To troubleshoot, list all containers using your preferred container runtimes CLI. Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
0 -
Other information from my environment: I am using a GSK node.
root@master:~/LFS258/SOLUTIONS/s_03# cat kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.18.1
controlPlaneEndpoint: "k8smaster:6443"
networking:
  podSubnet: 192.168.0.0/16
root@master:~/LFS258/SOLUTIONS/s_03# nslookup k8master
Server: 127.0.0.53
Address: 127.0.0.53#53
Non-authoritative answer:
Name: k8master
Address: 10.0.0.2
root@master:~/LFS258/SOLUTIONS/s_03# telnet k8master 6443
Trying 10.0.0.2...
Connected to k8master.
Escape character is '^]'.
^CConnection closed by foreign host.
root@master:~/LFS258/SOLUTIONS/s_03# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sat 2020-05-02 18:24:59 UTC; 13s ago
Docs: https://kubernetes.io/docs/home/
Main PID: 19838 (kubelet)
Tasks: 16 (limit: 4915)
CGroup: /system.slice/kubelet.service
└─19838 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config
May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.523688 19838 kubelet.go:2267] node "master" not found
May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.623972 19838 kubelet.go:2267] node "master" not found
May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.724250 19838 kubelet.go:2267] node "master" not found
May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.824483 19838 kubelet.go:2267] node "master" not found
May 02 18:25:11 master kubelet[19838]: E0502 18:25:11.924786 19838 kubelet.go:2267] node "master" not found
May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.025039 19838 kubelet.go:2267] node "master" not found
May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.125323 19838 kubelet.go:2267] node "master" not found
May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.225569 19838 kubelet.go:2267] node "master" not found
May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.306152 19838 controller.go:136] failed to ensure node lease exists, will retry in 7s, error: Get https://k8smaster:6
May 02 18:25:12 master kubelet[19838]: E0502 18:25:12.325826 19838 kubelet.go:2267] node "master" not found
0 -
root@master:~/LFS258/SOLUTIONS/s_03# docker ps -a | grep kube | grep -v pause
1958f65e62ab d1ccdd18e6ed "kube-controller-man…" 14 minutes ago Up 14 minutes k8s_kube-controller-manager_kube-controller-manager-master_kube-system_a2e7dbae641996802ce46175f4f5c5dc_0
08f86fdddaf7 6c9320041a7b "kube-scheduler --au…" 14 minutes ago Up 14 minutes k8s_kube-scheduler_kube-scheduler-master_kube-system_363a5bee1d59c51a98e345162db75755_0
7b9ae64e82b3 a595af0107f9 "kube-apiserver --ad…" 14 minutes ago Up 14 minutes k8s_kube-apiserver_kube-apiserver-master_kube-system_e26867be8b93ee68c10a8808e67e6488_0
6654e9244a6f 303ce5db0e90 "etcd --advertise-cl…" 14 minutes ago Up 14 minutes k8s_etcd_etcd-master_kube-system_66dbf808af4751d2cf0d4dad30261e40_0
root@master:~/LFS258/SOLUTIONS/s_03# journalctl -xeu kubelet
May 02 18:25:44 master kubelet[20628]: I0502 18:25:44.047630 20628 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
May 02 18:25:44 master kubelet[20628]: I0502 18:25:44.071348 20628 kubelet_node_status.go:70] Attempting to register node master
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.095585 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.194055 20628 controller.go:136] failed to ensure node lease exists, will retry in 3.2s, error: Get https://k8smaster
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.195805 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.247486 20628 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiti
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.296035 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.396259 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.447478 20628 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:526: Failed to list *v1.Node: Get https://k8
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.496920 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.597158 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.647446 20628 kubelet_node_status.go:92] Unable to register node "master" with API server: Post https://k8smaster:644
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.697391 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.797591 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.847352 20628 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1.CSIDriver: Get https://
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.897815 20628 kubelet.go:2267] node "master" not found
May 02 18:25:44 master kubelet[20628]: E0502 18:25:44.998056 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.098263 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.198443 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.298636 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.398840 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.447164 20628 reflector.go:178] k8s.io/client-go/informers/factory.go:135: Failed to list *v1beta1.RuntimeClass: Get
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.499052 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.599301 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.699542 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.799738 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.899939 20628 kubelet.go:2267] node "master" not found
May 02 18:25:45 master kubelet[20628]: E0502 18:25:45.915224 20628 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiti
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.000127 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.100367 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.200623 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: I0502 18:25:46.247711 20628 kubelet_node_status.go:294] Setting node annotation to enable volume controller attach/detach
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.250685 20628 event.go:269] Unable to write event: 'Post https://k8smaster:6443/api/v1/namespaces/default/events: dia
May 02 18:25:46 master kubelet[20628]: I0502 18:25:46.272767 20628 kubelet_node_status.go:70] Attempting to register node master
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.274607 20628 kubelet_node_status.go:92] Unable to register node "master" with API server: Post https://k8smaster:644
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.300868 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.401202 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.501406 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.601625 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.701864 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.802124 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.902328 20628 kubelet.go:2267] node "master" not found
May 02 18:25:46 master kubelet[20628]: E0502 18:25:46.914745 20628 csi_plugin.go:271] Failed to initialize CSINodeInfo: error updating CSINode annotation: timed out waiti
May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.002541 20628 kubelet.go:2267] node "master" not found
May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.102852 20628 kubelet.go:2267] node "master" not found
May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.152316 20628 reflector.go:178] k8s.io/kubernetes/pkg/kubelet/kubelet.go:517: Failed to list *v1.Service: Get https:/
May 02 18:25:47 master kubelet[20628]: E0502 18:25:47.203059 20628 kubelet.go:2267] node "master" not found
0 -
I have tried this multiple times, but it fails at the same point: step 14 in Lab 3.1. I am using Google Cloud. Let me know if there is any known solution out there or any steps missing in Lab 3.1.
0 -
Hello,
I note that most errors say "node "master" not found". If you have edited the kubeadm-config.yaml file, it should be requesting the alias k8smaster instead. From the prompt it seems that the node name is "master". Is this the original node name, or did you change the prompt?
Also, you mention you are using a GSK environment. I am not familiar with this environment; we suggest GCE. Also, how many network interfaces do you have attached to your instances? Try it with a single interface and see if the node can be found.
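As a rough way to confirm the node name, the alias, and the primary interface, something like this could be run on the node (just a sketch; command availability may vary by image):

hostname                    # should match the node name the kubelet is reporting
grep k8smaster /etc/hosts   # the alias should point at the primary-interface IP
ip route get 8.8.8.8        # shows which interface and source IP are actually primary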
Regards,
0 -
Hello again,
I just ran the scripts using GCE nodes and a single interface. It worked as expected and as found in the book. This looks to be something with the network configuration you are using for your nodes. This is where the output of my kubeadm init and yours diverge, near the end; the last common line starts with [wait-control-plane]:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 18.003142 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.18" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
686d3e6e9eed5baa0229498667e145ef504023324f6a293385514176b13a8b71
[mark-control-plane] Marking the node test-hkbb as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node test-hkbb as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: r4jk9i.er3823rtldnlcasf
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
....
As such, the communication is not finding the master where expected. Was your hostname or IP something else when the node was installed and you changed it? Do you have multiple interfaces, and is the IP you used not on the primary interface? Do you have a typo in your /etc/hosts file? I notice you show the nslookup output, but not the file. Did you edit /etc/hosts?
This appears to be an issue with the network configuration and/or DNS environment, not Kubernetes, not the exercise.
Regards,
0 -
Hi @mlgajjar,
You mentioned a GSK environment, which I am not sure what it is. If I assume correctly and it is a managed Kubernetes service, then trying to install Kubernetes again, as instructed in lab exercise 3, would leave you with an additional cluster installed on top of an already preconfigured Kubernetes cluster. This is similar to tossing a second engine into the back seat of a running vehicle and expecting that second engine to work.
Regards,
-Chris
0 -
Hi,
I have encountered a similar issue at lab 3.1.14, kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out.
This results in an error log like the following:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
If I try to manually run curl -sSL http://localhost:10248/healthz in the terminal, the command runs without error. systemctl status kubelet shows it as active, but with a consistent error message of kubelet.go:2267] node "ip-172-31-60-156" not found.
I have configured my /etc/hosts with the following lines:
172.31.63.255 k8smaster
127.0.0.1 localhost
This IP is taken from the output of an ip addr show command, which lists an ens5 interface as opposed to the ens4 used in the lab example:
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 16:b5:4d:43:c0:c1 brd ff:ff:ff:ff:ff:ff
inet 172.31.60.156/20 brd 172.31.63.255 scope global dynamic ens5
valid_lft 2428sec preferred_lft 2428sec
inet6 fe80::14b5:4dff:fe43:c0c1/64 scope link
valid_lft forever preferred_lft forever
I notice that the IP range referenced here also appears in the kubelet error logs.
I am using an AWS EC2 instance running Ubuntu Server 18.04 LTS (HVM), SSD Volume Type - ami-00ddb0e5626798373 (64-bit x86). Appreciate any advice on troubleshooting this further.
Thanks
0 -
Hello,
Is the kubeadm-config.yaml file in your current directory? Did you edit /etc/hosts and add your alias properly?
Perhaps you can share your kubeadm-config.yaml as code or an image so we can notice indentation issues.
Regards,
0 -
Thanks for the reply Tim, here is what I'm working with:
0 -
Hello,
It looks like you edited the IP range to the 172.31 range. Leave it as it was, so that it matches what Calico uses and what is in the example YAML file from the course tarball. If you are already using the 192.168 network elsewhere, then edit both the calico.yaml and the kubeadm-config.yaml so that they match each other, but do not overlap an IP range in use elsewhere in your environment.
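For illustration, a quick way to compare the two values that need to agree (a sketch; by default the Calico entry is commented out, so it implicitly matches the 192.168.0.0/16 default):

grep podSubnet kubeadm-config.yaml            # e.g. podSubnet: 192.168.0.0/16
grep -A1 CALICO_IPV4POOL_CIDR calico.yaml     # e.g. value: "192.168.0.0/16" (commented out by default)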
Regards,
0 -
Ah yes, I actually changed it in both locations previously while troubleshooting.
I was encountering the same errors previously with the default 192.168.0.0/16 range that comes with wget https://docs.projectcalico.org/manifests/calico.yaml.
Two discrepancies that I noticed with the calico.yaml file in the example are that the following lines are commented out by default, and with different indentation to that shown in the PDF for lab 3.1. What I had tested previously was uncommenting the relevant section with the following indentation.
I have tested again just now to be sure. Here I am testing again with the 192.168.0.0/16 range.
I reset kubeadm with kubeadm reset and re-ran kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out to validate. I again got the same error and the same logs.
0 -
When you say edit: there is no need to edit the calico.yaml file or uncomment anything. You would only need to edit it if you were already using the 192.168 range elsewhere. If you follow the lab exactly as written, what happens?
0 -
Hi Tim,
I reset kubeadm, downloaded a fresh version of the calico.yaml file, and tried again without editing the file, but got the same results as previously. All other files are the same as prior to your last message, and I am running docker.io.
0 -
Your errors list the hostname instead of the alias, which they should be listing. You should have an alias that sets k8smaster to your IP of 172.31.60.156. If you created the alias properly in /etc/hosts, then are you sure your kubeadm-config.yaml file is in your current working directory?
It appears the file is not being read. When you run kubeadm init and pass the filename, the lab syntax expects the file to be in the current directory. When you type ls, do you see it, and is it readable by student, or your non-root user?
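Something along these lines would confirm both points, run from the directory where you execute kubeadm init (a quick sketch):

ls -l kubeadm-config.yaml     # is the file present and readable?
grep k8smaster /etc/hosts     # does the alias point at 172.31.60.156?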
Regards,
0 -
File permissions had been -rw-r--r--, but I brute-forced it with a chmod 777 for troubleshooting purposes.
The output below shows the kubeadm-config.yaml file in my working directory, with full rwx permissions for all users. In spite of that, the same kubelet error of node "ip-172-31-60-156" not found was still thrown.
One interesting difference is that on this occasion, the kubeadm init command failed but did not include the error that was thrown on previous attempts (though it did still fail):
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
0 -
For completeness' sake I ran through a fresh install, again Ubuntu 18 on AWS EC2 as before.
ip addr show gives the same IP (172.31.63.255).
I pulled the calico.yaml file and did not edit it.
Created the /etc/hosts file and, in the current directory, created kubeadm-config.yaml.
Ran kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out and the first attempt failed with the error The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
Ran chmod 777 kubeadm-config.yaml and kubeadm reset before again running kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out. This time it failed again with the same error as the immediately previous attempt.
Hopefully this fresh attempt helps in the troubleshooting process. Please let me know your thoughts; I am happy to troubleshoot further.
0 -
Hi @oisin,
From your detailed outputs it seems that in both scenarios you are misconfiguring the k8smaster alias in the /etc/hosts file. The first step in resolving your issue is to pay close attention to the solution suggested by @serewicz above. Since you are on AWS, thanks to their default EC2 hostname naming convention, the IP of your master node can be extracted from the master node's hostname itself, and that is the IP address that needs to be added to the /etc/hosts file.
Also, make sure that your AWS VPC has a firewall rule (SG) to allow all ingress traffic - from all sources, to all destination ports, all protocols.
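For example, on an Ubuntu EC2 instance something along these lines should recover that private IP (a rough sketch; the last command assumes the standard EC2 instance metadata endpoint is reachable):

hostname                                                     # e.g. ip-172-31-60-156  ->  172.31.60.156
hostname -I | awk '{print $1}'                               # or print the first (primary) address directly
curl -s http://169.254.169.254/latest/meta-data/local-ipv4   # or query the EC2 metadata service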
Regards,
-Chris
0 -
Thanks Chris and Tim for the troubleshooting help. Much appreciated!
I have gotten it working and have realized my oversight. I will outline it here in case it helps any future students.
In the documentation for this lab, the output of ip addr show includes
inet 10.128.0.3/32 brd 10.128.0.3 scope global ens4
The /etc/hosts file is then configured with a line reading
10.128.0.3 k8smaster
My presumption was that the brd address was being used, but it was the inet address.
In my previous messages, it was failing because of this.
In the case of my first message on this thread, I listed the output of ip addr show as including
inet 172.31.60.156/20 brd 172.31.63.255 scope global dynamic ens5
If I had added 172.31.60.156 k8smaster as opposed to 172.31.63.255 k8smaster, it would have worked.
Chris's suggestion to get the IP directly from the EC2 hostname is also a good solution.
Presuming that it is always the inet address that we should be using, would it be possible to update the documentation to avoid confusion and specifically advise people to use the inet address? I presume that many would instinctively know which address to use, but it could help to limit the scope of ambiguity.
Thanks again for the help.
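For anyone else hitting the same confusion, a one-liner along these lines would extract just the inet (primary) address (a rough sketch; replace ens5 with your own interface name):

ip -4 addr show ens5 | awk '/inet /{print $2}' | cut -d/ -f1    # -> 172.31.60.156, the value to put next to k8smaster in /etc/hosts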
0 -
It seems the problem has been solved. I just want to offer another solution to the 'kubeadm init' problem, for when you see:
And when I did a 'journalctl -u kubelet.service' I saw:
Jan 04 20:31:57 master kubelet[23220]: E0104 20:31:57.126622 23220 kubelet.go:2267] node "master" not found
Solution:
In my case I just had a typo in /etc/hosts:
10.2.0.4 k8master
instead of
10.2.0.4 k8smaster
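A quick way to catch this kind of typo before running kubeadm init (a small sketch, nothing course-specific):

getent hosts k8smaster    # should print the expected IP; no output means the alias is missing or misspelled in /etc/hosts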
0 -
Hello everyone,
I have the same problem. My environment is on VirtualBox. I installed two Ubuntu 18.04 servers with 4 CPUs and 6 GB RAM each. I followed all the instructions, but when I run 'kubeadm --v=5 init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out' I get the following error (see below). I checked and tried everything you suggested to the other students, but without success. Below you can find screenshots of files, services and journalctl. Can you please help?
0 -
Hi @asmoljo,
Thank you for providing such detailed outputs. It seems that the first error is at a timeout, when either port 6443 is not accessible or k8smaster cannot be resolved. Can you verify that port 6443 of the VM is open, and what application is listening on it? Also, can you verify that k8smaster can be resolved to the expected IP address?
What is your host OS and what is the size of your host machine? Are there any firewalls active on your host?
What is your guest OS full release? Are there any firewalls active on your guest?
Aside from the CPU and memory mentioned above, how much disk space did you configure for each VM? Is it dynamic?
What type of networking did you configure on the VirtualBox VM?
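On the guest, a couple of commands along these lines would answer the first two questions (a sketch; ss and nc availability depends on the image):

sudo ss -tlnp | grep 6443    # what, if anything, is listening on port 6443
getent hosts k8smaster       # does the alias resolve to the expected IP
nc -vz k8smaster 6443        # can the port actually be reached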
Regards,
-Chris
0 -
@chrispokorni said:
Hi @asmoljo,
Thank you for providing such detailed outputs. It seems that the first error is at a timeout, when either port 6443 is not accessible or k8smaster cannot be resolved. Can you verify that port 6443 of the VM is open, and what application is listening on it? Also, can you verify that k8smaster can be resolved to the expected IP address?
What is your host OS and what is the size of your host machine? Are there any firewalls active on your host?
What is your guest OS full release? Are there any firewalls active on your guest?
Aside from the CPU and memory mentioned above, how much disk space did you configure for each VM? Is it dynamic?
What type of networking did you configure on the VirtualBox VM?
Regards,
-Chris

Hi Chris,
I'm sure the port is open; in fact, ufw is completely stopped. k8smaster is resolved to the expected IP address. Port 6443 is not used by any other application, if that's what you mean.
Host: Windows 10 Pro, AMD Ryzen 7 2700X (8 cores), 32 GB RAM. Windows on SSD, VMs on HDD.
Firewall is active on my host for private and public networks.
Guest full release: Operating System: Ubuntu 18.04.5 LTS, Kernel: Linux 4.15.0-135-generic, Architecture: x86-64.
Guest firewalls are disabled.
Guest disk space is dynamic, and for both VMs it is:
udev 2.9G 0 2.9G 0% /dev
tmpfs 597M 1.1M 595M 1% /run
/dev/mapper/ubuntu--vg-ubuntu--lv 19G 7.2G 11G 41% /
tmpfs 3.0G 0 3.0G 0% /dev/shm
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.0G 0 3.0G 0% /sys/fs/cgroup
/dev/sda2 976M 78M 832M 9% /boot
tmpfs 597M 0 597M 0% /run/user/1000
Network type for the VM is Bridged adapter.
regards,
Antonio
0 -
@asmoljo said:
...

Hi,
Now I tried with the Windows firewall turned off, and everything went smoothly. Very strange. I recently tried to install k8s with kubespray and RKE and everything went smoothly as well. I created an inbound rule for VirtualBox and am continuing with the course. Thanks for your help.
1 -
Hi,
now I'm stuck on the next step. Please look below.
asmoljo@lfc1master1:~$ kubectl apply -f calico.yaml
The connection to the server k8smaster:6443 was refused - did you specify the right host or port?
I noticed that something is wrong with the control plane pods.
asmoljo@lfc1master1:~$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
daf87d762c09 ce0df89806bb "kube-apiserver --ad…" 4 minutes ago Exited (255) 4 minutes ago k8s_kube-apiserver_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_42
a310a9cffe85 538929063f23 "kube-controller-man…" About an hour ago Up About an hour k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_11
7bd01ec286af 538929063f23 "kube-controller-man…" About an hour ago Exited (255) About an hour ago k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_10
5d2115572946 49eb8a235d05 "kube-scheduler --au…" 3 hours ago Up 3 hours k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_1
a79e30ebec7c 49eb8a235d05 "kube-scheduler --au…" 3 hours ago Exited (255) 3 hours ago k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
f174fb6df5c1 0369cf4303ff "etcd --advertise-cl…" 3 hours ago Up 3 hours k8s_etcd_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
26b49f7b5f1f k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_0
afcc9028738b k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
342a2102c126 k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
4fedc2b653af k8s.gcr.io/pause:3.2 "/pause" 3 hours ago Up 3 hours k8s_POD_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_0
asmoljo@lfc1master1:~$ sudo docker version
Client:
Version: 19.03.6
API version: 1.40
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Fri Dec 18 12:21:44 2020
OS/Arch: linux/amd64
Experimental: false
Server:
Engine:
Version: 19.03.6
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: 369ce74a3c
Built: Thu Dec 10 13:23:49 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu1~18.04.4
GitCommit:
runc:
Version: spec: 1.0.1-dev
GitCommit:
docker-init:
Version: 0.18.0
GitCommit:
0 -
Hi @asmoljo,
Going back to my prior comment: "resources".
You assigned 4 CPUs to each VM and you have 2 VMs, which sums up to 8 CPUs assigned to the VMs alone. Your host seems to have 8 cores altogether, which are therefore overcommitted, considering that your Windows host OS needs CPU to run and the hypervisor needs resources as well.
Dynamic VM disk management may also be an issue with VBox. Your cluster does not know that additional disk space can and will be allocated. Instead, it only sees the current usage as a fraction of currently allocated disk space - high. This is where your cluster panics.
So the overcommitted CPU together with dynamic disk management is a recipe for cluster failure.
I would recommend revisiting the VM sizing guide in the Overview section of Lab 3.1, and fit that into the physical resources available on your host machine, ensuring all running components have a fair amount of resources to operate.
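A rough way to see what each guest actually has available would be something like this, run inside the VMs (just a sketch):

nproc       # CPUs visible to the guest
free -h     # memory
df -h /     # disk space currently allocated to the root filesystem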
Regards,
-Chris
0 -
Hi Chris,
I recreated the VMs so they have fixed-size disks and 2 CPUs, but I still have the same problem with the control plane containers.
It seems like the containers are constantly restarting.
asmoljo@lfc1master1:~$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a35b0a749b14 538929063f23 "kube-controller-man…" 9 seconds ago Up 9 seconds k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_4
0177c47e5607 ce0df89806bb "kube-apiserver --ad…" About a minute ago Exited (255) 39 seconds ago k8s_kube-apiserver_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_6
b987dfbe5393 49eb8a235d05 "kube-scheduler --au…" About a minute ago Up About a minute k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_1
7c6cc53ad60d 538929063f23 "kube-controller-man…" About a minute ago Exited (255) 29 seconds ago k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_3
b58c6f7b060a 49eb8a235d05 "kube-scheduler --au…" 3 minutes ago Exited (255) About a minute ago k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
3f10dc09d4c7 0369cf4303ff "etcd --advertise-cl…" 3 minutes ago Up 3 minutes k8s_etcd_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
20d9211b0e6a k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
de77e4ef7180 k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_0
aa3ae9c1f522 k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_0
73f054ec8d53 k8s.gcr.io/pause:3.2 "/pause" 3 minutes ago Up 3 minutes k8s_POD_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
Six minutes later:
asmoljo@lfc1master1:~$ sudo docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
1cd8da1e0dab ce0df89806bb "kube-apiserver --ad…" 30 seconds ago Exited (255) 5 seconds ago k8s_kube-apiserver_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_9
1b0341052fd4 538929063f23 "kube-controller-man…" About a minute ago Exited (255) 4 seconds ago k8s_kube-controller-manager_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_6
b987dfbe5393 49eb8a235d05 "kube-scheduler --au…" 6 minutes ago Up 6 minutes k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_1
b58c6f7b060a 49eb8a235d05 "kube-scheduler --au…" 9 minutes ago Exited (255) 6 minutes ago k8s_kube-scheduler_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
3f10dc09d4c7 0369cf4303ff "etcd --advertise-cl…" 9 minutes ago Up 9 minutes k8s_etcd_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
20d9211b0e6a k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-scheduler-lfc1master1_kube-system_d8964234650b330c55fcf8fb2f5295dd_0
de77e4ef7180 k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-controller-manager-lfc1master1_kube-system_0182991bee489435046543d9389e78da_0
aa3ae9c1f522 k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_kube-apiserver-lfc1master1_kube-system_259d7168afb4e521dba91ac5a929fbf8_0
73f054ec8d53 k8s.gcr.io/pause:3.2 "/pause" 9 minutes ago Up 9 minutes k8s_POD_etcd-lfc1master1_kube-system_80ebeff3505004ee5b56b87a252ac81b_0
asmoljo@lfc1master1:~$
0 -
Hi @asmoljo,
What are the docker logs showing for the control plane containers?
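Something along these lines would pull them (a sketch; substitute the container IDs from your own docker ps -a output, since they change on every restart):

sudo docker ps -a | grep kube-apiserver
sudo docker logs 0177c47e5607    # example ID taken from the listing above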
What are the outputs of the top and df -h commands run on your control plane node?
Regards,
-Chris
0