Issues with running k8sMaster.sh: Swap & Timeout
First, thanks for this excellent course! I had made it up through most of unit 4 before the recent update. I understand that these updates need to happen, and I appreciate that we're being prepared for the latest version of the exam. That said, I'm running into some issues getting new VMs started with the new k8sMaster.sh script.
This is the setup that was working fine with the V2021-01-26 materials. (The only maintenance I needed to do was run
sudo ntpdate time.nist.gov after restarting the VMs, and occasionally rerun the k8sMaster.sh / k8sSecond.sh commands on the respective VMs.)
- 2 Ubuntu VMs running locally via VirtualBox on Windows 10 Business, v21H1, 10.0.19043, 32 GB RAM, i9-10885H CPU @ 2.40 GHz
- 2 GB RAM, 2 processor cores, 25 GB virtual disk image; network: bridged adapter attached to the Ethernet port
- OS installed via ubuntu-18.04.5-live-server-amd64.iso (Date modified: 2021-03-20 2:58 PM)
I'm having the following issues when running the V2021-05-26 version of the setup scripts:
1. This one might be a recommendation for updating the scripts. When running
bash k8sMaster.sh | tee $HOME/master.out, there's an error about swap not being disabled (see below). So now, as a workaround, each time I re-attempt on a fresh VM, I disable swap before running the scripts: I either run
sudo swapoff -a or edit /etc/fstab and reboot. (The exact commands I use are sketched after the error output.)
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR Swap]: running with swap on is not supported. Please disable swap
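For reference, this is roughly what I run before the script. A minimal sketch, assuming the swap entry in /etc/fstab is on a line containing the word "swap"; the sed pattern may need adjusting for other layouts:

# turn swap off for the current boot
sudo swapoff -a

# comment out any swap entries so swap stays off after a reboot
# (keeps a backup of the original file as /etc/fstab.bak)
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab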
2. This is where my current support question lies. There is a timeout in the [kubelet-check] step, after the
sudo kubeadm init --config=$(find / -name kubeadm.yaml 2>/dev/null) command. See the pertinent output below. I'll also include samples of the output of the debugging steps recommended by that output.
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

	Unfortunately, an error has occurred:
		timed out waiting for the condition

	This error is likely caused by:
		- The kubelet is not running
		- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

	If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
		- 'systemctl status kubelet'
		- 'journalctl -xeu kubelet'

	Additionally, a control plane component may have crashed or exited when started by the container runtime.
	To troubleshoot, list all containers using your preferred container runtimes CLI.
	Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
		- 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause'
		Once you have found the failing container, you can inspect its logs with:
		- 'crictl --runtime-endpoint unix:///var/run/crio/crio.sock logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
The output of the recommended debugging steps is:
master@master:~$ systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2021-05-30 20:18:49 UTC; 5h 21min ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 20472 (kubelet)
    Tasks: 15 (limit: 2316)
   CGroup: /system.slice/kubelet.service
           └─20472 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime=remote --co

May 31 01:40:44 master kubelet: E0531 01:40:44.774526 20472 kuberuntime_sandbox.go:68] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to mount container k8s_POD_kube-s
May 31 01:40:44 master kubelet: E0531 01:40:44.774546 20472 kuberuntime_manager.go:790] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to mount container k8s_POD_kube-s
May 31 01:40:44 master kubelet: E0531 01:40:44.774591 20472 pod_workers.go:190] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-scheduler-master_kube-system(22ea193343aa28
May 31 01:40:44 master kubelet: E0531 01:40:44.808612 20472 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
May 31 01:40:44 master kubelet: E0531 01:40:44.908942 20472 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
# 5 similar lines omitted
master@master:~$ journalctl -xeu kubelet
# (1001 lines, but here's a sample)
May 31 02:20:52 master kubelet: E0531 02:20:52.945247 20472 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
May 31 02:20:52 master kubelet: E0531 02:20:52.985422 20472 eviction_manager.go:255] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"master\" not found"
May 31 02:20:53 master kubelet: E0531 02:20:53.045960 20472 kubelet.go:2291] "Error getting node" err="node \"master\" not found"
master@master:~$ sudo crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a | grep kube | grep -v pause
[sudo] password for master:
# [no output]
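One thing I haven't tried yet, in case the runtime endpoint in that example is simply the wrong one for this setup (a sketch; I'm assuming the new script installs containerd rather than cri-o, so these paths are guesses):

# check which runtime sockets actually exist on the node
ls -l /var/run/crio/crio.sock /run/containerd/containerd.sock 2>/dev/null

# if only the containerd socket is present, point crictl at it instead
sudo crictl --runtime-endpoint unix:///run/containerd/containerd.sock ps -a | grep kube | grep -v pause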
Running the command with the verbosity flag (
sudo kubeadm init --config=$(find ~ -name kubeadm.yaml 2>/dev/null) --v=5) doesn't add any more useful information.
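If the full kubelet journal would help, I can capture it and attach it; a sketch of what I'd run (the output filename is just a placeholder):

# dump the kubelet log for the current boot to a file I can attach
sudo journalctl -u kubelet -b --no-pager > $HOME/kubelet-full.log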