LFS258 Lab 3.1 - Something overwriting edits to /etc/hosts file, preventing worker join to cp node
Hello Mentors! Thank you for the great course. Could you please help me with an issue?
I managed to complete all of Lab 3.1 (Exercises 3.1-3.5) successfully using three nodes on Digital Ocean, but I am running into a problem trying to replicate the same lab/procedure on a MacBook Pro (M1 Max/Apple Silicon/aarch64/arm64) using a cp node and one or two workers using Canonical's Multipass v1.11.1 to create the nodes. Multipass enables me to launch three Ubuntu 22.04.1 nodes quickly right on the Mac, each with a unique IP, and of course, you can open a shell in Terminal to connect to each one via SSH (using 'multipass shell '), as you would expect. I am running macOS Ventura 13.3.1.
Everything works as expected (i.e., containerd, the kubelets, and all kube-system pods are running and ready with no apparent errors)--up until the moment I try to join the first worker node to the cp. At that point, the worker shows the usual brief message that it has joined the cluster. Shortly after that, the cp briefly reports that node1 (the worker) has joined, and the cp and node1 both flash "ready," but a moment later, the shell connections to the cp and worker freeze up and are lost (i.e., after a few seconds, each shell reverts to the usual Terminal prompt).
After the connections are lost, Multipass reports the nodes are still running, but I have been unable to recover the connections without stopping both nodes first, and then starting them up again, one at a time. During this process, I noticed that after I reestablish each shell connection, something has overwritten my edits to the /etc/hosts file on each node (cp and worker). So it appears the worker nodes cannot find the cp and vice-versa. Also, the alias I have created on each machine ("alias k='kubectl'") has been removed/forgotten. I noticed that if I reenter the control plane IP in the /etc/hosts file on the worker, it typically rejoins the cluster for a short time before things freeze up again.
So it appears some part of the system might be rebooting (maybe the kube-apiserver or kubelet or something ... but not the nodes themselves, probably), causing key edits to be lost (causing DNS failure, I suspect). Is there a way to fix this?
I tried commenting out the line '- update_etc_hosts' in the '/etc/cloud/cloud.cfg' file, but that did not fix the issue (see this thread). This other thread on Stack Overflow suggested some reasons why the edits to /etc/hosts are not persisting (perhaps they are being overwritten by systemd-resolved.service), but I can't figure it out, so I'm asking the experts!
Additional context: I've taken the usual steps to set up the cluster--swap has been disabled ('swapoff -a'), all firewall tables have been flushed ('iptables -F'), SELINUX has been set to 'permissive', etc. So to reiterate: basically the same approach works fine to create a 3-node kubeadm cluster when I use my Mac with three nodes on Digital Ocean, but not when I try to create a similar cluster using Multipass on the Mac M1.
What can I do to fix this and make the kubeadm cluster function normally using Multipass on Apple Silicon? Thanks!
Comments
Having reflected on this issue a little more--and experiencing it again while using Multipass--it could be that when you use Multipass to stop a node and restart it, /etc/hosts reverts to its original settings and any aliases you have set are forgotten.
But about the original issue (i.e., that a worker node can be connected to a cp node using the LFS258 Lab 3.1 procedure on Multipass, but then the system breaks down): I would still like to know if there is a way to resolve it (to run a multi-node kubeadm cluster on Multipass on Apple Silicon) if anyone knows. Thanks!
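For the /etc/hosts reverts described above, one possible workaround is to re-add the control plane entry after each restart in a way that never duplicates it. This is only a minimal sketch: the helper name `add_host_entry`, the IP `192.168.64.10`, and the alias `k8scp` are placeholders, not values confirmed in this thread, and it does not address whatever is rewriting the file in the first place.

```shell
#!/bin/sh
# add_host_entry FILE IP NAME   (hypothetical helper, not part of the lab)
# Appends "IP NAME" to FILE unless a non-comment line already maps IP to NAME.
add_host_entry() {
    file="$1"
    ip="$2"
    name="$3"
    # awk exits 0 only if some line has IP as its first field and NAME
    # among the remaining fields; a missing file also yields non-zero.
    if awk -v ip="$ip" -v name="$name" '
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }
    ' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s %s\n' "$ip" "$name" >> "$file"
}
```

Run as root on each node after a restart, it could be called as `add_host_entry /etc/hosts 192.168.64.10 k8scp`, substituting the real cp IP and alias. Calling it twice leaves a single entry.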
Hi @andrew.nichols,
The lab material has not yet migrated to Ubuntu 22.04 LTS. It is still on Ubuntu 20.04 LTS to mirror the CKA exam environment, per:
The Kubernetes node disconnects may be due to container image incompatibilities with the ARM architecture. Because most Kubernetes control plane agents run as containers, their images, along with those of the CNI network plugin, DNS servers, and any other plugins, need to be compatible with the guest system architecture.
Regards,
-Chris
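As a starting point for checking the architecture concern raised in the reply above, the sketch below maps the machine name reported by `uname -m` to the architecture label that container images typically use (`arm64` or `amd64`). The function name `k8s_arch` is hypothetical, and the mapping covers only the two architectures mentioned in this thread.

```shell
#!/bin/sh
# k8s_arch MACHINE   (hypothetical helper)
# Translates a `uname -m` style machine name into the architecture label
# commonly used for container images; unknown values pass through unchanged.
k8s_arch() {
    case "$1" in
        aarch64|arm64) echo arm64 ;;
        x86_64|amd64)  echo amd64 ;;
        *)             echo "$1" ;;
    esac
}
```

On a node, `k8s_arch "$(uname -m)"` prints the label to compare against the platforms each image is published for. If kubectl is still responsive, `kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture` lists the architecture each node reports.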