[Exercise 2.2: Deploy a New Cluster] Hit "node xx not found" issue

yang.wang11 · January 2022

I followed the guide to create my cluster on Ali Cloud, using two instances with 2 CPUs and 8 GB of RAM each.

root@master:~# cat /etc/hosts
10.250.115.210  master
10.250.115.211  slaver

root@master:~# hostname
master

kubeadm init always hangs at the following step:

I0201 00:57:06.271718   29692 waitcontrolplane.go:91] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

I googled it and found the following issues, which look like what I am seeing:

https://github.com/cri-o/cri-o/issues/2357
https://github.com/kubernetes/kubeadm/issues/1153
https://github.com/kubernetes/kubeadm/issues/2370
https://github.com/kubernetes/kubernetes/issues/106464

I removed Docker where it was installed and double-checked that the cgroup driver is the same in CRI-O and the kubelet. The errors still come from the kubelet, as shown below:

Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.198073   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.298393   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.398656   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.499651   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.599724   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.700032   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.800410   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.900674   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:20 master kubelet[29902]: E0201 00:57:20.001051   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
Feb 01 00:57:20 master kubelet[29902]: E0201 00:57:20.101439   29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
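
The same kubelet message repeats roughly every 100 ms, which makes it easy to miss any other error in the journal. Below is a minimal sketch in Python that collapses such output into distinct messages with counts. The function name summarize_klog and the regular expression are my own illustration (they are not part of kubeadm or the kubelet) and assume the klog line layout seen in the lines above.

```python
import re
import sys
from collections import Counter

# Assumed klog layout, taken from the journal lines above:
# "<journal prefix>: E0201 00:57:19.198073   29902 kubelet.go:2422] <message>"
KLOG_LINE = re.compile(
    r"(?P<level>[IWEF])\d{4} \d{2}:\d{2}:\d{2}\.\d+\s+\d+ "
    r"(?P<source>[\w./-]+:\d+)\] (?P<message>.*)$"
)


def summarize_klog(lines):
    """Count how often each distinct (level, source, message) triple occurs."""
    counts = Counter()
    for line in lines:
        match = KLOG_LINE.search(line)
        if match:
            key = (match["level"], match["source"], match["message"])
            counts[key] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical usage: journalctl -u kubelet --no-pager | python3 this_script.py
    if not sys.stdin.isatty():
        for (level, source, message), n in summarize_klog(sys.stdin).most_common():
            print(n, level, source, message)
```

Run against the ten lines above, this would reduce them to a single entry for kubelet.go:2422, so any different error hiding between the repeats becomes visible.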

I tried upgrading kubeadm, kubelet, and kubectl to the newest version, 1.23.3, but it did not help. Can anyone give some insight into this? Thanks.

BTW, below is the kubeadm.yaml used for kubeadm init.

apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
  name: master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.23.3
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 192.168.0.0/16
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

yang.wang11 · February 2022

It is caused by the API server not being healthy.
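
When the API server is not up, the kubelet cannot register the node, which is consistent with the repeated "node not found" errors. One quick first check is whether anything accepts TCP connections on the API server port (bindPort 6443 in the kubeadm.yaml above). A minimal sketch in Python follows. The helper name port_is_open is hypothetical, and an open port only shows that something is listening, not that the API server is healthy.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    # 6443 is the bindPort from the kubeadm.yaml above; the loopback
    # address is an assumption for running this on the master itself.
    print("API server port open:", port_is_open("127.0.0.1", 6443))
```

If this reports False while kubeadm init is waiting, the API server container is most likely not running at all, and the container runtime logs are the next place to look.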

yang.wang11 · January 2022

I forgot one thing: I also tested it locally in VMware Pro with the same configuration, and the problem is the same.

chrispokorni · February 2022

Hi @yang.wang11,

Kubernetes is highly sensitive to VM instance/Node networking configuration. Have you had a chance to watch the two cluster setup videos for AWS and GCP? While they cover different cloud providers, you may find networking and firewall configuration tips that also apply to other cloud settings or local hypervisors.

I would also stick with the recommended Kubernetes v1.22.1, as per the lab guide, and with the recommended VM guest OS, Ubuntu 18.04 LTS. Disable guest OS firewalls if any are enabled by default, and disable swap as well.

Regards,
-Chris
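
For the swap part of the advice above, here is a small sketch in Python that checks whether swap is still active by parsing /proc/swaps. The helper name active_swap_devices is hypothetical; the only assumption is the standard /proc/swaps layout of one header line followed by one line per swap area.

```python
from pathlib import Path


def active_swap_devices(proc_swaps_text):
    """Return the names of the swap areas listed in /proc/swaps content."""
    rows = [line.split() for line in proc_swaps_text.splitlines()]
    # Skip the header row ("Filename Type Size Used Priority") and blank lines.
    return [fields[0] for fields in rows[1:] if fields]


if __name__ == "__main__":
    swaps_file = Path("/proc/swaps")
    if swaps_file.exists():
        devices = active_swap_devices(swaps_file.read_text())
        if devices:
            print("swap still active on:", ", ".join(devices))
        else:
            print("no active swap")
```

An empty result on both nodes means swap is off for the running system; whether it stays off after a reboot still depends on /etc/fstab.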
