
[Exercise 2.2: Deploy a New Cluster] Hit "node xx not found" issue

Posts: 5
edited January 2022 in LFD259 Class Forum

I followed the guide to create my cluster on Ali Cloud, using two instances with 2 vCPUs and 8 GB of RAM each.

  root@master:~# cat /etc/hosts
  10.250.115.210 master
  10.250.115.211 slaver

  root@master:~# hostname
  master

kubeadm init always hangs at the following point:

  I0201 00:57:06.271718 29692 waitcontrolplane.go:91] [wait-control-plane] Waiting for the API server to be healthy
  [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
  [kubelet-check] Initial timeout of 40s passed.
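
For reference, kubeadm init was run roughly like this; I am reconstructing the exact invocation from memory, so treat the verbosity flag and the tee redirect as approximate (the config file is the kubeadm.yaml shown further below):

  # config file is the one pasted at the end of this post
  kubeadm init --config=kubeadm.yaml --v=5 | tee kubeadm-init.out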

I googled it and found the following issues, which seem similar to what I am hitting:

https://github.com/cri-o/cri-o/issues/2357
https://github.com/kubernetes/kubeadm/issues/1153
https://github.com/kubernetes/kubeadm/issues/2370
https://github.com/kubernetes/kubernetes/issues/106464

I removed Docker where it was installed, and double-checked that the cgroup driver is the same for CRI-O and the kubelet (see the check sketched after the log below). The kubelet still keeps reporting the error:

  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.198073 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.298393 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.398656 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.499651 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.599724 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.700032 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.800410 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:19 master kubelet[29902]: E0201 00:57:19.900674 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:20 master kubelet[29902]: E0201 00:57:20.001051 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
  Feb 01 00:57:20 master kubelet[29902]: E0201 00:57:20.101439 29902 kubelet.go:2422] "Error getting node" err="node \"master\" not found"
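
This is roughly how I compared the two cgroup drivers; the CRI-O paths below are the default locations and may differ on other installs. Both commands report systemd on my nodes:

  # CRI-O side: cgroup_manager in the TOML config (drop-ins override crio.conf)
  grep -r cgroup_manager /etc/crio/crio.conf /etc/crio/crio.conf.d/ 2>/dev/null

  # kubelet side: the KubeletConfiguration that kubeadm writes for the kubelet
  grep cgroupDriver /var/lib/kubelet/config.yaml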

I tried upgrading kubeadm, kubelet, and kubectl to the newest version, 1.23.3, but it did not help. Can anyone offer some insight into this? Thanks.
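
For completeness, the upgrade was done roughly like this (a sketch, assuming the packages come from the upstream Kubernetes apt repository and were previously held):

  sudo apt-get update
  sudo apt-get install -y --allow-change-held-packages \
      kubeadm=1.23.3-00 kubelet=1.23.3-00 kubectl=1.23.3-00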

BTW, below is the kubeadm.yaml used for kubeadm init.

  apiVersion: kubeadm.k8s.io/v1beta2
  bootstrapTokens:
  - groups:
    - system:bootstrappers:kubeadm:default-node-token
    token: abcdef.0123456789abcdef
    ttl: 24h0m0s
    usages:
    - signing
    - authentication
  kind: InitConfiguration
  localAPIEndpoint:
    bindPort: 6443
  nodeRegistration:
    criSocket: unix:///var/run/crio/crio.sock
    name: master
    taints: null
  ---
  apiServer:
    timeoutForControlPlane: 4m0s
  apiVersion: kubeadm.k8s.io/v1beta2
  certificatesDir: /etc/kubernetes/pki
  clusterName: kubernetes
  controllerManager: {}
  dns:
    type: CoreDNS
  etcd:
    local:
      dataDir: /var/lib/etcd
  imageRepository: registry.aliyuncs.com/google_containers
  kind: ClusterConfiguration
  kubernetesVersion: 1.23.3
  networking:
    dnsDomain: cluster.local
    serviceSubnet: 10.96.0.0/12
    podSubnet: 192.168.0.0/16
  scheduler: {}
  ---
  apiVersion: kubelet.config.k8s.io/v1beta1
  authentication:
    anonymous:
      enabled: false
    webhook:
      cacheTTL: 0s
      enabled: true
    x509:
      clientCAFile: /etc/kubernetes/pki/ca.crt
  authorization:
    mode: Webhook
    webhook:
      cacheAuthorizedTTL: 0s
      cacheUnauthorizedTTL: 0s
  cgroupDriver: systemd
  clusterDNS:
  - 10.96.0.10
  clusterDomain: cluster.local
  cpuManagerReconcilePeriod: 0s
  evictionPressureTransitionPeriod: 0s
  fileCheckFrequency: 0s
  healthzBindAddress: 127.0.0.1
  healthzPort: 10248
  httpCheckFrequency: 0s
  imageMinimumGCAge: 0s
  kind: KubeletConfiguration
  logging: {}
  nodeStatusReportFrequency: 0s
  nodeStatusUpdateFrequency: 0s
  resolvConf: /run/systemd/resolve/resolv.conf
  rotateCertificates: true
  runtimeRequestTimeout: 0s
  shutdownGracePeriod: 0s
  shutdownGracePeriodCriticalPods: 0s
  staticPodPath: /etc/kubernetes/manifests
  streamingConnectionIdleTimeout: 0s
  syncFrequency: 0s
  volumeStatsAggPeriod: 0s


Answers

  • I forgot one thing: I also tested this locally in VMware Pro with the same configuration, and the problem is exactly the same.

  • Hi @yang.wang11,

    Kubernetes is highly sensitive to the VM instance/node networking configuration. Have you had a chance to watch the two cluster setup videos for AWS and GCP? While they cover different cloud providers, you may find networking and firewall configuration tips that also apply to other cloud settings or local hypervisors.

    I would also stick with the recommended Kubernetes v1.22.1 from the lab guide, and with Ubuntu 18.04 LTS as the guest OS. Disable any guest OS firewalls that are enabled by default, and disable swap as well (there is a short sketch of the swap and firewall commands just after this reply).

    Regards,
    -Chris
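
    For the swap and firewall part, a minimal sketch on Ubuntu 18.04 looks like the following; it assumes ufw, the default Ubuntu firewall, and your cloud image may ship something else, so treat it as an example rather than the exact lab procedure:

      sudo swapoff -a                            # turn swap off immediately
      sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab # comment out swap so it stays off after reboot
      free -m                                    # the Swap line should now show 0
      sudo ufw status                            # check whether the firewall is active
      sudo ufw disable                           # disable it for the lab environment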
