Welcome to the Linux Foundation Forum!

Can you help me debug the master initialization?

I get the error timed out waiting for the condition when I try to initialize the master:

  root@master:~# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
  W0103 15:13:11.353300 8986 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
  [init] Using Kubernetes version: v1.18.1
  [preflight] Running pre-flight checks
  [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
  [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
  [preflight] Pulling images required for setting up a Kubernetes cluster
  [preflight] This might take a minute or two, depending on the speed of your internet connection
  [preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
  [kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
  [kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
  [kubelet-start] Starting the kubelet
  [certs] Using certificateDir folder "/etc/kubernetes/pki"
  [certs] Generating "ca" certificate and key
  [certs] Generating "apiserver" certificate and key
  [certs] apiserver serving cert is signed for DNS names [master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local k8smaster] and IPs [10.96.0.1 10.2.0.2]
  [certs] Generating "apiserver-kubelet-client" certificate and key
  [certs] Generating "front-proxy-ca" certificate and key
  [certs] Generating "front-proxy-client" certificate and key
  [certs] Generating "etcd/ca" certificate and key
  [certs] Generating "etcd/server" certificate and key
  [certs] etcd/server serving cert is signed for DNS names [master localhost] and IPs [10.2.0.2 127.0.0.1 ::1]
  [certs] Generating "etcd/peer" certificate and key
  [certs] etcd/peer serving cert is signed for DNS names [master localhost] and IPs [10.2.0.2 127.0.0.1 ::1]
  [certs] Generating "etcd/healthcheck-client" certificate and key
  [certs] Generating "apiserver-etcd-client" certificate and key
  [certs] Generating "sa" key and public key
  [kubeconfig] Using kubeconfig folder "/etc/kubernetes"
  [kubeconfig] Writing "admin.conf" kubeconfig file
  [kubeconfig] Writing "kubelet.conf" kubeconfig file
  [kubeconfig] Writing "controller-manager.conf" kubeconfig file
  [kubeconfig] Writing "scheduler.conf" kubeconfig file
  [control-plane] Using manifest folder "/etc/kubernetes/manifests"
  [control-plane] Creating static Pod manifest for "kube-apiserver"
  [control-plane] Creating static Pod manifest for "kube-controller-manager"
  W0103 15:13:33.458024 8986 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
  [control-plane] Creating static Pod manifest for "kube-scheduler"
  W0103 15:13:33.459355 8986 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
  [etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
  [wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
  [kubelet-check] Initial timeout of 40s passed.
  [kubelet-check] It seems like the kubelet isn't running or healthy.
  [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.
  [kubelet-check] It seems like the kubelet isn't running or healthy.
  [kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp 127.0.0.1:10248: connect: connection refused.

  Unfortunately, an error has occurred:
  timed out waiting for the condition

  This error is likely caused by:
  - The kubelet is not running
  - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

  If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
  - 'systemctl status kubelet'
  - 'journalctl -xeu kubelet'

  Additionally, a control plane component may have crashed or exited when started by the container runtime.
  To troubleshoot, list all containers using your preferred container runtimes CLI.

  Here is one example how you may list all Kubernetes containers running in docker:
  - 'docker ps -a | grep kube | grep -v pause'
  Once you have found the failing container, you can inspect its logs with:
  - 'docker logs CONTAINERID'

  error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
  To see the stack trace of this error execute with --v=5 or higher
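A first thing worth checking with this symptom: the apiserver cert above is signed for the k8smaster alias, so that alias must resolve to the control-plane IP (10.2.0.2 in this output). A minimal sketch of that check, using a stand-in file rather than the node's real /etc/hosts:

```shell
# Stand-in copy of /etc/hosts; on the real node the check would be:
#   grep k8smaster /etc/hosts
# The 10.2.0.2 address is taken from the kubeadm output above.
cat <<'EOF' > /tmp/hosts.example
127.0.0.1 localhost
10.2.0.2 k8smaster
EOF

grep k8smaster /tmp/hosts.example
```

If the alias line is missing or mistyped, the kubelet cannot reach the endpoint named in kubeadm-config.yaml and the healthz probe keeps refusing connections, as seen above.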


Answers

  • Posts: 1,000

    Hello,

    That the errors reference localhost and not the k8smaster alias indicates an issue with the location of the kubeadm-config.yaml file, the syntax of the file, and/or the edit to /etc/hosts.

    As it seems you have deviated from the labs as written, I would encourage you to slow down and read, and follow, each step carefully. After you get a cluster working you can revisit, experiment, and diverge from the labs.

    Regards,
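    For reference, a sketch of what the lab's kubeadm-config.yaml looks like. The kubernetesVersion and the k8smaster endpoint are taken from the init output in this thread; the podSubnet value is an assumption matching Calico's default, since the lab downloads calico.yaml:

```shell
# Hypothetical reconstruction of the lab's kubeadm-config.yaml, written to
# /tmp here as a sketch (the lab keeps it in the working directory).
cat <<'EOF' > /tmp/kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: 1.18.1
controlPlaneEndpoint: "k8smaster:6443"
networking:
  podSubnet: 192.168.0.0/16
EOF

grep controlPlaneEndpoint /tmp/kubeadm-config.yaml
```

    The controlPlaneEndpoint alias is exactly what the matching /etc/hosts entry must resolve, which is why a typo in either file produces the localhost-only errors above.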

  • I've been following the steps in the lab very carefully, without deviating in any way. It's not clear to me what went wrong. I would really appreciate some help to move forward. I am currently blocked and cannot continue with the course.

  • Posts: 1,000

    Hello,

    As I look through your previous questions and examples it is clear you are not following the exercise guide.

    Let's start with some ground work. We can go step by step to get you back on track. What are you using for your nodes: GCE, AWS, or VirtualBox?

  • GCE

  • I've just fixed a typo in the file /etc/hosts and rerun kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    Now I get this:

    root@master:~# kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    W0103 20:06:49.751110 19618 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
    [init] Using Kubernetes version: v1.18.1
    [preflight] Running pre-flight checks
    [WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
    [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
    error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR Port-6443]: Port 6443 is in use
    [ERROR Port-10259]: Port 10259 is in use
    [ERROR Port-10257]: Port 10257 is in use
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
    [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR Port-2379]: Port 2379 is in use
    [ERROR Port-2380]: Port 2380 is in use
    [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
    [preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
    To see the stack trace of this error execute with --v=5 or higher
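    Every [ERROR ...] here points at state the earlier, partially successful init left behind: bound ports, static Pod manifests, and a non-empty etcd data directory. A sketch listing the leftover manifest paths named in the errors (printed rather than deleted):

```shell
# The four static Pod manifests the preflight check found from the earlier
# failed init; on the node, 'kubeadm reset' removes these along with
# /var/lib/etcd so a fresh init can run.
for f in kube-apiserver kube-controller-manager kube-scheduler etcd; do
  echo "/etc/kubernetes/manifests/$f.yaml"
done
```

    As the last post in this thread confirms, running kubeadm reset and then re-running kubeadm init clears exactly this set of errors.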
  • Posts: 1,000

    Are you using cri-o or docker as the container engine? Either way you have missed a step, as kubeadm cannot find the docker or cri-o engine. The init process has a search order: first docker, then cri-o, then containerd. The error says docker is not enabled. Did you run the exercise step to install docker?

    Now that I know you are using GCE, did you create a VPC which allows all traffic to all ports? Did you run steps 2 and 3 of the exercise?

  • Docker is correctly installed:

    root@master:~# docker -v
    Docker version 19.03.6, build 369ce74a3c
  • Posts: 1,000

    What is the output of 'sudo systemctl status docker'?

    Can you write your command history to a file and share it? Run: history > history.out

    The VPC I created should allow all traffic to all ports. [screenshot of the firewall rules omitted]

  • Posts: 1,000

    Great, the VPC looks good. What size nodes are you using, and what version of Ubuntu?

  • n1-standard-2 (2 vCPUs, 7.5 GB memory) with Ubuntu 18.04 LTS

  • This is the output of sudo systemctl status docker:

    root@master:~# sudo systemctl status docker
    docker.service - Docker Application Container Engine
    Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)
    Active: active (running) since Sun 2021-01-03 15:13:12 UTC; 6h ago
    Docs: https://docs.docker.com
    Main PID: 9004 (dockerd)
    Tasks: 17
    CGroup: /system.slice/docker.service
    └─9004 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
    root@master:~#
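    One detail in that status output matches the preflight [WARNING Service-Docker] from the init run: the 'disabled' field in the Loaded: line means docker is running now but will not start on boot. A small sketch that reads that field from the status line shown above:

```shell
# The semicolon-delimited field after the unit path records whether the
# service starts at boot; 'disabled' here matches the preflight warning.
status='Loaded: loaded (/lib/systemd/system/docker.service; disabled; vendor preset: enabled)'
case "$status" in
  *"; disabled;"*) echo "not enabled at boot - fix: sudo systemctl enable docker.service" ;;
  *"; enabled;"*)  echo "enabled at boot" ;;
esac
```

    This is not what blocked init (docker was active), but it is worth fixing before the node is ever rebooted.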
  • Posts: 1,000

    That looks right as well. Could you show the commands and output up to the point of kubeadm init, please?

  • Here is the most recent command history:

    root@master:~# history
    1 apt-get update && apt-get upgrade -y
    2 apt-get install -y vim
    3 vim
    4 apt-get install -y docker.io
    5 pwd
    6 ls
    7 vim /etc/apt/source.list.d/kubernetes.list
    8 vim /etc/apt/sources.list.d/kubernetes.list
    9 curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
    10 pwd
    11 ls
    12 apt-get update
    13 apt-get install -y kubeadm=1.18.1-00 kubelet=1.18.1-00 kubectl=1.18.1-00
    14 apt-mark hold kubelet kubeadm kubectl
    15 wget https://docs.projectcalico.org/manifests/calico.yaml
    16 less
    17 less calico.yaml
    18 ip addr show
    19 vim /etc/hosts
    20 vim kubeadm-config.yaml
    21 kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    22 less calico.yaml
    23 kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    24 vi kubeadm-init.out
    25 docker ps -a | grep kube | grep -v pause
    26 systemctl status kubelet
    27 journalctl -xeu kubelet
    28 vim /etc/hosts
    29 apt-get update
    30 less calico.yaml
    31 ip addr show
    32 vim /etc/hosts
    33 vim kubeadm-config.yaml
    34 kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    35 wget https://docs.projectcalico.org/manifests/calico.yaml
    36 less calico.yaml
    37 kubeadm init --config=kubeadm-config.yaml --upload-certs | tee kubeadm-init.out
    38 docker ps
    39 docker -v
    40 sudo systemctl status docker
    41 history > history.out
    42 history
    root@master:~#
  • I fixed the problem by running kubeadm reset.

