
lab 3.x

I'm just getting started with the labs and I've hit a bit of trouble right off the bat. I'm not sure which direction to explore for a possible solution.

I'm installing k8s using kubeadm on AWS. I have my own VPC, which is accessible from the internet (the problem might be something in the network setup), and inside it I have 2 Ubuntu EC2 instances: a master and a worker.
The security group for each instance has the inbound rules as described here:
https://kubernetes.io/docs/setup/independent/install-kubeadm/

I was able to complete lab 3.1 almost to the letter. The only issue I saw was with these commands:
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

I kept getting an error saying sudo: unable to resolve host ip-10-0-..

By this point the master was in Ready state and all pods (including calico) were running, so I pushed forward.
In lab 3.2 I was able to bootstrap the worker, but once it joined the master I had one calico pod in an error state. Everything else matched the lab description, so I pushed forward again.
I stopped at 3.3 because the nginx pod is stuck in ContainerCreating. Describing the pod gives back this:

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned default/nginx-64f497f8fd-d9pth to ip-10-0-1-111
Warning FailedCreatePodSandBox 10s kubelet, ip-10-0-1-111 Failed create pod sandbox: rpc error: code = Unknown desc = [failed to set up sandbox container "8196208e2cf244509e49b6fedc7952042a79197763a1dc751b96a8ce17e4a313" network for pod "nginx-64f497f8fd-d9pth": NetworkPlugin cni failed to set up pod "nginx-64f497f8fd-d9pth_default" network: Unable to retreive ReadyFlag from Backend: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
, failed to clean up sandbox container "8196208e2cf244509e49b6fedc7952042a79197763a1dc751b96a8ce17e4a313" network for pod "nginx-64f497f8fd-d9pth": NetworkPlugin cni failed to teardown pod "nginx-64f497f8fd-d9pth_default" network: Unable to retreive ReadyFlag from Backend: client: etcd cluster is unavailable or misconfigured; error #0: client: endpoint http://10.96.232.136:6666 exceeded header timeout
]
Normal SandboxChanged 9s kubelet, ip-10-0-1-111 Pod sandbox changed, it will be killed and re-created.

The problem seems obvious? I get something similar from the failing calico pod, as in it's unhappy because of etcd. But installing and/or configuring etcd was not in the labs as far as I can tell, so what am I missing?

Please advise.

Regards,
Naim

Comments

  • serewicz

    Hello Naim,
    I have not seen this error when working with kubeadm, but I have seen sudo errors on nodes where the current hostname is not in the /etc/hosts file. Did you update the hostname?

    If the .kube/config file does not have the proper server IP and port listed, the kubectl command won't know where to send its API requests.

    Regards,
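
    To make the .kube/config point concrete, here is a minimal sketch for printing the API server address a kubeconfig file points at. The helper name kubeconfig_servers is made up for illustration, and it assumes the usual layout where each cluster entry has a "server:" line.

```shell
#!/usr/bin/env bash
# kubeconfig_servers [FILE]: print every "server:" value found in FILE.
# Hypothetical helper for illustration; FILE defaults to $HOME/.kube/config.
kubeconfig_servers() {
  # A line such as "    server: https://10.0.1.158:6443" splits into
  # two fields; print the second one.
  awk '$1 == "server:" { print $2 }' "${1:-$HOME/.kube/config}"
}
```

    Running kubeconfig_servers with no arguments on the master should print something like https://<master-ip>:6443. If kubectl is already working, kubectl config view reports the same information.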

  • I've seen very small issues cause big problems, so let's explore that. My master host seems to be called ip-10-0-1-158.
    Currently my /etc/hosts looks like this:

    127.0.0.1 localhost

    # The following lines are desirable for IPv6 capable hosts

    ::1 ip6-localhost ip6-loopback
    fe00::0 ip6-localnet
    ff00::0 ip6-mcastprefix
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
    ff02::3 ip6-allhosts

    Are you suggesting I add my hostname like so?
    ip-10-0-1-158 localhost
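
    For anyone wanting to check this on their own node, below is a rough sketch that tests whether a given name is listed on any non-comment line of a hosts file. The function name hosts_has_name is hypothetical, and the check only tells you whether the name is present; it does not change the file.

```shell
#!/usr/bin/env bash
# hosts_has_name FILE NAME: succeed if NAME appears as a hostname or alias
# on any line of FILE, ignoring comments. Hypothetical helper for illustration.
hosts_has_name() {
  local file="$1" name="$2"
  awk -v n="$name" '
    { sub(/#.*/, "") }                                    # drop comments
    { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }  # field 1 is the IP
    END { exit (found ? 0 : 1) }
  ' "$file"
}
```

    For example, hosts_has_name /etc/hosts "$(hostname)" && echo "listed" prints "listed" only when the node's current hostname has an entry.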

  • serewicz

    Well,
    That looks just like my /etc/hosts file as well, without the inclusion of the specific hostname. So it must be something else.

    You logged into the node and then used sudo -i to become root? Did that work prior to running kubeadm? If it did, but not after you exit back to a non-root user, that would be quite strange.

    What IP address did you use when you ran kubeadm init? Perhaps there is a conflict between Calico and the local node?

    Regards,

  • Hi Naim,
    I see a timeout on port 6666, which is not included in the ports section at "Installing kubeadm". Since SGs act as firewalls, can you try opening your SG to all traffic? Not a best practice, I know, but for the purpose of completing these labs it may help.
    Regards,
    -Chris
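
    A quick way to confirm whether a port is actually reachable before and after changing the security groups is a plain TCP connection test. The sketch below uses bash's /dev/tcp redirection together with the coreutils timeout command; port_open is a made-up helper name, and the 3 second default timeout is an arbitrary choice.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [SECONDS]: succeed if a TCP connection to HOST:PORT
# can be opened within SECONDS (default 3). Hypothetical helper for
# illustration; relies on bash's /dev/tcp feature and coreutils timeout.
port_open() {
  local host="$1" port="$2" secs="${3:-3}"
  # Inside the child bash, $0 is the host and $1 is the port.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

    Run from the worker node, port_open 10.96.232.136 6666 && echo reachable would be expected to print nothing while the port is blocked, and "reachable" once traffic on 6666 is allowed between the two security groups. The address is the endpoint from the error message above, so substitute whatever your own events show.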

  • NaimSalameh
    edited September 2018

    Incredible.. Chris, it was the port thing!! As soon as I opened it on both the master SG and the worker SG, the nginx pod was up and running.
    Thank you so much. I don't know how I didn't think of that myself; I was more focused on the etcd thing as it struck me as more significant.

    I'm still new here but this can be marked as resolved

  • Glad to hear it got resolved and it works now!
    -Chris
