Issue with worker node on Lab 3.2 Step 30 (worker pull from registry)

Hi,

I have a problem in Step 30 of Lab 3.2 :(

Context: I have two EC2 instances with TCP port 5000 opened in their security group.

worker node (cannot pull from local registry):

ubuntu@ip-172-31-16-147:~/LFD259/SOLUTIONS/s_02$ cat /etc/docker/daemon.json
{ "insecure-registries":["10.111.241.52:5000"] }

ubuntu@ip-172-31-16-147:~/LFD259/SOLUTIONS/s_02$ k get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    24h
nginx        ClusterIP   10.107.231.29   <none>        443/TCP    61m
registry     ClusterIP   10.111.241.52   <none>        5000/TCP   61m

ubuntu@ip-172-31-16-147:~/LFD259/SOLUTIONS/s_02$ docker pull 10.111.241.52:5000/simpleapp
Using default tag: latest
Error response from daemon: Get http://10.111.241.52:5000/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

ubuntu@ip-172-31-16-147:~/LFD259/SOLUTIONS/s_02$ curl http://10.111.241.52:5000/v2/
^C

master node (works fine):

ubuntu@ip-172-31-17-134:~/LFD259/SOLUTIONS/s_02$ k get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    24h
nginx        ClusterIP   10.107.231.29   <none>        443/TCP    58m
registry     ClusterIP   10.111.241.52   <none>        5000/TCP   58m

ubuntu@ip-172-31-17-134:~/LFD259/SOLUTIONS/s_02$ bat /etc/docker/daemon.json
{ "insecure-registries":["10.111.241.52:5000"] }

ubuntu@ip-172-31-17-134:~/LFD259/SOLUTIONS/s_02$ curl http://10.111.241.52:5000/v2/
{}

ubuntu@ip-172-31-17-134:~/LFD259/SOLUTIONS/s_02$ sudo docker pull 10.111.241.52:5000/tagtest
Using default tag: latest
latest: Pulling from tagtest
Digest: sha256:134c7fe821b9d359490cd009ce7ca322453f4f2d018623f849e580a89a685e5d
Status: Image is up to date for 10.111.241.52:5000/tagtest:latest

Any ideas? Thank you!

Comments

  • Typically, docker commands run as root. Also, I am not sure what the ownership of the daemon.json file is; normally it should be root.
    I see an interesting setup there with kubectl running on both nodes.
    As far as curl goes on the worker, there may still be an SG issue.
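
    A quick way to check that ownership (a sketch, assuming the default file location):
    ls -l /etc/docker/daemon.json
    # typically: -rw-r--r-- 1 root root ... /etc/docker/daemon.json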

    Regards,
    -Chris

  • rcougil (edited December 2019)

    I've added my user to the docker group, so there's no need for sudo, I guess. If I run docker info I can see that the daemon.json config has been picked up after restarting the docker service on both the worker and master nodes.
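
    For reference, a sketch of those steps (assuming Ubuntu with systemd; the group change needs a re-login to take effect):
    sudo usermod -aG docker $USER                 # add the current user to the docker group
    sudo systemctl restart docker                 # reload /etc/docker/daemon.json
    docker info | grep -A2 "Insecure Registries"  # confirm the registry address was picked up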

    Yeah, I've installed kubectl on the worker node too, to troubleshoot, but the problem was there before :|

    curl does not work from the worker to the master using that inner Kubernetes cluster IP (10.111.241.52), but ping works with the master node IP (the private IP of the EC2 instances within the same VPC). I've opened all TCP ports in the SG with the same result. So... I'm not really confident this is an SG issue.

    Some colleagues suggested that maybe the type of the registry service should be NodePort instead of ClusterIP, in order to publish the address and port to the worker node. I've actually changed it to NodePort and it worked, but I am not sure that was the right approach, since the lab is pretty clear about using the inner Kubernetes IP from the worker node (type ClusterIP).
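
    For reference, a minimal sketch of that change (service name taken from the outputs above):
    kubectl patch svc registry -p '{"spec":{"type":"NodePort"}}'
    kubectl get svc registry    # shows the high port (30000-32767) now mapped alongside 5000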

    Any other ideas? Do you need more info from my side?
    Many thanks for the help

    Regards,
    Rubén

  • serewicz

    Hello,

    I see from your context that you mention opening TCP port 5000 in your security group. Could you please try the lab again, this time with the firewall fully removed as suggested in the lab setup guide and video, and see if the issue persists?

    Kind regards,

  • Hi @rcougil,
    Is your SG overall restrictive with individual rules to allow traffic to specific ports? An SG configured to allow all ingress traffic may be the solution in this case. For an extra level of security, a custom VPC may be suitable for the all-open SG, with the Kubernetes cluster nodes deployed in the custom VPC.
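
    As a hypothetical AWS CLI sketch (placeholder group ID), one common pattern is to allow all traffic between instances that share the security group:
    aws ec2 authorize-security-group-ingress \
        --group-id sg-0123456789abcdef0 \
        --protocol all \
        --source-group sg-0123456789abcdef0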

    Regards,
    -Chris

  • serewicz

    Hello again,
    Please reference the docker-compose.yaml file in the lab. You will note that there is an nginx registry running on port 443. If you add this port to your network security settings, does the pull command work? Also, when you run the sudo docker pull command, does including -vvv give any more detail?

  • Hello, I had a similar problem; maybe my fix could also solve your case.
    As the lab uses the Calico network plugin, TCP port 179 (used by Calico's BGP peering) should be open in every node's iptables.

    A good way to verify whether this is your case is via
    sudo calicoctl node status

    You can get calicoctl from
    curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.11.1/calicoctl
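
    The download is a bare binary, so it will likely need the execute bit (and optionally a move onto your PATH):
    chmod +x calicoctl
    sudo mv calicoctl /usr/local/bin/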

    If you see anything other than Established in the Info column, it means that the connection between your nodes is not fully established.

    In my case, TCP port 179 was not accepting incoming connections on the worker machine. I fixed it in that machine's iptables with
    sudo iptables -I INPUT 5 -i eth0 -p tcp --dport 179 -m state --state NEW,ESTABLISHED -j ACCEPT
    and everything started working properly, with calicoctl now showing Established.
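
    Note that a rule inserted this way does not survive a reboot; a sketch for persisting it (assuming Ubuntu and the iptables-persistent package):
    sudo apt-get install iptables-persistent
    sudo netfilter-persistent save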
