
Issue with worker node on Lab 3.2 Step 30 (worker pull from registry)

Hi,

I have a problem in Step 30 of Lab 3.2 :(

Context: I have two EC2 instances with TCP port 5000 open in their security group.

worker node (cannot pull from local registry):

user@worker:~/LFD259/SOLUTIONS/s_02$ cat /etc/docker/daemon.json
{ "insecure-registries":["10.111.241.52:5000"] }

user@worker:~/LFD259/SOLUTIONS/s_02$ k get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    24h
nginx        ClusterIP   10.107.231.29   <none>        443/TCP    61m
registry     ClusterIP   10.111.241.52   <none>        5000/TCP   61m

user@worker:~/LFD259/SOLUTIONS/s_02$ docker pull 10.111.241.52:5000/simpleapp
Using default tag: latest
Error response from daemon: Get http://10.111.241.52:5000/v2/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

user@worker:~/LFD259/SOLUTIONS/s_02$ curl http://10.111.241.52:5000/v2/
^C

master node (works fine):

user@master:~/LFD259/SOLUTIONS/s_02$ k get svc
NAME         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1       <none>        443/TCP    24h
nginx        ClusterIP   10.107.231.29   <none>        443/TCP    58m
registry     ClusterIP   10.111.241.52   <none>        5000/TCP   58m

user@master:~/LFD259/SOLUTIONS/s_02$ cat /etc/docker/daemon.json
{ "insecure-registries":["10.111.241.52:5000"] }

user@master:~/LFD259/SOLUTIONS/s_02$ curl http://10.111.241.52:5000/v2/
{}

user@master:~/LFD259/SOLUTIONS/s_02$ sudo docker pull 10.111.241.52:5000/tagtest
Using default tag: latest
latest: Pulling from tagtest
Digest: sha256:134c7fe821b9d359490cd009ce7ca322453f4f2d018623f849e580a89a685e5d
Status: Image is up to date for 10.111.241.52:5000/tagtest:latest

Any ideas? Thank you!
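
For anyone debugging the same symptom: a ClusterIP is a virtual IP that exists only as kube-proxy rules on each node, so one hedged first check is whether those rules are present on the worker, and whether the registry pod itself is reachable. A sketch; the Service IP comes from the output above, and <registry-pod-ip> is a placeholder for the pod IP shown by the second command:

sudo iptables-save | grep 10.111.241.52    # kube-proxy should have programmed rules for the Service IP
kubectl get pods -o wide                   # note the registry pod's IP and node
curl --connect-timeout 5 http://<registry-pod-ip>:5000/v2/

If the pod IP also times out from the worker, the problem is in the pod network rather than in the Service itself.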

Comments

  • Typically docker commands run as root. Also, I am not sure what the ownership of the daemon.json file is; normally it should be root (a quick check for both points is sketched below).
    I see an interesting setup there, with kubectl running on both nodes.
    As far as curl on the worker goes, there may still be an SG issue.

    Regards,
    -Chris
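
    A minimal sketch for checking the two points above, assuming a stock Docker install (the grep pattern matches the "Insecure Registries" section of docker info):
    ls -l /etc/docker/daemon.json                       # owner should normally be root:root
    sudo docker info | grep -A 3 "Insecure Registries"  # confirms the daemon picked up daemon.json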

  • rcougil Posts: 4
    edited December 2019

    I've added my user to the docker group, so there's no need for sudo, I guess. If I run docker info, I can see that the daemon.json config has been picked up after restarting the Docker service on both the worker and master nodes.

    Yeah, I've installed kubectl on the worker node too, to troubleshoot, but the problem was there before :|

    curl does not work from the worker to the master using that cluster-internal IP (10.111.241.52), but ping works with the master node's IP (the private IP of the EC2 instances, which are in the same VPC). I've opened all TCP ports in the SG with the same result. So... I'm not really confident this is an SG issue.

    Some colleagues suggested that the registry Service should perhaps be of type NodePort instead of ClusterIP, in order to publish the address and port to the worker node. I actually changed it to NodePort and it worked (see the sketch after this comment), but I'm not sure that was the right approach, since the lab is pretty clear about using the cluster-internal IP from the worker node (type ClusterIP).

    Any other ideas? Do you need more info from my side?
    Many thanks for the help

    Regards,
    Rubén
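
    For reference, that type change can be done with a one-line patch; this is a sketch assuming the Service is named registry, as in the kubectl output above:
    kubectl patch svc registry -p '{"spec":{"type":"NodePort"}}'
    kubectl get svc registry    # the PORT(S) column now shows 5000 mapped to a high port reachable on every node's IP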

  • serewicz Posts: 553

    Hello,

    I see from your context that you opened TCP port 5000 in your security group. Could you please try the lab again, this time with the firewall fully removed, as suggested in the lab setup guide and video (a quick host-level check is sketched below), and see if the issue persists?

    Kind regards,
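
    On Ubuntu, a minimal way to check and remove the host firewall (assuming ufw is the frontend in use; AWS security groups still apply on top of it):
    sudo ufw status     # "inactive" means no host firewall is filtering traffic
    sudo ufw disable    # lab environments only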

  • Hi @rcougil,
    Is your SG overall restrictive, with individual rules allowing traffic only to specific ports? An SG configured to allow all ingress traffic may be the solution in this case (a CLI sketch follows this comment). For an extra level of security, the all-open SG can be placed in a custom VPC, with the Kubernetes cluster nodes deployed in that custom VPC.

    Regards,
    -Chris
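
    A sketch of such a rule via the AWS CLI; the group ID and the VPC CIDR are placeholders, and this opens all TCP ports only within the VPC, not to the internet:
    aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 0-65535 --cidr 172.31.0.0/16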

  • serewicz Posts: 553

    Hello again,
    Please reference the docker-compose.yaml file in the lab. You will note that there is an nginx registry running on port 443. If you add this port to your network security settings, does the pull command work? Also, when you run the sudo docker pull command, if you include a -vvv, does the extra output show where the request fails?
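
    A quick connectivity sketch from the worker against both ports, using the Service IPs from the earlier kubectl output (-k is used on the assumption that the lab's nginx presents a self-signed certificate):
    curl -sv --connect-timeout 5 http://10.111.241.52:5000/v2/
    curl -skv --connect-timeout 5 https://10.107.231.29/v2/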

  • gfalasca Posts: 8

    Hello, I had a similar problem; maybe my fix can solve your case as well.
    As the lab uses the Calico network plugin, TCP port 179 (BGP) should be open in every node's iptables.

    A good way to verify whether this is your case is:
    sudo calicoctl node status

    You can get calicoctl from (see the note after this comment about making the binary executable):
    curl -O -L https://github.com/projectcalico/calicoctl/releases/download/v3.11.1/calicoctl

    If you see anything other than Established in the Info column, it means that the connection among your nodes is not fully established.

    In my case, TCP port 179 was not accepting incoming connections on the worker machine. I fixed it in that machine's iptables with
    sudo iptables -I INPUT 5 -i eth0 -p tcp --dport 179 -m state --state NEW,ESTABLISHED -j ACCEPT
    and everything started working properly, with calicoctl now showing Established.
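
    One note on the steps above: curl -O saves calicoctl without the execute bit, so it needs to be made executable first (installing to /usr/local/bin is a common convention, not a lab requirement), and the iptables rule is not persistent across reboots:
    chmod +x calicoctl
    sudo mv calicoctl /usr/local/bin/
    sudo calicoctl node status
    sudo netfilter-persistent save    # optional: persists the rule; requires the iptables-persistent package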
