Welcome to the Linux Foundation Forum!

Lab 3 now on GCP. Can't get nodes to curl nginx.

btanoue Posts: 59
edited April 2019 in LFS258 Class Forum

Master Node Output:
$ kubectl get svc nginx
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.101.100.68   <none>        80/TCP    9m32s
$ kubectl get ep nginx
NAME    ENDPOINTS        AGE
nginx   192.168.2.2:80   9m37s
$ kubectl describe pod nginx-7db75b8b78-j5dq7 | grep Node:
Node:   kubeworker/10.142.0.9

Worker Node:
$ sudo tcpdump -i tunl0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes

Master Node:
$ curl 10.101.100.86:80

This just hangs.

$ curl 192.168.2.2:80

Hangs as well.

Any suggestion for debug?

Comments

  • btanoue Posts: 59

    Nodes seem OK:

    NAME         STATUS   ROLES    AGE     VERSION
    kubemaster   Ready    master   4h26m   v1.13.1
    kubeworker   Ready    <none>   55m     v1.13.1

  • btanoue Posts: 59

    YAML file: first.yaml.
    I think I edited it correctly.

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
      generation: 1
      labels:
        app: nginx
      name: nginx
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: nginx
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: nginx
        spec:
          containers:
          - image: nginx
            imagePullPolicy: Always
            name: nginx
            _ ports:
            - containerPort: 80
              protocol: TCP _
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

  • Hi,

    Check the IP in your curl command and compare it with the ClusterIP; there is a typo in your curl to 10.101...

    However, that is not why your curls are hanging now. Similar curl issues have been reported in earlier discussions, and solutions were posted as well. Feel free to check earlier discussions before reporting an issue, because chances are it has already been reported and solved.

    Not being able to curl your endpoint (from the master node to a pod running on the worker node) indicates a networking issue between your nodes. It could be a firewall in your Ubuntu OS (check ufw, AppArmor, ...) or a firewall issue at the GCE level. Do you have a custom VPC network? Do you have an allow-all (all-open) firewall rule? Are your VMs inside the new VPC network?
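    As a rough sketch, the OS-level checks mentioned above could look like this on each node (assuming Ubuntu with ufw and AppArmor present; adjust as needed):

```shell
# Is the Ubuntu host firewall active? "Status: inactive" rules out ufw.
sudo ufw status verbose

# Scan for DROP/REJECT rules that could block node-to-node or pod traffic.
sudo iptables -L -n --line-numbers | head -40

# AppArmor status, in case a profile is confining the container runtime.
sudo aa-status | head -5
```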

    The nodes listing looks ok. I hope your pods are running as expected as well.

    I also hope your YAML file is properly formatted/indented, because as pasted above it would clearly fail to parse. There are also some underscores ("_") which should not be there. Check your file for accuracy.

    Regards,
    -Chris

  • btanoue Posts: 59

    For the YAML, the copy and paste didn't work well...

  • btanoue Posts: 59

    I didn't do anything special, just created a VM on GCE. I didn't touch firewalls or anything like that. Just following the directions... again.

    In case you missed it, in section 3.1 Overview there is an info box labeled "Very Important" that addresses the firewall for GCP. Please review it.

    Also, a day or two ago I provided detailed instructions for you to follow when you transitioned from VirtualBox to GCP. In case you missed those too, here they are again:

    When setting up your VMs in the cloud keep in mind the initial networking requirements - nodes need to talk to each other and talk to the internet. For this purpose, create a new custom VPC network (do not go with the predefined VPCs), assign to it a new custom firewall rule which allows all traffic (all protocols, all ports, from all sources, to all destinations) and provision your VMs inside this custom network.
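    As a rough sketch, those steps might look like the following gcloud commands (the network, rule, and instance names here are examples, not from the course; adjust zone, machine type, and image as needed):

```shell
# Create a custom VPC network (auto mode creates one subnet per region)
gcloud compute networks create lfs258-net --subnet-mode=auto

# Firewall rule allowing all protocols, all ports, from any source
gcloud compute firewall-rules create lfs258-allow-all \
  --network lfs258-net \
  --allow all \
  --source-ranges 0.0.0.0/0

# Provision the VMs inside the custom network
gcloud compute instances create kubemaster \
  --network lfs258-net \
  --zone us-east1-b \
  --machine-type n1-standard-2 \
  --image-family ubuntu-1804-lts \
  --image-project ubuntu-os-cloud
```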

    All these instructions are provided to guide you towards successful completion of the lab exercises. Please read them carefully before applying them. They are equally important for the initial environment setup and for the overall Kubernetes cluster behavior. A misconfigured environment leads to issues in Kubernetes, as you have already experienced, and in some cases those issues can be quite difficult to troubleshoot.

    Good luck!
    -Chris

  • btanoue Posts: 59

    The typo was there because I tried to run the command again to re-generate the output. I ran it with the 10.101 IP and the result was the same.

  • btanoue Posts: 59

    Oh, Let me check the yaml.

  • btanoue Posts: 59

    Ignore the formatting; I can't get it to paste right, but there weren't any underscores in the YAML. I double-checked.

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
      generation: 1
      labels:
        app: nginx
      name: nginx
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: nginx
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: nginx
        spec:
          containers:
          - image: nginx
            imagePullPolicy: Always
            name: nginx
            ports:
            - containerPort: 80
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

  • fcioanca Posts: 615

    You can use the Code block option in the ribbon to paste formatted code. Besides the YAML file, you need to ensure your environment and networking are set up properly from the start (follow the instructions in Lab 2.1, also repeated by Chris in a previous post in this thread).

  • btanoue Posts: 59

    OK, since I switched from VirtualBox to GCP, I did miss the networking change.

    I don't know how to create the rule in GCP. I'm really new to all of this, and the directions don't include a link on how to do that.

    "If using GCP you can add a rule to the project which allows all traffic to all ports."

  • btanoue Posts: 59

    OK. I tried to add a VPC, selected all the Kubernetes rules that were there, and named the network kubernetes-network.
    I restarted the VMs just in case.

    Now I can't connect to the cluster:

    kubectl get svc nginx
    The connection to the server localhost:8080 was refused - did you specify the right host or port?

  • btanoue Posts: 59

    Disregard the previous post. Wrong node... grrr.

  • btanoue Posts: 59

    OK, I still can't curl.
    I'd like to request that an admin contact me over WebEx so I can show you my GCP setup and Kubernetes setup. I don't really know what else I can do now. I've been stuck in Chapter 3 for a week.

  • coop Posts: 598

    I'm not a moderator on this forum, but I am afraid one-on-one live support and tutoring is not available at this price point.

    I think the moderators have been helping as much as they can, as fast as they can. This is not real-time support, and when posting 4 or 5 messages in less than an hour, you can't expect an immediate response. Hopefully, with moderator support, your problems will be solved.

    In my experience, most of the problems in the course forums I do moderate come from not reading the instructions carefully enough, or from cutting and pasting from PDFs, which can mangle special characters such as underscores. So please take a fresh look :wink:

  • serewicz Posts: 779

    We do have a video available with details of how to set up the GCE lab environment. Perhaps it would show where the setup is incorrect.

  • fcioanca Posts: 615

    Check the online resources; the video is available there. Page 1.8, Course Resources, has the instructions on how to access them (same location as the files you are using for the labs).

  • btanoue Posts: 59

    @serewicz said:
    We do have a video available with details of how to set up the GCE lab environment. Perhaps it would show where the setup is incorrect.

    That would be awesome! Where is it?

  • btanoue Posts: 59

    @fcioanca said:
    Check the online resources as the video is available there - page 1.8. Course Resources has the instructions on how to access them (same location as the files you are using for labs).

    Thanks.

  • btanoue Posts: 59

    So, a lot of the pain was self-inflicted. I learned a lot about how GCE works and how the VPC works.
    My original plan was to use VirtualBox for all the labs, but I shifted midway to GCE.

    When I did that, I forgot that I needed the VPC until @serewicz made the comment above, and then it all started to make sense. I forgot about the video since I was hyper-focused on VirtualBox at the time, and when I moved, I just forgot. I'm trying to take this between meetings and daily work, etc. Still my bad.

    I also learned that you can't move a VM in GCE between networks, even if you shut it down. The edit didn't seem to let me do it, so I had to recreate the VM to make sure it was on the right Kubernetes network.

    After that, everything started to work. I got the LoadBalancer up and traffic was flowing. It was pretty magnificent.

    I'm still a little confused right now about how Endpoints work. I understand the Service is what creates the IP, and it connects to an Endpoint, I think. I assume that as we go through the rest of the material and labs, I'll learn more about how it all connects.
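    For what it's worth, a quick way to see the Service/Endpoints relationship is to compare the Service's label selector with the matching pod IPs (these commands assume the nginx deployment from this thread):

```shell
# The Service's label selector decides which pods back it
kubectl get svc nginx -o jsonpath='{.spec.selector}'; echo

# The Endpoints object lists the matching pods' IP:port pairs
kubectl get ep nginx

# The pod IPs shown here should match the endpoint addresses above
kubectl get pods -l app=nginx -o wide
```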

    I'm so happy it is working. Thank you all for the help and push in the right direction.

  • I'm having the same issue. However, gcloud tells me that my nodes should be able to talk to each other:

    $ gcloud compute firewall-rules list --filter="network:kubernetes-the-hard-way"
    NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW                 DENY  DISABLED
    kubernetes-the-hard-way-allow-external  kubernetes-the-hard-way  INGRESS    1000      tcp:22,tcp:6443,icmp        False
    kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp                False
    

    The video shows how to do this using PuTTY and the browser, but I'm working in Linux and want to be able to build up and tear down nodes rapidly. Since I've clearly addressed the "Very Important" warning on the firewall in Google Cloud, is there anything else I ought to check?

    thanks!

  • serewicz Posts: 779

    Hello,

    I note that your rule does not say all traffic. I don't know what Google means on the backend when it reports tcp,udp,icmp as distinct from all traffic.

    If you reference the setup video on page 1.8, Course Resources, you will see that when I fully open the network, it shows everything open to the all-encompassing IP range 0.0.0.0/0. Perhaps your rules are not for all IPs? Perhaps the correct targets are not chosen.

    If you configure the network via the console, does it work correctly?

    Regards,

  • Hi @toholdaquill,

    The firewall rules you set up for "Kubernetes the hard way" may be too restrictive for this course. It seems those rules only allow external traffic to ports 22 and 6443. Is all other traffic being dropped by your rules? Kubernetes and all its add-ons use many more ports than those two. Following Tim's suggestion to open all ingress traffic, from all sources, to all ports and protocols, should resolve your issues.

    Regards,
    -Chris

  • toholdaquill Posts: 4
    edited October 16

    Thank you both for the quick reply, it's much appreciated.

    I've wiped and recreated everything from scratch this morning, and confirmed (as per above) that the internal firewall rules are allow all:

    NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW                 DENY  DISABLED
    kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp                False
    

    The commands I'm using to create the firewall rules are here; specifically:

    gcloud compute firewall-rules create kubernetes-the-hard-way-allow-internal \
      --allow tcp,udp,icmp \
      --network kubernetes-the-hard-way \
      --source-ranges 10.240.0.0/24,10.200.0.0/16
    

    Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction: the master (192.168.192.64) cannot ping the worker (192.168.43.0), or vice versa.

    What are some simple debugging tests that I can incorporate early and often in a test deployment like this in order to uncover this kind of networking issue before getting to the deployment stage?
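    One early check worth knowing about: Calico's default IPIP mode encapsulates pod traffic as IP-in-IP, which is its own IP protocol (number 4), so a firewall rule allowing only tcp,udp,icmp can still drop tunnel traffic. A rough post-join checklist might look like this (a sketch; the node name and tunl0 interface assume this lab's Calico setup):

```shell
# The Calico tunnel interface should exist with a pod-network address
ip addr show tunl0

# Nodes should be Ready, each with a pod CIDR assigned
kubectl get nodes -o wide
kubectl describe node kubeworker | grep -i podcidr

# CNI pods crash-looping usually point at blocked ports or protocols
kubectl -n kube-system get pods -o wide

# Confirm the VPC rule really covers all protocols, not just tcp,udp,icmp
gcloud compute firewall-rules list
```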

  • toholdaquill Posts: 4
    edited October 16

    @serewicz said:

    I note that your line does not say all traffic. I don't know the backend of what Google means when they report that as distinct from tcp,udp, icmp.

    According to the gcloud documentation, allow tcp,udp,icmp means allow all:

    --allow=PROTOCOL[:PORT[-PORT]],[…]
        A list of protocols and ports whose traffic will be allowed.
    
    [...]
    
        For port-based protocols - tcp, udp, and sctp - a list of destination ports or port ranges to which the rule applies may optionally be specified. If no port or port range is specified, the rule applies to all destination ports. 
    
  • toholdaquill Posts: 4
    edited October 16

    I have modified the firewall creation rules to the following:

    gcloud compute firewall-rules create kubernetes-the-hard-way-allow-all \
      --allow tcp,tcp,icmp \
      --network kubernetes-the-hard-way \
      --source-ranges 0.0.0.0/0
    

    This is the only firewall rule now, for all traffic in the VPC, and it produces this result in the gcloud browser:

    This appears to match exactly the target you mentioned above, @serewicz.

    However, after joining the first worker to the cluster, I am still unable to ping either way between master and worker on the tunl0 addresses.

    I had hoped that simply allowing all traffic for 0.0.0.0/0 here would solve the problem. I am surprised that the problem persists.

    Can you offer specific troubleshooting steps I can take as early in the lab as possible to ensure networking is properly configured? It is a bit frustrating to spend close to an hour on every testing cycle to see whether a change solved the problem.

  • serewicz Posts: 779

    Hello,

    To better help you troubleshoot, since there are quite a few points in this thread, could you please let me know the particular step you are having a problem with, what you type, and the error you are receiving?

    I will test the same step on my cluster and we can begin to compare and contrast to see what is causing the issue.

    Regards,

  • Hi @toholdaquill,

    The final firewall rule fails to allow UDP, which is critical for in-cluster DNS. All three protocols need to be allowed: tcp, udp, icmp.

    What are the properties of the "kubernetes-the-hard-way" network? How is that network set up?

    Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction. master (192.168.192.64) cannot ping worker (192.168.43.0) or vice versa.

    First there is mention of Node IPs in the 10.240.0.0/24 subnet (10.240.0.10 and 10.240.0.20), and then Node IP addresses of 192.168.x.y appear. Where in the lab exercise did this IP swap happen? Node IP addresses that overlap with the default Pod IP network managed by Calico (192.168.0.0/16) result in traffic routing and DNS issues in your Kubernetes cluster.
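    A quick way to check for this kind of overlap (a sketch; the grep pattern assumes a kubeadm-style control plane where the controller-manager carries the --cluster-cidr flag):

```shell
# Node addresses (ens4); these must NOT fall inside the pod network
kubectl get nodes -o wide

# The pod network the cluster was initialized with
kubectl cluster-info dump | grep -m1 -- --cluster-cidr

# tunl0 addresses come FROM the pod pool, so 192.168.x.y there is expected;
# it is only a problem when the node interfaces themselves use that range
ip -4 addr show
```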

    Regards,
    -Chris
