Lab 3 now on GCP. Can't get nodes to curl nginx.

btanoue
btanoue Posts: 59
edited April 2019 in LFS258 Class Forum

Master Node Output:
tanoue@kubemaster:~$ kubectl get svc nginx
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.101.100.68   <none>        80/TCP    9m32s
tanoue@kubemaster:~$ kubectl get ep nginx
NAME    ENDPOINTS        AGE
nginx   192.168.2.2:80   9m37s
tanoue@kubemaster:~$ kubectl describe pod nginx-7db75b8b78-j5dq7 |grep Node:
Node: kubeworker/10.142.0.9

Worker Node:
tanoue@kubeworker:~$ sudo tcpdump -i tunl0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes

Master Node:
tanoue@kubemaster:~$ curl 10.101.100.86:80

This just hangs.

tanoue@kubemaster:~$ curl 192.168.2.2:80

Hangs as well.

Any suggestion for debug?
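
(A few standard first checks for a hanging ClusterIP curl; the object names below are the ones from the output above:)

kubectl describe svc nginx                # does the Selector match the pod's labels?
kubectl get ep nginx                      # is there a ready endpoint behind the service?
kubectl get pods -n kube-system -o wide   # kube-proxy and calico pods Running on both nodes?
curl 192.168.2.2:80                       # from kubeworker itself: does the pod answer locally?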

Comments

  • btanoue
    btanoue Posts: 59

    Nodes seem OK:

    NAME         STATUS   ROLES    AGE     VERSION
    kubemaster   Ready    master   4h26m   v1.13.1
    kubeworker   Ready    <none>   55m     v1.13.1

  • btanoue
    btanoue Posts: 59

    Yaml file: first.yaml.
    I think I edited it OK.

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
    annotations:
    deployment.kubernetes.io/revision: "1"
    generation: 1
    labels:
    app: nginx
    name: nginx
    namespace: default
    spec:
    progressDeadlineSeconds: 600
    replicas: 1
    revisionHistoryLimit: 10
    selector:
    matchLabels:
    app: nginx
    strategy:
    rollingUpdate:
    maxSurge: 25%
    maxUnavailable: 25%
    type: RollingUpdate
    template:
    metadata:
    creationTimestamp: null
    labels:
    app: nginx
    spec:
    containers:
    - image: nginx
    imagePullPolicy: Always
    name: nginx
    _ ports:
    - containerPort: 80
    protocol: TCP _
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    dnsPolicy: ClusterFirst
    restartPolicy: Always
    schedulerName: default-scheduler
    securityContext: {}
    terminationGracePeriodSeconds: 30

  • chrispokorni
    chrispokorni Posts: 2,346

    Hi,

    Check the IP in your curl command and compare it with the ClusterIP; there is a typo in your curl to 10.101...

    However, that is not why your curls are hanging now. Similar curl issues have been reported and resolved in earlier discussions. Feel free to search earlier discussions before reporting an issue, because chances are it has already been reported and solved.

    Not being able to curl your endpoint (from the master node to a pod running on the worker node) indicates a networking issue between your nodes. It could be a firewall in your Ubuntu OS (check ufw, apparmor, ...) or a firewall issue at the GCE level. Do you have a custom VPC network? Do you have an allow-all (all-open) firewall rule? Are your VMs inside the new VPC network?
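
    A couple of quick ways to check (assuming the standard gcloud CLI; the rule and network names are whatever you created):

    sudo ufw status                      # on each node: is a host firewall active?
    gcloud compute firewall-rules list   # which rules exist, and on which network?
    gcloud compute networks list         # custom VPC or the default one?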

    The nodes listing looks ok. I hope your pods are running as expected as well.

    I also hope your YAML file is properly formatted/indented, because the way I see it above it would clearly fail. There are also some underscores ("_") which should not be there. Check your file for accuracy.

    Regards,
    -Chris

  • btanoue
    btanoue Posts: 59

    For the Yaml, the copy and paste didn't work well....

  • btanoue
    btanoue Posts: 59

    I didn't do anything special, just created a VM on GCE. I didn't touch firewalls or anything like that. Just following the directions... again.

  • chrispokorni
    chrispokorni Posts: 2,346

    In case you missed it, in section 3.1 Overview, an info box labeled "Very Important" addresses the firewall for GCP. Please review it.

    Also, a day or two ago, I provided detailed instructions for you to follow when transitioning from VirtualBox to GCP. In case you missed those too, here they are again:

    When setting up your VMs in the cloud keep in mind the initial networking requirements - nodes need to talk to each other and talk to the internet. For this purpose, create a new custom VPC network (do not go with the predefined VPCs), assign to it a new custom firewall rule which allows all traffic (all protocols, all ports, from all sources, to all destinations) and provision your VMs inside this custom network.
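
    A minimal sketch of those steps with the gcloud CLI (the network name, region, and IP range are illustrative; the Cloud Console achieves the same):

    gcloud compute networks create lfs258-net --subnet-mode=custom
    gcloud compute networks subnets create lfs258-subnet \
      --network=lfs258-net --region=us-central1 --range=10.2.0.0/16
    gcloud compute firewall-rules create lfs258-allow-all \
      --network=lfs258-net --direction=INGRESS --action=ALLOW \
      --rules=all --source-ranges=0.0.0.0/0

    Then create the VMs with --network=lfs258-net --subnet=lfs258-subnet so they land inside the custom VPC.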

    All these instructions are provided to guide you toward successful completion of the lab exercises. Please read them carefully before applying them. They are equally important for the initial environment setup and for the overall Kubernetes cluster behavior. A misconfigured environment leads to issues in Kubernetes, as you have already experienced, and in some cases they may be quite difficult to troubleshoot.

    Good luck!
    -Chris

  • btanoue
    btanoue Posts: 59

    The typo was there because I tried to run the command again and re-generate the output. I ran it with the 10.10 IP and it was the same.

  • btanoue
    btanoue Posts: 59

    Oh, Let me check the yaml.

  • btanoue
    btanoue Posts: 59

    The earlier paste mangled the formatting, but there weren't any underscores in the yaml. I double checked. Here it is again:

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
      generation: 1
      labels:
        app: nginx
      name: nginx
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: nginx
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: nginx
        spec:
          containers:
          - image: nginx
            imagePullPolicy: Always
            name: nginx
            ports:
            - containerPort: 80
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

  • fcioanca
    fcioanca Posts: 2,149

    You can use the Code block option in the ribbon to paste formatted code. Besides the yaml file, you need to ensure your environment and networking are set up properly from the start (follow the instructions in Lab 2.1, also repeated by Chris in a previous post in this thread).

  • btanoue
    btanoue Posts: 59

    OK, since I switched from VB to GCP, I did miss the networking change.

    I don't know how to create the rule in GCP. I'm really new to all of this, and the directions don't even have a link on how to do that.

    "If using GCP you can add a rule to the project which allows all traffic to all ports."

  • btanoue
    btanoue Posts: 59

    OK. I tried to add a VPC, selected all the kubernetes rules that were there, and called the network kubernetes-network.
    I restarted the VMs just in case.

    Now I can't connect to the cluster:

    kubectl get svc nginx
    The connection to the server localhost:8080 was refused - did you specify the right host or port?
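
    (That error typically means kubectl found no kubeconfig on the machine it ran on, so it fell back to localhost:8080. On the master, something like this shows which config is in use:)

    kubectl config view       # empty clusters/users lists mean no kubeconfig was loaded
    echo $KUBECONFIG          # is an explicit config path exported?
    ls -l $HOME/.kube/config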

  • btanoue
    btanoue Posts: 59

    Disregard the previous message. Wrong node...grrr.

  • btanoue
    btanoue Posts: 59

    OK, I still can't curl.
    I'd like to request that an admin contact me over WebEx so I can show you my GCP setup and Kubernetes setup. I don't really know what else I can do now. I've been stuck in Chapter 3 for a week.

  • coop
    coop Posts: 916

    I'm not a moderator on this forum, but I'm afraid one-on-one live support and tutoring is not available at this price point.

    I think the moderators have been helping as much as they can, as fast as they can. This is not real-time support, and when you post 4 or 5 messages in less than an hour, you can't expect an immediate response. Hopefully, with moderator support, your problems will be solved.

    In my experience, most of the problems in the course forums I moderate come from not reading the instructions carefully enough, or from cutting and pasting from PDFs, which can butcher characters such as underscores and other special characters. So please take a fresh look :wink:

  • serewicz
    serewicz Posts: 1,000

    We do have a video available with details of how to set up the GCE lab environment. Perhaps it would show where the setup is incorrect.

  • fcioanca
    fcioanca Posts: 2,149

    Check the online resources, as the video is available there. Page 1.8, Course Resources, has the instructions on how to access them (same location as the files you are using for the labs).

  • btanoue
    btanoue Posts: 59

    @serewicz said:
    We do have a video available with details of how to set up the GCE lab environment. Perhaps it would show where the setup is incorrect.

    That would be awesome! Where is it?

  • btanoue
    btanoue Posts: 59

    @fcioanca said:
    Check the online resources as the video is available there - page 1.8. Course Resources has the instructions on how to access them (same location as the files you are using for labs).

    Thanks.

  • btanoue
    btanoue Posts: 59

    So, a lot of the pain was self-inflicted. I learned a lot about how GCE works and how the VPC works.
    Since my original plan was to use VB for all the labs, I shifted midway to GCE.

    When I did that, I forgot that I needed the VPC until @serewicz made the comment above, and then it all started to make sense. I forgot about the video since I was hyper-focused on VB at that time, and when I moved, I just forgot. I'm trying to take this between meetings and daily work, etc. Still my bad.

    I also learned that you can't move a VM in GCE between networks, even if you shut it down. The edit didn't seem to let me do it, so I had to recreate the VM to make sure it was on the right Kubernetes network, since I couldn't figure out how to move it.

    After that, everything started to work. I got the LoadBalancer up and traffic was flowing. It was pretty magnificent.

    I'm still a little confused about how the Endpoints work. I understand the service is what creates the IP, and I think it connects to an EP. I assume as we go through the rest of the material and labs I'll learn more about how it all connects.
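
    (The chain is visible directly with the objects from this lab: the service's label selector picks the pods, and the ready pod IP:port pairs become the Endpoints the ClusterIP forwards to.)

    kubectl describe svc nginx              # Selector, ClusterIP, and Endpoints together
    kubectl get ep nginx                    # the pod IP:port pairs behind the service
    kubectl get pods -l app=nginx -o wide   # the pods those endpoint IPs belong to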

    I'm so happy it is working. Thank you all for the help and push in the right direction.

  • toholdaquill
    toholdaquill Posts: 5

    I'm having the same issue. However, gcloud tells me that my nodes should be able to talk to each other:

    $ gcloud compute firewall-rules list --filter="network:kubernetes-the-hard-way"
    NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW                 DENY  DISABLED
    kubernetes-the-hard-way-allow-external  kubernetes-the-hard-way  INGRESS    1000      tcp:22,tcp:6443,icmp        False
    kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp                False
    

    The video shows how to do this using PuTTY and the browser, but I'm working in Linux and want to be able to build up and tear down nodes rapidly. Since I've clearly addressed the Very Important warning about the firewall in Google Cloud, is there anything else I ought to check?

    thanks!

  • serewicz
    serewicz Posts: 1,000

    Hello,

    I note that your rule does not say all traffic. I don't know what Google means on the backend when they report tcp, udp, icmp as distinct from all traffic.

    If you reference the setup video on page 1.8, Course Resources, you will see that when I fully open the network, it shows all traffic open to the all-encompassing IP range of 0.0.0.0/0. Perhaps your rules are not for all IPs? Perhaps the correct targets are not chosen.

    If you configure the network via the console does it work correctly?

    Regards,

  • chrispokorni
    chrispokorni Posts: 2,346

    Hi @toholdaquill,

    The firewall rules you set up for "Kubernetes the Hard Way" may be too restrictive for this course. It seems those rules only allow external traffic to ports 22 and 6443. Is all other traffic being dropped by your rules? Kubernetes and all its addons use many more ports than those two. Following Tim's suggestion to open all ingress traffic, from all sources, to all ports and protocols should resolve your issues.

    Regards,
    -Chris

  • toholdaquill
    toholdaquill Posts: 5
    edited October 2020

    Thank you both for the quick reply, it's much appreciated.

    I've wiped and recreated everything from scratch this morning, and confirmed (as per above) that the internal firewall rules are allow all:

    NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW                 DENY  DISABLED
    kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp                False
    

    The commands I'm using to create the firewall rules are here; specifically:

    gcloud compute firewall-rules create kubernetes-the-hard-way-allow-internal \
      --allow tcp,udp,icmp \
      --network kubernetes-the-hard-way \
      --source-ranges 10.240.0.0/24,10.200.0.0/16
    

    Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction. master (192.168.192.64) cannot ping worker (192.168.43.0) or vice versa.

    What are some simple debugging tests that I can incorporate early and often in a test deployment like this in order to uncover this kind of networking issue before getting to the deployment stage?
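
    (One possible early checklist, assuming Calico and this lab's layout; the <...> placeholders stand for the other node's actual addresses. Running it right after the worker joins avoids the hour-long cycle:)

    kubectl get nodes -o wide                 # both nodes Ready, with the expected IPs
    kubectl get pods -n kube-system -o wide   # calico-node and coredns pods Running
    ping -c2 <other-node-ens4-IP>             # node-to-node reachability
    ping -c2 <other-node-tunl0-IP>            # pod-network reachability over the IPIP tunnel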

  • toholdaquill
    toholdaquill Posts: 5
    edited October 2020

    @serewicz said:

    I note that your line does not say all traffic. I don't know the backend of what Google means when they report that as distinct from tcp,udp, icmp.

    according to the gcloud documentation, allow tcp,udp,icmp with no ports specified means all destination ports are allowed:

    --allow=PROTOCOL[:PORT[-PORT]],[…]
        A list of protocols and ports whose traffic will be allowed.
    
    [...]
    
        For port-based protocols - tcp, udp, and sctp - a list of destination ports or port ranges to which the rule applies may optionally be specified. If no port or port range is specified, the rule applies to all destination ports. 
    
  • toholdaquill
    toholdaquill Posts: 5
    edited October 2020

    I have modified the firewall creation rules to the following:

    gcloud compute firewall-rules create kubernetes-the-hard-way-allow-all \
      --allow tcp,tcp,icmp \
      --network kubernetes-the-hard-way \
      --source-ranges 0.0.0.0/0
    

    This is the only firewall rule now, for all traffic in the VPC, and it produces the following result in the Cloud Console (screenshot omitted):

    This appears to match exactly the target you mentioned above, @serewicz.

    However, after joining the first worker to the cluster, I am still unable to ping either way between master and worker on the tunl0 addresses.

    I had hoped that simply allowing all traffic for 0.0.0.0/0 here would solve the problem. I am surprised that the problem persists.

    Can you offer specific troubleshooting steps I can take as early in the lab as possible to ensure networking is properly configured? It is a bit frustrating to spend close to an hour on every testing cycle to see whether a change solved the problem.

  • serewicz
    serewicz Posts: 1,000

    Hello,

    To better help you troubleshoot, as there are quite a few points on this thread, could you please let me know the particular step you are having a problem with, what you typed, and the error you received.

    I will test the same step on my cluster and we can begin to compare and contrast to see what is causing the issue.

    Regards,

  • chrispokorni
    chrispokorni Posts: 2,346

    Hi @toholdaquill,

    The final firewall rule fails to allow UDP, which is critical for in-cluster DNS. All three protocols need to be allowed: tcp, udp, icmp.
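
    Something like this (same rule and network names as yours) should cover it:

    gcloud compute firewall-rules create kubernetes-the-hard-way-allow-all \
      --allow tcp,udp,icmp \
      --network kubernetes-the-hard-way \
      --source-ranges 0.0.0.0/0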

    What are the properties of the "kubernetes-the-hard-way" network? How is that network set up?

    Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction. master (192.168.192.64) cannot ping worker (192.168.43.0) or vice versa.

    First there is a mention of Node IPs 10.240.0.10 and 10.240.0.20, then the Node IP addresses change to 192.168.x.y. Where in the lab exercise was this IP swap found? Using Node IP addresses that overlap with the default Pod IP network managed by Calico (192.168.0.0/16) results in traffic routing and DNS issues in your Kubernetes cluster.
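
    A quick way to spot such an overlap (assuming the lab's calico.yaml is in your working directory):

    grep -A1 CALICO_IPV4POOL_CIDR calico.yaml   # the Pod network Calico will use; 192.168.0.0/16 by default
    ip addr show tunl0                          # this node's tunnel address on the Pod network
    ip addr show ens4                           # the node IP; it must not fall inside the Pod CIDR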

    Regards,
    -Chris
