Welcome to the Linux Foundation Forum!

Lab 3 now on GCP. Can't get nodes to curl nginx.

btanoue Posts: 59
edited April 2019 in LFS258 Class Forum

Master Node Output:
$ kubectl get svc nginx
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.101.100.68   <none>        80/TCP    9m32s
$ kubectl get ep nginx
NAME    ENDPOINTS        AGE
nginx   192.168.2.2:80   9m37s
$ kubectl describe pod nginx-7db75b8b78-j5dq7 | grep Node:
Node:   kubeworker/10.142.0.9

Worker Node:
$ sudo tcpdump -i tunl0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes

Master Node:
$ curl 10.101.100.86:80

This just hangs.

$ curl 192.168.2.2:80

Hangs as well.

Any suggestion for debug?

Comments

  • btanoue Posts: 59

    Nodes seem OK:

    NAME         STATUS   ROLES    AGE     VERSION
    kubemaster   Ready    master   4h26m   v1.13.1
    kubeworker   Ready    <none>   55m     v1.13.1

  • btanoue Posts: 59

    YAML file: first.yaml.
    I think I edited it correctly.

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
      generation: 1
      labels:
        app: nginx
      name: nginx
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: nginx
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: nginx
        spec:
          containers:
          - image: nginx
            imagePullPolicy: Always
            name: nginx
            _ ports:
            - containerPort: 80
              protocol: TCP _
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

  • Hi,

    Check the IP in your curl command and compare it with the ClusterIP; there is a typo in your curl to 10.101...

    However, that is not why your curls are hanging now. Similar curl issues have been reported in earlier discussions, and solutions were posted as well. Feel free to check earlier discussions before reporting an issue, because chances are it has already been reported and solved.

    Not being able to curl your endpoint (from the master node to a pod running on the worker node) indicates a networking issue between your nodes. It could be a firewall in your Ubuntu OS (check ufw, AppArmor, ...) or a firewall issue at the GCE level. Do you have a custom VPC network? Do you have an allow-all (all-open) firewall rule? Are your VMs inside the new VPC network?
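    As a rough sketch, the OS-level checks mentioned above could look like this on each node (assuming Ubuntu with ufw and AppArmor present; adjust as needed):

```shell
# Is the Ubuntu host firewall active? "Status: inactive" rules out ufw.
sudo ufw status verbose

# Scan for DROP/REJECT rules that could block node-to-node or pod traffic.
sudo iptables -L -n --line-numbers | head -40

# AppArmor status, in case a profile is confining the container runtime.
sudo aa-status | head -5
```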

    The nodes listing looks ok. I hope your pods are running as expected as well.

    I also hope your YAML file is properly formatted/indented, because as pasted above it would clearly fail to parse. There are also some underscores ("_") which should not be there. Check your file for accuracy.

    Regards,
    -Chris

  • btanoue Posts: 59

    For the YAML, the copy and paste didn't work well...

  • btanoue Posts: 59

    I didn't do anything special, just created a VM on GCE. I didn't touch firewalls or anything like that. Just following the directions... again.

    In case you missed it, in section 3.1 Overview there is an info box labeled "Very Important" that addresses the firewall for GCP. Please review it.

    Also, a day or two ago I provided detailed instructions for you to follow when you transitioned from VirtualBox to GCP. In case you missed those too, here they are again:

    When setting up your VMs in the cloud keep in mind the initial networking requirements - nodes need to talk to each other and talk to the internet. For this purpose, create a new custom VPC network (do not go with the predefined VPCs), assign to it a new custom firewall rule which allows all traffic (all protocols, all ports, from all sources, to all destinations) and provision your VMs inside this custom network.
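    As a rough sketch, those steps might look like the following gcloud commands (the network, rule, and instance names here are examples, not from the course; adjust zone, machine type, and image as needed):

```shell
# Create a custom VPC network (auto mode creates one subnet per region)
gcloud compute networks create lfs258-net --subnet-mode=auto

# Firewall rule allowing all protocols, all ports, from any source
gcloud compute firewall-rules create lfs258-allow-all \
  --network lfs258-net \
  --allow all \
  --source-ranges 0.0.0.0/0

# Provision the VMs inside the custom network
gcloud compute instances create kubemaster \
  --network lfs258-net \
  --zone us-east1-b \
  --machine-type n1-standard-2 \
  --image-family ubuntu-1804-lts \
  --image-project ubuntu-os-cloud
```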

    All these instructions are provided to guide you towards successful completion of the lab exercises. Please read them carefully before applying them. They are equally important for the initial environment setup and for the overall Kubernetes cluster behavior. A misconfigured environment leads to issues in Kubernetes, as you have already experienced, and in some cases those issues can be quite difficult to troubleshoot.

    Good luck!
    -Chris

  • btanoue Posts: 59

    The typo was there because I tried to run the command again to re-generate the output. I ran it with the 10.101 IP and the result was the same.

  • btanoue Posts: 59

    Oh, Let me check the yaml.

  • btanoue Posts: 59

    Ignore the formatting; I can't get it to paste right, but there weren't any underscores in the YAML. I double-checked.

    apiVersion: extensions/v1beta1
    kind: Deployment
    metadata:
      annotations:
        deployment.kubernetes.io/revision: "1"
      generation: 1
      labels:
        app: nginx
      name: nginx
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          app: nginx
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          creationTimestamp: null
          labels:
            app: nginx
        spec:
          containers:
          - image: nginx
            imagePullPolicy: Always
            name: nginx
            ports:
            - containerPort: 80
              protocol: TCP
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30

  • fcioanca Posts: 615

    You can use the Code block option in the ribbon to paste formatted code. Besides the YAML file, you need to ensure your environment and networking are set up properly from the start (follow the instructions in Lab 2.1, also repeated by Chris in a previous post in this thread).

  • btanoue Posts: 59

    OK, since I switched from VirtualBox to GCP, I did miss the networking change.

    I don't know how to create the rule in GCP. I'm really new to all of this, and the directions don't include a link on how to do that.

    "If using GCP you can add a rule to the project which allows all traffic to all ports."

  • btanoue Posts: 59

    OK. I tried to add a VPC, selected all the Kubernetes rules that were there, and named the network kubernetes-network.
    I restarted the VMs just in case.

    Now I can't connect to the cluster:

    kubectl get svc nginx
    The connection to the server localhost:8080 was refused - did you specify the right host or port?

  • btanoue Posts: 59

    Disregard the previous post. Wrong node... grrr.

  • btanoue Posts: 59

    OK, I still can't curl.
    I'd like to request that an admin contact me over WebEx so I can show you my GCP setup and Kubernetes setup. I don't really know what else I can do now. I've been stuck in Chapter 3 for a week.

  • coop Posts: 598

    I'm not a moderator on this forum, but I am afraid one-on-one live support and tutoring is not available at this price point.

    I think the moderators have been helping as much as they can, as fast as they can. This is not real-time support, and when posting 4 or 5 messages in less than an hour, you can't expect an immediate response. Hopefully, with moderator support, your problems will be solved.

    In my experience, most of the problems in the course forums I do moderate come from not reading the instructions carefully enough, or from cutting and pasting from PDFs, which can mangle special characters such as underscores. So please take a fresh look :wink:

  • serewicz Posts: 779

    We do have a video available with details of how to set up the GCE lab environment. Perhaps it would show where the setup is incorrect.

  • fcioanca Posts: 615

    Check the online resources; the video is available there. Page 1.8, Course Resources, has the instructions on how to access them (same location as the files you are using for the labs).

  • btanoue Posts: 59

    @serewicz said:
    We do have a video available with details of how to set up the GCE lab environment. Perhaps it would show where the setup is incorrect.

    That would be awesome! Where is it?

  • btanoue Posts: 59

    @fcioanca said:
    Check the online resources as the video is available there - page 1.8. Course Resources has the instructions on how to access them (same location as the files you are using for labs).

    Thanks.

  • btanoue Posts: 59

    So, a lot of the pain was self-inflicted. I learned a lot about how GCE works and how the VPC works.
    My original plan was to use VirtualBox for all the labs, but I shifted midway to GCE.

    When I did that, I forgot that I needed the VPC until @serewicz made the comment above, and then it all started to make sense. I forgot about the video since I was hyper-focused on VirtualBox at the time, and when I moved, I just forgot. I'm trying to take this between meetings and daily work, etc. Still my bad.

    I also learned that you can't move a VM in GCE between networks, even if you shut it down. The edit didn't seem to let me do it, so I had to recreate the VM to make sure it was on the right Kubernetes network.

    After that, everything started to work. I got the LoadBalancer up and traffic was flowing. It was pretty magnificent.

    I'm still a little confused right now about how Endpoints work. I understand the Service is what creates the IP, and it connects to an Endpoint, I think. I assume that as we go through the rest of the material and labs, I'll learn more about how it all connects.
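    For what it's worth, a quick way to see the Service/Endpoints relationship is to compare the Service's label selector with the matching pod IPs (these commands assume the nginx deployment from this thread):

```shell
# The Service's label selector decides which pods back it
kubectl get svc nginx -o jsonpath='{.spec.selector}'; echo

# The Endpoints object lists the matching pods' IP:port pairs
kubectl get ep nginx

# The pod IPs shown here should match the endpoint addresses above
kubectl get pods -l app=nginx -o wide
```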

    I'm so happy it is working. Thank you all for the help and push in the right direction.

  • I'm having the same issue. However, gcloud tells me that my nodes should be able to talk to each other:

    $ gcloud compute firewall-rules list --filter="network:kubernetes-the-hard-way"
    NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW                 DENY  DISABLED
    kubernetes-the-hard-way-allow-external  kubernetes-the-hard-way  INGRESS    1000      tcp:22,tcp:6443,icmp        False
    kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp                False
    

    The video shows how to do this using PuTTY and the browser, but I'm working in Linux and want to be able to build up and tear down nodes rapidly. Since I've clearly addressed the "Very Important" warning on the firewall in Google Cloud, is there anything else I ought to check?

    thanks!

  • serewicz Posts: 779

    Hello,

    I note that your rule does not say all traffic. I don't know what Google means on the backend when it reports tcp,udp,icmp as distinct from all traffic.

    If you reference the setup video on page 1.8, Course Resources, you will see that when I fully open the network, it shows everything open to the all-encompassing IP range 0.0.0.0/0. Perhaps your rules are not for all IPs? Perhaps the correct targets are not chosen.

    If you configure the network via the console, does it work correctly?

    Regards,

  • Hi @toholdaquill,

    The firewall rules you set up for "Kubernetes the hard way" may be too restrictive for this course. It seems those rules only allow external traffic to ports 22 and 6443. Is all other traffic being dropped by your rules? Kubernetes and all its add-ons use many more ports than those two. Following Tim's suggestion to open all ingress traffic, from all sources, to all ports and protocols, should resolve your issues.

    Regards,
    -Chris

  • toholdaquill Posts: 4
    edited October 16

    Thank you both for the quick reply, it's much appreciated.

    I've wiped and recreated everything from scratch this morning, and confirmed (as per above) that the internal firewall rules are allow all:

    NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW                 DENY  DISABLED
    kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp                False
    

    The commands I'm using to create the firewall rules are here; specifically:

    gcloud compute firewall-rules create kubernetes-the-hard-way-allow-internal \
      --allow tcp,udp,icmp \
      --network kubernetes-the-hard-way \
      --source-ranges 10.240.0.0/24,10.200.0.0/16
    

    Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction: the master (192.168.192.64) cannot ping the worker (192.168.43.0), or vice versa.

    What are some simple debugging tests that I can incorporate early and often in a test deployment like this in order to uncover this kind of networking issue before getting to the deployment stage?
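    One early check worth knowing about: Calico's default IPIP mode encapsulates pod traffic as IP-in-IP, which is its own IP protocol (number 4), so a firewall rule allowing only tcp,udp,icmp can still drop tunnel traffic. A rough post-join checklist might look like this (a sketch; the node name and tunl0 interface assume this lab's Calico setup):

```shell
# The Calico tunnel interface should exist with a pod-network address
ip addr show tunl0

# Nodes should be Ready, each with a pod CIDR assigned
kubectl get nodes -o wide
kubectl describe node kubeworker | grep -i podcidr

# CNI pods crash-looping usually point at blocked ports or protocols
kubectl -n kube-system get pods -o wide

# Confirm the VPC rule really covers all protocols, not just tcp,udp,icmp
gcloud compute firewall-rules list
```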

  • toholdaquill Posts: 4
    edited October 16

    @serewicz said:

    I note that your line does not say all traffic. I don't know the backend of what Google means when they report that as distinct from tcp,udp, icmp.

    According to the gcloud documentation, allow tcp,udp,icmp means allow all:

    --allow=PROTOCOL[:PORT[-PORT]],[…]
        A list of protocols and ports whose traffic will be allowed.
    
    [...]
    
        For port-based protocols - tcp, udp, and sctp - a list of destination ports or port ranges to which the rule applies may optionally be specified. If no port or port range is specified, the rule applies to all destination ports. 
    
  • toholdaquill Posts: 4
    edited October 16

    I have modified the firewall creation rules to the following:

    gcloud compute firewall-rules create kubernetes-the-hard-way-allow-all \
      --allow tcp,tcp,icmp \
      --network kubernetes-the-hard-way \
      --source-ranges 0.0.0.0/0
    

    This is the only firewall rule now, for all traffic in the VPC, and it produces this result in the gcloud browser:

    This appears to match exactly the target you mentioned above, @serewicz.

    However, after joining the first worker to the cluster, I am still unable to ping either way between master and worker on the tunl0 addresses.

    I had hoped that simply allowing all traffic for 0.0.0.0/0 here would solve the problem. I am surprised that the problem persists.

    Can you offer specific troubleshooting steps I can take as early in the lab as possible to ensure networking is properly configured? It is a bit frustrating to spend close to an hour on every testing cycle to see whether a change solved the problem.

  • serewicz Posts: 779

    Hello,

    To better help you troubleshoot, since there are quite a few points in this thread, could you please let me know the particular step you are having a problem with, what you type, and the error you are receiving?

    I will test the same step on my cluster and we can begin to compare and contrast to see what is causing the issue.

    Regards,

  • Hi @toholdaquill,

    The final firewall rule fails to allow UDP, which is critical for in-cluster DNS. All three protocols need to be allowed: tcp, udp, icmp.

    What are the properties of the "kubernetes-the-hard-way" network? How is that network set up?

    Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction. master (192.168.192.64) cannot ping worker (192.168.43.0) or vice versa.

    First there is mention of Node IPs in the 10.240.0.0/24 subnet (10.240.0.10 and 10.240.0.20), and then Node IP addresses of 192.168.x.y appear. Where in the lab exercise did this IP swap happen? Node IP addresses that overlap with the default Pod IP network managed by Calico (192.168.0.0/16) result in traffic routing and DNS issues in your Kubernetes cluster.
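    A quick way to check for this kind of overlap (a sketch; the grep pattern assumes a kubeadm-style control plane where the controller-manager carries the --cluster-cidr flag):

```shell
# Node addresses (ens4); these must NOT fall inside the pod network
kubectl get nodes -o wide

# The pod network the cluster was initialized with
kubectl cluster-info dump | grep -m1 -- --cluster-cidr

# tunl0 addresses come FROM the pod pool, so 192.168.x.y there is expected;
# it is only a problem when the node interfaces themselves use that range
ip -4 addr show
```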

    Regards,
    -Chris
