Lab 3 now on GCP. Can't get nodes to curl nginx.
Master Node Output:
tanoue@kubemaster:~$ kubectl get svc nginx
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.101.100.68   <none>        80/TCP    9m32s
tanoue@kubemaster:~$ kubectl get ep nginx
NAME    ENDPOINTS        AGE
nginx   192.168.2.2:80   9m37s
tanoue@kubemaster:~$ kubectl describe pod nginx-7db75b8b78-j5dq7 |grep Node:
Node: kubeworker/10.142.0.9
Worker Node:
tanoue@kubeworker:~$ sudo tcpdump -i tunl0
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
Master Node
tanoue@kubemaster:~$ curl 10.101.100.86:80
This just hangs.
tanoue@kubemaster:~$ curl 192.168.2.2:80
Hangs as well.
Any suggestions for debugging?
Comments
-
Nodes seem OK:
NAME         STATUS   ROLES    AGE     VERSION
kubemaster   Ready    master   4h26m   v1.13.1
kubeworker   Ready    <none>   55m     v1.13.10
-
YAML file: first.yaml.
I think I edited it OK.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
generation: 1
labels:
app: nginx
name: nginx
namespace: default
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: nginx
spec:
containers:
- image: nginx
imagePullPolicy: Always
name: nginx
_ ports:
- containerPort: 80
protocol: TCP _
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 300
-
Hi,
Check the IP in your curl command and compare it with the ClusterIP; there is a typo in your curl to 10.101...
However, that is not why your curls are hanging now. Similar curl issues have been reported in earlier discussions, and solutions were posted as well. Feel free to check earlier discussions before reporting an issue, because chances are it has already been reported and solved.
Not being able to curl your endpoint (from the master node to a pod running on the worker node) indicates that there is a networking issue between your nodes. It could be a firewall in your Ubuntu OS (check ufw, AppArmor, ...) or a networking firewall issue at the GCE level. Do you have a custom VPC network? Do you have an allow-all (all-open) firewall rule? Are your VMs inside the new VPC network?
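For example, a quick first pass might look like this (a minimal sketch assuming Ubuntu nodes and the gcloud CLI; YOUR_NETWORK_NAME is a placeholder, not from the course):
# On each node: confirm no OS-level firewall is filtering traffic
sudo ufw status                 # for this lab it should report "inactive"
sudo iptables -L -n | head -20  # kube-proxy/Calico chains will also appear here
# From a machine with gcloud configured: review the rules on your VPC
gcloud compute firewall-rules list --filter="network:YOUR_NETWORK_NAME"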
The node listing looks OK. I hope your pods are running as expected as well.
I also hope your YAML file is properly formatted/indented, because the way I see it above, it would clearly fail. There are also some underscores ("_") which should not be there. Check your file for accuracy.
Regards,
-Chris
-
For the Yaml, the copy and paste didn't work well....
-
I didn't do anything special but create a VM on GCE. I didn't touch firewalls or anything like that. Just following the directions... again.
-
In case you missed it, in section 3.1 Overview, an info box labeled "Very Important" addresses the firewall for GCP. Please review it.
Also, a day or two ago, I provided detailed instructions for you to follow when transitioning from VirtualBox to GCP. In case you missed those too, here they are again:
When setting up your VMs in the cloud, keep in mind the initial networking requirements: nodes need to talk to each other and to the internet. For this purpose, create a new custom VPC network (do not go with the predefined VPCs), assign to it a new custom firewall rule which allows all traffic (all protocols, all ports, from all sources, to all destinations), and provision your VMs inside this custom network.
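A hedged sketch of those steps with the gcloud CLI (the network and rule names are examples, not the ones from the course materials):
# Create a custom VPC network with automatic subnets
gcloud compute networks create k8s-lab-net --subnet-mode=auto
# Allow all protocols and ports, from any source, into that network
gcloud compute firewall-rules create k8s-lab-allow-all \
  --network=k8s-lab-net \
  --allow=all \
  --source-ranges=0.0.0.0/0
Then provision both VMs inside k8s-lab-net so the allow-all rule covers their traffic.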
All these instructions are provided to guide you towards successful completion of the lab exercises. Please read them carefully before applying them. They are equally important for the initial environment setup and for the overall Kubernetes cluster behavior. A misconfigured environment leads to issues in Kubernetes, as you have already experienced, and in some cases these may be quite difficult to troubleshoot.
Good luck!
-Chris
-
The typo was there because I tried to run the command again and regenerate the output. I ran it with the 10.10 IP and it was the same.
-
Oh, let me check the YAML.
-
Ignore the formatting, I can't get it to paste right, but there weren't any underscores in the YAML. I double-checked.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  generation: 1
  labels:
    app: nginx
  name: nginx
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
        ports:
        - containerPort: 80
          protocol: TCP
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 300
-
You can use the Code block option in the ribbon to paste formatted code. Besides the YAML file, you need to ensure your environment and networking are set up properly from the start (follow the instructions in Lab 2.1, also repeated by Chris in a previous post in this thread).
-
OK, since I switched from VB to GCP, I did miss the networking change.
I don't know how to create the rule in GCP. I'm really new to all of this, and the directions don't even have a link on how to do that.
"If using GCP you can add a rule to the project which allows all traffic to all ports."
-
OK. I tried to add a VPC and clicked all the Kubernetes rules that were there, and called the network kubernetes-network.
I restarted the VMs just in case. Now I can't connect to the cluster:
kubectl get svc nginx
The connection to the server localhost:8080 was refused - did you specify the right host or port?
-
Disregard the previous post. Wrong node... grrr.
-
OK, I still can't curl.
I'd like to request an admin to contact me over WebEx so I can show you my GCP setup and Kubernetes setup. I don't really know what else I can do now. I've been stuck in Chapter 3 for a week.
-
I'm not a moderator on this forum, but I am afraid one-on-one live support and tutoring is not available at this kind of price point.
I think the moderators have been helping as much as they can, as fast as they can. This is not real-time support, and when you post 4 or 5 messages in less than an hour, you can't expect an immediate response. Hopefully, with moderator support, your problems will be solved.
In my experience, most of the problems in the course forums I moderate come from not reading the instructions carefully enough, or from cutting and pasting from PDFs, which can butcher characters such as underscores and other special characters. So please take a fresh look.
-
We do have a video available with details of how to set up the GCE lab environment. Perhaps it would show where the setup is incorrect.
-
Check the online resources, as the video is available there. Page 1.8, Course Resources, has the instructions on how to access them (same location as the files you are using for the labs).
-
So, a lot of the pain was self-inflicted. I learned a lot about how GCE works and how the VPC works.
Since my original plan was to use VB for all the labs, I shifted midway to GCE. When I did that, I forgot that I needed the VPC until @serewicz made the comment above, and then it all started to make sense. I forgot about the video since I was hyper-focused on VB at the time, and when I moved, I just forgot. I'm trying to take this between meetings and daily work, etc. Still my bad.
I also learned that you can't move a VM in GCE between networks, even if you shut it down. The edit didn't seem to let me do it, so I had to recreate a VM to make sure it was on the right Kubernetes network, since I couldn't figure out how to move it.
After that, everything started to work. I got the LoadBalancer up and traffic was flowing. It was pretty magnificent.
I'm still a little confused about how the Endpoints work. I understand the Service is what creates the IP, and it connects to an EP, I think. I assume as we go through the rest of the material and labs I'll learn more about how it all connects.
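As far as I can tell, the Service's label selector matches Pods, and Kubernetes writes the matching Pod IPs into an Endpoints object with the same name; kube-proxy then forwards ClusterIP traffic to those addresses. Something like this shows the chain (using the app=nginx label from the Deployment above):
kubectl get svc nginx -o jsonpath='{.spec.selector}'   # the Service's selector (app: nginx)
kubectl get pods -l app=nginx -o wide                  # the Pod IPs behind the Service
kubectl get ep nginx                                   # the same IPs listed as Endpoints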
I'm so happy it is working. Thank you all for the help and push in the right direction.
-
I'm having the same issue. However, gcloud tells me that my nodes should be able to talk to each other:
$ gcloud compute firewall-rules list --filter="network:kubernetes-the-hard-way"
NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW                 DENY  DISABLED
kubernetes-the-hard-way-allow-external  kubernetes-the-hard-way  INGRESS    1000      tcp:22,tcp:6443,icmp        False
kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp                False
The video shows how to do this using PuTTY and the browser, but I'm working in Linux and want to be able to build up and tear down nodes rapidly. Since I've clearly addressed the Very Important warning on the firewall in Google Cloud, is there anything else I ought to check?
thanks!
-
Hello,
I note that your rule does not say all traffic. I don't know, on the backend, what Google means when it reports that as distinct from tcp, udp, icmp.
If you reference the setup video on page 1.8, Course Resources, you will see that when I fully open the network, it shows everything open to the all-encompassing IP range 0.0.0.0/0. Perhaps your rules are not for all IPs? Perhaps the correct targets are not chosen.
If you configure the network via the console, does it work correctly?
Regards,
-
Hi @toholdaquill,
The firewall rules you set up for "Kubernetes the Hard Way" may be too restrictive for this course. It seems those rules only allow traffic to ports 22 and 6443. Is all other traffic being dropped by your rules? Kubernetes and all its add-ons use many more ports than those two. Following Tim's suggestion to open all ingress traffic, from all sources, to all ports and all protocols, should resolve your issues.
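For instance, a hedged one-liner to broaden the external rule in place (rule name taken from your listing above; the protocol keyword all should cover every protocol):
gcloud compute firewall-rules update kubernetes-the-hard-way-allow-external \
  --allow=all --source-ranges=0.0.0.0/0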
Regards,
-Chris
-
Thank you both for the quick replies; it's much appreciated.
I've wiped and recreated everything from scratch this morning, and confirmed (as per the above) that the internal firewall rules allow all:
NAME                                    NETWORK                  DIRECTION  PRIORITY  ALLOW         DENY  DISABLED
kubernetes-the-hard-way-allow-internal  kubernetes-the-hard-way  INGRESS    1000      tcp,udp,icmp        False
The commands I'm using to create the firewall rules are here, specifically:
gcloud compute firewall-rules create kubernetes-the-hard-way-allow-internal \
  --allow tcp,udp,icmp \
  --network kubernetes-the-hard-way \
  --source-ranges 10.240.0.0/24,10.200.0.0/16
Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction: master (192.168.192.64) cannot ping worker (192.168.43.0), or vice versa.
What are some simple debugging tests that I can incorporate early and often in a test deployment like this in order to uncover this kind of networking issue before getting to the deployment stage?
-
@serewicz said:
I note that your line does not say all traffic. I don't know the backend of what Google means when they report that as distinct from tcp,udp, icmp.
According to the gcloud documentation, allow tcp,udp,icmp means allow all:
--allow=PROTOCOL[:PORT[-PORT]],[…] A list of protocols and ports whose traffic will be allowed. [...] For port-based protocols - tcp, udp, and sctp - a list of destination ports or port ranges to which the rule applies may optionally be specified. If no port or port range is specified, the rule applies to all destination ports.
-
I have modified the firewall creation rules to the following:
gcloud compute firewall-rules create kubernetes-the-hard-way-allow-all \
  --allow tcp,tcp,icmp \
  --network kubernetes-the-hard-way \
  --source-ranges 0.0.0.0/0
This is the only firewall rule now, for all traffic in the VPC, and it produces this result in the gcloud browser:
This appears to match exactly the target you mentioned above, @serewicz.
However, after joining the first worker to the cluster, I am still unable to ping either way between master and worker on the tunl0 addresses.
I had hoped that simply allowing all traffic for 0.0.0.0/0 here would solve the problem. I am surprised that the problem persists.
Can you offer specific troubleshooting steps I can take as early in the lab as possible to ensure networking is properly configured? It is a bit frustrating to spend close to an hour on every testing cycle to see whether a change solved the problem.
-
Hello,
To better help you troubleshoot, as there are quite a few points in this thread, could you please let me know the particular step you are having a problem with, what you type, and the error you are receiving?
I will test the same step on my cluster and we can begin to compare and contrast to see what is causing the issue.
Regards,
-
Hi @toholdaquill,
The final firewall rule fails to allow UDP, which is critical for in-cluster DNS. All three protocols need to be allowed: tcp, udp, icmp.
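In other words, the create command above needs its --allow list fixed; a corrected sketch, keeping the same rule and network names from your post:
gcloud compute firewall-rules create kubernetes-the-hard-way-allow-all \
  --allow tcp,udp,icmp \
  --network kubernetes-the-hard-way \
  --source-ranges 0.0.0.0/0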
What are the properties of the "kubernetes-the-hard-way" network? How is that network set up?
Stepping through the lab again carefully, I'm able to ping both ways between master and worker on the ens4 interface (in my case, 10.240.0.10 <-> 10.240.0.20). Everything works fine until Lab Section 3.1.7. The RFC 1918 addresses for tunl0 don't work in either direction. master (192.168.192.64) cannot ping worker (192.168.43.0) or vice versa.
First there is a mention of Node IPs in the 10.240.0.10 - 10.240.0.20 range, then the Node IP addresses change to 192.168.x.y. Where in the lab exercise was this IP swap found? Replacing Node IP addresses with IPs that overlap with the default Pod IP network managed by Calico (192.168.0.0/16) results in traffic routing and DNS issues in your Kubernetes cluster.
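A quick way to check for that kind of overlap (a hedged sketch; the grep target assumes a kubeadm-style control plane, and ens4 is the interface named earlier in this thread):
kubectl cluster-info dump | grep -m1 cluster-cidr   # the Pod network CIDR in use
ip -4 addr show ens4                                # the Node IP must fall outside that CIDR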
Regards,
-Chris