Lab 3.2 problem (2019-11-05 version)
The exercise covers the configuration of a local docker repository.
I am doing the lab in a bare-metal cluster on Ubuntu 18.04 configured via the k8sMaster and k8sSecond scripts.
Two nodes, one of which is an untainted master.
The problem occurs at step 21, where we are supposed to access the "registry" service via curl directly from the master node's shell.
In my case the connection hangs until the timeout is reached. I have also disabled ufw, so there shouldn't be any firewall problem as mentioned in the lab.
Since the "registry" service is of type ClusterIP, I was not surprised to be unable to access it directly from the shell, as I understood that ClusterIP services are accessible only from inside Kubernetes.
But from the description of step 21 it seems that we should be able to access the service IP:Port directly from the shell.
Am I doing anything wrong?
Could anyone confirm that, from the master machine's shell, a ClusterIP svc address and ports should be directly reachable?
Comments
-
Hi @gfalasca,
You are correct, you should be able to reach a ClusterIP with curl from either node of your cluster, just as described in the exercise.
Did you encounter any issues in the previous Lab exercise 2.1, possibly on step 10, where you were also directed to curl to a ClusterIP from the master node?
Regards,
-Chris
-
Hi Chris, thanks a lot.
Yes, I had the problem there too. I moved on, but it was already present at step 7 of Lab 2.3, where there is a curl to the basic pod's main container listening on port 80. I double-checked: AppArmor is down and SELinux is not installed. Ufw is also disabled.
Currently the status of the kube-system pods is ok, but I noticed a few things that may (or may not) have a correlation with the problem I face:
- After the installation, the calico-node pods were 0/1 ready. It seems the bird-ready readiness probe was failing, so I tried commenting it out at line 657 of calico.yaml. After that they started correctly, but sometimes one of them falls into an error state. This happened after hours of operation. Deleting and redeploying it seems to restore its readiness.
- The coredns pods also seem a bit unstable, although they shouldn't affect the curl to the service since I am using the ClusterIP address directly. Sometimes I find one of them in CrashLoopBackOff state:
coredns-5644d7b6d9-5r4x4 0/1 CrashLoopBackOff 20 2d
After a while the pod becomes apparently ready and running again.
kubernetes@dev2:~$ k get pod -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6bbf58546b-btk9d   1/1     Running   1          23h
calico-node-87vj7                          1/1     Running   4          18h
calico-node-n856s                          1/1     Running   0          103s
coredns-5644d7b6d9-5r4x4                   1/1     Running   19         2d
coredns-5644d7b6d9-78jng                   1/1     Running   0          104s
etcd-dev2                                  1/1     Running   12         2d
kube-apiserver-dev2                        1/1     Running   13         2d
kube-controller-manager-dev2               1/1     Running   12         2d
kube-proxy-skjqf                           1/1     Running   1          2d
kube-proxy-x62v9                           1/1     Running   16         2d
kube-scheduler-dev2                        1/1     Running   12         2d
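A listing like the one above can be filtered for pods with unusually many restarts. This is only a minimal sketch: the helper name `high_restarts` and the threshold of 10 are illustrative assumptions, and it assumes the default `kubectl get pod` column order (NAME READY STATUS RESTARTS AGE).

```shell
# Print "name restarts" for every pod whose RESTARTS value exceeds a threshold.
# Reads `kubectl get pod` output on stdin; the first line is treated as the header.
# high_restarts is a hypothetical helper name, not part of kubectl.
high_restarts() {
  threshold="${1:-5}"
  awk -v t="$threshold" 'NR > 1 && $4 + 0 > t + 0 { print $1, $4 }'
}

# Usage on a live cluster (assumes kubectl is configured for it):
#   kubectl get pod -n kube-system | high_restarts 10
```

Pods that show up here (control-plane pods included) are the ones whose logs and events are worth reading first.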
Any idea about other checks I can do?
Maybe the machine where the master is installed is too small. It's a physical server with only 2 GB of RAM, but there is only the k8s master on it.
-
You may have just answered your own question
It is safe to have 4 GB of memory for every CPU, but the exercises will run faster with 2 CPUs and 8 GB of memory, according to the Overview section of Lab exercise 2.1.
Commenting out a readiness probe is just like flu medicine: it only masks the symptoms, while the flu is still there.
Regards,
-Chris
-
Thanks a lot, Chris. Strangely enough, everything works as expected from the worker node: I can curl both the basic pod and the registry service without problems.
From the master, however, it doesn't work. I will try with the master on a bigger machine as you suggested, but it sounds odd that everything works from the worker node.
Thanks and regards
-
Hi @chrispokorni, the problem I am having seems very similar to the issue reported by @rcougil in "Issue with worker node on Lab 3.2 Step 30 (worker pull from registry)".
Before moving the master to a bigger machine I did some further analysis and noticed the following:
- The IP address + container port of a container seems to be reachable from the node where the pod is deployed (-o wide). From the other machine of my two-node cluster, curling that address:port hangs until the connection timeout is reached. Sniffing the packets reveals only SYN packets.
- The ClusterIP+port of a service which has one single endpoint (as with the basicservice and registry services of the lab) seems to be perfectly reachable, but only from the node where the pod is actually deployed.
@rcougil, maybe you could verify whether the ClusterIP service behaves the same way in your case?
To me it still seems to be a network related problem, as all the rest seems to work (scaling deployment, creation of resources,...).
In the next few days I will migrate the master to a bigger machine with at least 4 GB of RAM per core, as suggested by @chrispokorni, and will let you know if the issue disappears.
-
Hi @gfalasca, yes, same issue. I'm pretty sure the Lab instructions are wrong. A service of type ClusterIP cannot be reached from the Worker node. It would only be accessible from a Pod running on the Worker node, not from the node itself.
This K8s course is the crappiest e-learning course I've done in my entire life. It is extremely overpriced for its content (or the lack of it) and full of errata right to the end. Very disappointed. I encourage colleagues not to buy it under any circumstances.
-
Hello,
Thank you for your feedback. I was wondering if you used the setup videos when configuring your lab environment? I'd like to make sure the directions are clear and concise. Would you be so kind as to let me know what you are using for your lab environment, such as GCE, AWS, VirtualBox, etc.? Did you ensure that there are no firewalls for the nodes?
I have just run these steps again, and they worked. Would it help if I recorded the steps via screen capture so you could see the process? Perhaps that can help figure out why the steps are not working for you.
Kind regards,
-
Please be aware that infrastructure networking configuration plays a key role in the behavior of your Kubernetes cluster. Disabling firewall services at the node OS level may not be sufficient. As you are setting up your VM instances you have to ensure that infrastructure level firewalls are open to ingress traffic from all sources to all ports for all protocols.
If in the cloud, this requires one custom VPC network with one all-open ingress firewall rule (or all open Security Group on AWS).
If on a local hypervisor, enabling traffic from all sources is achieved by configuring the networking settings at the hypervisor level.
Without any firewalls in place (infra and OS level) a Service Cluster IP is accessible from any node in the cluster. Access to Pod IP addresses should also be available from any node in the cluster, regardless of the node where the Pod is running.
Regards,
-Chris
-
Hi everyone,
I completed Lab 3.3 a few days ago and did not have any problems using the ClusterIP for registry access from both the master and worker 'host'.
I am running on bare metal, 16 GB master and 8 GB worker (though I started with 4 GB each), running Ubuntu 18. These are Dell 9020M machines. I got 6 of them relatively cheap from the Dell Refurb website. I see they have another batch on sale right now; with a coupon code you can get about 40% off the listed price.
Anyway, in @gfalasca's post there are a large number of restarts, which I find concerning. In my case I ran the k8Master.sh setup 5 days ago, then had a power failure 2 days ago due to an ice storm.
I just restarted everything (swapoff -a; service kubelet start) on both nodes, and this is what I see from the master node (node2):
root@node2:~/lfd259/LFD259/SOLUTIONS# kc get pods -A -o wide
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE    NOMINATED NODE   READINESS GATES
default       nginx-595f85746d-p44hg                     1/1     Running   2          3d20h   192.168.166.152   node1   <none>           <none>
default       registry-cbc9b4779-jt5hl                   1/1     Running   2          3d20h   192.168.166.153   node1   <none>           <none>
kube-system   calico-kube-controllers-6bbf58546b-fh995   1/1     Running   2          5d19h   192.168.104.24    node2   <none>           <none>
kube-system   calico-node-6kksh                          1/1     Running   2          5d19h   10.99.10.3        node2   <none>           <none>
kube-system   calico-node-bjf77                          1/1     Running   2          5d19h   10.99.10.2        node1   <none>           <none>
kube-system   coredns-5644d7b6d9-2hbk7                   1/1     Running   2          5d19h   192.168.104.22    node2   <none>           <none>
kube-system   coredns-5644d7b6d9-zbndc                   1/1     Running   2          5d19h   192.168.104.19    node2   <none>           <none>
kube-system   etcd-node2                                 1/1     Running   2          5d19h   10.99.10.3        node2   <none>           <none>
kube-system   kube-apiserver-node2                       1/1     Running   2          5d19h   10.99.10.3        node2   <none>           <none>
kube-system   kube-controller-manager-node2              1/1     Running   2          5d19h   10.99.10.3        node2   <none>           <none>
kube-system   kube-proxy-9j6b2                           1/1     Running   2          5d19h   10.99.10.3        node2   <none>           <none>
kube-system   kube-proxy-rdpnb                           1/1     Running   2          5d19h   10.99.10.2        node1   <none>           <none>
kube-system   kube-scheduler-node2                       1/1     Running   2          5d19h   10.99.10.3        node2   <none>           <none>
In my case, the registry is running on the worker node (node1).
Here are the services:
root@node2:~/lfd259/LFD259/SOLUTIONS# kc get service
NAME         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.96.0.1        <none>        443/TCP    5d20h
nginx        ClusterIP   10.102.132.154   <none>        443/TCP    3d20h
registry     ClusterIP   10.99.112.183    <none>        5000/TCP   3d20h
From the master node I can push an image to the registry that's running on the worker node, using the cluster-ip:
root@node2:~/lfd259/LFD259/SOLUTIONS# docker push 10.99.112.183:5000/simpleapp
The push refers to repository [10.99.112.183:5000/simpleapp]
d0348f584525: Pushed
a98ea9b99554: Pushed
03a3dc679282: Pushed
35fc403d4c4c: Pushed
c1fbc35a2660: Pushed
f63773c65620: Pushed
e6d60910d056: Pushed
b52c1c103fae: Pushed
6f1c84e6ec59: Pushed
dd5242c2dc8a: Pushed
latest: digest: sha256:46a671056aabf5c4a4ed0dc77e7fad5c209529037c8737f6e465d64aa01d01e9 size: 2428
As part of the lab, 'simpleapp' has already been run on both the worker and master nodes. If I switch to the worker node (node1) and pull from the registry using the cluster IP, the pull works and docker says the image hasn't changed:
root@node1:~# docker pull 10.99.112.183:5000/simpleapp
Using default tag: latest
latest: Pulling from simpleapp
Digest: sha256:46a671056aabf5c4a4ed0dc77e7fad5c209529037c8737f6e465d64aa01d01e9
Status: Image is up to date for 10.99.112.183:5000/simpleapp:latest
The master node can reach the registry service running on the worker node, using the cluster-ip, because kube-proxy has configured iptables to route the connection over the calico network to the worker node.
Here, on the master node, I try to ping all the cluster-IP addresses listed above. All the pings fail:
root@node2:~/lfd259/LFD259/SOLUTIONS# ping 10.96.0.1
PING 10.96.0.1 (10.96.0.1) 56(84) bytes of data.
^C
--- 10.96.0.1 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1028ms
root@node2:~/lfd259/LFD259/SOLUTIONS# ping 10.102.132.154
PING 10.102.132.154 (10.102.132.154) 56(84) bytes of data.
^C
--- 10.102.132.154 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1011ms
root@node2:~/lfd259/LFD259/SOLUTIONS# ping 10.99.112.183
PING 10.99.112.183 (10.99.112.183) 56(84) bytes of data.
^C
--- 10.99.112.183 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms
However, a TCP connection to 10.99.112.183 port 5000 will work, due to the iptables setup on the master node (and worker node).
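To see the rules that make the TCP connection work, the node's NAT table can be searched for the ClusterIP. The following is a rough sketch and assumes kube-proxy is running in iptables mode, where a ClusterIP typically appears as a `-d <ip>/32` match; `clusterip_rules` is a hypothetical helper name. In that mode the address is not bound to any interface and only the service's protocol and port are matched, which would be consistent with the failed pings above.

```shell
# Filter iptables-save output (read from stdin) for rules whose destination
# is the given ClusterIP. clusterip_rules is a hypothetical helper name.
clusterip_rules() {
  grep -F -- "-d $1/32"
}

# Usage on a node (needs root; assumes kube-proxy in iptables mode):
#   sudo iptables-save -t nat | clusterip_rules 10.99.112.183
```

If nothing is printed on one node while the other node shows a rule, kube-proxy on the first node is a reasonable place to look next.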
@rcougil said that cluster-ip services are only reachable from within a pod and not from the node itself. But I've shown above it does work, and @chrispokorni also said it should work as long as no blocking is occurring at the infrastructure level.
This article kind of explains why it works, but the diagrams don't clearly show that iptables does its work at the node level:
https://medium.com/google-cloud/understanding-kubernetes-networking-services-f0cb48e4cc82
-
Hi everyone, I investigated further and came to a conclusion. Can anyone confirm the following?
Regarding the reachability of a ClusterIP, my understanding is identical to @rcougil's: in Kubernetes a ClusterIP shouldn't be reachable from outside the cluster, and the master or worker node shells should be considered outside the cluster.
In the lab installation scripts we installed the Calico network on top of Kubernetes (in my case Calico v3.9.4), and as explained by Projectcalico, "Calico v3.4 introduces the ability to advertise Kubernetes service cluster IP routes over BGP, making Kubernetes services accessible outside of the Kubernetes cluster without the need for a dedicated load balancer."
So my understanding is that in standard Kubernetes it shouldn't be possible to access a ClusterIP from the node machines or from outside the cluster, but Calico >3.4 provides this feature on its own.
Now, I don't know yet why the Calico installation in my bare-metal Ubuntu cluster doesn't work properly, but something is clearly not working on the Calico side, as the restarts of the calico pods (also noticed by @bkclements) show.
-
Hi @bkclements,
Great work! Your detailed post is much appreciated!
Regards,
-Chris
-
Hi @gfalasca,
You are correct about the reachability of a ClusterIP - it should not be reachable from outside the cluster.
However, nodes are "inside" the cluster, being Kubernetes API resources - therefore both master and worker node shells are inside the cluster and should reach any service inside the cluster.
Regards,
-Chris
-
Hi @eugeneng,
As mentioned earlier, that is not a normal/expected behavior of a Kubernetes cluster. Such behavior is consistent with network related issues between the Kubernetes cluster nodes - typically infrastructure network firewall rules or security groups. As confirmed in @bkclements' detailed post, the ClusterIP should be accessible from any node in the cluster.
Regards,
-Chris
-
Hi @eugeneng, all,
In the end I figured out what the problem was by digging into Calico. As I said, I am using two bare-metal machines with Ubuntu 18.04. The master is small in my case (only 2 GB of RAM), but that doesn't seem to be a problem for now.
In my case the two calico-node pods were starting but not becoming ready because Calico's bird readiness probe was failing.
I noticed that TCP port 179 (which must be open on every node for Calico to work properly) was open on the master node but not on the worker. Once I opened it on the worker, the calico-node pods reached the ready state and everything started working properly.
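For anyone who wants to check this on their own nodes, here is a minimal sketch of a TCP reachability test for port 179. It assumes bash and the coreutils `timeout` command are available; `tcp_reachable` and `WORKER_IP` are illustrative names, not part of the lab scripts.

```shell
# Return 0 if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Relies on bash's /dev/tcp pseudo-device; tcp_reachable is a hypothetical helper.
tcp_reachable() {
  timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Usage from the master, with WORKER_IP standing in for the worker's address:
#   tcp_reachable "$WORKER_IP" 179 && echo "port 179 reachable" || echo "port 179 blocked or closed"
```

Running it in both directions (master to worker and worker to master) shows whether only one side is filtered.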