GCE worker node connection issue
Hi,
I have set up a 2-node cluster. The worker node is in GCE. I created the instance in a new (non-default) VPC that allows all traffic.
I used the default subnets/regions to define the new VPC.
I could deploy my basic pod and my basic service, but for some reason I cannot reach the service from the other node, where the pod is not deployed.
k get svc -o wide
NAME           TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE    SELECTOR
basicservice   ClusterIP   10.109.212.68   <none>        80/TCP    131m   type=webserver
kubernetes     ClusterIP   10.96.0.1       <none>        443/TCP   42h    <none>
I disabled AppArmor and ufw.
Here are my iptables rules on the GCE node: iptables.txt
Any ideas why the service's cluster IP is not reachable, and how these IP ranges (10.109...) are defined/assigned?
Many thanks
Comments
So I went back to this issue and made a few more attempts to fix it.
First, I installed Ubuntu 18.04 instead of Ubuntu 19.04 on my 2 nodes.
- The master node is a fully managed VPS
- The worker node is a GCE cloud instance
So they are on completely different networks.
I also made sure that both nodes can talk to each other on all ports:
On the VPS node, using ufw:
sudo ufw allow from 34.89.192.175
On the GCE instance, using firewall rules: an allowAll rule.
After that, I redid the same steps: installing the master node, untainting it, installing the worker node...
None of this helped, since I got exactly the same issues in the same order:
- First, once the worker node joined the cluster, both Calico pods stopped being in the ready state.
After analyzing the Calico pod logs, I found out that the worker node was using the internal IP address defined in the Google VPC (instead of the external one). It looks like the Calico pod on the master node could not reach the worker node.
- So, I decided to add a NAT rule on the master so that all traffic to the internal IP address of the worker node is redirected to its actual external IP:
sudo iptables -t nat -I OUTPUT --dest 10.156.0.2/32 -j DNAT --to-dest 34.89.192.175
Listing the routes, I could see an extra line for 10.156.0.2 that was not there before:
~/k8s/lab_02$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         207.180.230.1   0.0.0.0         UG    0      0        0 eth0
172.17.0.0      0.0.0.0         255.255.0.0     U     0      0        0 docker0
192.168.15.0    10.156.0.2      255.255.255.192 UG    0      0        0 tunl0
192.168.181.128 0.0.0.0         255.255.255.255 UH    0      0        0 cali5759f80f63a
192.168.181.128 0.0.0.0         255.255.255.192 U     0      0        0 *
192.168.181.130 0.0.0.0         255.255.255.255 UH    0      0        0 calie464868f422
192.168.181.131 0.0.0.0         255.255.255.255 UH    0      0        0 cali5ea36435324
~/k8s/lab_02$ kubectl get pods -n kube-system
NAME                                       READY   STATUS    RESTARTS   AGE
calico-kube-controllers-6bbf58546b-2ltv2   1/1     Running   0          5h16m
calico-node-cmb2s                          1/1     Running   0          3h26m
calico-node-nh5dd                          1/1     Running   0          3h26m
coredns-5644d7b6d9-j2jjq                   1/1     Running   0          5h16m
coredns-5644d7b6d9-xtb62                   1/1     Running   0          5h16m
etcd-master                                1/1     Running   0          5h15m
kube-apiserver-master                      1/1     Running   0          5h15m
kube-controller-manager-master             1/1     Running   0          5h15m
kube-proxy-bkj6m                           1/1     Running   12         4h28m
kube-proxy-f46j7                           1/1     Running   0          5h16m
kube-scheduler-master                      1/1     Running   0          5h15m
- After that, the Calico pods stopped complaining and were back in the ready state.
- With the Calico problem solved, I moved on with the course steps and then ran into the same issue with the service.
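The tunl0 route in the routing table above is easier to read once the Genmask is turned into a prefix length: 255.255.255.192 is a /26, which, as far as I know, is Calico's default per-node IPAM block size. The pod IP that shows up in the captures further down, 192.168.15.3, falls inside 192.168.15.0/26, so the master sends that traffic through tunl0 to the gateway 10.156.0.2, the worker's internal VPC address; this is why the NAT rule for 10.156.0.2 mattered. A minimal pure-bash sketch of that check follows; the helper names (ip_to_int, mask_to_prefix, in_subnet) are hypothetical and IPv4 only.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading `route -n` entries (pure bash, IPv4 only).

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert a netmask such as 255.255.255.192 into a prefix length.
# Counts leading one-bits; assumes a contiguous mask.
mask_to_prefix() {
  local m bits=0
  m=$(ip_to_int "$1")
  while (( bits < 32 && (m & 0x80000000) )); do
    bits=$(( bits + 1 ))
    m=$(( (m << 1) & 0xFFFFFFFF ))
  done
  echo "$bits"
}

# Succeed (exit status 0) if IP lies inside NETWORK/PREFIX.
in_subnet() {
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "$2")
  prefix=$3
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}
```

For example, in_subnet 192.168.15.3 192.168.15.0 "$(mask_to_prefix 255.255.255.192)" succeeds, while the same check for the source address 192.168.181.129 seen in the master's capture fails, because that address belongs to the master's own 192.168.181.128/26 block.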
I intentionally restored the taint to prevent scheduling on the master node.
Curling the service from the master node (where the pod is not deployed) does not work.
It does work, however, when done from inside the worker pod.
I ran the pod and then the service:
~/k8s/lab_02$ kubectl get services
NAME            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
basic-service   ClusterIP   10.103.53.132   <none>        80/TCP    30m
kubernetes      ClusterIP   10.96.0.1       <none>        443/TCP   4h43m
I checked the iptables rules added by kube-proxy on each node:
~/k8s/lab_02$ sudo iptables-save | grep 10.103.53.132
-A KUBE-SERVICES ! -s 192.168.0.0/16 -d 10.103.53.132/32 -p tcp -m comment --comment "default/basic-service: cluster IP" -m tcp --dport 80 -j KUBE-MARK-MASQ
-A KUBE-SERVICES -d 10.103.53.132/32 -p tcp -m comment --comment "default/basic-service: cluster IP" -m tcp --dport 80 -j KUBE-SVC-BU6KMELWBNFQMJ6Y
During all this analysis, I found out that the Service IP range is passed to the API server as a startup parameter (--service-cluster-ip-range), and that Service IPs are virtual: they are translated to pod IPs by the iptables rules that kube-proxy writes.
See /etc/kubernetes/manifests/kube-apiserver.yaml.
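Two checks that may help here, as a rough sketch. First, whether a ClusterIP falls inside the range given to the API server with --service-cluster-ip-range; kubeadm's default service subnet is 10.96.0.0/12 (10.96.0.0 to 10.111.255.255), which would cover both 10.109.212.68 and 10.103.53.132, but the actual value should be read from the manifest. Second, which pod endpoint the KUBE-SVC chain finally DNATs to, by following its jumps into the KUBE-SEP-* chains of iptables-save -t nat. The helper names in_cidr and svc_endpoints are hypothetical, and the awk parsing assumes the usual one-rule-per-line iptables-save format.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: in_cidr and svc_endpoints are illustrative names.

# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  local a b c d
  IFS=. read -r a b c d <<< "$1"
  echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Succeed if IP lies inside CIDR, e.g. in_cidr 10.103.53.132 10.96.0.0/12
in_cidr() {
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  mask=$(( prefix == 0 ? 0 : (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  (( (ip & mask) == (net & mask) ))
}

# Read `iptables-save -t nat` output on stdin and print the DNAT
# destinations (pod IP:port) that the given KUBE-SVC-* chain leads to.
svc_endpoints() {
  awk -v svc="$1" '
    # Remember every KUBE-SEP-* chain the service chain jumps to.
    $1 == "-A" && $2 == svc {
      for (i = 3; i < NF; i++)
        if ($i == "-j" && $(i + 1) ~ /^KUBE-SEP-/) sep[$(i + 1)] = 1
    }
    # Remember the DNAT target of every KUBE-SEP-* chain.
    $1 == "-A" && $2 ~ /^KUBE-SEP-/ {
      for (i = 3; i < NF; i++)
        if ($i == "--to-destination") dest[$2] = $(i + 1)
    }
    END { for (s in sep) if (s in dest) print dest[s] }
  '
}
```

On the master, something like sudo iptables-save -t nat | svc_endpoints KUBE-SVC-BU6KMELWBNFQMJ6Y should list the pod IP:port the ClusterIP is translated to, and grep service-cluster-ip-range /etc/kubernetes/manifests/kube-apiserver.yaml shows the configured range.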
- Now I suspect that the rules written by kube-proxy are not enough to redirect the traffic to the worker node/pod.
- I also tried to change the IP address advertised by the worker node: I changed the kubelet startup files to pass the external address, as discussed here: https://github.com/kubernetes/kubeadm/issues/203.
The result was not as expected: the worker node ended up with no IP at all.
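A possible explanation for the missing node IP, offered as an assumption rather than a diagnosis: kubelet's --node-ip is generally expected to be an address configured on one of the node's own interfaces, while a GCE external IP is provided by Google's network through 1:1 NAT and never appears on the VM's NIC. A quick way to see which addresses are really local is to scan the output of ip -o -4 addr show; the hypothetical helper has_local_ip below does that from stdin.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if IPv4 address $1 is configured on a local
# interface, reading `ip -o -4 addr show` output on stdin.
has_local_ip() {
  awk -v want="$1" '
    # One-line format: "2: ens4    inet 10.156.0.2/32 brd ... scope global ens4"
    $3 == "inet" { split($4, a, "/"); if (a[1] == want) found = 1 }
    # Exit status 0 if the address was found, 1 otherwise.
    END { exit !found }
  '
}

# Example: is the address we want to pass as --node-ip really local?
#   ip -o -4 addr show | has_local_ip 34.89.192.175 || echo "not a local address"
```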
So what I still have not done is check exactly what happens when I call the service IP. I cannot find any relevant information anywhere. Running mtr to trace the call shows only the DNS server of the hosting provider, which probably means that the iptables rules did not work as expected.
Any help would be much appreciated.
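One likely reason mtr only showed the provider's hosts: by default mtr sends ICMP echo probes, while the kube-proxy rules above match only -p tcp with --dport 80 for this ClusterIP, so those probes are never translated and simply follow the default route. mtr can probe with TCP instead (mtr --tcp --port 80 10.103.53.132), or a TCP handshake can be tested directly. A minimal sketch using bash's /dev/tcp redirection and coreutils timeout; the function name tcp_probe is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try a TCP handshake to HOST:PORT within SECS seconds.
# Relies on bash's /dev/tcp redirection and coreutils `timeout`.
tcp_probe() {
  local host=$1 port=$2 secs=${3:-3}
  # The child bash opens fd 3 on the socket; failure or timeout -> non-zero.
  if timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
    echo "$host:$port open"
    return 0
  fi
  echo "$host:$port unreachable (refused, timed out or dropped)"
  return 1
}

# Example from the master node:
#   tcp_probe 10.103.53.132 80
```

Unlike ICMP probes, this connection does hit the KUBE-SERVICES rule, so it exercises the same path as the failing curl.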
To extend my analysis further, I used tshark to sniff the network.
To my surprise, the master node could "translate" the service IP to the pod IP and send requests to the worker node.
Checking the tshark captures on both nodes, I came to the following conclusions:
1. The master node can send requests to the worker node using the right pod IP.
2. The worker node tries to send responses back to the master node, but they hang somewhere and never reach the tunnel interface on the master.
Here are the outputs for a curl http://10.103.53.132 from the master node:
Network packets on the master node:
~/k8s/lab_02$ cat network.log
1 0.000000000 192.168.181.129 → 192.168.15.3 TCP 60 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498264725 TSecr=0 WS=128
2 1.004025910 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498265729 TSecr=0 WS=128
3 3.020038791 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498267745 TSecr=0 WS=128
4 7.180030621 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498271905 TSecr=0 WS=128
5 15.372232344 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498280098 TSecr=0 WS=128
6 31.500093081 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498296226 TSecr=0 WS=128
7 64.524046610 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498329250 TSecr=0 WS=128
Network packets on the worker node:
~/k8s/lab_02$ cat network.log
1 0.000000000 192.168.181.129 → 192.168.15.3 TCP 60 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498264725 TSecr=0 WS=128
2 0.000155868 192.168.15.3 → 192.168.181.129 TCP 60 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110368944 TSecr=1498264725 WS=128
3 1.004588087 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498265729 TSecr=0 WS=128
4 1.004671123 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110369948 TSecr=1498264725 WS=128
5 2.013643936 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110370957 TSecr=1498264725 WS=128
6 3.020353932 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498267745 TSecr=0 WS=128
7 3.020437281 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110371964 TSecr=1498264725 WS=128
8 5.021649689 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110373965 TSecr=1498264725 WS=128
9 7.178294371 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498271905 TSecr=0 WS=128
10 7.178400688 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110376122 TSecr=1498264725 WS=128
11 11.229621154 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110380173 TSecr=1498264725 WS=128
12 15.372054389 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498280098 TSecr=0 WS=128
13 15.372137631 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110384316 TSecr=1498264725 WS=128
14 23.517611008 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110392461 TSecr=1498264725 WS=128
15 31.500269555 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498296226 TSecr=0 WS=128
16 31.500385133 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110400444 TSecr=1498264725 WS=128
17 47.581694573 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110416525 TSecr=1498264725 WS=128
18 64.525004821 192.168.181.129 → 192.168.15.3 TCP 60 [TCP Retransmission] 37469 → 80 [SYN] Seq=0 Win=29200 Len=0 MSS=1460 SACK_PERM=1 TSval=1498329250 TSecr=0 WS=128
19 64.525090416 192.168.15.3 → 192.168.181.129 TCP 60 [TCP Retransmission] 80 → 37469 [SYN, ACK] Seq=0 Ack=1 Win=65236 Len=0 MSS=1400 SACK_PERM=1 TSval=3110433469 TSecr=1498264725 WS=128
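The two captures can be compared mechanically by counting handshake packets in each file. A small sketch, assuming the one-line tshark summary format shown above saved as network.log on each node; the function name handshake_summary and the capture interface/filter in the comment are assumptions. Applied to the logs above, it would report 7 SYNs and no SYN-ACKs on the master versus 7 SYNs and 12 SYN-ACKs on the worker: the worker answers every SYN, but none of the SYN-ACKs show up in the master's capture.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count client SYNs and server SYN-ACKs in a tshark
# one-line summary log (retransmissions included).
handshake_summary() {
  local log=$1 syn synack
  # grep -c prints 0 (and exits 1) when nothing matches; the count is kept.
  syn=$(grep -cF '[SYN]' "$log")
  synack=$(grep -cF '[SYN, ACK]' "$log")
  echo "SYN=$syn SYN-ACK=$synack"
}

# Capture on each node first, e.g. (interface and filter are assumptions):
#   sudo tshark -i tunl0 -f "tcp port 80" > network.log
# then compare:
#   handshake_summary network.log
```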
Hi,
The scope of this course is to teach you Kubernetes, with hands-on lab exercises designed around the topics discussed in the lecture sections. The installation and bootstrapping of the Kubernetes cluster has been simplified to eliminate instance networking issues that would otherwise affect the cluster's behavior, specifically access to Service ClusterIPs and Pod IPs from different nodes of the cluster. This allows you to focus on Kubernetes topics without spending too much time on overall infrastructure configuration.
The following infrastructure configuration allows a Kubernetes cluster to run efficiently on GCP:
- VPC - a custom VPC network created in the GCP account.
- Firewall - a custom firewall rule for the custom VPC network created in the previous step, where you allow all ingress traffic from all sources, all protocols, to all ports.
- GCE instances - create at least 2 GCE instances inside the custom VPC network to be able to follow along with the lab exercises. Instances with 2 vCPUs and 7.5 GB of memory work just fine, and the recommended OS image is Ubuntu 18.04 LTS. Make sure to pick your custom VPC network.
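The three steps above could be scripted with the gcloud CLI roughly as follows. This is a sketch under assumptions: the network name, zone, instance names and the n1-standard-2 machine type (2 vCPUs, 7.5 GB) are illustrative choices, and commands are only printed unless DRY_RUN=0. Allowing all protocols, rather than only TCP/UDP/ICMP, matters when Calico runs in IP-in-IP mode (the tunl0 routes earlier in the thread), since that encapsulation uses IP protocol 4.

```shell
#!/usr/bin/env bash
# Sketch of the lab infrastructure with gcloud. All names below are
# hypothetical; commands are printed, not executed, unless DRY_RUN=0.
NETWORK="${NETWORK:-k8s-lab-vpc}"
ZONE="${ZONE:-europe-west3-a}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

provision_lab_infra() {
  # 1. Custom VPC; auto mode creates one subnet per region.
  run gcloud compute networks create "$NETWORK" --subnet-mode=auto

  # 2. Allow all ingress traffic: all sources, all protocols, all ports.
  run gcloud compute firewall-rules create "${NETWORK}-allow-all" \
    --network="$NETWORK" --direction=INGRESS --action=ALLOW \
    --rules=all --source-ranges=0.0.0.0/0

  # 3. Two Ubuntu 18.04 LTS instances with 2 vCPUs / 7.5 GB in that VPC.
  local name
  for name in master worker; do
    run gcloud compute instances create "$name" --zone="$ZONE" \
      --machine-type=n1-standard-2 \
      --image-family=ubuntu-1804-lts --image-project=ubuntu-os-cloud \
      --network="$NETWORK"
  done
}

# Usage: review the printed commands, then run them for real:
#   provision_lab_infra
#   DRY_RUN=0 provision_lab_infra
```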
These simple steps should allow you to follow along with the lab exercises as they are described in the lab manual, with outputs consistent with the ones presented.
Regards,
-Chris
Hi,
The time I spend on the course is at my own discretion. I do not want to blindly follow the instructions without understanding the hows. Besides, nothing in the course states that all instances should be in GCP. I could also run everything locally!
Anyway, thanks for taking the time to reply.