Curl-ing a Cluster IP not working
Hi All,
I'm on Exercise 3.4: Deploy A Simple Application, line 20.
I've created a deployment and exposed a service.
pch@master:~/Deployments$ kubectl expose deployment/nginx
service/nginx exposed
pch@master:~/Deployments$ kubectl get svc nginx
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.109.43.160   <none>        80/TCP    40s
pch@master:~/Deployments$ kubectl get ep nginx
NAME    ENDPOINTS     AGE
nginx   10.0.1.4:80   74s
pch@master:~/Deployments$ kubectl describe pod nginx-7848d4b86f-hkmlk | grep Node:
Node:         worker/192.168.0.137
When I curl the service and the endpoint, I get connection timed out errors.
pch@master:~/Deployments$ curl 10.109.43.160:80
curl: (28) Failed to connect to 10.109.43.160 port 80: Connection timed out
pch@master:~/Deployments$ curl 10.0.1.4:80
curl: (28) Failed to connect to 10.0.1.4 port 80: Connection timed out
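A few extra checks that can help narrow down where the traffic stops (a sketch; the service and pod IPs are the ones from the output above):

# Did kube-proxy program rules for the ClusterIP on the master?
sudo iptables-save | grep 10.109.43.160

# Is the pod reachable from the node it runs on (the worker)?
curl --max-time 5 10.0.1.4:80

# Anything unusual in the kube-proxy logs?
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=20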
If I look at all the infrastructure pods running (the installation was done with kubeadm), I see two coredns pods and two flannel pods. I'm not seeing any issues.
pch@master:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   coredns-78fcd69978-qzfq9         1/1     Running   0          5h46m
kube-system   coredns-78fcd69978-vnr7m         1/1     Running   0          5h46m
kube-system   etcd-master                      1/1     Running   0          5h47m
kube-system   kube-apiserver-master            1/1     Running   1          5h47m
kube-system   kube-controller-manager-master   1/1     Running   1          5h47m
kube-system   kube-flannel-ds-ks94v            1/1     Running   0          5h33m
kube-system   kube-flannel-ds-ztmzv            1/1     Running   0          5h36m
kube-system   kube-proxy-dwxrq                 1/1     Running   0          5h33m
kube-system   kube-proxy-vvsmg                 1/1     Running   0          5h46m
kube-system   kube-scheduler-master            1/1     Running   1          5h47m
When I deployed the flannel network, I saw that it created NICs on the head node and on the worker node.
HEAD NODE
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 1e:d1:a0:5b:66:57 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.0/32 brd 10.0.0.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::1cd1:a0ff:fe5b:6657/64 scope link
valid_lft forever preferred_lft forever
WORKER NODE
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 8e:39:10:e5:87:cc brd ff:ff:ff:ff:ff:ff
inet 10.0.1.0/32 brd 10.0.1.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::8c39:10ff:fee5:87cc/64 scope link
valid_lft forever preferred_lft forever
When I run tcpdump on the worker node (sudo tcpdump -i flannel.1) I do see activity, but the curl from the head node still ends in a connection failure.
pch@worker:~$ sudo tcpdump -i flannel.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
00:11:00.870225 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938402630 ecr 0,nop,wscale 7], length 0
00:11:01.879778 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938403640 ecr 0,nop,wscale 7], length 0
00:11:03.896173 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938405656 ecr 0,nop,wscale 7], length 0
00:11:07.995933 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938409756 ecr 0,nop,wscale 7], length 0
00:11:16.183887 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938417944 ecr 0,nop,wscale 7], length 0
00:11:32.311722 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938434072 ecr 0,nop,wscale 7], length 0
00:11:56.524309 IP worker.mdns > 224.0.0.251.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
00:11:57.358047 IP6 worker.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
00:12:05.591719 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938467352 ecr 0,nop,wscale 7], length 0
00:12:12.875671 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:13.906135 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:14.929559 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:15.953546 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:16.977931 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:18.001375 ARP, Request who-has nicoda.kde.org tell worker, length 28
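The SYNs arrive on flannel.1 but are never answered, which suggests the replies are not making it back across the overlay. A couple of follow-up captures (a sketch; 8472/udp is flannel's default VXLAN port, and the physical NIC name is an assumption based on the interfaces shown later in this thread):

# On the worker: watch the encapsulated VXLAN traffic on the physical NIC
sudo tcpdump -ni ens33 udp port 8472

# On the worker: confirm there is a return route toward the master's flannel subnet
ip route | grep 10.0.0.0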
Could it be that something at the container level is not working as expected?
Comments
-
Hello,
Is there a reason you're using Flannel instead of Calico as the labs call for?
As the pod is on the worker, it's most likely an issue with your inter-VM network. What are you using to run your VMs?
When you connect to the worker node, can you curl the pod's ephemeral IP? Then try the service endpoint.
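For example (a sketch, using the addresses from the output above):

# From a shell on the worker node
curl --max-time 5 10.0.1.4:80        # pod IP / endpoint
curl --max-time 5 10.109.43.160:80   # service ClusterIP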
Regards,
-
Looking back at when I created the flannel network, I ran these commands; it looks like the second one failed.
pch@master:~$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
pch@master:~$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml": no matches for kind "ClusterRole" in version "rbac.authorization.k8s.io/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml": no matches for kind "ClusterRoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
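It looks like that second manifest still uses rbac.authorization.k8s.io/v1beta1, which was removed in Kubernetes 1.22. A quick way to confirm which RBAC API versions the cluster actually serves (a sketch):

kubectl api-versions | grep rbac.authorization.k8s.io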
-
I had followed another guide to set up my environment with kubeadm.
I'm using Docker for my VMs; I stuck with the defaults there. I've deleted Flannel using:
kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
I've installed Calico using the recommended steps, adjusting the config for my CIDR network, 10.0.0.0/16.
After rebooting the hosts I see the Calico networks are in place.
It looks, though, like my deployment isn't too happy for some reason; the pod won't spin up.
pch@master:~/Deployments$ kubectl get deploy,pod
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/1     1            0           84m

NAME                         READY   STATUS              RESTARTS   AGE
pod/nginx-7848d4b86f-zrvnx   0/1     ContainerCreating   0          72s
-
Looking into the pod, it's getting errors. I deleted the deployment and redeployed from the YAML file we created in the exercise. Same issue.
pch@master:~/Deployments$ kubectl logs nginx-7848d4b86f-4ms4b
Error from server (BadRequest): container "nginx" in pod "nginx-7848d4b86f-4ms4b" is waiting to start: ContainerCreating
Ah-ha, OK. When running kubectl describe pods I see a Calico error that's the blocker.
Warning FailedCreatePodSandBox 2m50s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ced8d79d1a96fa1ef5f47f03cd4910703e86aff693d60e33a5270823dab99658" network for pod "nginx-7848d4b86f-4ms4b": networkPlugin cni failed to set up pod "nginx-7848d4b86f-4ms4b_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
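That error usually means the calico-node pod on that host never got healthy enough to write /var/lib/calico/nodename. A couple of checks (a sketch; the label selector is the one the standard calico.yaml manifest applies):

# Which node's calico-node pod is unhealthy?
kubectl -n kube-system get pods -o wide -l k8s-app=calico-node

# On the affected node: the directory the CNI plugin is looking in
ls /var/lib/calico/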
-
The issue is that the Calico pods are crashing.
pch@master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS              RESTARTS         AGE
calico-kube-controllers-58497c65d5-t6nk7   0/1     ContainerCreating   0                75s
calico-node-d7t8w                          1/1     Running             1 (29m ago)      42m
calico-node-rp2cp                          0/1     CrashLoopBackOff    17 (2m32s ago)   42m
It looks like the issue might be a duplicate IP address between the head (master) node and the worker node.
pch@master:~$ kubectl -n kube-system logs calico-node-rp2cp
2021-09-01 05:22:53.596 [INFO][9] startup/startup.go 713: Using autodetected IPv4 address on interface br-d55a3b06d6c7: 172.18.0.1/16
2021-09-01 05:22:53.596 [INFO][9] startup/startup.go 530: Node IPv4 changed, will check for conflicts
2021-09-01 05:22:53.600 [WARNING][9] startup/startup.go 1074: Calico node 'master' is already using the IPv4 address 172.18.0.1.
2021-09-01 05:22:53.600 [INFO][9] startup/startup.go 360: Clearing out-of-date IPv4 address from this node IP="172.18.0.1/16"
2021-09-01 05:22:53.607 [WARNING][9] startup/utils.go 48: Terminating
Calico node failed to start

pch@master:~$ calicoctl get nodes -o wide
NAME     ASN       IPV4            IPV6
master   (64512)   172.18.0.1/16
worker
I checked the Calico support info online; it looks like the issue could be the IP_AUTODETECTION_METHOD listed here: https://github.com/projectcalico/calico/issues/1628
Well, it's a good thought. I updated the Calico config for that by running:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=DESTINATION
This was set successfully. I rebooted my machines, and the issue persists...
pch@master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS              RESTARTS        AGE
calico-kube-controllers-58497c65d5-t6nk7   0/1     ContainerCreating   0               13m
calico-node-hp5nt                          1/1     Running             1 (3m53s ago)   4m45s
calico-node-vwzt7                          0/1     CrashLoopBackOff    6 (35s ago)     4m45s
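One thing worth double-checking with that setting (a sketch): can-reach= expects an actual reachable destination (an IP or domain name) rather than the literal word DESTINATION, and there is also an interface-based form. Something like the following, where the gateway address and NIC name are assumptions based on the addresses shown in this thread:

kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=192.168.0.1
# or pin autodetection to the VM's primary NIC
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens33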
I do see that both master and worker have the following NICs:
MASTER-HEAD
4: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:12:93:d4:cd brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
valid_lft forever preferred_lft forever
WORKER
3: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:17:0d:c6:31 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
valid_lft forever preferred_lft forever
-
This is where I'm stuck. I'm not sure how to resolve this IP conflict; Calico auto-populates this IP address and is choosing to assign the same IP to each node (master + worker).
-
Hello,
Docker uses its own networking configuration, and I am not too aware of the details of making it work. Calico does not automatically set the IP range; the lab actually has you look at where the information comes from in the calico.yaml file and in the configuration file passed to kubeadm init.
The short of it is that you don't want any of your IP ranges (VMs, host, service IPs, ephemeral pod IPs) to overlap. Also, you want to ensure there is NO firewall between your VMs. Are you sure you have configured Docker to allow all traffic?
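Some quick checks along those lines (a sketch; the worker's address is the one shown earlier in this thread, and nc assumes netcat is installed):

# From the master: basic reachability and the kubelet port on the worker
ping -c 3 192.168.0.137
nc -zv 192.168.0.137 10250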
Perhaps you can use VirtualBox, VMware, or KVM/QEMU locally, or GCE, AWS, or DigitalOcean, where you can properly control the IP range of your VMs as well as the connectivity of the VMs to each other.
Regards,
-
Hi Serewicz,
Totally understood. I'm using VMware, and the VMs themselves are on a 192.168.0.0/24 network, so they are good. I initialized kubeadm with 10.0.0.0/16, and I set calico.yaml to the same thing, thinking they need to match.
- name: CALICO_IPV4POOL_CIDR
  value: "10.0.0.0/16"
kubeadm init command:
sudo kubeadm init --pod-network-cidr=10.0.0.0/16
I'm not sure where the 172.18.0.1/16 came from.
pch@master:~/Deployments$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:31:02:a4 brd ff:ff:ff:ff:ff:ff
    altname enp2s1
    inet 192.168.0.172/24 brd 192.168.0.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::4afa:aa33:58f0:ee0/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:5d:78:aa:29 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:3c:3d:29:3f brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
       valid_lft forever preferred_lft forever
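One way to cross-check what the cluster itself thinks the pod network is (a sketch; the controller-manager pod name is the one shown earlier in this thread):

# The CIDR kubeadm passed to the controller manager
kubectl -n kube-system get pod kube-controller-manager-master -o yaml | grep cluster-cidr

# The per-node pod CIDRs that were allocated
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'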
-
Hello,
The VMs' IP range conflicts with the pods' ephemeral IPs. As a result, when you try to connect to a pod, your node sends the request out of the primary interface instead of across the tunnel to the other node.
Exercise 3.1, steps 10 and 11 speak to what Calico uses by default. I would encourage you to create two new VMs, this time using an IP range like 10.128.0.0/16 or 172.16.0.0/12 for the VMs, which does not conflict with your host, the pod IPs, or the default ephemeral IPs. If your host is also using 192.168 as its own network there may be conflicts during routing, but that is less likely as the traffic should stay within the VMs.
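A quick way to spot that kind of overlap on a node before rebuilding (a sketch):

# All IPv4 addresses assigned to the node's interfaces, one line per NIC
ip -brief -4 addr show

# Routes, to see which interface would carry traffic for the pod ranges
ip route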
In a previous comment you mentioned you had erased and redone the networking. Unfortunately this does not really work; there is much more to how the IP is used when you create the cluster. Starting over is the suggested method if you want to change your cluster configuration in such a dramatic manner.
Regards,
-
I nuked my setup and rebuilt some VMs. This time I took adequate clones and snapshots in case I need to rebuild again.
I'm running through the installation exercise LAB 3.1, and I've found that I'm getting odd output when running "hostname -i" on both master and worker nodes.
Master:
pch@master:~$ hostname -i
127.0.1.1
Worker:
pch@worker:~$ hostname -i
192.168.0.157 172.17.0.1 fe80::20c:29ff:feb1:9de8
The vmNIC is using 192.168.0.15X.
Docker created 172.17.0.1.
The server is Ubuntu 20.04. I set the IP static in the GUI, disabled IPv6, rebooted, and checked again. Same output...
I realized that /etc/hosts wasn't reading the same way on both servers. I cleaned it up, made sure the proper local IP was mapped to the appropriate DNS name, and now we're all good!
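For reference, the kind of /etc/hosts entries that make hostname -i return the LAN address look roughly like this (a sketch; the worker's address is the one shown above, while the master's address here is a hypothetical example on the same 192.168.0.15X range):

# /etc/hosts on both nodes
127.0.0.1        localhost
192.168.0.150    master    # hypothetical; use the master's actual LAN IP
192.168.0.157    worker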
-
Glad to hear it's working.