Curl-ing a Cluster IP not working
Hi All,
I'm on Exercise 3.4: Deploy A Simple Application, line 20.
I've created a deployment and exposed a service.
pch@master:~/Deployments$ kubectl expose deployment/nginx
service/nginx exposed
pch@master:~/Deployments$ kubectl get svc nginx
NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
nginx   ClusterIP   10.109.43.160   <none>        80/TCP    40s
pch@master:~/Deployments$ kubectl get ep nginx
NAME    ENDPOINTS     AGE
nginx   10.0.1.4:80   74s
pch@master:~/Deployments$ kubectl describe pod nginx-7848d4b86f-hkmlk | grep Node:
Node:         worker/192.168.0.137
When I curl the service IP and the endpoint IP, both fail with "Connection timed out" errors.
pch@master:~/Deployments$ curl 10.109.43.160:80
curl: (28) Failed to connect to 10.109.43.160 port 80: Connection timed out
pch@master:~/Deployments$ curl 10.0.1.4:80
curl: (28) Failed to connect to 10.0.1.4 port 80: Connection timed out
If I look at all infrastructure pods running (installation was with kubeadm), I see two coredns pods, and two flannel pods. Not seeing any issues.
pch@master:~$ kubectl get pods --all-namespaces
NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
kube-system   coredns-78fcd69978-qzfq9         1/1     Running   0          5h46m
kube-system   coredns-78fcd69978-vnr7m         1/1     Running   0          5h46m
kube-system   etcd-master                      1/1     Running   0          5h47m
kube-system   kube-apiserver-master            1/1     Running   1          5h47m
kube-system   kube-controller-manager-master   1/1     Running   1          5h47m
kube-system   kube-flannel-ds-ks94v            1/1     Running   0          5h33m
kube-system   kube-flannel-ds-ztmzv            1/1     Running   0          5h36m
kube-system   kube-proxy-dwxrq                 1/1     Running   0          5h33m
kube-system   kube-proxy-vvsmg                 1/1     Running   0          5h46m
kube-system   kube-scheduler-master            1/1     Running   1          5h47m
When I deployed the flannel network, I saw that it created NICs on both the head node and the worker node.
HEAD NODE
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 1e:d1:a0:5b:66:57 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.0/32 brd 10.0.0.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::1cd1:a0ff:fe5b:6657/64 scope link
valid_lft forever preferred_lft forever
WORKER NODE
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 8e:39:10:e5:87:cc brd ff:ff:ff:ff:ff:ff
inet 10.0.1.0/32 brd 10.0.1.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::8c39:10ff:fee5:87cc/64 scope link
valid_lft forever preferred_lft forever
When I run tcpdump on the worker node (sudo tcpdump -i flannel.1), I do see traffic arriving, but the curl on the head node still fails to connect.
pch@worker:~$ sudo tcpdump -i flannel.1
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
00:11:00.870225 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938402630 ecr 0,nop,wscale 7], length 0
00:11:01.879778 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938403640 ecr 0,nop,wscale 7], length 0
00:11:03.896173 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938405656 ecr 0,nop,wscale 7], length 0
00:11:07.995933 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938409756 ecr 0,nop,wscale 7], length 0
00:11:16.183887 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938417944 ecr 0,nop,wscale 7], length 0
00:11:32.311722 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938434072 ecr 0,nop,wscale 7], length 0
00:11:56.524309 IP worker.mdns > 224.0.0.251.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
00:11:57.358047 IP6 worker.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
00:12:05.591719 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938467352 ecr 0,nop,wscale 7], length 0
00:12:12.875671 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:13.906135 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:14.929559 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:15.953546 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:16.977931 ARP, Request who-has nicoda.kde.org tell worker, length 28
00:12:18.001375 ARP, Request who-has nicoda.kde.org tell worker, length 28
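One way to read a capture like this is to count SYNs against SYN-ACKs: repeated SYNs with the same sequence number and no SYN-ACK suggest the request reaches flannel.1 on the worker but no reply comes back on that interface. The sketch below is a rough illustration only; the function name count_handshake is made up for this example, and the sample lines are copied from the capture above.

```shell
# Count TCP SYNs and SYN-ACKs in tcpdump text read from stdin.
# Prints "syn=<n> synack=<n>". (count_handshake is a hypothetical helper name.)
count_handshake() {
    awk '
        /Flags \[S\],/   { syn++ }     # SYN only: tcpdump prints [S]
        /Flags \[S\.\],/ { synack++ }  # SYN-ACK: tcpdump prints [S.]
        END { printf "syn=%d synack=%d\n", syn, synack }
    '
}

# Three lines copied from the worker capture in this thread.
sample='00:11:00.870225 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938402630 ecr 0,nop,wscale 7], length 0
00:11:01.879778 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938403640 ecr 0,nop,wscale 7], length 0
00:11:03.896173 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938405656 ecr 0,nop,wscale 7], length 0'

printf '%s\n' "$sample" | count_handshake
```

On a live node the same function could be fed from a saved capture, for example by redirecting the tcpdump text output to a file first and piping that file in.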
Could something be failing at the container level?
Comments
Looking back at when I created the flannel network, I ran the two commands below, and it looks like the second one failed.
pch@master:~$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
pch@master:~$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml": no matches for kind "ClusterRole" in version "rbac.authorization.k8s.io/v1beta1"
unable to recognize "https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml": no matches for kind "ClusterRoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
I had followed another guide to setting up my environment with kubeadm.
I'm using Docker for my VMs; I stuck with the defaults there.
I've deleted Flannel using:
kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
I've installed Calico using the recommended steps, adjusting the config for my CIDR network, 10.0.0.0/16.
After rebooting the hosts I see the Calico networks are in place.
It looks like my deployment isn't happy for some reason, though: the pod won't spin up.
pch@master:~/Deployments$ kubectl get deploy,pod
NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/nginx   0/1     1            0           84m

NAME                         READY   STATUS              RESTARTS   AGE
pod/nginx-7848d4b86f-zrvnx   0/1     ContainerCreating   0          72s
Looking into the pod, it's getting errors. I deleted the deployment and redeployed from the YAML file we created in the exercise, with the same result.
pch@master:~/Deployments$ kubectl logs nginx-7848d4b86f-4ms4b
Error from server (BadRequest): container "nginx" in pod "nginx-7848d4b86f-4ms4b" is waiting to start: ContainerCreating
Ah-ha, OK: when running kubectl describe pods, I see a Calico error that's the blocker.
Warning FailedCreatePodSandBox 2m50s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ced8d79d1a96fa1ef5f47f03cd4910703e86aff693d60e33a5270823dab99658" network for pod "nginx-7848d4b86f-4ms4b": networkPlugin cni failed to set up pod "nginx-7848d4b86f-4ms4b_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
The issue is that the Calico pods are crashing.
pch@master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS              RESTARTS         AGE
calico-kube-controllers-58497c65d5-t6nk7   0/1     ContainerCreating   0                75s
calico-node-d7t8w                          1/1     Running             1 (29m ago)      42m
calico-node-rp2cp                          0/1     CrashLoopBackOff    17 (2m32s ago)   42m
It looks like the issue might be a duplicate IP address between the head (master) node and the worker node.
pch@master:~$ kubectl -n kube-system logs calico-node-rp2cp
2021-09-01 05:22:53.596 [INFO][9] startup/startup.go 713: Using autodetected IPv4 address on interface br-d55a3b06d6c7: 172.18.0.1/16
2021-09-01 05:22:53.596 [INFO][9] startup/startup.go 530: Node IPv4 changed, will check for conflicts
2021-09-01 05:22:53.600 [WARNING][9] startup/startup.go 1074: Calico node 'master' is already using the IPv4 address 172.18.0.1.
2021-09-01 05:22:53.600 [INFO][9] startup/startup.go 360: Clearing out-of-date IPv4 address from this node IP="172.18.0.1/16"
2021-09-01 05:22:53.607 [WARNING][9] startup/utils.go 48: Terminating
Calico node failed to start
pch@master:~$ calicoctl get nodes -o wide
NAME     ASN       IPV4            IPV6
master   (64512)   172.18.0.1/16
worker
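The conflict in that log can be spotted mechanically by listing which address each node would register and flagging any address claimed twice. Here is a minimal sketch, assuming a plain "node ip" list as input; find_duplicate_ips is a name invented for this example, and the two sample rows reflect the 172.18.0.1 address that the log reports for both nodes.

```shell
# Read "node ip" pairs on stdin and print every IP claimed by more than one node.
# (find_duplicate_ips is a hypothetical helper, not part of calicoctl or kubectl.)
find_duplicate_ips() {
    awk '
        NF >= 2 {
            count[$2]++
            if ($2 in nodes) {
                nodes[$2] = nodes[$2] "," $1
            } else {
                nodes[$2] = $1
            }
        }
        END {
            for (ip in count) {
                if (count[ip] > 1) {
                    printf "%s used by %s\n", ip, nodes[ip]
                }
            }
        }
    '
}

# Sample input based on the Calico log above: both nodes end up on 172.18.0.1.
printf '%s\n' 'master 172.18.0.1' 'worker 172.18.0.1' | find_duplicate_ips
```

If every node reports a unique address, the function prints nothing.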
Checked Calico support info online, looks like the issue could be the IP_AUTODETECTION_METHOD listed here: https://github.com/projectcalico/calico/issues/1628
It seemed like a good thought, so I updated the Calico config for that by running:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=DESTINATION
This was set successfully. I rebooted my machines, and the issue persists...
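For reference, DESTINATION in the can-reach method is a placeholder for an IP address or hostname that the node's real NIC can reach, and Calico also accepts an interface=REGEX method that pins detection to a named interface. The sketch below only prints candidate commands so they can be reviewed first. The helper name calico_autodetect_cmd is made up, ens33 and 192.168.0.172 are taken from the master's ip a output in this thread, and the assumption that the worker's NIC has the same name is untested.

```shell
# Print (do not run) a kubectl command that sets Calico's IP autodetection method.
# calico_autodetect_cmd is a hypothetical helper; it only builds the command string.
calico_autodetect_cmd() {
    method="$1"   # "interface" or "can-reach"
    value="$2"    # interface name/regex, or a reachable IP or hostname
    case "$method" in
        interface|can-reach) ;;
        *) echo "unsupported method: $method" >&2; return 1 ;;
    esac
    printf 'kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=%s=%s\n' "$method" "$value"
}

# Assumption: the VM NIC is called ens33 on every node (seen only on the master here).
calico_autodetect_cmd interface ens33
# Assumption: 192.168.0.172 (the master's ens33 address) is reachable from all nodes.
calico_autodetect_cmd can-reach 192.168.0.172
```

Either printed command can then be copied and run by hand once the interface name or address has been confirmed on each node.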
pch@master:~$ kubectl get pods -n kube-system
NAME                                       READY   STATUS              RESTARTS        AGE
calico-kube-controllers-58497c65d5-t6nk7   0/1     ContainerCreating   0               13m
calico-node-hp5nt                          1/1     Running             1 (3m53s ago)   4m45s
calico-node-vwzt7                          0/1     CrashLoopBackOff    6 (35s ago)     4m45s
I do see that both master and worker have the following NICs:
MASTER-HEAD
4: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:12:93:d4:cd brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
valid_lft forever preferred_lft forever
WORKER
3: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:17:0d:c6:31 brd ff:ff:ff:ff:ff:ff
inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
valid_lft forever preferred_lft forever
This is where I'm stuck. I'm not sure how to resolve this IP conflict: Calico autodetects the node IP address and is picking the same one, 172.18.0.1 from the br-d55a3b06d6c7 bridge, on each node (master and worker).
Hi Serewicz,
Totally understand. I'm using VMware, and the VMs themselves are on a 192.168.0.0/24 network, so they are good. I initialized kubeadm with 10.0.0.0/16 and set calico.yaml to the same value, thinking they need to match.
- name: CALICO_IPV4POOL_CIDR
  value: "10.0.0.0/16"
kubeadm install command:
sudo kubeadm init --pod-network-cidr=10.0.0.0/16
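Besides matching CALICO_IPV4POOL_CIDR, the pod network CIDR should not overlap the network the nodes themselves sit on. A small overlap check is sketched below in POSIX shell arithmetic; the function names ip_to_int, cidr_range, and cidrs_overlap are invented for this example, and the ranges are the ones mentioned in this thread.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Print the first and last address of a CIDR block as integers.
cidr_range() {
    addr=${1%/*}
    bits=${1#*/}
    base=$(ip_to_int "$addr")
    size=$(( 1 << (32 - bits) ))
    start=$(( base / size * size ))   # clear any host bits
    echo "$start $(( start + size - 1 ))"
}

# Exit 0 when the two CIDR blocks share at least one address.
cidrs_overlap() {
    set -- $(cidr_range "$1") $(cidr_range "$2")
    # $1..$2 is the first range, $3..$4 the second
    [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Pod CIDR from kubeadm init versus the VM network from this thread.
if cidrs_overlap 10.0.0.0/16 192.168.0.0/24; then
    echo "overlap"
else
    echo "no overlap"
fi
```

The same check can be pointed at any other range on the nodes, such as the Docker bridge networks, to confirm they stay clear of the pod CIDR.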
I'm not sure where the 172.18.0.1/16 came from.
pch@master:~/Deployments$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:0c:29:31:02:a4 brd ff:ff:ff:ff:ff:ff
    altname enp2s1
    inet 192.168.0.172/24 brd 192.168.0.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::4afa:aa33:58f0:ee0/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:5d:78:aa:29 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:3c:3d:29:3f brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
       valid_lft forever preferred_lft forever
I nuked my setup and rebuilt some VMs. This time I took clones and snapshots in case I need to rebuild again.


I'm running through the installation exercise LAB 3.1, and I've found that I'm getting odd output when running "hostname -i" on both master and worker nodes.
Master:
pch@master:~$ hostname -i
127.0.1.1
Worker:
pch@worker:~$ hostname -i
192.168.0.157 172.17.0.1 fe80::20c:29ff:feb1:9de8
The VM NIC is using 192.168.0.15X.
Docker created 172.17.0.1.
The server is Ubuntu 20.04. I set a static IP in the GUI, disabled IPv6, rebooted, and checked again. Same output...
I realized that /etc/hosts didn't read the same way on both servers. I cleaned it up and made sure the proper local IP was mapped to the appropriate hostname on each, and now we're all good!
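A quick way to catch this before running kubeadm is to check whether the hostname resolves to a loopback address first, which is what the master's 127.0.1.1 output indicated. The sketch below is an assumption-laden illustration: first_ip_is_loopback is a made-up helper, and the two sample strings are the hostname -i outputs quoted above.

```shell
# Succeed when the first address in a `hostname -i` style string is loopback (127.x.x.x).
# (first_ip_is_loopback is a hypothetical helper name.)
first_ip_is_loopback() {
    set -- $1            # split the string on whitespace; $1 becomes the first address
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# Sample strings copied from the master and worker output in this thread.
for out in "127.0.1.1" "192.168.0.157 172.17.0.1 fe80::20c:29ff:feb1:9de8"; do
    if first_ip_is_loopback "$out"; then
        echo "loopback: $out"
    else
        echo "ok: $out"
    fi
done
```

On a real node the check would be called as first_ip_is_loopback "$(hostname -i)", and a loopback result points at the /etc/hosts entry for the hostname.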