Curl-ing a Cluster IP not working

Hi All,
I'm on Exercise 3.4: Deploy A Simple Application, line 20.

I've created a deployment and exposed a service.

  pch@master:~/Deployments$ kubectl expose deployment/nginx
  service/nginx exposed

  pch@master:~/Deployments$ kubectl get svc nginx
  NAME    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
  nginx   ClusterIP   10.109.43.160   <none>        80/TCP    40s

  pch@master:~/Deployments$ kubectl get ep nginx
  NAME    ENDPOINTS     AGE
  nginx   10.0.1.4:80   74s

  pch@master:~/Deployments$ kubectl describe pod nginx-7848d4b86f-hkmlk | grep Node:
  Node: worker/192.168.0.137

When I curl the service and the endpoint I get connection timeout errors.

  pch@master:~/Deployments$ curl 10.109.43.160:80
  curl: (28) Failed to connect to 10.109.43.160 port 80: Connection timed out
  pch@master:~/Deployments$ curl 10.0.1.4:80
  curl: (28) Failed to connect to 10.0.1.4 port 80: Connection timed out
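
As a cross-check, I could also hit the same pod from inside the pod network rather than from the node; a rough sketch (the busybox image and throwaway pod name are just for illustration):

  # Throwaway pod that fetches the nginx page over the pod network
  kubectl run tmp-curl --rm -it --image=busybox:1.35 --restart=Never -- \
    wget -qO- -T 5 http://10.0.1.4:80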

If I look at all the infrastructure pods running (the installation was done with kubeadm), I see two coredns pods and two flannel pods. I'm not seeing any issues.

  pch@master:~$ kubectl get pods --all-namespaces
  NAMESPACE     NAME                             READY   STATUS    RESTARTS   AGE
  kube-system   coredns-78fcd69978-qzfq9         1/1     Running   0          5h46m
  kube-system   coredns-78fcd69978-vnr7m         1/1     Running   0          5h46m
  kube-system   etcd-master                      1/1     Running   0          5h47m
  kube-system   kube-apiserver-master            1/1     Running   1          5h47m
  kube-system   kube-controller-manager-master   1/1     Running   1          5h47m
  kube-system   kube-flannel-ds-ks94v            1/1     Running   0          5h33m
  kube-system   kube-flannel-ds-ztmzv            1/1     Running   0          5h36m
  kube-system   kube-proxy-dwxrq                 1/1     Running   0          5h33m
  kube-system   kube-proxy-vvsmg                 1/1     Running   0          5h46m
  kube-system   kube-scheduler-master            1/1     Running   1          5h47m
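
If it helps, I can also pull the kube-proxy logs on each node; something along these lines, using the pod names above:

  kubectl -n kube-system logs kube-proxy-dwxrq --tail=20
  kubectl -n kube-system logs kube-proxy-vvsmg --tail=20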

When I deployed the Flannel network it created NICs on the head node and on the worker node.

HEAD NODE
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 1e:d1:a0:5b:66:57 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.0/32 brd 10.0.0.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::1cd1:a0ff:fe5b:6657/64 scope link
valid_lft forever preferred_lft forever

WORKER NODE
5: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 8e:39:10:e5:87:cc brd ff:ff:ff:ff:ff:ff
inet 10.0.1.0/32 brd 10.0.1.0 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::8c39:10ff:fee5:87cc/64 scope link
valid_lft forever preferred_lft forever
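
I haven't pasted it here, but the overlay routing on each node can be checked with something like:

  ip -d link show flannel.1   # VXLAN details for the overlay device
  ip route | grep 10.0.       # routes toward the pod subnets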

When I run tcpdump on the worker node (sudo tcpdump -i flannel.1) I do see activity, but the curl from the head node still times out.

  pch@worker:~$ sudo tcpdump -i flannel.1
  tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
  listening on flannel.1, link-type EN10MB (Ethernet), capture size 262144 bytes
  00:11:00.870225 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938402630 ecr 0,nop,wscale 7], length 0
  00:11:01.879778 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938403640 ecr 0,nop,wscale 7], length 0
  00:11:03.896173 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938405656 ecr 0,nop,wscale 7], length 0
  00:11:07.995933 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938409756 ecr 0,nop,wscale 7], length 0
  00:11:16.183887 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938417944 ecr 0,nop,wscale 7], length 0
  00:11:32.311722 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938434072 ecr 0,nop,wscale 7], length 0
  00:11:56.524309 IP worker.mdns > 224.0.0.251.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
  00:11:57.358047 IP6 worker.mdns > ff02::fb.mdns: 0 [2q] PTR (QM)? _ipps._tcp.local. PTR (QM)? _ipp._tcp.local. (45)
  00:12:05.591719 IP 10.0.0.0.13917 > 10.0.1.4.http: Flags [S], seq 4066182250, win 64240, options [mss 1460,sackOK,TS val 938467352 ecr 0,nop,wscale 7], length 0
  00:12:12.875671 ARP, Request who-has nicoda.kde.org tell worker, length 28
  00:12:13.906135 ARP, Request who-has nicoda.kde.org tell worker, length 28
  00:12:14.929559 ARP, Request who-has nicoda.kde.org tell worker, length 28
  00:12:15.953546 ARP, Request who-has nicoda.kde.org tell worker, length 28
  00:12:16.977931 ARP, Request who-has nicoda.kde.org tell worker, length 28
  00:12:18.001375 ARP, Request who-has nicoda.kde.org tell worker, length 28
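
From that capture it looks like the SYNs reach the worker's flannel.1 but nothing ever comes back, so I also want to rule out a host firewall on either node; roughly:

  sudo ufw status verbose
  sudo iptables -L FORWARD -n --line-numbers | head -20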

Could it be that something at the container level is not working as expected?

Comments

  • Hello,

    Is there a reason you're using Flannel instead of Calico as the labs call for?

    As the pod is on the worker it's most likely an issue with your inter-VM network. What are you using to run your VMs?

    When you connect to the worker node, can you curl the pod's ephemeral IP? Then try the service endpoint.
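
    For example, something along these lines from the worker, using the addresses in your output (adjust if they have changed):

    # From the worker node: first the pod IP directly, then the service ClusterIP
    curl --connect-timeout 5 http://10.0.1.4
    curl --connect-timeout 5 http://10.109.43.160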

    Regards,

  • Looking back at when I created the Flannel network, I ran these commands; it looks like the second one failed.

    pch@master:~$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
    Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
    podsecuritypolicy.policy/psp.flannel.unprivileged created
    clusterrole.rbac.authorization.k8s.io/flannel created
    clusterrolebinding.rbac.authorization.k8s.io/flannel created
    serviceaccount/flannel created
    configmap/kube-flannel-cfg created
    daemonset.apps/kube-flannel-ds created

    pch@master:~$ kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
    unable to recognize "https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml": no matches for kind "ClusterRole" in version "rbac.authorization.k8s.io/v1beta1"
    unable to recognize "https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml": no matches for kind "ClusterRoleBinding" in version "rbac.authorization.k8s.io/v1beta1"
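
    (For what it's worth, that error says the second manifest still asks for rbac.authorization.k8s.io/v1beta1, which newer clusters no longer serve; the RBAC versions the cluster does serve can be listed with something like:)

    # v1beta1 RBAC was removed in Kubernetes 1.22, so only v1 should appear here
    kubectl api-versions | grep rbac
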
  • I had followed another guide to set up my environment with kubeadm.
    I'm using Docker for my VMs; I stuck with the defaults there.

    I've deleted Flannel using:
    kubectl delete -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
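
    (I'm not sure whether deleting the manifest cleans everything up; the leftover CNI config and the flannel.1 device might also need removing on each node, something like the following, assuming the usual file name:)

    ls /etc/cni/net.d/
    sudo rm -f /etc/cni/net.d/10-flannel.conflist   # usual flannel CNI config name; adjust if different
    sudo ip link delete flannel.1                   # remove the leftover VXLAN interface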

    I've installed Calico using the recommended steps, adjusting the config for my CIDR network, 10.0.0.0/16.

    After rebooting the hosts I see the Calico networks are in place.

    It looks like my deployment isn't too happy for some reason, though; the pod won't spin up.

    pch@master:~/Deployments$ kubectl get deploy,pod
    NAME                    READY   UP-TO-DATE   AVAILABLE   AGE
    deployment.apps/nginx   0/1     1            0           84m

    NAME                         READY   STATUS              RESTARTS   AGE
    pod/nginx-7848d4b86f-zrvnx   0/1     ContainerCreating   0          72s
  • Looking into the pod, it's getting errors. I deleted the deployment and redeployed from the YAML file we created in the exercise. Same issue.

    pch@master:~/Deployments$ kubectl logs nginx-7848d4b86f-4ms4b
    Error from server (BadRequest): container "nginx" in pod "nginx-7848d4b86f-4ms4b" is waiting to start: ContainerCreating

    Ah-ha, OK. When running kubectl describe pods I see a Calico error that's the blocker.

    Warning  FailedCreatePodSandBox  2m50s  kubelet  Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ced8d79d1a96fa1ef5f47f03cd4910703e86aff693d60e33a5270823dab99658" network for pod "nginx-7848d4b86f-4ms4b": networkPlugin cni failed to set up pod "nginx-7848d4b86f-4ms4b_default" network: stat /var/lib/calico/nodename: no such file or directory: check that the calico/node container is running and has mounted /var/lib/calico/
  • The issue is that the Calico pods are crashing.

    pch@master:~$ kubectl get pods -n kube-system
    NAME                                       READY   STATUS              RESTARTS         AGE
    calico-kube-controllers-58497c65d5-t6nk7   0/1     ContainerCreating   0                75s
    calico-node-d7t8w                          1/1     Running             1 (29m ago)      42m
    calico-node-rp2cp                          0/1     CrashLoopBackOff    17 (2m32s ago)   42m

    It looks like the issue might be a duplicate IP address between the head (master) node and the worker node.

    pch@master:~$ kubectl -n kube-system logs calico-node-rp2cp
    2021-09-01 05:22:53.596 [INFO][9] startup/startup.go 713: Using autodetected IPv4 address on interface br-d55a3b06d6c7: 172.18.0.1/16
    2021-09-01 05:22:53.596 [INFO][9] startup/startup.go 530: Node IPv4 changed, will check for conflicts
    2021-09-01 05:22:53.600 [WARNING][9] startup/startup.go 1074: Calico node 'master' is already using the IPv4 address 172.18.0.1.
    2021-09-01 05:22:53.600 [INFO][9] startup/startup.go 360: Clearing out-of-date IPv4 address from this node IP="172.18.0.1/16"
    2021-09-01 05:22:53.607 [WARNING][9] startup/utils.go 48: Terminating
    Calico node failed to start

    pch@master:~$ calicoctl get nodes -o wide
    NAME     ASN       IPV4            IPV6
    master   (64512)   172.18.0.1/16
    worker

    I checked the Calico support info online; it looks like the issue could be the IP_AUTODETECTION_METHOD setting discussed here: https://github.com/projectcalico/calico/issues/1628

    Well, it's a good thought. I updated the Calico config for that by running:

    kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=DESTINATION

    This was set successfully. I rebooted my machines, and the issue persists...
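
    (In hindsight I wonder whether the literal DESTINATION was supposed to be replaced with a real value; hypothetical variants based on my own setup would be:)

    # Pin detection to the VM NIC instead of the Docker bridge (ens33 is my interface name):
    kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=ens33
    # or point it at an address only reachable over the VM network (gateway address here is a guess):
    kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=can-reach=192.168.0.1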

    pch@master:~$ kubectl get pods -n kube-system
    NAME                                       READY   STATUS              RESTARTS        AGE
    calico-kube-controllers-58497c65d5-t6nk7   0/1     ContainerCreating   0               13m
    calico-node-hp5nt                          1/1     Running             1 (3m53s ago)   4m45s
    calico-node-vwzt7                          0/1     CrashLoopBackOff    6 (35s ago)     4m45s

    I do see that both master and worker have the following NICs:
    MASTER-HEAD
    4: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:12:93:d4:cd brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
    valid_lft forever preferred_lft forever

    WORKER
    3: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:17:0d:c6:31 brd ff:ff:ff:ff:ff:ff
    inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
    valid_lft forever preferred_lft forever

  • This is where I'm stuck. I'm not sure how to resolve this IP conflict; Calico auto-populates this IP address and is choosing to assign the same IP to each node (master + worker).

  • Hello,

    Docker uses its own networking configuration, and I am not too aware of the details of making it work. Calico does not automatically set the IP range; the lab actually has you look at where the information comes from in the calico.yaml file and in the configuration file passed to kubeadm init.

    The short of it is that you don't want any of your IP ranges (VMs, host, services, pod ephemeral IPs) to overlap. Also, you want to ensure there is NO firewall between your VMs. Are you sure you have configured Docker to allow all traffic?

    Perhaps you can use VirtualBox, VMware, or KVM/QEMU locally, or GCE, AWS, or Digital Ocean, where you can properly control the IP range of your VMs as well as the connectivity of the VMs to each other.
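
    For example, a quick way to see which ranges are actually in play (just a convenience, not part of the lab):

    # Pod CIDR the cluster was built with, node addresses, and local interfaces:
    kubectl cluster-info dump | grep -m 1 -- --cluster-cidr
    kubectl get nodes -o wide
    ip -4 addr show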

    Regards,

  • Hi Serewicz,

    Understood. I'm using VMware, and the VMs themselves are on a 192.168.0.0/24 network, so they are fine. I initialized the kubeadm config with 10.0.0.0/16, and I set calico.yaml to the same thing, thinking they needed to match.

    - name: CALICO_IPV4POOL_CIDR
      value: "10.0.0.0/16"
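
    (If it helps, I believe the pool Calico actually created can be double-checked with calicoctl, something like:)

    # The CIDR column should show the pool in use
    calicoctl get ippool -o wide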

    kubeadm install command:

    sudo kubeadm init --pod-network-cidr=10.0.0.0/16

    I'm not sure where the 172.18.0.1/16 came from.

    pch@master:~/Deployments$ ip a
    1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
        link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
        inet 127.0.0.1/8 scope host lo
           valid_lft forever preferred_lft forever
        inet6 ::1/128 scope host
           valid_lft forever preferred_lft forever
    2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 00:0c:29:31:02:a4 brd ff:ff:ff:ff:ff:ff
        altname enp2s1
        inet 192.168.0.172/24 brd 192.168.0.255 scope global noprefixroute ens33
           valid_lft forever preferred_lft forever
        inet6 fe80::4afa:aa33:58f0:ee0/64 scope link noprefixroute
           valid_lft forever preferred_lft forever
    3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:5d:78:aa:29 brd ff:ff:ff:ff:ff:ff
        inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
           valid_lft forever preferred_lft forever
    4: br-d55a3b06d6c7: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
        link/ether 02:42:3c:3d:29:3f brd ff:ff:ff:ff:ff:ff
        inet 172.18.0.1/16 brd 172.18.255.255 scope global br-d55a3b06d6c7
           valid_lft forever preferred_lft forever
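
    (I suspect br-d55a3b06d6c7 is a Docker user-defined network bridge, since Docker names those br-<network id>; if that's right, something like this should show which network owns it and its subnet:)

    docker network ls
    docker network inspect d55a3b06d6c7 | grep -E '"Name"|"Subnet"'
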
  • Hello,

    The VMs' IP range conflicts with the pods' ephemeral IPs. As a result, when you try to connect to a pod, your node sends the request out of the primary interface instead of across the tunnel to the other node.

    Exercise 3.1, steps 10 and 11 speak to what Calico uses by default. I would encourage you to create two new VMs, this time using an IP range like 10.128.0.0/16 or 172.16.0.0/12 for the VMs, which does not conflict with your host, the Pod IPs, or the default ephemeral IPs. If your host is also using 192.168 as its own network there may be conflicts during routing, but that is less likely as the traffic should stay within the VMs.

    In a previous command you mentioned you had erased and redone the networking. Unfortunately, that does not really work; there is much more to how the IP is used when you create the cluster. Starting over is the suggested method if you want to change your cluster configuration in such a dramatic manner.

    Regards,

  • I nuked my setup and rebuilt some VMs. This time I took proper clones and snapshots in case I need to rebuild again. :smile:

    I'm running through the installation exercise (Lab 3.1), and I've found that I'm getting odd output when running "hostname -i" on both the master and worker nodes.

    Master:
    pch@master:~$ hostname -i
    127.0.1.1

    Worker:
    pch@worker:~$ hostname -i
    192.168.0.157 172.17.0.1 fe80::20c:29ff:feb1:9de8

    The VM NIC is using 192.168.0.15X.
    Docker created 172.17.0.1.

    The server is Ubuntu 20.04. I set a static IP in the GUI, disabled IPv6, rebooted, and checked again. Same output...

    I realized that /etc/hosts didn't read the same way on both servers. I cleaned it up, made sure the proper local IP was mapped to the appropriate DNS name, and now we're all good!
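
    For reference, the cleaned-up /etc/hosts ended up looking roughly like this on each node (the worker address is from my output above; the master address below is illustrative):

    127.0.0.1        localhost
    192.168.0.156    master    # whatever the master's actual 192.168.0.15X address is
    192.168.0.157    worker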

  • Glad to hear it's working.
